HeLa update

As of today, Henrietta Lacks’ family has agreed to have the HeLa genome published under a restricted-access model (similar to the one used, for example, by dbGaP). You can read about the negotiations in Nature, where the genome paper itself is also published. I was impressed that Francis Collins talked to the family directly, took the time to explain the science, and gave them a choice over whether the data should be published. I hope it sets a precedent for more consent-aware scientific practice generally.

19-year-old woman sets up programming school in Nairobi

Martha Chumo is a woman who doesn’t give up easily. A self-taught programmer from Nairobi, Kenya, earlier this year she overcame long odds and raised almost $6,000 via an IndieGoGo campaign to fund a trip to Hacker School in New York. Her motivation was simple – to learn to be the best programmer she could be.

The US consulate in Kenya wasn’t convinced though. Despite a flurry of support letters including those from Hacker School, AdaCamp, the GNOME Outreach Program for Women, the Codecademy community manager, and her mentor from the Apache Deltacloud Project, she was deemed ineligible for a visa. The reason? She’s an unmarried woman with no kids. According to Martha, “The consular said, and I quote, ‘You need social ties to Kenya. And now that you’re an adult, your parents don’t count’. This basically translates to I’m ineligible because I am an unmarried 19-year-old with no kids, not in her 3rd year of University, and has not worked for a company or the government for many years.”

Not one to give up easily, Martha quickly came up with a different plan: if she couldn’t go to New York for Hacker School, she’d bring Hacker School to Kenya instead! This is how the dream of a Dev School in Nairobi was born. Most people in Kenya don’t have easy access to computers, so in order to be accessible to everyone who wants to learn, the school itself needed to be free, and to provide laptops as well as electricity and Internet access. This brought on a new round of frenzied fundraising – this time with a more ambitious goal of raising $50,000 for school setup.

Not only will the new school provide much-needed learning opportunities for software development, but it also plans to actively address the gender gap in the profession. Being set up and run by a woman is a great start towards that, and Martha is as passionate about teaching as she is about programming: “Best part about coding? Teaching and giving back to the community. My friends and I are planning classes to get even more Kenyan women excited about coding.” As a female programmer who has trained and continues to train other women, I can’t help but salute the initiative.

You can find out more about Martha (@NjeriChelimo) and the Nairobi Dev school at their IndieGoGo campaign page: http://www.indiegogo.com/projects/nairobi-dev-school

Trimming Illumina sequencing adapters

I’ve been trying to get my head around how Illumina sequencing adapters work, so that I can trim them from my sequencing data accurately. Being quite into educational videos these days, I first watched this video:

There was also a really helpful article about them from Tufts University, which you can check out here: http://genomics.med.tufts.edu/documents/protocols/TUCF_Understanding_Illumina_TruSeq_Adapters.pdf

One of the things I was trying to find out was whether you trim the adapters from the 5′ end or the 3′ end. From this, I got the impression that, provided your library is of decent quality and you’ve done the size selection right (the example in the video used insert sizes of 200-300bp), you shouldn’t need to trim the 3′ end – with a 100bp sequence read, your sequence should never reach the adapter on the other end. And indeed, this is exactly what I observe in our lab’s RNA-seq data – there’s an occasional adapter trace on the 5′ end, but after that, it’s good-quality sequence with no duplications or GC bias that would suggest adapter contamination on the other side. Adapter trimming for data like this is pretty much optional, I think – a lot of mappers can deal with it and just discard the non-matching sequence.
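If you want to check this for your own data rather than take FastQC’s word for it, a quick count of how many reads contain the start of the 3′ adapter does the job. Here’s a minimal Python sketch along those lines – the adapter prefix (the common AGATCGGAAGAGC stem of the TruSeq adapters) and the file name are placeholders, so swap in whatever your kit actually uses:

```python
# Rough check for 3' adapter read-through in an (uncompressed) FASTQ file.
# The adapter prefix below is the common stem of the TruSeq adapters --
# replace it with whatever your library prep actually used.

ADAPTER_PREFIX = "AGATCGGAAGAGC"  # placeholder; check your kit's documentation

def adapter_fraction(fastq_path, prefix=ADAPTER_PREFIX):
    total = 0
    with_adapter = 0
    with open(fastq_path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:            # sequence lines are every 4th line
                total += 1
                if prefix in line:
                    with_adapter += 1
    return with_adapter, total

if __name__ == "__main__":
    hits, total = adapter_fraction("reads.fastq")   # hypothetical file name
    print("%d of %d reads contain the adapter prefix" % (hits, total))
```

If that fraction is essentially zero, trimming the 3′ end really is optional; if it’s substantial, you’re in the situation described next.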

However, I’ve also been dealing with a type of data that has very different properties – data from an RNA bisulfite experiment. For anyone unfamiliar with it – this is an experiment used to determine cytosine-5 methylation patterns in DNA or RNA, by applying a bisulfite treatment that converts unmethylated cytosines to uracil. Getting data on RNA methylation is exciting, because this is an area that hasn’t been explored all that much, but the experiment itself comes with some major difficulties. RNA is already relatively unstable at the best of times, and bisulfite treatment is harsh and introduces several types of chemical degradation. As a result, by the time the library got as far as sequencing, it didn’t need to be size-selected – it was already in the 50-140 bp range because the RNA had broken apart in the course of the procedure. With such short fragments, the 3′ adapter gets sequenced in a large fraction of reads, and adapter trimming becomes a really essential step. And indeed, in the FastQC step I observed very high levels of sequence duplication for the indexed primers, confirming this. However, I still didn’t feel I had completely got my head around how the adapters work, and what was happening with my data.

Next, I looked at this tutorial from ARK-Genomics, which I thought was really helpful. It talks about how FastQC guesses what the contaminants are, how it can get it wrong, and how you sometimes need to do a little bit of detective work to figure out what’s actually happening in your sample. A further look at my duplicated sequences told me that there was an RNA RT primer in there, and that something funny was indeed happening around it that I couldn’t quite understand.

When I say ‘something funny’, the problem with adapter contamination in general is as follows:

You have your sequence of interest, for example a string like “This is my data of interest.” However, around the edges of it, you have some other sequences you have to trim away. After a careful internet search, imagine that you conclude your sequence looks like this:

TreeCatThis is my data of interest.CatTree

So you conclude that based on the methods used, you should cut off instances of Cat to return your sequence of interest. However, then as a sanity check you write a script that parses your raw sequence looking for matches to all adapters potentially used in the experiment, and you come across:

FruitBatThis is my data of interest.FruitBat

You spend an embarrassing amount of time cursing Fruit Bat and searching more online forums, and then go and talk to the experimentalist who provided you with the list of adapters used in the first place. To cut a long story short – it turns out that the adapters used for small RNAs are different from the general sequencing adapters (for one thing, the universal adapter isn’t used). And that’s why you should always talk to experimentalists straight away. And sanity check your data. Preferably in that order.
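For what it’s worth, the sanity-check script I mentioned doesn’t need to be anything clever. A minimal Python sketch like the one below is enough to show you which candidate adapters actually turn up in your raw reads – the dictionary of names and sequences is purely illustrative, so fill it in with the actual adapters and primers your experimentalist says were used:

```python
# Count how often each candidate adapter/primer sequence shows up in the raw
# reads. The dictionary below is illustrative -- list every sequence that
# could plausibly have ended up in your library.
from collections import Counter

CANDIDATE_ADAPTERS = {
    "truseq_universal": "AGATCGGAAGAGC",          # placeholder sequences
    "small_rna_3prime": "TGGAATTCTCGGGTGCCAAGG",
    "rna_rt_primer":    "GCCTTGGCACCCGAGAATTCCA",
}

def count_adapter_hits(fastq_path, adapters=CANDIDATE_ADAPTERS):
    hits = Counter()
    with open(fastq_path) as handle:
        for i, line in enumerate(handle):
            if i % 4 != 1:            # only the sequence lines matter here
                continue
            for name, seq in adapters.items():
                if seq in line:
                    hits[name] += 1
    return hits

if __name__ == "__main__":
    for name, n in count_adapter_hits("reads.fastq").most_common():
        print(name, n)
```

Whatever comes out near the top of that list is what you actually need to trim, regardless of what the protocol documentation led you to expect.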

Anyway, much as I appreciate the ability to regale biologists with witty adapter tales in pubs, I also thought that this would be useful to write about, in case anyone is encountering similar issues they can’t make sense of. In the end I made a diagram of how small RNAs are processed for sequencing – here it is in all its glory:

Illumina sequencing adapters for small RNAs

Also, my adapters are now successfully trimmed away (using cutadapt, if anyone’s wondering). Victory.
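In case it’s useful, the trimming step itself boils down to a single cutadapt call. A sketch of roughly what that looks like is below, called from Python just to keep the examples in one language – the adapter sequence, minimum-length cutoff and file names are illustrative rather than the exact values I used:

```python
# Call cutadapt to trim the 3' adapter and discard reads that become too short.
# The adapter sequence, length cutoff and file names are illustrative.
import subprocess

ADAPTER_3PRIME = "TGGAATTCTCGGGTGCCAAGG"   # placeholder small-RNA 3' adapter

subprocess.check_call([
    "cutadapt",
    "-a", ADAPTER_3PRIME,    # 3' adapter to remove
    "-m", "15",              # drop reads shorter than 15 bp after trimming
    "-o", "trimmed.fastq",   # output file (hypothetical name)
    "reads.fastq",           # input file (hypothetical name)
])
```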

The HeLa genome and why it matters

Over the last few weeks, a storm of criticism erupted around the publication of the HeLa genome, leading a few days ago to EMBL taking the genome sequence offline. Among others, people who brought the issue to light included Rebecca Skloot in her excellent NY Times Review, and Jonathan Eisen (@phylogenomics), whose blog also includes a summary of the comments and background to the story. As well as dealing with the implications of the HeLa genome study, the story also started a broader discussion of the way scientists and the media address ethical implications of research.

For anyone unfamiliar with the story of HeLa cells, I would highly recommend Rebecca Skloot’s book ‘The Immortal Life of Henrietta Lacks’. HeLa cells are cervical cancer cells which were taken from a poor black woman, Henrietta Lacks, without her knowledge. While Henrietta herself died in 1951, her cells became the first immortal cell line, and went on to become one of the most widely used cell lines in medicine, and part of a multimillion dollar industry.

Undeniably these cells have changed the world, leading to a huge range of major medical discoveries, from vaccines to techniques such as cloning and IVF. However, as Skloot discusses in her book, the history of exploitation and violation does not end with Henrietta. Scientists later came back to perform experiments on her descendants without their knowledge, and also used the cells in a range of non-consensual human experiments. A particularly horrific example: scientists injected HeLa cells into people without their knowledge to see what would happen – in 1954, a cancer researcher (Chester Southam) injected them into the arm of a woman already hospitalized with leukaemia, and left the HeLa tumours to grow for weeks before removing them. The same experiment was then performed on a series of cancer patients (presumably because Southam ‘conveniently’ had access to them), and later on healthy people too, for which he used volunteer prisoners in the Ohio State Penitentiary. Three doctors refused to help with Southam’s research on ethical grounds; believe it or not, at first this only led to their being accused of being “overly sensitive because of their Jewish ancestry”, but it later led to the trials being publicly exposed as, in their words, “illegal, immoral and deplorable”. Rather than being an isolated incident, this story fits into a long history of non-consensual medical experiments performed on oppressed and underprivileged populations.

In 1966, Henry Beecher published a study detailing a range of examples of ethical misconduct in experimental medicine, of which Southam’s study was just one example. In his conclusions, arguably still very much relevant today, he notes: “The question rises, then, about valuable data that have been improperly obtained. It is my view that such material should not be published. There is a practical aspect to the matter: failure to obtain publication would discourage unethical experimentation. How many would carry out such experimentation if they knew its results would never be published? Even though suppression of such data (by not publishing it) would constitute a loss to medicine, in a specific localized sense, this loss, it seems, would be less important than the far reaching moral loss to medicine if the data thus obtained were to be published.”

I find this point valuable in two different senses. First of all, ethical misconduct is wrong, and even if it seems beneficial in the short term, the global state of science that emerges from such misconduct – the “far reaching moral loss” – is not worth the perceived temporary benefits. I’d also encourage people to remember that the term ‘benefits’ is a relative one – while the advances in medicine may benefit the already well-off and privileged parts of society, these ‘benefits’ arise from the violence and exploitation perpetrated against the people these studies were done on. The second point I thought was important is that unethical practices can be challenged at multiple levels. Scientists – don’t work on unethical projects. Publishers – don’t publish them. Funding bodies – don’t fund them. Legislators – put safeguards in place to prevent them from happening. Everyone – if you are aware that unethical projects are taking place, speak up about them. While regulation has improved over the years, and we can but hope that experiments such as those performed in the 50s and 60s are no longer possible, this is by no means the end of the ethical struggles of biology. In fact, as the science races on, with the study of its moral implications lagging behind, arguably our struggles with bioethics have only just begun.

In the context of HeLa cells, there’s no doubt in my mind that publishing the HeLa genome sequence without family consent is wrong. The genome contains large amounts of personal information, relevant to both Henrietta and her descendants, and that information can have an immediate and practical impact on them – the fact that the study should not have been performed without their permission seems obvious. However, in his article at Genomes Unzipped, Joe Pickrell correctly points out that in reality, the HeLa genome has been available for years because of the different genomic datasets that have already been published about it, and that singling out and only focusing on the authors of the HeLa genome paper avoids dealing with the systematic failure of the scientific community as a whole to deal with genomic privacy.

I’d go a step further and say that it highlights a systematic failure to fully address the broader issue of consent, and that Beecher’s conclusions from 1966 are still very much relevant in 2013. I find it questionable whether non-consensually obtained human tissue samples have any place in research at all. I think communicating with Henrietta’s living relatives about the science is the right thing to do, but I also think that the one person who had a clear, full right to give permission for the use of her own tissues died in 1951. And when it comes to genetic information, even with the full permission of the person in question, whether family permission is required on top of this is still an unresolved question. I also think that consent needs to be updated as the science advances, as some of the uses of people’s tissue samples simply could not have been foreseen at the time the samples were obtained. EMBL’s genome scandal highlights that the need for even the most basic level of consent hasn’t sunk in, and, more broadly, that the scientific community doesn’t seem to be doing enough to create a consent-aware, ethical research culture.

Given how quickly we can sequence genomes, it’s not clear to me why we need to keep using non-consented cell lines at all. Madeleine Price Ball of the Personal Genome Project points out that about a dozen “well-consented” cell lines already exist, and that a hundred or so more are on the way. I salute the initiative and their commitment to an open consent process, and I believe this should be the gold standard for all science. It is unethical and irresponsible to continue partaking in science whose very foundations are non-consensual and exploitative, especially given that ethical alternatives are available.

Being scientists puts us in a position of some privilege and some power, especially given that modern biology appears to be on the brink of shaping the medical landscape of the 21st century. It is horrific, unethical and wrong for modern science to proceed using the bodies of black women who were given no choice. As scientists, it is our moral duty to stop engaging in and facilitating non-consensual science, and to work to create alternatives. We have the power to choose what we work on and how we work on it, and it is up to us to create a system where consent matters, and people matter. Together we can make it happen.