When Eric Lander and Craig Venter stood before a crowd of reporters in Washington, D.C. on June 26, 2000 to announce that their respective groups had completed a draft of the human genome, it was heralded as one of the greatest scientific discoveries of the past century.
By Chris Probert | Staff Writer | SQ Volume 11 (2012-2013)
The Human Genome Project (and the genome projects of other model organisms) has transformed biology; we have now boldly entered the “genomic era,” where anyone can peruse the entire DNA sequences of many organisms from an internet browser. But even before the first champagne bottles had been uncorked in celebration of this milestone of international scientific collaboration, it became clear that an organism’s sequence of DNA bases, taken alone, does not provide information about when, where, or how genes are expressed in the patterns that allow the creation of that organism.
A Crucial Part to Parsing the Human Genome
Given these large unanswered questions, the National Institutes of Health organized another large-scale genomics project: the Encyclopedia of DNA Elements (ENCODE). The project’s broad goal is to identify all DNA elements in the human genome and determine their roles in regulating gene expression. Dr. Joe Ecker, an ENCODE participant and a professor at the Salk Institute, likens the information the project seeks to a “recipe” that he hopes will show how cells take “the raw ingredients” of genomic DNA and turn them into “masterpieces” of differentiated tissues, organs, and entire organisms. By Dr. Ecker’s metaphor, the Human Genome Project took the first step towards this goal by providing us with the “ingredients list,” or genome. ENCODE now seeks to understand how epigenetic “recipes” control where and when we express our “ingredients list” of genes.
The ENCODE project’s broad goal of understanding the “recipes” of gene expression has been broken into many sub-projects, each of which seeks to understand the role of one particular process. Dr. Ecker’s group has focused on the process of methylation, or the reversible addition of a –CH3 group to cytosine bases in a DNA sequence.
Using new, high-throughput sequencing approaches, the group first began studying the process in Arabidopsis thaliana, a plant related to mustard that is a common model organism, where they found large-scale differences in methylation both between different tissues within one plant and between different individual plants. They followed up this study with research in human samples that mapped genomic methylation on a base-pair-by-base-pair scale. Their results demonstrated that, as in plants, the extent of variation in methylation between humans is much larger than previously thought, and that the presence or absence of methylation at particular loci strongly correlates with differences in gene expression. The group has termed this information about methylation at various loci and its effects on gene expression a “methylome” (think “methyl” + “genome”), and has made it publicly available to browse and download via the Salk website.
As methylation does not change the DNA sequence of an organism, it is “epigenetic”: a form of heritable variation that is not reflected in the DNA sequence itself. Conventional sequencing of a DNA molecule, even with modern DNA sequencers, does not by itself provide any information on the level of methylation within the molecule. As part of their contribution to the ENCODE project, the Ecker lab has been investigating ways to determine which particular cytosine bases within a DNA strand are or are not methylated, providing data with base-pair resolution. The group’s main approach uses ChIP-Seq, a combination of Chromatin ImmunoPrecipitation and high-throughput Sequencing.
The process relies upon antibodies specific to methylated cytosine, which attach to the bases and allow their separation and detection with DNA sequencers. Dr. Ecker’s lab continues to test other novel methods for assaying methylation in DNA, and these methods have contributed to the methylomes of many cell types published by other ENCODE participants.
Not So Junky After All
Several discoveries have already been made by ENCODE researchers. One example is how ENCODE data have shattered notions about the roles of non-protein-coding DNA. From the earliest drafts of eukaryotic genome assemblies, scientists have been puzzled by the large part of the genome (80-95%, depending on the organism) that does not fall within protein-coding regions. Some scientists have gone as far as labeling these regions of the genome “junk DNA.” But through high-throughput sequencing and screening techniques, ENCODE has shown that a staggering 75% of the genome is indeed transcribed at some point, and that at least 80% of the entire genome plays a vital role in the metabolic function of stem cells. In short, these findings show that “junk” DNA is not junk at all – it’s both regularly transcribed and also vital for normal cell function.
The ENCODE project is far from complete; the project’s director estimates that they have collected less than 5% of all the data they hope to gather. But with the combination of new, high-throughput sequencing technologies and the many scientists working together to better understand the processes surrounding epigenetic regulation of gene expression, the project appears to have a bright future. Researchers envision one day being able to pinpoint the events that cause epigenetic modifications with ramifications for human health. Perhaps behavioral or phenotypic patterns in parents can help explain complicated patterns in offspring, like the onset of cancer or autism. It is clear that the story of epigenetic gene regulation is far from complete, and that many exciting discoveries lie ahead.
WRITTEN BY CHRIS PROBERT. Chris Probert is a Computer Science major from Thurgood Marshall College. He will graduate in 2014.