02-Dec-2021 - Max-Planck-Institut für molekulare Genetik

Exploring the current paradigm of gene regulation

How much tissue-specific information is contained in enhancer sequences?

How do cells know when to activate a certain gene? This information is encoded in the sequence of the DNA, but our understanding of this code is incomplete. Researchers have now tested how much information can be extracted from sequence data to predict which gene is active in which tissue.

A good storyteller knows exactly which anecdotes will bring a story’s characters to life. By telling the right story at the right time, our genome manages to give rise to hundreds of different cell types, each with a characteristic life story that breathes an individual identity into it.

DNA snippets scattered across the genome harbor the code that directs the script of a cell’s life, successively switching genes on and off. Sequences called enhancers play a prominent role in this process. They attract transcription factor proteins that promote the expression of genes, thereby “enhancing” their activity. In some cases, enhancers are located far away from the gene they activate.

Researchers Philipp Benner and Martin Vingron from the Max Planck Institute for Molecular Genetics (MPIMG) set out to decipher the instructions behind gene activation patterns in distinct cell types and embryonic tissues of the mouse.

With a series of statistical and bioinformatic analyses, the scientists identified several hundred tissue-specific DNA subsequences, or “codewords”, in enhancers that guide transcription factors, not only confirming sequences already known from other studies but also identifying many new ones. The results have been published in several articles in NAR Genomics and Bioinformatics and the Journal of Computational Biology.

Training a model

“Today, researchers assume that all the information is in the DNA sequence, including information for specific cell types, tissues, and organs,” says Martin Vingron, Director at the MPIMG. According to the prevailing theory, transcription factor proteins recognize “codewords” in enhancers that are specific to a certain cell type, allowing the genome to tell a cell’s story by jumping to the right chapters. “We wanted to see how far this approach would take us and test its limits,” says Vingron.

The researchers developed a program that identifies DNA sequences the cell recognizes in order to activate genes in a tissue-specific way. They achieved this by training a statistical model on existing experimental data that tells it which enhancer is active in which tissue. Specifically, they used sequencing data from eight embryonic mouse tissues, including heart, lung, brain, and liver.

Learning to predict

By comparing sequence data between the tissues, the program learned to recognize sequence patterns in enhancers that are characteristic for certain tissues.

How well the program performed told the researchers how much cell type-specific regulatory information is actually contained in the DNA sequence of enhancers, explains Philipp Benner, a postdoctoral researcher in Vingron’s lab: “The better our algorithm can classify any given enhancer, the more information it contains about the tissue or cell types that it is responsible for.”

The statistical classifiers can also identify DNA subsequences that might underlie cell type-specific gene activation. In fact, Benner found several hundred new codewords in addition to patterns that have been identified in other studies.

“Overall, we established a strong and, most importantly, an interpretable model,” says Benner.
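To make the idea of learning tissue-specific “codewords” more concrete, here is a minimal Python sketch, assuming a simple multinomial naive Bayes classifier over k-mer (short DNA word) counts. It is not the model Benner and Vingron used; the function names, the word length k=4, the smoothing parameter alpha, and the toy “tissue_A”/“tissue_B” sequences are all hypothetical. The sketch trains on labeled enhancer sequences, classifies new ones, measures accuracy (the better the classification, the more tissue information the sequences carry), and ranks k-mers by how much more likely they are in one tissue than in the others.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch: multinomial naive Bayes over k-mer counts.
# NOT the authors' model; names, k and alpha are illustrative choices.

def kmer_counts(seq, k=4):
    """Count overlapping k-mers, skipping windows with non-ACGT letters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts

def train(labeled_seqs, k=4, alpha=1.0):
    """Estimate per-tissue k-mer log-probabilities with add-alpha smoothing."""
    kmer_totals = defaultdict(Counter)
    n_seqs = Counter()
    for seq, tissue in labeled_seqs:
        kmer_totals[tissue].update(kmer_counts(seq, k))
        n_seqs[tissue] += 1
    vocab_size = 4 ** k          # number of possible DNA words of length k
    total = sum(n_seqs.values())
    model = {}
    for tissue, counts in kmer_totals.items():
        denom = sum(counts.values()) + alpha * vocab_size
        unseen = math.log(alpha / denom)   # log-prob of a k-mer never seen here
        logp = {w: math.log((c + alpha) / denom) for w, c in counts.items()}
        model[tissue] = (math.log(n_seqs[tissue] / total), logp, unseen)
    return model

def classify(model, seq, k=4):
    """Return the tissue with the highest posterior log-score for seq."""
    counts = kmer_counts(seq, k)
    def score(tissue):
        prior, logp, unseen = model[tissue]
        return prior + sum(c * logp.get(w, unseen) for w, c in counts.items())
    return max(model, key=score)

def accuracy(model, labeled_seqs, k=4):
    """Fraction of sequences assigned to their labeled tissue."""
    hits = sum(classify(model, s, k) == t for s, t in labeled_seqs)
    return hits / len(labeled_seqs)

def top_codewords(model, tissue, n=5):
    """Rank k-mers seen in `tissue` by log-prob minus the mean over other tissues.

    Assumes the model contains at least two tissues.
    """
    _, logp, _ = model[tissue]
    others = [t for t in model if t != tissue]
    def specificity(w):
        rest = [model[t][1].get(w, model[t][2]) for t in others]
        return logp[w] - sum(rest) / len(rest)
    return sorted(logp, key=specificity, reverse=True)[:n]

# Toy, made-up sequences for illustration only; not real enhancers.
TRAIN = [
    ("GATAGATAGATA", "tissue_A"),
    ("TTGATAGATATT", "tissue_A"),
    ("CCGGCCGGCCGG", "tissue_B"),
    ("AACCGGCCGGAA", "tissue_B"),
]

if __name__ == "__main__":
    model = train(TRAIN)
    print(classify(model, "AGATAGATAC"))         # tissue_A
    print(accuracy(model, TRAIN))                 # 1.0
    print(top_codewords(model, "tissue_A", n=1))  # ['GATA']
```

Naive Bayes is used here only because it is short and its per-tissue k-mer weights can be read off directly, which mirrors the interpretability the researchers emphasize; a discriminative model such as regularized logistic regression would be a natural alternative. Accuracy on real data should, of course, be measured on held-out sequences rather than on the training set.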

Reaching the limits

“With our advanced methods, the predictions are promising but far from perfect,” says Vingron. “Our results indicate that we might really have only a fragmentary understanding of the actual cell type-specific regulatory code.”

It is possible that not all of the required information is contained in the DNA sequence of enhancers; some of it may be distributed elsewhere in the genome. Some cross-references in the storybook of the genome might still hide in other regulatory sequences, such as promoter regions that lie in close proximity to the gene itself.
