Metagenomics is the study of genomic content in a complex mixture of microorganisms. The two primary goals of this approach are to characterize the organisms present in a sample and to identify the role each organism plays within a specific environment. Metagenomic samples can be collected nearly everywhere, including several microenvironments within the human body, soil, extreme environments such as deep mines, and the various layers of the ocean. Until recently, technical and economic constraints limited the depth of analysis necessary to obtain a representative picture of microbial and viral communities, their metabolic profiles, and their adaptation dynamics.
Advancements in sequencing technologies, including the introduction of massively parallel pyrosequencing with the Genome Sequencer FLX System, have enabled a tremendous groundswell of research in the field of metagenomics. Dramatic improvements in throughput and the elimination of biased cloning steps have extended the field far beyond traditional 16S rRNA gene analysis. Recent publications based on Genome Sequencer FLX data, including research in microbial diversity, gene content discovery, metatranscriptomics, and viral pathogen detection, demonstrate the breadth of this growing research area. The Genome Sequencer FLX System is ideally suited for metagenomic analysis, as its long reads ensure the high degree of specificity needed to compare sequencing reads against DNA or protein databases, and to unambiguously determine species identity and gene function. This article summarizes a few of the most recent metagenomic studies.
Environmental Microbial Diversity and Gene Content Screens
Metagenomic studies have a wide range of goals, from gaining a snapshot of what is present in the environment, to determining the metabolic role that the microbes perform there, to understanding how the microbial community responds to a changing environment.
A recent study explored the diversity and function of unicellular N2-fixing cyanobacteria (UCYN), a widely distributed oceanic group (Zehr et al., 2008). Previous research suggested the existence of a unique phylogenetic group (UCYN-A) that expresses nitrogenase genes with maximum transcript abundances during the daytime rather than at night. While this characteristic is unprecedented among known cyanobacteria, the inability to cultivate the species limited further analysis with traditional methods. Rapid shotgun sequencing of flow cytometry-sorted cells from an ocean water sample with GS FLX Titanium series chemistry (400-bp reads) generated a genomic library with 10-fold coverage of the 2- to 3-Mb UCYN-A genome. Reads were mapped using BLAST analysis, revealing the entire nitrogenase gene cluster on one assembled contig. The results demonstrated the absence of gene sequences corresponding to oxygen-evolving photosystem II and carbon fixation. The surprising outcome, achieved from a single sequencing run, is the first report of free-living cyanobacteria that are not phototrophs.
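To put the coverage figure in perspective, the relationship between read count, read length, and genome size can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration, not the study's method: it assumes reads of uniform length sampled at random from a single genome (the simple model C = N × L / G), and the function names are hypothetical. Real sorted-cell samples contain contaminating sequences and uneven coverage, so actual read requirements would be higher.

```python
import math


def reads_for_coverage(genome_size_bp: int, read_length_bp: int, coverage: float) -> int:
    """Smallest read count N such that N * L / G >= coverage (idealized model)."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


def expected_coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average fold coverage C = N * L / G, assuming uniform random sampling."""
    return num_reads * read_length_bp / genome_size_bp


# Figures taken from the text: 400-bp reads, 10-fold coverage, 2- to 3-Mb genome.
for genome_size in (2_000_000, 3_000_000):
    n = reads_for_coverage(genome_size, read_length_bp=400, coverage=10)
    print(f"{genome_size / 1e6:.0f} Mb genome: about {n:,} reads of 400 bp")
```

Under these assumptions, 10-fold coverage of a 2- to 3-Mb genome corresponds to roughly 50,000 to 75,000 reads of 400 bp.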
The lack of oxygenic photosynthesis in these widely distributed ocean species has implications for carbon and nitrogen cycles and for the evolution of photosynthesis and nitrogen fixation. The new findings could also have enormous implications for accurately modeling climate change. The way nitrogen in the oceans affects their capacity to absorb carbon from the atmosphere remains a central question in our understanding of the earth’s biosphere.
Human Microbial Diversity and Gene Content Screens
Currently, the most widespread metagenomics initiative is the characterization of microbial communities within the human body. The body is dependent upon interactions with these microbial organisms for a variety of known functions, including nutrient digestion and immune defense. One of the primary goals of the microbiome initiative is to determine whether there exists an identifiable ‘core microbiome’ of shared organisms or genes found in a given body habitat of all or the majority of human beings.
A recent study explored human microbial communities by sequencing fecal samples from adult monozygotic and dizygotic twin pairs and their mothers (Turnbaugh et al., 2009). The researchers performed 16S rRNA sequencing with the ABI 3730xl capillary sequencer to target the full-length gene and with the GS FLX system to survey the gene’s V2/3 variable region and V6 hypervariable region. Taxonomic assignments using BLAST and Hugenholtz annotations revealed immense microbial diversity across all individuals’ samples. In fact, no phylotype was present at more than ~0.5% abundance in all of the study samples, and even the most abundant phylotypes varied greatly in their proportional representation in the sampled gut communities. The results argued against the hypothesis of a “core microbiome” based on the relative abundance of bacterial families. In contrast, comparing the sample sequence data against functional databases revealed common functional categories of genes and metabolic pathways found consistently across all samples. The research suggests that a variety of bacterial species can perform the same metabolic functions.
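The abundance criterion described above (a phylotype counts as shared only if it exceeds a threshold, here about 0.5%, in every sample) can be illustrated with a short Python sketch. The sample names, phylotype labels, and read counts below are invented toy data, and the function names are hypothetical; the study's actual pipeline is not described in this article.

```python
from typing import Dict, Set


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-phylotype read counts into fractions of the sample total."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


def core_phylotypes(samples: Dict[str, Dict[str, int]], min_fraction: float = 0.005) -> Set[str]:
    """Phylotypes whose relative abundance exceeds min_fraction in every sample."""
    core = None
    for counts in samples.values():
        abundant = {t for t, f in relative_abundance(counts).items() if f > min_fraction}
        core = abundant if core is None else core & abundant
    return core or set()


# Toy data (hypothetical): each phylotype is rare in at least one individual.
samples = {
    "twin_A": {"phylotype_1": 606, "phylotype_2": 390, "phylotype_3": 4},
    "twin_B": {"phylotype_1": 3, "phylotype_2": 500, "phylotype_3": 497},
    "mother": {"phylotype_1": 450, "phylotype_2": 2, "phylotype_3": 548},
}
print(core_phylotypes(samples))
```

In this toy example every phylotype is abundant in two individuals but falls below the threshold in the third, so the shared set is empty, which mirrors the kind of result the study reported at the taxonomic level.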
The study also demonstrated the tremendous value of long sequencing reads for assigning species identity in metagenomic analyses. By comparing reads of varying lengths against the GS FLX Titanium series’ 400-bp reads, the investigators found that the frequency and quality of sequence assignments significantly improved as read length increased.
Viral Pathogen Detection
Metagenomic analysis has recently demonstrated a tremendous ability to detect unknown viral pathogens in samples from infected individuals. Notable studies published in the New England Journal of Medicine (Palacios et al., 2008) and Science (Cox-Foster et al., 2007) used unbiased high-throughput sequencing with the Genome Sequencer FLX system to identify a new arenavirus that had been transmitted via solid-organ transplantation and to identify a pathogen potentially responsible for honey bee colony collapse disorder.
Another study used similar metagenomic sequencing methods to identify a new Ebola virus responsible for a large hemorrhagic fever outbreak in Uganda in November 2007 (Towner et al., 2008). Researchers at the Centers for Disease Control and Prevention received 29 blood samples from suspected cases in Uganda for immediate testing. Initial tests for acute Ebola virus infection, a known cause of hemorrhagic fever, were negative when the samples were screened with highly sensitive real-time RT-PCR assays specific for all known Zaire and Sudan ebolaviruses and marburgviruses. High-throughput sequencing of patient serum RNA with the Genome Sequencer FLX system using established metagenomic methods resulted in a draft sequence of the entire viral genome.
Analysis of the results revealed that the newly discovered virus differed from the four known Ebola virus species, with approximately 32% nucleotide difference from even its closest relative. The newly identified species, named Bundibugyo Ebola virus for the region of the outbreak, is thought to be distantly related to the Côte d’Ivoire ebolavirus. The extent of divergence between known species suggests significant antigenic and pathogenicity differences among these viruses. The study has important implications for the design of future diagnostic assays to monitor hemorrhagic fever and for efforts to develop effective treatments for this fatal disease.
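A figure such as "32% nucleotide difference" is typically derived from a pairwise alignment of two genomes. As a minimal sketch, assuming the two sequences have already been aligned to equal length (with "-" marking gaps), the percentage can be computed as below. The function name and the toy sequences are hypothetical, and skipping gap and ambiguous (N) columns is one common convention, not necessarily the one used in the study.

```python
def percent_nucleotide_difference(seq_a: str, seq_b: str) -> float:
    """Percent of mismatched positions between two pre-aligned sequences.

    Columns containing a gap ('-') or an ambiguous base ('N') in either
    sequence are skipped (an assumed convention for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions in the alignment")
    return 100.0 * mismatches / compared


# Toy aligned fragments (invented for illustration, not real viral sequence).
known_virus = "ATGCATGCAT"
novel_virus = "ATGAATGGAT"
print(f"{percent_nucleotide_difference(known_virus, novel_virus):.1f}% difference")
```

A divergence this large also helps explain why assays designed against known species can miss a new one: primers and probes depend on near-exact matches to their target sequence.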
Metatranscriptomics
Metagenomics is also expanding our understanding of the regulation and dynamics of expressed genes in the environment. The growing field of metatranscriptomics has been largely enabled by sequencing microbial community cDNAs with the Genome Sequencer FLX system. The appeal of metatranscriptomics is its potential to reveal the real-time response of a microbial community to environmental changes. One notable study employed metagenomic sequencing and analysis to explore gene expression in microbial communities from ocean surface water samples (Frias-Lopez et al., 2008). As expected, this study identified genes associated with photosynthesis, carbon fixation, and nitrogen acquisition, all pathways typical of an open ocean microbial environment. However, 50% of the genes identified in the study were unique and came from gene categories previously undetected in metagenomes.
Conclusions
Until recently, metagenomic analysis was limited by the cost, low throughput, and inherent cloning bias of Sanger sequencing technologies. The Genome Sequencer FLX System provides a comprehensive view of metagenomic samples with high throughput, no cloning bias, and the long read lengths required to characterize the diversity and function of microbial communities. The recent surge of research in this field has led to fundamental breakthroughs in our understanding of earth’s habitats, the human ecosystem, and infectious disease.
- Zehr et al. (2008) Science 322:1110–1112
- Turnbaugh et al. (2009) Nature 457:480–484
- Palacios et al. (2008) New Engl J Med 358:991–998
- Cox-Foster et al. (2007) Science 318:283–287
- Towner et al. (2008) PLoS Pathogens 4:e1000212
- Frias-Lopez et al. (2008) PNAS 105:3805–3810
This article was originally published in Biochemica 2/2009, pages 4-5. ©Springer Medizin Verlag 2009