Pyrosequencing of complementary DNA-PCR amplicons was used to determine comprehensive major histocompatibility complex (MHC) class-I genotypes in primates that provide essential preclinical models for studies of infectious disease, vaccine development, and transplantation. By sequence-based typing of rhesus, cynomolgus, and pig-tailed macaques, more than 500 unique MHC class-I sequences were resolved, nearly half of which had not been reported previously. The approach showed remarkable sensitivity, demonstrating that pyrosequencing is a viable method for ultra-high-throughput MHC genotyping in primates and possibly in human beings.
To study T cell responses, comprehensive major histocompatibility complex (MHC) genotyping methods are essential. Existing preclinical model organisms like the rhesus (Macaca mulatta), cynomolgus (M. fascicularis), and pig-tailed (M. nemestrina) macaque show an unprecedented complexity of their MHCs. Not only does gene content vary between macaque haplotypes, genomic sequencing also suggests that rhesus and cynomolgus macaques have at least 22 functional class-I genes transcribed at varying levels. More than 900 macaque MHC class-I sequences are currently known; robust genotyping assays are available for only a small fraction of them.
To maximize the utility of the animal models, ultra-high-throughput platform approaches for comprehensive MHC class-I genotyping are necessary. Wiseman and colleagues describe the adaptation of massively parallel pyrosequencing of cDNA-PCR amplicons for MHC genotyping of rhesus, cynomolgus, and pig-tailed macaques. As shown here, pyrosequencing using the Genome Sequencer FLX System provides a feasible approach for complete MHC class-I genotyping of all macaques used as preclinical models in biomedical research.
Materials and Methods
Samples from 92 macaques obtained from nine institutions were examined. All macaques were cared for according to the regulations and guidelines of the Institutional Animal Care and Use Committees at their respective institutions.
Primary cDNA-PCR and pooling strategy
Total cellular RNAs were converted to cDNA using a commercially available system. Primary cDNA-PCR amplicons spanning 190 bp of exon 2 of macaque class-I sequences were generated using a commercially available high-fidelity polymerase. Each PCR primer contained one of 12 distinct 10-bp multiplex identifier (MID) tags along with adaptor sequences for the 454 Sequencing process. Primary amplicons were purified, normalized to equimolar concentrations, and pooled (groups of 12 macaques) for Genome Sequencer FLX Analysis.
emPCR amplification and pyrosequencing
emPCR amplification and pyrosequencing steps were performed with the Genome Sequencer FLX Instrument using Genome Sequencer FLX Protocols according to the manufacturer’s instructions. For the pilot study, each amplicon pool of 12 macaques was sequenced in a one-fourth region of a 70 x 75 mm Standard PicoTiterPlate Device; in the follow-up experiment, each of the four pools was sequenced in a one-sixteenth region.
Image processing and base calling were performed using the Genome Sequencer FLX Software. High-quality sequence reads were then sorted by their respective MID tags and assembled into contigs with 100% identity for each macaque. Nucleotide Basic Local Alignment Search Tool (BLASTN) analyses of the resulting contigs were performed against a custom in-house database of macaque MHC class-I sequences. To normalize transcript abundance levels between macaques, the number of sequence reads detected for each distinct class-I sequence was divided by the total number of sequence reads that formed contigs in each macaque. MHC class-I sequences not previously listed in GenBank were designated with the species abbreviation and the locus to which they are most similar.
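The sorting and normalization steps described above can be illustrated with a minimal Python sketch. It is not the software used in the study: the function names, the placeholder tag sequences, and the assumption that the MID tag sits at the very start of each read are all hypothetical, and the contig assembly and BLASTN assignment steps are not reproduced.

```python
from collections import Counter, defaultdict

MID_LENGTH = 10  # the text describes 10-bp MID tags


def demultiplex(reads, mid_to_animal):
    """Group reads by their leading MID tag and trim the tag off.

    Assumption: the tag occupies the first MID_LENGTH bases of each read.
    """
    by_animal = defaultdict(list)
    for read in reads:
        tag, insert = read[:MID_LENGTH], read[MID_LENGTH:]
        animal = mid_to_animal.get(tag)
        if animal is not None:  # reads with unrecognized tags are dropped
            by_animal[animal].append(insert)
    return dict(by_animal)


def normalized_abundance(assigned_sequences):
    """Return, for one animal, reads per class-I sequence / total reads.

    `assigned_sequences` holds one sequence label per contig-forming read,
    i.e. the labels a BLASTN-style assignment step would have produced.
    """
    counts = Counter(assigned_sequences)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Hypothetical tags and labels, for illustration only.
tags = {"AAAAAAAAAA": "animal_1", "CCCCCCCCCC": "animal_2"}
pooled = ["AAAAAAAAAAGATTACA", "CCCCCCCCCCTTGACC", "AAAAAAAAAAGATTACA"]
print(demultiplex(pooled, tags))
print(normalized_abundance(["seq_x", "seq_x", "seq_x", "seq_y"]))
```

Dividing by each animal's own read total is what makes transcript levels comparable between macaques that received different numbers of reads.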
Results and Discussion
Macaque MHC genotyping by pyrosequencing
With primers based on highly conserved sequences within macaque MHC class-I A and class-I B loci, a universal 190-bp cDNA-PCR amplicon was designed (Figure 1). Pyrosequencing of amplicons from 48 cynomolgus, pig-tailed, Indian-origin, and Chinese-origin rhesus macaques was performed in a single pilot run on the Genome Sequencer FLX Instrument. The amplicons were subdivided into four pools of 12 macaques distinguished by 10-bp MID tags. Wiseman and colleagues were able to acquire nearly 500,000 high-quality sequence reads containing a total of over 100 million high-quality bases. These numbers translate into an average of 9315 reads (range 7538-10769 reads) per macaque for the Indian rhesus macaque pool.
To evaluate the sensitivity of the Genome Sequencer FLX Instrument, four Mauritian cynomolgus macaques that are homozygous for well-characterized MHC haplotypes were examined. All previously described MHC class-I A and MHC class-I B sequences were observed; transcript levels ranged from 27.8% of total class-I sequence reads for Mafa-B*0440101 down to 1.4% for Mafa-B*0550101 (Figure 2a). Additionally, five novel sequences with transcript levels between 0.3% and 2.2% of total sequence reads were detected. The results for the remaining three MHC homozygous macaques as well as for eight heterozygous macaques were comparable. Each of the Mauritian MHC haplotypes carries an average of seven transcribed Mafa-B sequences plus two or three classical Mafa-A and nonclassical Mafa-E class-I sequences.
Analogous results were obtained from rhesus macaques (Figures 2b, c). Class-I sequences of Indian-origin rhesus macaques were comparatively well characterized, whereas in a homozygous Chinese-origin rhesus macaque, four of six Mamu-B-like sequences had not been reported previously. The prevalence of previously undescribed sequences was even more pronounced for the pig-tailed macaques: of the 136 distinct MHC class-I sequences, over 100 were previously unknown MHC transcripts.
In a follow-up study, Wiseman and colleagues examined whether they could maximize the efficiency of Genome Sequencer FLX genotyping for large cohorts by reducing the depth of sequence coverage. They pyrosequenced four amplicon pools containing 12 rhesus macaques each in one of 16 regions of a 70 x 75 mm Standard PicoTiterPlate Device. Although sequencing depth was decreased by an order of magnitude to about 800 sequence reads per macaque, an average of 20.5 distinct MHC class-I sequences per macaque were identified (compared with 24.3 sequences in the pilot study). This shows that even with a modest reduction in sensitivity, Genome Sequencer FLX Analysis still provides considerably more comprehensive genotyping than existing methods.
Accuracy of pyrosequencing-based MHC genotyping of macaques
In sequence-based genotyping, errors can easily accumulate as a result of polymerase misincorporations or sequencing artifacts. Adding a simple filtering step that requires a minimum of five (pilot) or two (follow-up) identical reads for a sequence to be included in the downstream BLASTN analysis can diminish the number of these artifacts.
More than 98.3% of the resulting filtered reads were consistent with known or previously undescribed MHC class-I sequences by BLASTN analysis (Table 1). With the filter step, the overall error rate of these data was reduced to <1.7% of the sequence reads evaluated subsequently. Excluding this low level of artifacts entails straightforward, manual editing, accomplished by intra- and intermacaque sequence comparison. Thus, the error rate in Genome Sequencer FLX Pyrosequencing is acceptably low. This filtering step was applied to all of the MHC class-I genotyping data presented here.
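The minimum-identical-reads filter can be sketched in a few lines of Python. The thresholds of five (pilot) and two (follow-up) come from the text; the function name and the toy reads are hypothetical, and the sketch assumes reads are compared as exact strings after MID tags have been removed.

```python
from collections import Counter


def filter_by_identical_reads(reads, min_identical):
    """Keep only sequences seen in at least `min_identical` identical reads.

    Returns a dict mapping each retained sequence to its read count.
    """
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_identical}


# Toy reads for illustration only: one abundant sequence, one rarer
# sequence, and one singleton that could be a sequencing artifact.
reads = ["AAC"] * 5 + ["AAT"] * 2 + ["AGT"]

pilot = filter_by_identical_reads(reads, min_identical=5)
follow_up = filter_by_identical_reads(reads, min_identical=2)
print(pilot)
print(follow_up)
```

A random polymerase or sequencing error is unlikely to recur identically in several independent reads, so requiring repeated observations removes most singletons; the lower threshold in the follow-up experiment presumably reflects its reduced read depth.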
To exclude experimental artifacts, the researchers examined the distribution of MHC class-I sequences in pedigreed cynomolgus macaques. The relative abundance of each MHC transcript was remarkably consistent on the haplotypes shared among the offspring and their parents. Even alleles that were present in as little as 0.2% of the total class-I transcripts for these shared haplotypes were detected.
Wiseman and colleagues further examined the accuracy of the pyrosequencing methodology and analyzed Indian rhesus macaques that share a certain haplotype representing the only complete macaque genomic sequence currently available. This B11a haplotype carries 19 loci that have the potential to encode at least 14 functional gene products. Whereas cDNA cloning and Sanger sequencing identified transcripts for eight of these loci, the researchers were able to identify mRNA transcripts from at least 13 loci using the increased sensitivity of the Genome Sequencer FLX Instrument. They also consistently observed similar class-I transcript profiles for other ancestral haplotypes shared by unrelated macaques. This suggests that the Genome Sequencer FLX Analysis provides at least semiquantitative representation of relative class-I transcript levels within an individual.
Identification of high-frequency Mamu class-I sequences
Using the Genome Sequencer FLX System, comprehensive MHC class-I genotypes and expression profiles for 68 Indian- and Chinese-origin rhesus macaques obtained from four independent sources were generated. Of 278 distinct class-I sequences detected within the rhesus macaque cohort, there were 33 distinct Mamu-A, Mamu-B and Mamu-E sequences in at least 10% of this cohort that were expressed at relatively high transcript levels (≥4% of total sequences per macaque). High-frequency alleles such as these may represent high-priority targets for functional immune characterization.
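One way to apply the two cutoffs quoted above (present in at least 10% of the cohort, at 4% or more of an animal's total sequences) is sketched below in Python. The text does not spell out how the two criteria were combined, so this sketch assumes that an animal counts toward a sequence's cohort frequency only when that sequence reaches the transcript-level cutoff in that animal. The function name, data layout, and example values are hypothetical.

```python
from collections import Counter


def high_frequency_sequences(profiles, min_cohort_fraction=0.10, min_level=0.04):
    """List sequences that are highly expressed in a large share of animals.

    `profiles` maps animal ID -> {sequence name: fraction of total reads}.
    Assumption: an animal is a carrier only if the level is >= min_level.
    """
    carriers = Counter()
    for abundances in profiles.values():
        for name, level in abundances.items():
            if level >= min_level:
                carriers[name] += 1
    needed = min_cohort_fraction * len(profiles)
    return sorted(name for name, n in carriers.items() if n >= needed)


# Hypothetical four-animal cohort, for illustration only.
cohort = {
    "animal_1": {"seq_1": 0.30, "seq_2": 0.02},
    "animal_2": {"seq_1": 0.10, "seq_3": 0.05},
    "animal_3": {"seq_2": 0.50},
    "animal_4": {"seq_1": 0.01},
}
print(high_frequency_sequences(cohort, min_cohort_fraction=0.5))
```

With the default cutoffs and a cohort of 68 animals, a sequence would need to meet the expression threshold in at least seven animals to be reported.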
Macaques provide essential preclinical models for infectious disease, vaccine, biodefense, and transplantation research. Unfortunately, they also have the most complex MHC genetics of any primate species described to date, and existing methods for MHC typing are simply inadequate. As shown here, massively parallel pyrosequencing using the Genome Sequencer FLX System can provide comprehensive and cost-effective MHC class-I genotyping, thus potentially revolutionizing the use of macaques to guide immunological studies.
Wiseman RW et al. (2009) Nat Med, doi:10.1038/nm2038