RealTime ready qPCR Assay Design and Configuration Portal Content

Dr. Heiko Walch
Manuel Dietrich
Roche Diagnostics GmbH
Nonnenwald 2
82377 Penzberg / Germany

Abstract

The RealTime ready Configurator is a web-based configuration and ordering portal for function-tested, custom RT-qPCR assays based on Universal ProbeLibrary (UPL) technology. It contains consolidated gene annotation information derived from three key public gene annotation resources (Ensembl, NCBI, and UCSC) for human, mouse, and rat genes. The Configurator content can be queried by web forms and used either to select cataloged and already tested RT-qPCR assays or to initiate new assay development processes on demand. RealTime ready RT-qPCR assays are based on the short LNA-substituted (Locked Nucleic Acid) hydrolysis probes of the Universal ProbeLibrary, and are designed using the proprietary ProbeFinder software. For each new target gene, at least 3 assays are designed and tested. Besides standard oligo synthesis QC methods, the test procedure contains real-time RT-qPCR experiments using a universal cDNA for the respective organisms. In order to qualify, all assays must fulfill Roche in-house criteria, which are consistent with the recently published MIQE guidelines (e.g., specific amplicon shown in agarose gels, PCR efficiency of 100% ± 10%, ≥ 3 orders of magnitude linear range [1]). This text summarizes the basis of the Configurator’s content, the assay design process, and the final assay annotation within the Configurator.

RealTime ready Configurator Content

The content is primarily based on three key public gene annotation resources:

  1. Ensembl (version 60.37e; Nov. 2010)[2, 3, 4]
  2. Entrez Gene (Dec. 2010)[5, 6]
  3. UCSC (Dec. 2010)[7, 8]

Figure 1: Focus Panels – Collections of RealTime ready assays covering one specific topic.

All established links from gene annotation within the Configurator to identifiers from other databases are derived from these aforementioned authorities. Due to this broad informational basis, most sequence or probe identifiers can be used as queries in one of the search modes of the Configurator. A query will be internally matched to the annotated gene corpus, and if one gene transcript is covered by a RealTime ready assay, the query will be matched up with the gene and the assay ID in the search results. All information is locally stored. In order to ensure the confidentiality of the searches, no information exchanges are established with any data sources outside of the Configurator web service. All data traffic is encrypted according to common web and eCommerce standards [9].

At present, the Configurator hosts assays and annotation for three species: Homo sapiens, Mus musculus, and Rattus norvegicus. For all three organisms, the content encompasses the combined annotation of the three main databases and their attached information. The three databases follow their individual update regimes. The Configurator’s annotation content is dynamic in the sense that it is subjected to regular updates (or additional updates if annotational changes in the public databases make it necessary to renew the content).

Figure 2: Search by Pathway

Focus Panels
To facilitate a comprehensive and easy selection of targets that may be of interest in the context of a particular field, the Configurator provides pre-selected gene lists for particular research topics which can be accessed by selecting “Search by Focus Panel”. The Focus Panel lists (see Figure 1) were generated by extracting information from literature, combining other publicly available data sources, or consulting experts in the field [10, 11, 12, 13, 14].

Each list is compiled from various sources. For each of the three organisms, the lists contain all targets that are currently covered by a RealTime ready assay. Because new assays are added to the catalogue on a daily basis, the list content may change.

Figure 3: Individual steps in the development workflow for a new RealTime ready RT-qPCR assay.

Pathway Maps
The interactive pathway maps in the Configurator are based on our own literature work and publicly available sources. Currently, the pathways show canonical signaling and interaction networks. The gene symbols that can be selected for assay search in a particular pathway do not exclusively reflect official names. In many cases, we have chosen symbols that are more commonly used in the literature. For example, the official gene symbol MAPK14 is used less often in the literature than p38, one of its ambiguous aliases. To account for these ambiguities, the pathway maps contain an additional feature that expands the selected symbol to include the most common aliases and official names in the search. In addition, if one pathway node represents a group or family of different targets, the individual unique members are also used as search terms.
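The alias expansion described above can be pictured with a minimal Python sketch. The alias table is a small illustrative example, not Configurator data, and the function name `expand_symbol` is hypothetical. An ambiguous alias such as p38 expands to every official symbol that carries it, which also covers the case of a node representing a gene family.

```python
# Hypothetical alias table for illustration only; not taken from the Configurator.
ALIASES = {
    "MAPK14": {"p38", "p38alpha"},
    "MAPK11": {"p38", "p38beta"},
}


def expand_symbol(symbol, aliases=ALIASES):
    """Return all search terms implied by a pathway node label.

    The label itself is always kept. Every official symbol whose name or
    alias list matches the label (case-insensitively) contributes its
    official name and all of its aliases.
    """
    query = symbol.lower()
    terms = {symbol}
    for official, names in aliases.items():
        known = {official.lower()} | {name.lower() for name in names}
        if query in known:
            terms.add(official)
            terms.update(names)
    return terms


if __name__ == "__main__":
    print(sorted(expand_symbol("p38")))
```

A real implementation would draw the alias table from the combined annotation corpus rather than from a hard-coded dictionary.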

RealTime ready RT-qPCR assays can be used to monitor and assess changes on a transcriptional level. The majority of the signal modulation in interaction networks, as depicted in our or other public pathway databases, is not based on transcriptional regulation, but rather on protein-protein interaction and modification. In most pathways, these interactions have transcriptional consequences nevertheless. We try to address this by not only providing the pathway nodes themselves, but also the target genes of the individual signaling cascades. As for the other searches, the pathway search is performed within the context of the selected organisms and by gene symbol or alias.

Assay Design and Annotation

Each RealTime ready assay is designed according to the following steps:

  1. Target Selection
  2. Transcript Selection
  3. Assay Design
  4. Assay Evaluation & Selection
  5. Assay Annotation
  6. RealTime ready Assay Release

Figure 4: Assay Visualization for the RealTime ready assay #101040 PTEN (ENSG00000171862). The introns are shown as grey lines connecting the exons visualized as boxes. The CDS is filled and the UTRs are framed in light blue. The exon sizes are scaled according to the legend in the lower right. The intron sizes are standardized to put an emphasis on the exons and possible differences in splice variants.

The target selection is supported by the various search functionalities of the Configurator. After an order is received in the system, a suitable design sequence is defined. In the next step, the sequence and its annotation (exon/intron structure, SNPs, etc.) are used in a ProbeFinder-based assay design. For new custom assays, at least 3 individual designs are produced and tested for assay performance. The assays finally selected must fulfill strict quality criteria as specified in Grepl et al. [15]. New assays are subjected to our mapping pipeline, and the new assay is subsequently released and added to the Configurator.
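Two of the acceptance criteria named in the abstract (PCR efficiency of 100% ± 10% and a linear range of at least 3 orders of magnitude) can be checked from a dilution series. The sketch below uses the standard relation between the slope of the standard curve (Cq versus log10 of input amount) and amplification efficiency, E = 10^(-1/slope) - 1. The Cq values are invented for illustration, the function names are hypothetical, and treating the span of the dilution series as the linear range is a simplifying assumption; this is not the Roche in-house procedure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pcr_efficiency(log10_inputs, cq_values):
    """Efficiency from the standard-curve slope: E = 10^(-1/slope) - 1."""
    slope, _ = fit_line(log10_inputs, cq_values)
    return 10 ** (-1.0 / slope) - 1.0


def passes_qc(log10_inputs, cq_values):
    """Check efficiency within 90-110% and a dilution span of >= 3 logs.

    Assumption: every point of the series lies on the linear part of the
    curve, so the span of the series is taken as the linear range.
    """
    efficiency = pcr_efficiency(log10_inputs, cq_values)
    span = max(log10_inputs) - min(log10_inputs)
    return 0.90 <= efficiency <= 1.10 and span >= 3


if __name__ == "__main__":
    # Invented example: five 10-fold dilutions and their Cq values.
    dilutions = [0, 1, 2, 3, 4]
    cqs = [33.1, 29.8, 26.4, 23.1, 19.8]
    print(round(pcr_efficiency(dilutions, cqs), 3), passes_qc(dilutions, cqs))
```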

Figure 5: Tabular mapping information for the RealTime ready assay #101040 PTEN (ENSG00000171862). All known variants from Ensembl and RefSeq are listed. If the transcript contains both primer binding sites and the probe binding site, the start and end positions for that particular assay are shown. Transcripts without start/end annotation are missing the binding site of at least one assay component.

Target Definition
As of spring 2009, Ensembl, NCBI (Entrez Gene), and UCSC have rooted their gene-based services on one common genome build released by the Genome Reference Consortium (currently human Build 37, February 2009 [18]). Because they use different annotation pipelines, and despite the fact that they are using the same genome or coordinate system, the three authorities sometimes reach different conclusions for individual gene annotations. From a pragmatic perspective, every scientist has an individual scientific and historical background, and a resulting preference for one of these authorities. In order to compensate for the different flavors, and to avoid adding further complexity, we explicitly waived the option to add our own genome or transcriptome annotation. Instead, we focused on using and combining the wealth of well-annotated publicly available databases in the RealTime ready Configurator wherever possible. One particular gene that is cross-referenced between the different databases will be shown as a single entry in the Configurator. Access to the original information from the different sources is available via hyperlinks from all assay detail pages. The annotated information can be browsed and used to identify the gene of interest. When a RealTime ready assay is already available for one particular gene, the mapping information for the individual transcript identifiers used by the different authorities is shown in detail (see Figure 4 and Figure 5).

If no assay is currently available for the gene, we identify a suitable design transcript sequence based on the gene annotations. Our primary annotation resource for gene and transcript annotation is Ensembl.

The first step in every RT-qPCR assay design process is identifying the correct target gene. The different search functionalities in the Configuration Portal work on the aforementioned combined data and annotation corpus. Therefore, most publicly available identifiers are known to the portal and can be used to identify the respective gene (see Table 1).

Identifier | Description | Examples
Assay ID | Assay IDs as provided in the Configuration Portal | 100.000+
Affymetrix | Affymetrix probe set | 202443_x_at
Agilent | Agilent probe set IDs | A_23_P200792
CCDS | Consensus Coding Sequence | CCDS43905
EMBL / Genbank / DDBJ | IDs provided by one of the three authorities | 11275978
Ensembl | Ensembl IDs | ENSG00000134250
Entrez Gene | Entrez Gene UIDs | 4853
HAVANA | IDs provided by Human and Vertebrate Analysis and Annotation effort | OTTHUMT00000055087
HGNC / MGI / RGD | Human Genome Organization Gene Nomenclature Committee Names and IDs; Mouse Genome Informatics / Rat Genome Database Names and IDs | NOTCH1
Illumina | Illumina Probe IDs | ILMN_1729161
IPI | International Protein Index | IPI00412982
MIM | Online Mendelian Inheritance in Man | 109730
miRBase | miRBase Sequence IDs | MI0000060
NimbleGen | NimbleGen Probe IDs | NM_024408P09940
PDB | Protein Data Bank IDs | 1PB5
RefSeq | RefSeq Nucleotide or Protein IDs | NM_017617
PubMed ID | Literature database maintained by the National Center for Biotechnology Information | 20823234
Rfam | Database of RNA Families |
UCSC | UCSC IDs | uc003olp
UniGene | UniGene Cluster ID | Hs.495473
UniProt / SwissProt / TrEMBL | UniProt protein ID | B7WP15_HUMAN

Table 1: Available identifiers qualified for the batch search input files. The RealTime ready Configurator has two keyword search entry points. The keyword search itself works with up to five different search phrases combined with ‘AND’ or ‘OR’. The batch search is capable of processing and searching up to 384 different IDs from the sources depicted above and provided via one single uploaded ID file.
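To illustrate how an uploaded batch file could be pre-screened, the following Python sketch classifies identifiers by their textual pattern and enforces the 384-ID limit stated in the caption. The regular expressions are assumptions inferred from the example IDs in Table 1, not the Configurator's actual rules, and the function names are hypothetical. Purely numeric IDs are deliberately reported as ambiguous, because Entrez Gene, MIM, and PubMed IDs cannot be told apart by pattern alone.

```python
import re

MAX_BATCH_IDS = 384  # limit stated in the Table 1 caption

# Assumed patterns, inferred from the example IDs in Table 1.
ID_PATTERNS = [
    ("Ensembl", re.compile(r"^ENS[A-Z]*[GTP]\d{11}$")),
    ("RefSeq", re.compile(r"^[NX][MRP]_\d+(\.\d+)?$")),
    ("CCDS", re.compile(r"^CCDS\d+$")),
    ("Illumina", re.compile(r"^ILMN_\d+$")),
    ("Agilent", re.compile(r"^A_\d+_P\d+$")),
    ("UniGene", re.compile(r"^(Hs|Mm|Rn)\.\d+$")),
    ("numeric (ambiguous)", re.compile(r"^\d+$")),
]


def classify_id(identifier):
    """Return the name of the first pattern matching the identifier."""
    for name, pattern in ID_PATTERNS:
        if pattern.match(identifier):
            return name
    return "symbol or other"


def read_batch(lines):
    """Strip blank lines, enforce the batch limit, and classify each ID."""
    ids = [line.strip() for line in lines if line.strip()]
    if len(ids) > MAX_BATCH_IDS:
        raise ValueError(
            f"batch contains {len(ids)} IDs; the limit is {MAX_BATCH_IDS}"
        )
    return [(identifier, classify_id(identifier)) for identifier in ids]


if __name__ == "__main__":
    for row in read_batch(["ENSG00000134250", "NM_017617", "4853", "NOTCH1"]):
        print(row)
```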

Figure 6: Graphical representation of the annotated transcripts from Ensembl genome browser for the human gene MAPK14 (p38alpha; ENSG00000112062). From 9 different annotated transcripts in Ensembl, ENST00000229794 (MAPK14-002 marked with asterisk) was chosen as the representative or “canonical” transcript for the assay design.

Transcript Selection
Ensembl, Entrez Gene (NCBI), and UCSC are the three key public gene annotation resources that build the content of the RealTime ready Configurator and are used for target definition and design of RealTime ready assays [2, 3, 4, 5, 6, 7, 8, 16, 17]. Once the target gene of the selected species is defined, we try to identify the most prominent transcript that may serve as a kind of “canonical” representation of the target gene (see Figure 6). Most of the human protein coding genes are structurally organized in exons and introns [19]. Alternative splicing of the primary mRNA is one of the key foundations of transcriptional and functional plasticity of the human genome [19, 20, 21]. In order to design RealTime ready assays that can be used in a broad range of applications and for multiple different sample materials, we apply an automated procedure to identify design transcript sequences that show a widespread distribution and are referenced from different annotation sources. The identified transcript sequence is then submitted to a ProbeFinder-based UPL RT-qPCR assay design [24].
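The automated selection procedure itself is not described in detail here. As a rough illustration of the stated idea (prefer a transcript that is referenced from several annotation sources), the following Python sketch picks the transcript with the most distinct cross-referencing sources. The transcript IDs, the source sets, and the tie-breaking rule are hypothetical.

```python
def pick_design_transcript(xrefs):
    """Pick the transcript referenced by the most annotation sources.

    xrefs maps a transcript ID to the set of sources that reference it.
    Ties are broken alphabetically by transcript ID (an arbitrary,
    assumed rule that only serves to make the result deterministic).
    """
    if not xrefs:
        raise ValueError("no candidate transcripts")
    return max(sorted(xrefs), key=lambda tid: len(xrefs[tid]))


if __name__ == "__main__":
    # Hypothetical candidates for one gene.
    candidates = {
        "tx_a": {"Ensembl"},
        "tx_b": {"Ensembl", "RefSeq", "UCSC"},
        "tx_c": {"Ensembl", "UCSC"},
    }
    print(pick_design_transcript(candidates))
```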

Assay Design

UPL Technology
All RealTime ready assays are based on UPL technology [25, 26]. Normal RT-qPCR hydrolysis probes (~20–25 bp) are highly specific and therefore need to be designed and validated for each individual target. UPL probes are shorter (8–9 bp), and the probe binding is stabilized by LNAs [29, 30]. Due to their relative shortness, each of the 165 available different UPL probes can be reused for literally thousands of different target sequences while maintaining good amplicon specificity [26, 27, 28].

Specific UPL subsets for individual organisms are available; the probes in these sets are specifically selected to ensure optimum transcriptome coverage for each organism using only 90 different UPL probes [26, 27]. For example, the human UPL subset covers more than 600,000 probe binding sites in the transcriptome (and, as a result, potential RT-qPCR assays). In order to achieve maximum coverage and flexibility regarding the assay position in the transcript, we use all 165 available probes for RealTime ready RT-qPCR assay design, regardless of the gene source organism.

For more information regarding UPL, see [25, 26, 27, 28, 35].

ProbeFinder Design
The ProbeFinder RT-qPCR assay design is a multistep process developed to return optimal primer/probe combinations for a given target sequence [26, 28]. The core design algorithm of the UPL assay design center [24] is based on the widely used Primer3 design software [22, 23]. Default parameters for the design are set to:

Parameter | Default
Primer length | 18–27 bp (20 bp optimum)
Primer Tm | 59–61°C (60°C optimum)
Amplicon | 60–150 bp
Intron Spanning | Yes

Table 2: ProbeFinder Assay Design Defaults.
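The defaults in Table 2 can be written down as a small configuration object. The Python sketch below is only an illustration of how a candidate design might be screened against these ranges; the class and function names are hypothetical, the primer Tm is assumed to be supplied by the caller, and nothing here reflects the internal ProbeFinder implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignDefaults:
    """Value ranges copied from Table 2."""
    primer_len_min: int = 18
    primer_len_max: int = 27
    tm_min: float = 59.0
    tm_max: float = 61.0
    amplicon_min: int = 60
    amplicon_max: int = 150


def violations(primer_len, primer_tm, amplicon_len, defaults=DesignDefaults()):
    """Return the names of all default ranges that a candidate violates."""
    problems = []
    if not defaults.primer_len_min <= primer_len <= defaults.primer_len_max:
        problems.append("primer length")
    if not defaults.tm_min <= primer_tm <= defaults.tm_max:
        problems.append("primer Tm")
    if not defaults.amplicon_min <= amplicon_len <= defaults.amplicon_max:
        problems.append("amplicon length")
    return problems


if __name__ == "__main__":
    print(violations(primer_len=20, primer_tm=60.0, amplicon_len=81))
    print(violations(primer_len=30, primer_tm=60.0, amplicon_len=200))
```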

These restrictions are one of the basic prerequisites for combining all RealTime ready assays and running them with the same LightCycler® 480 Instrument run protocol. Intron spanning assay designs are preferred for all targets. By convention, an assay is defined as intron spanning if at least one annotated intron is either covered by or contained between the primer binding sites. This definition implies that primer binding sites do not necessarily span exon/exon borders.
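The intron-spanning convention can be expressed in cDNA coordinates, where each intron corresponds to an exon/exon junction. In the minimal Python sketch below, an assay counts as intron spanning if at least one junction lies strictly inside the amplicon, whether it falls under a primer or between the primers. The coordinate convention (0-based, half-open interval from the first base of the forward primer site to the end of the reverse primer site) and the function names are assumptions made for this example.

```python
from itertools import accumulate


def junctions(exon_lengths):
    """cDNA offsets at which one exon ends and the next one begins."""
    return list(accumulate(exon_lengths))[:-1]


def is_intron_spanning(exon_lengths, amplicon_start, amplicon_end):
    """True if at least one exon/exon junction lies inside the amplicon.

    The amplicon is the 0-based, half-open cDNA interval
    [amplicon_start, amplicon_end). A junction at offset j separates
    base j-1 from base j, so it is inside the amplicon when
    amplicon_start < j < amplicon_end.
    """
    return any(
        amplicon_start < j < amplicon_end for j in junctions(exon_lengths)
    )


if __name__ == "__main__":
    exons = [100, 50, 200]  # hypothetical exon lengths in bp
    print(is_intron_spanning(exons, 80, 160))   # crosses both junctions
    print(is_intron_spanning(exons, 160, 240))  # lies within the last exon
```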

Figure 7: Design amplicon lengths of tested and selected assays. The design-dependent minimum and maximum are 60 and 150 bp, respectively. 50% of all amplicons are equal to or shorter than 75 bp and 99% are equal to or shorter than 132 bp.

When using the RealTime ready standard LightCycler® Instrument runtime protocol, the extension time limits the achievable amplicon sizes. As a consequence, any residual genomic DNA contained in the cDNA preparation will not contribute to the RT-qPCR amplification results if the introns are large enough. In the case of shorter introns, additional products can be easily detected by additional distinct bands in agarose gel analysis. During our in-house quality assurance procedures, all RealTime ready assays are tested experimentally in the lab and subjected to gel analysis. Within our standard experimental setup, all assays show distinct bands and no nonspecific side products. In general, designing intron spanning assays is considered good practice for RT-qPCR assays (and might be the only way to discriminate mRNA from DNA signals in cases where residual genomic DNA cannot be avoided).
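The effect of intron size on genomic DNA background can be made concrete with simple arithmetic: the product amplified from genomic DNA is the cDNA amplicon plus every intron located between the outer primer ends. The Python sketch below compares that length with a maximum amplifiable size. This text does not state such a limit for the standard run protocol, so `max_amplifiable_bp` is a placeholder that the reader must supply, and the function names are hypothetical.

```python
def genomic_amplicon_length(cdna_amplicon_len, spanned_intron_lengths):
    """Length of the product amplified from genomic DNA, in bp."""
    return cdna_amplicon_len + sum(spanned_intron_lengths)


def gdna_product_expected(cdna_amplicon_len, spanned_intron_lengths,
                          max_amplifiable_bp):
    """True if the genomic product still fits the assumed size limit.

    max_amplifiable_bp is a placeholder for the largest product the run
    protocol can amplify; it is not a value given in this text.
    """
    length = genomic_amplicon_length(cdna_amplicon_len, spanned_intron_lengths)
    return length <= max_amplifiable_bp


if __name__ == "__main__":
    # An 81 bp amplicon spanning one intron of median length (3.4 kb).
    print(genomic_amplicon_length(81, [3400]))
    print(gdna_product_expected(81, [3400], max_amplifiable_bp=500))
    # The same amplicon spanning a short 90 bp intron.
    print(gdna_product_expected(81, [90], max_amplifiable_bp=500))
```

With a short intron the genomic product may still be amplified, which is the case in which an additional band appears in the agarose gel.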

Besides the benefit of short extension times, the short amplicons are also suited for working in experimental setups involving starting material with lower RNA quality, as in the case of formalin-fixed paraffin-embedded tissue [32, 33]. Amplicons with more than 300 bp are known to be problematic in these setups. The average design amplicon size of RealTime ready assays is ~81 bp, with 99% of all assays having amplicons equal to or shorter than 132 bp (see Figure 7). In an individual experiment with one particular sample material (and hence one particular splice variant mixture), other isoforms may be present that result in different amplicon sizes.

Figure 8: Assay position relative to the cDNA length 0–100%. The right-skewed distribution is primarily due to the fact that the assays are designed to be intron spanning and the intron density within eukaryotic cDNAs has a 5’ bias.

Design Selection
The ProbeFinder RT-qPCR assay design algorithm identifies all possible UPL-based assays for one particular sequence and subjects them to an internal scoring scheme. Besides the standard primer design rules that are part of the Primer3 algorithm, ProbeFinder adds various improvements (e.g., in silico check for primer binding sites, ‘SNP avoidance’ based on Ensembl annotation, primer/probe complementarity, intron size). Following these ranked results, we manually select at least three different assays for each target (see Figure 9). If possible, assays are selected to span different introns. No further explicit restrictions are applied during the design selection.
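The selection itself is done manually. Purely as an illustration of the stated preference (take the best-ranked designs, but spread them over different introns where possible), a greedy selection could look like the Python sketch below. The input format, the function name, and the example data are hypothetical.

```python
def select_assays(ranked, n=3):
    """Pick up to n assays from a best-first list, preferring new introns.

    ranked is a list of (assay_id, intron_index) tuples, best score first.
    The first pass takes the best assay for each intron not yet covered;
    the second pass fills any remaining slots with the best leftovers.
    """
    chosen = []
    seen_introns = set()
    for assay_id, intron in ranked:
        if len(chosen) == n:
            return chosen
        if intron not in seen_introns:
            chosen.append(assay_id)
            seen_introns.add(intron)
    for assay_id, _ in ranked:
        if len(chosen) == n:
            break
        if assay_id not in chosen:
            chosen.append(assay_id)
    return chosen


if __name__ == "__main__":
    ranked = [("a1", 1), ("a2", 1), ("a3", 2), ("a4", 1), ("a5", 3)]
    print(select_assays(ranked))
```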

It is known that intron distribution is biased toward the 5’ end of the genes in most eukaryotic organisms [31]. RealTime ready assays are, in general, designed as intron spanning assays. Therefore, the distribution of the assays compared to the cDNA length partly follows this biological phenomenon, as seen in Figure 8.

After selecting the assays, the PCR primers are synthesized in-house, quality tested, and stored in an automated Roche production facility. In order to ensure the best reproducibility, the plating and processing of the primers during our internal assay test workflow is performed using the same primer lots and in the same production facility that is used for the final custom assay and panel production.

In Silico Mapping and Annotation

Figure 9: ProbeFinder assay design solutions for the preferred transcript from MAPK14 (ENST00000229794). In the first testing round, 3 assays (from the 37 different possible assays shown here) are selected and tested in Roche laboratories in order to establish a new RealTime ready RT-qPCR assay for MAPK14. If all three assays fail to pass, another testing round with different designs is initiated.

After the RealTime ready assays are successfully tested and qualified in the lab, the primer and probe sequences are used to fully integrate and annotate the assays with the public content available on the Configurator. We apply an in-house developed BLAST-based [34] in silico method to identify all possible splice variants that could be detected by a RealTime ready assay. Transcript variants get assigned to the assay if both primers and the probe binding motif match the sequence of the particular isoform. Following this first mapping round, the transcripts are deconvoluted into the individual exons. The combined exon-to-genome (Ensembl and UCSC) and amplicon-to-exon mappings are used to position the primer/probe onto the genome coordinate system. This data is then used to generate the assay visualizations on the assay details pages. The images visualize the different assays and their relations to individual exons of splice variants of one particular gene. Compared to genome browsers, we use a slightly different, exon-focused approach for our assay/transcript visualizations. Due to the fact that the median intron length is ~25 times longer (3.4 kb) than the median exon length (0.13 kb) [2], a normal scaled visualization of the exon/intron structure quite often leads to very narrow exons and large frame-filling introns. For our RealTime ready RT-qPCR assays, the exon coverage is the important information. Standardizing the intron sizes in the visualization results in a more intuitive overview of the different exons utilized in individual splice variants.
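The assignment rule (a transcript is mapped to an assay only if both primers and the probe binding motif match) can be illustrated with exact string matching. The Python sketch below is a deliberately simplified stand-in for the BLAST-based method: it tolerates no mismatches, uses only the first occurrence of each primer site, and works on toy sequences invented for this example. All names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon_on(transcript, fwd, rev, probe):
    """Return (start, end) of the amplicon on the transcript, or None.

    The forward primer must match the transcript directly, the reverse
    primer must match as its reverse complement downstream of the forward
    primer, and the probe motif (either strand) must lie between them.
    """
    start = transcript.find(fwd)
    if start < 0:
        return None
    rev_site = revcomp(rev)
    rev_pos = transcript.find(rev_site, start + len(fwd))
    if rev_pos < 0:
        return None
    inner = transcript[start + len(fwd):rev_pos]
    if probe not in inner and revcomp(probe) not in inner:
        return None
    return start, rev_pos + len(rev_site)


def map_assay(transcripts, fwd, rev, probe):
    """Map one assay onto all transcripts that contain all three sites."""
    hits = {}
    for transcript_id, sequence in transcripts.items():
        position = amplicon_on(sequence, fwd, rev, probe)
        if position is not None:
            hits[transcript_id] = position
    return hits


if __name__ == "__main__":
    # Toy isoforms: t2 lacks the probe binding motif.
    toy = {
        "t1": "AAACGTACTGGCTGACTTGACCAA",
        "t2": "AAACGTACTTTTTTTCTTGACCAA",
    }
    print(map_assay(toy, fwd="ACGTAC", rev="GGTCAA", probe="GGCTGA"))
```

Transcripts that are missing one of the three sites receive no start/end annotation, which corresponds to the empty rows described for Figure 5.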

The positions of the resulting mapping are partly shown at the top of every assay details page (see Figure 4). In addition, detailed amplicon-to-transcript mapping information for all target transcripts from RefSeq and Ensembl is provided in a tabular format in the mapped transcript section (see Figure 5). In accordance with the MIQE guidelines [1], detailed assay primer and probe information is available for download in the “My Orders” section of the Configurator.

Summary

The gene and transcript annotational content of the RealTime ready Configurator offers a combined view of three of the main gene annotation authorities. The positional information of the lab-tested RealTime ready assays regarding transcript or exon coverage is presented in a graphical overview and tabular format. Additional content (including Focus Lists and pathway maps) provides a convenient and browsable entry point for target gene selection for cataloged as well as new assays that are developed on demand.

Abbreviations (S1)

LNA Locked Nucleic Acids
UPL Universal ProbeLibrary
MIQE Minimum Information for Publication of Quantitative Real-Time PCR Experiments
Tm Melting temperature for oligonucleotides
NCBI National Center for Biotechnology Information
UCSC University of California, Santa Cruz
UTR Untranslated region of mRNAs

Literature and Links

  1. S.A. Bustin et al.: The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clin Chem. 2009 Apr; 55(4):611–22.
  2. T. Hubbard et al.: The Ensembl genome database project; Nucleic Acids Res. 2002 30(1):38–41.
  3. P. Flicek et al.: Ensembl’s 10th year; Nucleic Acids Research 2010 38:D557–D562.
  4. http://www.ensembl.org/Help/Permalink?url=http%3A%2F%2FNov2010.archive.ensembl.org
  5. D. Maglott et al.: Entrez Gene: gene-centered information at NCBI; Nucleic Acids Res. 2011 Jan; 39:D52–7.
  6. http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene
  7. P.A. Fujita et al.: The UCSC Genome Browser database: update 2011. Nucleic Acids Res. 2010 Oct 18.
  8. D. Karolchik et al.: The UCSC Table Browser data retrieval tool Nucleic Acids Res. 2004 Jan; 32:D493–6. http://genome.ucsc.edu/index.html
  9. Advanced Encryption Standard (AES) http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf
  10. I. Vastrik et al.: Reactome: a knowledge base of biologic pathways and processes. Genome Biol. 2007; 8(3):R39.
  11. L. Matthews et al.: Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res. 2009 Jan; 37:D619–22.
  12. M. Ashburner et al.: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000 May; 25(1):25–9.
  13. S. Hunter et al.: InterPro: the integrative protein signature database. Nucleic Acids Res. 2009; 37:D224–228.
  14. C.F. Schaefer et al.: PID: The Pathway Interaction Database. Nucleic Acids Res. 2009; 37, D674–9.
  15. U. Grepl & R. Mauritz: RealTime ready – Function Tested qPCR Assays based on the Universal ProbeLibrary technology. Biochemica 2009 No.4.
  16. K.D. Pruitt et al.: NCBI Reference Sequences: current status, policy and new initiatives; Nucleic Acids Res. 2009 Jan; 37:D32–6.
  17. http://www.ncbi.nlm.nih.gov/projects/RefSeq/
  18. http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/index.shtml
  19. E.S. Lander et al.: Initial sequencing and analysis of the human genome; Nature 2001. 409, 860–921.
  20. H. Keren et al.: Alternative splicing and evolution: diversification, exon definition and function; Nature Reviews Genetics. May 2010 11, 345–355.
  21. E.V. Kriventseva et al.: Increase of functional diversity by alternative splicing; Trends in Genetics. 2003. Volume 19, Issue 3.
  22. S. Rozen & H.J. Skaletsky: Primer3 on the WWW for general users and for biologist programmers. Bioinformatics Methods and Protocols: Methods in Molecular Biology. Totowa, NJ , Humana Press; 2000. pp. 365–386.
  23. T. Koressaar & M. Remm: Enhancements and modifications of primer design program Primer3. Bioinformatics. 2007; 23:1289–1291. doi: 10.1093/bioinformatics/btm091.
  24. UPL Assay Design Center: https://qpcr.probefinder.com/roche3.html
  25. Universal Probe Library Special Interest Site: https://www.roche-applied-science.com/sis/rtpcr/upl/index.js
  26. P. Mouritzen et al.: The ProbeLibrary™ – Expression profiling 99% of all human genes using only 90 dual-labeled real-time PCR Probes. Biotechniques 2004 37:492–495.
  27. R. Mauritz et al.: Universal ProbeLibrary Set: one Transcriptome – One Kit. Biochemica 2005 No.2.
  28. P. Mouritzen et al.: ProbeLibrary: A new method for faster design and execution of quantitative real-time PCR. Nature Methods 2005 Vol.2 No.4.
  29. A.A. Koshkin et al.: LNA (locked nucleic acids). Synthesis of the adenine, cytosine, guanine, 5-methylcytosine, thymine, and uracil bicyclonucleoside monomers, oligomerisation and unprecedented nucleic acid recognition. Tetrahedron 1998 54, 3607–3630.
  30. P. Mouritzen et al.: Single Nucleotide polymorphism genotyping using locked nucleic acid (LNA). Expert Rev. Mol. Diagn. 2003 3(1), 27–38.
  31. A. Sakurai et al.: On biased distribution of introns in various eukaryotes. Gene Volume 300, Issues 1–2, 30 October 2002, 89–95.
  32. C. Paska et al.: Effect of formalin, acetone, and RNAlater fixatives on tissue preservation and different size amplicons by real-time PCR from paraffin-embedded tissue. Diagn Mol Pathol 2004. 13:234–240.
  33. E.A. Takano et al.: A multiplex endpoint RT-PCR assay for quality assessment of RNA extracted from formalin-fixed paraffin-embedded tissues; BMC Biotechnol. 2010, 10: 89.
  34. S.F. Altschul et al.: Basic local alignment search tool. J. Mol. Biol. 1990; 215:403–410.
  35. Universal ProbeLibrary, Guide to Successful PCR Assays: http://www.roche-applied-science.com/sis/rtpcr/upl/upl_docs/universal_probelibrary.pdf

For life science research only.
Not for use in diagnostic procedures.

LIGHTCYCLER, NIMBLEGEN, and REALTIME READY are trademarks of Roche.
Exiqon, LNA, ProbeFinder and ProbeLibrary are registered trademarks of Exiqon A/S, Vedbaek, Denmark.
Other brands or product names are trademarks of their respective holders.
