Speeding up sequence alignment across the tree of life

A sequence search engine for a new era of conservation genomics

09-Apr-2021 - Germany

A team of researchers from the Max Planck Institute of Developmental Biology in Tübingen and the Max Planck Computing and Data Facility in Garching has developed new search capabilities that will allow researchers to compare the biochemical makeup of different species from across the tree of life. Its combination of accuracy and speed is so far unrivalled.

Humans share many sequences of nucleotides that make up our genes with other species – with pigs in particular, but also with mice and even bananas. Accordingly, some proteins in our bodies – strings of amino acids assembled according to the blueprint of the genes – can also be the same as (or similar to) some proteins in other species. These similarities might sometimes indicate that two species have a common ancestry, or they may simply come about if the evolutionary need for a certain feature or molecular function happens to arise in the two species.

Beating the gold standard of comparative genomics research

But of course, finding out what you share with a pig or a banana can be a monumental task; searching a database with all the information about you, the pig, and the banana is computationally demanding. Researchers expect that the genomes of more than 1.5 million eukaryotic species – that includes all animals, plants, and fungi – will be sequenced within the next decade. “Even now, with only hundreds of thousands of genomes available (mostly representing small genomes of bacteria and viruses), we are already looking at databases with up to 370 million sequences. Most current search tools would simply be impracticable and take too long to analyze data of the magnitude that we are expecting in the near future,” explains Hajk-Georg Drost, Computational Biology group leader in the Department of Molecular Biology of the Max Planck Institute of Developmental Biology in Tübingen.
“For a long time, the gold standard for this kind of analysis used to be a tool called BLAST,” recalls Drost. “If you tried to trace how a protein was maintained by natural selection or how it developed in different phylogenetic lineages, BLAST gave you the best matches at this scale. But it is foreseeable that at some point the databases will grow too large for comprehensive BLAST searches.”

Finding the needle in the haystack – but quickly!

At the core of the problem is a tradeoff between speed and sensitivity: just as you will miss some small or well-hidden Easter eggs if you scan a room only briefly, speeding up the search for similar protein sequences in a database typically comes with the downside of missing some of the less obvious matches.
“This is why some time ago, we started to devise the DIAMOND algorithm, in the hope that it would allow us to deal with large datasets in a reasonable amount of time,” remembers Benjamin Buchfink, collaborator and PhD student in Drost’s research group who has been developing DIAMOND since 2013. “It did, but it also came with a downside: it couldn’t pick up some of the more distant evolutionary relationships.” That means that while the original DIAMOND may have been sensitive enough to detect a given human amino acid sequence in a chimpanzee, it may have been blind to the occurrence of a similar sequence in an evolutionarily more remote species.
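
The tradeoff described above can be illustrated with a deliberately simplified sketch. Many fast search tools first look for short exact matches ("seeds") and only examine database sequences that share one with the query. The Python toy below assumes that general idea; it is not the actual algorithm of DIAMOND or BLAST. The sequences, their names, and the function `seed_hits` are hypothetical and chosen purely for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings (seeds) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seed_hits(query, database, k):
    """Return names of database sequences sharing at least one exact k-mer with query."""
    query_seeds = kmers(query, k)
    return [name for name, seq in database.items() if query_seeds & kmers(seq, k)]


# Hypothetical toy protein sequences (not real data).
query = "MKTAYIAKQRQISFVKSHFSRQ"
database = {
    # One substitution: long identical stretches remain.
    "close_relative": "MKTAYIAKQREISFVKSHFSRQ",
    # A substitution at every fourth position: only short identical stretches remain.
    "distant_relative": "MKTGYIARQRQLSFVRSHFTRQ",
    # Shares no residues with the query at all.
    "unrelated": "GGPLWDNCEGGPLWDNCEGGPL",
}

# Long seeds: few candidates to check, but the distant relative is missed.
print(seed_hits(query, database, k=6))  # ['close_relative']

# Short seeds: the distant relative is found as well.
print(seed_hits(query, database, k=3))  # ['close_relative', 'distant_relative']
```

With long seeds the distant relative goes undetected, because its scattered substitutions leave no stretch of six identical residues. With short seeds it is found, but in a real database short seeds would also match many more unrelated sequences that then have to be checked, which costs time.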

A powerful tool for future research

While the original DIAMOND search algorithm is useful for studying material extracted directly from environmental samples, other research goals require more sensitive tools. The team of researchers from Tübingen and Garching has now modified and extended DIAMOND to make it as sensitive as BLAST while maintaining its superior speed: with the improved DIAMOND, researchers will be able to do comparative genomics research with the accuracy of BLAST at an 80- to 360-fold computational speedup. “In addition, DIAMOND enables researchers to perform alignments with BLAST-like sensitivity on a supercomputer, a high-performance computing cluster, or the Cloud in a truly massively parallel fashion, making extremely large-scale sequence alignments possible in tractable time,” adds Klaus Reuter, collaborator from the Max Planck Computing and Data Facility. Some queries that would have taken other tools two months on a supercomputer can be accomplished in several hours with the new DIAMOND infrastructure. “Considering the exponential growth of the number of available genomes, the speed and accuracy of DIAMOND are exactly what modern genomics will need to learn from the entire collection of all genomes rather than having to focus only on a smaller number of particular species due to a lack of sensitive search capacity,” Drost predicts. The team is thus convinced that the full advantages of DIAMOND will become apparent in the years to come.

Original publication

"Sensitive tree-of-life scale protein alignments using DIAMOND"; Nature Methods; Apr. 7 2021,

https://www.bionity.com/en/news/1170554/speeding-up-sequence-alignment-across-the-tree-of-life.html
