Simulating large biomolecules with quantum precision

New AI model pushes the boundaries in terms of universality, efficiency, accuracy and scalability

03-Sep-2025

An international team of researchers from the Berlin Institute for the Foundations of Learning and Data (BIFOLD) at TU Berlin, the University of Luxembourg and Google DeepMind has developed a new machine learning foundation model that can simulate molecules of all kinds with quantum mechanical accuracy. The results have now been published in the Journal of the American Chemical Society (JACS). The new method, called SO3LR, combines the latest developments in neural network design with physical laws and was trained on a specially curated data set of four million different molecular structures. As a result, the model can not only handle complex biomolecules such as proteins, sugar molecules or cell membranes, but can also simulate a wide variety of other molecules without having to be retrained. This universally applicable model thus paves the way for accelerated drug development and a deeper understanding of molecular biology.

Molecular dynamics (MD) simulations make it possible to understand and predict the behavior of molecules. They allow the description of molecular interactions over time and provide insights into their structure, dynamics and function. The exact simulation of the interaction of large biomolecules could, for example, make it possible to develop new drugs without having to carry out time-, material- and cost-intensive experiments beforehand.
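To make the idea of following molecular motion "over time" concrete, here is a minimal sketch of the kind of time-stepping loop an MD simulation is built around. It uses the velocity Verlet scheme, a standard integrator in the field, on a one-dimensional toy system. The function names and the spring force are illustrative assumptions and are not taken from SO3LR or from the article.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance position x and velocity v by n_steps time steps of size dt."""
    f = force(x)
    trajectory = [x]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append(x)
    return x, v, trajectory


def spring_force(x, k=1.0):
    """Toy force (assumption): a single particle on a harmonic spring."""
    return -k * x


# Start at x = 1 with zero velocity and integrate 1000 steps.
x_end, v_end, traj = velocity_verlet(1.0, 0.0, spring_force, mass=1.0, dt=0.01, n_steps=1000)
total_energy = 0.5 * v_end**2 + 0.5 * x_end**2  # should stay close to the initial 0.5
```

In a real simulation, the force function is the expensive part: it is where a classical force field, a quantum mechanical calculation or a machine-learned model is evaluated at every step.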

Improving the accuracy and applicability of these simulations has a long tradition in computational physics and chemistry. For decades, researchers have faced a fundamental trade-off: methods were either fast but only approximate and not transferable to different molecules, or extremely accurate but computationally very expensive. This trade-off has so far limited high-precision simulations to small systems of a few hundred atoms. Large and complex biomolecules such as proteins, however, can contain many tens of thousands of atoms, which has limited the ability to accurately model and understand fundamental dynamic processes such as protein folding or cell organization.

In recent years, AI-based models have begun to bridge this gap between approximate (classical) methods and highly accurate (quantum mechanical) methods. Despite great progress, two key challenges remain: scaling these approaches to biomolecules of realistic size, and covering many different kinds of molecules with a single model. The biggest obstacle to applying previous models to large and complex molecules has been that they neglect quantum mechanical effects over large distances. Simply put, atoms in a molecule interact not only with their immediate neighbors, but also with distant atoms. The larger the molecule, the more important these long-range effects become. Without these long-range interactions, life as we know it would not be possible, as biomolecules would not be able to function.

The new SO3LR model overcomes these challenges and pushes the boundaries in terms of efficiency, accuracy, scalability and universality in the simulation of organic molecules. The researchers achieved this by taking a hybrid approach to the design of SO3LR: the complex task of calculating quantum mechanical interactions between atoms is split into two complementary components. A fast and highly accurate machine learning model learns the complex quantum mechanical many-body interactions at short and medium distances. In parallel, universal, physically based equations describe the pairwise interactions over long distances.
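The split into two components can be sketched in a few lines. The example below is only an illustration of the general idea, not the SO3LR implementation: the short-range function is a toy placeholder standing in for the learned model, and Coulomb's law is used as one example of a pairwise physical term, since the article does not specify which equations are used. All function names are hypothetical.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2), the approximate value commonly used in MD codes.
COULOMB_K = 332.0637


def short_range_energy(positions, cutoff=5.0):
    """Toy placeholder (assumption) for the learned short/medium-range term.

    A real model would evaluate a neural network on each atom's local
    neighborhood; here a smooth pair penalty inside `cutoff` stands in for it.
    """
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = math.dist(positions[i], positions[j])
            if r < cutoff:
                energy += (1.0 - r / cutoff) ** 2
    return energy


def long_range_energy(positions, charges):
    """Pairwise Coulomb energy over ALL atom pairs, with no distance cutoff."""
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = math.dist(positions[i], positions[j])
            energy += COULOMB_K * charges[i] * charges[j] / r
    return energy


def total_energy(positions, charges, cutoff=5.0):
    """Hybrid energy: local (learned) part plus physics-based long-range part."""
    return short_range_energy(positions, cutoff) + long_range_energy(positions, charges)


# Two opposite unit charges 10 Angstrom apart: outside the local cutoff,
# so only the long-range term contributes.
far_pair = total_energy([(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)], [1.0, -1.0])
```

The point of the design is visible in the last example: a purely local model with a 5 Å cutoff would assign this pair zero interaction energy, while the explicit long-range term still accounts for it.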

"Reliable simulations at the biomolecular scale depend on these long-range interactions, which is why they are anchored in the design of SO3LR," explains Adil Kabylda from the University of Luxembourg, who led the project. "This allows our model to focus its strong learning capacity on capturing the complex quantum effects that traditional models have so far missed," adds Dr. Thorben Frank, postdoc at the BIFOLD Institute.

The second challenge was making a single model applicable to a wide variety of molecules. To achieve this, the team created an extensive and diverse dataset of over four million carefully curated molecular structures, from which SO3LR learned to accurately describe the great diversity of molecules in nature. For the first time, this model can simulate a wide variety of large molecules without having to be retrained in advance.

The breakthrough of the model lies in its universality

To demonstrate the capabilities of SO3LR, the research team carried out a series of sophisticated simulations for all four main types of biomolecules found in nature. For example, they simulated large proteins in an explicit water environment, including the plant protein crambin and a complex glycoprotein. They also investigated a POPC lipid bilayer, a model system for human cell membranes.

"The key breakthrough of SO3LR lies in its universality. Instead of having to go through a lengthy process of data generation and subsequent training for each new molecule, we provide a single, directly applicable model. This saves researchers the time-consuming and computationally intensive preparation steps and allows direct testing of hypotheses with quantum mechanical accuracy," says Prof. Klaus-Robert Müller, BIFOLD Co-Director.

"SO3LR represents a decisive step in this direction. By combining machine learning with physical principles, we open the door to modeling realistic biological processes with quantum precision, with profound implications for the molecular understanding of health and disease as well as for the development of the next generation of drugs," says Prof. Alexandre Tkatchenko from the University of Luxembourg, summarizing the significance of the work.

At a time when AI models are increasingly in the hands of private companies, this team of international scientists has decided to make the model and its underlying datasets openly available to the scientific community to accelerate further progress in this field.

Note: This article has been translated using a computer system without human intervention. LUMITOS offers these automatic translations to present a wider range of current news. Since this article has been translated with automatic translation, it is possible that it contains errors in vocabulary, syntax or grammar. The original article in German can be found here.
