Helmholtz AI project call showcase: ProFiLe - Better prediction of protein structure and function with AI

In the biological sciences, the structure of a molecule is often linked to its function. Understanding how different 3D shapes are formed could shed light on the most basic processes shaping life. But first, we must find out how each component of the molecule, and the combination of those components, contributes to the final shape.

How can AI models help unravel the relation between a molecule’s composition and its structure? Read in this week’s Helmholtz AI project showcase how researchers at the German Aerospace Center (DLR) and the Karlsruhe Institute of Technology (KIT) are working jointly on this challenge.

Could you introduce yourself, giving your affiliation, area of work, and of course, the project title?

My name is Philipp Knechtges. I am a postdoc in the HPC department of the Institute for Software Technology at the German Aerospace Center (DLR). I am a trained physicist, but I have also worked on “classical” high-performance computing (HPC) applications such as Computational Fluid Dynamics in the past, and I now lead a group of scientists at DLR concerned with questions around Scientific Machine Learning.

The ProFiLe project, with its question of how to solve a previously difficult or intractable scientific problem using machine learning methods, of course fits the scope of the group perfectly. In general, with my background in physics, I find scientific modelling at the intersection of classical physics-based modelling and data-based modelling quite fascinating, and it usually comes with no less fascinating HPC and numerical challenges. The ProFiLe project is headed by Dr. Achim Basermann from DLR, in collaboration with Alexander Schug from KIT.

In simple words, what specifically is your project about? And, how and why do you think it is a high risk, high gain endeavour?

Initially, ProFiLe was about protein structure prediction. Proteins are molecules present in all living organisms and are involved in many processes essential to life. Simply put, a protein is a chain composed of smaller building blocks, the amino acids. Much of the biological function of a protein is determined by how its chain of amino acids folds in space, that is, by the three-dimensional structure of the protein. This poses the question: if I know the sequence of amino acids, can I predict the structure of the protein? And by predicting the structure, can I also predict its function?

Figure 1: Structure of the protein PDB 5w9f

With proteins being a crucial building block of life in general, this question is of course of great relevance, which makes any progress in that area a high gain endeavour. Very early in the project we also had to learn the hard way that this is a high risk endeavour, when Google released AlphaFold2, which essentially incorporated a huge chunk of our ideas. On the upside, this showed that we were on the right track.

Since Google is certainly not in our weight class, we had to adjust our plans and tackle a similar problem that sidesteps the competition over proteins: RNA structure prediction. RNA is the key group of molecules in charge of protein formation, and thus its structure and function are also essential for living organisms. Like proteins, RNA is composed of smaller blocks, called nucleotides, forming a chain that folds in space, and this fold relates to the molecule’s function. Only this time, the molecule’s function is to create a certain protein. The RNA structure prediction task is therefore no less interesting than the protein one, and even more challenging given that less data is available.

How important has the Helmholtz AI funding and platform been to carry out this project?

The Helmholtz AI funding has been more than essential. For one, the German Aerospace Center does not (to my knowledge) usually conduct research in computational biology, and I am by no means an expert in protein/RNA folding. However, Helmholtz AI allowed me to collaborate with the domain experts at KIT on this subject (namely Alexander Schug), allowing both sides to bring their respective expertise to the table. Furthermore, in our work we kept in close contact with the Helmholtz AI local units at KIT (Markus Götz) and JSC (Stefan Kesselheim). Last but not least, with such a compute-intensive project, we definitely benefited from the HAICORE cluster, on which most of our simulations were conducted.

Any other comments you wish to add? 

In the spirit of giving credit where credit is due, I want to thank my marvelous colleagues in this project: Fabrice von der Lehr, Oskar Taubert, Daniel Coquelin, and Alina Bazarova. They are the unsung heroes who keep the project going against all odds.