AI Solving Protein Structures: How Significant is the Impact?

On 30 November 2020, the organisers of the Critical Assessment of Protein Structure Prediction (CASP), a biennial competition to predict protein structures, announced that AlphaFold2, the latest version of the algorithm from Google-owned DeepMind, had solved ‘the problem’. But what exactly was the problem that the CASP teams had to solve? It was determining the 3D structure of a protein based on its 1D amino acid sequence. AlphaFold2 produced the most accurate predictions in a competition involving around 100 teams. This outcome has widespread real-world impact.

A protein’s ability to carry out its function depends largely on its 3D structure. But why is knowing protein structures so important? Because it helps us understand how proteins function, what goes wrong when they don’t form the right structure, and which medicines can be used to manipulate a protein’s function. Experimental methods such as cryo-EM, X-ray crystallography and NMR already help researchers determine structures, but they have major drawbacks: they are time-consuming and expensive. As a result, experimentally determined structures cover only a small fraction of known proteins. Estimates of the number of human proteins range from 60,000 to 400,000, and the total number of existing proteins is estimated to be over 6 million, but the largest protein structure database, PDB, contains just 118,000 proteins.

Is the problem of predicting protein structures really solved then? The answer is more complex than a simple yes or no. The models predicted by AlphaFold2 had a median score of 92.4 on the Global Distance Test (GDT), with an average error of 1.6 angstroms, which is comparable to the size of an atom. GDT reflects how similar a predicted structure is to the experimentally determined one, and a score above 90 implies that the prediction is close enough to the experimental result. The algorithm’s performance dropped slightly, to 87 GDT, when modeling the most difficult proteins, but even then it beat the next best team’s prediction method by 25 points. AlphaFold2 also performed far better than its previous version did in the last CASP, held in 2018.
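To make the GDT and average-error numbers above more concrete, here is a minimal Python sketch of how such scores can be computed. It is a simplification under stated assumptions, not the official CASP evaluation: it assumes the predicted and experimental structures are already superimposed and list the same residues in the same order, whereas the real GDT calculation searches over many superpositions. The function names are hypothetical; the 1, 2, 4 and 8 angstrom cutoffs are those of the commonly used GDT_TS variant.

```python
import math

# Distance cutoffs (in angstroms) used by the common GDT_TS variant.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def residue_distances(predicted, experimental):
    """Distance in angstroms between matching residue positions.

    Assumption: both structures are already superimposed and given as
    equally long lists of (x, y, z) coordinates, one per residue.
    """
    if len(predicted) != len(experimental):
        raise ValueError("structures must have the same number of residues")
    return [math.dist(p, e) for p, e in zip(predicted, experimental)]


def gdt_ts(distances, cutoffs=GDT_TS_CUTOFFS):
    """Simplified GDT_TS on a 0-100 scale.

    For each cutoff, take the fraction of residues whose distance is
    within that cutoff, then average the fractions over all cutoffs.
    """
    if not distances:
        raise ValueError("need at least one residue")
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


def rmsd(distances):
    """Root-mean-square deviation of the per-residue distances."""
    if not distances:
        raise ValueError("need at least one residue")
    return math.sqrt(sum(d * d for d in distances) / len(distances))


if __name__ == "__main__":
    # Toy per-residue distances in angstroms, invented for illustration.
    toy = [0.5, 1.5, 3.0, 9.0]
    print(gdt_ts(toy))  # 56.25
    print(rmsd(toy))
```

In this toy case one residue in four is within 1 angstrom, two within 2, and three within both 4 and 8, giving (25 + 50 + 75 + 75) / 4 = 56.25. A score above 90 therefore requires nearly every residue to sit within a couple of angstroms of its experimental position.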

Undoubtedly, this is a breakthrough achievement. But the claim of having ‘solved’ the problem should be taken with a pinch of salt. Protein function relies on very fine details, often at a sub-angstrom scale. To give an idea of that scale, a human hair is about 500,000 angstroms thick on average. The mean error of the model is 1.6 angstroms, which makes it likely that the predicted structures can’t yet be used to predict function reliably. Another assumption we tend to make is that a protein has a ‘single’ structure, when quite the opposite is true. Proteins are flexible molecules that change structure depending on conditions such as temperature. While AlphaFold2 predicts protein structures with a high level of accuracy, it doesn’t take into account the other factors that might influence the structure and hence affect protein function. This might explain why it reached a high accuracy score of 90 GDT for only around two-thirds of the structures.

To sum up, the progress made by AlphaFold2 is stupendous, but it is far from ‘solving’ the problem. One reason is that the ‘problem statement’ itself is not well defined: there is a subtlety in how one defines protein folding. One key takeaway, to which many structural biologists will attest, is that many of the predicted structures can’t be used as molecular replacements for the experimentally determined structures. Structural biologists are not going to be replaced by AI-driven modelers yet, but this breakthrough will certainly help them find missing building blocks for large protein complexes and speed up the process of determining protein structures.

Image credit: Pixabay
Editor: Sayantan Chakraborty

