
AI decodes unknown proteins, paving way for new cancer therapies
What's the story
Artificial Intelligence (AI) is advancing rapidly in the field of fundamental biology, especially in protein analysis.
Two novel AI tools, InstaNovo and InstaNovo+, have been created by scientists to detect proteins that are often missed by existing detection methods.
The study detailing this work was published recently in Nature Machine Intelligence.
These tools could pave the way for better cancer therapies, a deeper understanding of diseases, and explanations for puzzling animal abilities.
Protein significance
Proteins are the final product of an organism's DNA
Proteins are the final product of an organism's DNA blueprint, dictating what cells actually produce and what they do.
However, deviations from this genetic blueprint are common, either due to post-production alterations or errors in the process.
These unexpected proteins, commonly called "hidden" proteins, have long posed challenges for scientists when it comes to identification and analysis.
Innovative solutions
InstaNovo and InstaNovo+: AI tools for protein research
InstaNovo and InstaNovo+ are AI models designed to work out the amino acid sequences of unstudied proteins in bulk.
Benjamin Neely, a chemist at the National Institute of Standards and Technology in Gaithersburg, US, called the tools a major step toward "the holy grail" of protein research.
He highlighted their potential power in uncovering hidden elements of biological samples that are often missed by conventional methods.
Tool functionality
InstaNovo and InstaNovo+ are designed to identify hidden proteins
InstaNovo is built on a transformer architecture, the same family of models behind OpenAI's GPT-4. It is trained to read the peaks and valleys of a protein's "fingerprint," as measured by mass spectrometry.
In essence, InstaNovo translates this fingerprint into a string of likely amino acids, which can be used to reconstruct and identify hidden proteins.
InstaNovo+ works more like an AI image generator. It progressively removes noise from the initial data to create a clear picture of the protein.
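To make the idea of reading amino acids off a fingerprint concrete, here is a minimal Python sketch of the classic principle behind sequencing from a mass spectrum: the gap between neighbouring fragment peaks matches the mass of one amino acid. This is not InstaNovo's method, which is a learned model. The function names, the idealised noise-free peak ladder, and the tolerance value are all assumptions made for illustration; only the residue masses are standard values.

```python
# Standard monoisotopic residue masses (in daltons) for a handful of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "D": 115.02694,
    "E": 129.04259, "F": 147.06841,
}


def ladder_for(peptide, start=0.0):
    """Build an idealised, noise-free ladder of fragment masses (hypothetical helper)."""
    masses = [start]
    for aa in peptide:
        masses.append(masses[-1] + RESIDUE_MASS[aa])
    return masses


def read_ladder(peaks, tol=0.01):
    """Translate gaps between consecutive peaks into amino acid letters.

    A gap that matches no known residue within `tol` is reported as "?".
    """
    letters = []
    for lower, upper in zip(peaks, peaks[1:]):
        gap = upper - lower
        matches = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - gap) <= tol]
        letters.append(matches[0] if matches else "?")
    return "".join(letters)


if __name__ == "__main__":
    peaks = ladder_for("PEPTAG")
    print(read_ladder(peaks))
```

Real spectra are noisy and incomplete, with missing and spurious peaks, which is why a simple lookup like this breaks down and trained models become attractive.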
Performance comparison
AI tools outperform traditional methods in complex trials
The new AI models have shown promise in various tests, particularly excelling in complex trials such as sequencing human immune proteins.
These proteins are notoriously difficult to analyze with standard methods due to their small size and unique amino acid composition.
The researchers found that InstaNovo identifies about three times more candidate protein segments than traditional database searching, while InstaNovo+ identifies about six times more.
Potential applications
AI tools could help in understanding cancer-related protein functions
Amanda Smythers, a chemist at Dana-Farber Cancer Institute in Boston, is interested in using these AI models for protein analysis.
She imagines using the tools to investigate questions such as why pancreatic cancer often results in rapid muscle wasting and fatigue.
The AI models could potentially flag proteins produced by cancer cells or disruptions in normal protein function in non-cancer cells that may cause these issues.
Model limitations
AI models have limitations that need to be addressed
Despite their potential, the new AI models are not without limitations.
The authors of the study estimate a false positive rate of around 5%, suggesting that the outputs from these AI tools require additional verification.
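A quick calculation shows what that figure could mean in practice. The Python sketch below assumes the 5% estimate is read as the share of reported identifications that are wrong; the count of 10,000 reported segments is a made-up number for illustration, not a result from the study.

```python
def expected_false_positives(n_reported, error_share=0.05):
    """Expected number of wrong identifications among those reported.

    `error_share` defaults to the roughly 5% estimate quoted in the article.
    """
    return n_reported * error_share


if __name__ == "__main__":
    n_reported = 10_000  # hypothetical number of reported protein segments
    wrong = expected_false_positives(n_reported)
    print(f"About {wrong:.0f} of {n_reported} reported segments may be incorrect")
```

At that scale, hundreds of outputs would need follow-up checks, which is why independent verification matters.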
William Noble, a computer scientist and proteomics researcher at the University of Washington in Seattle, highlighted that determining how best to evaluate these AI tools remains an open question.