Proteins are the molecules that get the job done in nature, and an entire industry is springing up around successfully modifying and manufacturing them for various uses. But doing so is time-consuming and messy; Cradle aims to change that with an AI-powered tool that tells scientists what new structures and sequences will make a protein do what they want. The company came out of stealth today with a substantial seed round.
AI and proteins have been in the news lately, but largely due to the efforts of research teams like DeepMind and Baker Lab. Their machine learning models take in easily collected RNA sequence data and predict the structure a protein will adopt, a step that used to take weeks and expensive special equipment.
But as incredible as that capability is in some domains, it’s just the starting point for others. Modifying a protein so that it is more stable or binds to another particular molecule involves much more than just understanding its general shape and size.
“If you’re a protein engineer and you want to engineer a certain property or function into a protein, just knowing what it looks like doesn’t help you. It’s like if you have a picture of a bridge, that doesn’t tell you if it will fall down or not,” explained Cradle CEO and co-founder Stef van Grieken.
“AlphaFold takes a sequence and predicts what the protein will look like,” he continued. “We’re the generative brother of that: you choose the properties you want to engineer, and the model will generate sequences that you can test in your lab.”
Predicting what proteins will do in situ, especially those new to science, is a difficult task for many reasons, but in the context of machine learning the biggest problem is that there is not enough data available. So Cradle generated much of its own dataset in a wet lab, testing protein after protein and seeing which changes in their sequences seemed to lead to which effects.
Interestingly, the model itself is not exactly specific to biotech, but rather a derivative of the same “large language models” that have produced text output engines like GPT-3. Van Grieken noted that these models are not strictly limited to language in how they understand and predict data, an interesting “generalization” feature that researchers are still exploring.
The protein sequences that Cradle ingests and predicts are not in any language that we know of, of course, but are relatively simple linear sequences of text that have associated meanings. “It’s like an alien programming language,” van Grieken said.
Protein engineers aren’t helpless, of course, but their work necessarily involves a lot of guesswork. One can know for sure that among the 100 sequences they are modifying is the combination that will produce
The model works in three basic layers, he explained. First, it assesses whether a given sequence is “natural,” that is, whether it is a meaningful sequence of amino acids or just a random one. This is similar to a language model being able to say with 99 percent confidence that a sentence is in English (or Swedish, in van Grieken’s example) and that the words are in the correct order. It knows this from “reading” millions of such sequences determined by laboratory analysis.
Next, it examines the actual or potential meaning in the protein’s alien language. “Imagine we give you a sequence, and this is the temperature at which this sequence will fall apart,” he said. “If you do that for many sequences, you can say not just, ‘this looks natural,’ but ‘this looks like 26 degrees Celsius.’ That helps the model determine which regions of the protein to focus on.”
The model can then suggest sequences that fit the requirements: educated guesses, essentially, but a stronger starting point than working from scratch. The engineer or lab can then test them and bring that data back to the Cradle platform, where it can be ingested again and used to tune the model for the situation.
Modifying proteins for various purposes is useful across biotechnology, from drug design to biomanufacturing, and the path from the vanilla molecule to an effective and efficient customized molecule can be long and expensive. Any way to shorten it will probably be welcomed, at the very least by the lab technicians who have to run hundreds of experiments just to get one good result.
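To make the three layers concrete, here is a toy sketch in Python. It is not Cradle's model or code: simple stand-ins replace each layer (residue frequencies for the “naturalness” check, a nearest-neighbor lookup for the property prediction, and random single-point mutations for the suggestions). Every function name and every measurement in it is hypothetical and exists only for illustration.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def naturalness(seq, reference):
    """Layer 1 stand-in: mean log-frequency of seq's residues in reference."""
    counts = Counter("".join(reference))
    total = sum(counts.values()) + len(AMINO_ACIDS)  # add-one smoothing
    return sum(math.log((counts[r] + 1) / total) for r in seq) / len(seq)


def predict_property(seq, labeled):
    """Layer 2 stand-in: value measured for the closest labeled sequence."""
    nearest = min(labeled, key=lambda known: hamming(seq, known))
    return labeled[nearest]


def suggest(parent, labeled, attempts=200, top_k=5, seed=0):
    """Layer 3 stand-in: propose single-point mutants and rank them."""
    rng = random.Random(seed)
    candidates = set()
    for _ in range(attempts):
        pos = rng.randrange(len(parent))
        residue = rng.choice(AMINO_ACIDS)
        if residue != parent[pos]:
            candidates.add(parent[:pos] + residue + parent[pos + 1:])

    def score(seq):
        # Higher predicted value and higher naturalness both rank a candidate up.
        return predict_property(seq, labeled) + naturalness(seq, list(labeled))

    # Sort by descending score; break ties alphabetically for repeatable output.
    return sorted(candidates, key=lambda s: (-score(s), s))[:top_k]


# Hypothetical measurements: sequence -> melting temperature in degrees Celsius.
LABELED = {"MKTAYIAK": 26.0, "MKTAYLAK": 31.5, "MKSAYIAK": 24.0}

if __name__ == "__main__":
    for candidate in suggest("MKTAYIAK", LABELED):
        print(candidate, predict_property(candidate, LABELED))
```

In the workflow the article describes, lab results for the suggested sequences would be added back to the labeled data before the next round; in this sketch that would simply mean adding new entries to `LABELED`.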
Cradle has been operating in stealth, and is now emerging after raising $5.5 million in a seed round co-led by Index Ventures and Kindred Capital, with participation from angels John Zimmer, Feike Sijbesma and Emily Leproust.
Van Grieken said the funding would allow the team to expand data collection (the more the merrier when it comes to machine learning) and work on the product to be “more self-service.”
“Our goal is to reduce the cost and time of bringing a bio-based product to market by an order of magnitude,” van Grieken said in the press release, “so that anyone, even ‘two kids in their garage,’ can bring a bio-based product to market.”