This AI Software Nearly Predicted Omicron’s Tricky Structure

New algorithms that decipher complex sequences of amino acids offered an early view of the coronavirus variant. They could point the way to future drugs.

On November 26, the World Health Organization designated the strain of coronavirus surging in South Africa a “variant of concern” and christened it Omicron. The next day, University of British Columbia professor Sriram Subramaniam downloaded a genome sequence posted online and ordered samples of Omicron genes to be shipped to his lab.

Subramaniam’s group uses electron microscopes to reveal the 3D structure of proteins, to better understand how they work. For some earlier strains, it had already mapped the spike proteins that coronaviruses use to bind to and enter human cells. Describing Omicron’s spike protein felt urgent because its genome differed in ways that might explain the variant’s rapid spread. But like others doing online shopping that weekend, Subramaniam had to be patient: Until the samples arrived in the mail, he couldn’t put Omicron proteins under the microscope.

Across the continent, University of North Carolina at Charlotte computational genomics researcher Colby Ford had also been thinking about Omicron’s spike protein. Relatives had been asking him a question also troubling many experts: Would Omicron evade existing vaccines? Those vaccines teach the body to respond to spike proteins from an earlier strain. Instead of ordering lab supplies, Ford tried a recently invented shortcut. On the same day the WHO christened Omicron, he used free artificial intelligence software to try to predict the structure from the sequence of amino acids encoded in Omicron’s genome.

In about an hour, Ford got his first results, and quickly posted them online. Early in December, he and two colleagues posted a fuller paper, now accepted for publication, including predictions that some antibodies to previous strains would be less effective against Omicron.

Image: The atomic structure of the Omicron variant spike protein (purple) bound with the human ACE2 receptor (blue). Courtesy of Dr. Sriram Subramaniam/The University of British Columbia

Subramaniam’s lab received its Omicron gene samples soon after and published its microscope observations of the structure along with results from tests of real antibodies on December 21. One of Ford’s two predicted structures proved to be pretty much right: He calculated that the positions of its central atoms differ by around half an angstrom, about the radius of a hydrogen atom. “These tools allow you to make an educated guess really quickly—which is important in a situation like Covid,” Ford says. “With any new virus that comes along, someone else will replicate what I did here.”
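A comparison like this is commonly summarized as a root-mean-square deviation (RMSD): the typical distance between matching atoms in two structures. The article does not say exactly which calculation Ford used, so the following is only a minimal sketch of the general idea. It assumes the two structures have already been superimposed and that the atoms are listed in the same order; the function name and the coordinates are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) points, in the same units as the input (here, angstroms).

    Assumes the structures are already superimposed; no alignment is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        # Squared distance between one pair of matching atoms
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates in angstroms, not real Omicron data
predicted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
measured = [(0.5, 0.0, 0.0), (3.8, 0.5, 0.0), (7.6, 0.0, 0.5)]

print(rmsd(predicted, measured))  # each atom is off by 0.5, so this prints 0.5
```

In practice, researchers first rotate and shift one structure onto the other before measuring, a step this sketch leaves out.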

The way predictions raced ahead of experiments on Omicron’s spike protein reflects a recent sea change in molecular biology brought about by AI. The first software capable of accurately predicting protein structures became widely available only months before Omicron appeared, thanks to competing research teams at Alphabet’s UK-based AI lab DeepMind and at the University of Washington.

Ford used both packages, but because neither was designed or validated for predicting small changes caused by mutations like those of Omicron, his results were more suggestive than definitive. Some researchers treated them with suspicion. But the fact that he could easily experiment with powerful protein prediction AI illustrates how the recent breakthroughs are already changing the ways biologists work and think.

Subramaniam says he received four or five emails from people proffering predicted Omicron spike structures while working towards his lab’s results. “Quite a few did this just for fun,” he says. Direct measurements of protein structure will remain the ultimate yardstick, Subramaniam says, but he expects AI predictions to become increasingly central to research—including on future disease outbreaks. “It’s transformative,” he says.

Because a protein’s shape determines how it behaves, knowing its structure can help all kinds of biology research, from studies of evolution to work on disease. In drug research, figuring out a protein structure can help reveal potential targets for new treatments.

Determining a protein’s structure is far from simple. Proteins are complex molecules assembled from instructions encoded in an organism’s genome to serve as enzymes, antibodies, and much of the other machinery of life. They are made from strings of molecules called amino acids that can fold into complex shapes that behave in different ways.

Deciphering a protein’s structure traditionally involved painstaking lab work. Most of the roughly 200,000 known structures were mapped using a tricky process in which proteins are formed into a crystal and bombarded with x-rays. Newer techniques like the electron microscopy used by Subramaniam can be faster, but the process is still far from easy.

In late 2020, the long-standing hope that computers could predict protein structure from an amino acid sequence suddenly became real, after decades of slow progress. DeepMind software called AlphaFold proved so accurate in a contest for protein prediction that the challenge’s cofounder John Moult, a professor at the University of Maryland, declared the problem solved. “Having worked personally on this problem for so long,” Moult said, DeepMind’s achievement was “a very special moment.”

The moment was also frustrating for some scientists: DeepMind did not immediately release details of how AlphaFold worked. “You’re in this weird situation where there’s been this major advance in your field, but you can’t build on it,” David Baker, whose lab at the University of Washington works on protein structure prediction, told WIRED last year. His research group used clues dropped by DeepMind to guide the design of open source software called RoseTTAFold, released in June, which was similar to but not as powerful as AlphaFold. Both are based on machine learning algorithms honed to predict protein structures by training on a collection of more than 100,000 known structures. The next month, DeepMind published details of its own work and released AlphaFold for anyone to use. Suddenly, the world had two ways to predict protein structures.

Minkyung Baek, a postdoctoral researcher in Baker’s lab who led work on RoseTTAFold, says she has been surprised by how quickly protein structure predictions have become standard in biology research. Google Scholar reports that UW’s and DeepMind’s papers on their software have together been cited by more than 1,200 academic articles in the short time since they appeared.

Although predictions haven’t proven crucial to work on Covid-19, she believes they will become increasingly important to the response to future diseases. Pandemic-quashing answers won’t spring fully formed from algorithms, but predicted structures can help scientists strategize. “A predicted structure can help you put your experimental effort into the most important problems,” Baek says. She’s now trying to get RoseTTAFold to accurately predict the structure of antibodies and invading proteins when bound together, which would make the software more useful to infectious disease projects.

Despite their impressive performance, protein predictors don’t reveal everything about a molecule. They spit out a single static structure for a protein, and don’t capture the flexes and wiggles that take place when it interacts with other molecules. The algorithms were trained on databases of known structures, which are more reflective of the proteins easiest to map experimentally than of the full diversity of nature. Kresten Lindorff-Larsen, a professor at the University of Copenhagen, predicts the algorithms will be used more frequently and will be useful, but says, “We also as a field need to learn better when these methods fail.”

In addition to a spike protein structure, Subramaniam’s Omicron paper also included results of a kind AI hasn’t yet conquered—a combined structure for a spike bound to the human protein it targets. The results suggested the variant’s structural changes allow it to bind host cells more strongly while also being less vulnerable to antibodies from previous strains, a combination that appears to explain why Omicron can overrun even highly vaccinated communities.

“The gold standard will always be direct measurement,” says Subramaniam. “If you’re building a billion-dollar drug program, people want to know what’s the real thing.” At the same time, he says his experimental work is now often informed by AI predictions. “It’s changed the way we think,” Subramaniam says.

Updated, 1-13-22, 2:15pm ET: An earlier version of this article incorrectly referred to samples of Omicron DNA.
