NVIDIA: EvolutionaryScale Unveils ESM3 Generative AI Model for Protein Design
Generative AI has revolutionized software development through prompt-based code generation, and protein design is the next frontier. According to the NVIDIA Blog, EvolutionaryScale has released ESM3, the third generation of its ESM model family. ESM3 reasons over the sequence, structure, and function of proteins simultaneously, giving protein engineers a programmable platform for discovery.
The startup, which emerged from Meta's FAIR (Fundamental AI Research) unit, recently secured funding led by Lux Capital, Nat Friedman, and Daniel Gross, with investment from NVIDIA. EvolutionaryScale is at the forefront of programmable biology, helping researchers engineer proteins that can target cancer cells, find alternatives to harmful plastics, support environmental mitigation, and more.
EvolutionaryScale trained ESM3 on NVIDIA H100 Tensor Core GPUs, putting more compute into it than into any previous biological foundation model. The 98-billion-parameter ESM3 model used roughly 25x more FLOPs and 60x more data than its predecessor, ESM2. To train it, the company built a database of more than 2 billion protein sequences, offering technology applicable to drug development, disease eradication, and understanding human evolution at scale.
Accelerating In Silico Biological Research With ESM3
Backed by a substantially larger training dataset, ESM3 aims to accelerate protein discovery. The model was trained on nearly 2.8 billion protein sequences sampled from a wide range of organisms and biomes, allowing scientists to prompt it to identify and validate new proteins with increasing accuracy.
ESM3 offers substantial updates over previous versions. The model is natively generative and takes an “all-to-all” approach: structure and function annotations can be supplied as inputs, not just produced as outputs. Once the model is publicly available, scientists will be able to fine-tune it to create purpose-built models based on their proprietary data. ESM3’s large-scale generative training across vast amounts of data offers a revolutionary tool for in silico biological research.
Driving the Next Big Breakthroughs With NVIDIA BioNeMo
ESM3 provides biologists and protein designers with a generative AI boost, improving their engineering and understanding of proteins. With simple prompts, it can generate new proteins with a provided scaffold, self-improve its protein design based on feedback, and design proteins based on the user-indicated functionality. These capabilities can be used in any combination to provide chain-of-thought protein design, akin to messaging a researcher fluent in the intricate three-dimensional meaning of every known protein sequence.
“In our internal testing, we’ve been impressed by ESM3’s ability to creatively respond to complex prompts,” said Tom Sercu, co-founder and VP of engineering at EvolutionaryScale. “It solved an extremely challenging protein design problem to create a novel Green Fluorescent Protein. We expect ESM3 to help scientists accelerate their work and open up new possibilities — we’re excited to see its contribution to future research in life sciences.”
EvolutionaryScale is opening an API in closed beta today and releasing code and weights for a small open version of ESM3 for non-commercial use. This version will soon be accessible on NVIDIA BioNeMo, a generative AI platform for drug discovery. The complete ESM3 family of models will be available to select customers as an NVIDIA NIM microservice, run-time optimized in collaboration with NVIDIA and supported by an NVIDIA AI Enterprise software license, for testing at ai.nvidia.com.
The computing power required to train these models is growing exponentially. ESM3 was trained on the Andromeda cluster, which uses NVIDIA H100 GPUs and NVIDIA Quantum-2 InfiniBand networking. The ESM3 model will be available on select partner platforms and on NVIDIA BioNeMo.