Decode and Design the Structure of Life

The world's most general-purpose atomistic foundation model, unifying structure prediction and de novo generation for the atoms of life

The 2024 Nobel Prize in Chemistry marked a historic achievement: a once intractable problem in biology has now largely been solved through AI. This problem was long the “Holy Grail” of structural biology: predicting a protein's structure from its primary amino acid sequence. Yet in drug design, knowing a protein's structure is just the first step in an effort that often takes over a decade. Solving monomer folding is merely a stepping stone to the ultimate goal: making biology programmable, which opens endless possibilities in medicine and the many life science industries beyond.

Achieving this goal will require stepping into a new frontier to develop tools that not only predict the structures of life, but which also design new molecules in a programmable manner, integrated with as much experimental data as can be harnessed. Furthermore, we must strive to increase the rate of experimental data collection, increasing the fidelity and applicability of these tools.

Today, we introduce VantAI's first foundational model: Neo-1. Neo-1 unifies structure prediction and molecular design at an atomic level, allowing prompting with multimodal and fine-grained structural information both for individual molecules and their interactions. In addition to designing biomolecules, this programmability allows Neo-1 to accelerate the collection of structural data when combined with our cross-linking mass spectrometry (XLMS) platform, NeoLink.

Neo-1 integrates state-of-the-art structure prediction and all-atom molecular generation into a single, unified model—to our knowledge, the world's first model to decode and design the structure of life. We're excited to give a glimpse today into how we're combining the capabilities of Neo-1 and NeoLink, bringing us one step closer to transforming biology into an engineering discipline, where we can engineer the biological circuits of nature.

https://vantai-tutorials.s3.us-east-2.amazonaws.com/neo_assets/Fig1_Neo1StepChange/fig1_neo1stepchange.svg

Fig. 1: Capabilities of structural biology foundation models over the generations.

Introducing Neo

Fig. 2: Neo-1 de novo designing and co-folding an all-atom molecular glue structure simultaneously from sequence alone, a feat not possible with previous models.

VantAI was started with a grand vision: making protein interactions programmable. The current approach to designing artificial proteins as therapies is, in a sense, reinventing the wheel. Billions of years of evolution have endowed our cells with proteins which fulfill every function needed for survival. What if, instead, we could leverage the diversity of proteins and their many functions already present in our cells, and precisely direct them against diseases?

Many protein therapeutics (e.g. antibodies) are currently only able to reach extracellular targets or a limited set of tissues, often triggering unwanted immune responses. By contrast, reprogramming proteins already present within cells—redirecting them toward new targets using small molecules, peptides, or macrocycles—enables the treatment of a wide array of diseases currently considered untreatable, opening a new chapter in medicine, as evidenced by the dozens of ongoing clinical trials in protein degradation.

Naturally occurring proteins reprogrammed to have new functions are known as “Neoproteins.” They are at the center of a new therapeutic modality called Proximity Modulation (ProMod) that VantAI is pioneering.

However, existing approaches prevalent in small molecule drug discovery, where the structure of a protein target is first determined experimentally, or computationally via folding methods, and then a molecule is designed against that structure, do not apply here. This is because the complex often only exists (stably) in the presence of the molecule and hence its structure can often not be determined correctly without the molecule. For this task in particular, simultaneous co-folding and generation is required, but impossible with current methods.

For the first time, designing Neoproteins with a unified model that simultaneously folds the complex structure and designs a small molecule is now achievable. And hence we call our foundation model series “Neo”.

The technical leap required to achieve this unlocked not just ProMod design, but design of all molecular modalities in the process. We trained Neo-1 to not only decode and design ProMods, but to decode and design the structure of any molecule, including proteins, small molecules, and more. We are excited to work together with partners in academia and industry to leverage these capabilities.

Neo-1 Foundation Model Capabilities

Molecular Generation: Designing novel therapeutics
Protein Design: Creating the molecules of life
Structure Prediction: Revealing molecular architecture
Inpainting: Generating improved molecules
And many more...

A Next-Generation Model Unifying Structure Prediction and De Novo Generation

Current all-atom structure prediction methods share a common blueprint, replicating or closely following the ideas introduced in AlphaFold2 in 2020: they predict 3D atomic coordinates directly, provided an input sequence.

In parallel, there has recently been a surge in task-specific models developed for purposes such as protein backbone generation or designing small molecules targeting predefined binding pockets. These approaches typically alternate between separate molecule generation and structure prediction stages, each utilizing specialized models.

Such iterative workflows can inadvertently accumulate errors and diminish controllability, primarily due to the absence of comprehensive, all-atom structural understanding. Critically, this fragmented approach restricts the design of molecules capable of inducing large changes in protein conformations or complexes—a key requirement of rational ProMod design—where simultaneous, integrated structure prediction and molecular generation is essential.

Neo-1, for the first time, enables such integration between prediction and generation, ushering in the next chapter of atom-level foundation models. We achieve this by moving the diffusion process from the conventional coordinate space to the latent space, enabling the model to reason over a smoother landscape of both sequence and structure. This shift has enabled Neo-1 to generate completely novel molecules, including proteins, peptides, and small molecules, at all-atom resolution, while simultaneously predicting their structures with state-of-the-art accuracy.

Fig. 3: All-atom de novo designed molecules by Neo-1 across different molecular types. Designed parts shown in green. Select prompting highlighted in pink.

A Step-Change in Generality and Programmability

Neo-1 unifies a plethora of tasks traditionally tackled with separate specialist models: all-atom co-folding, docking, inverse folding, all-atom protein design, small molecule design, motif scaffolding, R-group enumeration, fragment linking, among others—all within a single model.

At its core, Neo-1 uses a learned, unified latent representation of different biomolecules that compresses and abstracts information from various input modalities. This learned latent representation can be decoded into complete molecules, including small molecules, lipids, proteins, and DNA/RNA, along with their atom types and coordinates. By varying the input from which the model has to construct this latent representation, the model can perform any task from entirely de novo protein-ligand complex generation to inpainting of small molecular fragments in an otherwise provided structure, and we train Neo-1 on a mixture of these tasks. For example, providing sequence-only inputs turns the prediction into a folding task, providing partial structure conditioning turns the prediction into an inpainting task, and prompting the model to generate a small molecule given protein sequence(s) simultaneously designs the small molecule and co-folds the complex structure. We also include auxiliary conditioning, such as molecule type, binding site, distance restraints, and molecular properties, to further increase programmability. This means at inference time, Neo-1 can be prompted with desired sequence, structure and/or property information.
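The idea that the task follows from which inputs are supplied can be illustrated with a small sketch. Everything here is hypothetical: the `Prompt` container, its field names, and `describe_task` are illustrative stand-ins, not Neo-1's interface. The sketch only shows how sequence-only, partial-structure, and ligand-generation prompts map to folding, inpainting, and joint design plus co-folding.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Prompt:
    """Hypothetical prompt container; field names are illustrative only."""
    protein_sequences: list[str] = field(default_factory=list)
    partial_structure: Optional[str] = None  # e.g. a path to known coordinates
    generate_ligand: bool = False            # request a de novo small molecule
    binding_site_residues: list[int] = field(default_factory=list)


def describe_task(prompt: Prompt) -> str:
    """Name the task implied by which inputs are present in the prompt."""
    if prompt.partial_structure is not None:
        # Known structure is given; the model fills in the missing parts.
        task = "inpainting"
    elif prompt.generate_ligand and prompt.protein_sequences:
        # Sequences plus a ligand request: design and co-fold together.
        task = "ligand design + co-folding"
    elif prompt.protein_sequences:
        # Sequence only: plain structure prediction.
        task = "folding"
    else:
        # Nothing provided: fully unconditional generation.
        task = "de novo generation"
    if prompt.binding_site_residues:
        # Auxiliary conditioning narrows the task without changing it.
        task += " (binding-site conditioned)"
    return task


print(describe_task(Prompt(protein_sequences=["MKV"])))
print(describe_task(Prompt(protein_sequences=["MKV", "GSH"], generate_ligand=True)))
```

In a real model the distinction would be made inside the network through masking and conditioning rather than through explicit branching; the branches above only make the mapping readable.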

Fig. 4 shows a limited selection of Neo-1 generated small molecules to illustrate just a few of the precise and diverse structural and non-structural prompts Neo-1 can leverage to generate novel small molecules and ProMods. They include 1) pocket-specific co-folding and small molecule generation if provided with a sequence and four residues to indicate the binding site, 2) R-group enumeration if provided with a known structure and molecular scaffold, 3) molecular glue design and complex co-folding if prompted with two protein sequences and 4) expanding a molecular glue for a specific protein-protein interface when provided with a binder scaffold and protein structures.

Fig. 4: Neo-1 can de novo design biomolecules with fine-grained and diverse structural prompting. Prompts to generate structures are shown on the left. De novo generated small molecules are shown in the middle, with the generated atoms highlighted in green. Known reference binders are shown on the right with the re-designed elements highlighted in gray.

Fig. 5 highlights the same versatility for proteins, including 1) binder design when provided with a desired target protein structure, 2) antibody VH loop design against a specific epitope when provided with distance restraints, a desired target structure, and partial VH structure, 3) DNA/RNA binder design when provided with a protein scaffold structure and a desired oligonucleotide binder structure, 4) design of peptide non-canonical amino acids when provided a partial peptide and target structure.

Fig. 5: Neo-1 can de novo design biomolecules with fine-grained and diverse structural prompting. Prompts to generate structures with desired features are shown on the left. Examples of Neo-1 de novo generated proteins, loops, and peptides with non-canonical amino acids are shown in the middle, with generated atoms highlighted in green. Known reference molecules are shown on the right, with the re-designed elements highlighted in gray.

Neo-1 Generates Diverse and Desirable Molecules

Neo-1 generates valid and structurally diverse proteins and small molecules with desirable properties. This demonstrates it has accurately learned the underlying data distributions, as seen in Fig. 6 & 7.

Fig. 6 illustrates property distributions for small molecules generated by Neo-1 without prompting into specific directions of molecular space, highlighting the inherent versatility and robustness of the model. Neo-1 consistently yields diverse and chemically valid molecules exhibiting drug-like properties, with atom type distributions closely matching those found in known drug-like molecules (Fig. 6C). Ligands produced by Neo-1 have both similar and different shapes to reference compounds (Fig. 6D), demonstrating utility to explore both validated and novel interactions for drug discovery.

https://vantai-tutorials.s3.us-east-2.amazonaws.com/neo_assets/Fig6_GenerationDistributions/fig6b_generation_distribution_smallmols.svg

Fig. 6: A & B: For around 6000 samples generated against 42 diverse protein sequences, the distribution of QED (Quantitative Estimate of Druglikeness) and SAS (Synthetic Accessibility Score) produced by Neo-1 is similar to drug-like molecules in the PDB. C: Neo-1 produces similar atom type distribution to PDB. D: A high-druglikeness sample of around 800 Neo-1 generated molecules shows both similar and different shapes compared to co-crystallized reference compounds. Right panel: Neo-1 generated molecules.

Fig. 7 illustrates an analogous case for protein generation, focusing on secondary structure. The structure and sequence of 610 proteins were jointly generated, sampling lengths approximately evenly from 51 to 392 amino acids. Unlike early protein design models, Neo-1 does not exhibit a bias towards helicity, accurately matching the secondary structure distribution found in the PDB (Fig. 7A & B). Amino acid distributions are matched equally well (Fig. 7C), and Neo-1 generates a mixture of known and novel structures as seen by the TM-score (Fig. 7D).

https://vantai-tutorials.s3.us-east-2.amazonaws.com/neo_assets/Fig6_GenerationDistributions/fig6a_generation_distribution_proteins.svg

Fig. 7: A & B: The secondary structure distribution of generated proteins closely matches that of proteins in the training dataset. C: Generated proteins contain approximately the same distribution of amino acids as the training dataset, with a slight overrepresentation of alanine. D: Neo-1 can generate proteins significantly different from those in the PDB. Right panel: Neo-1 generated protein samples.

Steerable Molecular Generation Unlocks Precision Generation

Neo-1 generates the entire prediction simultaneously in a "coarse-to-fine" manner, offering distinct advantages over autoregressive models commonly used in protein and small molecule design. Autoregressive models generate atoms sequentially, without the flexibility to adjust previously generated portions based on later additions. In contrast, Neo-1 enables steering of molecule generation towards any objective by applying intermediate rewards across the entire molecular structure. Furthermore, unlike diffusion-based guidance methods, Neo-1's inference-time steering accommodates complex multi-property optimization, including non-differentiable properties, without requiring retraining or external classifiers. Fig. 8 illustrates Neo-1's steering capability on a simple example by demonstrating the generation of more rigid molecules, an essential process in lead optimization, particularly for ProMods.
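To make the idea of reward-based steering concrete, here is a toy sketch of one generic way to apply intermediate rewards without gradients: keep a population of candidates, refine them all, then resample whole candidates in proportion to a reward. This is an assumption for illustration only and is not presented as Neo-1's actual steering algorithm. The bit-vector "molecules", the `refine` step, and the `count_rotatable` reward are all hypothetical stand-ins.

```python
import math
import random


def count_rotatable(candidate):
    """Toy non-differentiable property: 1-bits stand in for rotatable bonds."""
    return sum(candidate)


def refine(candidate, rng):
    """Toy stand-in for one refinement step: flip one random position."""
    i = rng.randrange(len(candidate))
    updated = list(candidate)  # copy so resampled duplicates stay independent
    updated[i] = 1 - updated[i]
    return updated


def generate(n_particles=200, n_bits=16, n_steps=30, steer=True,
             temperature=1.0, seed=0):
    """Run the toy generation loop, optionally steering toward fewer 1-bits."""
    rng = random.Random(seed)
    particles = [[rng.randint(0, 1) for _ in range(n_bits)]
                 for _ in range(n_particles)]
    for _ in range(n_steps):
        particles = [refine(p, rng) for p in particles]
        if steer:
            # Intermediate reward scored on the whole candidate, no gradient needed.
            weights = [math.exp(-count_rotatable(p) / temperature)
                       for p in particles]
            # Resample candidates in proportion to their reward weight.
            particles = rng.choices(particles, weights=weights, k=n_particles)
    return particles


def mean_rotatable(particles):
    """Average of the toy property over the population."""
    return sum(count_rotatable(p) for p in particles) / len(particles)


print("unsteered:", mean_rotatable(generate(steer=False)))
print("steered:  ", mean_rotatable(generate(steer=True)))
```

Because the reward is only evaluated, never differentiated, any scoring function can be substituted, including a combination of several properties.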

https://vantai-tutorials.s3.us-east-2.amazonaws.com/neo_assets/Fig7_PropertyGuidedSteering/fig7_property_guided_steering.svg

Fig. 8: Steered design of a muscarinic M1 receptor binder, based on input sequence (6ZFZ). Design process aims to reduce number of rotatable bonds, yielding more rigid molecular scaffolds. Top right plot compares distribution of the steered property (rotatable bonds) with and without steering. Examples of Neo-1 generation with and without steering are shown on the left (with number of rotatable bonds annotated), while the bottom right illustrates an example of the entire steered complex generated by Neo-1.

Highly Accurate Structure Prediction

When prompted with complete sequence information but no structure, Neo-1 serves as a structure prediction model.

Neo-1 was trained using structural data and clusters defined in our previously presented datasets, PINDER and PLINDER, which were created through a collaboration with NVIDIA, MIT, and the University of Basel. We also included curated real and synthetic datasets covering monomers, protein-protein complexes, and protein-ligand complexes. Training and evaluating Neo-1 was made possible by computational accelerations provided by GPU-mmseqs2, a tool developed by NVIDIA and SNU; inference will be powered by NVIDIA's recently announced MSA-Search NIM.

We compared Neo-1's structure prediction capabilities against Boltz-1, a validated open-source reproduction of the current state-of-the-art AlphaFold 3 model. We use identical time cutoffs for train and test splits and leverage the test set results directly provided by Boltz-1 where possible. We split the evaluation across three drug-discovery relevant scenarios to prevent biases due to varying sample sizes: Protein-Protein interactions (PPIs, i.e., multiple protein chains with no small molecules), Protein-Ligand interactions (PLIs, i.e., at least one small molecule with more than five heavy atoms, excl. oligosaccharides and covalently bound ligands), and Monomers (i.e., single protein chains with no small molecules).

Overall, Neo-1 achieves performance comparable to Boltz-1, with strengths and limitations reflecting its training distribution. It notably excels in the prediction of binding pockets (success defined as <5.0 Å RMSD), protein-protein complexes, and, in particular, ProMod-induced complexes, as demonstrated in Fig. 9 & 10. Neo-1's lower accuracy on monomers reflects their reduced representation in training, consistent with the model's primary focus on complex structures relevant to drug discovery involving ligands or binder proteins.

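| Type | Metric | Neo-1 | Boltz-1 |
|---|---|---|---|
| PPI (protein-protein interfaces) | PPI Success Rate (DockQ > 0.23) (⭡) | 0.68 | 0.69 |
| PPI (protein-protein interfaces) | I-RMSD (⭣) | 5.55 | 6.51 |
| PPI (protein-protein interfaces) | L-RMSD (⭣) | 10.60 | 15.42 |
| PLI (protein-ligand interfaces) | PLI-LDDT (⭡) | 0.49 | 0.48 |
| PLI (protein-ligand interfaces) | PLI Success Rate (< 2.0 Å RMSD) (⭡) | 0.33 | 0.36 |
| PLI (protein-ligand interfaces) | PLI Success Rate (< 5.0 Å RMSD) (⭡) | 0.65 | 0.50 |
| Monomer | BB-LDDT (⭡) | 0.81 | 0.92 |
| Monomer | TM-Score (⭡) | 0.87 | 0.93 |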

Fig. 9: Structure prediction performance of Neo-1 and Boltz-1. Only systems with predictions from both Neo-1 and Boltz-1 are included (excluding systems with oligonucleotides), resulting in 191 PPIs, 30 PLIs and 163 monomers. Boltz-1 predictions sourced from official GitHub. Both methods use matched inputs (MSA, sequences, SMILES) and do not incorporate protein templates or binding-site conditioning. Mean oracle metrics reported. I-RMSD/L-RMSD: Interface/Ligand root mean square deviation.
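For readers less familiar with these metrics, the following minimal sketch shows how threshold-based success rates of this kind can be computed. It assumes that "oracle" means taking the best of several sampled predictions per system, which is a common convention but is our reading rather than something stated here. The function names are illustrative, and the RMSD helper skips the superposition and symmetry handling that a real evaluation would need.

```python
import math


def rmsd(pred, ref):
    """Root mean square deviation over matched atoms (no superposition)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(squared / len(pred))


def oracle(sample_scores, higher_is_better):
    """Best score among the sampled predictions of one system (assumed meaning)."""
    return max(sample_scores) if higher_is_better else min(sample_scores)


def success_rate(per_system_samples, threshold, higher_is_better):
    """Fraction of systems whose best sample passes the threshold."""
    best = [oracle(scores, higher_is_better) for scores in per_system_samples]
    if higher_is_better:
        hits = [b > threshold for b in best]   # e.g. DockQ > 0.23
    else:
        hits = [b < threshold for b in best]   # e.g. ligand RMSD < 2.0 Å
    return sum(hits) / len(hits)


# Made-up scores for three systems with a few samples each.
dockq = [[0.10, 0.30], [0.20, 0.10], [0.50]]
ligand_rmsd = [[3.0, 1.5], [6.0, 4.0]]
print(success_rate(dockq, 0.23, higher_is_better=True))
print(success_rate(ligand_rmsd, 2.0, higher_is_better=False))
print(success_rate(ligand_rmsd, 5.0, higher_is_better=False))
```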

As illustrated in Fig. 10, Neo-1 demonstrates exceptional performance in challenging prediction scenarios highly relevant to real-world drug discovery programs, such as ternary complexes, antibody-antigen interactions, and protein-peptide complexes.

Fig. 10: Structure prediction capabilities of Neo-1 for biomolecular complexes outside the training set. Ground truth structures are shown in transparent shades, predicted structures in blue (proteins) or green (small molecules and peptides). First panel: M1-StaR-T4L in complex with GSK1034702, a muscarinic M1 receptor agonist for Alzheimer's disease (6ZG9). Second panel: Molecular glue ternary complex: HIV-1 protease in complex with a novel non-peptidic inhibitor (7WBS). Third panel: Protein-peptide & antibody complexes: Fab Fragment of Monoclonal Antibody LNKB-2 complexed with Antigenic Nonapeptide from Human Interleukin-2 (7YZJ).

Proteolysis-targeting Chimeras (PROTACs), a class of ProMods, consist of two small-molecule binders connected by a linker and are designed to bring two proteins together in the cell. They are an ideal and difficult testing ground for all-atom folding models. Their protein interface is difficult to predict because the two proteins share little or no co-evolutionary signal, a critical input for current folding models. Additionally, the protein interaction is often transient and highly mobile, since long linkers allow proteins with limited compatibility and few interactions to be brought together. Molecular glues, by contrast, often induce less transient and more substantial protein interactions.

As seen in Fig. 11, Neo-1 excels in PROTAC-complex prediction, significantly outperforming Boltz-1, across 19 PROTAC structures released after the training time cutoff. As shown in the next section and unique to Neo-1, its performance can be even further improved by leveraging structural information that's available for PROTACs due to their chemical composition, leading to highly accurate predictions not possible with other models.

Fig. 11: Performance of Neo-1 and Boltz-1 on 19 PROTAC ternary complex structures. Neo-1 recovers the correct binding interface in 12 structures, while Boltz-1 correctly predicts 9. Neo-1 correctly predicts the PROTAC conformation and binding mode to sub-2 Å accuracy in 7 cases, despite the long, flexible linkers used to induce the non-native interface.

Broad Structural Programmability Beyond NeoLink

Beyond NeoLink-derived distance restraints and protein-ligand distance restraints, highlighted earlier for molecular generation, Neo-1 has also been trained to be conditioned on available monomer and/or ligand structures. In these scenarios, Neo-1 effectively functions as a docking method. As seen in Fig. 17, providing known monomer structures significantly boosts accuracy.

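| Type | Metric | Docking | Co-folding |
|---|---|---|---|
| PPI (protein-protein interfaces) | PPI Success Rate (DockQ > 0.23) (⭡) | 0.82 | 0.68 |
| PPI (protein-protein interfaces) | I-RMSD (⭣) | 2.29 | 5.55 |
| PPI (protein-protein interfaces) | L-RMSD (⭣) | 6.34 | 10.60 |
| PLI (protein-ligand interfaces) | PLI-LDDT (⭡) | 0.61 | 0.49 |
| PLI (protein-ligand interfaces) | PLI Success Rate (< 2.0 Å RMSD) (⭡) | 0.55 | 0.33 |
| PLI (protein-ligand interfaces) | PLI Success Rate (< 5.0 Å RMSD) (⭡) | 0.73 | 0.65 |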

Fig. 17: When available, providing monomer or ligand structural information significantly improves the structure prediction performance of Neo-1. Compared to sequence-only inputs, this conditioning enables a docking-based approach that achieves higher accuracy across all metrics on PLI and PPI systems in the test set (N=191 for PPI, N=30 for PLI). Mean oracle metrics reported.

How a Fully Programmable Model Unlocks Drug Discovery

Neo-1's broad programmability enables many steps of traditional protein and small molecule design in a single model. In typical optimization campaigns, more and more information becomes available as a program progresses. Neo-1, unlike specialist models, can be prompted with a broad range of datapoints spanning sequence and structure, progressively increasing its utility as additional information becomes available. This unlocks what we believe to be the future of AI-enabled drug discovery—seamless iterative interaction between three interlinked contributors: 1) drug designers, 2) experimental data and 3) AI tools as copilots.

Despite this significant technological leap, AI models such as Neo-1 still have many limitations. However, these limitations can be effectively addressed through precise control, leveraging the decades of domain knowledge from drug hunters and experimental evidence.

To showcase Neo-1's unique features highlighted above, two examples are presented below. These case studies on molecular glues/inhibitors and on antibodies show how Neo-1 manages to rediscover known molecules or molecules with similar properties through step-by-step optimization typical in discovery campaigns.

Case Study 1: End-to-End Glue and Small Molecule Discovery
https://vantai-tutorials.s3.us-east-2.amazonaws.com/neo_assets/Fig13_CaseStudy1_AbDesign/fig13_case_study_01.svg

Fig. 18: Neo-1 unlocks end-to-end rational glue discovery. Schematic illustrating how Neo-1 can accelerate multiple stages of the drug discovery pipeline through its unique programmability. Neo-1 inputs for each example are shown at the top, while a representative Neo-1 generated molecule is highlighted in green.

Once a disease target is identified, the initial step in both small molecule and ProMod discovery is finding a molecule that engages the target (or effector and target in the ProMod case). Especially if targets are not structurally well-characterized or change conformation upon binding, being able to simultaneously generate protein structures and small molecules is critical. This is particularly important for molecular glues, which are small molecules inducing protein-protein interfaces that are often unstable without the molecule. In such a case the structure of the complex is undefined in the absence of the glue. Neo-1 is the first model able to generate small molecules and molecular glues in re-folded structures from sequence alone.

As shown for the target CDK2 in Fig. 19, Neo-1 is able to de novo generate active site inhibitors from protein sequence alone. Many validated active site inhibitors (e.g., roscovitine bound to CDK2, PDB ID 3DDQ) form hydrogen bonds with the hinge region, so newly generated molecules can be filtered for this pattern. The generated molecules both “rediscover” the core of known binders and present different scaffolds that maintain the hinge-binding interactions crucial for kinase inhibitors. Neo-1's strong all-atom structure understanding is critical to designing such precise interactions.
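A filter for a hinge hydrogen-bonding pattern can be approximated geometrically. The sketch below is a simplified illustration under stated assumptions, not the filter used for Fig. 19: it treats any nitrogen or oxygen as a potential hydrogen-bond partner, uses a 3.5 Å heavy-atom distance cutoff, and ignores angles and protonation. The `Atom` type, function names, and all coordinates are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    element: str
    xyz: tuple[float, float, float]


POLAR_ELEMENTS = {"N", "O"}  # crude proxy for hydrogen-bond donors/acceptors
MAX_HBOND_DISTANCE = 3.5     # Å, heavy-atom cutoff assumed for this sketch


def hinge_contacts(ligand, hinge_atoms, cutoff=MAX_HBOND_DISTANCE):
    """Count hinge polar atoms with at least one ligand polar atom in range."""
    polar_ligand = [a for a in ligand if a.element in POLAR_ELEMENTS]
    contacts = 0
    for hinge_atom in hinge_atoms:
        if hinge_atom.element not in POLAR_ELEMENTS:
            continue
        if any(math.dist(hinge_atom.xyz, a.xyz) <= cutoff for a in polar_ligand):
            contacts += 1
    return contacts


def filter_by_hinge(ligands, hinge_atoms, min_contacts=2):
    """Keep only ligands that reach the required number of hinge contacts."""
    return [lig for lig in ligands
            if hinge_contacts(lig, hinge_atoms) >= min_contacts]


# Placeholder coordinates, not taken from any real structure.
hinge = [Atom("N", (0.0, 0.0, 0.0)), Atom("O", (4.0, 0.0, 0.0))]
binder = [Atom("N", (0.0, 3.0, 0.0)), Atom("O", (4.0, 3.0, 0.0)), Atom("C", (2.0, 3.0, 0.0))]
decoy = [Atom("C", (0.0, 3.0, 0.0)), Atom("N", (10.0, 10.0, 10.0))]
print(len(filter_by_hinge([binder, decoy], hinge)))
```

A production filter would typically rely on a cheminformatics toolkit for donor and acceptor typing and would check hydrogen-bond angles as well as distances.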

https://vantai-tutorials.s3.us-east-2.amazonaws.com/neo_assets/Fig13_CaseStudy1_AbDesign/fig13_case_study_02.svg

Fig. 19: When prompted with the CDK2 sequence, Neo-1 generates a variety of small molecules that capture the key hinge binding interactions observed in known binders. Neo-1 co-folded and generated protein-ligand complex on left, with Neo-1 designs, filtered to recover the hydrogen bonding pattern, shown in green. Roscovitine bound to CDK2 structure (3DDQ) shown as a reference in white.

If instead prompted with CDK12 and DDB1 sequences, Neo-1 is able to de novo design molecular glues that stabilize their interface. The generated molecules are diverse and vary in drug-likeness, and the co-folded complexes reconstruct ternary complex features of the previously reported reference structure (PDB ID 8BUG) with high accuracy (Fig. 20).

https://vantai-tutorials.s3.us-east-2.amazonaws.com/neo_assets/Fig13_CaseStudy1_AbDesign/fig13_case_study_03.svg

Fig. 20: Neo-1 produces diverse molecules when prompted with only sequences of CDK12 and DDB1. Neo-1 co-folded and designed complex on the left, crystallized reference complex showing strong alignment on right. Generated molecules shown in green. Additional designed molecules shown in middle. The generated structures match the PPI and PLI of the previously reported structure (8BUG); however, Neo-1 generates a variety of novel molecular glue structures.

In typical drug discovery workflows, after experimental validation of initial hits, molecules are optimized for desirable properties that enable reaching desired tissues, stability, half-life, and many other objectives. QED (Quantitative Estimate of Druglikeness) is often used as a composite score to quantify these properties. Fig. 21 shows how steering de novo generation of molecules towards higher QED scores given only the previous DDB1/CDK12 sequences produces diverse, but now increasingly drug-like molecules. This example highlights Neo-1's capability not only in initial hit discovery but critically in lead optimization, enabling rapid molecular refinement—a step that traditionally consumes multiple years in conventional drug discovery pipelines.
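QED, as originally published, combines per-property desirability values into one score through a weighted geometric mean. The sketch below shows only that aggregation step. The property names, desirability values, and equal weights are placeholders for illustration; the published score uses eight specific molecular properties with fitted desirability functions and weights, which are not reproduced here.

```python
import math


def composite_desirability(desirabilities, weights):
    """Weighted geometric mean of per-property desirabilities in (0, 1]."""
    if desirabilities.keys() != weights.keys():
        raise ValueError("desirabilities and weights must cover the same properties")
    if any(d <= 0.0 or d > 1.0 for d in desirabilities.values()):
        raise ValueError("each desirability must lie in (0, 1]")
    total_weight = sum(weights.values())
    log_sum = sum(weights[name] * math.log(desirabilities[name])
                  for name in desirabilities)
    return math.exp(log_sum / total_weight)


# Placeholder values: one poor property pulls the composite score down.
example = {"molecular_weight": 0.9, "rotatable_bonds": 0.2, "polar_surface_area": 0.8}
equal_weights = {name: 1.0 for name in example}
print(round(composite_desirability(example, equal_weights), 3))
```

Because the mean is geometric, a single very undesirable property lowers the score more than it would in an arithmetic average, which is one reason such a score is useful as a steering objective.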

https://vantai-tutorials.s3.us-east-2.amazonaws.com/neo_assets/Fig13_CaseStudy1_AbDesign/fig13_case_study_04.svg

Fig. 21: Steering generation for higher QED score yields additional starting points with improved molecular properties. Left: Neo-1 generations without steering. Center: shift in distribution of molecular property through steering. Right: molecules generated with steering. The inputs for generation were identical to Fig. 20 except for an additional goal to sample trajectories leading to higher QED values.

Once optimal starting points have been found and optimized, a "lead series" of molecules is often identified. In a lead series, a common scaffold that has been validated as critical for binding is held constant while other parts of the molecule are changed to optimize different properties. Neo-1 can be prompted with the structure and/or sequence of such scaffolds and the target to further expand on the lead molecules. To accommodate the compound, part of the protein complex may also be generated ad hoc by the model: only the structure known to bind the key recognition motif is provided, and the rest is co-folded while new molecules are generated (Fig. 22).

https://vantai-tutorials.s3.us-east-2.amazonaws.com/neo_assets/Fig13_CaseStudy1_AbDesign/fig13_case_study_05.svg

Fig. 22: Prompted with CDK12 structure, its conserved binding motif and DDB1 sequence, Neo-1 correctly folds the substrate protein and completes the ligand fragment while flexibly accommodating the “inpainted” interactions. Parts generated by Neo-1 are shown with carbon atoms in green. Reference molecule co-crystallized with CDK12 shown in gray (6TD3).

Case Study 2: End-to-End Antibody Discovery

Fig. 23: Neo-1 unlocks end-to-end rational antibody discovery by incorporating knowledge via conditioning. Generated structure is shown in green, folded or input structure in blue. 1) Initial design by folding VH against structure of SARS-CoV-2 RBD and generating portion of CDRH3 sequence. 2) Generation is steered toward target epitope via distance restraint conditioning (pink). 3) Sequence conditioning of paratope residues (arrows) steers sequence choice. 4) Loop generation can be conditioned on structures of both antibody framework regions and antigen.

This case study highlights how Neo-1 can be used in rational antibody design against a known antigen, an incredibly important task both in proximity modulation drug discovery and more broadly. As illustrated by several emerging proximity modality types, such as DACs, LYTACs, and others now approaching clinical studies, antibodies can create unique possibilities in the proximity modulation landscape.

We leverage the recently crystallized SARS-CoV-2 RBD (PDB ID: 7MSQ) as an example. For conceptual simplicity, we focus on the VH antibody fragment (i.e. nanobody design) and ignore VL, which could be designed in a similar fashion. Fig. 23 shows the end-to-end process, which displays a key innovation of Neo-1: it can be directly conditioned with multiple types of information to ensure that experimentally obtained knowledge is used efficiently during the entire design process.

In the initial step, Neo-1 is prompted with a partial antibody sequence and the structure of the antigen, then simultaneously co-folds the VH against the antigen structure and generates a portion of the CDRH3 sequence (Fig. 23.1). Despite this structure being disclosed after Neo-1's training cutoff, Neo-1 designs a VH that is predicted to interact via its CDRH3 with a real epitope. Neo-1 also samples diverse putative epitopes, which is desirable when epitopes are not known a priori (Fig. 24).

https://vantai-tutorials.s3.us-east-2.amazonaws.com/neo_assets/Fig14_CaseStudy2_AbDesign/fig14_case_study_02.svg

Fig. 24: Restrained generation steers VH designs toward target epitopes. Epitope heatmaps (pink) indicate antigen sites frequently interacting with the designed portion of the CDRH3 in unrestrained (left) and restrained (right) samples. Color intensity reflects normalized occurrence.

As the antigen becomes experimentally characterized by epitope mapping with Neo-1-designed or naturally occurring antibodies, scientists will increasingly design targeted antibodies that bind to a preferred epitope. This information can be used to generate refined designs that capture key CDRH3 characteristics using Neo-1. Prompting Neo-1 with interval distance restraints between CDRH3 and epitope residues results in increased exploration of the target epitope (Fig. 24) and generation of more favorable beta-strand interactions (Fig. 23.2).
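An interval distance restraint can be thought of as a pair of residues plus an allowed distance range. The sketch below shows one hypothetical way to represent such restraints and to check, after generation, how well a design satisfies them. The `IntervalRestraint` type, the residue labels, and the coordinates are illustrative assumptions; the sketch does not describe how Neo-1 encodes restraints internally.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalRestraint:
    residue_a: str  # hypothetical label, e.g. a CDRH3 residue "H:100"
    residue_b: str  # hypothetical label, e.g. an epitope residue "A:50"
    lower: float    # minimum allowed distance in Å
    upper: float    # maximum allowed distance in Å


def violation(restraint, coords):
    """Distance in Å by which a residue pair falls outside its interval."""
    d = math.dist(coords[restraint.residue_a], coords[restraint.residue_b])
    if d < restraint.lower:
        return restraint.lower - d
    if d > restraint.upper:
        return d - restraint.upper
    return 0.0  # inside the interval: no penalty (flat-bottom behaviour)


def satisfied_fraction(restraints, coords, tolerance=0.0):
    """Fraction of restraints met within the given tolerance."""
    met = sum(violation(r, coords) <= tolerance for r in restraints)
    return met / len(restraints)


# Placeholder representative-atom coordinates for three residues.
coords = {"H:100": (0.0, 0.0, 0.0), "A:50": (0.0, 0.0, 6.0), "A:51": (0.0, 0.0, 12.0)}
restraints = [
    IntervalRestraint("H:100", "A:50", lower=4.0, upper=8.0),
    IntervalRestraint("H:100", "A:51", lower=4.0, upper=8.0),
]
print(satisfied_fraction(restraints, coords))
```

Using an interval rather than a single target distance leaves room for the model to explore different loop conformations while still contacting the intended epitope.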

https://vantai-tutorials.s3.us-east-2.amazonaws.com/neo_assets/Fig14_CaseStudy2_AbDesign/fig14_case_study_03.svg

Fig. 25: Neo-1 conditioned with antigen structure, epitope distance restraints, and sequence motifs generates biologically reasonable all-atom features. Top left: hydrogen-bonding network stabilizing CDRH3 loop. Top right: burial of user-specified tryptophan in hydrophobic pocket. Bottom left: polar and charged interactions with antigen residues. Bottom right: backbone hydrogen bonds in antiparallel β-strand. Designed residues in green, folded residues in dark blue.

As designed antibodies are tested and improved via design iteration, affinity maturation, or other strategies, paratope sequence patterns can emerge that illuminate conserved interactions that should be maintained. Neo-1 can be conditioned with sequence information to maintain key motifs while exploring protein sequence space. Fig. 23.3 depicts a scenario in which Neo-1 is conditioned with antigen structure, distance restraints and two key sequence motifs as they might have been found through affinity maturation. The resulting generated structure exhibits favorable interactions both encouraged by user conditioning and generated unconditionally (Fig. 25). Distance restraint conditioning encourages the formation of the beta strand interaction with the desired epitope, while sequence conditioning encourages the selection of user-specified hydrophobic side chains which are buried in an antigen pocket. Favorable interactions also arise that were not explicitly conditioned, e.g., the formation of inter-chain polar or charged side chain interactions and emergence of a hydrogen bonding network between designed and folded VH residues that may help pre-arrange the CDRH3 loop.

After multiple iterations of affinity optimization, structures of antibody:antigen complexes are often solved to further rationalize and design interactions. Fig. 23.4 shows how Neo-1 integrates this additional information and produces CDRH3 designs conditioned on such experimentally obtained binder structures in conjunction with precise distance and epitope information. With this conditioning, Neo-1 rediscovers strand interactions highly similar to those of the crystallized, optimized binder (Fig. 26).

https://vantai-tutorials.s3.us-east-2.amazonaws.com/neo_assets/Fig14_CaseStudy2_AbDesign/fig14_case_study_04.svg

Fig. 26: Neo-1 conditioned with the structures of antigen, VH framework, and distance restraints recovers strand interactions with high fidelity. CDRH3 loop inpainting designs form an antiparallel beta strand interaction with the antigen in a manner highly similar to the reference structure (7MSQ), recovering four of five hydrogen bonds.
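A recovery figure such as "four of five hydrogen bonds" can be expressed as a simple set overlap between the reference and designed structures. The Python sketch below assumes hydrogen bonds have already been detected by some external tool and are identified by donor/acceptor positions. The `hbond_recovery` function and the bond labels are hypothetical placeholders, not the actual 7MSQ contacts or VantAI's evaluation code.

```python
def hbond_recovery(reference_bonds, designed_bonds):
    """Fraction of reference hydrogen bonds that also appear in the design.

    Each bond is a (donor, acceptor) pair of position labels, so a bond counts
    as recovered only if the same backbone positions are paired.
    """
    reference = set(reference_bonds)
    if not reference:
        raise ValueError("reference structure has no hydrogen bonds to recover")
    return len(reference & set(designed_bonds)) / len(reference)


# Placeholder labels of the form chain:position:atom (illustrative only).
reference_bonds = [
    ("H:101:N", "A:20:O"),
    ("A:22:N", "H:101:O"),
    ("H:103:N", "A:22:O"),
    ("A:24:N", "H:103:O"),
    ("H:105:N", "A:24:O"),
]
designed_bonds = [
    ("H:101:N", "A:20:O"),
    ("A:22:N", "H:101:O"),
    ("H:103:N", "A:22:O"),
    ("A:24:N", "H:103:O"),
    ("H:106:N", "A:26:O"),  # a new bond absent from the reference
]
recovery = hbond_recovery(reference_bonds, designed_bonds)
```

Keying bonds on backbone positions rather than residue identities lets the comparison work even when the designed loop sequence differs from the reference.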

Limitations and the Way Forward

While the benchmarks and case studies presented above underscore the capabilities of Neo-1 as a powerful, unified model—demonstrating its ability to predict structures, rediscover critical molecular interactions, and serve as a highly controllable co-pilot in drug discovery—these retrospective evaluations naturally come with inherent limitations. We have already deployed Neo-1 in both internal and collaborative research programs to great effect and look forward to sharing prospective and experimentally validated results soon.

Decoding and designing molecular structures is a breakthrough, but it's only the beginning. NeoLink takes us further by illuminating how proteins and molecules dynamically interact within the living cell, revealing the real-time choreography that underpins both therapeutic success and potential side effects. We look forward to scaling our technology with the aim of bridging the gap between static structure and dynamic function, offering critical insights into the emergent clinical outcomes shaped by these complex molecular networks. In VantAI's internal programs, we already benefit from these unique capabilities and are able to design and prioritize molecules in ways not previously possible.

Additionally, while Neo-1 and the underlying NeoLink data platform represent a significant milestone with considerable future potential, they remain one step among many to come. We are grateful to the academic community and to the many companies that have released their science and developments, and we look forward to a large release of novel data from VantAI to supplement our PINDER and PLINDER resources. As the amount of additional information that can be extracted from the RCSB PDB and other public data sources wanes, our structural proteomics data platform and breakthrough model advances point to a new frontier, where we and others will expand what is possible in our quest to map the true complexity of biology.

If this kind of frontier research as part of a small and extremely talented team with a compute, data and model advantage excites you, and you have a track record of excellence—we are hiring. Reach out to us at recruiting@vant.ai.

The Team Behind Neo-1

Clemens Isert*, Michael Pun*, Emanuele Rossi*, Thomas Castiglione, Doug Tischer, Mehmet Akdel, Daniel Kovtun, Marco Pegoraro, Thomas Duignan, Alex Zhang, Vladas Oleinikovas, Graham Holt, Yusuf Adeshina, Patrick Kunzmann, Arjun Ramesh, Douglas Wu, Alex Goncearenco, Lidor Foguel, Dana Felker, Davide Sabbadin, Vivian Lam, Matthias Grass, Zach Carpenter, Michael Bronstein, Luca Naef

*Equal contribution