weix.us

We have protein folding at home

SimpleFold: Folding Proteins is Simpler than You Think.

Protein folding models typically employ computationally expensive modules involving triangular updates, explicit pair representations or multiple training objectives curated for this specific domain. Instead, SimpleFold employs standard transformer blocks with adaptive layers and is trained via a generative flow-matching objective with an additional structural term. We scale SimpleFold to 3B parameters and train it on approximately 9M distilled protein structures together with experimental PDB data. On standard folding benchmarks, SimpleFold-3B achieves competitive performance compared to state-of-the-art baselines. In addition, SimpleFold demonstrates strong performance in ensemble prediction, which is typically difficult for models trained via deterministic reconstruction objectives. Due to its general-purpose architecture, SimpleFold shows efficiency in deployment and inference on consumer-level hardware. SimpleFold challenges the reliance on complex domain-specific architecture designs in protein folding, opening up an alternative design space for future progress.

I tested this out at home on my MacBook with 1BFP (blue fluorescent protein). It feels very fast, and I like that I can run it locally. Most of the processing time was taken up by downloading the models. I was able to get an acceptable structure in about 43 seconds, which is astonishing for a protein folding problem. It didn’t account for the formation of the chromophore buried inside the beta-barrel, but that happens after folding anyway. I’m pretty satisfied with this result.

That being said, when I ran it on subtilisin (4NMX), I got something confounding: SimpleFold only ran a prediction on the first part of the subtilisin complex. Multimeric structures are an issue, which means I will have to wait before I can predict most of the structures I’m actually interested in.
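Since multi-chain targets were the stumbling block here, it can help to check how many chains a PDB entry has before trying to fold it. Below is a minimal sketch that does this by reading the chain ID column of the `ATOM` records in a PDB-format file. The function names `chain_ids` and `is_multichain` are made up for illustration and are not part of SimpleFold; the only thing the code relies on is the fixed-column PDB layout, where the chain ID sits in column 22.

```python
def chain_ids(pdb_text: str) -> list[str]:
    """Return the distinct chain IDs found in ATOM records, in order of first appearance."""
    seen: list[str] = []
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: record name in columns 1-6, chain ID in column 22.
        if line.startswith("ATOM") and len(line) > 21:
            chain = line[21]
            if chain not in seen:
                seen.append(chain)
    return seen


def is_multichain(pdb_text: str) -> bool:
    """True if the structure has more than one chain among its ATOM records."""
    return len(chain_ids(pdb_text)) > 1
```

With a downloaded structure file (the path is hypothetical), something like `is_multichain(open("4nmx.pdb").read())` would flag a complex up front. Note that this only counts chains; it says nothing about whether the chains are identical copies or different proteins.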

By no means is it a replacement for real structural biology, but it’s so cool to me that there are now protein folding models that fit on a consumer laptop and can at least make good guesses at structures from sequences.

Updated by Elliott Weix.