BioEmu on Modal

Gist link to modal input script here

The literature seems to be converging on the conclusion that cofolding models do not generalise to new chemistry (see, e.g., this paper or this one).

As an alternative to cofolding models, perhaps we could just expose a cryptic site by generating an ensemble of conformations of some protein of interest. Once the site is exposed, regular, boring modeling techniques that have stood the test of time (e.g. docking) might work well. Molecular dynamics is one way to expose such cryptic sites, given a very long simulation. Nevertheless, even with techniques like cosolvent simulations or enhanced sampling, it takes a while and costs a not insignificant amount of money, so it won't be a standard approach for most.

Instead, there are now protein structure-prediction models that aim to generate conformer ensembles. One such model, BioEmu, was trained in multiple stages on a dizzying multi-modal dataset, including experimental stabilities, molecular dynamics, and static snapshots. Yet I'd never heard of it until I saw an experimental notebook in the ColabFold repo use it, and that notebook didn't work for me. So, today's snippet shares how to run BioEmu on Modal for the fellow GPU-poor. This took some serious tinkering. Modal is fussy at best, and on top of that BioEmu wants you to install colabfold into a fresh conda environment without activating it, then patch several files. There's a uv virtual env tucked in there, too. Obviously that will completely screw with a Modal instance. But the magic incantation does exist, and here it is.

The example snippet uses T4 lysozyme and generates 50 samples. If you believe the paper, these are drawn approximately from the equilibrium distribution, so lower free-energy conformations should show up more often. Having tried a few systems, I think this might be true. Some samples are filtered out due to unphysical geometry, so you may get back fewer than 50 frames. The output is an xtc file, which can be parsed easily enough with MDTraj or MDAnalysis. Pro tip: do not use HPacker to rebuild sidechains. Use Yang Zhang's FASPR instead - faster, already works perfectly, and much less fiddly.
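For anyone who wants to poke at the output, here is a minimal sketch of reading the ensemble with MDTraj and turning sample counts into a rough free energy difference. The file names samples.xtc and topology.pdb are assumptions about what lands in the output directory, so adjust them to whatever your run writes. The helper names are made up for illustration, and the free energy estimate only makes sense if the samples really are drawn from the equilibrium distribution, as the paper claims.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def load_ensemble(xtc_path="samples.xtc", top_path="topology.pdb"):
    """Load the sampled frames. Both default file names are assumptions."""
    import mdtraj as md  # imported lazily so the helper below works without MDTraj

    return md.load(xtc_path, top=top_path)


def ca_rmsd_to_first_frame(traj):
    """C-alpha RMSD (in Angstrom) of every frame to frame 0."""
    import mdtraj as md

    ca = traj.topology.select("name CA")
    # md.rmsd superposes each frame on the reference and returns nanometres
    return md.rmsd(traj, traj, frame=0, atom_indices=ca) * 10.0


def free_energy_difference(n_a, n_b, temperature=298.15):
    """Free energy of state B relative to state A, in kcal/mol.

    Assumes the samples are Boltzmann distributed, so the population
    ratio n_b / n_a equals exp(-dG / kT).
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both states need at least one sample")
    return -KB_KCAL_PER_MOL_K * temperature * math.log(n_b / n_a)


if __name__ == "__main__":
    # Hypothetical split: 40 frames look closed, 10 look open.
    print(f"dG(open - closed) = {free_energy_difference(40, 10):.2f} kcal/mol")
```

How you assign frames to states (an RMSD cutoff, a distance between two residues, clustering) is up to you. With only 50 samples the counts are noisy, so treat the number as a sanity check rather than a measurement.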