HCT solvation in numpy

Gist link to notebook here

A great collaborative paper recently demonstrated how to use implicit solvation to filter out inactives from a list of high-scoring docked compounds: Identifying Artifacts from Large Library Docking. At a time when AI hype is high, papers like this are a nice reminder that there's a lot of useful improvement we can wring out of classical algorithms. It would be nice to tweak such solvation models so to improve performance. A good starting point is a hackable (ideally differentiable) implementation in python.

The field of implicit solvation is awfully complicated, with lots of jargon, and algorithms that span several linebreaks in their parent publications. At the heart of this complexity is that there are many ways to calculate the effective Born radius. The HCT model was an early one. Taking some atom of interest, it assumes each nearby atom will remove a sphere of solvent equal to the size of the nearby atom's vDW sphere. We know water molecules are excluded by more than just the vDW volume - see Meshing the solvent-excluded surface - but it was a reasonable approximation and works well when an atom isn't too buried (e.g. small molecules).

Once you've got a Born radius, everything else is pretty much the same. So implementing the HCT model in numpy is a good starting point to sanity check the results. I found the original HCT publication impenetrable, but thankfully the OpenMM implementation is clear to me. So, in this notebook, I use openforcefield/openmmforcefields, espaloma, and OpenMM to parameterize a small molecule (indomethacin) and print out the solvation energy according to the HCT model. Following this, I calculate it with numpy and show that the result is the same. By itself, this code doesn't do anything that OpenMM couldn't do - but, of course, one could port the numpy parts to jax or mlx and then run gradient descent against any of the input parameters (many of which have empirically-determined values anyway).