Hbonds with rdkit and numpy

Gist link

There are several tools to calculate hydrogn bonds (eg biotite, mdtraj, mdanalysis), but they don't have the perfect combination of features for me: speed, playing nice with rdkit, and tuneable atom perception via SMARTS definitions of donor/acceptor types.

As an example of what doesn't work for me, I'll use a library that is clearly very good for it's intended purpose. MDTraj is fast for calculating HBonds in a molecular dynamics trajectory with a million frames. However, to do the same thing on a million docked or otherwise modelled protein-ligand structures would require a million conversions from rdkit to PDB to mdtraj topology, wasting time.

So here's how to do it directly from rdkit with numpy. This function takes about 600 µs after loading and preparing the rdkit molecules (with Chem.Combine and Chem.Sanitize), which seems fast enough for my purposes. There's an example here of how to draw hydrogen bonds with py3Dmol too: