Match bonding topology to PDBQT files

PDBQT files are here to stay (for now). AutoDock- and Vina-flavoured docking is used in a wide range of academic and private sector pipelines, so for historical reasons we have to work with PDBQT files, even despite their shortcomings. The most obvious shortcoming is the lack of connectivity information, meaning the underlying molecular graph has to be inferred from the coordinates.

In VMD, when a PDB file is loaded without a PSF, bonding is inferred using a distance based rule: pairwise distance of less than (vdwRadius1 + vdwRadius2)*0.6 implies they are covalently bonded, since nonbonded atoms don't normally approach this close at kT. Of course you need another step to catch double bonds. However, you will likely have a SMILES code, meaning the problem is just to match the topology to the coordinates.

This snippet uses networkx to marry the two. RDKit takes a first pass over the coordinates from the pdbqt, inferring connectivity using a distance-based rule. An Isomeric SMILES code is used to build the true molecular graph. The atom identities will be out of order, and the key step is use of the VF2 algorithm to match nodes between the graphs. Subsequently, a conformer is added to the true molecular graph and the coordinates of the conformer are set based on knowledge of atom identities from the graph matching. The VF2 algorithm is available in networkx as networkx.algorithms.isomorphism.GraphMatcher(molGraph1, molGraph2.

      
from rdkit import Chem
from networkx.algorithms import isomorphism
from rdkit.Geometry import Point3D

def mol_to_nx(mol):
    G = nx.Graph()

    for atom in mol.GetAtoms():
        G.add_node(atom.GetIdx(),
                   atomic_num=atom.GetAtomicNum())
    for bond in mol.GetBonds():
        G.add_edge(bond.GetBeginAtomIdx(),
                   bond.GetEndAtomIdx())
    return G

def pdbqtToMol(fname):
    lines = [i[:54] for i in open(fname) if 'ATOM' in i]
    return Chem.MolFromPDBBlock('\n'.join(lines))

def makeMol(isomericSmiles, pdbqtFile):
    """
    Given an isomeric SMILES code, and it's corresponding pdbqt filename,
    infer the molecular graph from the pdbqt, then match to the actual graph
    using networkx. Use the matching to set the atom coordinates of rdkit molecule,
    resulting in an rdkit Mol object with both correct topology and correct coordinates.
    """
    mol = Chem.MolFromSmiles(isomericSmiles)
    pdbqt = pdbqtToMol(pdbqtFile)
    #get positions of the atoms from the pdbqt:
    positions = pdbqt.GetConformer(0).GetPositions()
    n_atoms = mol.GetNumAtoms()
    
    #get atom-to-atom matching.
    g_matching = isomorphism.GraphMatcher(
        mol_to_nx(mol), 
        mol_to_nx(pdbqt)
    )
    #if we don't have isomorphism, the inferred bonds from the pdbqt are wrong:
    assert(g_matching.is_isomorphic())
    
    #get positions of the atoms from the pdbqt:
    positions = pdbqt.GetConformer(0).GetPositions()
        
    #give the mol a blank conformer
    mol.AddConformer(Chem.Conformer(n_atoms))
    conf = mol.GetConformer(0)
    for i in range(n_atoms):
        x,y,z = positions[g_matching.mapping[i]]
        conf.SetAtomPosition(i,Point3D(x,y,z))
    return mol