PDB Similarity with RDKitJS

Link to the Javascript tool

When working with hits that have no crystal structure, one often wonders whether the proposed binding mode is accurate. A good understanding of the binding mode makes the analoguing phase easier. While the analogue SAR can later shed light on the binding mode, one way to build supporting evidence beforehand is to compare the hit against existing PDB ligand structures.

The RCSB does have an input field for SMILES (Advanced Search > Chemical Similarity, then Query Type = Descriptor), but I find it sometimes returns nothing; perhaps there is some undocumented similarity threshold. I often run the same search with RDKit in Python, where every parameter is controllable. Nevertheless, it's still a pain to load up a notebook every time.

Recently, RDKitJS has improved a lot, and it's a good fit for a web-based tool that can be accessed quickly. As a project to learn JavaScript, I wrote an app (?) that takes a query SMILES, fingerprints all the ligands from the PDB (with molecular weight 200-550), and returns a list ordered by similarity to the query, along with pictures. The list is formatted with DataTables. As an aside, it's a lot of fun to browse the different JavaScript spreadsheet options at jspreadsheets.

Technically it works on iOS, but the page doesn't render correctly there. Please also note that it still takes 15-20 seconds to featurize all the PDB molecules :) Otherwise, it's just like the Python equivalent. Use the tool here.


Postscript for JavaScript newbies like me: just inspect the JS to see how it's all done, but don't expect anything pretty! There were three components that had to be figured out:

1: Reading a CSV file of names and SMILES. I used ChatGPT here. I'm not even sure what the XMLHttpRequest does, but it seems pretty general: it just loads data from a separate file, rather than requiring the data to be present in the JS or HTML of the page.

2: Using RDKitJS to fingerprint molecules into a 'dataframe'. This is done by creating an array of arrays, splitting each row and appending (smiles, name, fp) triplets. The RDKitJS function for fingerprinting is demonstrated in this ChEMBL blog code, but note that the call signature used there is deprecated; the function now takes its parameters as a JSON string. Storing this array separately from the fingerprinting code means the FPs only need to be calculated once.

3: Calculating similarities and showing a table. This was the most involved part. ChatGPT gave an incorrect function to calculate Tanimoto similarity, but it was close enough that I could figure out the fix. I could not figure out a way to use sparse fingerprints, but there is a way (see Datagrok). I also wanted the DataTables object to be refreshable, but once it's rendered on the page it won't re-render straightforwardly. To work around that, you'll see that the app first instantiates a table with an empty data array. At the point of requesting similarities, this table is deleted and a new table is populated that contains the similarities to whatever query SMILES is in the text field at that time.
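For anyone who would rather read a condensed version than inspect the page source, the three components above can be sketched roughly as follows. This is not the app's actual code: the function names, the "name,smiles" column order, the header row, and the radius/nBits values are all assumptions made for illustration. The only RDKitJS calls used are `get_mol`, `get_morgan_fp` (with its parameters passed as a JSON string) and `delete`, and the sketch assumes the fingerprint comes back as a string of "0" and "1" characters.

```javascript
// Step 1: parse CSV text, assuming one "name,smiles" record per line.
// Plain SMILES contain no commas, so splitting on the last comma
// tolerates commas inside ligand names.
function parseLigandCsv(text, hasHeader = true) {
  const lines = text.split(/\r?\n/).filter((line) => line.includes(","));
  const rows = hasHeader ? lines.slice(1) : lines;
  return rows.map((line) => {
    const cut = line.lastIndexOf(",");
    return { name: line.slice(0, cut).trim(), smiles: line.slice(cut + 1).trim() };
  });
}

// Step 2: build the array of [smiles, name, fp] triplets.
// `RDKit` is the module object obtained from initRDKitModule().
function fingerprintLigands(RDKit, ligands, fpOptions = { radius: 2, nBits: 2048 }) {
  const details = JSON.stringify(fpOptions); // parameters travel as a JSON string
  const table = [];
  for (const { name, smiles } of ligands) {
    const mol = RDKit.get_mol(smiles);
    if (!mol) continue; // assumption: unparsable SMILES come back as null
    try {
      table.push([smiles, name, mol.get_morgan_fp(details)]);
    } finally {
      mol.delete(); // release the molecule held on the WebAssembly side
    }
  }
  return table;
}

// Step 3: Tanimoto = (bits set in both) / (bits set in either),
// computed over two equal-length "0"/"1" strings.
function tanimoto(fpA, fpB) {
  if (fpA.length !== fpB.length) {
    throw new Error("fingerprints differ in length");
  }
  let both = 0;
  let either = 0;
  for (let i = 0; i < fpA.length; i++) {
    const a = fpA[i] === "1";
    const b = fpB[i] === "1";
    if (a && b) both++;
    if (a || b) either++;
  }
  return either === 0 ? 0 : both / either;
}

// Score every stored fingerprint against the query, most similar first.
function rankBySimilarity(queryFp, table) {
  return table
    .map(([smiles, name, fp]) => [name, smiles, tanimoto(queryFp, fp)])
    .sort((x, y) => y[2] - x[2]);
}

// Toy 4-bit fingerprints, invented for illustration only.
const demo = rankBySimilarity("1100", [
  ["c1ccccc1", "BNZ", "1010"],
  ["CCO", "EOH", "1100"],
]);
console.log(demo.map(([name, , score]) => `${name} ${score.toFixed(2)}`).join("\n"));
```

In the browser, `fingerprintLigands` would be called once, after `initRDKitModule()` resolves and the CSV has loaded; after that, each new query only needs one fingerprint for the query molecule plus a call to `rankBySimilarity`, and the resulting rows are what get handed to the freshly created table.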

The grid of molecules at the end is a hack - I don't think there's a MolsToGridImage in RDKitJS yet. But I could re-use the drawing function used at the top of the page, adding an inline option that specifies a class='column' tag in the relevant div, along with some css in the header, which emulates a grid.
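A minimal sketch of that grid hack, under some assumptions: each molecule has already been drawn to an SVG string (for example with RDKitJS's `get_svg()`), and the helper names, the label paragraph, and the CSS rule below are hypothetical stand-ins rather than what the page actually uses.

```javascript
// Escape text so ligand names cannot break the surrounding markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap each pre-rendered SVG in a div with class="column"; the CSS
// floats those divs side by side, which emulates a grid.
function moleculeGridHtml(cells) {
  return cells
    .map(({ svg, label }) => `<div class="column">${svg}<p>${escapeHtml(label)}</p></div>`)
    .join("\n");
}

// Hypothetical CSS for the page header: four molecules per row.
const gridCss = ".column { float: left; width: 25%; }";

// Placeholder SVG strings stand in for real molecule drawings.
const html = moleculeGridHtml([
  { svg: "<svg></svg>", label: "BNZ" },
  { svg: "<svg></svg>", label: "EOH" },
]);
console.log(html);
```

The returned string can be assigned to the `innerHTML` of a container div; changing the `width` percentage changes how many molecules fit on a row.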