Tethered minimization of small molecules with RDKit

I think that’s my first RDKit post! So reason to celebrate!

Here' I’ll focus on a very nice feature available in Open Source Software and very useful in daily structure based but even ligand based drug design tasks.

The problem

How can I dock small molecules into a receptor and avoid as much as possible the “what is the right pose” problem?

The solution

Sorry to disappoint you there is no easy solution to that. But given a few assumptions you can actually work around this issue quite well. Let me explain that a bit further below.

The approach taken here

Docking small molecules in a pocket of a protein structure can be done for various reasons and use cases. Virtual screening or lead optimization and SAR analysis for instance. I’ll focus on lead optimization stages here and this setting gives us a few pre-requisits to work with.

Let’s suppose you have a crystal structure or a very good model of how your lead compound binds to the target. Then docking your whole compound series where decorations have been added here and there could actually make use of the known positioning of the original lead compound. That’s based on the (strong) assumption that the modifications made on the ligand do not affect the binding mode of the ligand in a significant way. I know that this is not always true, but very often it’s a good enough hypothesis to work with and get things done.

So how can we achieve something like this with two popular open source tools like RDKit and docking using rdock?

rdock allows you to do tethered docking….but for that you need 3D conformations of the ligands already in the place where they should be tethered and then a property in the SD file indicating which atoms should be tethered.

In this post I’ll show how to do that first part: how to prepare your sd file for tethered docking with rdock using RDKit.

All the functional source code and sample data and output can be found on this github repo.

Example on HSP90 inhbitors

Let’s suppose we have a reference structure we would like to dock into. Here I chose my favourite structure from the PDB: 1uyd.

PU8 ligand in RCSB PDB structure 1uyd

PU8 ligand in RCSB PDB structure 1uyd

Now let us see how we can prepare a docking of a series of compounds that all share at least the adenine fragment in them:

Compounds we’d like to dock

Compounds we’d like to dock

Let’s set up a RDKit script that takes a reference 3D molecular structure and a set of ligands and does a tethered minimization on the MCS of the ligand and reference structure. Below I’ll just go through the main steps (not the full code). Refer to this github repo for the full code and examples.

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import rdFMCS
from rdkit.Chem.rdMolAlign import AlignMol

reference = #reference molecule
ligands = #ligands to align

for mol in ligands:
    mcsResult=rdFMCS.FindMCS(mols,threshold=0.9,completeRingsOnly=True)    #find the maximum common substructure

    if mcsResult.smartsString and len(mcsResult.smartsString)>0 :
        patt = Chem.MolFromSmarts(mcsResult.smartsString,mergeHs=True)

        #keep only the core of the reference molecule
        if ref: 

Now we have identified the common core between our ligand and the reference molecule. Next, we will use that to minimize our ligand on top of that core from the reference.

            newmol=Chem.Mol(mol)    #create a new instance of the ligand, as we will change the coordinates
            AllChem.ConstrainedEmbed(newmol,core)    #constrained minimization of newmol versus the core of the reference

Actually, just two lines of code to do the tricky part. Rather cool! Now we just have to prepare our output for rdock, which is basically write out the minimized and aligned molecule into an SD file with a property containing all atoms that should be constrained upon docking.

            tethered_atom_ids=newmol.GetSubstructMatches(patt)    #that's to get the atom ids only
            if tethered_atom_ids : 
                t1=map(lambda x:x+1, list(t))
                ta=','.join(str(el) for el in t1)
                nm=Chem.AddHs(newmol, addCoords=True)    #create a new 3D molecule  and add the TETHERED ATOMS property
                nm.SetProp('TETHERED ATOMS',ta)
                w.write(nm)    #write to an sd file

That was not too difficult actually. So once this is done we end up with an SD file containing all molecules aligned and with the relevant property associated to each molecule:

Now we have the TETHERED ATOMS in the output SD file - ready for rdock

Now we have the TETHERED ATOMS in the output SD file - ready for rdock

Final aligned and minimized molecules - all adenins are perfectly aligned and even other parts of the molecule are as well, if the MCS was bigger than the adenine fragment.

This SD file can now be used for tethered docking with rdock. Follow a normal docking protocol with rdock but provide in the ligand section of your docking configuration file something along these lines.

Excerpt from the  rdock documentation

Excerpt from the rdock documentation

Obviously you can use this script or the output SD to do a lot of other things as well.

For the rdock affiliates among you

The code presented here can serve as a complete replacement of the sdtether.py that was originally shipped with rdock. The original sdtether only allowed rigid body alignments on a reference ligand which obviously comes with important caveats. With the new RDKit based version you run flexible minimizations.


I’d like to thank Greg Laundrum for his great work on RDKit and his hints to the ConstrainedEmbed function. David Morley, Xavier Barril, Daniel Alvarez-Garcia & Sergio Luiz Carmona for writing and maintaining rdock.

Related readings

Same idea with a slightly different approach (not rdock related though): https://iwatobipen.wordpress.com/2019/06/04/constrain-embedding-with-mcs-as-a-core-rdkit-chemoinformatics/

Rdkit blog: