hdnka Protocol File(s)

Name

Deep learning pharmacophore based docking

Software

HTMD1.7.43/OpenBabel2.4.1/RDKit2015.09.1/CryptoScout/MGLTools-1.5.6/ACEMD3/

System Preparation Parameters

HTMD proteinPrepare protocol at default values (pH=7.4)
GAFF2 forcefield to parameterize the ligands for ligand-protein simulations
CHARMM22* forcefield to parameterize the protein for CryptoScout and AMBER14SB to parameterize proteins in ligand-protein simulations.

System Preparation Method

The HTMD proteinPrepare protocol was used with default parameters (pH=7.4) to prepare the DMSO4 structure for both the CryptoScout simulations and the ligand-protein simulations.
The protein was parameterized with CHARMM22* for CryptoScout and with AMBER14SB for the ligand-protein simulations. All water molecules were removed from this structure.
Ligand conformational libraries were generated using RDKit2015.09.1. 400 conformers were clustered into 25 representatives for each ligand.
Preparation for docking was done using MGLTools-1.5.6 to convert both ligand and protein PDB files to PDBQT files.
HTMD was used to place the ligands in the desired binding site.

Pose Prediction Parameters

Exhaustiveness = 30 iterations # number of generations in the genetic algorithm
Population = 30 parents # number of individuals in the genetic algorithm population
Elitism = 10 # number of parents selected to create the next generation
Num_starting_poses = 12 # number of poses generated by the Kabsch algorithm that are then refined by the genetic algorithm
Fields threshold = 0.95 # value to define the shape of the fields
Dimensions of the binding pocket box = 24 x 24 x 24 Angstroms
Timestep for the ligand-protein simulations = 2 * 10^-15 s (2 femtoseconds)
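
As a quick sanity check on the timestep, the following sketch converts the simulation lengths quoted later in this protocol (2 ns of equilibration and 2 ns of production) into integrator steps. The function and constant names are illustrative only and are not taken from ACEMD3 or HTMD.

```python
# Convert the simulation lengths quoted in this protocol into integrator steps.
TIMESTEP_FS = 2          # timestep from the parameter list (2 fs)
EQUILIBRATION_NS = 2     # equilibration length from the Selection step
PRODUCTION_NS = 2        # production length from the Selection step
FS_PER_NS = 1_000_000    # 1 ns = 10^6 fs


def steps_for(duration_ns, timestep_fs=TIMESTEP_FS):
    """Return the number of integration steps needed to cover duration_ns."""
    total_fs = duration_ns * FS_PER_NS
    if total_fs % timestep_fs:
        raise ValueError("duration is not a whole number of timesteps")
    return total_fs // timestep_fs


equilibration_steps = steps_for(EQUILIBRATION_NS)
production_steps = steps_for(PRODUCTION_NS)
print(equilibration_steps, production_steps)
```

With a 2 fs timestep, each 2 ns stage corresponds to one million steps.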

Pose Prediction Method

A deep neural network inspired by DeepSite [1] was used to predict the pharmacophore of the binding pocket.
The resulting pharmacophore was used to guide the docking by maximizing the overlap of the ligand features and the pharmacophore features.
Reference 1) Jiménez, J., et al. "DeepSite: Protein binding site predictor using 3D-convolutional neural networks." Bioinformatics (2017).
Binding site selection.
The provided DMSO4 structure was processed with CryptoScout (currently unpublished software), which finds protein binding sites, pre-formed or hidden, by simulating the protein solvated in a mixture of water and benzene at 0.1 M. The benzene binding sites reveal potential drug-binding hotspots. We selected the two best hotspots detected and clustered the residues around them, based on solvent-accessible surface area (SASA), into 10 clusters using the k-means algorithm. We then selected 1 representative from each cluster, obtaining a total of 10 protein conformations. The binding site that we found matches the one in the PDB 2HHN structure and is close to the SO4 and DMSO4 groups in the structures provided by the contest.
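
The clustering step above can be illustrated with a minimal pure-Python k-means sketch. It is not CryptoScout code. Several things here are assumptions made for illustration: each simulation frame is described by a vector of per-residue SASA values, the first k frames seed the centroids, and the representative of a cluster is the frame closest to its centroid. The toy data uses k = 2 for brevity, whereas the protocol used 10 clusters.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means. Seeding with the first k points is an assumption."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each frame to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are final
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def representatives(points, labels, centroids):
    """Pick, per cluster, the index of the frame closest to the centroid."""
    reps = {}
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            reps[c] = min(members, key=lambda i: math.dist(points[i], centroid))
    return reps


# Hypothetical SASA values (two residues per frame), invented for this example.
frames = [[10, 12], [11, 13], [50, 52], [51, 49], [9, 11], [49, 50]]
labels, centroids = kmeans(frames, k=2)
reps = representatives(frames, labels, centroids)
print(labels, reps)
```

Choosing the frame nearest the centroid is one common way to pick a representative; the protocol does not state which rule was actually used.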
Ligand preparation.
Ligands were generated from the provided SMILES using RDKit2015.09.1, and possible tautomers were enumerated by hand for each ligand.
In the end, we had 3 tautomers for CatS_3, 2 for CatS_23 and 2 more for CatS_17. For each molecule, 400 conformers were generated with RDKit2015.09.1.
Of those 400 conformers, only 25 representatives were selected for docking. After docking, the ligands were parameterized using the GAFF2 forcefield.
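
To give a sense of the docking workload implied by these numbers, the sketch below counts the ligand-protein input pairs for a single ligand. It assumes that the 25 representatives are kept per tautomer and that each is docked against all 10 protein conformations from the binding site selection step. Both are readings of the text rather than stated facts, and `other_ligand` is a hypothetical name for any ligand without listed tautomers.

```python
from itertools import product

CONFORMERS_PER_TAUTOMER = 25  # representatives kept per ligand (from the text)
PROTEIN_CONFORMATIONS = 10    # cluster representatives (from the text)
# Tautomer counts quoted in the text; unlisted ligands are assumed to have one form.
TAUTOMERS = {"CatS_3": 3, "CatS_23": 2, "CatS_17": 2}


def docking_jobs(ligand):
    """List every (tautomer, conformer, protein conformation) triple for a ligand."""
    n_tautomers = TAUTOMERS.get(ligand, 1)
    return list(
        product(
            range(n_tautomers),
            range(CONFORMERS_PER_TAUTOMER),
            range(PROTEIN_CONFORMATIONS),
        )
    )


for name in ("CatS_3", "CatS_23", "other_ligand"):
    print(name, len(docking_jobs(name)))
```

Under these assumptions a ligand with three tautomers yields 750 docking inputs, and a ligand with a single form yields 250.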
Protein preparation.
In order to run CryptoScout and find the binding site of the protein, a system containing the protein must be prepared and simulated. We prepared the DMSO4 structure using the proteinPrepare protocol of HTMD with default parameters (pH = 7.4) and the CHARMM22* forcefield. After docking, proteins were prepared for simulation using AMBER14SB.
Docking preparation.
In order to dock the ligands with the proteins, the files containing their coordinates must be transformed into PDBQT file format using MGLTools-1.5.6 prepare_ligand4 and prepare_receptor4, respectively.
Docking.
We docked every ligand conformer with every protein conformation, guiding the docking with the pharmacophore prediction at the binding site that we found.
This pharmacophore prediction is composed of fields representing regions of space with a high probability of containing a given chemical group.
The network also produces a prediction of the shape of the ligand.
We then used the Kabsch algorithm to place the features of the ligand on top of these predicted fields, and finally ran a genetic algorithm to refine this overlap.
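
The refinement stage can be pictured with a generic elitist genetic algorithm that uses the values from the parameter list (30 generations, 30 individuals, 10 parents). This is only a sketch and not the authors' implementation. The toy score stands in for the real feature-field overlap, a pose is reduced to a 3D translation, the Kabsch placement is represented by a given starting pose, and keeping the parents in the next generation is an assumption.

```python
import random

GENERATIONS = 30  # "Exhaustiveness" in the parameter list
POPULATION = 30   # individuals per generation
ELITE = 10        # parents selected to create the next generation


def toy_overlap(pose, target=(1.0, -2.0, 0.5)):
    """Stand-in score: higher is better, maximal when pose equals target."""
    return -sum((p - t) ** 2 for p, t in zip(pose, target))


def mutate(pose, rng, sigma=0.3):
    """Perturb each coordinate with Gaussian noise."""
    return tuple(p + rng.gauss(0.0, sigma) for p in pose)


def crossover(a, b, rng):
    """Take each coordinate from one of the two parents at random."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def refine(start_pose, score=toy_overlap, seed=0):
    """Refine one starting pose; parents survive, so the best score never drops."""
    rng = random.Random(seed)
    population = [start_pose] + [
        mutate(start_pose, rng, sigma=1.0) for _ in range(POPULATION - 1)
    ]
    for _ in range(GENERATIONS):
        parents = sorted(population, key=score, reverse=True)[:ELITE]
        children = []
        while len(children) < POPULATION - ELITE:
            mother, father = rng.sample(parents, 2)
            children.append(mutate(crossover(mother, father, rng), rng))
        population = parents + children
    return max(population, key=score)


start = (0.0, 0.0, 0.0)  # hypothetical pose from the initial alignment
best = refine(start)
print(toy_overlap(start), toy_overlap(best))
```

In the protocol, 12 starting poses are generated per docking run (Num_starting_poses), so a routine like `refine` would be applied once per starting pose.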
Selection.
After docking each ligand and its tautomers (if any) with every protein conformation, several poses were available for each pair.
The 5 best of these poses were selected according to a score based on the overlap between ligand features and pharmacophore fields, normalized by the proportion of that feature in the ligand. Poses with clashes were discarded. These 5 best poses were simulated using ACEMD3, with the protein parameterized with the AMBER14SB forcefield and the ligand with the GAFF2 forcefield, at a timestep of 2 fs (femtoseconds) for 2 ns (nanoseconds) of equilibration and 2 ns of production. The resulting frames were used as input for an in-house deep-learning-based binding affinity predictor, and the ligand-protein pairs that showed the best mean affinity were selected and ranked. The protein and ligand coordinates of the last frame of the simulation were superposed on the provided SO4 structure and used for the submission.
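
The final ranking by mean predicted affinity can be sketched as follows. The in-house predictor is not public, so the per-frame values below are invented placeholders, and the sketch assumes that a higher predicted affinity is better. The function names and pose identifiers are hypothetical.

```python
from statistics import mean


def rank_by_mean_affinity(frame_scores):
    """Rank (ligand, pose_id) pairs by mean per-frame predicted affinity.

    frame_scores maps (ligand, pose_id) to a list of per-frame predictions.
    Assumes higher values mean stronger predicted binding.
    """
    means = {key: mean(values) for key, values in frame_scores.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


def best_pose_per_ligand(ranking):
    """Keep the first (highest-ranked) pose seen for each ligand."""
    best = {}
    for (ligand, pose_id), score in ranking:
        best.setdefault(ligand, (pose_id, score))
    return best


# Hypothetical per-frame predictions for three simulated poses.
scores = {
    ("CatS_3", "pose_a"): [6.0, 6.4, 6.2],
    ("CatS_3", "pose_b"): [7.0, 6.8, 7.2],
    ("CatS_17", "pose_a"): [5.0, 5.5, 5.2],
}
ranking = rank_by_mean_affinity(scores)
selected = best_pose_per_ligand(ranking)
print([key for key, _ in ranking])
print({ligand: pose for ligand, (pose, _) in selected.items()})
```

Averaging over all production frames, rather than scoring only the final frame, reduces the influence of any single snapshot on the ranking.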

Answer 1

Yes

Answer 2

Yes

hdnka-PosePredictionProtocol.txt