jk3no-FreeEnergyProtocol.txt

Name

ScaffOpt_MD100ps_FE3

Software

LigPrep v33013
ScaffOpt

Protein Forcefield

None

Ligand Forcefield

None

Water Model

None

Parameters

Assumed pH 5 for ligand preparation.
-dt 'RDK5' -dt 'ErgFP' -dt 'ECFP' -dt 'FCFP' -dt '2Dpp' -dt '3Dpp' -dt 'mmgbsa_fp' -dt 'moments' -dt 'moments_sift' -binsize 0 -xmean 0.00001 -repbt 100 -hpnum 100 -of 'multi' for all 4 ScaffOpt runs.

Method

Briefly, 3D ligand conformations and tautomerization/ionization states were generated with LigPrep at target pH=5. For compounds with alternative tautomers/ionization states, only the state with the lowest LigPrep state penalty was used. ChEMBL was queried for similarity to the 3 CatS datasets ('score', 'FESet' and 'pose'), and 4 assays were selected to be used by the ScaffOpt algorithm as training sets. These were (assay IDs): CHEMBL1048481, CHEMBL1103405, CHEMBL1103448, CHEMBL2318072. ScaffOpt was executed 4 times, once for each assay. ScaffOpt is a fully automatic machine learning algorithm that takes as input a few molecules with measured binding affinity (training set) and scores a given screening database of compounds according to their predicted binding affinity to the receptor. The details of the ScaffOpt algorithm will be described in a forthcoming publication. For this submission both 3D and 2D structural information of the compounds was used. The predicted binding scores in this submission were obtained only from assay CHEMBL2318072 at simcut level 0.3 ('simcut' is a ScaffOpt parameter that quantifies the similarity between the training set and the screening set). ATTENTION: the numbers are scores (the lower the score, the stronger the binder), not free energies in kcal/mol; therefore RMSEc results may be poor compared with Kendall's tau and Pearson's R, which are unaffected. An updated version of ScaffOpt was also used.
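
The note on scores versus free energies can be illustrated with a minimal sketch. It assumes, purely for illustration, that the scores differ from free-energy-like predictions by a positive linear rescaling; the protocol does not state how ScaffOpt scores relate to kcal/mol. All numbers, the scale factor and the function names below are hypothetical. Under that assumption Pearson's R and Kendall's tau stay the same, while the centered RMSE (RMSEc) grows with the scale.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def kendall_tau(x, y):
    """Kendall's tau-a (no tie correction), counted over all pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                s += 1      # concordant pair
            elif prod < 0:
                s -= 1      # discordant pair
    return s / (n * (n - 1) / 2)

def rmse_centered(pred, ref):
    """RMSE after subtracting each series' own mean (RMSEc)."""
    n = len(pred)
    mp, mr = sum(pred) / n, sum(ref) / n
    return math.sqrt(sum(((p - mp) - (r - mr)) ** 2 for p, r in zip(pred, ref)) / n)

# Hypothetical data, not taken from the submission.
exp_dg = [-10.2, -9.1, -8.7, -7.9, -7.0]    # "experimental" values, kcal/mol
pred_dg = [-9.8, -9.4, -8.0, -8.1, -6.9]    # predictions on the same scale
scores = [3.0 * p + 40.0 for p in pred_dg]  # same ranking, arbitrary units

for label, pred in (("kcal/mol-like", pred_dg), ("rescaled score", scores)):
    print(label,
          "R=%.3f" % pearson_r(pred, exp_dg),
          "tau=%.3f" % kendall_tau(pred, exp_dg),
          "RMSEc=%.3f" % rmse_centered(pred, exp_dg))
```

If the true relation between score and free energy were monotonic but not linear, Kendall's tau would still be preserved, whereas Pearson's R could change.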

Answer 1

No

jk3no-PosePredictionProtocol.txt

Name

ScaffOpt_MD100ps_FESet

Software

LigPrep v33013
ScaffOpt

System Preparation Parameters

Assumed pH 5 for ligand preparation.
Semi-empirical AM1 charges generated with antechamber (AmberTools 17).
AMBER14SB and GAFF2 force fields.

System Preparation Method

3D ligand conformations and tautomerization/ionization states were generated with LigPrep at target pH=5. I am submitting to both 'Score - Structure based' and 'Free Energy Set 1' because the latter dataset contains fewer compounds (33), so I can use predictions at a higher simcut level (see FreeEnergyProtocol.txt for details), which tend to be more accurate. It will therefore be interesting to compare the performance of a screening program like ScaffOpt with the more rigorous Free Energy methods.

Pose Prediction Parameters

None

Pose Prediction Method

The starting structures for MD simulations were generated with ScaffOpt's own pose prediction algorithm.
First, .pdb files with crystal ligands similar to the training and blind test sets were fetched using the fetch_similar_ligands_from_pdb.py script.
Some of them were close homologues of the target protein, but many were distant homologues that shared a highly conserved binding pocket.
The selection of the right crystal structure and the alignment of each CatS or training compound to the reference crystal ligand were performed
using the align_lig.py script. This script performs a Maximum Common Substructure (MCS) alignment between the scaffolds (chemotypes) of the crystal
ligand and the query ligand. It subsequently performs an energy minimization with restraints on the MCS and selects the best conformer by energy, shape similarity
and RMSD to the crystal ligand. Since many reference crystal ligands originated from distant homologues of the target protein, a homology model
of the target protein with the query ligand in the binding pocket was generated using two template structures. One was the original distant homologue and the other was a
crystal structure (with another ligand) of the target protein or a very close homologue of it, which was selected using the align_models.py script. The homology
models were generated using the homology_model_generator.py script and preserved the key protein-ligand interactions present in the distant homologous crystal
structure. Finally, a 100 ps long MD simulation was carried out for each CatS or training ligand complex for equilibration using OpenMM. The analysis was
conducted on the whole MD ensemble. The final snapshots of each trajectory are included in this folder as .pdb (protein without waters) and .mol (ligand) files.
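
The conformer selection step described above can be sketched as follows. This is not the code of align_lig.py, which is not shown in this protocol. It is a minimal illustration that assumes the query conformers are already placed in the crystal ligand's coordinate frame and that the MCS atom mapping is given as index pairs. The selection rule (smallest MCS RMSD among conformers within an energy window), the function names, the window size and all coordinates are hypothetical, and the shape-similarity criterion is left out for brevity.

```python
import math

def rmsd_matched(coords_a, coords_b, pairs):
    """In-place RMSD (no superposition) over matched atom index pairs."""
    if not pairs:
        raise ValueError("no matched atoms")
    sq = 0.0
    for i, j in pairs:
        sq += sum((a - b) ** 2 for a, b in zip(coords_a[i], coords_b[j]))
    return math.sqrt(sq / len(pairs))

def pick_conformer(conformers, crystal_coords, pairs, energy_window=5.0):
    """Hypothetical rule: keep conformers within energy_window of the
    lowest energy, then return the one closest to the crystal ligand."""
    e_min = min(c["energy"] for c in conformers)
    candidates = [c for c in conformers if c["energy"] - e_min <= energy_window]
    return min(candidates,
               key=lambda c: rmsd_matched(c["coords"], crystal_coords, pairs))

# Hypothetical three-atom MCS; coordinates in Angstrom, energies in kcal/mol.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
pairs = [(0, 0), (1, 1), (2, 2)]  # (query atom index, crystal atom index)
conformers = [
    {"name": "conf_a", "energy": 0.0,
     "coords": [(0.3, 0.0, 0.0), (1.8, 0.0, 0.0), (1.8, 1.5, 0.0)]},
    {"name": "conf_b", "energy": 2.0,
     "coords": [(0.1, 0.0, 0.0), (1.6, 0.0, 0.0), (1.6, 1.5, 0.0)]},
    {"name": "conf_c", "energy": 12.0,  # closest to crystal, but too high in energy
     "coords": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]},
]

best = pick_conformer(conformers, crystal, pairs)
print(best["name"], round(rmsd_matched(best["coords"], crystal, pairs), 3))
```

An energy window followed by an RMSD ranking is only one way to combine the criteria; a weighted sum of energy, shape similarity and RMSD would be another.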

Answer 1

No

Answer 2

No