l8rmr-LigandScoringProtocol.txt

Name

LIE

Software

OpenBABEL-2.3.2
AutoDock Vina-1.1.2
GROMACS-4.5.5
y_tb-0.2.3 (in house)
y_eye-0.1.0 (in house)

Parameters

please see supplementary for GROMACS mdps
dodecahedron shaped periodic box with ~12.5K TIP3 explicit waters and 0.1M NaCl at pH=7.0
AMBER-ILDN parameters set for protein atoms
adjusted YFF1 with Kirchhoff charges for compounds; bonded part from GAFF

Method

A linear interaction energies (LIE) model (one per pose) consists of 8 replications (from the same starting coordinates but different random seed in velocities generation routine), each of 12 ns MD run followed by 4ns of productive run with statistics being collected. The subset of 12 reported in literature compounds were analyzed to parameterize the 5-parameters weighted LIE model: dG=a*Q_comp+b*LJ_comp+g*Q_site+d*LJ_site+e*T*Entropy_loss, {the fit LIE model is a=-0.059274, b=0.343518, g=0.012304, d=0.110224, e=-0.00003, q_site_0=+6418.28KJ/mol l_site_0=-1458.75/KJ/mol}, for weights calculations dG was also multiplied by factor 0.02 which doesn't have good theoretical justification but known to work considerably better in practice [our experience plus for example ref Almlof et al, 2007].
After LIE model was developed, an iterative training was applied to obtain neuron network (NN) that can predict poses_after_MD from poses_after_docking at the input (see poses generation protocol for details). As the training set were used compounds 1-36 (see poses generation for details). For the rest of compounds the trained NN made speculations about atomic densities voxel maps (see scriptx py_box_gro.py in SuppInfo) of the complex after MD and the most similar (see py_box_gro.py in SuppInfo) pose to the know binders was studied with LIE. Normally AI is also trained to estimate LIE parameters from a docking pose and the speculated evolution of a molecular complex, but the option is rather for big data sets and makes no sense for a set of only 102 compounds.
The selected by NN poses were the starting geometry for 37-102 compounds. The LIE protocol was applied and energies of binding were computed using fitted LIE model above. They are converted to kcal units and enclosed in this archive.

l8rmr-PosePredictionProtocol.txt

Name

AI_MD

Software

OpenBABEL-2.3.2
AutoDock Vina-1.1.2
GROMACS-4.5.5
y_tb-0.2.3 (in house)
y_eye-0.1.0 (in house)

System Preparation Parameters

please see supplementary for Autodock vina.cfgs
please see supplementary for GROMACS mdps
dodecahedron shaped periodic box with ~12.5K TIP3P explicit waters and 0.1M NaCl at pH=7.0
AMBER-ILDN parameters set for protein atoms
adjusted non-bonded YFF1 with Kirchhoff charges for compounds; bonded part from GAFF

System Preparation Method

Compounds were converted to SDF with OpenBABEL and then to PDBQ/GROMACS (gro, itp) with y_tb.
3 receptor structures were taken from PDB (3dct, 3oof and 3bej), non-protein molecules are cut, structures aligned so that positive direction of z-axis corresponded to (un)binding reaction coordinate and 22 x 22 x 30 A binding site was defined manually. The protein was threat as rigid during docking stage, and consecutive simulations of molecular dynamic (MD) addressed the flexibility problem.

Pose Prediction Parameters

For each of the 3 selected conformations of FXR protein up to 20 poses were generated with Autodock Vina (up to 60 totally).
5-'colours' atomic density voxel maps of size 21x21x30 cube unit (edge=1A) centered at the docking grid center were calculated for each docking pose (for details see py_box_gro.py in SuppInfo).
AI part is convolution neuron network (NN) with 9 layers and 503088 neurons.
For each complex, 8 parallel MD runs from the same starting point (docked pose) but different random seeds for velocities generation were carried out. 12 ns evolution runs followed by 4 ns production runs with detailed statistics collection were carried out with GROMACS.
The productive run collected statistics about 5 LIE parameters: Coulomb and Lennard-Jones interactions of compound, the site (residues, numeration from pdb3dct, 265, 270, 273, 284, 286, 287, 288, 290, 291, 294, 325, 328, 329, 331, 332, 335, 336, 352, 357, 365, 369, 384, 447, 451, 454, 461 and 469), rest_of_enviroment and coordinates to calculate compound entropy loss (Quasyharmonic approximation). For details of LIE model please check LigandScoringPotocol.txt.
The final binding pose is created by continuation of the best LIE-scored trajectory per compound 1-36 with (linear) simulated annealing from T=310K to T=77K (boiling of N2) in 0.3 ns.

Pose Prediction Method

Simulations of molecular dynamics (MD) are preferred for dealing with flexible targets like FXR.
Because it is expensive, the simulations are used for only a small affordable subset of candidates (here it is compounds 1-36 which anyway require extra treatment) and then a neuron network (NN) is trained to predict evolution of the complexes from a single docked pose using the simulated subset as the training set.
After NN is trained, it is used to rapidly predict MD evolution of numerous docked poses for each compound of even a huge set and only the most promising (poses of) candidates from entire set are studied with real simulations.
Additionally AI can be trained to estimate linear interaction energies (LIE) parameters from a docking pose and the predicted evolution of molecular complex, but this option was not used in this protocol.
So we docked each compound into every map with up to 20 replicas saved per compound. Among 3x20 docked poses the highest scored and the most 'similar' (smallest norm of differences between atomic density voxel maps of the binding site, for details see py_box_gro.py and py_box_compare.py in SuppInfo) conformation were simulated.
Then we trained NN - as the input structure was used a voxel map of docked compound and as the desired NN output was used a voxel map of the top scored (by LIE) structures after MD simulation. The NN was trained with y_eye using combination of Genetic global optimizer to shuffle convolution feature maps with crossingovers and then fit models with Conjugate gradient local optimizer. After NN reached its limit in predicting of voxel_maps_after_MD from voxel_maps_after_docking for the training set, it was used to speculate after_MD_voxel_maps for each docked pose of each compound. The speculated voxel maps after MD were checked for similarity with results of LIE training set (consist of known binders, see scoring protocol for details). This iterative process was looping till convergence i.e. when NN trained on entire set of MD outcomes can't speculate a new candidate from docked pool that was not analyzed with MD previously; it actually converges almost instantly - after two iterations.
After the convergence we selected the trajectory with highest estimated LIE energy and cooled it down to obtain a good pose from the most promising ensemble. The 1-36 frozen structures are enclosed.
For compounds 37-102, the initial pose was selected by trained NN among (up to) 60 docked conformations considering similarity of predicted after_MD_voxel_maps and actual voxel maps of the know binders. Because for scoring we don't need a structure but (equilibrium) ensemble, for compound 37-102 we present the selected initial pose for generative LIE protocol.