y_tb-0.2.3 (in house)
y_eye-0.1.0 (in house)
please see supplementary for Autodock vina.cfgs
please see supplementary for GROMACS mdps
dodecahedron shaped periodic box with ~12.5K TIP3P explicit waters and 0.1M NaCl at pH=7.0
AMBER-ILDN parameters set for protein atoms
adjusted non-bonded YFF1 with Kirchhoff charges for compounds; bonded part from GAFF
Compounds were converted to SDF with OpenBABEL and then to PDBQ/GROMACS (gro, itp) with y_tb.
3 receptor structures were taken from PDB (3dct, 3oof and 3bej), non-protein molecules are cut, structures aligned so that positive direction of z-axis corresponded to (un)binding reaction coordinate and 22 x 22 x 30 A binding site was defined manually. The protein was threat as rigid during docking stage, and consecutive simulations of molecular dynamic (MD) addressed the flexibility problem.
For each of the 3 selected conformations of FXR protein up to 20 poses were generated with Autodock Vina (up to 60 totally).
5-'colours' atomic density voxel maps of size 21x21x30 cube unit (edge=1A) centered at the docking grid center were calculated for each docking pose (for details see py_box_gro.py in SuppInfo).
AI part is convolution neuron network (NN) with 9 layers and 503088 neurons.
For each complex, 8 parallel MD runs from the same starting point (docked pose) but different random seeds for velocities generation were carried out. 12 ns evolution runs followed by 4 ns production runs with detailed statistics collection were carried out with GROMACS.
The productive run collected statistics about 5 LIE parameters: Coulomb and Lennard-Jones interactions of compound, the site (residues, numeration from pdb3dct, 265, 270, 273, 284, 286, 287, 288, 290, 291, 294, 325, 328, 329, 331, 332, 335, 336, 352, 357, 365, 369, 384, 447, 451, 454, 461 and 469), rest_of_enviroment and coordinates to calculate compound entropy loss (Quasyharmonic approximation). For details of LIE model please check LigandScoringPotocol.txt.
The final binding pose is created by continuation of the best LIE-scored trajectory per compound 1-36 with (linear) simulated annealing from T=310K to T=77K (boiling of N2) in 0.3 ns.
Simulations of molecular dynamics (MD) are preferred for dealing with flexible targets like FXR.
Because it is expensive, the simulations are used for only a small affordable subset of candidates (here it is compounds 1-36 which anyway require extra treatment) and then a neuron network (NN) is trained to predict evolution of the complexes from a single docked pose using the simulated subset as the training set.
After NN is trained, it is used to rapidly predict MD evolution of numerous docked poses for each compound of even a huge set and only the most promising (poses of) candidates from entire set are studied with real simulations.
Additionally AI can be trained to estimate linear interaction energies (LIE) parameters from a docking pose and the predicted evolution of molecular complex, but this option was not used in this protocol.
So we docked each compound into every map with up to 20 replicas saved per compound. Among 3x20 docked poses the highest scored and the most 'similar' (smallest norm of differences between atomic density voxel maps of the binding site, for details see py_box_gro.py and py_box_compare.py in SuppInfo) conformation were simulated.
Then we trained NN - as the input structure was used a voxel map of docked compound and as the desired NN output was used a voxel map of the top scored (by LIE) structures after MD simulation. The NN was trained with y_eye using combination of Genetic global optimizer to shuffle convolution feature maps with crossingovers and then fit models with Conjugate gradient local optimizer. After NN reached its limit in predicting of voxel_maps_after_MD from voxel_maps_after_docking for the training set, it was used to speculate after_MD_voxel_maps for each docked pose of each compound. The speculated voxel maps after MD were checked for similarity with results of LIE training set (consist of known binders, see scoring protocol for details). This iterative process was looping till convergence i.e. when NN trained on entire set of MD outcomes can't speculate a new candidate from docked pool that was not analyzed with MD previously; it actually converges almost instantly - after two iterations.
After the convergence we selected the trajectory with highest estimated LIE energy and cooled it down to obtain a good pose from the most promising ensemble. The 1-36 frozen structures are enclosed.
For compounds 37-102, the initial pose was selected by trained NN among (up to) 60 docked conformations considering similarity of predicted after_MD_voxel_maps and actual voxel maps of the know binders. Because for scoring we don't need a structure but (equilibrium) ensemble, for compound 37-102 we present the selected initial pose for generative LIE protocol.