dnil6-LigandScoringProtocol.txt

Name

In_house_machine_learning_score/Maestro/GOLD

Software

Maestro_Schrodinger Suite_2015-2/CCDC_Gold Suite_2016/Scikit-learn_0.17

Parameters

Use pH values listed in Data_set_fxr_crystallization_conditions.csv
autoscale = 0.2
GBDT(from Scikit-learn):params = {'n_estimators': 20000, 'max_depth': 8, 'min_samples_split': 6, 'learning_rate': 0.005, 'loss': 'ls', 'subsample': 0.7, 'max_features': 'sqrt', 'min_samples_leaf': 3, 'random_state': None}
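
As a minimal sketch, the parameter set above could be passed to scikit-learn's GradientBoostingRegressor as shown here. The helper names adapt_params and build_model are hypothetical and are not part of the submitted protocol. One caveat: the loss name 'ls' is what scikit-learn 0.17 accepts; releases from 1.0 onward call the same least-squares loss 'squared_error', and 'ls' was removed in 1.2.

```python
# Parameter set copied from the protocol (valid as-is for scikit-learn 0.17).
GBDT_PARAMS = {
    'n_estimators': 20000, 'max_depth': 8, 'min_samples_split': 6,
    'learning_rate': 0.005, 'loss': 'ls', 'subsample': 0.7,
    'max_features': 'sqrt', 'min_samples_leaf': 3, 'random_state': None,
}


def adapt_params(params, sklearn_version):
    """Return a copy of params with the loss name matched to the installed release.

    Hypothetical helper: 'ls' is kept for 0.x releases and renamed to
    'squared_error' for 1.0 and later.
    """
    adapted = dict(params)
    major = int(sklearn_version.split('.')[0])
    if major >= 1 and adapted.get('loss') == 'ls':
        adapted['loss'] = 'squared_error'
    return adapted


def build_model(params=GBDT_PARAMS):
    """Build the regressor; scikit-learn is imported only when this is called."""
    import sklearn
    from sklearn.ensemble import GradientBoostingRegressor
    return GradientBoostingRegressor(**adapt_params(params, sklearn.__version__))
```

A model built this way would be trained with model.fit(X_train, y_train) and applied with model.predict(X_test), where the feature matrices are placeholders for the in-house features. Because random_state is None and subsample is below 1, repeated runs will not give identical models.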

Method

Four sets of data were used as the training set. The first set was chosen from the PDBbind database; the protein in each complex is similar to the FXR protein provided in the packet. The second set was chosen from the RCSB Protein Data Bank; the protein in each complex has a binding site similar to that of the FXR protein. The third set was also chosen from the RCSB Protein Data Bank: a SMILES similarity search was performed for all ligands (FXR_1 - FXR_102), and complexes with similar ligands were chosen. The fourth set was generated by combining the PDBbind refined sets (v2007, v2013 and v2015). GOLD Suite was used to generate decoy sets for each complex. A structure- and atom-type-based in-house machine learning protocol was used to generate features for both the training set and the testing set. Gradient boosting tree regression was used to obtain the predicted values.
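
The protocol does not state which fingerprint or similarity metric was used for the SMILES similarity search. As an illustration only, the sketch below assumes a Tanimoto coefficient computed on sets of fingerprint bit indices. The function names, the 0.7 threshold, and the toy bit sets are all hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: treat as no similarity
    return len(a & b) / union


def similar_ligands(query_bits, library, threshold=0.7):
    """Return (name, score) pairs at or above the threshold, best first."""
    hits = []
    for name, bits in library.items():
        score = tanimoto(query_bits, bits)
        if score >= threshold:
            hits.append((name, score))
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits


# Toy data, not taken from the FXR set.
query = {1, 4, 7, 9}
library = {"lig_a": {1, 4, 7, 9}, "lig_b": {1, 4, 7, 12}, "lig_c": {2, 3}}
hits = similar_ligands(query, library)
```

In a real search the bit sets would come from a cheminformatics toolkit applied to each SMILES string, and the threshold would need to be chosen for the fingerprint in use.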

dnil6-PosePredictionProtocol.txt

Name

Maestro/Gold/Tscore

Software

Maestro Schrodinger Suite_2015-2/CCDC Gold Suite 2016

System Preparation Parameters

(prepwizard) -propka_pH # Use pH values listed in Data_set_fxr_crystallization_conditions.csv
(prepwizard) -fillsidechains -s # fill missing side chains of the target protein
(ligprep) -adjust_itc -ph # pH given in Data_set_fxr_crystallization_conditions.csv

System Preparation Method

Maestro's prepwizard was used to optimize the protein at the listed pH value with the fillsidechains option enabled.
Maestro's ligprep was used to generate optimized 3D structures of the ligands from their 2D structures. A sample ligand was manually placed into the binding pocket as the reference ligand, and the position of this reference ligand was used to define the binding site.

Pose Prediction Parameters

autoscale = 0.2
floodfill_center = cavity_from_ligand 8 atoms
gold_fitfunc_path = goldscore # use goldscore as scoring function to generate poses
num_poses = 500 # generate 500 poses as pose_pool for each complex

Pose Prediction Method

Use GOLD to dock each ligand into the target protein and generate 500 poses. After obtaining the pose_pool, rescore it with the in-house machine-learning-based score and choose 5 poses with a minimum pairwise RMSD of 1.5 Å as the candidates.
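
The protocol does not spell out how the 5 candidates are picked. One plausible reading is a greedy filter: walk the rescored poses from best to worst and keep a pose only if its RMSD to every pose already kept is at least 1.5 Å. The sketch below makes that assumption, and also assumes that a higher score is better, that all poses list atoms in the same order, and that no superposition or symmetry correction is needed because the poses share the receptor frame. All names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Plain coordinate RMSD between two poses with identical atom ordering."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must be non-empty and have the same atom count")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def select_diverse_poses(poses, scores, n_select=5, min_rmsd=1.5):
    """Greedily pick up to n_select pose indices, best score first.

    A pose is kept only if it lies at least min_rmsd from every kept pose.
    """
    order = sorted(range(len(poses)), key=lambda i: scores[i], reverse=True)
    selected = []
    for i in order:
        if all(rmsd(poses[i], poses[j]) >= min_rmsd for j in selected):
            selected.append(i)
        if len(selected) == n_select:
            break
    return selected


# Toy pool: single-atom "poses" shifted along x, with made-up scores.
toy_poses = [[(0.0, 0.0, 0.0)], [(0.5, 0.0, 0.0)], [(2.0, 0.0, 0.0)],
             [(3.0, 0.0, 0.0)], [(4.0, 0.0, 0.0)]]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
picked = select_diverse_poses(toy_poses, toy_scores, n_select=3)
```

If the in-house score is defined so that lower is better, the sort direction would need to be reversed. The function can return fewer than n_select indices when the pool contains too few sufficiently distinct poses.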