Please note that a number of procedures have been changed, relative to prior challenges, to facilitate the processing and evaluation of predictions.
For Stage 2, you can submit predictions for protein-ligand affinity scores or rankings, and/or binding free energies for the small free energy (FE) challenge sets. Each prediction must be submitted in the form of a gzipped tar (.tgz) file. The D3R website will provide separate upload options for files of the following four types:
Participants will select one of the above options with radio buttons on the D3R website.
If you make multiple predictions by different methods, such as by docking with several different energy functions, each prediction set must be in its own separate .tgz file.
For structure-based affinity and free energy predictions, the poses used must be provided along with the scores, even if the poses used for scoring are the same as the Stage 1 released structures. This is to simplify our recording and tracking of predictions. Although the poses in the Score tgz will not be evaluated, we ask that you provide them because they may be useful in explaining the quality of the scoring predictions.
SMILES string for ligand FXR_33: You may have noticed that the SMILES string for ligand FXR_33 does not match the ligand in the corresponding crystal structure. Roche scientists have explained to us that the ligand used to grow the crystal was in fact the N-oxide, but the pyridine appeared in the crystal structure instead. They attribute this either to oxidation occurring during the several days required for the crystal to grow, or possibly to the pyridine being present as an impurity (1% or less). We are considering whether to evaluate scoring of this ligand.
Also new in GC2, the names of the Score and Free Energy tgz files are now free-form; they are not required to include any particular text.
The following section details the required contents and format of these files. Additionally, blank template files for the two components of this challenge are available for download, as are examples of completed Score.tgz and Free_Energy.tgz files. Note that the example files contain artificial information, and thus serve only to illustrate the required contents and formats of a submission.
The information required for each type of tgz file is summarized in Figure 1 and detailed in the following subsections. Note that every tgz type requires a user information file (UserInfo.txt). Because this file is common to all tgz types, it is not addressed in the descriptions of each tgz type but is instead detailed separately (see The User Information File, below), as is the general procedure for generating the tar files (see How to make your Score and Free Energy tar files, below). In addition, we now encourage you to submit a Supplementary Information directory as part of each Score or Free Energy tgz file; this may contain any input, data, script or other files that would help us interpret and/or reproduce your results. Potential contents of a Supplementary Information directory are listed in a subsection below.
A final subsection below touches on some of the common validation errors we have seen, and how to avoid them.
A structure-based Score tgz file is used to submit one set of ligand scores or rankings generated by a structure-based method, along with information on how the predictions were made, the set of poses used, and the protocol used to generate the poses. A User Information file is required, and a Supplementary Information directory (see below) is optional but encouraged.
Each set of ligand scores or rankings is described by two files: a ligand scoring protocol file, named LigandScoringProtocol.txt, and a ligand scoring results comma-separated value (CSV) file, named LigandScore.csv. The results file contains the scores and corresponding rankings generated by the protocol.
The ligand scoring protocol file must contain a brief, structured summary, in the form of a plain-text document, of how you scored the ligands according to predicted affinity for the target protein. A template file is provided for your convenience, as is a completed example file. Lines beginning with a hash-tag (#) may be included as comments. The file must contain the following components, as illustrated in the template and example:
Each item must begin with the appropriate keyword; respectively:
The ligand scoring results file lists your rankings of the ligands by binding strength to the target protein, together with the corresponding scores or energies. Again, lines beginning with # will be treated as comments. Because some scoring methods provide results interpretable as binding energies or free energies, while others provide scores without well-defined units, the first non-comment line of your file must specify whether you are providing energies or scores. This line must take one of the following forms:
If your results are given as energies, they must be in kcal/mol.
Each subsequent non-comment line of the file comprises the identifier of one ligand; your ranking of the ligand within the set, where 1 corresponds to maximal affinity; and your computed binding energy, free energy, or score. These three items should be separated by commas.
The scoring file must contain a line for every ligand in the challenge. If you have not entered a prediction for a ligand, the corresponding line should have a placeholder: "inact" for compounds you identified as inactive, or "nopred" if you are not supplying a prediction for the compound for any reason.
Please refer to the template and example files; note that the template files are prefilled with the list of ligand identifiers, for your convenience.
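As an illustration of the results-file rules above (rank 1 = maximal affinity, one line per ligand, placeholders for missing predictions), here is a minimal Python sketch that builds the non-comment lines of a LigandScore.csv. The helper name ligand_score_lines is hypothetical. The exact form of the energies/scores declaration line is not reproduced here, so the sketch takes it verbatim as type_line; copy it from the template. It also assumes that a placeholder line consists of the ligand ID followed by "inact" or "nopred" in place of the rank and value; please check the template for the exact placeholder layout.

```python
from typing import Dict, Iterable, List, Set


def ligand_score_lines(type_line: str,
                       ligand_ids: List[str],
                       values: Dict[str, float],
                       lower_is_better: bool = True,
                       inactive: Iterable[str] = ()) -> List[str]:
    """Build the non-comment lines of a LigandScore.csv file (hypothetical helper).

    type_line:       first non-comment line declaring energies or scores,
                     copied verbatim from the template (not reproduced here).
    ligand_ids:      every ligand in the challenge, in template order.
    values:          ligand ID -> computed energy (kcal/mol) or score.
    lower_is_better: True for energies; set False if a higher score means
                     stronger predicted binding for your method.
    inactive:        ligands you identified as inactive.
    """
    unknown = set(values) - set(ligand_ids)
    if unknown:
        raise ValueError(f"unknown ligand IDs: {sorted(unknown)}")
    inactive_set: Set[str] = set(inactive)

    # Rank 1 = maximal predicted affinity; ties keep the order of `values`.
    ordered = sorted(values, key=values.__getitem__, reverse=not lower_is_better)
    rank = {lig: i + 1 for i, lig in enumerate(ordered)}

    lines = [type_line]
    for lig in ligand_ids:  # one line for every ligand, in template order
        if lig in rank:
            lines.append(f"{lig},{rank[lig]},{values[lig]:.3f}")
        elif lig in inactive_set:
            # ASSUMPTION: the placeholder replaces the rank and value fields.
            lines.append(f"{lig},inact")
        else:
            lines.append(f"{lig},nopred")
    return lines
```

Joining the returned list with newlines (after any # comment lines you want to keep) gives the file body. Ranks are computed only over ligands with numeric predictions, so placeholder ligands never consume a rank.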
The pose prediction protocol and the predicted poses should be provided. Only one pose is permitted for each protein-ligand pair, and it should be the pose used to generate the ligand's score or ranking. If you used multiple poses to generate ligand scores, please submit only one pose per ligand in this tgz file, and provide the rest of the poses in the Supplementary Information directory (see below).
The protocol file is named PosePredictionProtocol.txt and contains a brief, structured summary, in the form of a plain-text document, of your pose prediction methods. Lines beginning with a hash-tag (#) may be included as comments. The file must contain the following components, as illustrated in the template and example:
Each item must begin with the appropriate keyword; respectively:
Each pose prediction must be provided in the form of a protein structure PDB file and a corresponding ligand V2000 MDL mol file (see, e.g., http://en.wikipedia.org/wiki/Chemical_table_file) with 3D atomic coordinates for the pose, where the coordinates in the protein PDB file and the ligand molfile are in the same frame of reference. Any ligand coordinates provided in PDB format or included in the protein PDB files will be ignored. You may treat the protein as rigid or flexible, but you must rotationally and translationally superimpose all of your final structure predictions onto the reference protein structure provided in the challenge data package in order to facilitate evaluation of your predictions. We request molfiles to prevent problems with the parsing of ligand coordinates in PDB format, which arose in Grand Challenge 2015.
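Because ligand coordinates are read only from the V2000 molfile, it can help to check each ligand file before packing it. The following Python sketch (the function name read_v2000_coords is hypothetical) reads the atom block using the standard V2000 fixed-width layout: the counts line is line 4 and carries the V2000 version tag, and each atom line stores x, y and z in 10-character fields followed by the element symbol. It raises an error if the version tag is missing or if every z coordinate is zero, which suggests a 2D rather than 3D file. It does not check that the ligand and protein share a frame of reference.

```python
from typing import List, Tuple


def read_v2000_coords(molfile_text: str) -> List[Tuple[str, float, float, float]]:
    """Return (element, x, y, z) for each atom of a V2000 molfile, checking it is 3D."""
    lines = molfile_text.splitlines()
    if len(lines) < 4:
        raise ValueError("molfile too short: missing header block or counts line")

    counts = lines[3]  # line 4 is the counts line; it ends with the version tag
    if "V2000" not in counts:
        raise ValueError("counts line does not declare V2000")
    n_atoms = int(counts[0:3])  # first 3-character field: number of atoms

    atoms = []
    for line in lines[4:4 + n_atoms]:
        # Atom block: x, y, z in fixed 10-character fields, a space, then the symbol.
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))

    if len(atoms) != n_atoms:
        raise ValueError("atom block is shorter than the declared atom count")
    if all(z == 0.0 for _, _, _, z in atoms):
        raise ValueError("all z coordinates are zero; file looks 2D, not 3D")
    return atoms
```

The fixed-column parsing is deliberate: splitting on whitespace usually works, but column positions are what the format defines.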
The file names of your pose prediction protein PDB and ligand mol files must be constructed as follows:
<PDB ID of initial protein structure>-<LigandID>.pdb
<PDB ID of initial protein structure>-<LigandID>.mol
Here <PDB ID of initial protein structure> is the PDB ID of the structure into which you docked the ligand; for example, it might be 3OMM. LigandID is the identifier of the ligand for this challenge; for example, it might be FXR_1. Thus, your pose prediction for ligand FXR_1, generated by docking into structure 3OMM, would be contained in the following two files:
3OMM-FXR_1.pdb
3OMM-FXR_1.mol
If you used a Stage 1 crystal structure, such as 1HGAF.pdb, please leave the initial "1" off when constructing these filenames; e.g., HGAF-FXR_1.pdb and HGAF-FXR_1.mol.
Sample pose prediction protein PDB files and ligand mol files are included with these instructions, as are example and template files for PosePredictionProtocol.txt.
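The naming rules above, together with the one-pose-per-ligand rule, can be sketched in Python as follows. Both function names are hypothetical. pose_filenames builds the two file names, using an explicit stage1 flag to drop the leading "1" rather than guessing from the name. pose_problems scans a list of file names and reports any .pdb without a matching .mol (or vice versa) and any ligand with more than one pose. It assumes that neither structure IDs nor ligand IDs contain a hyphen, so the first "-" separates the two.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def pose_filenames(structure: str, ligand_id: str, stage1: bool = False) -> Tuple[str, str]:
    """Return the (protein .pdb, ligand .mol) file names for one pose."""
    if structure.lower().endswith(".pdb"):
        structure = structure[:-4]
    if stage1:
        # Stage 1 structures such as 1HGAF.pdb: drop the leading "1".
        if not structure.startswith("1"):
            raise ValueError(f"expected a Stage 1 name starting with '1': {structure}")
        structure = structure[1:]
    stem = f"{structure}-{ligand_id}"
    return f"{stem}.pdb", f"{stem}.mol"


def pose_problems(filenames: Iterable[str]) -> List[str]:
    """List unpaired pose files and ligands with more than one pose."""
    exts_by_stem = defaultdict(set)
    for name in filenames:
        stem, _, ext = name.rpartition(".")
        if ext in ("pdb", "mol") and "-" in stem:  # ignore protocol and other files
            exts_by_stem[stem].add(ext)

    problems = []
    stems_by_ligand = defaultdict(list)
    for stem in sorted(exts_by_stem):
        exts = exts_by_stem[stem]
        if exts != {"pdb", "mol"}:
            missing = "mol" if "pdb" in exts else "pdb"
            problems.append(f"{stem}: missing .{missing}")
        stems_by_ligand[stem.split("-", 1)[1]].append(stem)

    for ligand in sorted(stems_by_ligand):
        stems = stems_by_ligand[ligand]
        if len(stems) > 1:
            problems.append(f"{ligand}: {len(stems)} poses ({', '.join(stems)})")
    return problems
```

For example, pose_filenames("1HGAF.pdb", "FXR_1", stage1=True) returns ("HGAF-FXR_1.pdb", "HGAF-FXR_1.mol"). An empty list from pose_problems means every ligand has exactly one paired pose; extra poses belong in SuppInfo instead.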
A ligand-based Score tgz file is used to submit one set of ligand scores or rankings generated by a method that does not use ligand-protein poses, along with information on how the predictions were made. It has the same contents as a structure-based Score file (above), except that it does not include the poses and pose-prediction protocol files.
A Free Energy tgz file is used to submit predictions of absolute or relative binding free energies for one of the free energy sets in the challenge. If you ran calculations for both free energy sets, you should still submit them separately. Note that you are free to submit both ligand scores for the full set of compounds (above) and free energy calculations for these two smaller sets. A Free Energy tgz file should contain free energy predictions in a CSV file, FreeEnergies.csv, and a protocol file, FreeEnergyProtocol.txt, explaining the methodology. If your calculations involved pose predictions by a docking method, the Free Energy tgz file should also include these pose predictions. A User Information file is required, and a Supplementary Information directory (see below) is optional but encouraged.
The FreeEnergyProtocol.txt file must contain a brief, structured summary, in the form of a plain-text document, of how you computed the binding free energies of the ligands for the target protein. A template file is provided for your convenience, as is a sample filled-out file. Lines beginning with a hash-tag (#) may be included as comments. The file must contain the following components, as illustrated in the template and example:
The FreeEnergies.csv file follows a format similar to that used for ligand scoring. Each line should list one ligand, followed by the predicted binding free energy (kcal/mol), and then by your estimate of the numerical uncertainty (standard error of the mean) in the prediction due to limitations in sampling. If you computed relative binding free energies, the results should all be referenced to the first ligand listed (see template), which hence should be assigned a free energy of 0.0 with zero uncertainty.
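A minimal Python sketch of the FreeEnergies.csv rows described above: ligand ID, predicted free energy (kcal/mol), and standard error of the mean, comma-separated. The helper name free_energy_lines, the absence of a header line, and the two-decimal formatting are assumptions; follow the template for the exact layout and ligand order. For relative free energies, the sketch enforces that the first ligand listed is the reference, with 0.0 free energy and 0.0 uncertainty.

```python
from typing import List, Sequence, Tuple


def free_energy_lines(predictions: Sequence[Tuple[str, float, float]],
                      relative: bool) -> List[str]:
    """Format (ligand_id, dG, SEM) tuples, in kcal/mol, as 'LigandID,dG,SEM' rows."""
    if not predictions:
        raise ValueError("no predictions given")
    if relative:
        ref_id, dg0, sem0 = predictions[0]
        # Relative results are referenced to the first ligand listed.
        if dg0 != 0.0 or sem0 != 0.0:
            raise ValueError(f"reference ligand {ref_id} must have 0.0 free energy "
                             "and 0.0 uncertainty")
    lines = []
    for ligand_id, dg, sem in predictions:
        if sem < 0:
            raise ValueError(f"negative uncertainty for {ligand_id}")
        lines.append(f"{ligand_id},{dg:.2f},{sem:.2f}")
    return lines
```

Raising an error, rather than silently shifting all values so that the first ligand becomes zero, is a deliberate choice: re-referencing also changes the uncertainties, which is better handled in your own analysis.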
If your calculations relied on predicted poses, please provide these, along with the protocol used, as specified for the structure-based Score tgz file (above). Note that only one pose is permitted for each protein-ligand pair.
As noted above, every tgz file must include a User Information file. This is a text file named UserInfo.txt that contains five lines of text, as follows:
We ask for this file so that each submission can be clearly associated with its submitter and research group.
Every tgz file can optionally contain a Supplementary Information directory (folder), called SuppInfo, containing additional files that would help us interpret and reproduce your results. Examples of files you might provide include:
The files may have any names you like. To include this directory in your .tgz file, just include a SuppInfo directory with your files within the directory you tar up.
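For illustration only, the following Python sketch packs one prediction set, including an optional SuppInfo directory, into a gzipped tar file using the standard tarfile module. The function name make_submission_tgz is hypothetical, and placing the files at the top level of the archive is an assumption; the official procedure in How to make your Score and Free Energy tar files takes precedence. The sketch also checks that UserInfo.txt exists and has five non-blank lines.

```python
import tarfile
from pathlib import Path


def make_submission_tgz(submission_dir: str, out_path: str) -> None:
    """Pack one prediction set (UserInfo.txt, CSV, protocol, optional SuppInfo/) as .tgz."""
    src = Path(submission_dir).resolve()
    out = Path(out_path).resolve()
    if src in out.parents:
        raise ValueError("write the .tgz outside the directory being packed")

    userinfo = src / "UserInfo.txt"
    if not userinfo.is_file():
        raise FileNotFoundError("every tgz must include UserInfo.txt")
    n_lines = len([ln for ln in userinfo.read_text().splitlines() if ln.strip()])
    if n_lines != 5:
        raise ValueError(f"UserInfo.txt should have five lines of text, found {n_lines}")

    with tarfile.open(str(out), "w:gz") as tar:
        # Add every file and subdirectory (including SuppInfo/) with paths
        # relative to submission_dir, so no absolute paths enter the archive.
        for path in sorted(src.rglob("*")):
            tar.add(str(path), arcname=path.relative_to(src).as_posix(), recursive=False)
```

Because each prediction set must be in its own .tgz file, this function would be called once per method, each time on a separate directory.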
To enable automated processing of all submissions, we ask that you generate your Score and Free Energy tar files as follows: