Procedures are the same as they were in Grand Challenge 2, except that "yes/no" question(s) have been added to the protocol files, and there are some new fields in FreeEnergyProtocol.txt.
For Stage 1, you can submit predictions for protein-ligand poses, protein-ligand affinity scores or rankings, and/or binding free energies for the small free energy (FE) challenge sets. Each prediction must be submitted in the form of a gzipped tar (.tgz) file. The D3R website will provide separate upload options for files of the following four types:
Participants will select one of the above options with radio buttons.
If you make multiple predictions by different methods, such as by docking with several different energy functions, each prediction set must be in its own separate .tgz file.
If you are predicting both a set of poses and a set of scores based on the poses, you must submit two files: a Pose tgz file and a separate structure-based Score file containing the poses used along with the scores, even though the poses used for scoring are the same as some of those in the Pose tgz file. This is to simplify our recording and tracking of predictions. We will separately evaluate the poses in the Pose tgz and the scores in the Score tgz. Although the poses in the Score tgz will not be evaluated, we ask that you provide them, because they may be useful in explaining the quality of the scoring predictions.
New to GC3 is the addition of some Yes/No questions to the Protocol files and some new fields in FreeEnergyProtocol.txt.
The following section details the required contents and format of these files. Additionally, blank template files for the three components of this challenge are available for download, as are examples of completed gc3_cats_pose.tgz, gc3_cats_scoreligand.tgz, gc3_cats_scorestructure.tgz, and gc3_cats_freeenergy.tgz files. Note that the example files contain artificial information, and thus serve only to illustrate the required contents and formats of a submission.
template files || pose example files || structure based score example files || ligand based score example files || free energy example files
Submissions that do not adhere to these requirements are expected to be rejected by our submission system; we recommend that you leave time before the deadline to correct any technical errors that arise during the submission process. If a file with technical errors does pass the automated validation step, we will do our best to interpret the submission and may contact you for help with this. However, if a file proves particularly problematic, it may be necessary to omit the submission from evaluation.
The information required for each type of tgz file is summarized in Figure 1 and detailed in the following subsections. Note that every tgz type requires a user information file (UserInfo.txt). Because this file is common to all tgz types, it is not addressed in the descriptions of each tgz type, but is instead detailed separately (see The User Information File, below), as is the general procedure for generating the tar files (see How to make your Pose, Score and Free Energy tar files, below). In addition, we now encourage you to submit a Supplementary Information directory as part of each Pose, Score or Free Energy tgz file; this may contain any input, data, script or other files that would help us to interpret and/or reproduce your results. Potential contents of a Supplementary Information directory are listed in a subsection below.
A Pose tgz file is used to submit one set of predicted protein-ligand poses, where up to five poses are permitted for each protein-ligand pair. Each Pose tgz file must contain a single docking protocol file and, for each ligand, between one and five protein structure PDB files with the corresponding ligand MDL mol file poses predicted by this protocol. A User Information file is required, and a Supplementary Information directory (see below) is optional but encouraged.
The protocol file is named PosePredictionProtocol.txt, and it contains a brief, structured summary, in the form of a plain-text document, of your pose prediction methods. Lines beginning with a hash-tag (#) may be included as comments. The file must contain the following components, as illustrated in the template and example:
Each item must begin with the appropriate keyword; respectively:
Each pose prediction must be provided in the form of a protein structure PDB file and a corresponding ligand MDL mol file (see, e.g., en.wikipedia.org/wiki/Chemical_table_file) with 3D atomic coordinates for the pose, where the coordinates in the protein PDB file and the ligand molfile are in the same frame of reference. Any ligand coordinates provided in PDB format or included in the protein PDB files will be ignored. You may treat the protein as rigid or flexible, but you must rotationally and translationally superimpose all of your final structure predictions onto the reference protein structure provided in the challenge data package, in order to facilitate evaluation of your predictions. We are asking for molfiles to prevent problems with the parsing of ligand coordinates in PDB format, which arose in Grand Challenge 2015.
The file names of your pose prediction protein PDB and ligand mol files must be constructed as follows:
<PDB ID of initial protein structure>-<LigandID>-<poseRank>.pdb
<PDB ID of initial protein structure>-<LigandID>-<poseRank>.mol
Here <PDB ID of initial protein structure> is the PDB ID of the structure that you docked the ligand into; for example, it might be TJYG. LigandID is the identifier of the ligand for this challenge; for example, it might be CatS_1. And poseRank is the rank of this pose among the poses you predicted for this ligand, where 1 is best and 5 is worst; if you predicted only one pose, assign it a poseRank of 1. Thus, your second-ranked pose prediction for ligand CatS_1, generated by docking into structure TJYG, would be contained in the following two files:
TJYG-CatS_1-2.mol
TJYG-CatS_1-2.pdb
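The naming rule above can be sketched in a few lines of Python. This is only an illustration: the helper name pose_file_names is hypothetical, and the range check assumes the one-to-five pose limit stated for Pose tgz files.

```python
from typing import Tuple


def pose_file_names(pdb_id: str, ligand_id: str, pose_rank: int) -> Tuple[str, str]:
    """Return (protein PDB file name, ligand mol file name) for one pose.

    Hypothetical helper, not part of any D3R tooling.
    """
    # Pose ranks run from 1 (best) to 5 (worst).
    if not 1 <= pose_rank <= 5:
        raise ValueError("poseRank must be between 1 and 5")
    stem = f"{pdb_id}-{ligand_id}-{pose_rank}"
    return f"{stem}.pdb", f"{stem}.mol"


print(pose_file_names("TJYG", "CatS_1", 2))
# prints ('TJYG-CatS_1-2.pdb', 'TJYG-CatS_1-2.mol')
```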
Additionally, if you submit multiple poses for a ligand, then the first line of each molfile must take the form
REMARK <energy/score> <value>
For example, this line might be
REMARK energy -20.6
or
REMARK score 5.7
Energies must be in kcal/mol; scores may be in arbitrary units.
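As a rough sketch, the REMARK line could be produced and written into a molfile as shown here. Both function names are hypothetical. The second function assumes that the REMARK line replaces the existing first (title) line of the molfile, so that the rest of the header is not shifted; please check this assumption against the sample mol files.

```python
def remark_line(kind: str, value: float) -> str:
    """Format the first line of a molfile, e.g. 'REMARK energy -20.6'."""
    if kind not in ("energy", "score"):
        raise ValueError("kind must be 'energy' or 'score'")
    return f"REMARK {kind} {value}"


def set_first_line(molfile_text: str, kind: str, value: float) -> str:
    """Overwrite the first line of a molfile with the REMARK line.

    Assumption: the REMARK line takes the place of the molfile title line.
    """
    lines = molfile_text.split("\n")
    lines[0] = remark_line(kind, value)
    return "\n".join(lines)


print(remark_line("energy", -20.6))  # prints REMARK energy -20.6
print(remark_line("score", 5.7))     # prints REMARK score 5.7
```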
Sample pose prediction protein PDB files and ligand mol files are included with these instructions, as are example and template files for PosePredictionProtocol.txt.
A structure-based Score tgz file is used to submit one set of ligand scores or rankings generated by a structure-based method, along with information on how the predictions were made, the set of poses used, and the protocol used to generate the poses. A User Information file is required, and a Supplementary Information directory (see below) is optional but encouraged.
The pose prediction protocol and the predicted poses should be provided in the same manner as in a Pose tgz file (above), except that only one pose (not five) is permitted for each protein-ligand pair, and this should be the pose used to generate the score or ranking of the ligand.
Each ligand scoring and ranking is described by two files: a ligand scoring protocol file, and a ligand scoring results comma-separated value (CSV) file, named LigandScoringProtocol.txt and LigandScores.csv respectively. The results file contains the scores and corresponding rankings generated by the protocol.
The ligand scoring protocol file must contain a brief, structured summary, in the form of a plain-text document, of how you scored the ligands according to predicted affinity for the target protein. A template file is provided for your convenience, as is a sample filled-out file. Lines beginning with a hash-tag (#) may be included as comments. The file must contain the following components, as illustrated in the template and example:
Each item must begin with the appropriate keyword; respectively:
The ligand scoring results file lists your rankings and scores or energies of the binding strengths of the ligands to the target protein. Again, lines beginning with # will be treated as comments. Since some scoring methods provide results interpretable as binding energies or free energies, while others provide scores without well-defined units, the first non-comment line of your file must state whether you are providing energies or scores. This line must take one of the following forms:
Type: energy
or
Type: score
If your results are given as energies, they must be in kcal/mol.
Each subsequent non-comment line of the file comprises three items: the identifier of one ligand; your ranking of that ligand within the set, where 1 corresponds to maximal affinity; and your computed binding energy, free energy, or score. These three items should be separated by commas.
The scoring file must contain a line for every ligand in the challenge. If you have not entered a prediction for a ligand, the corresponding line should have a placeholder: "inact" for compounds you identified as inactive, or "nopred" if you are not supplying a prediction for the compound for any reason. An example line with "inact" would be: CatS_100,inact,
Please refer to the template and example files; note that the template files are prefilled with the list of ligand identifiers, for your convenience.
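The layout of the results file can be sketched as follows, for the case of energies. The function name and the ligand values are hypothetical. The sketch assumes that a more negative energy means stronger binding (rank 1), and that a "nopred" line mirrors the "inact" example, with the placeholder in the ranking field; the template file remains the authority on the exact layout.

```python
def ligand_scores_csv(ligand_ids, energies, inactive=()):
    """Build the text of a LigandScores.csv file of type 'energy'.

    ligand_ids: every ligand in the challenge, in template order.
    energies:   dict mapping ligand id -> predicted energy in kcal/mol.
    inactive:   ligand ids judged inactive (no energy supplied).
    """
    # Assumption: the most negative energy is the strongest binder (rank 1).
    ranked = sorted(energies, key=energies.get)
    rank_of = {lig: i + 1 for i, lig in enumerate(ranked)}

    lines = ["# illustrative data only", "Type: energy"]
    for lig in ligand_ids:
        if lig in energies:
            lines.append(f"{lig},{rank_of[lig]},{energies[lig]}")
        elif lig in inactive:
            lines.append(f"{lig},inact,")
        else:
            # Assumed to follow the same pattern as the 'inact' example.
            lines.append(f"{lig},nopred,")
    return "\n".join(lines) + "\n"


text = ligand_scores_csv(
    ["CatS_1", "CatS_2", "CatS_3", "CatS_4"],
    {"CatS_1": -7.2, "CatS_3": -9.1},
    inactive={"CatS_4"},
)
print(text)
```

Every ligand identifier receives exactly one line, so the file stays complete even when only some compounds were scored.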
A ligand-based Score tgz file is used to submit one set of ligand scores or rankings generated by a method that does not use ligand-protein poses, along with information on how the predictions were made. It has the same contents as a structure-based Score file (above), except that it does not include the poses and pose-prediction protocol files.
A Free Energy tgz file is used to submit predictions of absolute or relative binding free energies. Note that you are free to submit both ligand scores for the full set of compounds (above) and free energy calculations. A Free Energy tgz file should contain free energy predictions in a CSV file, FreeEnergies.csv; and a protocol file, FreeEnergyProtocol.txt, explaining the methodology. If your calculations involved pose predictions by a docking method, then the Free Energy tgz file should also include these pose predictions. A User Information file is required, and a Supplementary Information directory (see below) is optional but encouraged.
The FreeEnergyProtocol.txt file must contain a brief, structured summary, in the form of a plain-text document, of how you computed the binding free energies of the ligands for the target protein. A template file is provided for your convenience, as is a sample filled-out file. Lines beginning with a hash-tag (#) may be included as comments. The file must contain the following components, as illustrated in the template and example.
Each item must begin with the appropriate keyword; respectively:
The FreeEnergies.csv file follows a format similar to that used for ligand scoring. Each line should list one ligand, followed by the predicted binding free energy (kcal/mol), and then by your estimate of the numerical uncertainty (standard error of the mean) in the prediction due to limitations in sampling. If you computed relative binding free energies, the results should all be referenced to the first ligand listed (see template), which hence should be assigned a free energy of 0.0 with zero uncertainty.
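For relative binding free energies, the rows might be assembled as in the sketch below. The function name and the numerical values are hypothetical, and the sketch assumes comma-separated fields with no extra header line; please check the template for the exact layout.

```python
def free_energies_csv(reference, relative):
    """Build FreeEnergies.csv text for relative binding free energies.

    reference: id of the ligand that all results are referenced to.
    relative:  dict mapping ligand id -> (free energy in kcal/mol,
               standard error of the mean), both relative to `reference`.
    """
    # The reference ligand comes first, with 0.0 free energy and zero uncertainty.
    lines = [f"{reference},0.0,0.0"]
    for lig, (ddg, sem) in relative.items():
        if lig == reference:
            continue  # never emit the reference ligand twice
        lines.append(f"{lig},{ddg},{sem}")
    return "\n".join(lines) + "\n"


print(free_energies_csv("CatS_1", {"CatS_2": (-1.3, 0.2), "CatS_3": (0.8, 0.3)}))
```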
If your calculations relied on predicted poses, please provide these, along with the protocol used, as specified for the structure-based Score tgz file (above). Note that only one pose is permitted for each protein-ligand pair.
As noted above, every tgz file must include a User Information file. This is a text file named UserInfo.txt that contains five lines of text, as follows:
We are asking for this file to maximize clarity regarding the associations between submissions and submitters and research groups.
Every tgz file can optionally contain a Supplementary Information directory (folder), called SuppInfo, containing added files that would help interpret and reproduce your results. Examples of files you might provide include:
The files may have any names you like. To include this directory in your .tgz file, just include a SuppInfo directory with your files within the directory you tar up.
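As a rough illustration of a SuppInfo directory being picked up when its parent directory is tarred, the following Python sketch uses the standard tarfile module. The function name and the directory and archive names in the usage note are hypothetical, and the organizers' own procedure for generating the tar files remains authoritative.

```python
import tarfile
from pathlib import Path


def make_tgz(submission_dir, tgz_path):
    """Pack a submission directory into a gzipped tar and list its members.

    Hypothetical helper; any SuppInfo subdirectory inside submission_dir
    is included automatically because directories are added recursively.
    """
    submission_dir = Path(submission_dir)
    with tarfile.open(tgz_path, "w:gz") as tar:
        # arcname keeps only the directory's own name inside the archive.
        tar.add(str(submission_dir), arcname=submission_dir.name)
    # Re-open the archive to confirm what was actually written.
    with tarfile.open(tgz_path, "r:gz") as tar:
        return tar.getnames()
```

For example, calling make_tgz("my_submission", "my_submission.tgz") on a directory that holds UserInfo.txt and SuppInfo/notes.txt would return a list that includes my_submission/SuppInfo/notes.txt.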
In order to enable automated processing of all submissions, we ask that you generate your Pose, Score and Free Energy tar files as follows: