There are a few changes relative to prior Grand Challenges, as summarized below. Thus, the present instructions supersede those of prior Grand Challenges.
For Grand Challenge 4, you can submit predictions for protein-ligand poses, protein-ligand affinity scores or rankings, and/or binding free energies for the small free energy (FE) challenge sets. The D3R website will provide separate upload options for files of the following four types:
Each subchallenge/stage is different, so please check the subchallenge/stage overview for information about the prediction method options. The overviews will provide links to template and example files. The template files contain protocol files with all the required fields and CSV files with values removed. The example files are complete submissions with arbitrary data that will pass validation.
If you make multiple predictions by different methods, such as by docking with several different energy functions, each prediction set must be in its own separate .tgz file.
If you are predicting both a set of poses and a set of scores based on the poses, you must submit two files: a Pose tgz file and a separate structure-based Score file containing the poses used along with the scores, even though the poses used for scoring are the same as some of those in the Pose tgz file. This is to simplify our recording and tracking of predictions. We will separately evaluate the poses in the Pose tgz and the scores in the Score tgz. Although the poses in the Score tgz will not be evaluated, we ask that you provide them, because they may be useful in explaining the quality of the scoring predictions.
In GC4, we will be handling protocol files in a new way. To facilitate comparisons of well-defined protocols across different components of the challenge, and potentially across challenges, protocol files will no longer be included in tgz files but will instead be uploaded separately. The submission pages will then allow you to associate each tgz file with the appropriate protocol file(s). The format and content of the protocols are unchanged from GC3, however.
The submission process requires two main steps: "Protocol Submission" and "tgz File Submission". These are detailed in the following sections. Submissions that do not adhere to these requirements will be rejected by our submission system; we recommend that you leave time before the deadline to correct any technical errors that result during the submission process. If a file with technical errors does pass the automated validation step, we will do our best to interpret the submission and may contact you for help with this. However, if a file proves particularly problematic, it may be necessary to omit the submission from evaluation.
The Protocols page has two areas, the "Manage Your Protocol Files" area where you can manage already uploaded protocols and the "Upload Your Protocol File" where you can upload new protocols. If you have uploaded a protocol, you will see information about the protocol file(s). Clicking on the protocol file name will pop up a window with the contents of the protocol file.
In the "Upload Your Protocol File" section, you will enter some information before uploading your file. First, click a protocol type radio button. Then, select your protocol file by dragging it into the "Drag file here" area, or find the file by clicking "Add File". Click "Start Upload" to start the file upload and validation process. A successful submission will give the message "No errors", whereas an unsuccessful submission will give error messages below the file upload area.
The template and example files include protocol files. You should rename these to help you remember the file contents.
PosePredictionProtocol.txt contains a brief, structured summary, in the form of a plain-text document, of your pose prediction methods. Lines beginning with a hash-tag (#) may be included as comments. The file must contain the following components, as illustrated in the template and example:
Each item must begin with the appropriate keyword; respectively:
LigandScoringProtocol.txt contains a brief, structured summary, in the form of a plain-text document, of how you scored the ligands according to predicted affinity for the target protein. Lines beginning with a hash-tag (#) may be included as comments. The file must contain the following components, as illustrated in the template and example:
Each item must begin with the appropriate keyword; respectively:
FreeEnergyProtocol.txt contains a brief, structured summary, in the form of a plain-text document, of how you scored the ligands according to predicted affinity for the target protein. Lines beginning with a hash-tag (#) may be included as comments. The file must contain the following components, as illustrated in the template and example.
Each item must begin with the appropriate keyword; respectively:
The Submissions page has a design similar to that of the Protocols page. Successful submissions are listed at the top of the page, and new submissions can be added at the bottom of the page.
In the "Upload Your Data" section, you will enter some information before uploading your file. First, click a prediction category radio button. Clicking the radio button should show one or more protocol file menus, depending on the prediction category. The protocol menu contents are based upon the protocol files uploaded in the Protocols section, as described above. Next, there is a checkbox allowing you to choose whether this submission will be treated as anonymous. Finally, select your tgz submission file by dragging it into the "Drag file here" area, or find the file by clicking "Add File". Click "Start Upload" to start the file upload and validation process. A successful submission will give the message "No errors", whereas an unsuccessful submission will give error messages below the file upload area.
The following section details the required contents and format of these files.
The information required for each type of tgz file is summarized in Figure 1 and detailed in the following subsections. Note that every tgz type requires a user information file (UserInfo.txt). Because this file is common to all tgz types, it is not addressed in the descriptions of each tgz type, but is instead detailed separately (see The User Information File, below), as is the general procedure for generating the tar files (see How to make your Pose, Score and Free Energy tar files, below). In addition, we now encourage you to submit a Supplementary Information directory as part of each Pose, Score or Free Energy tgz file; this may contain any input, data, script or other files that would help us to interpret and/or reproduce your results. Potential contents of a Supplementary Information directory are listed in a subsection below.
A Pose tgz file is used to submit one set of predicted protein-ligand poses, where up to five poses are permitted for each protein-ligand pair. Each Pose tgz file must contain, for each ligand, at least one and at most five protein structure PDB files, each with a corresponding ligand MDL mol file, giving poses predicted by the method described in the separately uploaded pose prediction protocol file. A User Information file is required, and a Supplementary Information directory (see below) is optional but encouraged.
Each pose prediction must be provided in the form of a protein structure PDB file and a corresponding ligand MDL mol file (see, e.g., en.wikipedia.org/wiki/Chemical_table_file) with 3D atomic coordinates for the pose, where the coordinates in the protein PDB file and the ligand molfile are in the same frame of reference. Any ligand coordinates provided in PDB format or included in the protein PDB files will be ignored. You may treat the protein as rigid or flexible, but you must rotationally and translationally superimpose all of your final structure predictions onto the reference protein structure provided in the challenge data package, in order to facilitate preliminary evaluation of your predictions. We are asking for molfiles to prevent problems with the parsing of ligand coordinates in PDB format, which arose in Grand Challenge 2015.
The file names of your pose prediction protein PDB and ligand mol files must be constructed as follows:
<PDB ID of initial protein structure>-<LigandID>-<poseRank>.pdb
<PDB ID of initial protein structure>-<LigandID>-<poseRank>.mol
For instance, when docking to BACE1.pdb and BACE12.pdb, the following format would apply:
Please note the use of 0 in BA01. This allows us to ensure a four-character identifier for each structure. This will apply for the first 9 BACE structures. For Stage 1b submissions, please align to any of the BACE structures the D3R released at the conclusion of Stage 1a.
Here <PDB ID of initial protein structure> is the PDB ID of the structure that you docked the ligand into; for example, it might be TJYG. LigandID is the identifier of the ligand for this challenge; for example, it might be CatS_1. And poseRank is the rank of this pose among the poses you predicted for this ligand, where 1 is best and 5 is worst; if you predicted only one pose, assign it a poseRank of 1. Thus, your second-ranked pose prediction for ligand CatS_1, generated by docking into structure TJYG, would be contained in the following two files:
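The naming rule above can be sketched in a few lines of Python. This is only an illustration, not an official D3R tool: the function name `pose_file_names` is hypothetical, and the range check simply mirrors the stated limit of five poses per ligand.

```python
from typing import Tuple


def pose_file_names(pdb_id: str, ligand_id: str, pose_rank: int) -> Tuple[str, str]:
    """Return the (protein PDB, ligand mol) file names for one predicted pose."""
    # poseRank runs from 1 (best) to 5 (worst), as stated in the instructions.
    if not 1 <= pose_rank <= 5:
        raise ValueError("poseRank must be between 1 and 5")
    stem = f"{pdb_id}-{ligand_id}-{pose_rank}"
    return f"{stem}.pdb", f"{stem}.mol"


# Second-ranked pose for ligand CatS_1 docked into structure TJYG.
print(pose_file_names("TJYG", "CatS_1", 2))
# prints ('TJYG-CatS_1-2.pdb', 'TJYG-CatS_1-2.mol')
```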
Additionally, if you submit multiple poses for a ligand, then the first line of each molfile must take the form
REMARK <energy/score> <value>
For example, this line might be
REMARK energy -20.6
REMARK score 5.7
Energies must be in kcal/mol; scores may be in arbitrary units.
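As a rough aid, the REMARK line described above could be written and checked as follows. This is a sketch under the assumption that the three fields are separated by whitespace, as in the examples; the helper names `remark_line` and `parse_remark_line` are hypothetical.

```python
from typing import Tuple

ALLOWED_KINDS = ("energy", "score")


def remark_line(kind: str, value: float) -> str:
    """Build the first line of a ligand molfile, e.g. 'REMARK energy -20.6'."""
    if kind not in ALLOWED_KINDS:
        raise ValueError("kind must be 'energy' or 'score'")
    return f"REMARK {kind} {value}"


def parse_remark_line(line: str) -> Tuple[str, float]:
    """Split a REMARK line into its kind and numeric value, or raise ValueError."""
    parts = line.split()
    if len(parts) != 3 or parts[0] != "REMARK" or parts[1] not in ALLOWED_KINDS:
        raise ValueError(f"not a valid REMARK line: {line!r}")
    return parts[1], float(parts[2])


print(remark_line("energy", -20.6))
print(parse_remark_line("REMARK score 5.7"))
```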
Sample pose prediction protein PDB files and ligand mol files are included with these instructions, as are example and template files for PosePredictionProtocol.txt.
Historically, our alignment tool has had difficulties with negative residue numbering, so please include only positive integer residue numbers in your PDB files. In addition, nonstandard characters in some submitted PDB files have been difficult to parse, so please include only standard alphanumeric characters.
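A quick pre-submission check along these lines might look like the sketch below. It assumes the standard fixed-column PDB layout, in which the residue sequence number of ATOM and HETATM records occupies columns 23 to 26, and it interprets "nonstandard characters" as anything outside ASCII; both the interpretation and the function name `check_pdb_lines` are assumptions, not part of the D3R validation.

```python
from typing import Iterable, List


def check_pdb_lines(lines: Iterable[str]) -> List[str]:
    """Return human-readable problems found in the given PDB lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        # Assumption: "nonstandard" is taken to mean non-ASCII characters.
        if not line.isascii():
            problems.append(f"line {number}: non-ASCII character")
        if line.startswith(("ATOM", "HETATM")):
            # Residue sequence number: columns 23-26 (0-based slice 22:26).
            field = line[22:26].strip()
            try:
                res_seq = int(field)
            except ValueError:
                problems.append(f"line {number}: unreadable residue number {field!r}")
                continue
            if res_seq <= 0:
                problems.append(f"line {number}: non-positive residue number {res_seq}")
    return problems
```

A file could then be checked with `check_pdb_lines(open(path, encoding="utf-8"))`, where `path` is the protein PDB file about to be submitted.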
A structure-based Score tgz file is used to submit one set of ligand scores or rankings generated by a structure-based method, together with the set of poses used. Separately uploaded protocol files will detail how the predictions were made and which protocol was used to generate the poses. A User Information file is required, and a Supplementary Information directory (see below) is optional but encouraged.
The predicted poses should be provided in the same manner as in a Pose tgz file (above), except that: 1) only one pose (not five) is permitted for each protein-ligand pair, and this should be the pose used to generate the score or ranking of the ligand; 2) there is no need to superimpose your cocrystal structure on any reference structure. If you used multiple docked poses to predict your scores (e.g. ensemble docking), just submit a single pose for each ligand, and feel free to submit the additional poses as a supplementary file.
Each ligand scoring and ranking is described by two files: a ligand scoring protocol file, and a ligand scoring results comma-separated value (CSV) file, named LigandScoringProtocol.txt and LigandScores.csv respectively. The results file contains the scores and corresponding rankings generated by the protocol.
The ligand scoring results file lists your rankings and scores or energies of the binding strengths of the ligands to the target protein. Again, lines beginning with # will be treated as comments. Since some scoring methods provide results interpretable as binding energies or free energies, while others provide scores without well-defined units, the first non-comment line of your file must state whether you are providing energies or scores. This line must take one of the following forms:
If your results are given as energies, they must be in units of kcal/mol.
Each subsequent non-comment line of the file comprises the identifier of the ligand in question; your ranking of the ligand within the set, where 1 corresponds to maximal affinity; and your computed binding energy, free energy, or score. These three items should be separated by commas.
The scoring file must contain a line for every ligand in the challenge. If you have not entered a prediction for a ligand, the corresponding line should have a placeholder: "inact" for compounds you identified as inactive, or "nopred" if you are not supplying a prediction for the compound for any reason. An example line with "inact" would be: CatS_100,inact,inact
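The data lines described above could be generated along the following lines. This is a minimal sketch, not the official format definition: it covers only the per-ligand lines (not the first line stating whether energies or scores are given), it assumes "nopred" is repeated in both fields by analogy with the "inact" example, and the ranking helper assumes energies, where the most negative value means the strongest binding. The function names and the numeric values are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def rank_by_energy(energies: Dict[str, float]) -> Dict[str, Tuple[int, float]]:
    """Map each ligand to (rank, energy); rank 1 is the most negative energy."""
    ordered = sorted(energies, key=energies.get)
    return {lig: (rank, energies[lig]) for rank, lig in enumerate(ordered, start=1)}


def score_data_lines(
    ligand_ids: Iterable[str],
    ranked: Dict[str, Tuple[int, float]],
    inactive: Iterable[str] = (),
) -> List[str]:
    """Return one CSV line per challenge ligand, with placeholders where needed."""
    inactive = set(inactive)
    lines = []
    for ligand_id in ligand_ids:
        if ligand_id in ranked:
            rank, value = ranked[ligand_id]
            lines.append(f"{ligand_id},{rank},{value}")
        elif ligand_id in inactive:
            lines.append(f"{ligand_id},inact,inact")
        else:
            # Assumption: "nopred" fills both fields, mirroring the "inact" example.
            lines.append(f"{ligand_id},nopred,nopred")
    return lines


ranked = rank_by_energy({"CatS_1": -8.1, "CatS_2": -9.4})  # illustrative values
for line in score_data_lines(["CatS_1", "CatS_2", "CatS_3", "CatS_100"], ranked, ["CatS_100"]):
    print(line)
```

Iterating over the full list of challenge ligand identifiers, rather than over the predictions, is what guarantees that every ligand gets a line.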
Please refer to the template and example files; note that the template files are prefilled with the list of ligand identifiers, for your convenience.
A ligand-based Score tgz file is used to submit one set of ligand scores or rankings generated by a method that does not use ligand-protein poses, along with information on how the predictions were made. It has the same contents as a structure-based Score file (above), except that it does not include the poses. In addition, it is not associated with a pose-prediction protocol file.
A Free Energy tgz file is used to submit predictions of absolute or relative binding free energies. Note that you are free to submit both ligand scores for the full set of compounds (above) and free energy calculations. A Free Energy tgz file should contain free energy predictions in a CSV file, FreeEnergies.csv. It should be associated with a separately uploaded free energy protocol file. If your calculations involved pose predictions by a docking method, then the free energy file should also include these pose predictions. A User Information file is required, and a Supplementary Information directory (see below) is optional but encouraged.
The FreeEnergies.csv file follows a format similar to that used for ligand scoring. Each line should list one ligand, followed by the predicted binding free energy (kcal/mol), and then by your estimate of the numerical uncertainty (standard error of the mean) in the prediction due to limitations in sampling. If you computed relative binding free energies, the results should all be referenced to the first ligand listed (see template), which hence should be assigned a free energy of 0.0 with zero uncertainty.
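For relative free energies, the lines described above might be assembled as in the following sketch. It assumes that each ligand has at least two independent estimates of its free energy relative to the reference ligand, that the standard error of the mean is the sample standard deviation divided by the square root of the number of estimates, and that fields are comma-separated with two decimal places; the function name, ligand identifiers, and values are hypothetical, and the template remains the authority on the exact layout.

```python
import math
import statistics
from typing import Dict, List, Sequence


def standard_error(samples: Sequence[float]) -> float:
    """Standard error of the mean; needs at least two samples."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def relative_free_energy_lines(
    reference_id: str, samples_by_ligand: Dict[str, Sequence[float]]
) -> List[str]:
    """Return CSV lines: ligand, relative free energy (kcal/mol), uncertainty."""
    # The reference ligand is listed first, with 0.0 and zero uncertainty.
    lines = [f"{reference_id},0.0,0.0"]
    for ligand_id, samples in samples_by_ligand.items():
        mean = statistics.mean(samples)
        lines.append(f"{ligand_id},{mean:.2f},{standard_error(samples):.2f}")
    return lines


# Illustrative values only: three repeat estimates for one ligand.
for line in relative_free_energy_lines("LIG_A", {"LIG_B": [-1.0, -2.0, -3.0]}):
    print(line)
```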
If your calculations relied on predicted poses, please provide these, along with the protocol used, as specified for the structure-based Score tgz file (above). Note that only one pose is permitted for each protein-ligand pair and no superimposition to a reference structure is required.
As noted above, every tgz file must include a User Information file. This is a text file named UserInfo.txt, containing six lines of text, as follows:
We are asking for this file to maximize clarity regarding the associations between submissions and submitters and research groups.
Every tgz file can optionally contain a Supplementary Information directory (folder), called SuppInfo, containing added files that would help interpret and reproduce your results. Examples of files you might provide include:
The files may have any names you like. To include this directory in your .tgz file, just include a SuppInfo directory with your files within the directory you tar up.
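As an illustration of bundling a SuppInfo directory with the rest of a submission, the sketch below uses Python's standard `tarfile` module to gzip a whole directory. It is not the official packaging procedure, which is given in the section on making tar files; the function name `make_tgz` and the directory and archive names in the usage note are hypothetical.

```python
import tarfile
from pathlib import Path


def make_tgz(submission_dir: str, tgz_path: str) -> None:
    """Pack submission_dir, including any SuppInfo subdirectory, into a .tgz."""
    source = Path(submission_dir)
    with tarfile.open(tgz_path, "w:gz") as archive:
        # Adding a directory is recursive, so SuppInfo and its files come along.
        archive.add(str(source), arcname=source.name)
```

For example, `make_tgz("my_submission", "my_submission.tgz")` would archive a directory `my_submission` that holds UserInfo.txt, the prediction files, and a SuppInfo subdirectory.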
In order to enable automated processing of all submissions, we ask that you generate your Pose, Score and Free Energy tar files as follows: