File Formats for Submitting SAMPL5 Distribution Coefficient Predictions

January 18, 2016

The distribution coefficient (DC) component of SAMPL5 comprises 53 compounds, broken into three batches: Batch 0 (13 compounds); Batch 1 (20 compounds); and Batch 2 (20 compounds). You should upload a separate prediction file for each prediction methodology, or protocol, that you have used. The experimental data for the distribution coefficients are being reported as log D, where the logarithm is (as is standard) in base 10 and the distribution coefficient measures Ccyclohexane/Cwater.
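As a quick illustration of the sign convention (cyclohexane concentration in the numerator, base-10 logarithm), here is a minimal Python sketch. The function name and the example concentrations are hypothetical and serve only to show which way the sign goes.

```python
import math


def log_d(c_cyclohexane: float, c_water: float) -> float:
    """Return log10(C_cyclohexane / C_water); both concentrations in the same units."""
    if c_cyclohexane <= 0 or c_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(c_cyclohexane / c_water)


# Illustrative values only: a compound 100 times more concentrated in water
# than in cyclohexane has a negative log D.
example = log_d(c_cyclohexane=0.01, c_water=1.0)
```

A compound that prefers cyclohexane gives a positive value, and one that prefers water gives a negative value.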

A prediction file is a plain text file which contains both your predictions, with uncertainty estimates as described below, and information about your computational protocol. The format requirements for this file are detailed below, and a sample file is available for download from the D3R website: DC-anytexthere-1.txt.

Note that you are free to submit multiple predictions, generated by different computational methods, for each dataset. Each prediction should be submitted separately, using a separate file.

If you are registered as “anonymous”, please note that any file names and files you submit are subject to public release, so you may want to avoid including identifying information in them.

Contents of a Prediction file

The name of a prediction file must begin with DC and must end with an integer indicating which of your predictions for this component it contains. For example, your first submission (even if you are only submitting one) might be DC-myname-1.txt, where myname is arbitrary text of your choice. If you use two prediction files (in two separate submissions) you might name them DC-myname-1.txt and DC-myname-2.txt.
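The naming rule can be checked before upload with a short Python sketch like the one below. The regular expression is an assumption inferred from the examples (DC, arbitrary text, an integer, then a .txt extension); the organizers' actual parser may be stricter or more lenient, and the function name is hypothetical.

```python
import re

# Assumed pattern: "DC", arbitrary text, an integer, then ".txt".
_NAME_RE = re.compile(r"^DC.*?(\d+)\.txt$")


def prediction_index(filename: str):
    """Return the trailing integer of a prediction file name, or None if it does not match."""
    match = _NAME_RE.match(filename)
    return int(match.group(1)) if match else None


first = prediction_index("DC-myname-1.txt")
second = prediction_index("DC-myname-2.txt")
bad = prediction_index("myname-1.txt")
```

Here `first` and `second` hold the submission indices, and `bad` is `None` because the name does not begin with DC.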

This file will be machine parsed, so correct formatting is essential and incorrectly formatted submissions will likely be rejected.

Lines beginning with a hash sign (#) may be included as comments. These, and blank lines, will be ignored when parsing these files.

The file must contain the following four components in the following order: your predictions, a name for your computational protocol, a list of the major software packages used, and a long-form methods description. Each of these components must begin with a line containing only the corresponding keyword: Predictions:, Name:, Software:, and Method:, as illustrated in the provided example file. Each of the four components is described below; please also refer to the sample file for a concrete illustration.
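To check a file's overall layout before submitting, something like the following Python sketch may help. It is not the organizers' parser; it only assumes the rules stated above (four keyword lines in a fixed order, with comment and blank lines ignored), and the function name is hypothetical.

```python
KEYWORDS = ("Predictions:", "Name:", "Software:", "Method:")


def split_sections(text: str) -> dict:
    """Split prediction-file text into its four components, keyed by keyword."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comments and blank lines are ignored
        if line in KEYWORDS:
            # The next keyword must be the one that follows those already seen.
            expected = KEYWORDS[len(sections)] if len(sections) < len(KEYWORDS) else None
            if line != expected:
                raise ValueError(f"unexpected or out-of-order keyword: {line}")
            current = line
            sections[current] = []
        elif current is None:
            raise ValueError("content found before the Predictions: keyword")
        else:
            sections[current].append(line)
    missing = [k for k in KEYWORDS if k not in sections]
    if missing:
        raise ValueError(f"missing components: {missing}")
    return sections
```

The function returns a dictionary mapping each keyword to the list of content lines beneath it, and raises `ValueError` if a keyword is missing, repeated, or out of order.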

Predictions

Each non-commented, nonblank line in this component must contain the following four items, separated by commas:

  1. Compound identifier; e.g. SAMPL5_015 [required]
  2. Predicted log D value (as noted above, base 10 and with the cyclohexane concentration in the numerator) [required]
  3. Standard error of the mean of the predicted log D value. This is a measure of repeatability, or precision, of your calculation, not of its expected accuracy. This is most applicable to methods where there are clear measures of statistical error (such as simulation-based methods), but at least an estimate is required in all cases. [required]
  4. Model accuracy of the predicted log D value. This is your estimate of your expected error or accuracy in the predicted value itself due to potential model errors (rather than due to statistical error as in #3). For example, you might base this on expected force field accuracy for the type of compound considered, likely precision of force field parameters, and so on. This is not as well defined as the expected statistical precision (standard error in the mean) and a back-of-the-envelope or experience-based estimate is acceptable and may be unavoidable. [required]
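The four-item line format above can be sketched in Python as follows. The class and field names are hypothetical; only the order of the items and the comma separator come from the description above.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    compound_id: str          # e.g. SAMPL5_015
    log_d: float              # predicted log D (base 10, cyclohexane over water)
    sem: float                # standard error of the mean (statistical precision)
    model_uncertainty: float  # estimated model error (expected accuracy)


def parse_prediction_line(line: str) -> Prediction:
    """Parse one non-comment, non-blank line of the Predictions component."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 comma-separated items, got {len(fields)}: {line!r}")
    compound_id, log_d, sem, model_uncertainty = fields
    if not compound_id:
        raise ValueError("compound identifier is required")
    # float() raises ValueError if a numeric item is empty or malformed.
    return Prediction(compound_id, float(log_d), float(sem), float(model_uncertainty))
```

For instance, `parse_prediction_line("SAMPL5_015, -1.5, 0.2, 1.0")` yields a record whose numeric values here are purely illustrative, not real predictions.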

This section must contain predictions for Batch 0, Batches 0-1, or Batches 0-2 (the complete set). Missing compounds in a submission will result in an error (except in the case of standardization runs, discussed below).
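A coverage check along these lines could be sketched in Python as shown below. The batch membership is not listed in this document, so the sketch takes the compound identifiers for each batch as input; the function name and the placeholder identifiers in the usage example are hypothetical.

```python
def highest_batch_covered(submitted_ids, batches):
    """Return the index of the last batch covered (0, 1, or 2).

    `batches` is a list of sets of compound identifiers, ordered Batch 0,
    Batch 1, Batch 2. The submission must contain exactly Batch 0,
    Batches 0-1, or Batches 0-2; anything else raises ValueError.
    """
    submitted = set(submitted_ids)
    required = set()
    for index, batch in enumerate(batches):
        required |= set(batch)  # cumulative set: Batch 0, then 0-1, then 0-2
        if submitted == required:
            return index
    raise ValueError("submission does not match Batch 0, Batches 0-1, or Batches 0-2")


# Placeholder identifiers, not the real SAMPL5 batch assignments.
demo_batches = [{"cpd_a", "cpd_b"}, {"cpd_c"}, {"cpd_d", "cpd_e"}]
covered = highest_batch_covered(["cpd_a", "cpd_b", "cpd_c"], demo_batches)
```

In this example `covered` is 1, meaning the submission spans Batches 0-1. This strict check does not apply to standardization runs, which may include only a subset of cases.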

Name of computational protocol

The name of the protocol should be brief but informative, as illustrated in the example files. Ideally, it will say something about the nature of the method and the key parameters or settings, such as force field chosen or quantum chemistry level.

Major software packages used

List the name and version number of each major software package used in your protocol, one package per line.

Computational method

Please use this section to provide a long-form description of the computational method used to make the predictions. The level of detail should be at least as complete as that of a typical “Computational Details” section of a computational paper. Thus, for a simulation-based method, it should describe the sampling methodology and extent, the force field(s) used, the method used to extract thermodynamic results from the simulations (e.g., solvation free energies of the neutral species at infinite dilution in water and cyclohexane were used to estimate the distribution coefficient), how statistical uncertainties were evaluated (e.g., statistical inefficiency or blocking analysis), and so on.
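For the solvation free energy example mentioned above, the conversion to log D can be sketched in Python as follows. This is an illustration, not a prescribed protocol. It assumes free energies in kcal/mol, a temperature of 298.15 K, only the neutral species (so the result is strictly a partition coefficient used as an estimate of log D), and independent statistical errors in the two solvents. The function names are hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.98720425864e-3  # gas constant in kcal/(mol*K)


def log_d_from_solvation(dg_water, dg_cyclohexane, temperature_k=298.15):
    """Estimate log D of a neutral solute from solvation free energies (kcal/mol).

    The water-to-cyclohexane transfer free energy is dg_cyclohexane - dg_water,
    and log D = -dG_transfer / (R * T * ln 10).
    """
    rt_ln10 = R_KCAL_PER_MOL_K * temperature_k * math.log(10.0)
    return (dg_water - dg_cyclohexane) / rt_ln10


def propagated_sem(sem_water, sem_cyclohexane, temperature_k=298.15):
    """Standard error of log D, assuming the two free energy errors are independent."""
    rt_ln10 = R_KCAL_PER_MOL_K * temperature_k * math.log(10.0)
    return math.hypot(sem_water, sem_cyclohexane) / rt_ln10
```

With these conventions, a solute that is solvated more favorably in cyclohexane than in water (a more negative `dg_cyclohexane`) gets a positive log D, matching the Ccyclohexane/Cwater definition.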

Submission of Standardization Run Results

If you are using MD with explicit solvent to compute distribution coefficients, we hope you have, at minimum, applied your method to the standard setups we provided. To submit your results for these special cases, please create a prediction file whose name begins with DCStandard, such as DCStandard-myname.txt. This file should be structured as detailed above, but need not include the full set of standard cases. It should include prediction lines only for those standard setups you actually ran.
