Cathepsin_S - Overview

Challenge timeframe: Sep 01, 2017 to Dec 01, 2017


The cathepsins constitute an 11-member family of proteases involved in protein degradation. Cathepsin S (CatS) is highly expressed in antigen-presenting cells, where it degrades the major histocompatibility complex class II (MHC II)-associated invariant chain. CatS is a candidate target for regulating immune hyper-responsiveness, as the inhibition of CatS may limit antigen presentation [1-3].

Updates
2018-01-16    Cathepsin S Stage 2 answers updated for significant digits
2017-11-19    Cathepsin S Deadline - Dec 15, 2017 23:59PST
2017-11-13    A sharp-eyed participant has correctly pointed out that compound CatS_23 is identical to CatS_102. Janssen lists these as having the same IC50, as anticipated. To avoid bookkeeping complexities, we will ask that you include both compounds in the submission files.
2017-10-16    Submissions are now being accepted for Stage 1B with a closing date/time of 2017-10-23 23:59PST.
2017-10-12    Submission instructions, which include template and example files, are online now.
2017-10-10    A correction was made to the Stage 1B data download. The new docking_structures_stage1B_REVISED_20171010.tgz file fixed the PDBID of CatS_10, which should be "WCGQ"
2017-10-09    A new Stage 1b docking component, closing October 23, has been added. Go here for details.
2017-09-28    If you use the crystal structures provided in the GC3 data package for docking, please construct your pose filenames with the following temporary PDB IDs: CAT1 for the SO4 structure and CAT2 for the DMSO structure.
2017-09-26    A minor change to the "CatS_initial_packet_to_participants_REVISED_20170921.zip" has been made. The line breaks of the "CatS_pose_compounds_D3R_GC3.csv" file were updated and saved to a new "CatS_initial_packet_to_participants_REVISED_20170926.zip".
2017-09-21    Our re-refinement of the Cathepsin S crystal structures revealed that three ligands whose supplied SMILES strings listed a trimethyl group actually have a trifluoro group. We confirmed this correction with Janssen. The compounds are CatS_7, CatS_9, and CatS_14.
2017-09-21    Submission instructions, which include template and example files, are online now.
Stage 1 - Dataset packet provided: CatS_initial_packet_to_participants.zip
  • CatS_target_D3R_GC3.fasta: Protein sequence file of the Cathepsin S construct used in the IC50 experiments.
  • CatS_DMSO_structure_D3R_GC3.pdb: Crystal structure of DMSO-bound Cathepsin S.
  • CatS_SO4_structure_D3R_GC3.pdb: Crystal structure of SO4-bound Cathepsin S.
  • CatS_pose_compounds_D3R_GC3.csv: CSV file of 24 compounds and their corresponding SMILES string, target, and subchallenge name.
  • CatS_score_compounds_D3R_GC3.csv: CSV file of 136 compounds and their corresponding SMILES string, target, and subchallenge name.
  • CatS_FESet_compounds_D3R_GC3.csv: CSV file of 33 compounds selected from CatS_score_compounds_D3R_GC3.csv for explicit-solvent relative or absolute free energy calculations.

Note: No attempt was made to set appropriate starting conformations or optimal protonation or tautomer states for the ligands, or to generate alternative tautomer states. It is up to you to choose and set these states for your calculations.

Stage 1b - Dataset packet provided: docking_structures_stage1B.tgz

This folder contains 24 CatS crystallographic structures for docking, in addition to a CSV file containing mappings between ligand IDs and structure IDs:

File CatS_ligandID_structureID.csv displays which ligand was crystallized with each structure ID.

Structure ID files are named using the following syntax: pdbid-CatS_chain#.pdb

All structure ID files were aligned to chain A of the gabj structure.

Stage 2 - Dataset packet provided: CatS_stage1B_answers.tar.gz

This folder contains 24 refined CatS crystallographic structures provided to D3R by Janssen and re-refined by D3R crystallographers, in addition to a CSV file containing mappings between ligand IDs and structure IDs. File CatS_ligandID_structureID.csv shows which ligand was crystallized with each structure ID. Structure files were aligned to chain A of the gabj structure.

The 24 CatS crystallographic structures are organized into the following directories:
CatS_crystal_structures/ contains all 24 re-refined CatS crystal structures
ChainA/ contains unique files for each A chain from the 24 re-refined CatS crystal structures
chainB/ contains unique files for each B chain from the 24 re-refined CatS crystal structures
chainC/ contains unique files for each C chain from the 24 re-refined CatS crystal structures
chainD/ contains unique files for each D chain from the 24 re-refined CatS crystal structures

Template and Example packet: CatS-stage2_Submission_examples_and_templates.zip

General instructions on what to include in each subchallenge component can be found at https://drugdesigndata.org/about/grand-challenge-3-submission-instructions.

This packet includes "templates" and "examples" folders.
The "templates" folder includes files where values have been removed.
The "examples" folder includes complete submissions that passed validation:
- scorestructure.tgz
- scoreligand.tgz
- freeenergy.tgz

Note: the data in the example TGZ files are merely for demonstration purposes and hold no scientific value.

General information on Cathepsin S

Cathepsin S

This data set comprises non-peptidic, non-covalent, small-molecule inhibitors of CatS whose IC50s span three orders of magnitude (nM to μM). Specifically, we provide 136 CatS inhibitors for affinity prediction, 24 for pose prediction, and 33 for free energy prediction.

Binding pocket information

The conformation of the CatS binding pocket can change, depending on the nature of the bound ligand. In particular, Phe211 in the S2 pocket adopts a different conformation, depending on the size of the P2 moiety of the ligand [4]. If the P2 moiety is small, Phe211 swings into a conformation that closes the entrance to the deeper part of the S2 pocket. However, a larger P2 moiety can induce Phe211 to swing open, allowing for ligand binding to the deeper portion of the S2 pocket [5].

Cathepsin S Binding assay conditions

For details of the binding assays, we were referred to publications from the Janssen group, which describe the following binding assay conditions [6]:

"In general, the assays were run using fluorescence resonance energy transfer-based substrates. For example, the cathepsin S, L, and L2 assays used the substrate (Aedens)EK ARVLAEAA(Dabcyl)K-amide and cathepsin S cleaves between amino acids Leu-6 and Ala-7. The fluorescence of the aedens group is quenched by the dabcyl moiety in the intact peptide. Upon cleavage by cathepsin S, the quenching is released and the fluorescence of the aedens group can be measured.

The cysteine CatS assays were run in 100-μl volume with a buffer consisting of 100 mM sodium acetate, pH 5.0, containing 100 mM NaCl and 1 mM dithiothreitol, except for cathepsin Z, which used 10 mM dithiothreitol and cathepsins E, D, and napsin where no dithiothreitol was present. The enzymes were mixed with 7.5 ml of buffer and then 75 μl was added to a Dynax black Microfluor 2 plate. To this, 5 μl of a 20 μl solution of compound in 30% DMSO was added. This was followed by the addition of 20 μl of 5X substrate to initiate the reaction. In all cases, an 11 point 1:2 dilution of the compound was used at seven substrate concentrations (also 1:2). The increase in fluorescence was measured on a Cytofluor II (Applied Biosystems, Foster City, CA) with an excitation filter of 360/40 nm and an emission filter of 460/40 nm. A reading was made every minute for 20 to 60 min, depending on the assay, and the slope obtained from a linear regression of this time course was used as the reaction rate."

Cathepsin S Crystallization conditions

The structure files provided by Janssen did not specify crystallization conditions. However, manuscripts provided to us that discuss PDB IDs 3iej, 3mpe, and 3kwn list the following [1-3]:

"Crystallization conditions: 100 mM sodium acetate, pH 4.5, 200 mM ammonium acetate, 25% PEG 8000. Protein concentration 7 mg/ml, vapor diffusion, sitting drop, temperature 293 K."

References:

1. Ameriks MK, Bembenek SD, Burdett MT, et al (2010) Diazinones as P2 replacements for pyrazole-based cathepsin S inhibitors. Bioorg Med Chem Lett 20:4060-4064. doi: 10.1016/j.bmcl.2010.05.086

2. Wiener DK, Lee-Dutra A, Bembenek S, et al (2010) Thioether acetamides as P3 binding elements for tetrahydropyrido-pyrazole cathepsin S inhibitors. Bioorg Med Chem Lett 20:2379-2382. doi: 10.1016/j.bmcl.2010.01.103

3. Ameriks MK, Axe FU, Bembenek SD, et al (2009) Pyrazole-based cathepsin S inhibitors with arylalkynes as P1 binding elements. Bioorg Med Chem Lett 19:6131-6134. doi: 10.1016/j.bmcl.2009.09.014

4. Pauly TA, Sulea T, Ammirati M, et al (2003) Specificity Determinants of Human Cathepsin S Revealed by Crystal Structures of Complexes. Biochemistry 42:3203-3213. doi: 10.1021/bi027308i

5. Markt P, McGoohan C, Walker B, et al (2008) Discovery of Novel Cathepsin S Inhibitors by Pharmacophore-Based Virtual High-Throughput Screening. J Chem Inf Model 48:1693-1705. doi: 10.1021/ci800101j

6. Thurmond RL, Sun S, Sehon CA, et al (2004) Identification of a Potent and Selective Noncovalent Cathepsin S Inhibitor. J Pharmacol Exp Ther 308:268-276. doi: 10.1124/jpet.103.056879

Cathepsin_S - Data Download

Stage 1 (08/31/2017 to 10/03/2017)
Stage 1B (10/09/2017 to 10/23/2017)
Stage 2 (10/24/2017 to 12/15/2017)

Cathepsin_S - Evaluation Results

Evaluation Results

Overviews

Last updated April 9, 2018

Pose Prediction

Pose predictions were evaluated in terms of symmetry-corrected root-mean-square deviations (RMSD, Å). Stage 1A is a cross-docking challenge, while Stage 1B is a self-docking challenge. In the graphs, the pull-down menu allows display of the results for all submissions for each compound (e.g., CatS_1), the average over all compounds (Mean over all), or the median over all compounds (Median over all). The three buttons above the menu in the graphs allow display of the RMSD value for the lowest-RMSD pose among the maximum of five in each submission (Closest); of the average RMSD value across all submitted poses (Average); or of the RMSD value for the top-scoring pose in each submission (Pose 1). The tables and CSV files provide the statistics averaged over ligands.
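
The three per-ligand statistics described above (Closest, Average, Pose 1) and their aggregation over ligands can be sketched as follows. This is a minimal illustration, not the D3R evaluation script: the function names and the example numbers are hypothetical, and the symmetry-corrected RMSD values are assumed to have been computed already.

```python
from statistics import mean, median

def summarize_poses(rmsds):
    """Summarize one ligand's poses.

    rmsds: RMSD values (in Angstrom) for up to five submitted poses,
    ordered by the submitter's ranking, so rmsds[0] is the top-scoring pose.
    """
    if not 1 <= len(rmsds) <= 5:
        raise ValueError("expected between one and five poses per ligand")
    return {
        "Closest": min(rmsds),   # lowest-RMSD pose among those submitted
        "Average": mean(rmsds),  # mean RMSD over the submitted poses
        "Pose 1": rmsds[0],      # RMSD of the top-scoring pose
    }

def over_ligands(per_ligand, stat, reducer=mean):
    """Aggregate one statistic over all ligands (mean or median)."""
    return reducer([summary[stat] for summary in per_ligand.values()])

# Hypothetical submission: ligand ID -> RMSDs of its ranked poses.
submission = {"CatS_1": [2.0, 1.0, 3.0], "CatS_2": [4.0, 6.0]}
per_ligand = {lig: summarize_poses(vals) for lig, vals in submission.items()}
mean_closest = over_ligands(per_ligand, "Closest")
median_pose1 = over_ligands(per_ligand, "Pose 1", reducer=median)
```

Passing `reducer=median` gives the "Median over all" view, while the default corresponds to "Mean over all".
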
Cathepsin Stage 1A

Pose Predictions (partials)

Cathepsin Stage 1B

Pose Prediction

Affinity Rankings

This section presents metrics of the ability of the predictions to correctly rank ligands by affinity. The rankings were evaluated in terms of Kendall's τ and Spearman's ρ; estimated binding energies for the free energy sets were additionally evaluated in terms of centered root-mean-square error (RMSEc, kcal/mol) and Pearson's r. Uncertainties in these statistics (e.g., Kendall's τ Errors in the table) were obtained by recomputing them in 10,000 rounds of resampling with replacement, where, in each sample, the experimental IC50 or Kd data were randomly modified based on the experimental uncertainties. Experimental uncertainties were added to the free energy, ΔG, as a random offset δG drawn from a Gaussian distribution of mean zero and standard deviation RT ln(Ierr). In this evaluation, the value of Ierr was set to 2.5.
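
The resampling procedure can be illustrated with a short pure-Python sketch for Kendall's τ. It is not the D3R evaluation code (that is available on Github, as noted at the end of this page). The temperature of 298.15 K, the choice of the tau-b variant, the skipping of degenerate resamples, and all function names are assumptions made for this example.

```python
import math
import random
from statistics import mean, stdev

R_KCAL = 0.0019872    # gas constant in kcal/(mol*K)
TEMPERATURE = 298.15  # K; assumed, the page does not state a temperature
I_ERR = 2.5           # value of Ierr given in the text
SIGMA = R_KCAL * TEMPERATURE * math.log(I_ERR)  # std. dev. of the offset, kcal/mol

def kendall_tau(x, y):
    """Kendall's tau-b by direct pair counting; None if it is undefined."""
    concordant = discordant = ties_x = ties_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # pairs tied in both variables are not counted
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    denom = math.sqrt((pairs + ties_x) * (pairs + ties_y))
    return (concordant - discordant) / denom if denom else None

def tau_with_uncertainty(predicted, experimental_dg, n_rounds=10000, rng=None):
    """Mean and spread of tau over noisy resamples drawn with replacement."""
    rng = rng or random.Random()
    n = len(predicted)
    taus = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # resample ligands
        pred = [predicted[i] for i in idx]
        # Perturb each experimental free energy by a Gaussian offset.
        expt = [experimental_dg[i] + rng.gauss(0.0, SIGMA) for i in idx]
        tau = kendall_tau(pred, expt)
        if tau is not None:  # skip degenerate resamples (assumption)
            taus.append(tau)
    return mean(taus), stdev(taus)
```

Calling `tau_with_uncertainty(predicted, experimental_dg, rng=random.Random(0))` with two equal-length lists of predicted scores and experimental free energies (kcal/mol) returns the mean τ and its standard deviation over the rounds; the seeded generator only makes the sketch reproducible.
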

For the kinases, a number of experimental Kd values were reported as ≥10 µM, making them difficult to include in standard metrics of ranking accuracy, so these cases were excluded from these affinity ranking assessments. However, they are included in the Active/Inactive Classification assessments, below.

Cathepsin Stage 1

Scoring (partials)
Free Energy Set

Cathepsin Stage 2

Scoring (partials)
Free Energy Set

VEGFR2

Scoring (partials)

JAK2 SC2

Scoring (partials)

p38-α

Scoring

Active / Inactive Classification

A number of experimental Kd values were reported as ≥10 µM, making them difficult to include in standard metrics of ranking accuracy. Instead, for the full set of compounds, we used the Matthews Correlation Coefficient, a classification statistic, to evaluate the ability of prediction methods to differentiate between the highest and lowest affinity ligands. In effect, we considered the Na compounds with measured Kd values <10 µM as active, and the Ni compounds with Kd values reported as ≥10 µM as inactive. We similarly assigned the Ni lowest-ranked ligands in each submission as inactive and the Na top-ranked ligands as active, and used the Matthews statistic to assess how well the predictions classified the ligands.
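
As an illustration of this classification scheme, the sketch below labels the Na top-ranked ligands of a submission as predicted actives and computes the Matthews Correlation Coefficient against the experimental labels. The function name and the toy ligand IDs are hypothetical, ligand IDs are assumed to be unique, and returning 0.0 for an undefined coefficient is a common convention adopted here, not something the source specifies.

```python
import math

def mcc_from_ranking(ranked_ids, active_ids):
    """Matthews Correlation Coefficient for a ranked list of ligands.

    ranked_ids: ligand IDs ordered from best to worst predicted affinity.
    active_ids: set of IDs that are experimentally active.
    """
    # Na = number of experimentally active ligands present in the ranking.
    n_active = sum(1 for lig in ranked_ids if lig in active_ids)
    # The Na top-ranked ligands are predicted active, the rest inactive.
    predicted_active = set(ranked_ids[:n_active])
    tp = len(predicted_active & set(active_ids))
    fp = n_active - tp  # predicted active but experimentally inactive
    fn = n_active - tp  # experimentally active but predicted inactive
    tn = len(ranked_ids) - tp - fp - fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Undefined when all ligands fall in one class; 0.0 is an assumption.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy example with hypothetical ligand IDs.
ranking = ["lig_a", "lig_b", "lig_c", "lig_d"]
perfect = mcc_from_ranking(ranking, {"lig_a", "lig_b"})
inverted = mcc_from_ranking(ranking, {"lig_c", "lig_d"})
```

Because the number of predicted actives is fixed at Na, the false-positive and false-negative counts are always equal in this scheme.
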
VEGFR2

Scoring (partials)

JAK2 SC2

Scoring (partials)

p38-α

Scoring (partials)

Affinity Rankings for Cocrystallized Ligands

To test methods in a setting where errors in the ligand poses do not contribute to errors in the affinity predictions, all submissions from Stages 1 and 2 were reevaluated for the Cathepsin S dataset using only the 19 ligands for which crystallographic poses had been provided in Stage 2 (CatS 1 to CatS 24, excluding CatS 7, 9, 14, and 21). The rankings of compounds by affinity were evaluated in terms of Kendall's τ and Spearman's ρ; the free energy sets were additionally evaluated in terms of centered root-mean-square error (RMSEc, kcal/mol) and Pearson's r. Uncertainties in these statistics (e.g., Kendall's τ Errors in the table) were obtained by recomputing them in 10,000 rounds of resampling with replacement, where, in each sample, the experimental IC50 data were randomly modified based on the experimental uncertainties. Experimental uncertainties were added to the free energy, ΔG, as a random offset δG drawn from a Gaussian distribution of mean zero and standard deviation RT ln(Ierr). In this evaluation, the value of Ierr was set to 2.5.
Cathepsin Stage 1

Scoring (partials)
Free Energy Set

Cathepsin Stage 2

Scoring (partials)
Free Energy Set

User Submissions

Further details of these procedures and results will be provided in an overview paper in the special issue of JCAMD.
All scripts used to evaluate submissions are publicly available on Github.
(partials) indicates submissions that do not include the full set of predictions