
Publication:

Publications that use these data should cite the following:


Rogers MF, Gaunt TR, Campbell C (2020). CScape-somatic: distinguishing driver and passenger point mutations in the cancer genome. Bioinformatics


Other relevant references:
Rogers MF, Shihab H, Gaunt TR, Campbell C (2017). CScape: a tool for predicting oncogenic single-point mutations in the cancer genome. Nature Scientific Reports

Input Format:


Our software accepts comma-separated mutation data in the following format:

  • Chromosome
  • Position
  • Reference Base
  • Mutant Base

For example:

11,219046,A,C
11,224139,A,T
11,375885,G,T
11,408898,A,T
11,499190,G,C
11,551832,C,A
11,607532,C,T
11,773638,A,T
11,800755,C,A
11,828599,C,G
11,988551,G,C
11,1025084,C,G
11,1027680,C,A
17,46827903,A,G
17,79060569,A,G
18,756761,C,A
18,3879501,C,A
19,407408,G,T
19,407519,G,C
19,407627,G,A
19,757693,C,A
19,757792,G,T
19,812882,G,T
2,45966,C,A
20,9048655,A,G
20,9923941,A,G
20,18479366,A,G
20,53170414,T,C
3,48265219,A,G
3,52848428,A,G
3,66659209,A,G
3,184195375,A,G
7,193598,C,T
9,916799,C,T
9,3324019,A,T
9,5050791,G,T
9,5077554,C,T
9,6013277,T,A
9,6550908,C,A
9,6554763,C,A

Note 1: The 'Chr' prefix is not required when specifying the chromosome (e.g. 1 instead of Chr1). All our predictions are derived using the forward strand.

Note 2: All predictions are based on version GRCh37 (ENSEMBL release 87) of the human genome.
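
The input format above can be checked before submission with a short script. The following is a minimal sketch, not part of the CScape-somatic distribution: the helper name `parse_query_line` is hypothetical, and stripping a leading 'Chr' and upper-casing the bases are assumptions based on Note 1 and the example rows.

```python
import re

VALID_BASES = {"A", "C", "G", "T"}


def parse_query_line(line):
    """Parse one 'chromosome,position,reference,mutant' line.

    Hypothetical helper for illustration only. Returns a tuple
    (chromosome, position, reference, mutant) or raises ValueError.
    """
    fields = [field.strip() for field in line.strip().split(",")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 comma-separated fields: {line!r}")
    chrom, pos, ref, alt = fields

    # Assumption: since 'Chr' is optional (Note 1), drop it if present.
    chrom = re.sub(r"^chr", "", chrom, flags=re.IGNORECASE)
    if not chrom:
        raise ValueError(f"missing chromosome: {line!r}")

    # Positions are positive integers.
    if not re.fullmatch(r"[0-9]+", pos) or int(pos) < 1:
        raise ValueError(f"invalid position: {line!r}")

    # Assumption: accept lower-case bases by normalising to upper case.
    ref, alt = ref.upper(), alt.upper()
    if ref not in VALID_BASES or alt not in VALID_BASES:
        raise ValueError(f"bases must be one of A, C, G, T: {line!r}")

    return chrom, int(pos), ref, alt
```

For instance, `parse_query_line("11,219046,A,C")` yields a tuple of the chromosome, the integer position, and the two bases, while a line with only three fields raises `ValueError`.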

VCF files

The software also accepts Variant Call Format (VCF) files with up to 100,000 queries. This is a tab-delimited format that must have, at a minimum, these first five columns:

  1. Chromosome
  2. Position
  3. Identifier
  4. Reference Base
  5. Mutant Base

As an example, try the file: test.vcf
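
The relationship between the two accepted formats can be sketched in code. The example below reads the first five VCF columns listed above and emits the equivalent comma-separated queries. It is an illustration only: the function name `vcf_to_queries` is hypothetical, and two choices are assumptions rather than documented behaviour of the service: a comma-separated list of alternate alleles is split into one query per allele, and records that are not single-base substitutions are skipped.

```python
BASES = {"A", "C", "G", "T"}


def vcf_to_queries(vcf_lines):
    """Convert VCF records to 'chromosome,position,reference,mutant' strings.

    Hypothetical helper for illustration only.
    """
    queries = []
    for raw in vcf_lines:
        line = raw.rstrip("\r\n")
        # Skip blank lines and VCF header lines, which start with '#'.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 5:
            raise ValueError(f"fewer than five tab-delimited columns: {line!r}")
        chrom, pos, _identifier, ref, alt = cols[:5]
        # Assumption: one query per alternate allele.
        for allele in alt.split(","):
            # Assumption: keep only single-base substitutions.
            if ref in BASES and allele in BASES:
                queries.append(f"{chrom},{pos},{ref},{allele}")
    return queries
```

Passing an open file handle works as well as a list of strings, since the function only iterates over lines.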



Prediction Interpretation:


CScape-somatic is designed to discriminate between cancer driver mutations, which occur early in a tumour's development, and passenger variants, which accumulate after a tumour starts to grow and metastasis begins.

Predictions are given as probability estimates (p-scores) in the range [0, 1]: values above 0.5 are predicted to be cancer drivers, while values below 0.5 are predicted to be passenger variants. P-scores close to the extremes (0 or 1) are the highest-confidence predictions and yield the highest accuracy.

We also report cautious classifications, which use thresholds chosen to yield the highest possible accuracy (see our paper for details). The thresholds differ for coding SNVs (0.89 or above) and noncoding SNVs (0.70 or above).

We use distinct predictors for positions in coding regions (positions within coding-sequence exons) and positions in non-coding regions (positions in intergenic regions, introns or non-coding genes).
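
The interpretation rules in this section can be summarised as a small function. This is a sketch under stated assumptions, not code from the CScape-somatic scripts: the names `interpret_p_score` and `CAUTIOUS_THRESHOLDS` are hypothetical, and because the text does not say how a p-score of exactly 0.5 is treated, the sketch labels that case "undetermined".

```python
# Cautious thresholds quoted in the text above.
CAUTIOUS_THRESHOLDS = {"coding": 0.89, "noncoding": 0.70}


def interpret_p_score(p_score, region):
    """Return (label, is_cautious_driver) for a p-score.

    Hypothetical helper for illustration only.
    region must be 'coding' or 'noncoding'.
    """
    if not 0.0 <= p_score <= 1.0:
        raise ValueError("p-score must lie in [0, 1]")
    if region not in CAUTIOUS_THRESHOLDS:
        raise ValueError("region must be 'coding' or 'noncoding'")

    if p_score > 0.5:
        label = "driver"
    elif p_score < 0.5:
        label = "passenger"
    else:
        # Assumption: the text leaves exactly 0.5 unspecified.
        label = "undetermined"

    # A cautious driver call needs the region-specific threshold or above.
    is_cautious_driver = p_score >= CAUTIOUS_THRESHOLDS[region]
    return label, is_cautious_driver
```

Under these rules a p-score of 0.75 is a driver prediction in either region, but it meets the cautious threshold only for a noncoding SNV.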

Downloads:


Please note: some files will not be available for download until the CScape-somatic method has passed peer review.

To run CScape-somatic queries locally, download the following files and run the query script as outlined below. Please note that you must have tabix installed to run the script.

Python query script (7.4KB)
Examples for testing script (0.5KB)
css_coding.vcf.gz (632MB)
css_coding.vcf.gz.tbi (631KB)
css_noncoding.vcf.gz (46GB)
css_noncoding.vcf.gz.tbi (2.3MB)

Usage: cscape_somatic_query.py query-file [options]

Predict the oncogenic potential of single nucleotide variants (SNVs).  The query
file must be a list of queries that use the following format:

chromosome,position,reference,mutant

Example:

1,69094,G,A
11,168961,T,A
18,119888,G,A

Options:
  -h, --help  show this help message and exit
  -c CDB      CScape coding database [default: css_coding.bed.gz]
  -n NDB      CScape noncoding database [default: css_noncoding.bed.gz]
  -o OUTPUT   Output file [default: stdout]
  -v          Verbose mode [default: False]
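
The usage text above can be turned into a scripted invocation. The sketch below only assembles the command line from the documented options (-c, -n, -o, -v); it is not part of the distribution. The helper names, the query file name `examples.csv` and the output name `predictions.txt` are hypothetical. The database paths are passed explicitly because the download list names the files css_coding.vcf.gz and css_noncoding.vcf.gz, whereas the defaults shown in the help text end in .bed.gz. Which Python interpreter the script requires is not stated, so the interpreter is a parameter.

```python
import subprocess
import sys


def build_query_command(query_file,
                        coding_db="css_coding.vcf.gz",
                        noncoding_db="css_noncoding.vcf.gz",
                        output=None,
                        verbose=False,
                        interpreter=sys.executable):
    """Assemble the argument list for cscape_somatic_query.py.

    Hypothetical helper for illustration only; it uses just the
    options listed in the script's help text.
    """
    cmd = [interpreter, "cscape_somatic_query.py", query_file,
           "-c", coding_db, "-n", noncoding_db]
    if output is not None:
        cmd += ["-o", output]   # otherwise results go to stdout
    if verbose:
        cmd.append("-v")
    return cmd


def run_query(query_file, **options):
    """Run the query script; assumes the script, the databases and
    tabix are all available in the current environment."""
    subprocess.run(build_query_command(query_file, **options), check=True)
```

A call such as `run_query("examples.csv", output="predictions.txt")` would then write the predictions to a file instead of stdout.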

Docker Hub:

Training and test data, along with Python scripts for running LOCO-CV and ICGC tests, are also available as a Docker Hub container:
docker run -it mr13541/somatic:1.0
