M.F. Rogers, T.R. Gaunt, C. Campbell (2020). CScape-somatic: distinguishing driver and passenger point mutations in the cancer genome. Bioinformatics
Our software accepts comma-separated mutation data in the following format:
Chromosome, Position, Reference Base, Mutant Base
11,219046,A,C
11,224139,A,T
11,375885,G,T
11,408898,A,T
11,499190,G,C
11,551832,C,A
11,607532,C,T
11,773638,A,T
11,800755,C,A
11,828599,C,G
11,988551,G,C
11,1025084,C,G
11,1027680,C,A
17,46827903,A,G
17,79060569,A,G
18,756761,C,A
18,3879501,C,A
19,407408,G,T
19,407519,G,C
19,407627,G,A
19,757693,C,A
19,757792,G,T
19,812882,G,T
2,45966,C,A
20,9048655,A,G
20,9923941,A,G
20,18479366,A,G
20,53170414,T,C
3,48265219,A,G
3,52848428,A,G
3,66659209,A,G
3,184195375,A,G
7,193598,C,T
9,916799,C,T
9,3324019,A,T
9,5050791,G,T
9,5077554,C,T
9,6013277,T,A
9,6550908,C,A
9,6554763,C,A
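The four-field records above are simple enough to check before submitting a query. The following is a minimal sketch of such a check, not part of the CScape-somatic distribution. The names `Query`, `parse_query_line` and `parse_queries` are hypothetical, and the validation rules (bases limited to A, C, G and T; reference differing from mutant) are assumptions, not documented requirements.

```python
from typing import List, NamedTuple

# Assumption: only the four unambiguous DNA bases are accepted.
VALID_BASES = {"A", "C", "G", "T"}


class Query(NamedTuple):
    chromosome: str
    position: int
    reference: str
    mutant: str


def parse_query_line(line: str) -> Query:
    """Parse one 'chromosome,position,reference,mutant' record."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 comma-separated fields: {line!r}")
    chromosome, position, reference, mutant = fields
    if not position.isdigit():
        raise ValueError(f"position is not a positive integer: {line!r}")
    reference, mutant = reference.upper(), mutant.upper()
    if reference not in VALID_BASES or mutant not in VALID_BASES:
        raise ValueError(f"bases must be one of A, C, G, T: {line!r}")
    if reference == mutant:
        # Assumption: a record with identical bases is not a mutation.
        raise ValueError(f"reference and mutant bases are identical: {line!r}")
    return Query(chromosome, int(position), reference, mutant)


def parse_queries(text: str) -> List[Query]:
    """Parse a block of records, one per line, skipping blank lines."""
    return [parse_query_line(line) for line in text.splitlines() if line.strip()]
```

Calling `parse_queries` on the example block above would return one `Query` per line, with the chromosome kept as a string and the position converted to an integer.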
A second layout adds an identifier column:
Chromosome, Position, Identifier, Reference Base, Mutant Base
CScape-somatic is designed to discriminate between cancer driver
mutations that occur early in a tumour's development, and passenger
variants that accumulate after a tumour starts to grow and metastasis begins.
Python query script (7.4KB)
Predictions are given as p-scores, which are probability estimates in the range [0, 1]: values above 0.5 are
predicted to be cancer drivers, while those below 0.5 are predicted to be passenger variants.
P-scores close to the extremes (0 or 1) are the highest-confidence predictions
and yield the highest accuracy.
We also report cautious classifications, using the thresholds
that yield the highest possible accuracy (see our paper for details):
0.89 or above for coding SNVs and 0.70 or above for noncoding SNVs.
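The thresholds described above can be applied to a p-score as in the following minimal sketch. It is an illustration, not the authors' code. The function name `classify`, the returned dictionary layout and the "undetermined" label for a p-score of exactly 0.5 are assumptions; the text does not say how a score of exactly 0.5 is treated.

```python
# Default decision boundary and cautious thresholds as stated in the text.
DRIVER_THRESHOLD = 0.5
CAUTIOUS_THRESHOLDS = {"coding": 0.89, "noncoding": 0.70}


def classify(p_score: float, region: str) -> dict:
    """Label a p-score as driver or passenger and flag cautious driver calls.

    region must be "coding" or "noncoding".
    """
    if not 0.0 <= p_score <= 1.0:
        raise ValueError(f"p-score must lie in [0, 1], got {p_score}")
    if region not in CAUTIOUS_THRESHOLDS:
        raise ValueError(f"region must be 'coding' or 'noncoding', got {region!r}")

    if p_score > DRIVER_THRESHOLD:
        label = "driver"
    elif p_score < DRIVER_THRESHOLD:
        label = "passenger"
    else:
        # Assumption: the source does not define the outcome at exactly 0.5.
        label = "undetermined"

    # A cautious driver call needs the region-specific threshold "or above".
    cautious_driver = p_score >= CAUTIOUS_THRESHOLDS[region]
    return {"label": label, "cautious_driver": cautious_driver}
```

Under these assumptions, a p-score of 0.75 is a driver prediction in either region, but it counts as a cautious call only for a noncoding SNV.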
We use distinct predictors for positions in coding regions (positions
within coding-sequence exons) and in non-coding regions (positions in intergenic
regions, introns or non-coding genes).
Downloads:
Please note: some files will not be available for download until the CScape-somatic
method has passed peer review.
To run CScape-somatic queries locally, download the following files
and run the query script as outlined below. Please note that you must
have tabix installed to run the script.
Examples for testing script (0.5KB)
css_coding.vcf.gz (632MB)
css_coding.vcf.gz.tbi (631KB)
css_noncoding.vcf.gz (46GB)
css_noncoding.vcf.gz.tbi (2.3MB)
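Because each database ships with a .tbi index, a single position can also be spot-checked by hand with the tabix command-line tool. The sketch below shows one way to do that from Python. It is an assumption-laden illustration, not a description of what the query script does internally: the helper names `tabix_region` and `fetch_records` are hypothetical, the chromosome naming used inside the databases is not documented here, and the returned lines are left unparsed because their column layout is not described on this page.

```python
import subprocess
from typing import List


def tabix_region(chromosome: str, position: int) -> str:
    """Build a single-base tabix region string such as '11:219046-219046'."""
    if position < 1:
        raise ValueError("positions are 1-based and must be positive")
    return f"{chromosome}:{position}-{position}"


def fetch_records(database: str, chromosome: str, position: int) -> List[str]:
    """Return the raw database lines overlapping one position.

    Requires the tabix executable on PATH and the matching .tbi index
    next to the database file.
    """
    result = subprocess.run(
        ["tabix", database, tabix_region(chromosome, position)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tabix fails
    )
    return result.stdout.splitlines()
```

For example, `fetch_records("css_coding.vcf.gz", "11", 219046)` would return whatever lines the coding database holds for that position, or an empty list if there are none.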
Usage: cscape_somatic_query.py query-file [options]

Predict the oncogenic potential of single nucleotide variants (SNVs). The query
file must be a list of queries that use the following format:
    chromosome,position,reference,mutant

Example:
    1,69094,G,A
    11,168961,T,A
    18,119888,G,A

Options:
  -h, --help  show this help message and exit
  -c CDB      CScape coding database [default: css_coding.bed.gz]
  -n NDB      CScape noncoding database [default: css_noncoding.bed.gz]
  -o OUTPUT   Output file [default: stdout]
  -v          Verbose mode [default: False]
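The options above can be assembled into a command line programmatically, which is convenient when running many query files. The following is a minimal sketch that uses only the flags listed in the help text. The function names `build_query_command` and `run_query`, and the `interpreter` default of "python", are assumptions. Note that the downloads are named css_coding.vcf.gz and css_noncoding.vcf.gz while the help text lists different default file names, so passing `-c` and `-n` explicitly may be the safer choice.

```python
import subprocess
from typing import List, Optional


def build_query_command(
    query_file: str,
    coding_db: Optional[str] = None,
    noncoding_db: Optional[str] = None,
    output: Optional[str] = None,
    verbose: bool = False,
    script: str = "cscape_somatic_query.py",
    interpreter: str = "python",  # assumption: adjust to your Python install
) -> List[str]:
    """Build the argument list for the query script from the documented flags."""
    command = [interpreter, script, query_file]
    if coding_db is not None:
        command += ["-c", coding_db]
    if noncoding_db is not None:
        command += ["-n", noncoding_db]
    if output is not None:
        command += ["-o", output]
    if verbose:
        command.append("-v")
    return command


def run_query(command: List[str]) -> None:
    """Run the assembled command, raising CalledProcessError on failure."""
    subprocess.run(command, check=True)
```

A typical call under these assumptions would be `run_query(build_query_command("queries.csv", coding_db="css_coding.vcf.gz", noncoding_db="css_noncoding.vcf.gz", output="predictions.txt"))`, where queries.csv and predictions.txt are placeholder file names.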
Docker Hub:
Training and test data, along with Python scripts for running LOCO-CV and ICGC tests, are also available as a Docker Hub container:
docker run -it mr13541/somatic:1.0