DoCM, the Database of Curated Mutations, is a highly curated database of known, disease-causing mutations that provides easily explorable variant lists with direct links to source citations for easy verification.

DoCM Principles

  • Highly curated lists of disease-causing mutations enable researchers and clinicians to foster collaboration and understand the current state of the art of pathogenic variation.
  • A centralized public repository of pathogenic mutations allows for more comprehensive curation of the literature and reduces duplication of curation effort.
  • To promote DoCM's utility as well as collaboration between clinicians, researchers, and industry, DoCM's content is openly licensed under a Creative Commons license (CC BY 4.0), requiring only that attribution be given to the community that created the content.
  • No fees or exclusive access will be introduced.
  • DoCM will preserve previous versions of the database, allowing access to static snapshots for the development of "locked down" assays.

DoCM Purpose

Curating the literature to produce a high-quality set of pathogenic somatic mutations is not trivial. The body of cancer research literature keeps growing (6% annual growth rate over the last 10 years), with about 160,000 cancer-related articles indexed by PubMed in 2015. This volume of literature makes it difficult to identify bona fide somatic mutations with characterized functional or clinical significance in cancer. Once identified, these mutations require significant curation effort to format and standardize them consistently enough to store in a database. For example, publications often specify only the gene name and amino acid change to describe a mutation, rather than its genomic coordinates and nucleotide change. DoCM addresses these challenges by acting as an accessible, open-source, and openly licensed repository that aggregates somatic mutations from other curated resources and the literature through community contributions.
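To illustrate the standardization problem, here is a minimal Python sketch of what a curator can and cannot recover when a publication reports only a gene name and an amino acid change. The record layout, field names, and helper functions are hypothetical and chosen for illustration; they are not DoCM's actual schema or tooling. The parser also handles only simple single-residue substitutions written in one-letter code (for example "V600E" or "p.V600E").

```python
import re
from typing import List, NamedTuple, Optional


class VariantRecord(NamedTuple):
    """Hypothetical record layout (an assumption, not DoCM's schema)."""
    gene: str
    ref_aa: str
    position: int
    alt_aa: str
    # Genomic details that a gene name plus amino acid change do not provide.
    chromosome: Optional[str] = None
    start: Optional[int] = None
    reference: Optional[str] = None
    variant: Optional[str] = None


# One-letter substitution such as "V600E", optionally prefixed with "p.".
# "*" is accepted as the new residue to represent a stop codon.
_SUBSTITUTION = re.compile(r"^(?:p\.)?([A-Z])(\d+)([A-Z*])$")


def parse_protein_change(gene: str, change: str) -> Optional[VariantRecord]:
    """Parse a simple substitution; return None if the text is not recognized."""
    match = _SUBSTITUTION.match(change.strip())
    if match is None:
        return None
    ref_aa, position, alt_aa = match.groups()
    return VariantRecord(gene.strip(), ref_aa, int(position), alt_aa)


def missing_fields(record: VariantRecord) -> List[str]:
    """List the fields a curator would still have to fill in by hand."""
    return [name for name, value in record._asdict().items() if value is None]


if __name__ == "__main__":
    record = parse_protein_change("BRAF", "p.V600E")
    print(record)
    print("Still needs curation:", missing_fields(record))
```

Even in this simple case, the parsed record leaves every genomic field empty, which is the gap that manual curation has to close before a mutation can be stored consistently.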

Database Versions

Version Number    Variant Count    Disease Count
1                 386              49
2                 708              64
2.1               750              76
2.2               753              78
3                 1340             114
3.1               1361             119
3.2               1364             122

Database Summary (current version)

DoCM Feature       Type                        Count
Variant Types      SNV                         1302
                   DNV                         27
                   DEL                         27
Variant Effects    missense                    1276
                   stop_lost                   45
                   frameshift                  18
                   inframe                     15
                   start_lost                  5
                   synonymous                  4
                   protein_altering_variant    1
Cancer Subtypes                                122
Genes                                          132
Transcripts                                    184
Publications                                   876


A correspondence describing DoCM has been published in Nature Methods: DoCM: a database of curated mutations in cancer. Nature Methods (2016) doi:10.1038/nmeth.4000.


Ben Ainscough
Genome Institute profile

Malachi Griffith, PhD
Genome Institute profile

Obi Griffith, PhD
Genome Institute profile


DoCM by The McDonnell Genome Institute at Washington University School of Medicine is licensed under a Creative Commons Attribution 4.0 International License. Questions? Comments? Concerns? You can contact us here.