# METABRIC

## Necessary Disclaimers and Legal

The user is responsible for reviewing and complying with the license requirements of the data referenced in this documentation.

## Citations

The data in this project were downloaded from cBioPortal (study ID: [brca\_metabric](https://www.cbioportal.org/study/summary?id=brca_metabric)).

This dataset is described in the publication [The somatic mutation profiles of 2,433 breast cancers refines their genomic and transcriptomic landscapes](https://pmc.ncbi.nlm.nih.gov/articles/PMC4866047/).

## Data Availability

The tarball brca\_metabric.tar.gz was extracted into a folder named [brca\_metabric](https://platform.dnanexus.com/panx/projects/J3Bq2K80BQBp4F05q29YG4xP/data/brca_metabric). This folder includes all clinical information related to the study.

File [brca\_metabric\_clinical\_data.tsv](https://platform.dnanexus.com/panx/projects/J3Bq2K80BQBp4F05q29YG4xP/data/?name=brca_metabric_clinical_data.tsv) contains clinical data for all patients/samples in the study, downloaded directly from cBioPortal.
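A minimal sketch for indexing the clinical table by sample, using only the Python standard library. It assumes the file is tab-separated with a single header row and a column named `Sample ID` (typical of cBioPortal clinical exports, but check the actual header). Lines starting with `#` are skipped in case metadata comments are present. The function name `load_clinical_by_sample` is hypothetical.

```python
import csv
from typing import Dict, Iterable


def load_clinical_by_sample(
    lines: Iterable[str], id_column: str = "Sample ID"
) -> Dict[str, Dict[str, str]]:
    """Index a tab-separated clinical table by sample ID.

    `id_column` is an assumed header name; adjust it to match the real file.
    """
    # Skip "#" metadata lines (if any) that precede the header row.
    data_lines = (line for line in lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    if reader.fieldnames is None or id_column not in reader.fieldnames:
        raise KeyError(f"column {id_column!r} not found in header")
    table: Dict[str, Dict[str, str]] = {}
    for row in reader:
        # One row per sample is expected; a repeated ID overwrites the earlier row.
        table[row[id_column]] = row
    return table
```

An open file handle (for example `open(path, newline="")`) can be passed directly. All values are returned as raw strings, so numeric fields still need explicit conversion.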

Files in folder [METABRIC\_methylation\_profiles](https://platform.dnanexus.com/panx/projects/J3Bq2K80BQBp4F05q29YG4xP/data/METABRIC_methylation_profiles) were downloaded from the [GitHub repository](https://github.com/cclab-brca/METABRIC_methylation_profiles):

* promoter\_avg\_meth\_raw.csv: Average methylation of promoters, where promoters are defined and filtered as follows:
    1. Promoter region: 500 bp upstream to 50 bp downstream of the transcription start site (TSS).
    2. Only promoters with coverage ≥ 20 in at least 70% of the tumor samples and at least 70% of the normal samples are included.

* promoter\_avg\_meth\_norm.csv: The promoter methylation values of promoter\_avg\_meth\_raw.csv after tumor microenvironment (TME) normalisation using Methylayer.
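The promoter definition and coverage filter described for promoter\_avg\_meth\_raw.csv can be sketched as follows. This is a hypothetical illustration, not the code used to build the published files: it assumes 1-based inclusive coordinates, reads both 70% thresholds as "at least 70%", and takes per-sample coverage values as plain integer lists.

```python
from typing import Sequence, Tuple

UPSTREAM_BP = 500     # from the promoter definition
DOWNSTREAM_BP = 50
MIN_COVERAGE = 20
MIN_FRACTION = 0.70   # assumed to mean "at least 70%"


def promoter_window(tss: int, strand: str) -> Tuple[int, int]:
    """Return (start, end) of the promoter, 1-based inclusive (assumed convention)."""
    if strand == "+":
        start, end = tss - UPSTREAM_BP, tss + DOWNSTREAM_BP
    elif strand == "-":
        # On the minus strand, "upstream" means higher genomic coordinates.
        start, end = tss - DOWNSTREAM_BP, tss + UPSTREAM_BP
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    return max(start, 1), end  # clamp at the chromosome start


def fraction_covered(coverages: Sequence[int], min_cov: int = MIN_COVERAGE) -> float:
    """Fraction of samples whose coverage reaches min_cov."""
    if not coverages:
        return 0.0
    return sum(c >= min_cov for c in coverages) / len(coverages)


def passes_coverage_filter(
    tumor_cov: Sequence[int], normal_cov: Sequence[int]
) -> bool:
    """Keep a promoter only if both tumor and normal groups meet the threshold."""
    return (
        fraction_covered(tumor_cov) >= MIN_FRACTION
        and fraction_covered(normal_cov) >= MIN_FRACTION
    )
```

Keeping the strand handling explicit avoids the common mistake of always subtracting the upstream distance, which places minus-strand promoters inside the gene body.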

Files in folder [mutationalProfiles](https://platform.dnanexus.com/panx/projects/J3Bq2K80BQBp4F05q29YG4xP/data/mutationalProfiles) were downloaded from the [GitHub repository](https://github.com/cclab-brca/mutationalProfiles/tree/master/Data):

* somaticMutations.txt: Non-silent mutations for 2,433 primary breast tumours (chr = chromosome; vaf = variant allele fraction). Codon changes are not described for insertion/deletion events.
* somaticMutations\_incNC.txt: As above, but also includes silent and non-exonic mutations. Mutations defined as being in ncRNAs (as determined by ASCAT) have locations = 'other'.
* ascatSegments.txt: Allele-specific copy number as determined using ASCAT (Van Loo et al. 2010). chr = chromosome; nMajor = copies of the major allele; nMinor = copies of the minor allele; purity = aberrant cell fraction.
* ascatSegments\_withoutCNVs.txt: As ascatSegments.txt (ASCAT allele-specific copy number, Van Loo et al. 2010), but with CNVs (as defined in Curtis et al. 2012) removed as described in Pereira et al. (2016). Columns are the same as in ascatSegments.txt.
* patientData.txt: Minimal dataset required to reproduce the analyses in the main publication. ER and HER2 status were determined by IHC and gene expression (see the publication). Complete clinical data for the METABRIC dataset are available at cBioPortal.
* tumorIdMap.txt: ID mapping file that matches samples in the published study to those from the previous METABRIC studies.
* targetedIntervals.bed: Intervals targeted for sequencing in the study. A padding of 50 bp around these intervals was used for variant calling.
* allSamples.gistic\_curated.zip: GISTIC peaks obtained from segmented copy number data processed with ASCAT. Peaks were curated manually; comments are included in the file.
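The nMajor/nMinor columns of the ASCAT segment files can be summarized per segment as below. This sketch assumes tab-separated files with a header row containing `nMajor` and `nMinor`, and compares total copy number against a fixed diploid baseline of 2. ASCAT estimates tumour ploidy per sample, so a ploidy-aware baseline may be more appropriate. The state labels and function names are illustrative, not part of the dataset.

```python
import csv
from collections import Counter
from typing import Iterable


def classify_segment(n_major: int, n_minor: int, baseline: int = 2) -> str:
    """Label one ASCAT segment from its allele-specific copy numbers."""
    total = n_major + n_minor
    if total == 0:
        return "homozygous deletion"
    if n_minor == 0:
        # Only one parental allele remains: copy-loss, copy-neutral or copy-gain LOH.
        return "LOH"
    if total > baseline:
        return "gain"
    if total < baseline:
        return "loss"
    return "neutral"


def count_segment_states(lines: Iterable[str]) -> Counter:
    """Count segment labels in a tab-separated table with nMajor/nMinor columns."""
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(
        classify_segment(int(row["nMajor"]), int(row["nMinor"])) for row in reader
    )
```

LOH is checked before gain, so a segment such as nMajor = 3, nMinor = 0 is reported as LOH rather than as a gain.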
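The 50 bp padding mentioned for targetedIntervals.bed can be applied as below to check whether a mutation falls inside the targeted region. The sketch assumes standard BED semantics (0-based, half-open intervals), 1-based mutation positions, and identical chromosome names in the BED file and the mutation table; verify all three against the actual files. Overlapping padded intervals are merged so that each lookup is a binary search.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

PADDING_BP = 50  # padding stated for variant calling

Intervals = Dict[str, List[Tuple[int, int]]]


def load_padded_intervals(lines: Iterable[str], padding: int = PADDING_BP) -> Intervals:
    """Read BED lines (0-based, half-open), pad each interval and merge overlaps."""
    raw: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        raw[chrom].append((max(int(start) - padding, 0), int(end) + padding))
    merged: Intervals = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        out = [ivs[0]]
        for start, end in ivs[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:  # overlapping or touching: extend previous interval
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def is_targeted(intervals: Intervals, chrom: str, pos_1based: int) -> bool:
    """True if a 1-based position lies inside a padded target interval."""
    ivs = intervals.get(chrom, [])
    pos0 = pos_1based - 1  # convert to 0-based to match BED
    starts = [s for s, _ in ivs]  # could be cached when doing many lookups
    i = bisect_right(starts, pos0) - 1
    return i >= 0 and pos0 < ivs[i][1]
```

Mixing 0-based BED coordinates with 1-based mutation positions is an easy source of off-by-one errors at interval edges, which is why the conversion is done in one place inside `is_targeted`.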

File [ncomms11479-s4.xlsx](https://platform.dnanexus.com/panx/projects/J3Bq2K80BQBp4F05q29YG4xP/data/?name=ncomms11479-s4.xlsx) was downloaded from the supplementary data of the publication ["The somatic mutation profiles of 2,433 breast cancers refines their genomic and transcriptomic landscapes"](https://pmc.ncbi.nlm.nih.gov/articles/PMC4866047/) (PMID: 27161491). It contains the mutation matrix across all genes and samples, where NA denotes no coding mutation. For in-frame indels and missense SNVs, recurrent and non-recurrent events are distinguished.
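A sketch for computing per-gene mutation frequencies from the mutation matrix, once the spreadsheet has been read into rows of strings (for example with pandas or openpyxl, or after exporting the sheet to a text format). It assumes genes are rows and samples are columns, with the gene name in the first column; check the actual sheet layout before use. Only the literal string `NA` is treated as "no coding mutation". If the file is loaded with pandas, note that `NA` cells are parsed as missing values by default, so they would need separate handling (for example by passing `keep_default_na=False`). The function name is hypothetical.

```python
from typing import Dict, Sequence

NO_MUTATION = "NA"  # per the file description: NA = no coding mutation


def mutation_frequency(
    header: Sequence[str], rows: Sequence[Sequence[str]]
) -> Dict[str, float]:
    """Fraction of samples carrying any coding mutation, per gene.

    Assumed layout: header = [gene label, sample_1, ..., sample_n];
    each row = [gene, call_1, ..., call_n].
    """
    n_samples = len(header) - 1
    if n_samples <= 0:
        raise ValueError("header must contain at least one sample column")
    freq: Dict[str, float] = {}
    for row in rows:
        gene, calls = row[0], row[1:]
        if len(calls) != n_samples:
            raise ValueError(
                f"row for {gene!r} has {len(calls)} calls, expected {n_samples}"
            )
        # Any value other than NA counts as a mutated sample.
        mutated = sum(1 for call in calls if str(call).strip() != NO_MUTATION)
        freq[gene] = mutated / n_samples
    return freq
```

Because recurrent and non-recurrent events are encoded in the cell values, the same loop could be extended to tally each category separately once the exact labels used in the sheet are confirmed.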
