# AURORA Retrospective

#### Necessary Disclaimers and Legal

The user is responsible for reviewing and complying with the license requirements of the data referenced in this documentation.

## Citations

The data in this project were downloaded from the Gene Expression Omnibus (GEO) and come from the retrospective phase of the [AURORA clinical study](https://auroraus.org/).

This dataset is described in the publication [Multiomics in primary and metastatic breast tumors from the AURORA US network finds microenvironment and epigenetic drivers of metastasis](https://www.ncbi.nlm.nih.gov/pubmed?cmd=DetailsSearch\&term=36585450\[PMID]).

## Available datasets

The results published in these GEO datasets include both expression counts from RNA-seq assays and methylation chip results for primary and metastatic breast cancer tumors.

### GSE193103

* [GSE193103](https://platform.dnanexus.com/panx/projects/J2g1gfQ0vKvPQqfz2PQ9XbjF/data/GSE193103) compares the gene expression patterns between primary and metastatic tumors.
* Platforms:
  * GPL11154 Illumina HiSeq 2000 (Homo sapiens)
  * GPL16791 Illumina HiSeq 2500 (Homo sapiens)
* For more information, see the [GEO page](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE193103).

### GSE212375

* [GSE212375](https://platform.dnanexus.com/panx/projects/J2g1gfQ0vKvPQqfz2PQ9XbjF/data/GSE212375) is a GEO SuperSeries composed of two SubSeries: [GSE209998](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE209998) and [GSE212370](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE212370). It includes a comparative gene expression profiling analysis of RNA-seq data and DNA methylation profiles, both from breast primary and matched metastatic tumors. See the sections below for more detailed descriptions of these SubSeries.
* Platforms:
  * GPL16791 Illumina HiSeq 2500 (Homo sapiens)
  * GPL23976 Illumina Infinium HumanMethylation850 BeadChip
* For more information, see the [GEO page](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE212375).
* The folder GSE212375\_RAW in /GSE212375 is extracted from GSE212375\_RAW\.tar and contains the raw IDAT files from the methylation chips in GSE212370.

### GSE209998

* [GSE209998](https://platform.dnanexus.com/panx/projects/J2g1gfQ0vKvPQqfz2PQ9XbjF/data/GSE209998) is a comparative gene expression profiling analysis of RNA-seq data of breast primary and matched metastatic tumors.
* Platform: GPL16791 Illumina HiSeq 2500 (Homo sapiens)
* For more information, see the [GEO page](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE209998).

### GSE212370

* [GSE212370](https://platform.dnanexus.com/panx/projects/J2g1gfQ0vKvPQqfz2PQ9XbjF/data/GSE212370) contains DNA methylation profiles of breast primary and matched metastatic tumors, analyzed using the Infinium MethylationEPIC Array.
* Platform: GPL23976 Illumina Infinium HumanMethylation850 BeadChip
* For more information, see the [GEO page](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE212370).

## Metadata file formats

SOFT formatted family files are text files that incorporate complete data and metadata for all Platform, Sample, and Series records in the family.
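
As a rough illustration of how the metadata in a SOFT file can be read, the sketch below relies on the SOFT line prefixes: `^` starts an entity (Platform, Sample, or Series) and `!` marks an attribute of that entity. The function name `parse_soft_metadata` and the placeholder accessions in the example text are hypothetical, and the sketch skips data table rows rather than parsing them.

```python
from collections import defaultdict


def parse_soft_metadata(lines):
    """Group '!' attribute lines under the most recent '^' entity line."""
    entities = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("^"):
            # Entity header, e.g. "^SAMPLE = GSM0000001"
            kind, _, name = line[1:].partition("=")
            current = (kind.strip(), name.strip())
            entities[current] = defaultdict(list)
        elif line.startswith("!") and current is not None:
            # Attribute line, e.g. "!Sample_title = tumor A"
            key, sep, value = line[1:].partition("=")
            if sep:  # table markers such as "!sample_table_begin" have no "="
                entities[current][key.strip()].append(value.strip())
        # Remaining lines ('#' column descriptions, table rows) are skipped.
    return entities


# Made-up example text; the accessions and values are placeholders.
example = """\
^SERIES = GSE0000000
!Series_title = Placeholder series
^SAMPLE = GSM0000001
!Sample_title = tumor A
!Sample_characteristics_ch1 = tissue: breast
!Sample_characteristics_ch1 = stage: primary
!sample_table_begin
ID_REF\tVALUE
gene1\t1.5
!sample_table_end
""".splitlines()

metadata = parse_soft_metadata(example)
print(metadata[("SAMPLE", "GSM0000001")]["Sample_title"])
```

Attributes are stored as lists because a key such as `Sample_characteristics_ch1` can repeat within one Sample record.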

MINiML formatted family files are XML files that incorporate complete data and metadata for all Platform, Sample, and Series records in the family.
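
Because MINiML is XML, it can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` and assumes that each Sample record is a `Sample` element with an `iid` attribute and a `Title` child; check this layout against an actual file before relying on it. The helper names and the example document are hypothetical, and tag matching ignores the XML namespace so the code does not depend on a specific namespace URI.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def sample_titles(xml_text):
    """Map each Sample's 'iid' attribute to the text of its Title child."""
    root = ET.fromstring(xml_text)
    titles = {}
    for elem in root.iter():
        if local_name(elem.tag) != "Sample":
            continue
        title = next(
            (c.text.strip() for c in elem
             if local_name(c.tag) == "Title" and c.text),
            None,
        )
        titles[elem.get("iid")] = title
    return titles


# Made-up example document; accessions and titles are placeholders.
example = """\
<MINiML xmlns="http://www.ncbi.nlm.nih.gov/geo/info/MINiML">
  <Sample iid="GSM0000001"><Title>tumor A</Title></Sample>
  <Sample iid="GSM0000002"><Title>tumor B</Title></Sample>
</MINiML>
"""

titles = sample_titles(example)
print(titles)
```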

Series Matrix files are text files that include a tab-delimited value-matrix table generated from the "VALUE" column of each Sample, headed by Sample and Series metadata. The \*.series\_matrix.csv files in this project are converted from the corresponding \*series\_matrix.txt files.
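
One way such a conversion could be done is sketched below. It is not necessarily how the CSV files in this project were produced: it assumes the value matrix sits between the `!series_matrix_table_begin` and `!series_matrix_table_end` marker lines, writes only that table, and drops the `!`-prefixed metadata header. The function name `series_matrix_to_csv` and the example content are hypothetical.

```python
import csv
import io


def series_matrix_to_csv(lines, out):
    """Write the tab-delimited value matrix to `out` as CSV; return row count."""
    writer = csv.writer(out)
    in_table = False
    rows_written = 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("!series_matrix_table_begin"):
            in_table = True
        elif line.startswith("!series_matrix_table_end"):
            in_table = False
        elif in_table and line:
            # Fields are tab-separated; strip the surrounding double quotes.
            writer.writerow(field.strip('"') for field in line.split("\t"))
            rows_written += 1
    return rows_written


# Made-up example content; accessions and values are placeholders.
example = [
    '!Series_title\t"Placeholder series"\n',
    '!Sample_title\t"tumor A"\t"tumor B"\n',
    "!series_matrix_table_begin\n",
    '"ID_REF"\t"GSM0000001"\t"GSM0000002"\n',
    '"gene1"\t1.5\t2.5\n',
    "!series_matrix_table_end\n",
]

buffer = io.StringIO()
count = series_matrix_to_csv(example, buffer)
print(buffer.getvalue())
```

To convert a real file, the same function can be given an open text file handle for `lines` and a file opened with `newline=""` for `out`.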
