Who do I contact if I need help?
For questions regarding data management, community affairs, general DCC questions:
- Christina Conrad, Biomedical Data Manager, Schedule Meeting
For questions regarding bioinformatics, data model and annotations, data upload & technical issues:
- Anh Nguyet Vu, Senior Biomedical Data Manager, Schedule Meeting
For questions regarding working groups, data model and annotations:
- Elvira Mitraka, Associate Director of SCCE, Email, include "working groups" in subject for correct routing
For questions regarding data use/transfer agreements, other data governance:
- Kimberly Corrigan, Governance Analyst, Email, include "governance" in subject for correct routing
Why should I share data?
- Engage and develop connections with the community
- Have a remote repository of data for future researchers
- Achieve visibility of your research
- Save time and advance scientific discovery
- Meet the open-access data requirements of many grants and journals
Where will my data be stored?
Data is stored on Synapse. There, you can organize your data by specific assays. For more info, see Submitting Data.
Should I wait until a paper is published before sharing data?
No, you do not have to wait. You may upload your data now and keep it private until your paper is released, if you prefer.
What is cBioPortal?
cBioPortal is an open-source interactive platform for visualizing molecular and clinical attributes. For certain data sets on Synapse, visualization will be available on cBioPortal.
What kind of data should be shared?
Omics data, imaging data, clinical data, or other types of data that are important to the experiment should be shared along with protocols to replicate those experiments. If you are unsure, please feel free to contact the DCC.
Can I use Synapse/Sage Bionetworks resources to fulfill the NIH data sharing plan requirements?
Yes, we are happy to help you work on a data sharing plan that will fulfill the NIH requirements.
What is a data model?
A data model organizes data elements and standardizes how the data elements relate to one another. It explicitly determines the structure of the data. -- Princeton University
Where does the Gray Foundation data model come from?
The Gray Foundation’s data model is derived from several data standards, such as the Genomic Data Commons, but has also been adapted to fit the needs of the consortium. It outlines, defines, and standardizes how data such as clinical data are represented and how they relate to one another, e.g. a patient has a diagnosis and receives therapy. One of the most important relations is that of clinical data to generated data: in the Gray Foundation, most generated data are human data and need to be tied to the original patients for useful analysis.
The section Clinical Data explains what clinical data are prioritized.
In the data model, attributes are grouped into “components” or “modules”, e.g. patient-related attributes such as age, sex, etc., are in a patient core component.
Attributes appear as columnar fields in a table when collecting data. They may be required or optional and may have controlled terminologies for the values.
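As a rough illustration of the ideas above, the sketch below models a component as a group of attributes, each of which may be required and may restrict its values to a controlled list. This is not the Gray Foundation data model itself: the class names, the attribute names (patient_id, age, sex), and the allowed values are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Attribute:
    """One column in a metadata table (hypothetical structure)."""
    name: str
    required: bool = False
    # None means any value is accepted; otherwise the value must be in this set.
    valid_values: Optional[frozenset] = None


@dataclass
class Component:
    """A named group of attributes, e.g. a patient core component."""
    name: str
    attributes: list = field(default_factory=list)

    def validate(self, record: dict) -> list:
        """Return a list of human-readable problems found in one record."""
        problems = []
        for attr in self.attributes:
            value = record.get(attr.name)
            if value is None or value == "":
                if attr.required:
                    problems.append(f"missing required attribute '{attr.name}'")
                continue
            if attr.valid_values is not None and value not in attr.valid_values:
                problems.append(f"'{value}' is not an allowed value for '{attr.name}'")
        return problems


# Illustrative component; attribute names and allowed values are assumptions.
patient = Component(
    name="Patient",
    attributes=[
        Attribute("patient_id", required=True),
        Attribute("age"),
        Attribute("sex", required=True,
                  valid_values=frozenset({"Female", "Male", "Unknown"})),
    ],
)

# A complete record passes; an incomplete one reports each problem.
print(patient.validate({"patient_id": "P001", "age": 52, "sex": "Female"}))
print(patient.validate({"age": 47, "sex": "F"}))
```

In this sketch, the second record fails twice: the required patient_id is missing, and "F" is not in the controlled list for sex. The optional age attribute never causes a failure when absent.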
What is metadata?
Metadata is additional, standardized information included alongside the data to give it context—data about the data, if you will. Metadata is what allows data in the portal to be searchable, discoverable, accessible, re-usable, and understandable to others, including those who were not involved in the data generation process. Metadata can be descriptive (i.e., the name of the file), administrative (i.e., provenance information), or research-based (i.e., information about the sampling and handling of data). -- AD Knowledge Portal Glossary
Metadata can also be thought of as "data about data", while clinical data can be thought of as "data about patients". On the Synapse platform, adding metadata to data entities (files) is most often called "annotating", and the metadata itself is interchangeably called "annotations". The Dataset and File Metadata section goes into more detail about which annotations are expected for datasets and different file types.
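To make the "searchable and discoverable" point concrete, here is a minimal sketch that treats annotations as key-value pairs attached to each file and filters files by them. It uses plain Python dictionaries rather than the Synapse API, and the file names, annotation keys (assay, fileFormat, patient_id), and values are hypothetical, not the annotations the portal actually expects.

```python
# Hypothetical files with annotations; keys and values are illustrative only.
files = [
    {"name": "sample_A.fastq",
     "annotations": {"assay": "scRNA", "fileFormat": "fastq", "patient_id": "P001"}},
    {"name": "sample_B.fastq",
     "annotations": {"assay": "scDNA", "fileFormat": "fastq", "patient_id": "P002"}},
    {"name": "mutations.maf",
     "annotations": {"assay": "scDNA", "fileFormat": "maf", "patient_id": "P002"}},
]


def find_files(files, **criteria):
    """Return names of files whose annotations match every key=value criterion."""
    return [
        f["name"]
        for f in files
        if all(f["annotations"].get(key) == value for key, value in criteria.items())
    ]


# Without annotations, only the file name would be available for searching.
print(find_files(files, assay="scDNA"))
print(find_files(files, assay="scDNA", fileFormat="maf"))
```

The patient_id annotation in this sketch also illustrates how a generated data file can be tied back to the patient it came from.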
How do I submit an issue regarding the data model?
For questions, discussions, suggestions, and issues (bugs) regarding the data model, we prefer that members submit an issue at our source repository. Note that this requires a GitHub account. If you do not have a GitHub account, please reach out to one of our DCC staff listed in Contacts.
What does this acronym stand for?
Data Coordinating Center Words
ACL: Access Control List -- a list of users and teams that control the permissions to an entity
AR: Access Requirement or Access Restriction -- a condition for data access that must be met
BAM: Binary Alignment Map
BCR: Biospecimen Core Resource
CNV: Copy Number Variation
DCC: Data Coordinating Center
eRA: Electronic Research Administration
MAGE-TAB: Microarray Gene Expression - Tabular format
PHI: Protected Health Information
TARGET: Therapeutically Applicable Research to Generate Effective Treatments
t-SNE: t-distributed stochastic neighbor embedding
TSV: Tab Separated Values
VCF: Variant Call Format
File Types
csv: Comma Separated Values
fastq: Text-based format for storing both a biological sequence and corresponding quality score
json: JavaScript Object Notation
maf: Mutation Annotation Format
rds: R Data Serialization (a single R object saved to a file)
tsv: Tab Separated Values
txt: Text
xml: Extensible Markup Language
Scientific Assays
CyCIF: Cyclic Immunofluorescence
CyTOF: Cytometry by time of flight
DLP+: DNA transposition single-cell library preparation
FACS: Fluorescence-activated cell sorting
FISH: Fluorescence in Situ Hybridization
H&E: Hematoxylin and eosin stain
IHC: Immunohistochemistry
inferCNV: Inferred copy number variation
scDNA: Single cell DNA sequencing
scRNA: Single cell RNA sequencing
scWGS: Single cell whole genome sequencing
t-CYCIF: Tissue-based cyclic immunofluorescence
TMAs: Tissue Microarrays
Breast Cancer Specific Words
AV: Alveolar Cells
BA: Basal Cells
BL: Borderline (both basal and luminal)
BRCA 1/2: Breast Cancer Type 1, 2
Cas9: Clustered regularly interspaced short palindromic repeats (CRISPR) associated protein 9
CNA: Copy Number Alteration
DCIS: Ductal carcinoma in situ
FFPE: Formalin-fixed, paraffin-embedded
HS: Hormone Sensing Cells
LOH: Loss of Heterozygosity
LP: Luminal Progenitors
LUM: Subset within ER+ mature luminal cells enriched in BRCA2 mutations
MECs: Mammary Epithelial Cells
ML: Mature Luminal
RNAi: RNA interference
ROS: Reactive Oxygen Species
SNV: Single nucleotide variant
TNBC: Triple Negative Breast Cancer
WOO: Window of opportunity clinical trials
WT: Wildtype
Web Applications
API: Application Programming Interface
CI/CD: Continuous integration/continuous delivery
DCA: Data Curator App
HTTP: Hypertext Transfer Protocol
REST: Representational State Transfer
URL: Uniform Resource Locator
UUID: Universally Unique Identifier
Related Research Organizations
GDC: Genomic Data Commons
HTAN: Human Tumor Atlas Network
NCI: National Cancer Institute
NIH: National Institutes of Health
TCGA: The Cancer Genome Atlas