Data Submission

This page describes the process and documentation required to submit data to NIAGADS. Depending on the size of the data, a NIAGADS team member will work with you on data transfer. Contact help@niagads.org to deposit data or if you have any questions.

Required Policy Documents

Please email the following required documents to help@niagads.org in order to deposit and share your data:

  1. Institutional Certification for ADRD Studies that covers all subjects in your study. Multiple certifications may be required.

  2. Signed copy of the NIA AD Genomics Sharing Plan.

  3. Data Registration Template: DSS_Dataset_Registration_Template.doc.docx

All documents related to the application should be provided in English. For institutions where English is not the primary language, please provide translations of documents along with the original document. Translated documents should be signed by the institutional signing official.

Data Submission Checklist

In addition to the documentation above, all data submissions must include the following:

  • md5 checksum for every submitted data file
  • README in plain text (.txt), PDF (.pdf), or Microsoft Word (.doc or .docx)
    • Description of the dataset and concise description of the study design
    • Platform or array
    • Any version information
    • List of included files and formats, and data dictionary
    • Contributor contact information
    • Dataset Reference Genome Build
    • Publications
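
The md5 checksum requirement above can be met with any standard tool. As a minimal sketch in Python, assuming all files to be submitted sit under one directory: the function names and the manifest file name `checksums.md5` are illustrative choices, not a format specified by NIAGADS.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files never load fully into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir, manifest_name="checksums.md5"):
    """Write one '<digest>  <relative path>' line per file under data_dir.

    The manifest name and line layout are assumptions for illustration.
    """
    data_dir = Path(data_dir)
    manifest = data_dir / manifest_name
    lines = []
    for path in sorted(data_dir.rglob("*")):
        # Skip directories and the manifest itself.
        if path.is_file() and path != manifest:
            relative = path.relative_to(data_dir).as_posix()
            lines.append(f"{md5_of_file(path)}  {relative}")
    manifest.write_text("\n".join(lines) + "\n")
    return manifest
```

Calling `write_manifest("my_dataset")` would produce a single text file listing a checksum for every file in the hypothetical `my_dataset` directory, which can then be sent along with the data.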

Additional documentation requirements for each datatype are listed in the sections below. If you do not see your datatype listed, please contact us at help@niagads.org for assistance.

Genotype

  • Phenotype Data File in tab delimited format (including pedigree structures if applicable)
  • APOE Genotypes (if applicable)
  • Genotypes in PLINK or VCF file format
  • Consent level as specified in the Institutional Certification form for each subject
  • List of cohorts included and a description for each

Summary statistics/association results

  • Results files in .txt format

Whole genome or whole exome sequencing

  • Sequencing read data can be submitted in any of the following formats:
    • FASTQ: please save all reads, including those that could not be mapped to the reference genome.
    • BAM: please save all reads, including those that could not be mapped to the reference genome.
    • CRAM: please save all reads, including those that could not be mapped to the reference genome.
    • VCF: standard VCF 4.2 format (we recommend splitting files by chromosome and gzip-compressing them)
  • Sequencing center
  • Sequencer machine
  • Read length
  • Library type: PCR-free or PCR-amplified
  • Kit Name and version
  • Copy of the WES target regions (if applicable)
  • Sequencing quality control metrics
  • Phenotype Data File in tab delimited format (including pedigree structures if applicable)
  • APOE Genotypes (if applicable)
  • Genotypes in PLINK or VCF file format
  • Consent level as specified in the Institutional Certification form for each subject
  • List of cohorts included and a description for each
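
The recommendation above to split VCF files by chromosome and compress them can be sketched as follows. This is a minimal illustration, not a NIAGADS tool: the function name and the output naming scheme (`<chrom>.vcf.gz`) are assumptions, and it writes plain gzip output using only the Python standard library. If indexed files are needed, a dedicated tool that produces block-compressed (bgzip) output would be the usual choice instead.

```python
import gzip
from pathlib import Path


def split_vcf_by_chromosome(vcf_path, out_dir):
    """Write one gzip-compressed VCF per chromosome, each with the full header.

    Returns the sorted list of chromosome names that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Accept either a plain-text or an already gzip-compressed input file.
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    header = []
    handles = {}
    try:
        with opener(vcf_path, "rt") as source:
            for line in source:
                if line.startswith("#"):
                    # Meta-information and column header lines go to every output.
                    header.append(line)
                    continue
                chrom = line.split("\t", 1)[0]
                if chrom not in handles:
                    # Assumes chromosome names are safe to use in file names.
                    handles[chrom] = gzip.open(out_dir / f"{chrom}.vcf.gz", "wt")
                    handles[chrom].writelines(header)
                handles[chrom].write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

The sketch keeps one open output file per chromosome, which is fine for a typical human reference but may need rethinking for assemblies with thousands of contigs.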

RNA-seq or microarray data

Sequencing read data
  • Sequencing read data can be submitted in FASTQ or BAM format. BAM files should contain all reads, including those that could not be mapped to the reference genome
  • Phenotype Data File in tab delimited format (including pedigree structures if applicable)
  • In README File:
    • Sample source (e.g., cell line, tissue, cell types) and organism; provide protocol details if iPSCs/Single cells
    • RNA extraction protocol (e.g. Trizol/chloroform extraction, Qiagen RNeasy kit)
    • RNA integrity (RIN number) per sample
    • Library preparation protocol (e.g., polyA capture, adapters used for ligation, read length and sequencing machine, single-cell platform)
  • Consent level as specified in the Institutional Certification form for each subject
  • List of cohorts included and a description for each
  • OPTIONAL: QC report per sample (e.g., library characteristics (total number of reads, sequencing read length), GC content, % of rRNAs, % of aligned reads, coverage, insert size)
Summary Data
  • Read abundance files can be submitted as summaries in tab-separated file format with explanations
  • Phenotype Data File in tab delimited format (including pedigree structures if applicable)
  • In README File:
    • Sample source and organism; provide protocol details if iPSCs
    • How the raw data was generated and processed (steps needed, e.g., how mapping was done, how multi-mapping was handled)
    • Raw data and library preparation protocol information (e.g., polyA capture, sequencing machine)
    • Unit of quantification in these summary files (e.g., genes, exons, etc.)
    • Annotation source and version (e.g., ENSEMBL version 94)
    • Unit of counts (e.g., raw counts, RPKM values, UMI counts). Please provide details if normalization was performed or if technical variation/batch effects were accounted for
    • Software name and version used to generate those counts
  • OPTIONAL: QC report per sample (e.g., library information (total number of reads, sequencing read length), GC content, % of rRNAs, % of aligned reads, coverage, insert size)
  • OPTIONAL: Sharing the workflow via a code repository (e.g., GitHub, Bitbucket) is highly recommended

Epigenetics studies (e.g., ChIP-seq, ATAC-seq)

Sequencing Read Data
  • Sequencing read data can be submitted in FASTQ or BAM format. Save all reads, including those that could not be mapped to the reference genome. Background samples (input or mock IP samples) must also be included
  • Phenotype Data File in tab delimited format (including pedigree structures if applicable)
  • In README File:
    • Sample source and organism; provide protocol details if iPSCs
    • Library preparation protocol (e.g., adapters used for ligation, read length and sequencing machine)
  • Consent level as specified in the Institutional Certification form for each subject
  • List of cohorts included and a description for each
  • OPTIONAL: QC report per sample (e.g., library size (total number of reads), GC content, % of aligned reads, coverage, insert size)
Summary Data
  • Processed peak files can be submitted in BED format with explanations (including significance of called peaks)
  • Phenotype Data File in tab delimited format (including pedigree structures if applicable)
  • In README File:
    • Sample source and organism; provide protocol details if iPSCs
    • Description of all the BED columns
    • Software name and version used to generate those values, with processing details (e.g., how reads were filtered before calling peaks, whether narrow or broad peaks were called, how the p-value was corrected, if at all)
  • OPTIONAL: QC report per sample (e.g., library size (total number of reads), GC content, % of aligned reads, coverage, insert size)
  • OPTIONAL: Sharing the workflow via a code repository (e.g., GitHub, Bitbucket) is highly recommended

Quantitative trait locus (QTL) analysis summary stats

  • Variant position: chr, start, end
  • Allele information: ref, alt, a1, a2
  • Feature name (e.g. gene name, protein name)
  • P-value and/or Q-value
  • Effect size (beta and beta SE), or Spearman correlation p-value
  • OPTIONAL: Allele frequency or allele count
  • OPTIONAL: Feature location: chr, start, end
  • OPTIONAL: Cis/trans
  • OPTIONAL: in README file:
    • Detailed sample source, molecular trait and organism; provide protocol details if iPSCs
    • Description of all the columns
    • Software name and version used to perform the analyses
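
Before submitting, it can help to check a QTL summary statistics file against the required fields listed above. The sketch below assumes a tab-delimited file with a header row. The column names (`chr`, `start`, `end`, `ref`, `alt`, `a1`, `a2`, `feature`, `pvalue`, `qvalue`, `beta`, `beta_se`, `spearman_pvalue`) are hypothetical, since this page lists the required fields but does not prescribe header names. Adjust them to match your own files.

```python
import csv

# Hypothetical header names for the fields that are always required.
ALWAYS_REQUIRED = ["chr", "start", "end", "ref", "alt", "a1", "a2", "feature"]


def missing_qtl_columns(header):
    """Return a list describing required fields absent from a header row."""
    cols = {name.strip().lower() for name in header}
    problems = [name for name in ALWAYS_REQUIRED if name not in cols]
    # A p-value and/or a q-value must be present.
    if not ({"pvalue", "qvalue"} & cols):
        problems.append("pvalue and/or qvalue")
    # Either beta together with beta SE, or a Spearman correlation p-value.
    if not ({"beta", "beta_se"} <= cols or "spearman_pvalue" in cols):
        problems.append("beta and beta_se, or spearman_pvalue")
    return problems


def check_qtl_file(path):
    """Read the first row of a tab-delimited file and report missing fields."""
    with open(path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"), [])
    return missing_qtl_columns(header)
```

An empty list from `check_qtl_file` means every required field was found under the assumed names. Optional fields such as allele frequency or cis/trans status are not checked.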

For RNA-seq or microarray data (including single-cell data)

Sequencing Read Data
  • Sequencing read data can be submitted in FASTQ or BAM format. If submitting BAM, save all reads, including those that could not be mapped to the reference genome
  • Phenotype Data File in tab-delimited format (including pedigree structures if applicable)
  • In README File:
    • Sample source (e.g., cell line, tissue, cell types) and organism; provide protocol details if iPSCs/Single cells
    • RNA extraction protocol (e.g. Trizol/chloroform extraction, Qiagen RNeasy kit)
    • RNA integrity (RIN number) per sample
    • Library preparation protocol (e.g., polyA capture, adapters used for ligation, read length and sequencing machine, single-cell platform)
  • OPTIONAL: If submitting BAM files, how the data was processed (e.g., how mapping was done, how multi-mapping was handled)
  • OPTIONAL: QC report per sample (e.g., library characteristics (total number of reads, sequencing read length), GC content, % of rRNAs, % of aligned reads, coverage, insert size)
  • OPTIONAL: Sharing the workflow via a code repository (e.g., GitHub, Bitbucket) is highly recommended
Summary/Processed data
  • Read abundance files can be submitted as summaries in tab-separated file format with explanations
  • Phenotype Data File in tab delimited format (including pedigree structures if applicable)
  • In README File:
    • Sample source (e.g., cell line, tissue, cell types) and organism; provide protocol details if iPSCs/Single cells
    • How the raw data was generated and processed (steps needed, e.g., how mapping was done, how multi-mapping was handled)
    • Raw data and library preparation protocol information (e.g., polyA capture, sequencing machine, single cell platform)
    • Unit of quantification in these summary files (e.g., genes, exons, etc.)
    • Annotation source and version (e.g., ENSEMBL version 94)
    • Unit of counts (e.g., raw counts, RPKM values, UMI counts). Provide details if normalization was performed or if technical variation/batch effects were accounted for
    • Software name and version used to generate those counts
  • OPTIONAL: QC report per sample (e.g., library information (total number of reads, sequencing read length), GC content, % of rRNAs, % of aligned reads, coverage, insert size)
  • OPTIONAL: Sharing the workflow via a code repository (e.g., GitHub, Bitbucket) is highly recommended

For epigenetics studies (e.g., ChIP-seq, ATAC-seq) (including single-cell data)

Sequencing Read Data
  • Sequencing read data can be submitted in FASTQ or BAM format. If submitting BAM, save all reads, including those that could not be mapped to the reference genome. Background samples (input or mock IP samples) must also be included
  • Phenotype Data File in TSV (tab delimited) format (including pedigree structures if applicable)
  • In README File:
    • Sample source (e.g., cell line, tissue, cell types) and organism; provide protocol details if iPSCs/Single cells
    • Library preparation protocol (e.g., adapters used for ligation, read length and sequencing machine)
  • OPTIONAL: QC report per sample (e.g., library size (total number of reads), GC content, % of uniquely aligned reads, coverage, insert size)
  • OPTIONAL: Sharing the workflow via a code repository (e.g., GitHub, Bitbucket) is highly recommended
Summary/Processed data
  • Processed peak call files can be submitted in BED format with explanations (including significance of called peaks)
  • For ATAC-seq or similar protocols, fragment files (in BED format) can be submitted
  • Phenotype Data File in TSV (tab delimited) format (including pedigree structures if applicable)
  • In README File:
    • Sample source (e.g., cell line, tissue, cell types) and organism; provide protocol details if iPSCs/Single cells
    • Description of all the BED columns
    • Software name and version used to generate those values, with processing details (e.g., how reads were filtered before calling peaks, whether narrow or broad peaks were called, how the p-value was corrected)
  • OPTIONAL: QC report per sample (e.g., library size (total number of reads), GC content, % of uniquely aligned reads, coverage, insert size)
  • OPTIONAL: Sharing the workflow via a code repository (e.g., GitHub, Bitbucket) is highly recommended

For Methylation data (e.g., methylation array, bisulfite sequencing)

Sequencing Read Data / Raw Methylation Data
  • Sequencing read data can be submitted in FASTQ or BAM format. If submitting BAM, save all reads, including those that could not be mapped to the reference genome
  • Phenotype Data File in TSV (tab delimited) format (including pedigree structures if applicable)
  • In README File:
    • Sample source (e.g., cell line, tissue, cell types) and organism; provide protocol details if iPSCs/Single cells
    • Library preparation protocol (e.g., adapters used for ligation, read length and sequencing machine)
  • OPTIONAL: QC report per sample (e.g., library size (total number of reads), % of uniquely aligned reads, coverage)
  • OPTIONAL: Sharing the workflow via a code repository (e.g., GitHub, Bitbucket) is highly recommended
Summary/Processed data
  • Processed methylation sites/peak call files can be submitted in BED format with explanations (including significance of called peaks)
  • Phenotype Data File in TSV (tab delimited) format (including pedigree structures if applicable)
  • In README File:
    • Sample source (e.g., cell line, tissue, cell types) and organism; provide protocol details if iPSCs/Single cells
    • Description of all the BED columns
    • Software name and version used to generate those values, with processing details (e.g., how reads were filtered before calling peaks, whether narrow or broad peaks were called, how the p-value was corrected)
  • OPTIONAL: QC report per sample (e.g., library size (total number of reads), GC content, % of uniquely aligned reads, coverage, insert size)
  • OPTIONAL: Sharing the workflow via a code repository (e.g., GitHub, Bitbucket) is highly recommended

Proteomics data

Mass Spec Related
  • Files in a standard mass spectrometer output file format (e.g., mzML, mzXML)
  • A matrix of samples against peptide/protein information in txt format
  • Phenotype Data File in TSV (tab delimited) format (including pedigree structures if applicable)
  • In README file:
    • Sample source (e.g., cell line, tissue, cell types) and organism; provide protocol details if iPSCs/Single cells
    • Quantification method (e.g. Label-free: intensity, TMT quantitation analysis)
    • Digestion Method (e.g. In-solution digestion, on-bead digestion)
    • Online LC system (e.g. Agilent 1100- nano LC system, Agilent HPLC 1200 system, Dionex UltiMate 3000)
    • Mass Spectrometer (e.g. LTQ Orbitrap, LTQ Orbitrap Velos, Q Exactive HF)
    • Protease (e.g. Trypsin)
    • Fragmentation method (e.g. CID resonance-type, CID beam-type, high-energy collision-induced dissociation)
    • Peptide identification and annotation; protein annotation information
    • QC/normalization details and steps involved (including outlier detection)
  • OPTIONAL: QC report per sample
Protein Array Data
  • Read abundance files can be submitted as summaries in tab-separated file format with explanations
  • Provide the UniProt ID and name of each target protein measured
  • For SOMAscan, provide SOMAscan RFU values (recommend both raw and processed)
  • For Olink, provide NPX values (recommend both raw and processed)
  • Phenotype Data File in tab delimited format (including pedigree structures if applicable)
  • In README file:
    • Sample source (e.g., cell line, tissue, cell types) and organism; provide protocol details if iPSCs/Single cells
    • How the RAW data was generated (protein array platform, chip version)
    • Unit of quantification in these summary files (e.g., proteins)
    • Annotation source and version (e.g., UniProt version xx)
    • Unit of counts (e.g., raw counts, RPKM values, UMI counts). Please provide details if normalization was performed or if technical variation/batch effects were accounted for
    • Software name and version used to generate those counts
  • OPTIONAL: QC report per sample
  • OPTIONAL: Sharing the workflow via a code repository (e.g., GitHub, Bitbucket) is highly recommended
