NG00118 - AMP-AD SV WGS and SV-xQTL

To access this data, please log into DSS and submit an application.
Within the application, add this dataset (accession NG00118) in the “Choose a Dataset” section.
Once approved, you will be able to log in and access the data within the DARM portal.

Description

Structural variants (SVs) were discovered in 1,760 donors by running a combination of seven different tools to capture the main classes of variation, including deletions (DEL), duplications (DUP), insertions (INS), inversions (INV), mobile element insertions (MEI), and complex rearrangements (CPX). We mapped associations of 25,421 SVs with MAF ≥ 0.01 in the ROS/MAP cohorts to four different molecular phenotypes in the DLPFC. These molecular phenotypes were measured for a partially overlapping set of samples and included gene expression for 15,582 genes (n=456), 110,092 splicing junctions proportions measured by “percent spliced in” values (PSI) (n=505), histone acetylation (H3K9ac) peaks (n=571), and proteomic data for 7,960 proteins (n=272). We refer to these analyses as SV-xQTL, in which we map differences in measurements of each molecular phenotype associated with specific SV’s. Therefore, each SV-xQTL is an SV-phenotype pair (i.e., SV-eQTL, SV-sQTL, SV-haQTL, or SV-pQTL). All phenotype measurements were adjusted prior to associations to account for known (e.g., sex and ancestry principal components) and unknown covariates, determined either with PEER (probabilistic estimation of expression residuals) or PCA (principal component analysis), and the allele alternative to the genome of reference was considered as effect allele. This identified 3,191 SV-eQTL, 2,866 SV-sQTL, 399 SV-pQTL, and 1,454 SV-haQTL (FDR < 0.05). Refer to the Related Publications tab for more details on the dataset.

Sample Summary per Data Type

Sample Set	Accession	Data Type	Number of Samples
AMP-AD WGS - SV Calls	snd10042	WGS	1,369

Available Filesets

Name	Accession	Latest Release	Description
AMP-AD SV Genotyping Data	fsa000041	NG00118.v1	SV Genotyping Data
AMP-AD SV-xQTL Association Summary Statistics	fsa000042	NG00118.v1	SV-xQTL Association Summary Statistics

View the File Manifest for a full list of files released in this dataset.

The AMP-AD WGS sampleset was sequenced on the Illumina HiSeqX sequencer (v2.5 chemistry) and made available from four aging and Alzheimer's disease cohorts: Religious Orders Study (ROS) and Memory and Aging Project (MAP), Mayo Clinic, and Mount Sinai Brain Bank (MSBB). Structural variation discovery quality control identified 46,197 SVs in Mayo Clinic (349 samples), 52,451 SVs in MSBB (305 samples), and 72,348 SVs in ROS/MAP (715 samples), totaling 170,966 across 1,369 samples. Subjects in ROS/MAP not consented for individual SV calls were removed from the individual level data submission but are still part of the aggregate association summary statistics.

Sample Set	Accession Number	Number of Subjects
AMP-AD WGS - SV Calls	snd10041	1,369

Consent Level	Number of Subjects
DS-ND-IRB-PUB-NPU	715
HMB-IRB-PUB	305
GRU-IRB-PUB	429

Visit the Data Use Limitations page for definitions of the consent levels above.

Total number of approved DARs: 4

Investigator:
Hatchwell, Eli
Institution:
Population Bio
Project Title:
Mutational Spectrum of Causal Genes for Neurological/Neurodegenerative Diseases and Endometriosis Identified via High Resolution Genome Wide Copy Number Analysis
Date of Approval:
September 7, 2023
Request status:
Approved
Research use statements:
Show statements
Technical Research Use Statement:
While single gene rare variants have been shown to play a significant role in Early-Onset Alzheimer’s Disease (EOAD), their role in Late-Onset (LOAD) has not been emphasised. The gene discovery methodology we have developed at Population Bio allows for unbiased exploration of highly informative genomic variants in any cohort of interest. Our approach is based on ultra-high resolution copy number variant (CNV) analysis. We have invested heavily in such analysis on normal populations. These are used as comparators for cohorts of interest, such as LOAD. In our LOAD work, this analysis generated a list of CNVs which were either absent in the normal populations we studied or else present at significantly higher frequency in the LOAD cohort. Such CNVs are routinely annotated to determine if they overlie known genes and/or regulatory regions. As an example, we have discovered a deletion in 3% of our LOAD cases, which is present in <= 1% of normals. This deletion disrupts a transcription factor binding site in the intron of a gene, which, via GeneHancer, is known to control exon 1 of the gene. The gene in question is novel to LOAD, and is an important metabolic gene, with known biology. It is vital that we validate this finding by analysis of independent LOAD datasets. In addition, we wish to validate other genes discovered in the same manner We have very deep experience of analyzing WGS/WES datasets. Our focus will be to pull out of the available WGS/WES datasets all the variants for the candidate genes of interest. Such variants, including SNVs, indels and CNVs (called using a variety of tools we have experience with) will be analyzed by reference to databases of normal individuals: i.CNVs, by reference to our own internal database but also gnomad (https://gnomad.broadinstitute.org) CNV data and DGV (http://dgv.tcag.ca) ii.SNVs/indels, by reference to gnomad These analyses will allow us to determine whether there exists a mutational burden for our candidate genes of interest in independent LOAD cohorts, and will serve as validation/refutation. The main phenotype of interest will be definitive diagnoses of LOAD, based on neuropathological and clinical cognitive analyses
Non-Technical Research Use Statement:
Most of the common conditions that affect large numbers of the general population have a genetic basis. While progress has been rapid in the field of cancer, the same cannot be said for common, non-cancer, conditions, such as Late-Onset Alzheimer's Disease (LOAD). It is pretty clear now that not all cases of LOAD represent the same disease, in terms of what is the cause. Our approach has been to consider common diseases as collections of rare subgroups, each of which has a specific cause and which, in due course, will have a specific treatment. We have pioneered and implemented a method to rapidly uncover potentially causal genes in common disorders and will use the data generated from this study to strengthen our discoveries, by validating a set of novel candidate genes we have identified in LOAD Our project will allow us to: 1.Define subsets of disease 2.Work with pharmaceutical companies to develop drugs that will specifically target each subset of disease. In some cases, disease progression may be halted by the therapies developed. In some cases, reversal and/or cure may be possible
Investigator:
Roussos, Panagiotis
Institution:
Icahn School of Medicine at Mount Sinai
Project Title:
Higher Order Chromatin and Genetic Risk for Alzheimer's Disease
Date of Approval:
August 16, 2023
Request status:
Approved
Research use statements:
Show statements
Technical Research Use Statement:
Alzheimer's disease (AD) is the most common form of dementia and is characterized by cognitive impairment and progressive neurodegeneration. Genome-wide association studies of AD have identified more than 70 risk loci; however, a major challenge in the field is that the majority of these risk factors are harbored within non-coding regions where their impact on AD pathogenesis has been difficult to establish. Therefore, the molecular basis of AD development and progression remains elusive and, so far, reliable treatments have not been found. The overarching goal of this proposal is to examine and validate AD-related changes on chromatin accessibility and the 3D genome at the single cell level. Based on recent data from our group and others, we hypothesize that genotype-phenotype associations in AD are causally mediated by cell type-specific alterations in the regulatory mechanisms of gene expression. To test our hypothesis, we propose the following Specific Aims: (1) perform multimodal (i.e., within cell) profiling of the chromatin accessibility and transcriptome at the single cell level to identify cell type-specific AD-related changes on the 3D genome; (2) fine-map AD risk loci to identify causal variants, regulatory regions and genes; (3) functionally validate putative causal variants and regulatory sequences using novel approaches that combine massively parallel reporter assays, CRISPR and single cell assays in neurons and microglia derived from induced pluripotent stem cells; and (4) develop and maintain a community workspace that provides for the rapid dissemination and open evaluation of data, analyses, and outcomes. Overall, our multidisciplinary computational and experimental approach will provide a compendium of functionally and causally validated AD risk loci that has the potential to lead to new insights and avenues for therapeutic development.
Non-Technical Research Use Statement:
Alzheimer’s disease (AD) affects half the US population over the age of 85 and despite decades of research, reliable treatments for AD have not been found. The overarching goal of our proposal is to generate multiscale genomics (gene expression and epigenome regulation) data at the single cell level and perform fine mapping to detect and validate causal variants, transcripts and regulatory sequences in AD. The proposed work will bridge the gap in understanding the link among the effects of risk variants on enhancer activity and transcript expression, thus illuminating AD molecular mechanisms and providing new targets for future therapeutic development.
Investigator:
Wainberg, Michael
Institution:
Sinai Health System
Project Title:
Uncovering the causal genetic variants, genes and cell types underlying brain disorders
Date of Approval:
April 3, 2024
Request status:
Approved
Research use statements:
Show statements
Technical Research Use Statement:
We propose a multifaceted approach to elucidate and interpret genetic risk factors for Alzheimer's disease. First, we propose to perform a whole-genome sequencing meta-analysis of the Alzheimer's Disease Sequencing Project with the UK Biobank and All of Us to associate rare coding and non-coding variants with Alzheimer's disease and related dementias. We will explore a variety of case definitions in the UK Biobank and All of Us, including those based on ICD codes from electronic medical records (inpatient, primary care and/or death), self-report of Alzheimer's disease or Alzheimer's disease and related dementias, and/or family history of Alzheimer's disease or Alzheimer's disease and related dementias. We will perform single-variant, coding-variant burden, and non-coding variant burden tests using the REGENIE genome-wide association study toolkit.Second, we propose to develop statistical and machine learning models that can effectively infer (“fine-map”) the causal gene(s), variant(s), and cell type(s) underlying each association we find, as well as associations from existing genome-wide association studies and other Alzheimer's- and aging-related cohorts found in NIAGADS. In particular, we propose to improve causal gene identification by incorporating knowledge of gene function as a complement to functional genomics. For instance, we plan to develop improved methods for inferring biological networks, particularly from single-cell data, and integrate these networks with the results of the non-coding associations from our first aim to fine-map causal genes. To fine-map causal variants and cell types, we plan to integrate the associations from our first aim with single-nucleus chromatin accessibility data from postmortem brain cohorts to simultaneously infer which variant(s) are causal for each discovered locus and which cell type(s) they act through.
Non-Technical Research Use Statement:
We have a comprehensive plan to understand and explain the genetic factors that contribute to Alzheimer's disease. Our approach involves two main steps.First, we'll analyze genetic information from large research databases to identify rare genetic changes associated with Alzheimer's and related memory disorders. We'll look at both specific changes in genes and other parts of the genetic code. We'll use data from different studies and combine them to get a clearer picture.Second, we'll create advanced computer models that can help us figure out which specific genes, genetic changes, and cell types are responsible for these associations. This will help us pinpoint the most important factors contributing to Alzheimer's disease. We'll also analyze data from previous studies to build a more complete understanding of these genetic links.
Investigator:
Zhao, Jinying
Institution:
University of Florida
Project Title:
Identifying novel biomarkers for human complex diseases using an integrated multi-omics approach
Date of Approval:
November 21, 2023
Request status:
Approved
Research use statements:
Show statements
Technical Research Use Statement:
GWAS, WES and WGS have identified many genes associated with Alzheimer’s Dementia (AD) and its related traits. However, the identified genes thus far collectively explain only a small proportion of disease heritability, suggesting that more genes remained to be identified. Moreover, there is a clear gender and ethnic disparity for AD susceptibility, but little research has been done to identify gender- and ethnic-specific variants associated with AD. Of the many challenges for deciphering AD pathology, lacking of efficient and power statistical methods for genetic association mapping and causal inference represents a major bottleneck. To tackle this challenge, we have developed a set of novel statistical and bioinformatics approaches for genetic association mapping and multi-omics causation inference in large-scale ethnicity-specific epidemiological studies. The goal of this project is to leverage the multi-omics and clinical data archived by the ADSP, ADNI, ADGC as well as other AD-related data repositories to identify novel genes and molecular markers for AD. Specifically, we will (1) validate our novel methods for identifying novel risk and protective genomic variants and multi-omics causal pathways of AD; (2) identify novel ethnicity- and gender-specific genes and molecular causal pathways of AD. We will share our results, statistical methods and computational software with the scientific community.
Non-Technical Research Use Statement:
Although many genes have been associated with Alzheimer’s Dementia (AD), these genes altogether explain only a small fraction of disease etiology, suggesting more genes remained to be identified. Of the many challenges for deciphering AD pathology, lacking of power statistical methods represents a major bottleneck. To tackle this challenge, we have developed a set of novel statistical and bioinformatics approaches for genetic association mapping and multi-omics causation inference in large-scale ethnicity-specific epidemiological studies. The goal of this project is to leverage the rich genetic and other omic data along with clinical data archived by the ADSP, ADNI, ADGC as well as other AD-related data repositories to identify novel genes and molecular markers for AD. Such results will enhance our understanding of AD pathogenesis and may also serve as biomarkers for early diagnosis and therapeutic targets.

Acknowledgment statement for any data distributed by NIAGADS:

Data for this study were prepared, archived, and distributed by the National Institute on Aging Alzheimer’s Disease Data Storage Site (NIAGADS) at the University of Pennsylvania (U24-AG041689), funded by the National Institute on Aging.

Use the study-specific acknowledgement statements below (as applicable):

For investigators using any data from this dataset:

Please cite/reference the use of NIAGADS data by including the accession NG00118.

For investigators using Integrating whole-genome sequencing with multi-omic data reveals the impact of structural variants on gene regulation in the human brain - Vialle et al. 2022 (sa000028) data:

We thank the participants of AMP-AD cohorts for their essential contributions and gift to these projects. ROSMAP study data were provided by the Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago. Data collection was supported through funding by National Institute on Aging (NIA) grants P30AG10161, R01AG15819, R01AG17917, R01AG30146, R01AG36836, U01AG32984, U01AG46152, U01AG61356, and the Illinois Department of Public Health. Mayo RNA-seq study data were provided by the following sources: the Mayo Clinic Alzheimer's Disease Genetic Studies, led by Dr. Nilufer Ertekin-Taner and Dr. Steven G. Younkin, Mayo Clinic, Jacksonville, Florida, using samples from the Mayo Clinic Study of Aging, the Mayo Clinic Alzheimer's Disease Research Center, and the Mayo Clinic Brain Bank. Data collection was supported through funding by NIA grants P50 AG016574, R01 AG032990, U01 AG046139, R01 AG018023, U01 AG006576, U01 AG006786, R01 AG025711, R01 AG017216, and R01 AG003949; National Institute of Neurological Disorders and Stroke (NINDS) grant R01 NS080820; the CurePSP Foundation; and support from Mayo Foundation. Study data include samples collected through the Sun Health Research Institute Brain and Body Donation Program of Sun City, Arizona. The Brain and Body Donation Program is supported by the National Institute of Neurological Disorders and Stroke (U24 NS072026, National Brain and Tissue Resource for Parkinson's Disease and Related Disorders), the NIA (P30 AG19610, Arizona Alzheimer's Disease Core Center), the Arizona Department of Health Services (contract 211002, Arizona Alzheimer's Research Center), the Arizona Biomedical Research Commission (contracts 4001, 0011, 05-901, and 1001 to the Arizona Parkinson's Disease Consortium), and the Michael J. Fox Foundation for Parkinson's Research. Mount Sinai Brain Bank (MSBB) data were generated from post-mortem brain tissue collected through the Mount Sinai VA Medical Center Brain Bank and were provided by Dr. Eric Schadt of the Mount Sinai School of Medicine through funding from NIA grant U01AG046170. The authors thank Dr. Bin Zhang and Dr. Erming Wang for assistance with data sharing, and members of the Raj and Crary labs for their feedback on the manuscript. We thank Jack Humphrey for his insightful comments and suggestions during this work. This work was supported in part through the computational and data resources and staff expertise provided by Scientific Computing at the Icahn School of Medicine at Mount Sinai. We thank the Mount Sinai Technology Development core for help and support with performing long-read sequencing. The research reported in this paper was supported by the Office of Research Infrastructure of the National Institutes of Health under award number S10OD026880.

Vialle RA, de Paiva Lopes K, Bennett DA, Crary JF, Raj T. Integrating whole-genome sequencing with multi-omic data reveals the impact of structural variants on gene regulation in the human brain. Nat Neurosci. 2022 Apr. doi: 10.1038/s41593-022-01031-7 PubMed link

NG00118 – AMP-AD SV WGS and SV-xQTL

Description

Data Available

Description

Sample Summary per Data Type

Available Filesets

Subject Information

Related Studies

Consent Levels

Approved Users

Acknowledgement

Acknowledgment statement for any data distributed by NIAGADS:

For investigators using any data from this dataset:

For investigators using Integrating whole-genome sequencing with multi-omic data reveals the impact of structural variants on gene regulation in the human brain - Vialle et al. 2022 (sa000028) data:

Related Publications

Cohorts