#########################################################################################
## This README describes the individual-level data from participants who consented to public sharing.
## The study was originally titled "Genomic and multi-tissue proteomic integration for understanding the biology of disease and other complex traits"
## on medRxiv (https://www.medrxiv.org/content/10.1101/2020.06.25.20140277v1),
## but the title was changed to "Genomic and multi-tissue proteomic integration for understanding the genetic architecture of neurological diseases" during peer review.
#########################################################################################
#########################################################################################
## This repository shares three parts of data: proteomics data, genotype data, and consent information.
## Proteomics data were generated from three tissues: CSF, plasma, and brain.
## Genotype data were measured with genotyping arrays (see the type-1b covariate tables for array details).
## Consent information is curated for each participant and gives the detailed future research category.
## Number of participants who consented to public sharing, CSF: 817
## Number of participants who consented to public sharing, plasma: 528
## Number of participants who consented to public sharing, brain: 343
## Number of proteins that passed QC, CSF: 713
## Number of proteins that passed QC, plasma: 931
## Number of proteins that passed QC, brain: 1079
#########################################################################################
#########################################################################################
## Part 1, proteomics data:
### There are three subtypes: a) proteomics-expression matrix; b) proteomics-covariate table; c) proteomics-annotation table.
#########################################################################################
### type-1a) proteomics-expression matrix (tab-delimited txt file)
#### type-1a.1 proteomics_exprs_t1CSF_toSharePublic.txt
#### type-1a.2 proteomics_exprs_t2plasma_toSharePublic.txt
#### type-1a.3 proteomics_exprs_t3brain_toSharePublic.txt
##### content description:
##### samples (rows) by proteins (columns)
##### PA_DB_UID is the sample ID used for public sharing
##### proteins are denoted by SOMAseqID (see the proteomics-annotation table below for details)
##### missing values are denoted as NA
#########################################################################################
### type-1b) proteomics-covariate table (tab-delimited txt file)
#### type-1b.1 proteomics_covar_t1CSF_toSharePublic.txt
#### type-1b.2 proteomics_covar_t2plasma_toSharePublic.txt
#### type-1b.3 proteomics_covar_t3brain_toSharePublic.txt
##### content description:
##### samples (rows) by covariates (columns)
##### PA_DB_UID is the sample ID used for public sharing
##### columns are age, sex, and genotype_platform (dummy variables)
#########################################################################################
### type-1c) proteomics-annotation table (tab-delimited txt file)
#### type-1c.1 proteomics_t1CSF_featureFile.txt
#### type-1c.2 proteomics_t2plasma_featureFile.txt
#### type-1c.3 proteomics_t3brain_featureFile.txt
##### content description:
##### proteins (rows) by annotations (columns)
##### SOMAseqID: the SOMAmer's unique ID from the SOMAscan platform, as used in the proteomics-expression matrix
##### SeqId: the SOMAmer's unique ID from the SOMAscan platform, with the SOMAmer version appended after '_'
##### SomaId: the SOMAmer's unique ID from the SOMAscan platform, starting with "SL"
##### TargetFullName: full name of the protein the SOMAmer binds to
##### Target: short name of the protein the SOMAmer binds to
##### UniProt: protein ID from the UniProt database
##### EntrezGeneID: ID of the gene encoding the protein, from the NCBI database
##### EntrezGeneSymbol: symbol of the gene encoding the protein, from the NCBI database
#########################################################################################
## Part 2, genotype data:
## The data are in PLINK binary format and consist of three associated files (.bed, .bim, .fam).
## PA_DB_UID is used for the sample ID (FID/IID), father ID, and mother ID.
#########################################################################################
## The PLINK binary format is described at https://www.cog-genomics.org/plink/1.9/formats
### .bed (PLINK binary biallelic genotype table): https://www.cog-genomics.org/plink/1.9/formats#bed
### .bim (PLINK extended MAP file): https://www.cog-genomics.org/plink/1.9/formats#bim
### .fam (PLINK sample information file): https://www.cog-genomics.org/plink/1.9/formats#fam
#########################################################################################
## Part 3, consent information:
## It is a csv file with two columns.
## column 1: PA_DB_UID, the sample ID, for samples with genotype data and proteomics data from at least one tissue
## column 2: FutResearchCat, short for future research category
#########################################################################################
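## Example (a minimal sketch, not part of the original pipeline):
## The expression and covariate tables are both tab-delimited and keyed by sample ID, so they can be joined per sample.
## The sketch below assumes the ID column header is literally "PA_DB_UID"; check the header of the actual files.
## The protein column names (prot_A, prot_B) and all values are made up for illustration and are not real SOMAseqIDs.

```python
import csv
import io


def read_table(handle, id_column="PA_DB_UID"):
    """Read a tab-delimited table into {sample_id: {column: value}}.

    The string "NA" becomes None; every other value is kept as a string.
    Assumption: the sample-ID column is named by `id_column`.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    table = {}
    for row in reader:
        sample_id = row.pop(id_column)
        table[sample_id] = {
            key: (None if value == "NA" else value) for key, value in row.items()
        }
    return table


def join_on_sample(exprs, covar):
    """Keep only samples present in both tables and merge their columns."""
    shared = sorted(set(exprs) & set(covar))
    return {sid: {**covar[sid], **exprs[sid]} for sid in shared}


# Tiny made-up stand-ins for an expression matrix and a covariate table.
demo_exprs = io.StringIO(
    "PA_DB_UID\tprot_A\tprot_B\n"
    "S1\t10.5\tNA\n"
    "S2\t9.8\t7.1\n"
)
demo_covar = io.StringIO(
    "PA_DB_UID\tage\tsex\n"
    "S1\t70\t1\n"
    "S3\t65\t2\n"
)

merged = join_on_sample(read_table(demo_exprs), read_table(demo_covar))
print(merged)
```

## With the real files, replace the io.StringIO stand-ins with open file handles,
## for example open("proteomics_exprs_t1CSF_toSharePublic.txt", newline="").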