SparkINFERNO: a scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants.

Title: SparkINFERNO: a scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants.
Publication Type: Journal Article
Year of Publication: 2020
Authors: Kuksa PP, Lee C-Y, Amlie-Wolf A, Gangadharan P, Mlynarski EE, Chou Y-F, Lin H-J, Issen H, Greenfest-Allen E, Valladares O, Leung YY, Wang L-S
Journal: Bioinformatics
Volume: 36
Issue: 12
Pagination: 3879-3881
Date Published: 2020-06-01
ISSN: 1367-4811
Keywords: Algorithms, Genome-Wide Association Study, Genomics, Quantitative Trait Loci, Software
Abstract

SUMMARY: We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline for characterizing non-coding genome-wide association study (GWAS) findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports the relevant regulatory elements, tissue contexts and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with a large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci and other functional data across more than 400 tissues and cell types. Scalability is achieved by an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWASs and show that it is more than 60 times more efficient and scales with data size and the amount of computational resources.
AVAILABILITY AND IMPLEMENTATION: SparkINFERNO runs on clusters or on a single server with an Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno.
CONTACT: lswang@pennmedicine.upenn.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
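
ILLUSTRATIVE SKETCH: The abstract describes integrating GWAS summary statistics with functional genomics annotations. The core operation behind such integration, finding which annotation intervals a significant variant falls into, can be sketched as below. This is not SparkINFERNO's code or API: it is a minimal, single-machine illustration in plain Python that uses neither Apache Spark nor Giggle, and every function name, the toy records and the 5e-8 significance threshold are assumptions made for the example.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(intervals):
    """Group half-open [start, end) intervals by chromosome, sorted by start."""
    index = defaultdict(list)
    for chrom, start, end, label in intervals:
        index[chrom].append((start, end, label))
    for chrom in index:
        index[chrom].sort()
    return index


def overlapping_labels(index, chrom, pos):
    """Return labels of all intervals on `chrom` that contain position `pos`."""
    candidates = index.get(chrom, [])
    # Only intervals whose start is <= pos can contain pos.
    upper = bisect_right(candidates, (pos, float("inf"), ""))
    return [label for start, end, label in candidates[:upper] if pos < end]


def annotate(variants, index, p_threshold=5e-8):
    """Attach overlapping annotation labels to variants passing the threshold."""
    results = {}
    for variant_id, chrom, pos, p_value in variants:
        if p_value > p_threshold:
            continue  # skip variants that are not significant
        results[variant_id] = overlapping_labels(index, chrom, pos)
    return results


# Hypothetical toy annotation tracks: (chrom, start, end, label).
annotations = [
    ("chr1", 100, 200, "enhancer:tissueA"),
    ("chr1", 150, 300, "tf_binding:tissueB"),
    ("chr2", 50, 80, "enhancer:tissueC"),
]

# Hypothetical toy summary statistics: (variant_id, chrom, position, p_value).
variants = [
    ("var1", "chr1", 160, 1e-9),
    ("var2", "chr1", 250, 3e-10),
    ("var3", "chr2", 90, 2e-12),
    ("var4", "chr1", 120, 0.2),
]

annotated = annotate(variants, build_index(annotations))
for variant_id, labels in annotated.items():
    print(variant_id, labels)
```

The linear scan over candidate intervals is adequate for a toy example only; at the scale the abstract describes, a dedicated genomic index and distributed execution take the place of this loop.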

DOI: 10.1093/bioinformatics/btaa246
PubMed Link: https://www.ncbi.nlm.nih.gov/pubmed/32330239?dopt=Abstract
Alternate Journal: Bioinformatics
PubMed ID: 32330239
PubMed Central ID: PMC7320617
Grant List: U24 AG041689 / AG / NIA NIH HHS / United States
U54 AG052427 / AG / NIA NIH HHS / United States
U01 AG032984 / AG / NIA NIH HHS / United States
T32 AG000255 / AG / NIA NIH HHS / United States