Sequencing Pipelines


Sequencing Data Generation

The three Large Scale Sequencing/Analysis Centers (LSACs) that sequenced ADSP data are the Human Genome Sequencing Center at Baylor College of Medicine, the Broad Institute, and the Genome Institute at Washington University. The three LSACs generated paired-end sequencing BAM files (currently available in dbGaP, phs000572) using the tools and parameters listed below.
The BAM files are collected from dbGaP, and variants are called from them into project-level VCFs, separately by the Broad Institute and by Baylor College of Medicine. These VCFs are intermediate files and are not available to the public. The ADSP Quality Control Work Group then combines the two project-level VCF datasets into a single overall ADSP VCF file, performing QC and concordance checks in the process.
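The concordance checks are described here only at a high level. As a minimal sketch, assuming both project-level callsets have already been parsed into per-sample dictionaries keyed by (chrom, pos, ref, alt), genotype concordance can be computed as the fraction of sites called in both sets where the genotypes agree. The data layout, the function names, and the rule of skipping missing calls are illustrative assumptions, not the QC Work Group's actual procedure.

```python
from typing import Dict, Tuple

# Hypothetical layout: (chrom, pos, ref, alt) -> genotype string for one sample.
Site = Tuple[str, int, str, str]
Genotypes = Dict[Site, str]


def normalize_gt(gt: str) -> str:
    """Ignore phasing and put alleles in a canonical order, e.g. '1|0' -> '0/1'."""
    alleles = gt.replace("|", "/").split("/")
    return "/".join(sorted(alleles))


def genotype_concordance(a: Genotypes, b: Genotypes) -> Tuple[int, int, float]:
    """Return (sites called in both, matching sites, concordance rate)."""
    # Only compare sites present in both callsets with no missing allele ('.').
    shared = [s for s in a if s in b and "." not in a[s] and "." not in b[s]]
    matches = sum(normalize_gt(a[s]) == normalize_gt(b[s]) for s in shared)
    rate = matches / len(shared) if shared else float("nan")
    return len(shared), matches, rate


# Toy genotypes for one sample from two hypothetical callsets.
baylor_calls = {("1", 1000, "A", "G"): "0/1", ("1", 2000, "C", "T"): "1/1",
                ("1", 3000, "G", "A"): "./."}
broad_calls = {("1", 1000, "A", "G"): "1|0", ("1", 2000, "C", "T"): "0/1",
               ("1", 3000, "G", "A"): "0/0"}
shared, matched, rate = genotype_concordance(baylor_calls, broad_calls)
```

Here the site at position 3000 is skipped because one callset has a missing genotype, so two sites are compared and one of them agrees. A production check would typically also report concordance stratified by genotype class and variant type.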

Sequencing Pipeline Tools and Parameters

Program            | Baylor                       | Broad                | WashU
CASAVA             | 1.8.3                        | N/A                  | 1.8.2
Reference          | GRCh37-lite                  | GRCh37 (1kg version) | GRCh37-lite
Aligner            | BWA 0.6.2                    | BWA 0.5.9-tpx        | BWA 0.5.9
Aligner Parameters | defaults; -t 8               | defaults; -t N -q 5  | defaults; -t 4 -q 5
Sort/Dupe/Mates    | Picard 1.93                  | Picard (latest)      | Picard 1.46
Merge              | Picard 1.41                  | Picard (latest)      | Picard 1.46
GATK indels        | v2.5-2; 1kG, Mills, dbSNP137 | v2.6-14              | v2.4; 1kG, Mills, dbSNP137
GATK recal         | v2.5-2                       | v2.6-14              | v2.4
ReduceReads        | no                           | yes                  | no
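To make the aligner parameter row concrete, the sketch below turns it into command lines. It assumes the listed -t and -q flags belong to `bwa aln` (the BWA-backtrack mode in these BWA versions, where -t sets the thread count and -q the read-trimming quality threshold); the table itself does not name the subcommand. Broad's thread count is given only as "N", so the caller must supply it. The dictionary, function name, and file names are hypothetical, and the function only builds the argument list without running anything.

```python
from typing import Dict, List, Optional

# Aligner settings transcribed from the table. Broad lists "-t N", so its
# thread count is unknown (None) and must be passed in explicitly.
ALIGNER_SETTINGS: Dict[str, Dict[str, Optional[int]]] = {
    "Baylor": {"threads": 8, "trim_quality": None},
    "Broad": {"threads": None, "trim_quality": 5},
    "WashU": {"threads": 4, "trim_quality": 5},
}


def bwa_aln_argv(center: str, reference: str, reads: str,
                 threads: Optional[int] = None) -> List[str]:
    """Build (but do not execute) a 'bwa aln' argument list for one center."""
    settings = ALIGNER_SETTINGS[center]
    n = threads if threads is not None else settings["threads"]
    if n is None:
        raise ValueError(f"{center} does not specify a thread count; pass threads=")
    argv = ["bwa", "aln", "-t", str(n)]
    if settings["trim_quality"] is not None:
        argv += ["-q", str(settings["trim_quality"])]
    return argv + [reference, reads]


washu_cmd = bwa_aln_argv("WashU", "GRCh37-lite.fa", "sample_R1.fastq")
```

For paired-end data, `bwa aln` is run once per mate and the two resulting .sai files are then paired with `bwa sampe`; that step, and the Picard and GATK steps in the table, are omitted here.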


Whole-Exome Target Regions

The Broad Institute used the Illumina Rapid Capture Exome (ICE) kit (target regions available for download).
Baylor and WashU used the NimbleGen VCRome v2.1 kit (target regions available for download).
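Because the centers used two different capture kits, analyses that pool all samples may need to know which regions both kits target. The sketch below assumes the downloaded target regions are BED-style intervals (0-based start, end-exclusive) that have already been loaded, sorted, and merged so they do not overlap within each chromosome. It intersects two target sets with a two-pointer sweep and tests whether a 1-based VCF position falls in the result. Restricting to shared targets is one common way to harmonize data from different kits, shown here as an illustration rather than as the ADSP procedure. The coordinates and names are toy values.

```python
from typing import Dict, List, Tuple

# BED-style intervals per chromosome: (start, end), 0-based, end-exclusive.
Intervals = Dict[str, List[Tuple[int, int]]]


def intersect(a: Intervals, b: Intervals) -> Intervals:
    """Regions covered by both target sets (inputs sorted and non-overlapping)."""
    out: Intervals = {}
    for chrom in a.keys() & b.keys():
        xs, ys = a[chrom], b[chrom]
        i = j = 0
        shared: List[Tuple[int, int]] = []
        while i < len(xs) and j < len(ys):
            start = max(xs[i][0], ys[j][0])
            end = min(xs[i][1], ys[j][1])
            if start < end:
                shared.append((start, end))
            # Advance whichever interval ends first; it cannot overlap anything later.
            if xs[i][1] < ys[j][1]:
                i += 1
            else:
                j += 1
        if shared:
            out[chrom] = shared
    return out


def in_targets(targets: Intervals, chrom: str, pos_1based: int) -> bool:
    """True if a 1-based VCF position lies inside any target interval."""
    p = pos_1based - 1  # convert to 0-based to match BED coordinates
    return any(s <= p < e for s, e in targets.get(chrom, []))


# Toy target sets standing in for the two kits' BED files.
ice_targets = {"1": [(100, 200), (300, 400)], "2": [(50, 80)]}
vcrome_targets = {"1": [(150, 350)], "X": [(0, 10)]}
common = intersect(ice_targets, vcrome_targets)
```

With real target files, a dedicated interval tool or an interval tree would scale better than the linear scan in `in_targets`.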

