Data Notices

 

November 4, 2016 - Review and Proposed Actions for False-Positive Association Results in ADSP Case-Control Data

Some SNVs in the publicly released, QCed whole-exome sequence (WES) “consensus-called” data (which systematically integrate genotype calls from two pipelines: Atlas at Baylor College of Medicine and GATK at the Broad Institute) may have biased genotype calls resulting from sequence data generated or processed at the Broad Institute. This issue was identified while following up on likely “false-positive” genetic associations that reached genome-wide statistical significance in case-control analyses. It is not yet clear whether this issue also affects whole-genome sequence (WGS) data.

Detailed information can be found here.

 


 

October 28, 2016 - dbGaP News phs000572.v7.p4 ADSP Data Use

This notice describes a new issue with the whole-exome sequence (WES) and whole-genome sequence (WGS) concordant and consensus genotype files released to Authorized Access Users.
It was recently discovered that heterogeneity in pre-variant-calling BAM processing has resulted in systematic biases in the GATK WES variant calls, and thus in the released WES concordant and consensus genotype files.
ADSP WES data were independently and uniformly reprocessed as part of the Atlas variant pipeline, so the Atlas WES variant calls do not manifest this bias. We are currently examining the impact of this heterogeneity on the WGS data, but expect that these biases will need to be addressed in all WGS variant sets.
Any analyses performed on ADSP BAM files downloaded from dbGaP may manifest these biases unless the data were reprocessed from a pre-mapped state (i.e., FASTQs).
The ADSP will release an updated data set for both WGS and WES in the December 2016 dbGaP release, which may consist of HGSC Atlas-only called variants or the current data sets with additional filters applied.
Researchers using the current data set are reminded that any dataset or analysis has many possible sources of heterogeneity that may generate false-positive results, and that findings should be verified by looking closely at the primary data and via replication.
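
As one illustration of looking closely at primary data, the sketch below compares allele counts for a single variant between two sequencing centers, ideally among controls only, where a large difference would point to a processing artifact and not to disease association. This is not an ADSP procedure: the function names, the input layout (pairs of center label and alternate-allele dosage), and the center labels are all assumptions made for this example.

```python
from collections import Counter


def allele_counts_by_center(samples):
    """Return {center: (ref_alleles, alt_alleles)} for one variant.

    `samples` is an iterable of (center, alt_dosage) pairs (hypothetical layout).
    alt_dosage is 0, 1, or 2 for a diploid genotype; None marks a missing call.
    """
    ref = Counter()
    alt = Counter()
    for center, dosage in samples:
        if dosage is None:
            continue  # skip missing genotype calls
        alt[center] += dosage
        ref[center] += 2 - dosage
    return {c: (ref[c], alt[c]) for c in set(ref) | set(alt)}


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 d.f., no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty; the statistic is undefined
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical control-only genotypes for one variant at two centers.
controls = [("center_A", 0), ("center_A", 1), ("center_A", 0),
            ("center_B", 2), ("center_B", 1), ("center_B", None)]
counts = allele_counts_by_center(controls)
ref_a, alt_a = counts["center_A"]
ref_b, alt_b = counts["center_B"]
stat = chi2_2x2(ref_a, alt_a, ref_b, alt_b)
```

A large statistic for a variant among controls suggests that the genotype calls differ by center, so any case-control signal at that variant deserves closer inspection. With small counts an exact test would be more appropriate than the chi-square approximation.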
