Massive genomic data processing and deep analysis

  • Abhishek Roy
  • Yanlei Diao
  • Evan Mauceli
  • Yiping Shen
  • Bai Lin Wu

Research output: Contribution to journal › Article › peer-review

Abstract

Today, large sequencing centers produce genomic data at a rate of 10 terabytes a day, and complicated processing is required to transform these massive amounts of noisy raw data into biological information. To address these needs, we develop a system for end-to-end processing of genomic data, including alignment of short read sequences, variation discovery, and deep analysis. We also employ a range of quality control mechanisms to improve data quality and parallel processing techniques to improve performance. In the demo, we will use real genomic data to show the details of data transformation through the workflow, the usefulness of the end results (ready for use as testable hypotheses), the effects of our quality control mechanisms and improved algorithms, and finally the performance improvement.

Original language: English
Pages (from-to): 1906-1909
Number of pages: 4
Journal: Proceedings of the VLDB Endowment
Volume: 5
Issue number: 12
Publication status: Published - 1 Jan 2012
Externally published: Yes
