Abstract
Next-generation sequencing has transformed genomics into a new paradigm of data-intensive computing. The deluge of genomic data needs to undergo deep analysis to mine biological information. Deep analysis pipelines often take days to run, which entails a long cycle for algorithm and method development and hinders future application for clinical use. In this project, we aim to bring big data technology to the genomics domain and innovate in this new domain to revolutionize its data crunching power. Our work includes the development of a deep analysis pipeline, a parallel platform for pipeline execution, and a principled approach to optimizing the pipeline. We also present some initial evaluation results using existing long-running pipelines at the New York Genome Center, as well as a variety of real use cases that we plan to build in the course of this project.
| Original language | English |
|---|---|
| Publication status | Published - 1 Jan 2015 |
| Externally published | Yes |
| Event | 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015 - Asilomar, United States; Duration: 4 Jan 2015 → 7 Jan 2015 |
Conference
| Conference | 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015 |
|---|---|
| Country/Territory | United States |
| City | Asilomar |
| Period | 4/01/15 → 7/01/15 |