Building highly-optimized, low-latency pipelines for genomic data analysis

Research output: Contribution to conference, Paper, peer-review

Abstract

Next-generation sequencing has transformed genomics into a new paradigm of data-intensive computing. The deluge of genomic data needs to undergo deep analysis to mine biological information. Deep analysis pipelines often take days to run, which entails a long cycle for algorithm and method development and hinders future application for clinical use. In this project, we aim to bring big data technology to the genomics domain and innovate in this new domain to revolutionize its data crunching power. Our work includes the development of a deep analysis pipeline, a parallel platform for pipeline execution, and a principled approach to optimizing the pipeline. We also present some initial evaluation results using existing long-running pipelines at the New York Genome Center, as well as a variety of real use cases that we plan to build in the course of this project.

Original language: English
Publication status: Published - 1 Jan 2015
Externally published: Yes
Event: 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015 - Asilomar, United States
Duration: 4 Jan 2015 to 7 Jan 2015

Conference

Conference: 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015
Country/Territory: United States
City: Asilomar
Period: 4/01/15 to 7/01/15
