
Massively parallel processing of whole genome sequence data: An in-depth performance study

  • Abhishek Roy
  • Yanlei Diao
  • Uday Evani
  • Avinash Abhyankar
  • Clinton Howarth
  • Rémi Le Priol
  • Toby Bloom

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

This paper presents a joint effort between a group of computer scientists and bioinformaticians to take an important step towards a general big data platform for genome analysis pipelines. The key goals of this study are to develop a thorough understanding of the strengths and limitations of big data technology for genomic data analysis, and to identify the key questions that the research community could address to realize the vision of personalized genomic medicine. Our platform, called Gesall, is based on the new "Wrapper Technology" that supports existing genomic data analysis programs in their native forms, without having to rewrite them. To do so, our system provides several layers of software, including a new Genome Data Parallel Toolkit (GDPT), which can be used to "wrap" existing data analysis programs. This platform offers a concrete context for evaluating big data technology for genomics: we report on super-linear speedup and sublinear speedup for various tasks, as well as the reasons why a parallel program could produce different results from those of a serial program. These results lead to key research questions that require a synergy between genomics scientists and computer scientists to find solutions.
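The abstract's distinction between super-linear and sublinear speedup follows the standard definition: with serial running time T(1) and running time T(p) on p workers, the speedup is S(p) = T(1) / T(p); it is super-linear when S(p) > p and sublinear when S(p) < p. The Python sketch here illustrates only that definition. The function names and the timings are hypothetical and are not taken from Gesall, GDPT, or the paper's measurements.

```python
import math


def speedup(serial_time: float, parallel_time: float) -> float:
    """Return the speedup S(p) = T(1) / T(p)."""
    if serial_time <= 0 or parallel_time <= 0:
        raise ValueError("running times must be positive")
    return serial_time / parallel_time


def classify_speedup(serial_time: float, parallel_time: float, workers: int) -> str:
    """Compare S(p) against the worker count p (hypothetical helper)."""
    s = speedup(serial_time, parallel_time)
    # Treat values within floating-point tolerance of p as linear speedup.
    if math.isclose(s, workers):
        return "linear"
    return "super-linear" if s > workers else "sublinear"


# Hypothetical timings (in minutes) for illustration only.
print(classify_speedup(100.0, 10.0, 8))   # S = 10 > 8
print(classify_speedup(100.0, 25.0, 8))   # S = 4 < 8
print(classify_speedup(100.0, 12.5, 8))   # S = 8
```

Super-linear speedup usually signals that the parallel run does less total work or fits its working set in faster memory than the serial run, so it merits investigation rather than being taken at face value.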

Original language: English
Title of host publication: SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 187-202
Number of pages: 16
ISBN (Electronic): 9781450341974
State: Published - 9 May 2017
Event: 2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017 - Chicago, United States
Duration: 14 May 2017 to 19 May 2017

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
Volume: Part F127746
ISSN (Print): 0730-8078

Conference

Conference: 2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017
Country/Territory: United States
City: Chicago
Period: 14/05/17 to 19/05/17
