Abstract
Short variant calling in whole genome sequencing data focuses on single nucleotide substitutions (most commonly referred to as single nucleotide variants) and short indels, classically those up to 50 bp in length. The practice has become widespread in genomics, taking advantage of the decreasing cost of short-read sequencing and the resulting huge volume of publicly available data in short-read archives (SRAs). It has already allowed important advances in the characterization of biodiversity, for example by accelerating the implementation of phylogenomic and association studies. In principle, several tools must be combined to perform it reliably. However, integrated (all-in-one) pipelines are increasingly offered to end users so that people not trained in bioinformatics can use them, and it is becoming tempting for any biologist to launch large studies based on the ever-growing body of public sequencing data. All-in-one pipelines act either as a black box or as a palette of tools from which the user must choose. To limit major inference errors, it is important that the user has a good understanding of the underlying tools. This chapter aims to guide the naive user and to compile useful information for any user, naive or expert. We clarify which tools are essential for calling variants in short-read sequencing data and which tools can measure and improve the reliability of those calls, with an emphasis on decontamination. We then present the properties of some all-in-one pipelines, focusing on the performance of the tools and on best practices to consider when studying large datasets.
| Original language | English |
|---|---|
| Title of host publication | Phylogenomics |
| Subtitle of host publication | Foundations, Methods, and Pathogen Analysis |
| Publisher | Elsevier |
| Pages | 219-250 |
| Number of pages | 32 |
| ISBN (Electronic) | 9780323998864 |
| ISBN (Print) | 9780323913096 |
| Publication status | Published - 1 Jan 2024 |
Keywords
- Aligner
- NGS
- WGS
- all-in-one pipelines
- benchmarking