TY - JOUR
T1 - Deciphering the Code of Viral-Host Adaptation Through Maximum-Entropy Nucleotide Bias Models
AU - Di Gioacchino, Andrea
AU - Lecce, Ivan
AU - Greenbaum, Benjamin D.
AU - Monasson, Rémi
AU - Cocco, Simona
N1 - Publisher Copyright: © 2025 The Author(s). Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.
PY - 2025/6/1
Y1 - 2025/6/1
AB - How viruses evolve largely depends on their hosts. To quantitatively characterize this dependence, we introduce Maximum Entropy Nucleotide Bias models (MENB) learned from single, di- and tri-nucleotide usage of viral sequences that infect a given host. We first use MENB to classify the viral family and the host of a virus from its genome, among four families of ssRNA viruses and three hosts. We show that both the viral family and the host leave a fingerprint in nucleotide motif usages that MENB models decode. Benchmarking our approach against state-of-the-art methods based on deep neural networks shows that MENB is rapid, interpretable and robust. Our approach is able to predict, with good accuracy, both the viral family and the host from a whole genomic sequence or a portion of it. MENB models also display promising out-of-sample generalization ability on viral sequences of new host taxa or new viral families. Our approach is also capable of identifying, within the limitations imposed by the three-host setting, intermediate hosts for well-known pathogenic strains of Influenza A subtypes and Human Coronavirus, as well as recombinations and reassortments in specific genomic regions. Finally, MENB models can be used to track the adaptation to the new host, to shed light on the more relevant selective pressures that acted on motif usage during this process, and to design new sequences with altered nucleotide usage at fixed amino-acid content.
KW - Coronaviridae
KW - Flaviviridae
KW - Orthomyxoviridae
KW - Picornaviridae
KW - avian
KW - human
KW - maximum entropy models
KW - nucleotide usage
KW - swine
KW - viral host adaptations
UR - https://www.scopus.com/pages/publications/105009532900
U2 - 10.1093/molbev/msaf127
DO - 10.1093/molbev/msaf127
M3 - Article
C2 - 40458044
AN - SCOPUS:105009532900
SN - 0737-4038
VL - 42
JO - Molecular Biology and Evolution
JF - Molecular Biology and Evolution
IS - 6
M1 - msaf127
ER -