TY - JOUR
T1 - MetGem Software for the Generation of Molecular Networks Based on the t-SNE Algorithm
AU - Olivon, Florent
AU - Elie, Nicolas
AU - Grelier, Gwendal
AU - Roussi, Fanny
AU - Litaudon, Marc
AU - Touboul, David
N1 - Publisher Copyright: © Copyright 2018 American Chemical Society.
PY - 2018/12/4
Y1 - 2018/12/4
AB - Molecular networking (MN) is becoming a standard bioinformatics tool in the metabolomic community. Its paradigm is based on the observation that compounds with a high degree of chemical similarity share comparable MS² fragmentation pathways. To afford a clear separation between MS² spectral clusters, only the most relevant similarity scores are selected using dedicated filtering steps requiring time-consuming parameter optimization. Depending on the filtering values selected, some scores are arbitrarily deleted and a part of the information is ignored. The problem of creating a reliable representation of MS² spectra data sets can be solved using algorithms developed for dimensionality reduction and pattern recognition purposes, such as t-distributed stochastic neighbor embedding (t-SNE). This multivariate embedding method pays particular attention to local details by using nonlinear outputs to represent the entire data space. To overcome the limitations inherent to the GNPS workflow and the networking architecture, we developed MetGem. Our software allows the parallel investigation of two complementary representations of the raw data set, one based on a classic GNPS-style MN and another based on the t-SNE algorithm. The t-SNE graph preserves the interactions between related groups of spectra, while the MN output allows an unambiguous separation of clusters. Additionally, almost all parameters can be tuned in real time, and new networks can be generated within a few seconds for small data sets. With the development of this unified interface (https://metgem.github.io), we fulfilled the need for a dedicated, user-friendly, local software for MS² comparison and spectral network generation.
U2 - 10.1021/acs.analchem.8b03099
DO - 10.1021/acs.analchem.8b03099
M3 - Article
C2 - 30335965
AN - SCOPUS:85056730347
SN - 0003-2700
VL - 90
SP - 13900
EP - 13908
JO - Analytical Chemistry
JF - Analytical Chemistry
IS - 23
ER -