TY - JOUR
T1 - Position paper
T2 - Common mistakes and solutions for a better use of correlation- and regression-based approaches in environmental sciences
AU - Tedoldi, Damien
AU - Kim, Boram
AU - Sandoval, Santiago
AU - Forquet, Nicolas
AU - Tassin, Bruno
N1 - Publisher Copyright: © 2025 The Authors
PY - 2025/8/1
Y1 - 2025/8/1
N2 - While empirical modelling remains a popular practice in environmental sciences, an alarming number of misuses of correlation- and regression-based techniques are encountered in recent research, although these techniques are described in courses and textbooks. This position paper reviews the most common issues, and provides theoretical background for understanding the interests and limitations of these methods, based on their underlying assumptions. We call for a reconsideration of misleading practices, including: the application of linear regression to data points that do not display a linear pattern, the failure to pinpoint influential points, the inappropriate extrapolation of empirical relationships, the overrated search for “statistical significance”, the pooling of data belonging to different populations, and, most importantly, calculations without data visualization. We urge reviewers to be vigilant on these aspects. We also recall the existence of alternative approaches to overcome the highlighted shortcomings, and thus contribute to a more accurate interpretation of the results.
KW - Bivariate analysis
KW - Data-driven modelling
KW - Empirical modelling
KW - Good practices
KW - Linear regression
KW - Statistical testing
UR - https://www.scopus.com/pages/publications/105007011420
U2 - 10.1016/j.envsoft.2025.106526
DO - 10.1016/j.envsoft.2025.106526
M3 - Article
AN - SCOPUS:105007011420
SN - 1364-8152
VL - 192
JO - Environmental Modelling and Software
JF - Environmental Modelling and Software
M1 - 106526
ER -