
Scaling Laws for Forgetting during Finetuning with Pretraining Data Injection

  • Louis Bethune
  • David Grangier
  • Dan Busbridge
  • Eleonora Gualdoni
  • Marco Cuturi
  • Pierre Ablin
  • Apple Computer

Research output: Contribution to journal, Conference article, Peer-reviewed

Abstract

A widespread strategy to obtain a language model that performs well on a target domain is to fine-tune a pretrained model to perform unsupervised next-token prediction on data from that target domain. Finetuning presents two challenges: (i) if the amount of target data is limited, as in most practical applications, the model will quickly over-fit, and (ii) the model will drift away from the original model, forgetting the pretraining data and the generic knowledge that comes with it. Our goal is to derive scaling laws that quantify these two phenomena for various target domains, amounts of available target data, and model scales. We measure the efficiency of injecting pretraining data into the finetuning data mixture to avoid forgetting and mitigate overfitting. A key practical takeaway from our study is that injecting as little as 1% of pretraining data in the finetuning data mixture prevents the model from forgetting the pretraining set.
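The data-injection idea in the abstract can be illustrated with a minimal sketch. The abstract does not describe how the authors build their mixture, so everything here is an assumption made for illustration: the function name `mixed_stream`, the parameter `pretrain_fraction`, and the choice of per-example random sampling are all hypothetical. The sketch draws each finetuning example from the pretraining set with a small probability (1% by default, matching the figure quoted above) and from the target-domain set otherwise.

```python
import itertools
import random
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def mixed_stream(
    target: Sequence[T],
    pretrain: Sequence[T],
    pretrain_fraction: float = 0.01,  # hypothetical name; 1% as in the abstract
    seed: int = 0,
) -> Iterator[T]:
    """Yield an endless stream of training examples.

    Each example comes from `pretrain` with probability `pretrain_fraction`
    and from `target` otherwise. This is an assumed mixing scheme, not
    necessarily the one used in the paper.
    """
    if not 0.0 <= pretrain_fraction <= 1.0:
        raise ValueError("pretrain_fraction must lie in [0, 1]")
    rng = random.Random(seed)
    while True:
        # rng.random() is in [0, 1), so a fraction of 0 never picks pretraining
        # data and a fraction of 1 always does.
        source = pretrain if rng.random() < pretrain_fraction else target
        yield rng.choice(source)


# Usage: take a finite number of examples from the endless stream.
target_docs = ["target doc A", "target doc B"]
pretrain_docs = ["pretraining doc X", "pretraining doc Y"]
batch = list(itertools.islice(mixed_stream(target_docs, pretrain_docs), 8))
```

Sampling per example keeps the sketch independent of dataset sizes: the target set can be small and revisited many times while the injected share of pretraining data stays fixed in expectation.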

Original language: English
Pages (from-to): 4020-4042
Number of pages: 23
Journal: Proceedings of Machine Learning Research
Volume: 267
Publication status: Published - 1 Jan 2025
Externally published: Yes
Event: 42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada
Duration: 13 Jul 2025 to 19 Jul 2025
