TY - GEN
T1 - JuriBERT
T2 - 3rd Natural Legal Language Processing, NLLP 2021
AU - Douka, Stella
AU - Abdine, Hadi
AU - Vazirgiannis, Michalis
AU - El Hamdani, Rajaa
AU - Restrepo Amariles, David
N1 - Publisher Copyright: © 2021 Association for Computational Linguistics.
PY - 2021/1/1
Y1 - 2021/1/1
N2 - Language models have proven to be very useful when adapted to specific domains. Nonetheless, little research has been done on the adaptation of domain-specific BERT models in the French language. In this paper, we focus on creating a language model adapted to French legal text with the goal of helping law professionals. We conclude that some specific tasks do not benefit from generic language models pre-trained on large amounts of data. We explore the use of smaller architectures in domain-specific sub-languages and their benefits for French legal text. We prove that domain-specific pre-trained models can perform better than their equivalent generalised ones in the legal domain. Finally, we release JuriBERT, a new set of BERT models adapted to the French legal domain.
M3 - Conference contribution
AN - SCOPUS:85130916067
T3 - Natural Legal Language Processing, NLLP 2021 - Proceedings of the 2021 Workshop
SP - 95
EP - 101
BT - Natural Legal Language Processing, NLLP 2021 - Proceedings of the 2021 Workshop
A2 - Aletras, Nikolaos
A2 - Androutsopoulos, Ion
A2 - Barrett, Leslie
A2 - Goanta, Catalina
A2 - Preotiuc-Pietro, Daniel
PB - Association for Computational Linguistics (ACL)
Y2 - 10 November 2021
ER -