Legal Domain Adaptation of Modern BERT Models
Summary
This paper explores domain adaptation of ModernBERT models in the legal domain by further pre-training on US court opinions. It reports significant improvements over vanilla ModernBERT on all datasets connected to US court opinions, and the authors release the checkpoints publicly.
# Legal Domain Adaptation of Modern BERT Models

Source: [https://arxiv.org/abs/2606.28538](https://arxiv.org/abs/2606.28538) [View PDF](https://arxiv.org/pdf/2606.28538)

> Abstract: We investigate domain adaptation of modern BERT models in the legal domain. We further pre-train ModernBERT on all US court opinions using the masked language modeling objective. Although ModernBERT has been trained on roughly 500x more data than original BERT, we still find that this model benefits from further pre-training and domain adaptation in the legal domain: we report significant improvements compared to vanilla ModernBERT on all datasets connected to US court opinions. We find gains similar to those reported in early work on domain adaptation of BERT-like models. However, from scratch pre-training does not match the performance of further pre-training an existing ModernBERT checkpoint in our experiments. The resulting models are capable of processing sequences up to 8,192 tokens, and can be used to compute meaningful embeddings of legal passages, or could quickly rerank hundreds of legal passages for a given search query. We release all model checkpoints publicly.

## Submission history

From: Dominik Stammbach

**[v1]** Fri, 26 Jun 2026 18:44:11 UTC (186 KB)
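
The abstract mentions two uses for the adapted encoder: computing passage embeddings and reranking passages for a search query. The following is a minimal sketch of that idea, not the authors' pipeline. It assumes token vectors have already been produced by some encoder; the function names (`mean_pool`, `cosine`, `rerank`), the mean-pooling choice, and the cosine-similarity scoring are illustrative assumptions that the abstract does not specify.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (assumed pooling strategy)."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank(query_embedding, passage_embeddings):
    """Return (passage_index, score) pairs sorted from most to least similar."""
    scored = [
        (idx, cosine(query_embedding, emb))
        for idx, emb in enumerate(passage_embeddings)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-dimensional vectors stand in for real encoder outputs (hypothetical data).
    query = [1.0, 0.0]
    passages = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    for idx, score in rerank(query, passages):
        print(idx, round(score, 3))
```

In practice the toy vectors would be replaced by pooled outputs of an encoder checkpoint, and embeddings for a fixed passage collection could be computed once and reused across queries.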
Similar Articles
LegalBench-BR: A Benchmark for Evaluating Large Language Models on Brazilian Legal Decision Classification
Researchers release LegalBench-BR, the first public benchmark for evaluating LLMs on Brazilian legal text classification, showing LoRA-fine-tuned BERTimbau dramatically outperforms GPT-4o mini and Claude 3.5 Haiku.
The Word and the Way: Strategies for Domain-Specific BERT Pre-Training in German Medical NLP
This paper introduces ChristBERT, a family of domain-specific RoBERTa-based language models for German clinical NLP, and evaluates three domain adaptation strategies (continued pre-training, pre-training from scratch, and vocabulary adaptation) on medical named entity recognition and text classification tasks, achieving state-of-the-art results.
I trained a NER model on 33,000 Indian Supreme Court judgments (1950–2024) CASE_CITATION hits 97.76% F1, +17 points over the only prior baseline [P]
Released en_legal_ner_ind_trf v0.1, an InLegalBERT model fine-tuned on 33,000 Indian Supreme Court judgments, achieving a 97.76% F1 score on case citations and significantly outperforming previous baselines.
A Causal Language Modeling Detour Improves Encoder Continued Pretraining
This paper demonstrates that switching from Masked Language Modeling to Causal Language Modeling during encoder adaptation improves downstream performance on biomedical texts. The authors release ModernBERT-bio and ModernCamemBERT-bio as state-of-the-art biomedical encoders.
DLawBench: Evaluating LLMs Through Multi-Turn Legal Consultation
DLawBench is a new benchmark for evaluating large language models in multi-turn legal consultation, covering Chinese and US law with four client types. Experiments show significant room for improvement, with the best model achieving only 0.562 on legal reasoning.