Legal Domain Adaptation of Modern BERT Models

arXiv cs.CL 06/30/26, 04:00 AM Papers

legal-domain-adaptation modernbert bert domain-adaptation nlp pre-training masked-language-modeling

Summary

This paper explores domain adaptation of ModernBERT models in the legal domain by further pre-training on US court opinions, achieving significant improvements over the vanilla model and releasing the checkpoints publicly.

arXiv:2606.28538v1 Announce Type: new Abstract: We investigate domain adaptation of modern BERT models in the legal domain. We further pre-train ModernBERT on all US court opinions using the masked language modeling objective. Although ModernBERT has been trained on roughly 500x more data than original BERT, we still find that this model benefits from further pre-training and domain adaptation in the legal domain: we report significant improvements compared to vanilla ModernBERT on all datasets connected to US court opinions. We find gains similar to those reported in early work on domain adaptation of BERT-like models. However, from scratch pre-training does not match the performance of further pre-training an existing ModernBERT checkpoint in our experiments. The resulting models are capable of processing sequences up to 8,192 tokens, and can be used to compute meaningful embeddings of legal passages, or could quickly rerank hundreds of legal passages for a given search query. We release all model checkpoints publicly.


# Legal Domain Adaptation of Modern BERT Models
Source: [https://arxiv.org/abs/2606.28538](https://arxiv.org/abs/2606.28538)
[View PDF](https://arxiv.org/pdf/2606.28538)

> Abstract: We investigate domain adaptation of modern BERT models in the legal domain. We further pre-train ModernBERT on all US court opinions using the masked language modeling objective. Although ModernBERT has been trained on roughly 500x more data than original BERT, we still find that this model benefits from further pre-training and domain adaptation in the legal domain: we report significant improvements compared to vanilla ModernBERT on all datasets connected to US court opinions. We find gains similar to those reported in early work on domain adaptation of BERT-like models. However, from scratch pre-training does not match the performance of further pre-training an existing ModernBERT checkpoint in our experiments. The resulting models are capable of processing sequences up to 8,192 tokens, and can be used to compute meaningful embeddings of legal passages, or could quickly rerank hundreds of legal passages for a given search query. We release all model checkpoints publicly.
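
The masked language modeling objective named in the abstract can be illustrated with a minimal sketch. This is not the authors' training code: the whitespace tokenizer, the `mask_tokens` helper, the `[MASK]` string, and the 0.3 masking rate are all assumptions chosen for illustration, and the abstract does not state the rate actually used. The sketch only shows how an input is corrupted and which positions carry a prediction target.

```python
import random

MASK_TOKEN = "[MASK]"   # assumed mask symbol, for illustration only
IGNORE_LABEL = None     # positions that would not contribute to the loss


def mask_tokens(tokens, mask_prob=0.3, seed=0):
    """Build one masked-language-modeling example (hypothetical helper).

    Each token is independently replaced by MASK_TOKEN with probability
    mask_prob. The label at a masked position is the original token;
    every other position gets IGNORE_LABEL.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(tok)           # the model must recover this token
        else:
            masked.append(tok)
            labels.append(IGNORE_LABEL)  # visible token, no prediction target
    return masked, labels


# Toy sentence split on whitespace; a real setup would use the model's tokenizer.
tokens = "the court affirmed the judgment of the district court".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=0)
```

Further pre-training, as described in the abstract, would start from an existing checkpoint and minimize the prediction loss at the masked positions over legal text, whereas pre-training from scratch would start from random weights.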

## Submission history

From: Dominik Stammbach ([view email](https://arxiv.org/show-email/6c06b8f6/2606.28538))

**[v1]** Fri, 26 Jun 2026 18:44:11 UTC (186 KB)

Similar Articles

LegalBench-BR: A Benchmark for Evaluating Large Language Models on Brazilian Legal Decision Classification

The Word and the Way: Strategies for Domain-Specific BERT Pre-Training in German Medical NLP

I trained a NER model on 33,000 Indian Supreme Court judgments (1950–2024) CASE_CITATION hits 97.76% F1, +17 points over the only prior baseline [P]

A Causal Language Modeling Detour Improves Encoder Continued Pretraining

DLawBench: Evaluating LLMs Through Multi-Turn Legal Consultation
