Legal Domain Adaptation of Modern BERT Models

arXiv cs.CL Papers

Summary

This paper adapts ModernBERT to the legal domain by further pre-training it on US court opinions with the masked language modeling objective. The adapted models improve significantly over vanilla ModernBERT on all datasets connected to US court opinions, and the authors release all checkpoints publicly.

arXiv:2606.28538v1

Source: [https://arxiv.org/abs/2606.28538](https://arxiv.org/abs/2606.28538)
[View PDF](https://arxiv.org/pdf/2606.28538)

> Abstract: We investigate domain adaptation of modern BERT models in the legal domain. We further pre-train ModernBERT on all US court opinions using the masked language modeling objective. Although ModernBERT has been trained on roughly 500x more data than original BERT, we still find that this model benefits from further pre-training and domain adaptation in the legal domain: we report significant improvements compared to vanilla ModernBERT on all datasets connected to US court opinions. We find gains similar to those reported in early work on domain adaptation of BERT-like models. However, from scratch pre-training does not match the performance of further pre-training an existing ModernBERT checkpoint in our experiments. The resulting models are capable of processing sequences up to 8,192 tokens, and can be used to compute meaningful embeddings of legal passages, or could quickly rerank hundreds of legal passages for a given search query. We release all model checkpoints publicly.
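
The further pre-training described in the abstract uses the masked language modeling objective. As a rough illustration of that objective (not the authors' training code), the sketch below corrupts a token sequence in the style of the original BERT recipe. The 80/10/10 replacement split, the default `mask_prob`, the `-100` ignore label, and all token ids are assumptions chosen for illustration; the abstract does not state the settings actually used.

```python
import random

# Label value for positions that should not contribute to the loss.
# -100 is a common convention; it is an assumption here, not taken from the paper.
IGNORE_INDEX = -100


def mask_tokens(token_ids, mask_id, vocab_size, mask_prob=0.15,
                rng=None, special_ids=frozenset()):
    """Build one masked-language-modeling example as (inputs, labels).

    Each non-special position is selected with probability `mask_prob`.
    A selected position keeps its original id in `labels` and is corrupted
    in `inputs`: 80% become `mask_id`, 10% become a random id, and 10% are
    left unchanged. Unselected positions get IGNORE_INDEX in `labels`.
    """
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        # Never corrupt special tokens; skip positions that are not selected.
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)
        # otherwise the original token stays in place
    return inputs, labels


# Hypothetical token ids standing in for a tokenized court opinion.
example_ids = [2, 417, 88, 1930, 56, 7, 3]
example_inputs, example_labels = mask_tokens(
    example_ids, mask_id=4, vocab_size=5000,
    rng=random.Random(0), special_ids={2, 3},
)
```

In a training loop, `inputs` would be fed to the encoder and the loss computed only at positions where `labels` differs from `IGNORE_INDEX`.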

## Submission history

From: Dominik Stammbach
**[v1]** Fri, 26 Jun 2026 18:44:11 UTC (186 KB)
