Tag
BamiBERT is a new BERT-based pre-trained language model for Vietnamese that addresses limitations of PhoBERT by supporting longer context and operating without word segmentation, and it achieves state-of-the-art results on multiple Vietnamese benchmarks.
Promotes an educational resource explaining the Transformer architecture, covering token embeddings, self-attention, residual connections, and how the architecture relates to GPT and BERT.
Introduces a comprehensive hate speech dataset for Turkish and Arabic, and develops state-of-the-art BERT-based models for hate speech analysis including classification, intensity prediction, target identification, and span detection.
This paper finds that 42.6% of annotator disagreement in HateXplain concentrates at the hate/offensive boundary, demonstrating that majority vote silences minority values and leads to models being wrong but highly confident on contested inputs.
This paper explores domain adaptation of ModernBERT models in the legal domain by further pre-training on US court opinions, achieving significant improvements over the vanilla model and releasing the checkpoints publicly.
SupraLabs released SupraSafety-18M, a tiny 18M-parameter BERT-style content moderation model trained on NVIDIA's Nemotron-3.5 dataset. It achieves 81.2% accuracy and runs efficiently on edge devices.
This paper compares fine-tuned BERT (gbert-large) with few-shot LLM prompting (Llama 4 Maverick) for detecting threat and solution framing in German climate news sentences. BERT achieves higher F1 scores (0.83 vs 0.78), and an ablation study shows that providing preceding sentence context improves performance.
This paper investigates the distribution and evolution of aspect-level sentiments in multi-round peer reviews from Nature Communications, using a deep learning approach (LCF-BERT-CDM) that achieves 82.65% Macro-F1. It finds that positive sentiment increases while negative sentiment decreases with more review rounds.
This paper proposes a novel pipeline for multilingual coreference resolution that uses cycle-consistent machine translation from English to low-resource languages to generate training data, validated by back-translation and BERT similarity. Experiments on four low-resource languages show significant performance gains, enabling accurate coreference resolution where no prior corpora existed.
The paper proposes a hybrid pre-training objective combining JEPA latent-space prediction with MLM reconstruction for language models, showing improved embedding uniformity and semantic-lexical balance.
This paper introduces a text-based causal inference methodology using an enhanced CausalBERT to disentangle the effects of individual aspects (e.g., school administration, academic performance) on overall online review ratings, validated on 600K+ U.S. K-12 school reviews. Key improvements include temperature scaling, hyperparameter optimization, and interpretability methods to reduce confounding bias.
This paper introduces ChristBERT, a family of domain-specific RoBERTa-based language models for German clinical NLP, and evaluates three domain adaptation strategies (continued pre-training, pre-training from scratch, and vocabulary adaptation) on medical named entity recognition and text classification tasks, achieving state-of-the-art results.
This paper presents Lepton, a fine-tuned BERT classifier that predicts whether a title in the table of contents of a Classical Chinese wenji is a personal letter or a preface, using 5,438 hand-labeled titles from late-Ming and early-Qing literati.
This paper uses a BERT-based large language model for sentiment analysis of Decentraland's Discord community to enhance MANA token price prediction, demonstrating that a multi-modal LSTM incorporating sentiment, trading volume, and market capitalization outperforms a price-only baseline.
This research paper investigates how shortcut solutions learned by Transformer models, specifically BERT, impair their ability to perform continual compositional reasoning. It contrasts BERT with ALBERT, finding that ALBERT's recurrent nature offers better inductive bias for continual learning tasks.
Released en_legal_ner_ind_trf v0.1, an InLegalBERT model fine-tuned on 33,000 Indian Supreme Court judgments, achieving a 97.76% F1 score on case citations and significantly outperforming previous baselines.
A foundational study on applying stylometric authorship attribution to threat intelligence, using Japanese Rakuten reviews to compare TF-IDF+LR, BERT embedding, BERT fine-tuning, and metric learning methods. BERT-FT performed best overall, but TF-IDF+LR proved more stable and efficient when scaling to hundreds of authors.
This article profiles researcher Brian Hie, highlighting how his unique background in literature and computer science informed the development of ESM, a BERT-like model for protein sequences.