domain-specific

#domain-specific

Pre-Flight: A Benchmark for Evaluating Large Language Models on Aviation Operational Knowledge

arXiv cs.AI · 16h ago

This paper introduces Pre-Flight, an open-source benchmark of 300 multiple-choice questions designed to evaluate large language models on aviation operational knowledge, covering international regulations and ground operations. The top score among models evaluated in 2026 is 82.7%, well below the expert reference of ~95%, which points to a persistent reliability gap.
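For readers less familiar with how such headline numbers are produced, the sketch below scores multiple-choice answers against a key and computes the gap to an expert reference. It is a minimal Python illustration under assumed inputs: the `score_mcq` and `gap_to_reference` helpers and the toy question IDs are hypothetical, and Pre-Flight's own data format and scoring script may differ.

```python
from typing import Dict


def score_mcq(predictions: Dict[str, str], answer_key: Dict[str, str]) -> float:
    """Accuracy in percent; unanswered questions count as wrong."""
    correct = sum(
        1
        for qid, gold in answer_key.items()
        if predictions.get(qid, "").strip().upper() == gold.strip().upper()
    )
    return 100.0 * correct / len(answer_key)


def gap_to_reference(model_pct: float, reference_pct: float = 95.0) -> float:
    """Percentage-point gap between a model score and the expert reference."""
    return reference_pct - model_pct


if __name__ == "__main__":
    # Toy 4-question key; these are not real Pre-Flight items.
    key = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
    preds = {"q1": "a", "q2": "C", "q3": "D"}  # q4 left unanswered
    print(f"accuracy: {score_mcq(preds, key):.1f}%")  # accuracy: 50.0%
    print(f"gap: {gap_to_reference(82.7):.1f} points")  # gap: 12.3 points
```

On a 300-question set, 82.7% corresponds to roughly 248 correct answers (248/300 ≈ 82.67%), while ~95% would be about 285, so the gap amounts to roughly 37 questions.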

On the Utility and Factual Reliability of Pruned Mixture-of-Experts Models in the Biomedical Domain

arXiv cs.LG · 16h ago

This paper investigates how domain-specific expert pruning affects both the utility and the factual reliability of Mixture-of-Experts (MoE) models in the biomedical domain. It finds that moderate pruning preserves in-domain utility without an immediate loss of reliability, whereas extreme pruning increases hallucination risk, and generalization degrades rapidly in cross-domain settings.
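One common way to make an MoE model domain-specific is to prune the experts that the router rarely selects on in-domain data. The minimal Python sketch below assumes that frequency-based criterion; the paper's actual pruning criterion is not stated in this summary, and `experts_to_keep`, `keep_ratio`, and the toy routing data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def expert_usage(router_choices: Sequence[Sequence[int]]) -> Counter:
    """Count how often each expert is picked by the router (top-k per token)
    over a set of in-domain calibration tokens."""
    counts: Counter = Counter()
    for chosen in router_choices:
        counts.update(chosen)
    return counts


def experts_to_keep(
    router_choices: Sequence[Sequence[int]], num_experts: int, keep_ratio: float
) -> List[int]:
    """Return the ids of the most frequently routed experts.

    keep_ratio=1.0 keeps every expert; smaller values prune more aggressively.
    Experts never selected on the calibration data have a count of 0.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    counts = expert_usage(router_choices)
    n_keep = max(1, round(num_experts * keep_ratio))
    # Sort by descending usage; break ties by the lower expert id.
    ranked = sorted(range(num_experts), key=lambda e: (-counts[e], e))
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    # Hypothetical top-2 routing decisions for 5 biomedical tokens, 8 experts.
    choices = [[0, 3], [3, 5], [0, 3], [1, 3], [0, 5]]
    print(experts_to_keep(choices, num_experts=8, keep_ratio=0.5))   # [0, 1, 3, 5]
    print(experts_to_keep(choices, num_experts=8, keep_ratio=0.25))  # [0, 3]
```

Lowering `keep_ratio` corresponds to the more extreme pruning regime the summary links to higher hallucination risk; the sketch itself does not measure reliability.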

@jianxliao: That's why OSS models are so important, along with the stack to adopt those OSS models for domain-specific tasks, run t…

X AI KOLs Following · yesterday

Emphasizes the importance of open-source AI models for domain-specific tasks, local deployment, and continuous improvement, advocating for owning intelligence rather than renting it.

Travel-Oriented Reasoning Large Language Model via Domain-Specific Knowledge Graphs

arXiv cs.CL · 3d ago

This paper proposes a modular pipeline that uses a domain-specific knowledge graph to generate multi-hop QA pairs and fine-tune a reasoning LLM (Qwen3-4B) for the travel domain, achieving 82.4% exact-match accuracy and significantly outperforming the baseline.
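The summary does not describe the pipeline's internals. As an assumed illustration of the two ideas it names, the Python sketch below chains knowledge-graph triples into two-hop questions and scores answers with exact match. The question template, the SQuAD-style normalization, and the toy triples are assumptions, not the paper's method.

```python
import re
import string
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def two_hop_qa(triples: List[Triple]) -> List[Dict[str, str]]:
    """Chain (a, r1, b) and (b, r2, c) into one question whose answer is c."""
    by_head: Dict[str, List[Triple]] = {}
    for t in triples:
        by_head.setdefault(t[0], []).append(t)
    pairs = []
    for a, r1, b in triples:
        for _, r2, c in by_head.get(b, []):
            pairs.append({
                "question": f"What is the {r2} of the {r1} of {a}?",
                "answer": c,
                "path": f"{a} -[{r1}]-> {b} -[{r2}]-> {c}",
            })
    return pairs


def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, strip punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(preds: List[str], golds: List[str]) -> float:
    """Percentage of predictions equal to the gold answer after normalization."""
    hits = sum(normalize(p) == normalize(g) for p, g in zip(preds, golds))
    return 100.0 * hits / len(golds)


if __name__ == "__main__":
    # Toy travel triples, not taken from the paper's knowledge graph.
    kg = [("Louvre", "city", "Paris"), ("Paris", "country", "France")]
    for qa in two_hop_qa(kg):
        print(qa["question"], "->", qa["answer"])
    print(exact_match(["france.", "The Paris"], ["France", "Lyon"]))  # 50.0
```

Exact match is strict by design: after normalization, any extra word in the prediction counts as a miss, which is why the normalization step matters for the reported percentage.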

How Do Tool-Augmented LLM Agents Perform on Real-World Energy Analytics Tasks?

arXiv cs.AI · 2026-06-26

This paper presents an empirical study and benchmark for evaluating tool-augmented LLM agents on real-world energy analytics tasks, comprising 243 expert-curated problems across market data retrieval, knowledge interpretation, and quantitative modeling.

Fine-tuned Qwen2.5-7B to 96% of Claude Haiku on a domain-specific task using ~$3 of API calls and zero human labelers

Reddit r/LocalLLaMA · 2026-06-10

Presents DV-DPO, a method for fine-tuning Qwen2.5-7B on a domain-specific task with only ~$3 of API calls and no human labelers; using adversarial cross-examination, the resulting model reaches 96% of Claude Haiku's composite score.
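The name DV-DPO suggests a variant of Direct Preference Optimization (DPO), but the post summary does not spell out what changes, nor how adversarial cross-examination produces the preference pairs. For reference only, here is a Python sketch of the standard DPO loss for a single (chosen, rejected) pair; `beta=0.1` is an assumed, commonly used default, not a value from the post.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under the
    policy being trained or under the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) == softplus(-z)
    return softplus(-z)


if __name__ == "__main__":
    print(round(dpo_loss(-10.0, -10.0, -10.0, -10.0), 4))  # 0.6931 (log 2)
    print(dpo_loss(-5.0, -20.0, -10.0, -10.0))  # below log 2: policy favors chosen
```

Training lowers this loss by raising the chosen response's log-probability, relative to the reference model, more than the rejected one's; when both margins are equal the loss is exactly log 2.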

The Word and the Way: Strategies for Domain-Specific BERT Pre-Training in German Medical NLP

arXiv cs.CL · 2026-06-03

This paper introduces ChristBERT, a family of domain-specific RoBERTa-based language models for German clinical NLP, and evaluates three domain adaptation strategies (continued pre-training, pre-training from scratch, and vocabulary adaptation) on medical named entity recognition and text classification tasks, achieving state-of-the-art results.

Knowledge Offloading: Decomposing LLMs into Sparse Backbones and Memory Modules

arXiv cs.LG · 2026-05-29

Proposes KOFF, a framework that decomposes pretrained LLMs into a sparse shared backbone and domain-specific external memories using structured pruning and LoRA adapters, achieving 12% sparsity without significant performance loss.
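As a toy illustration of the decomposition described above, the Python sketch below keeps one structurally pruned weight matrix shared by all domains and adds a small per-domain LoRA update selected at call time. The `SharedBackboneLayer` class, the row-pruning rule, the domain name, and the adapter layout are hypothetical; the summary does not describe how KOFF chooses what to prune or how its memory modules are routed.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def prune_rows(w: Matrix, keep: List[bool]) -> Matrix:
    """Structured pruning: zero whole output rows (neurons), not single weights."""
    return [list(row) if k else [0.0] * len(row) for row, k in zip(w, keep)]


def sparsity(w: Matrix) -> float:
    """Fraction of weights that are exactly zero."""
    total = sum(len(row) for row in w)
    zeros = sum(1 for row in w for v in row if v == 0.0)
    return zeros / total


class SharedBackboneLayer:
    """Hypothetical layer: one frozen pruned weight shared by all domains,
    plus a per-domain LoRA update scale * (B @ A) selected at call time."""

    def __init__(self, base: Matrix, adapters: Dict[str, Tuple[Matrix, Matrix]],
                 alpha: float, rank: int):
        self.base = base           # shared, frozen backbone weight (d_out x d_in)
        self.adapters = adapters   # domain -> (A: rank x d_in, B: d_out x rank)
        self.scale = alpha / rank  # usual LoRA scaling factor

    def forward(self, x: List[float], domain: Optional[str] = None) -> List[float]:
        y = matvec(self.base, x)
        if domain is None:
            return y  # backbone only, no domain memory
        a, b = self.adapters[domain]
        delta = matvec(b, matvec(a, x))
        return [yi + self.scale * di for yi, di in zip(y, delta)]


if __name__ == "__main__":
    base = prune_rows([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
                      keep=[True, True, True, False])
    print(sparsity(base))  # 0.25
    layer = SharedBackboneLayer(
        base, {"medical": ([[1.0, 1.0]], [[0.5], [0.0], [0.0], [0.0]])},
        alpha=2.0, rank=1)
    print(layer.forward([1.0, 1.0]))             # [3.0, 7.0, 11.0, 0.0]
    print(layer.forward([1.0, 1.0], "medical"))  # [5.0, 7.0, 11.0, 0.0]
```

In this toy example, pruning one of four rows gives 25% sparsity; the 12% figure above is the paper's reported result and is unrelated to these numbers.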

MechVQA: Benchmarking and Enhancing Multimodal LLMs on Comprehensive Mechanical Drawing Understanding

Hugging Face Daily Papers · 2026-05-29

This paper introduces MechVQA, a dataset with 3.3k high-density mechanical engineering drawings and 21k question-answer pairs, along with the MechVL model that outperforms existing baselines by 7.57 percentage points on the MechVQA total score, advancing multimodal LLM understanding of mechanical drawings.

Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning

Hugging Face Daily Papers · 2026-05-29

DOMINO is a novel framework that learns minimal sufficient domain representations from reference examples to synthesize domain-specific data for LLMs, improving code benchmark performance without requiring explicit domain descriptions.

MULTISEISMO: A Multimodal Seismic Dataset and Model for Cross-Modal Seismic Understanding

arXiv cs.LG · 2026-05-27

This paper introduces MultiSeismo, a large-scale multimodal seismic dataset of over 16K events that integrates waveforms, intensity maps, and metadata, together with the MISCE instruction set and SeisModal, a multimodal model fine-tuned for cross-modal seismic understanding.

FAB-Bench: A Framework for Adaptive RAG Benchmarking in Semiconductor Manufacturing

arXiv cs.CL · 2026-05-27

FAB-Bench is a benchmark framework for evaluating Retrieval-Augmented Generation (RAG) systems in semiconductor manufacturing, with six diagnostic metrics and an analysis across context-window sizes. It provides 200 curated query-answer pairs and reveals context-scaling behaviors and attention-dilution issues.

Palette: A Modular, Controllable, and Efficient Framework for On-demand Authorized Safety Alignment Relaxation in LLMs

arXiv cs.AI · 2026-05-26

Palette proposes a modular framework for selectively relaxing safety refusal behaviors in LLMs for authorized professional domains, using multi-objective search and lightweight adaptation to avoid costly retraining.

Agentic search models (5 minute read)

TLDR AI · 2026-05-13

Agentic search models are LLMs trained specifically for orchestrating search tasks, offering smaller, faster, and domain-specific alternatives to general models like GPT-5. They unbundle the traditional monolithic search stack by allowing an intelligent model to manage the entire retrieval process.

BAGEL: Benchmarking Animal Knowledge Expertise in Language Models

arXiv cs.CL · 2026-04-20

BAGEL is a new benchmark for evaluating animal-related knowledge in large language models, constructed from diverse scientific sources and covering taxonomy, morphology, habitat, behavior, and species interactions through closed-book question-answer pairs. The benchmark enables fine-grained analysis across taxonomic groups and knowledge categories, providing insights into model strengths and failure modes for biodiversity applications.
