scientific-domain

Enhancing Scientific Discourse: Machine Translation for the Scientific Domain

arXiv cs.CL · 2026-05-21

This paper presents the development of parallel and monolingual corpora for scientific machine translation across Spanish-English, French-English, and Portuguese-English, targeting four domains: Cancer Research, Energy Research, Neuroscience, and Transportation. The corpora are used to fine-tune neural machine translation systems, addressing challenges of specialized vocabulary and syntax in scientific text.

MUSCAT: MUltilingual, SCientific ConversATion Benchmark

arXiv cs.CL · 2026-04-20

MUSCAT is a new multilingual scientific conversation benchmark for evaluating automatic speech recognition (ASR) systems in challenging multilingual scenarios, including code-switching, domain-specific vocabulary, and mixed-language input. The dataset consists of bilingual discussions of scientific papers between speakers using different languages. Results show that current state-of-the-art systems struggle with these multilingual challenges.
