cross-lingual

#cross-lingual

How Far Do Auto-Interpretation Labels Generalize: A Controlled Study Across Languages, Scripts, and Rewordings

arXiv cs.CL · 2d ago

This paper investigates whether auto-generated labels for sparse autoencoder features generalize across languages and scripts, using Serbian digraphia as a controlled testbed. It finds that while feature sets show substantial overlap across languages, the labels often fail to track the same concept in non-English inputs, particularly in less represented scripts.

Anchoring LLM Gender Bias to Human Baselines: A Cross-Lingual Audit

arXiv cs.CL · 3d ago

This paper audits six large language models for gender stereotyping across English, Korean, Chinese, and Japanese, anchoring against human baselines. It finds that LLM stereotyping often exceeds human cross-country variation and can compound across languages, introducing a four-pattern framework to characterize such behaviors.

XLGoBench: Detecting cross-lingual skill gaps with algorithmic tasks

arXiv cs.CL · 3d ago

XLGoBench introduces a synthetic benchmark of algorithmic tasks to detect cross-lingual skill gaps in LLMs, demonstrating persistent gaps across multiple state-of-the-art models.

When English Rewrites Local Knowledge: Global Narrative Dominance in Large Language Models

arXiv cs.CL · 3d ago

This paper introduces CulturalNB, a dataset of Bengali cultural question-answer pairs, and evaluates nine LLMs for cross-lingual cultural bias. Findings show that English prompting increases global narrative substitution and reduces local perspectives, revealing that cultural failures in LLMs are grounding and prioritization issues, not just missing knowledge.

Cross-Lingual Steering for Figurative Language Generation

arXiv cs.CL · 3d ago

This paper explores cross-lingual transfer of internal representations for figurative language generation in multilingual LLMs, showing that activation directions learned in one language can effectively steer generation in other languages.

Rethinking the Multilingual Reasoning Gap with Layer Swap

arXiv cs.CL · 2026-05-27

This paper revisits the multilingual reasoning gap in LLMs, finding it smaller than previously reported under comparable supervision. It introduces Layer Swap, which transfers mid-layer weights from an English reasoning specialist to native language specialists, nearly closing the gap while preserving native-language chain-of-thought.

An In-Vitro Study on Cross-Lingual Generalization in Language Models

arXiv cs.CL · 2026-05-27

This paper introduces an in-vitro framework with two procedurally generated languages to study cross-lingual generalization in language models, finding that tokenization's preservation of reusable substructure is more critical than lexical similarity or data balance for transferring capabilities across languages.

CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations

arXiv cs.CL · 2026-05-27

This paper introduces CroCo, a method for cross-lingual contrastive preference tuning on self-generated responses, showing that a reward model trained on English preferences can effectively rank responses in other languages, improving model performance across 14 languages without language-specific annotations.

@denziideng: Another AI voice cloning 'dimensional reduction attack'... The CosyVoice I shared before can clone in 3 seconds, which I thought was already scary enough. But today's tool is even more lethal — after casually recording 1 minute of my own voice for training, it directly replicates tone, mannerisms, emotions, breathing, and pauses. It's almost like the soul of the original person possessed it! C...

X AI KOLs Timeline · 2026-05-26

GPT-SoVITS is an open-source AI voice cloning tool that supports zero-shot (5-second voice) and few-shot (1-minute training) high-fidelity voice cloning, cross-lingual inference, and comes with a complete WebUI toolchain. It has garnered 57.8k stars on GitHub, becoming the leading open-source project in the voice cloning field.

HiMed: Incentivizing Hindi Reasoning in Medical LLMs

arXiv cs.CL · 2026-05-26

This paper introduces HiMed, a Hindi reasoning medical corpus and benchmark suite, and HiMed-8B, a Hindi-form medical reasoning LLM trained with a decaying scaffolding reward. It demonstrates improved Hindi medical reasoning and a reduced English–Hindi accuracy gap.

Discovering Lexical Gaps Using Embeddings from Multilingual LLMs

arXiv cs.CL · 2026-05-26

This paper proposes a data-driven framework using embeddings from multilingual LLMs to detect lexical gaps between languages, achieving high accuracy in Korean-English pairs.

Same Model, Different Weakness: How Language and Modality Reshape the Jailbreak Attack Surface in Frontier MLLMs

arXiv cs.CL · 2026-05-25

This paper presents the first systematic cross-lingual, multimodal red-teaming study comparing jailbreak vulnerability in US English and Mexican Spanish across four frontier MLLMs, revealing that language does not scale vulnerability uniformly and that safety rankings are not preserved across languages.

SemBridge: Language Transfer in Sparse Encoders via Multilingual Semantic Bridges

Hugging Face Daily Papers · 2026-05-25

SemBridge is a novel embedding initialization method that leverages multilingual bridge models to establish semantic alignments between source and target vocabularies, improving cross-lingual sparse encoder adaptation and retrieval performance across multiple languages.

@lxfater: NetEase Youdao open-sourced ZiYue 4 model, within 27B parameters, SOTA in math and science. But what really interests me is its voice feature!! Cloning a voice is nothing new, ElevenLabs could do it long ago. But they all share a common flaw: cross-language accent. Take your Chinese voice and use it to speak Japanese — it has a Chinese accent, you can tell it's a foreigner struggling...

X AI KOLs Timeline · 2026-05-22

NetEase Youdao open-sourced the ZiYue 4 model with 27B parameters, which achieves SOTA in math and science. Its voice feature supports 3-second cross-language voice cloning across 14 languages without accent issues. The company also open-sourced the all-scenario intelligent agent 'Longxia' (Lobster).

Cross-Lingual Consensus: Aligning Multilingual Cultural Knowledge via Multilingual Self-Consistency

arXiv cs.CL · 2026-05-22

This paper proposes a self-supervised framework using multilingual self-consistency and a self-critique mechanism to transfer cultural knowledge across languages, achieving a 5.03% average improvement on English queries in the BLEnD benchmark by surfacing latent cultural knowledge from local-language representations.

Lost in Interpretation: The Plausibility-Faithfulness Trade-off in Cross-Lingual Explanations

arXiv cs.CL · 2026-05-20

This paper investigates the trade-off between plausibility and faithfulness in cross-lingual explanations from LLMs, finding that English-pivot explanations achieve higher span agreement with human rationales but suffer reduced causal faithfulness compared to native-language explanations.

Why Do Safety Guardrails Degrade Across Languages?

arXiv cs.CL · 2026-05-19

This paper introduces a Multi-Group Item Response Theory framework to decouple factors behind safety degradation in non-English languages, revealing that safety is primarily unidimensional and that low-resource languages produce more uncertain responses.

Knowledge Beyond Language: Bridging the Gap in Multilingual Machine Unlearning Evaluation

arXiv cs.CL · 2026-05-15

This paper proposes two new metrics—Knowledge Separability Score (KSS) and Knowledge Persistence Score (KPS)—to evaluate cross-linguistic information removal in multilingual machine unlearning for LLMs, addressing shortcomings of prior per-language evaluation protocols.

Improving Lexical Difficulty Prediction with Context-Aligned Contrastive Learning and Ridge Ensembling

arXiv cs.CL · 2026-05-12

This paper introduces Context-Aligned Contrastive Regression to improve lexical difficulty prediction by addressing cross-lingual alignment and ordinal structure challenges in language learning datasets.

MLAIRE: Multilingual Language-Aware Information Retrieval Evaluation Protocol

Hugging Face Daily Papers · 2026-05-08

MLAIRE is a multilingual language-aware information retrieval evaluation protocol that separates semantic retrieval accuracy from query-language preference to better assess retrieval utility across mixed-language corpora.
