Cross-Lingual Steering for Figurative Language Generation

arXiv cs.CL 06/01/26, 04:00 AM Papers
Summary
This paper explores cross-lingual transfer of internal representations for figurative language generation in multilingual LLMs, showing that activation directions learned in one language can effectively steer generation in other languages.
arXiv:2605.30443v1 Announce Type: new Abstract: Multilingual large language models can generate figurative language, but whether the internal signals driving this behavior are language-specific or reusable across languages is unclear. Using activation steering as a probe, we estimate a direction for a figurative category from figurative--literal activation differences in one language and apply it during generation. Across five figurative categories, six languages, and four multilingual LLMs, these directions steer reliably within their own language, most robustly for metaphor and simile. More importantly, they transfer across languages: a direction learned in one increases the target behavior when applied to another, with German among the most receptive targets. Going further, directions assembled from other languages can match or even surpass a target language's own native direction, while removing this shared component weakens native steering. Together, these results provide direct evidence of a reusable but target-dependent cross-lingual signal for figurative generation.
Cached at: 06/01/26, 09:23 AM
# Cross-Lingual Steering for Figurative Language Generation
Source: [https://arxiv.org/html/2605.30443](https://arxiv.org/html/2605.30443)
Linfeng Liu¹, Tiffany Zhan², Louie Hong Yao³, Saptarshi Ghosh¹, Tianyu Jiang¹

¹Department of Computer Science, University of Cincinnati; ²School of Computer Science, Carnegie Mellon University; ³Independent Researcher

{liu2lf,ghosh2si}@mail.uc.edu, tzhan2@andrew.cmu.edu, lhyao731@gmail.com, tianyu.jiang@uc.edu

## 1 Introduction

![Figure 1](https://arxiv.org/html/2605.30443v1/x1.png)

Figure 1: Overview of the cross-lingual steering test. A metaphor direction constructed from Chinese figurative–literal examples is applied while processing an English literal prompt. The intervention tests whether a source-language steering signal can increase metaphorical generation in a different target language.

Multilingual large language models (LLMs) can produce figurative expressions in multiple languages. However, it remains unclear whether the internal signals supporting such generation are primarily language-specific or whether some behaviorally useful components can be reused across languages. This distinction matters for understanding what multilingual generation shares beyond surface vocabulary and syntax.

Figurative language offers a particularly informative test case. Some categories, such as metaphor and simile, often depend on semantic relations that may plausibly recur across languages. Others, such as idiom, irony, and sarcasm, depend more on culturally situated pragmatics or discourse context. Cross-lingual differences in steering effectiveness may therefore reveal not only whether reusable signal exists, but also which kinds of figurative behavior are most portable across languages. Although prior work has explored cross-lingual alignment in multilingual tasks (Zhang et al., [2025](https://arxiv.org/html/2605.30443#bib.bib23); Wang et al., [2025](https://arxiv.org/html/2605.30443#bib.bib21)), whether internal directions for fine-grained figurative generation transfer across languages remains underexplored.

We investigate this question using activation steering (Turner et al., [2024](https://arxiv.org/html/2605.30443#bib.bib3); Rimsky et al., [2024](https://arxiv.org/html/2605.30443#bib.bib28)) as an intervention-based probe. As illustrated in Figure [1](https://arxiv.org/html/2605.30443#S1.F1), we estimate a steering direction from figurative–literal activation differences in a source language (e.g., Chinese) and apply it while the model processes literal prompts in either the same language or a different target language (e.g., English), without retraining or source–target-specific tuning. If a direction estimated in one language increases target-category generation in another, it provides behavioral evidence that the source-language contrast contains signal usable beyond its original language. We evaluate this question across five figurative categories (idiom, metaphor, simile, irony, and sarcasm), six languages (English, Chinese, Bengali, Spanish, Italian, and German), and four multilingual large language models.

Our analysis proceeds in three stages to evaluate the cross-lingual portability of figurative language representations. First, in monolingual steering, we test whether activation differences yield effective steering directions within the language from which they are derived. Second, in zero-shot cross-lingual transfer, we apply source-language steering vectors to target-language prompts to determine whether figurative representations are shared across languages. Finally, through geometric interventions, we combine and ablate vectors from multiple languages to characterize the structure underlying their transferability.

Through these analyses, we make three main contributions:

- Effective monolingual steering. In monolingual experiments, steering vectors consistently increase targeted figurative behavior, significantly outperforming unsteered baselines in 74 of 96 settings and exceeding matched random-vector controls in every language–model aggregate.
- Zero-shot cross-lingual portability. Source-language vectors increase target-category generation over the unsteered baseline in 369 of 416 zero-shot routes, and in several cases match or outperform vectors derived directly from the target language.
- A shared geometric basis for transfer. Multilingual aggregate vectors consistently rival native-language steering, while projecting out the shared cross-lingual direction sharply reduces figurative generation, indicating that transferable steering depends on a common representational component.

Taken together, these findings indicate that figurative generation relies, in substantial part, on reusable but target-dependent cross-lingual activation structure that can be identified and manipulated through representation steering.¹

¹ Our code will be made publicly available upon acceptance of the paper.

## 2 Related Work

Activation steering and representation engineering. Activation-based interventions manipulate internal model representations at inference time without updating model parameters (Turner et al., [2024](https://arxiv.org/html/2605.30443#bib.bib3); Zou et al., [2025](https://arxiv.org/html/2605.30443#bib.bib2)). Contrastive Activation Addition (CAA) constructs steering directions from positive–negative activation differences and applies them during generation (Rimsky et al., [2024](https://arxiv.org/html/2605.30443#bib.bib28)). Related inference-time interventions have also been used to improve truthfulness by shifting activations along truth-related directions (Li et al., [2023](https://arxiv.org/html/2605.30443#bib.bib42)). Empirical work has further investigated steering for stylistic and behavioral properties (Konen et al., [2024](https://arxiv.org/html/2605.30443#bib.bib29); Sharma and Trivedi, [2026](https://arxiv.org/html/2605.30443#bib.bib34)), while surveys systematize representation-engineering methods and their safety-relevant applications (Bartoszcze et al., [2025](https://arxiv.org/html/2605.30443#bib.bib30)). We extend this line of work by studying fine-grained figurative-language generation and its transfer across languages.

Multilingual alignment and transfer. Cross-lingual alignment has long been studied as a basis for transfer in multilingual representations (Ruder et al., [2019](https://arxiv.org/html/2605.30443#bib.bib41); Hämmerl et al., [2024](https://arxiv.org/html/2605.30443#bib.bib32)). For multilingual LLMs specifically, Wang et al. ([2024](https://arxiv.org/html/2605.30443#bib.bib31)) probe how alignment emerges during pre-training and relate cross-lingual neuron overlap to zero-shot transfer performance. Cross-lingual in-context learning has also been studied behaviorally (Tanwar et al., [2023](https://arxiv.org/html/2605.30443#bib.bib33)), while multilingual benchmarks evaluate generative-model capabilities across languages and tasks (Ahuja et al., [2023](https://arxiv.org/html/2605.30443#bib.bib43)). More directly related to our intervention setting, Maraia et al. ([2026](https://arxiv.org/html/2605.30443#bib.bib8)) test cross-language activation steering for syllogistic reasoning, and Gurgurov et al. ([2026](https://arxiv.org/html/2605.30443#bib.bib35)) introduce a benchmark for multilingual language steering. Complementary work by Bandarkar et al. ([2026](https://arxiv.org/html/2605.30443#bib.bib36)) uses cross-lingual inconsistency for causal knowledge localization in mixture-of-experts models rather than for steering-vector transfer.

Table 1: Sentence-continuation instructions used for generation. In each template, {sent} is replaced with the input sentence in the same language.

Figurative language in NLP. Prior work includes surveys of figurative-language generation (Lai and Nissim, [2024](https://arxiv.org/html/2605.30443#bib.bib26)), sarcasm detection (Chen et al., [2024](https://arxiv.org/html/2605.30443#bib.bib38)), and NLI-based evaluation of figurative interpretation (Stowe et al., [2022](https://arxiv.org/html/2605.30443#bib.bib40)). FLUTE provides an explanation-based benchmark for figurative-language understanding (Chakrabarty et al., [2022](https://arxiv.org/html/2605.30443#bib.bib16)), while FLUID QA evaluates multilingual figurative-language usage (Park et al., [2025](https://arxiv.org/html/2605.30443#bib.bib27)). Existing model analyses and recognition methods have examined figurative-language classification and simile recognition (Liu et al., [2018](https://arxiv.org/html/2605.30443#bib.bib19); Jang et al., [2023](https://arxiv.org/html/2605.30443#bib.bib37)). Our work studies whether figurative-generation directions estimated from internal representations can transfer across languages.

## 3 Experimental Setup and Methodology

Our experiments test whether a category-associated direction estimated in one language can steer figurative generation in the same or a different language.

Models, languages, and categories. We conduct experiments with four multilingual models, Qwen3-8B and Qwen3-32B (Yang et al., [2025](https://arxiv.org/html/2605.30443#bib.bib18)), Llama-3.1-8B-Instruct (Grattafiori et al., [2024](https://arxiv.org/html/2605.30443#bib.bib39)), and Ministral-3-8B-Instruct (Liu et al., [2026](https://arxiv.org/html/2605.30443#bib.bib70)), across five figurative categories (idiom, metaphor, simile, irony, and sarcasm) and six languages (English, Chinese, Bengali, Spanish, Italian, and German). Evaluated language–category settings are determined by public data availability; for example, simile is evaluated only in English and Chinese.

Data splits and task formulation. We separate data used for vector construction, layer validation, and final testing (detailed in Appendix [A](https://arxiv.org/html/2605.30443#A1)). Vector construction relies on balanced sets of up to 500 figurative examples and 500 monolingual literal sentences (e.g., COCO captions (Lin et al., [2014](https://arxiv.org/html/2605.30443#bib.bib66))) per setting.

For generation, we use a sentence-continuation instruction in the target language (Table [1](https://arxiv.org/html/2605.30443#S2.T1)). Crucially, the prompt never requests a figurative device, so any increase in the Target Category Rate (TCR, defined below) over unsteered generation can be attributed to the intervention rather than to the instruction. Final behavioral evaluations are conducted on a held-out set of 500 literal prompts per target language.

Steering vector extraction and application. We estimate steering directions using contrastive activation addition (CAA) (Rimsky et al., [2024](https://arxiv.org/html/2605.30443#bib.bib28)). For a given language $g$ and category $c$, let $\mathcal{D}^{+}_{g,c}$ contain the figurative examples and $\mathcal{D}^{-}_{g}$ contain the literal captions. At a selected layer $l$, the mean-difference direction is:

$$
\hat{v}_{g,c}^{(l)} = \frac{\mu^{(l)}(\mathcal{D}^{+}_{g,c}) - \mu^{(l)}(\mathcal{D}^{-}_{g})}{\left\| \mu^{(l)}(\mathcal{D}^{+}_{g,c}) - \mu^{(l)}(\mathcal{D}^{-}_{g}) \right\|_2} \tag{1}
$$
Here, $\mu^{(l)}$ averages the layer-$l$ hidden-state activations at the last input-token position over the examples in each set.
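As a minimal sketch of Eq. (1), the functions below compute the unit mean-difference direction from two lists of activation vectors. They assume the layer-$l$, last-token hidden states have already been extracted from the model and are held as plain Python lists; the function names are hypothetical and not taken from the authors' code.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean over a non-empty set of equal-length vectors."""
    if not rows:
        raise ValueError("need at least one activation vector")
    dim = len(rows[0])
    return [sum(row[i] for row in rows) / len(rows) for i in range(dim)]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Rescale v to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def mean_difference_direction(
    figurative_acts: Sequence[Sequence[float]],
    literal_acts: Sequence[Sequence[float]],
) -> Vector:
    """Eq. (1): unit vector along mu(D+) - mu(D-).

    Each input holds one hidden-state vector per example, taken at the
    chosen layer l and the last input-token position.
    """
    mu_pos = mean_vector(figurative_acts)  # mean over figurative examples
    mu_neg = mean_vector(literal_acts)     # mean over literal captions
    return l2_normalize([p - q for p, q in zip(mu_pos, mu_neg)])


# Toy 2-D example: the figurative mean is (4, 1) and the literal mean is (1, 1).
direction = mean_difference_direction([[3.0, 1.0], [5.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
print(direction)  # [1.0, 0.0]
```

Normalizing here is what makes the later choice of a single strength $\alpha$ comparable across languages and categories.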

During the generation prefill phase, we intervene on the residual stream at the selected layer $l$ for each prompt-token position $t$:

$$
h_t^{(l)\prime} = h_t^{(l)} + \alpha\, \hat{v}_{g,c}^{(l)} \tag{2}
$$
The intervention layer $l$ is determined for each model by validating the steering vectors on a separate validation set of literal prompts (Appendix [B.4](https://arxiv.org/html/2605.30443#A2.SS4)), ensuring the final test data remains completely unseen. For all monolingual, cross-lingual, random-vector, and geometry-vector experiments, we fix the intervention strength at $\alpha = 1.0$. We compare our learned directions against a matched random-vector control applied at the same layer, positions, and magnitude (Appendix [B.5](https://arxiv.org/html/2605.30443#A2.SS5)).
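The sketch below illustrates the arithmetic of Eq. (2) and of the random-vector control on plain lists standing in for the layer-$l$ residual stream of the prompt tokens. In a real model the addition would typically be performed inside a forward hook on the chosen decoder layer during prefill (for example, via PyTorch's `register_forward_hook`); that wiring is omitted here. Drawing a Gaussian vector and rescaling it to unit norm is our assumption about how a "matched magnitude" control is built for unit-norm steering vectors (the actual procedure is in Appendix B.5), and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def steer_prompt_states(
    hidden_states: Sequence[Sequence[float]],
    direction: Sequence[float],
    alpha: float = 1.0,
) -> List[Vector]:
    """Eq. (2): add alpha * direction at every prompt-token position t."""
    return [[h + alpha * d for h, d in zip(h_t, direction)] for h_t in hidden_states]


def random_unit_vector(dim: int, seed: int = 0) -> Vector:
    """Hypothetical random-vector control: a Gaussian draw rescaled to unit norm,
    so that with the same alpha it perturbs the residual stream by the same
    magnitude as a learned (unit-norm) steering direction."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


prompt_states = [[0.0, 0.0], [1.0, 2.0]]  # two prompt positions, hidden size 2
steered = steer_prompt_states(prompt_states, [1.0, 0.0], alpha=1.0)
print(steered)  # [[1.0, 0.0], [2.0, 2.0]]
control = steer_prompt_states(prompt_states, random_unit_vector(2, seed=7))
```

Only prompt positions are shifted, mirroring the prefill-only intervention described above; generated tokens are left untouched in this sketch.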

Table 2: DeepSeek-v4-flash classification performance on annotated figurative-versus-literal examples. Values are F1 scores (as percentages) and indicate the LLM judge's ability to recognize figurative language.

Evaluation and statistical analysis. We evaluate the generated continuations using DeepSeek-v4-flash as an LLM-as-a-judge (DeepSeek-AI, [2026](https://arxiv.org/html/2605.30443#bib.bib22)). Validation F1 scores demonstrating the judge's figurative-language recognition are provided in Table [2](https://arxiv.org/html/2605.30443#S3.T2), and the judge prompts are given in Appendix [B.3](https://arxiv.org/html/2605.30443#A2.SS3). We report two primary metrics:

1. Target Category Rate (TCR): The proportion of generated continuations that exhibit the targeted figurative phenomenon. This serves as our primary measure of behavioral change.
2. Coherence: A 0–4 rating of the contextual relevance and logical consistency of the generated continuation.
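Both metrics reduce to simple averages over per-continuation judge outputs. The sketch below assumes the judge's verdicts have already been parsed into one boolean (target category present or not) and one integer coherence score per continuation; the function names are illustrative only.

```python
from typing import Sequence


def target_category_rate(labels: Sequence[bool]) -> float:
    """TCR: fraction of continuations judged to exhibit the target category."""
    return sum(1 for y in labels if y) / len(labels)


def mean_coherence(scores: Sequence[int]) -> float:
    """Average 0-4 coherence rating; rejects out-of-range scores."""
    if any(not 0 <= s <= 4 for s in scores):
        raise ValueError("coherence scores must lie in [0, 4]")
    return sum(scores) / len(scores)


def tcr_gain_pp(steered: Sequence[bool], unsteered: Sequence[bool]) -> float:
    """Percentage-point change in TCR relative to unsteered generation."""
    return 100.0 * (target_category_rate(steered) - target_category_rate(unsteered))


steered = [True, False, True, True]
unsteered = [False, False, True, False]
print(target_category_rate(steered))    # 0.75
print(tcr_gain_pp(steered, unsteered))  # 50.0
print(mean_coherence([4, 3, 2, 3]))     # 3.0
```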

Table 3: Monolingual steering summary by input language. Values are averaged over available categories and shown as percentages. Shading intensity denotes statistical significance (adjusted $q$-values) for paired monolingual-steering-versus-unsteered tests across categories: light $q \geq 0.05$, medium $q < 0.05$, darker $q < 0.01$, and darkest $q < 0.001$. Win counts the categories in which monolingual steering exceeds the unsteered baseline.

We compare standard interventions against unsteered generation, and geometric interventions against native monolingual steering. We report paired percentage-point differences with 95% bootstrap confidence intervals and, where applicable, adjusted McNemar $q$-values (Appendix [B.2](https://arxiv.org/html/2605.30443#A2.SS2)).
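The paired statistics can be sketched over per-prompt binary outcomes (1 if the judge assigns the target category, 0 otherwise) for two conditions evaluated on the same prompts. This is a minimal stdlib sketch, not the authors' implementation: we assume percentile bootstrap intervals over prompts, the two-sided exact (binomial) form of McNemar's test, and Benjamini–Hochberg adjustment for the $q$-values; Appendix B.2 may specify different choices.

```python
import math
import random
from typing import List, Sequence, Tuple


def pp_difference(a: Sequence[int], b: Sequence[int]) -> float:
    """Paired percentage-point difference in TCR between conditions a and b."""
    return 100.0 * (sum(a) - sum(b)) / len(a)


def paired_bootstrap_ci(
    a: Sequence[int], b: Sequence[int], n_boot: int = 2000,
    level: float = 0.95, seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI, resampling prompts (i.e., pairs) with replacement."""
    rng = random.Random(seed)
    n = len(a)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(100.0 * sum(a[i] - b[i] for i in idx) / n)
    stats.sort()
    k = int(round((1.0 - level) / 2.0 * n_boot))  # draws cut from each tail
    return stats[k], stats[n_boot - k - 1]


def exact_mcnemar_p(a: Sequence[int], b: Sequence[int]) -> float:
    """Two-sided exact McNemar test: binomial(0.5) test on the discordant pairs."""
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if y and not x)
    n = n10 + n01
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(n10, n01) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """BH-adjusted q-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


steered = [1] * 8 + [0] * 2 + [1] * 5
unsteered = [0] * 8 + [1] * 2 + [1] * 5
print(pp_difference(steered, unsteered))    # 40.0
print(exact_mcnemar_p(steered, unsteered))  # 0.109375
```

Only the discordant prompts (8 versus 2 here) enter the McNemar test; the five prompts where both conditions succeed carry no information about which condition is better.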

## 4 Monolingual Steering

To confirm the behavioral efficacy of our interventions, we run monolingual steering on the 500-example test set for each language. This step tests whether the derived vectors encode a robust, actionable signal. Table [3](https://arxiv.org/html/2605.30443#S3.T3) reports the Target Category Rate (TCR) for these monolingual interventions. Across 96 total settings, monolingual steering significantly improves the TCR over the unsteered baseline in 74 cases. Full details are available in Appendix [C](https://arxiv.org/html/2605.30443#A3).

Intervention signal. Monolingual steering consistently raises TCR above the unsteered baseline, but absolute performance reveals a gap between high- and low-resource languages. Across almost all models, high-resource languages such as English, Chinese, and German reach steered TCRs between 20% and 35%. In contrast, Bengali yields the lowest unsteered baselines (often near 1%) and remains severely constrained even under intervention, with the Qwen and Llama models failing to surpass 9% steered TCR. This suggests that while contrast-derived directions can push generation toward the target category, the achievable rate remains bounded by the model's language-specific generative priors in the target language.

Comparison against random-vector control. Across every language–model aggregate, the learned monolingual direction exceeds the matched random-vector control. Random vectors remain close to the unsteered baseline and do not produce comparable increases in target-category generation. This consistent gap indicates that the gains are attributable to structure captured by the figurative–literal contrast rather than to arbitrary perturbation of the residual stream.

Category-level variation. As detailed in the full language–category breakdown (Appendix [C](https://arxiv.org/html/2605.30443#A3)), the 74 statistically significant wins over the baseline are concentrated in specific figurative categories. Steering is most robust for metaphor and idiom, achieving significant positive gains in nearly every evaluated setting (23 of 24 and 21 of 24, respectively). The intervention is less consistently effective for pragmatic language: sarcasm and irony account for the majority of the non-significant shifts, with sarcasm reaching statistical significance in only 8 of its 16 settings. This suggests that while the overall steering mechanism is effective, a single activation direction is better suited to eliciting lexically and semantically anchored figures than to inducing context-dependent pragmatic subversion.

Direction construction diagnostic. Because our primary steering directions contrast figurative examples with out-of-domain literal captions (figurative vs. caption), the resulting vectors might inadvertently encode source, genre, or register differences. To test whether the steering effect relies on these dataset artifacts, we construct an alternative set of vectors using a stricter figurative vs. native matched-literal contrast, in which the literal negative examples are drawn from the same source corpus as the figurative positive examples. We then evaluate both vector types on a validation sample of held-out literal prompts from WikiMatrix (Schwenk et al., [2019](https://arxiv.org/html/2605.30443#bib.bib4)). Positive steering effects broadly persist under the matched-literal construction. Although effect magnitudes fluctuate at the category level, this retention indicates that our monolingual results do not hinge on the out-of-domain caption negatives. Full details are provided in Appendix [D](https://arxiv.org/html/2605.30443#A4).

![Figure 2](https://arxiv.org/html/2605.30443v1/x2.png)

Figure 2: For each anchor language $L$, $L \rightarrow X$ denotes steering vectors derived from $L$ and evaluated on prompts in other languages $X$, whereas $X \rightarrow L$ denotes steering vectors derived from other languages $X$ and evaluated on prompts in $L$. Colors indicate the anchor language. Bars show the mean percentage-point gain in target-category generation over the unsteered baseline, averaged across compatible language–category routes.

Takeaway. Monolingual steering reliably increases target-category generation relative to both unsteered and random controls. This confirms that contrast-derived vectors encode a robust, behaviorally actionable signal within their source language.

## 5 Cross-Lingual Transfer

Having established that figurative–literal directions are effective natively, we now test whether this behavioral shift extends across linguistic boundaries. In these zero-shot experiments, a direction estimated from a source language is applied directly to prompts in a target language without any tuning. By measuring the resulting shifts over the unsteered baseline, we can map how well these figurative signals transfer. We break down these transfer effects by language role and figurative category.

Language-level transfer and asymmetries. Figure [2](https://arxiv.org/html/2605.30443#S4.F2) summarizes the macro-level transfer dynamics of figurative steering directions across four models, viewed from the perspective of an anchor language acting either as a source ($L \rightarrow X$) or a target ($X \rightarrow L$). While cross-lingual steering yields positive mean gains over the unsteered baseline in nearly all evaluated routes, comparing these two roles reveals a stark directional asymmetry. A language's ability to project a robust figurative signal to other languages does not guarantee its ability to receive one, suggesting that cross-lingual transfer is bottlenecked more by target-language receptivity than by source-language signal strength.

This source–target asymmetry is most evident when contrasting high- and low-resource languages. German is consistently among the most receptive target languages across all four models, frequently achieving cross-lingual gains near 20 percentage points when receiving signals from other languages. Conversely, Bengali exhibits a severe directional imbalance: although it often serves as a potent source language, driving some of the highest gains elsewhere, it is consistently the weakest target. For example, steering into Bengali ($X \rightarrow L$) under Qwen3-8B yields a negative mean gain, falling below the unsteered baseline. Because these asymmetries persist across model families and scales, they suggest that while a figurative signal can be robustly extracted from a language, the gains it produces are constrained by the model's learned priors in the target language.
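The two bars per anchor language in Figure 2 amount to averaging a route-level gain table in two directions. The sketch below does this for a single category; the gain values are invented purely for illustration (they are not results from the paper), and averaging across categories, as in the figure, would add one more loop. The function name is hypothetical.

```python
from statistics import fmean
from typing import Dict, Tuple

RouteGains = Dict[Tuple[str, str], float]  # (source, target) -> pp gain over unsteered


def anchor_gains(route_gain: RouteGains, anchor: str) -> Tuple[float, float]:
    """Return the (L->X, X->L) mean gains for one anchor language L.

    Monolingual routes (source == target) are excluded from both averages.
    """
    outgoing = [g for (src, tgt), g in route_gain.items() if src == anchor and tgt != anchor]
    incoming = [g for (src, tgt), g in route_gain.items() if tgt == anchor and src != anchor]
    return fmean(outgoing), fmean(incoming)


# Illustrative numbers only: "bn" is a strong source but a weak target.
toy = {
    ("bn", "de"): 18.0, ("bn", "en"): 12.0,
    ("de", "bn"): -1.0, ("en", "bn"): 1.0,
    ("de", "en"): 6.0, ("en", "de"): 14.0,
    ("bn", "bn"): 3.0,
}
print(anchor_gains(toy, "bn"))  # (15.0, 0.0)
print(anchor_gains(toy, "de"))  # (2.5, 16.0)
```

A large gap between the two numbers for the same anchor is exactly the source–target asymmetry discussed above.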

Table 4: Cross-lingual transfer by figurative category, excluding monolingual applications. Values summarize route-level percentage-point changes in TCR relative to unsteered generation. Positive counts routes with $\Delta > 0$; confidence intervals are obtained by bootstrapping routes within each category.

Category-dependent transfer patterns. Table [4](https://arxiv.org/html/2605.30443#S5.T4) summarizes zero-shot cross-lingual effects across figurative categories. Among the broadly evaluated categories, metaphor transfers most robustly, yielding positive gains in 112 of 120 cross-lingual routes with a mean increase of $+17.2$ percentage points. Simile also shows strong transferability ($+15.7$ mean $\Delta$), though its evaluation is restricted to eight English–Chinese routes, which limits direct comparison.

In contrast, other categories exhibit more constrained transfer. Idiom and irony transfer reliably, with positive shifts in roughly 85% to 90% of their respective routes, but their mean gains are modest ($+5.6$ and $+5.1$ points). Sarcasm remains the most resistant to cross-lingual steering, with a mean improvement of just $+1.0$ point. These disparities suggest that while static vectors can reliably project structural, lexically grounded comparisons such as metaphor across languages, transferring the pragmatic discourse cues required for sarcasm is difficult within a constrained sentence-continuation setting.
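The route counts in this section follow from simple combinatorics over ordered (source, target) language pairs. In the sketch below, the number of languages per category is inferred from counts stated elsewhere in the paper (24 metaphor, 24 idiom, and 16 sarcasm settings across four models, and simile in English and Chinese only); irony is assumed to cover all six languages because that is what makes the totals reach the 96 monolingual settings. With these inferred values, the cross-lingual totals reproduce the 120 metaphor routes, the eight simile routes, and the 416 zero-shot routes reported in the text.

```python
from typing import Dict

N_MODELS = 4
# Languages per category, inferred from the setting counts reported in the paper.
LANGS_PER_CATEGORY: Dict[str, int] = {
    "idiom": 6, "metaphor": 6, "simile": 2, "irony": 6, "sarcasm": 4,
}


def monolingual_settings(langs: Dict[str, int], n_models: int = N_MODELS) -> int:
    """One setting per (model, language, category)."""
    return n_models * sum(langs.values())


def cross_lingual_routes(langs: Dict[str, int], n_models: int = N_MODELS) -> Dict[str, int]:
    """Ordered (source, target) pairs with source != target, for every model."""
    return {cat: n_models * n * (n - 1) for cat, n in langs.items()}


routes = cross_lingual_routes(LANGS_PER_CATEGORY)
print(monolingual_settings(LANGS_PER_CATEGORY))  # 96
print(routes)  # {'idiom': 120, 'metaphor': 120, 'simile': 8, 'irony': 120, 'sarcasm': 48}
print(sum(routes.values()))  # 416
```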

Takeaway. Zero-shot portability provides evidence that models encode a figurative signal that is reusable across languages, but how much it increases figurative output depends on the model's generative capacity in the target language.

## 6 Internal Geometry

While Section [5](https://arxiv.org/html/2605.30443#S5) demonstrates that steering vectors learned from source-language data remain behaviorally effective across linguistic boundaries, we now investigate the geometric properties of these learned vectors. Because all steering vectors in our pipeline are unit-normalized and applied with a fixed $\alpha$ at a fixed layer, their behavioral impact is determined entirely by their direction in the residual stream. This raises a natural geometric question: if independently learned vectors align across languages, can we synthesize a central direction that captures the shared figurative signal more effectively than any single native vector?

To test whether a shared cross-lingual direction actively drives steering effectiveness, we synthesize cross-lingual aggregates and evaluate them against native monolingual baselines:

- Language Mean Aggregation: We synthesize central cross-lingual directions to test whether a generalized vector can maintain or improve steering. We evaluate both a complete Language Mean (pooling all available languages into a single shared direction) and a strict Leave-Target-Out (LTO) Mean (excluding target-language data entirely to test a purely zero-shot shared direction).
- Residual Ablation: To test whether this shared direction controls behavior, we project out the component of the native target vector that aligns with the cross-lingual aggregate. By evaluating the resulting residual direction, we measure how much native steering effectiveness degrades when the shared cross-lingual component is removed.

Together, these paired interventions test whether a shared cross-lingual direction drives behavioral transfer. To confirm that this shared geometry is tied to the specific figurative category, rather than representing a generic “figurative” subspace, we complement this analysis with a cross-category control experiment, detailed in Appendix [F](https://arxiv.org/html/2605.30443#A6).

Table 5: Behavioral impact of cross-lingual aggregate and residual steering. Values show the percentage-point change in Target Category Rate relative to monolingual steering. Cell color encodes the direction of change (green: outperforms; red: underperforms). Shading intensity denotes statistical significance (adjusted $q$-values): light $q \geq 0.05$, medium $q < 0.05$, darker $q < 0.01$, and darkest $q < 0.001$.

Controlled comparison. Throughout these experiments, we hold the model layer, steering strength, prompt set, and evaluation protocol constant, varying only the steering direction. Let $\hat{v}_{g,c}^{(l)}$ denote the normalized monolingual steering vector for language $g$ and category $c$ at layer $l$.

Constructing aggregate directions. For a subset $S \subseteq \mathcal{L}_c$ of the available languages for category $c$, we define the normalized mean direction as:

$$
\bar{v}_{S,c}^{(l)} = \frac{\sum_{g \in S} \hat{v}_{g,c}^{(l)}}{\left\| \sum_{g \in S} \hat{v}_{g,c}^{(l)} \right\|_2} \tag{3}
$$

We evaluate two specific aggregates: the complete Language Mean ($S = \mathcal{L}_c$), which pools all available languages, and the strict Leave-Target-Out (LTO) Mean ($S = \mathcal{L}_c \setminus \{g_t\}$), which explicitly excludes the target language $g_t$. If the LTO Mean remains effective, a direction estimated purely from other languages generalizes zero-shot to the target, which is direct evidence of a behaviorally viable cross-lingual alignment. We report LTO aggregation only when at least two non-target directions remain.
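Eq. (3) and the LTO rule can be sketched as follows, with each native direction stored as a unit-norm list of floats keyed by language code. The helper names and the toy 2-D vectors are hypothetical; real directions live in the model's residual-stream dimension.

```python
import math
from typing import Dict, Iterable, List, Optional

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("directions cancel out; aggregate is undefined")
    return [x / norm for x in v]


def language_mean(directions: Dict[str, Vector], subset: Iterable[str]) -> Vector:
    """Eq. (3): normalized sum of the unit directions for languages in `subset`."""
    chosen = [directions[g] for g in subset]
    summed = [sum(components) for components in zip(*chosen)]
    return l2_normalize(summed)


def lto_mean(directions: Dict[str, Vector], target: str) -> Optional[Vector]:
    """Leave-Target-Out Mean; None when fewer than two non-target directions remain."""
    others = [g for g in directions if g != target]
    if len(others) < 2:
        return None
    return language_mean(directions, others)


dirs = {"en": [1.0, 0.0], "zh": [1.0, 0.0], "de": [0.0, 1.0]}
all_mean = language_mean(dirs, dirs)  # along (2, 1)
lto_en = lto_mean(dirs, "en")         # along (1, 1), built without English
print(lto_mean({"en": [1.0, 0.0], "zh": [0.0, 1.0]}, "en"))  # None, as for simile
```

Because each input direction is unit-norm before summing, every language contributes equally to the aggregate regardless of how large its raw mean difference was.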

Residual ablation. To verify whether this shared cross-lingual geometry fundamentally controls behavior, we remove its influence from the native target vector. Let $\bar{v}^{(l)}_{M,g_t,c}$ represent the chosen aggregate, where $M \in \{\mathrm{All}, \mathrm{LTO}\}$. We define the corresponding residual direction by projecting out the aggregate component:

$$
r^{(l)}_{g_t,c,M} = \hat{v}^{(l)}_{g_t,c} - \left\langle \hat{v}^{(l)}_{g_t,c}, \bar{v}^{(l)}_{M,g_t,c} \right\rangle \bar{v}^{(l)}_{M,g_t,c} \tag{4}
$$

Because $\bar{v}^{(l)}_{M,g_t,c}$ has unit norm, Eq. (4) is the orthogonal projection of the native direction onto the complement of the aggregate. We re-normalize each residual vector before application. Comparing its behavioral effect against the original monolingual baseline quantifies how much steering effectiveness is lost when the cross-lingual component is removed.
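Eq. (4) followed by re-normalization can be sketched as a standard orthogonal projection. In this sketch (hypothetical helper names), `aggregate` is assumed to be unit-norm already, as it is after Eq. (3); the returned residual is orthogonal to the aggregate by construction.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def residual_direction(native: Sequence[float], aggregate: Sequence[float]) -> Vector:
    """Eq. (4): remove the component of `native` along the unit vector `aggregate`,
    then re-normalize the remainder to unit length."""
    coeff = dot(native, aggregate)  # <v_native, v_bar>
    r = [n - coeff * a for n, a in zip(native, aggregate)]
    norm = math.sqrt(dot(r, r))
    if norm < 1e-12:
        raise ValueError("native direction is parallel to the aggregate; no residual")
    return [x / norm for x in r]


native = [0.6, 0.8]
aggregate = [1.0, 0.0]
res = residual_direction(native, aggregate)
print(res)                  # [0.0, 1.0]
print(dot(res, aggregate))  # 0.0
```

The guard for a near-zero residual matters in practice: if a native direction were almost identical to the aggregate, there would be nothing left to steer with after ablation.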

Interpreting the geometric comparison. Unlike Section [4](https://arxiv.org/html/2605.30443#S4), which evaluates behavioral gains against an unsteered baseline, these geometric interventions apply a much stricter criterion: the baseline is the target language's native monolingual direction, a highly effective, language-specific competitor. Matching or exceeding this native performance with a zero-shot aggregate (the LTO Mean) is a demanding test. Consequently, near-zero differences without statistically significant degradation are informative: they show that a shared direction, constructed entirely without target-language data, can achieve a Target Category Rate comparable to native steering. We do not claim formal statistical equivalence; this comparability is descriptive evidence of a reusable cross-lingual signal.

Cross-lingual aggregation rivals native steering. Across models and categories, synthesized cross-lingual aggregates are highly competitive with native monolingual baselines. The full Language Mean frequently yields comparable, or even superior, behavioral control, and moving to the strict Leave-Target-Out (LTO) Mean results in minimal performance decay. Because excluding target-language data does not collapse the steering signal, these results suggest that independently learned vectors do not merely rely on localized artifacts; rather, they share a substantial cross-lingual component that can be used zero-shot.

![Figure 3](https://arxiv.org/html/2605.30443v1/x3.png)

Figure 3: Win, tie, and loss rates for LangMean-family geometry vectors against all available settings (unsteered, monolingual, cross-lingual, and synthesized vectors) in the same model, input-language, and category setting. Each model subplot contains four stacked bars: LangMean-All, LangMean-LTO, and their corresponding residual vectors. Wins and losses require a significant exact McNemar test ($p < .05$); non-significant comparisons are counted as ties.

To validate this competitiveness, we ran exact McNemar tests comparing each aggregate against every available competing direction. As illustrated in Figure [3](https://arxiv.org/html/2605.30443#S6.F3) (with full cell-by-cell rankings in Appendix [G](https://arxiv.org/html/2605.30443#A7)), zero-shot cross-lingual aggregates, vectors synthesized entirely from non-target languages, consistently rank in the top statistical tiers: they match or outperform all competitors (including native vectors) in 85% to 98% of tested comparisons.
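The win/tie/loss rule behind Figure 3 can be sketched as follows: a comparison counts as a win or loss only when the exact McNemar test on paired per-prompt outcomes gives $p < .05$, with the sign taken from which condition has more target-category hits, and as a tie otherwise. The helpers are hypothetical, use the two-sided binomial form of the exact test, and apply no multiplicity correction, following the caption's unadjusted $p < .05$ threshold.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def exact_mcnemar_p(a: Sequence[int], b: Sequence[int]) -> float:
    """Two-sided exact McNemar test on paired binary outcomes."""
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if y and not x)
    n = n10 + n01
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(n10, n01) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def compare(candidate: Sequence[int], competitor: Sequence[int], alpha: float = 0.05) -> str:
    """Label one pairwise comparison from the candidate's point of view."""
    if exact_mcnemar_p(candidate, competitor) >= alpha:
        return "tie"
    return "win" if sum(candidate) > sum(competitor) else "loss"


def tally(candidate: Sequence[int], competitors: Dict[str, Sequence[int]]) -> Dict[str, int]:
    """Count wins, ties, and losses of one geometry vector against all competitors."""
    counts = Counter(compare(candidate, comp) for comp in competitors.values())
    return {label: counts.get(label, 0) for label in ("win", "tie", "loss")}


lto = [1] * 8 + [0] * 12  # 8 of 20 prompts hit the target category
competitors = {"unsteered": [0] * 20, "native": list(lto), "other": [1] * 20}
print(tally(lto, competitors))  # {'win': 1, 'tie': 1, 'loss': 1}
```

Under this rule, "match or outperform" corresponds to the win plus tie share of all comparisons for a given vector.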

Architectural sensitivity and scale. Figure [3](https://arxiv.org/html/2605.30443#S6.F3) reveals distinct architectural responses to geometric interventions. The Qwen3 family is highly sensitive to steering, and this sensitivity grows with scale: moving from 8B to 32B raises both the LangMean win rate (41% to 61%) and the residual loss rate (51% to 62%). Llama and Ministral, by contrast, are more resistant to intervention, with tie rates frequently exceeding 60–70%. Whether their corresponding base models behave the same way is left to future work.

Category variance and the metaphor advantage. Although the shared component is present across categories, how cleanly it drives the target behavior varies by figurative category. Metaphor provides the strongest evidence of cross-lingual reuse: for English, Spanish, and German targets, zero-shot LTO aggregates outperform native steering by +9.6 to +16.6 percentage points on average, while ablating the shared signal causes double-digit percentage-point drops. In contrast, categories that depend more on pragmatic or conventional context, such as sarcasm and idiom, show smaller LTO benefits and occasional negative differences. This contrast suggests that although the model draws on a shared direction for all figurative categories, structurally grounded mappings such as metaphor project more cleanly across languages than categories bound by local lexical or cultural conventions.
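One way to make the ablation and the percentage-point comparison above concrete is sketched below. It assumes, without confirmation from this section, that a residual vector is obtained by projecting the shared (Language Mean or LTO) direction out of a native direction, and that Target Category Rate is the fraction of continuations judged to realize the target category. All function names are hypothetical.

```python
from __future__ import annotations

import numpy as np


def remove_shared_component(native: np.ndarray, shared: np.ndarray) -> np.ndarray:
    """Project the shared direction out of a native direction (orthogonal residual).

    The residual is not renormalized; whether to rescale it to the native
    vector's norm is a design choice this sketch leaves open.
    """
    u = shared / np.linalg.norm(shared)
    return native - (native @ u) * u


def target_category_rate(judgments: list[bool]) -> float:
    """Fraction of continuations judged to realize the target figurative category."""
    return sum(judgments) / len(judgments)


def pp_difference(rate_a: float, rate_b: float) -> float:
    """Difference between two rates in percentage points (positive favors rate_a)."""
    return 100.0 * (rate_a - rate_b)
```

Under this reading, a positive `pp_difference(lto_rate, native_rate)` corresponds to the LTO advantage reported for metaphor, and a negative `pp_difference(residual_rate, native_rate)` to the degradation caused by ablation.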

Table 6: Mean coherence on a 0–4 scale, where higher values indicate more coherent continuations. Unsteer. denotes unsteered generation, Mono. monolingual steering, and Cross. zero-shot cross-lingual routes. LangMean and Res. denote Language Mean aggregation and residual ablation, evaluated using All languages or LTO (Leave-Target-Out). Scores are averaged over all evaluated settings for each model.

Coherence under geometric intervention. As shown in Table [6](https://arxiv.org/html/2605.30443#S6.T6), steering introduces a trade-off between control and generation quality, but using the shared cross-lingual geometry incurs no coherence penalty beyond that of monolingual steering. Mean coherence for cross-lingual routes and synthesized aggregates (LangMean All and LTO) stays close to that of monolingual steering (Mono.); for the Qwen models, aggregation even slightly improves coherence over single-language vectors. The residual interventions (Res. All and LTO), in turn, consistently score higher in coherence than active steering. This partial recovery of coherence mirrors the drop in Target Category Rate reported above: as the shared cross-lingual signal is ablated, the model's behavior regresses toward its more coherent, unsteered default.
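The caption's phrase "averaged over all evaluated settings" admits more than one reading. The sketch below assumes a macro-average: coherence judgments on the 0–4 scale are first averaged within each setting, and those per-setting means are then averaged per condition. The record format and the setting labels are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_coherence(records):
    """Macro-averaged coherence per condition.

    records: iterable of (condition, setting, score) tuples, where setting
    identifies e.g. a (language, category) cell and score lies in [0, 4].
    """
    per_setting = defaultdict(list)
    for condition, setting, score in records:
        if not 0 <= score <= 4:
            raise ValueError(f"coherence score out of range: {score}")
        per_setting[(condition, setting)].append(score)

    # Each setting contributes one mean, regardless of how many samples it has.
    per_condition = defaultdict(list)
    for (condition, _setting), scores in per_setting.items():
        per_condition[condition].append(mean(scores))
    return {condition: mean(means) for condition, means in per_condition.items()}


records = [
    ("Mono.", "en/metaphor", 2), ("Mono.", "en/metaphor", 4), ("Mono.", "en/metaphor", 4),
    ("Mono.", "de/metaphor", 1),
    ("Res. LTO", "en/metaphor", 4),
]
print(mean_coherence(records))
```

With these toy records the macro-average for `Mono.` is 13/6 (about 2.17), whereas pooling all four scores would give 2.75, so the choice of averaging scheme matters when settings differ in size.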

Qualitative illustration. Appendix [H](https://arxiv.org/html/2605.30443#A8) presents selected English-target continuations from Qwen3-8B under unsteered, monolingual, cross-lingual, and leave-target-out interventions. The examples illustrate realized figurative behaviors as well as ambiguous and failure cases; our quantitative claims rest on the aggregate evaluation reported above.

Takeaway. Figurative transfer draws on a shared cross-lingual geometry. Removing this shared component substantially weakens the model's steerability toward figurative behavior.

## 7 Conclusion

In this work, we show that the ability of multilingual LLMs to generate figurative language draws substantially on a shared cross-lingual geometric core. While monolingual steering effectively controls figurative output, our geometric interventions reveal that models do not rely solely on isolated, language-specific pathways. Instead, zero-shot cross-lingual aggregates successfully steer generation in the target language, and ablating this shared representation degrades steering performance.

Furthermore, we find that this cross-lingual steerability depends on the figurative category. Structurally grounded categories such as metaphor transfer reliably across languages, frequently outperforming native target vectors. In contrast, categories more closely tied to local pragmatic context, such as sarcasm, show weaker cross-lingual alignment. Together, these findings suggest that multilingual models represent figurative intent partly at a language-independent level, providing a structural foundation for cross-lingual alignment and control.

## Limitations

Coverage and balance. Our study spans six languages, five figurative categories, and four multilingual models, enabling a broad evaluation of cross-lingual steering. However, its scope is necessarily constrained by the availability of high-quality, naturally occurring figurative-language data across languages. As a result, the typological distribution is unbalanced in some cases: for example, the scarcity of public parallel datasets restricts our simile evaluation to English and Chinese, while the available German data limits the size of the metaphor and irony construction sets. Our findings should therefore be interpreted as reflecting the behavioral patterns observable under current data availability, rather than as a fully balanced typological comparison.

Contrast construction at scale. Estimating steering directions from naturally occurring text is essential for scalable and ecologically valid multilingual extraction. At the same time, naturally occurring figurative and literal texts may differ in source, genre, register, or other distributional properties. To assess the impact of such variation, we conduct a matched-literal diagnostic in the monolingual setting. The persistence of positive average steering effects under this stricter control provides initial evidence that the extracted signals are not merely artifacts of broad construction-level differences. Extending this resource-intensive diagnostic to all cross-lingual transfer and representation-geometry analyses is computationally prohibitive at the present scale. Thus, although our results suggest that the learned directions capture behaviorally transferable phenomena, fully disentangling category-specific representations from construction variance across multilingual settings remains an important direction for future work.

Generation context and evaluation. We use sentence continuation as a controlled generation setting for isolating the causal effects of steering interventions. This design necessarily limits the broader discourse context that is often important for pragmatic categories such as irony and sarcasm; however, it also reduces contextual confounders that could otherwise obscure the effects of steering. Evaluating ambiguous figurative and pragmatic language at scale further requires automated proxy judgments. Although our automatic judge is carefully validated on annotated source examples, its application to open-ended generated continuations remains subject to the general limitations of automated pragmatic assessment. Finally, as is typical in activation engineering, improvements in Target Category Rate involve a trade-off with generation quality and control. We therefore report coherence metrics alongside steering effectiveness to make this trade-off explicit (Table [6](https://arxiv.org/html/2605.30443#S6.T6)).

## Ethical considerations

Cultural misalignment and rhetorical misuse. Because some steering vectors target pragmatic categories such as irony and sarcasm, steering may produce culturally inappropriate or misleading tones. More broadly, steering models toward rhetorical styles whose interpretation varies across communities could be misused to mask toxic intent, amplify manipulation, or spread subtle misinformation that may be difficult for standard moderation systems to detect.

## References

- K. Ahuja, H. Diddee, R. Hada, M. Ochieng, K. Ramesh, P. Jain, A. Nambi, T. Ganu, S. Segal, M. Ahmed, K. Bali, and S. Sitaram (2023). MEGA: multilingual evaluation of generative AI. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, H. Bouamor, J. Pino, and K. Bali (Eds.), Singapore, pp. 4232–4267. [Link](https://aclanthology.org/2023.emnlp-main.258/), [DOI](https://dx.doi.org/10.18653/v1/2023.emnlp-main.258)
- Knowledge localization in mixture-of-experts LLMs using cross-lingual inconsistency. arXiv:2603.17102. [Link](https://arxiv.org/abs/2603.17102)
- L. Bartoszcze, S. Munshi, B. Sukidi, J. Yen, Z. Yang, D. Williams-King, L. Le, K. Asuzu, and C. Maple (2025). Representation engineering for large-language models: survey and research challenges. arXiv:2502.17601. [Link](https://arxiv.org/abs/2502.17601)
- Y. Benjamini and Y. Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological) 57(1), pp. 289–300. [DOI](https://dx.doi.org/10.1111/j.2517-6161.1995.tb02031.x)
- T. Caselli, N. Novielli, V. Patti, and P. Rosso (2018). EVALITA 2018: overview on the 6th evaluation campaign of natural language processing and speech tools for Italian. In Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018), CEUR Workshop Proceedings, Vol. 2263, Turin, Italy. [Link](https://ceur-ws.org/Vol-2263/paper001.pdf)
- S. Casola, S. Frenda, S. M. Lo, E. Sezerer, A. Uva, V. Basile, C. Bosco, A. Pedrani, C. Rubagotti, V. Patti, and D. Bernardi (2024). MultiPICo: multilingual perspectivist irony corpus. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), L. Ku, A. Martins, and V. Srikumar (Eds.), Bangkok, Thailand, pp. 16008–16021. [Link](https://aclanthology.org/2024.acl-long.849/), [DOI](https://dx.doi.org/10.18653/v1/2024.acl-long.849)
- T. Chakrabarty, A. Saakyan, D. Ghosh, and S. Muresan (2022). FLUTE: figurative language understanding through textual explanations. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Y. Goldberg, Z. Kozareva, and Y. Zhang (Eds.), Abu Dhabi, United Arab Emirates, pp. 7139–7159. [Link](https://aclanthology.org/2022.emnlp-main.481/), [DOI](https://dx.doi.org/10.18653/v1/2022.emnlp-main.481)
- W. Chen, F. Lin, G. Li, and B. Liu (2024). A survey of automatic sarcasm detection: fundamental theories, formulation, datasets, detection methods, and opportunities. Neurocomputing 578, 127428. ISSN 0925-2312. [DOI](https://doi.org/10.1016/j.neucom.2024.127428), [Link](https://www.sciencedirect.com/science/article/pii/S0925231224001991)
- S. Das and K. Ghosh (2025). Can LLMs be literary companions?: analysing LLMs on Bengali figures of speech identification. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, C. Christodoulopoulos, T. Chakraborty, C. Rose, and V. Peng (Eds.), Suzhou, China, pp. 18634–18656. ISBN 979-8-89176-332-6. [Link](https://aclanthology.org/2025.emnlp-main.941/), [DOI](https://dx.doi.org/10.18653/v1/2025.emnlp-main.941)
- DeepSeek-AI (2026). DeepSeek-V4: towards highly efficient million-token context intelligence. Technical report, DeepSeek-AI. Preview release. [Link](https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf)
- B. Efron (1979). Bootstrap methods: another look at the jackknife. The Annals of Statistics 7(1), pp. 1–26. [DOI](https://dx.doi.org/10.1214/aos/1176344552)
- A. Ghosh and K. Sarkar (2020). Irony detection in Bengali tweets: a new dataset, experimentation and results. In Computational Intelligence in Data Science, A. Chandrabose, U. Furbach, A. Ghosh, and A. Kumar M. (Eds.), Cham, pp. 112–127. ISBN 978-3-030-63467-4. [DOI](https://dx.doi.org/10.1007/978-3-030-63467-4%5F9)
- A. Grattafiori, A. Dubey, A. Jauhri, A. Pandey, A. Kadian, et al. (2024). The Llama 3 herd of models. arXiv:2407.21783. [Link](https://arxiv.org/abs/2407.21783)
- D. Gurgurov, Y. A. Ghussin, T. Baeumel, C. Chou, P. Schramowski, M. Mosbach, J. van Genabith, and S. Ostermann (2026). CLaS-bench: a cross-lingual alignment and steering benchmark. arXiv:2601.08331. [Link](https://arxiv.org/abs/2601.08331)
- K. Hämmerl, J. Libovický, and A. Fraser (2024). Understanding cross-lingual Alignment—A survey. In Findings of the Association for Computational Linguistics: ACL 2024, L. Ku, A. Martins, and V. Srikumar (Eds.), Bangkok, Thailand, pp. 10922–10943. [Link](https://aclanthology.org/2024.findings-acl.649/), [DOI](https://dx.doi.org/10.18653/v1/2024.findings-acl.649)
- Q. He, S. Cheng, Z. Li, R. Xie, and Y. Xiao (2022). Can pre-trained language models interpret similes as smart as human? In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), S. Muresan, P. Nakov, and A. Villavicencio (Eds.), Dublin, Ireland, pp. 7875–7887. [Link](https://aclanthology.org/2022.acl-long.543/), [DOI](https://dx.doi.org/10.18653/v1/2022.acl-long.543)
- IUSS Neurolinguistics and Experimental Pragmatics Laboratory (NEPLab) (2025). ERC_Cog PROMENADE – WP1: Figurative Archive. Zenodo dataset, version v5. [DOI](https://dx.doi.org/10.5281/zenodo.17829093), [Link](https://doi.org/10.5281/zenodo.17829093)
- H. Jang, Q. Yu, and D. Frassinelli (2023). Figurative language processing: a linguistically informed feature analysis of the behavior of language models and humans. In Findings of the Association for Computational Linguistics: ACL 2023, A. Rogers, J. Boyd-Graber, and N. Okazaki (Eds.), Toronto, Canada, pp. 9816–9832. [Link](https://aclanthology.org/2023.findings-acl.622/), [DOI](https://dx.doi.org/10.18653/v1/2023.findings-acl.622)
- M. F. Khan, S.M. S. Shifath, and M. S. Islam (2022). BAN-Cap: a multi-purpose English-Bangla image descriptions dataset. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk, and S. Piperidis (Eds.), Marseille, France, pp. 6855–6865. [Link](https://aclanthology.org/2022.lrec-1.740/)
- K. Konen, S. Jentzsch, D. Diallo, P. Schütt, O. Bensch, R. El Baff, D. Opitz, and T. Hecking (2024). Style vectors for steering generative large language models. In Findings of the Association for Computational Linguistics: EACL 2024, Y. Graham and M. Purver (Eds.), St. Julian’s, Malta, pp. 782–802. [Link](https://aclanthology.org/2024.findings-eacl.52/), [DOI](https://dx.doi.org/10.18653/v1/2024.findings-eacl.52)
- H. Lai and M. Nissim (2024). A survey on automatic generation of figurative language: from rule-based systems to large language models. ACM Comput. Surv. 56(10). ISSN 0360-0300. [Link](https://doi.org/10.1145/3654795), [DOI](https://dx.doi.org/10.1145/3654795)
- K. Li, O. Patel, F. Viégas, H. Pfister, and M. Wattenberg (2023). Inference-time intervention: eliciting truthful answers from a language model. In Advances in Neural Information Processing Systems, A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine (Eds.), Vol. 36, pp. 41451–41530. [Link](https://proceedings.neurips.cc/paper_files/paper/2023/file/81b8390039b7302c909cb769f8b6cd93-Paper-Conference.pdf)
- X. Li, C. Xu, X. Wang, W. Lan, Z. Jia, G. Yang, and J. Xu (2019). COCO-CN for cross-lingual image tagging, captioning, and retrieval. IEEE Transactions on Multimedia 21(9), pp. 2347–2360. [DOI](https://dx.doi.org/10.1109/TMM.2019.2896494)
- B. Liang, Z. Lin, B. Qin, and R. Xu (2022). Topic-oriented sarcasm detection: new task, new dataset and new method. In Proceedings of the 21st Chinese National Conference on Computational Linguistics, pp. 557–568. [Link](https://aclanthology.org/2022.ccl-1.50)
- T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick (2014). Microsoft COCO: common objects in context. In Computer Vision – ECCV 2014, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars (Eds.), Cham, pp. 740–755. ISBN 978-3-319-10602-1. [DOI](https://dx.doi.org/10.1007/978-3-319-10602-1%5F48)
- A. H. Liu, K. Khandelwal, S. Subramanian, V. Jouault, A. Rastogi, et al. (2026). Ministral 3. arXiv:2601.08584. [Link](https://arxiv.org/abs/2601.08584)
- L. Liu, X. Hu, W. Song, R. Fu, T. Liu, and G. Hu (2018). Neural multitask learning for simile recognition. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, E. Riloff, D. Chiang, J. Hockenmaier, and J. Tsujii (Eds.), Brussels, Belgium, pp. 1543–1553. [Link](https://aclanthology.org/D18-1183/), [DOI](https://dx.doi.org/10.18653/v1/D18-1183)
- S. K. Lora, G. M. Shahariar, T. Nazmin, N. N. Rahman, R. Rahman, M. Bhuiyan, and F. M. Shah (2024). Ben-Sarc: a self-annotated corpus for sarcasm detection from Bengali social media comments and its baseline evaluation. Natural Language Processing, pp. 1–26. [DOI](https://dx.doi.org/10.1017/nlp.2024.11)
- G. Maraia, L. Ranaldi, M. Valentino, and F. M. Zanzotto (2026). Can activation steering generalize across languages? A study on syllogistic reasoning in language models. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), V. Demberg, K. Inui, and L. Marquez (Eds.), Rabat, Morocco, pp. 2739–2753. ISBN 979-8-89176-380-7. [Link](https://aclanthology.org/2026.eacl-long.125/), [DOI](https://dx.doi.org/10.18653/v1/2026.eacl-long.125)
- Q. McNemar (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 12(2), pp. 153–157. [DOI](https://dx.doi.org/10.1007/BF02295996)
- M. Mohler, M. Brunson, B. Rink, and M. Tomlinson (2016). Introducing the LCC metaphor datasets. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, and S. Piperidis (Eds.), Portorož, Slovenia. [Link](https://aclanthology.org/L16-1668/)
- L. S. Montesinos, S. Buján, D. Bardanca, and P. Gamallo (2026). Improving machine translation of idioms: a Spanish–Galician parallel dataset and synthetic augmentation approach. In Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1, M. Souza, I. de-Dios-Flores, D. Santos, L. Freitas, J. W. d. C. Souza, and E. Ribeiro (Eds.), Salvador, Brazil, pp. 980–987. ISBN 979-8-89176-387-6. [Link](https://aclanthology.org/2026.propor-1.99/)
- R. Ortega-Bueno, F. Rangel, D. I. Hernández Farías, P. Rosso, M. Montes-y-Gómez, and J. E. Medina-Pagola (2019). Overview of the task on irony detection in Spanish variants. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), CEUR Workshop Proceedings, Vol. 2421, Bilbao, Spain, pp. 229–256. [Link](https://ceur-ws.org/Vol-2421/IroSvA_overview.pdf)
- S. Park, H. Choi, M. Kim, S. An, X. Wang, G. Choi, and H. Kim (2025). FLUID QA: a multilingual benchmark for figurative language usage in dialogue across English, Chinese, and Korean. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, C. Christodoulopoulos, T. Chakraborty, C. Rose, and V. Peng (Eds.), Suzhou, China, pp. 30280–30294. ISBN 979-8-89176-332-6. [Link](https://aclanthology.org/2025.emnlp-main.1540/), [DOI](https://dx.doi.org/10.18653/v1/2025.emnlp-main.1540)
- P. Piccirilli, A. Fraser, and S. Schulte im Walde (2024). VOLIMET: a parallel corpus of literal and metaphorical verb-object pairs for English–German and English–French. In Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (\*SEM 2024), D. Bollegala and V. Shwartz (Eds.), Mexico City, Mexico, pp. 222–237. [Link](https://aclanthology.org/2024.starsem-1.18), [DOI](https://dx.doi.org/10.18653/v1/2024.starsem-1.18)
- J. Qiang, Y. Li, C. Zhang, Y. Li, Y. Zhu, Y. Yuan, and X. Wu (2023). Chinese idiom paraphrasing. Transactions of the Association for Computational Linguistics 11, pp. 740–754. ISSN 2307-387X. [DOI](https://dx.doi.org/10.1162/tacl%5Fa%5F00572), [Link](https://doi.org/10.1162/tacl_a_00572), [PDF](https://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_00572/2143279/tacl_a_00572.pdf)
- N\. Rimsky, N\. Gabrieli, J\. Schulz, M\. Tong, E\. Hubinger, and A\. Turner \(2024\)Steering llama 2 via contrastive activation addition\.InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),L\. Ku, A\. Martins, and V\. Srikumar \(Eds\.\),Bangkok, Thailand,pp\. 15504–15522\.External Links:[Link](https://aclanthology.org/2024.acl-long.828/),[Document](https://dx.doi.org/10.18653/v1/2024.acl-long.828)Cited by:[§1](https://arxiv.org/html/2605.30443#S1.p3.1),[§2](https://arxiv.org/html/2605.30443#S2.p1.1),[§3](https://arxiv.org/html/2605.30443#S3.p5.5)\.
- S\. Ruder, A\. Søgaard, and I\. Vulić \(2019\)Unsupervised cross\-lingual representation learning\.InProceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts,P\. Nakov and A\. Palmer \(Eds\.\),Florence, Italy,pp\. 31–38\.External Links:[Link](https://aclanthology.org/P19-4007/),[Document](https://dx.doi.org/10.18653/v1/P19-4007)Cited by:[§2](https://arxiv.org/html/2605.30443#S2.p2.1)\.
- A\. Sakhawat, S\. A\. Parveen, M\. R\. Amin, T\. Khatun, S\. A\. Mahmud, and M\. S\. Islam \(2026\)When words don’t mean what they say: figurative understanding in bengali idioms\.InProceedings of the Fifteenth Language Resources and Evaluation Conference \(LREC 2026\),S\. Piperidis, N\. Bel, H\. van den Heuvel, N\. Ide, S\. Krek, and A\. Toral \(Eds\.\),Palma, Mallorca, Spain,pp\. 6870–6879\.External Links:[Document](https://dx.doi.org/10.63317/546w2cys6m6t)Cited by:[Table 7](https://arxiv.org/html/2605.30443#A1.T7.1.14.14.3.1.1)\.
- A\. Scaiella, D\. Croce, and R\. Basili \(2019\)Large scale datasets for image and video captioning in italian\.Italian Journal of Computational Linguistics2\(5\),pp\. 49–60\.External Links:[Link](http://www.ai-lc.it/IJCoL/v5n2/IJCOL_5_2_3___scaiella_et_al.pdf)Cited by:[Table 7](https://arxiv.org/html/2605.30443#A1.T7.1.27.27.3.1.1)\.
- H\. Schwenk, V\. Chaudhary, S\. Sun, H\. Gong, and F\. Guzmán \(2019\)WikiMatrix: mining 135m parallel sentences in 1620 language pairs from wikipedia\.External Links:1907\.05791,[Link](https://arxiv.org/abs/1907.05791)Cited by:[Appendix D](https://arxiv.org/html/2605.30443#A4.p3.1),[§4](https://arxiv.org/html/2605.30443#S4.p5.1)\.
- U\. Sentsova, D\. Ciminari, J\. V\. Genabith, and C\. España\-Bonet \(2025\)MultiCoPIE: a multilingual corpus of potentially idiomatic expressions for cross\-lingual PIE disambiguation\.InProceedings of the 21st Workshop on Multiword Expressions \(MWE 2025\),A\. Kr\. Ojha, V\. Giouli, V\. B\. Mititelu, M\. Constant, G\. Korvel, A\. S\. Doğruöz, and A\. Rademaker \(Eds\.\),Albuquerque, New Mexico, U\.S\.A\.,pp\. 67–81\.External Links:[Link](https://aclanthology.org/2025.mwe-1.8/),[Document](https://dx.doi.org/10.18653/v1/2025.mwe-1.8),ISBN 979\-8\-89176\-243\-5Cited by:[Table 7](https://arxiv.org/html/2605.30443#A1.T7.1.23.23.3.1.1)\.
- K\. Sharma and R\. Trivedi \(2026\)COLD\-steer: steering large language models via in\-context one\-step learning dynamics\.InThe Fourteenth International Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=afV4qzquBN)Cited by:[§2](https://arxiv.org/html/2605.30443#S2.p1.1)\.
- D\. Stap, E\. Hasler, B\. Byrne, C\. Monz, and K\. Tran \(2024\)The fine\-tuning paradox: boosting translation quality without sacrificing llm abilities\.External Links:2405\.20089,[Link](https://arxiv.org/abs/2405.20089)Cited by:[Table 7](https://arxiv.org/html/2605.30443#A1.T7.1.28.28.3.1.1)\.
- K\. Stowe, P\. Utama, and I\. Gurevych \(2022\)IMPLI: investigating NLI models’ performance on figurative language\.InProceedings of the 60th Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),S\. Muresan, P\. Nakov, and A\. Villavicencio \(Eds\.\),Dublin, Ireland,pp\. 5375–5388\.External Links:[Link](https://aclanthology.org/2022.acl-long.369/),[Document](https://dx.doi.org/10.18653/v1/2022.acl-long.369)Cited by:[§2](https://arxiv.org/html/2605.30443#S2.p3.1)\.
- E\. Tanwar, S\. Dutta, M\. Borthakur, and T\. Chakraborty \(2023\)Multilingual LLMs are better cross\-lingual in\-context learners with alignment\.InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),A\. Rogers, J\. Boyd\-Graber, and N\. Okazaki \(Eds\.\),Toronto, Canada,pp\. 6292–6307\.External Links:[Link](https://aclanthology.org/2023.acl-long.346/),[Document](https://dx.doi.org/10.18653/v1/2023.acl-long.346)Cited by:[§2](https://arxiv.org/html/2605.30443#S2.p2.1)\.
- S\. Tedeschi, F\. Martelli, and R\. Navigli \(2022\)ID10M: idiom identification in 10 languages\.InFindings of the Association for Computational Linguistics: NAACL 2022,Seattle, United States\.External Links:[Link](https://aclanthology.org/2022.findings-naacl.208),[Document](https://dx.doi.org/10.18653/v1/2022.findings-naacl.208)Cited by:[Table 7](https://arxiv.org/html/2605.30443#A1.T7.1.2.2.3.1.1)\.
- A\. M\. Turner, L\. Thiergart, G\. Leech, D\. Udell, J\. J\. Vazquez, U\. Mini, and M\. MacDiarmid \(2024\)Steering language models with activation engineering\.External Links:2308\.10248,[Link](https://arxiv.org/abs/2308.10248)Cited by:[§1](https://arxiv.org/html/2605.30443#S1.p3.1),[§2](https://arxiv.org/html/2605.30443#S2.p1.1)\.
- C\. Van Hee, E\. Lefever, and V\. Hoste \(2018\)SemEval\-2018 task 3: irony detection in English tweets\.InProceedings of the 12th International Workshop on Semantic Evaluation,M\. Apidianaki, S\. M\. Mohammad, J\. May, E\. Shutova, S\. Bethard, and M\. Carpuat \(Eds\.\),New Orleans, Louisiana\.External Links:[Link](https://aclanthology.org/S18-1005/),[Document](https://dx.doi.org/10.18653/v1/S18-1005)Cited by:[Table 7](https://arxiv.org/html/2605.30443#A1.T7.1.5.5.3.1.1)\.
- H\. Wang, P\. Minervini, and E\. Ponti \(2024\)Probing the emergence of cross\-lingual alignment during LLM training\.InFindings of the Association for Computational Linguistics: ACL 2024,L\. Ku, A\. Martins, and V\. Srikumar \(Eds\.\),Bangkok, Thailand,pp\. 12159–12173\.External Links:[Link](https://aclanthology.org/2024.findings-acl.724/),[Document](https://dx.doi.org/10.18653/v1/2024.findings-acl.724)Cited by:[§2](https://arxiv.org/html/2605.30443#S2.p2.1)\.
- M\. Wang, H\. Adel, L\. Lange, Y\. Liu, E\. Nie, J\. Strötgen, and H\. Schuetze \(2025\)Lost in multilinguality: dissecting cross\-lingual factual inconsistency in transformer language models\.InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),W\. Che, J\. Nabende, E\. Shutova, and M\. T\. Pilehvar \(Eds\.\),Vienna, Austria,pp\. 5075–5094\.External Links:[Link](https://aclanthology.org/2025.acl-long.253/),[Document](https://dx.doi.org/10.18653/v1/2025.acl-long.253),ISBN 979\-8\-89176\-251\-0Cited by:[§1](https://arxiv.org/html/2605.30443#S1.p2.1)\.
- Z\. Wen, R\. Wang, Q\. Wang, L\. Gui, Y\. Long, S\. Chen, B\. Liang, M\. Yang, and R\. Xu \(2025\)FGVIrony: a chinese dataset of fine\-grained verbal irony\.Information Processing & Management62\(5\),pp\. 104169\.External Links:ISSN 0306\-4573,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.ipm.2025.104169),[Link](https://www.sciencedirect.com/science/article/pii/S0306457325001104)Cited by:[Table 7](https://arxiv.org/html/2605.30443#A1.T7.1.11.11.3.1.1)\.
- A\. Yang, A\. Li, B\. Yang, B\. Zhang, B\. Hui, B\. Zheng, B\. Yu, C\. Gao, C\. Huang, C\. Lv, C\. Zheng, D\. Liu, F\. Zhou, F\. Huang, F\. Hu, H\. Ge, H\. Wei, H\. Lin, J\. Tang, J\. Yang, J\. Tu, J\. Zhang, J\. Yang, J\. Yang, J\. Zhou, J\. Zhou, J\. Lin, K\. Dang, K\. Bao, K\. Yang, L\. Yu, L\. Deng, M\. Li, M\. Xue, M\. Li, P\. Zhang, P\. Wang, Q\. Zhu, R\. Men, R\. Gao, S\. Liu, S\. Luo, T\. Li, T\. Tang, W\. Yin, X\. Ren, X\. Wang, X\. Zhang, X\. Ren, Y\. Fan, Y\. Su, Y\. Zhang, Y\. Zhang, Y\. Wan, Y\. Liu, Z\. Wang, Z\. Cui, Z\. Zhang, Z\. Zhou, and Z\. Qiu \(2025\)Qwen3 technical report\.External Links:2505\.09388,[Link](https://arxiv.org/abs/2505.09388)Cited by:[§3](https://arxiv.org/html/2605.30443#S3.p2.1)\.
- H\. Zhang, K\. Chen, X\. Bai, X\. Li, Y\. Xiang, and M\. zhang \(2025\)Exploring the translation mechanism of large language models\.InAdvances in Neural Information Processing Systems,D\. Belgrave, C\. Zhang, H\. Lin, R\. Pascanu, P\. Koniusz, M\. Ghassemi, and N\. Chen \(Eds\.\),Vol\.38,pp\. 106539–106579\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2025/file/99367869dc65679f7bc243b45a23a92f-Paper-Conference.pdf)Cited by:[§1](https://arxiv.org/html/2605.30443#S1.p2.1)\.
- A\. Zou, L\. Phan, S\. Chen, J\. Campbell, P\. Guo, R\. Ren, A\. Pan, X\. Yin, M\. Mazeika, A\. Dombrowski, S\. Goel, N\. Li, M\. J\. Byun, Z\. Wang, A\. Mallen, S\. Basart, S\. Koyejo, D\. Song, M\. Fredrikson, J\. Z\. Kolter, and D\. Hendrycks \(2025\)Representation engineering: a top\-down approach to ai transparency\.External Links:2310\.01405,[Link](https://arxiv.org/abs/2310.01405)Cited by:[§2](https://arxiv.org/html/2605.30443#S2.p1.1)\.

## Appendix A Dataset details and sampling strategy

Our experiments cover five figurative categories (idiom, metaphor, simile, irony, and sarcasm) in six languages: English (en), Chinese (zh), Bengali (bn), Spanish (es), Italian (it), and German (de). Table [8](https://arxiv.org/html/2605.30443#A1.T8) lists the evaluated language–category cells and their sample counts.

### A.1 Dataset sources

Table [7](https://arxiv.org/html/2605.30443#A1.T7) lists the resources used for each language and category in our main experiments.

| Language | Category | Dataset source |
|---|---|---|
| English | Idiom | ID10M (Tedeschi et al., [2022](https://arxiv.org/html/2605.30443#bib.bib51)) |
| English | Metaphor | LCC Metaphor Datasets (Mohler et al., [2016](https://arxiv.org/html/2605.30443#bib.bib53)) |
| English | Simile | Simile interpretation dataset from He et al. ([2022](https://arxiv.org/html/2605.30443#bib.bib54)) |
| English | Irony | SemEval-2018 Task 3 English irony dataset (Van Hee et al., [2018](https://arxiv.org/html/2605.30443#bib.bib52)) |
| English | Sarcasm | FLUTE (Chakrabarty et al., [2022](https://arxiv.org/html/2605.30443#bib.bib16)) |
| English | Caption | MS COCO captions (Lin et al., [2014](https://arxiv.org/html/2605.30443#bib.bib66)) |
| Chinese | Idiom | Chinese Idiom Paraphrasing dataset (Qiang et al., [2023](https://arxiv.org/html/2605.30443#bib.bib60)) |
| Chinese | Metaphor | CCL 2018 Chinese metaphor analysis dataset¹ |
| Chinese | Simile | Chinese simile recognition dataset (Liu et al., [2018](https://arxiv.org/html/2605.30443#bib.bib19)) |
| Chinese | Irony | FGVIrony (Wen et al., [2025](https://arxiv.org/html/2605.30443#bib.bib61)) |
| Chinese | Sarcasm | Topic-oriented Chinese sarcasm dataset (Liang et al., [2022](https://arxiv.org/html/2605.30443#bib.bib62)) |
| Chinese | Caption | COCO-CN (Li et al., [2019](https://arxiv.org/html/2605.30443#bib.bib68)) |
| Bengali | Idiom | Bengali idiom dataset from Sakhawat et al. ([2026](https://arxiv.org/html/2605.30443#bib.bib44)) |
| Bengali | Metaphor | Bengali figures-of-speech dataset from Das and Ghosh ([2025](https://arxiv.org/html/2605.30443#bib.bib46)) |
| Bengali | Irony | Bengali tweets irony dataset (Ghosh and Sarkar, [2020](https://arxiv.org/html/2605.30443#bib.bib45)) |
| Bengali | Sarcasm | Ben-Sarc (Lora et al., [2024](https://arxiv.org/html/2605.30443#bib.bib47)) |
| Bengali | Caption | BAN-Cap (Khan et al., [2022](https://arxiv.org/html/2605.30443#bib.bib69)) |
| Spanish | Idiom | Spanish–Galician idiom dataset (Montesinos et al., [2026](https://arxiv.org/html/2605.30443#bib.bib55)) |
| Spanish | Metaphor | LCC Metaphor Datasets (Mohler et al., [2016](https://arxiv.org/html/2605.30443#bib.bib53)) |
| Spanish | Irony | IroSvA Spanish irony dataset (Ortega-Bueno et al., [2019](https://arxiv.org/html/2605.30443#bib.bib56)) |
| Spanish | Caption | MS-COCO-ES² |
| Italian | Idiom | MultiCoPIE (Sentsova et al., [2025](https://arxiv.org/html/2605.30443#bib.bib57)) |
| Italian | Metaphor | ERC\_Cog PROMENADE WP1 Figurative Archive (IUSS Neurolinguistics and Experimental Pragmatics Laboratory (NEPLab), [2025](https://arxiv.org/html/2605.30443#bib.bib59)) |
| Italian | Irony | EVALITA 2018 irony-related resources (Caselli et al., [2018](https://arxiv.org/html/2605.30443#bib.bib58)) |
| Italian | Sarcasm | EVALITA 2018 sarcasm-related resources (Caselli et al., [2018](https://arxiv.org/html/2605.30443#bib.bib58)) |
| Italian | Caption | Italian captioning dataset from Scaiella et al. ([2019](https://arxiv.org/html/2605.30443#bib.bib67)) |
| German | Idiom | Idiom data used in Stap et al. ([2024](https://arxiv.org/html/2605.30443#bib.bib48)) |
| German | Metaphor | VOLIMET (Piccirilli et al., [2024](https://arxiv.org/html/2605.30443#bib.bib50)) |
| German | Irony | MultiPICo (Casola et al., [2024](https://arxiv.org/html/2605.30443#bib.bib49)) |
| German | Caption | COCO Karpathy OPUS German captions³ |

Table 7: Dataset sources for the monolingual figurative and literal examples used in the main experiments.

Table 8: Figurative-example counts for vector construction in each evaluated language–category cell. An equal number of monolingual literal captions is used in the contrast set. Validation and held-out generation prompts are separate from these counts; dashes mark unevaluated cells.

### A.2 Sampling strategy and data partitions

We keep vector construction, layer validation, and final behavioral testing disjoint. For vector construction, every evaluated language–category cell contains a balanced figurative–literal contrast set with up to 500 figurative examples and 500 monolingual literal captions. German metaphor and German irony each use 200 figurative examples and 200 German literal captions because fewer usable public examples were available for those cells. Cells marked with dashes in Table [8](https://arxiv.org/html/2605.30443#A1.T8) are not evaluated because too few public resources were available. In total, we evaluate 24 language–category cells.
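A minimal sketch of how such a capped, balanced contrast set could be drawn is shown below. It assumes each cell's figurative and literal pools are plain Python lists and that both classes are subsampled to the smaller of the cap and the two pool sizes; the function name `build_contrast_set` and the default seed are hypothetical and not taken from the authors' code.

```python
import random


def build_contrast_set(figurative, literal, cap=500, seed=0):
    """Hypothetical sketch: draw a balanced figurative/literal contrast set.

    Both classes receive the same number of examples: the smallest of
    `cap`, the figurative pool size, and the literal pool size.
    """
    n = min(cap, len(figurative), len(literal))
    rng = random.Random(seed)  # local RNG keeps the draw reproducible
    return rng.sample(figurative, n), rng.sample(literal, n)


# A cell with only 200 usable figurative examples (as for German metaphor)
# yields a 200 + 200 contrast set even though more captions are available.
fig_pool = [f"fig-{i}" for i in range(200)]
lit_pool = [f"lit-{i}" for i in range(800)]
pos, neg = build_contrast_set(fig_pool, lit_pool)
print(len(pos), len(neg))  # 200 200
```

Capping both sides at the same size keeps the two class means estimated from equally many examples, which matters if the direction is later formed from a difference of class means.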

For each evaluated cell, the vector\-construction partition is used to estimate the category direction in the corresponding construction language\.
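One common way to turn a figurative–literal contrast set into a direction is a difference of class means, in the style of contrastive activation addition (Rimsky et al., 2024). The sketch below assumes, rather than reports, that each example is summarized by a single pooled hidden-state vector at the intervention layer and that the direction is rescaled to unit norm (mirroring the normalized random control in Appendix B.5); the function name `category_direction` and the synthetic data are hypothetical.

```python
import numpy as np


def category_direction(fig_acts, lit_acts):
    """Sketch: unit-norm difference-of-means direction.

    fig_acts, lit_acts: arrays of shape (n_examples, hidden_dim), one pooled
    hidden state per example at the intervention layer (assumed layout).
    """
    fig_acts = np.asarray(fig_acts, dtype=np.float64)
    lit_acts = np.asarray(lit_acts, dtype=np.float64)
    diff = fig_acts.mean(axis=0) - lit_acts.mean(axis=0)
    norm = np.linalg.norm(diff)
    if norm == 0.0:
        raise ValueError("figurative and literal means coincide")
    return diff / norm


# Synthetic check: "figurative" activations are shifted along axis 3.
rng = np.random.default_rng(0)
hidden_dim = 16
shift = np.zeros(hidden_dim)
shift[3] = 2.0
lit = rng.normal(size=(500, hidden_dim))
fig = rng.normal(size=(500, hidden_dim)) + shift
v = category_direction(fig, lit)
print(int(np.argmax(np.abs(v))))  # 3: the recovered direction follows the shift
```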

### A.3 Contrastive examples

The contrast sets in our main experiments pair category-specific figurative sentences with monolingual literal captions. Table 16 provides representative examples to clarify category boundaries. Examples are quoted from source datasets for illustration only; where possible, we use neutral, everyday, non-political items.

## Appendix B Behavior evaluation and layer selection

We evaluate generated continuations with an LLM-based detector. For each target category, the detector asks whether a continuation contains that category. Categories are evaluated independently, so a continuation may receive a positive label for more than one category. Target Category Rate is the fraction of outputs for which the corresponding detector returns YES. We evaluate prompt–continuation coherence separately using the rubric in Section [B.3](https://arxiv.org/html/2605.30443#A2.SS3).
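As a small illustration of the definition above, Target Category Rate can be computed per category from the raw detector answers. The sketch assumes each detector answer is a string that is exactly `YES` or `NO` up to case and surrounding whitespace; that parsing rule and the name `target_category_rate` are illustrative assumptions.

```python
def target_category_rate(labels):
    """Fraction of continuations whose detector answer is YES.

    `labels` holds one raw answer per continuation for a single target
    category. Categories are scored independently, so one continuation can
    be YES for several categories at once.
    """
    if not labels:
        raise ValueError("no detector outputs")
    yes = sum(1 for lab in labels if lab.strip().upper() == "YES")
    return yes / len(labels)


# Four continuations judged by two independent detectors.
detector_outputs = {
    "metaphor": ["YES", "NO", "YES", "NO"],
    "simile": ["YES", "NO", "NO", "NO"],
}
rates = {cat: target_category_rate(outs) for cat, outs in detector_outputs.items()}
print(rates)  # {'metaphor': 0.5, 'simile': 0.25}
```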

Judge generation parameters. To ensure deterministic and reproducible scoring, all responses from the DeepSeek-v4-flash judge (DeepSeek-AI, [2026](https://arxiv.org/html/2605.30443#bib.bib22)) were generated with greedy decoding (temperature = 0.0) and a maximum generation length of 256 tokens, which accommodates both the required one-sentence reasoning and the final categorical label or numerical score without truncation.

### B.1 Generation settings

We use stochastic decoding for all unsteered and steered generation conditions. Within each model, the unsteered baseline, monolingual steering, cross-lingual steering, random-vector controls, and geometry-vector interventions use identical decoding parameters. Following the generation configuration used for each model family, Qwen3 models use temperature = 0.7, nucleus sampling with `top_p` = 0.8, and `top_k` = 20, whereas Llama-3.1-8B-Instruct and Ministral-3-8B-Instruct use temperature = 0.7, `top_p` = 0.9, and `top_k` = 50. All conditions use a maximum generation length of 1024 tokens.
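The decoding settings above can be collected into one lookup so that every condition for a given model shares exactly the same parameters. This is a sketch under stated assumptions: the keyword names follow the Hugging Face Transformers `generate` convention (`do_sample`, `temperature`, `top_p`, `top_k`, `max_new_tokens`), reading "maximum generation length" as new tokens is an assumption, and the helper `generation_kwargs` is hypothetical.

```python
# Decoding settings reported in Appendix B.1, grouped by model family.
SAMPLING_BY_FAMILY = {
    "qwen3": {"temperature": 0.7, "top_p": 0.8, "top_k": 20},
    "llama_ministral": {"temperature": 0.7, "top_p": 0.9, "top_k": 50},
}
MAX_NEW_TOKENS = 1024  # assumed to mean newly generated tokens


def generation_kwargs(model_name: str) -> dict:
    """Return sampling keyword arguments for one of the four evaluated models."""
    name = model_name.lower()
    if name.startswith("qwen3"):
        family = "qwen3"
    elif name.startswith(("llama-3.1", "ministral-3")):
        family = "llama_ministral"
    else:
        raise KeyError(f"no decoding settings listed for {model_name!r}")
    # The same dictionary is reused for unsteered, steered, random-vector,
    # and geometry-vector runs of a given model.
    return {"do_sample": True, "max_new_tokens": MAX_NEW_TOKENS, **SAMPLING_BY_FAMILY[family]}


print(generation_kwargs("Qwen3-8B"))
```

With Transformers, such a dictionary could be unpacked into `model.generate(**inputs, **generation_kwargs(name))`; keeping it in one place is a simple way to guarantee the "identical decoding parameters" requirement across conditions.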

All experiments were run on eight NVIDIA RTX 6000 Ada Generation GPUs, with a total compute cost of approximately 900 GPU-hours.

### B.2 Paired statistical comparisons

All test conditions within an evaluated cell use the same set of $N = 500$ held-out prompts for each language. Let $y_i^A, y_i^B \in \{0, 1\}$ denote the detector's target-category decisions for prompt $i$ under two compared generation conditions, such as unsteered ($A$) and monolingual steering ($B$). We first report the paired percentage-point difference

$$\Delta(A,B) = 100\left(\frac{1}{N}\sum_{i=1}^{N} y_i^B - \frac{1}{N}\sum_{i=1}^{N} y_i^A\right). \tag{5}$$
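A worked instance of Eq. (5), assuming the two decision lists are already aligned prompt by prompt and stored as plain Python lists of 0/1 values; the name `paired_delta` is illustrative.

```python
def paired_delta(y_a, y_b):
    """Eq. (5): paired percentage-point difference between conditions A and B.

    y_a, y_b: lists of 0/1 detector decisions for the same prompts, in the
    same order.
    """
    if len(y_a) != len(y_b) or not y_a:
        raise ValueError("conditions must cover the same, non-empty prompt set")
    n = len(y_a)
    return 100.0 * (sum(y_b) / n - sum(y_a) / n)


unsteered = [0, 0, 1, 0]  # condition A: 1 of 4 prompts flagged
steered = [1, 0, 1, 1]    # condition B: 3 of 4 prompts flagged
print(paired_delta(unsteered, steered))  # 50.0
```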
The analysis aligns conditions by the held-out source sentence when its identifier is available and unique, and otherwise preserves the stored row order. Let $b$ count prompts that change from non-target under $A$ to target under $B$, and let $c$ count the reverse change. Since the output decisions are paired nominal data, we compute an exact two-sided McNemar $p$-value (McNemar, [1947](https://arxiv.org/html/2605.30443#bib.bib64)) from the discordant counts $(b, c)$:

$$p = \min\!\left(1,\; 2\sum_{k=\max(b,c)}^{b+c}\binom{b+c}{k}2^{-(b+c)}\right). \tag{6}$$
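Eq. (6) is a two-sided binomial test on the discordant pairs with success probability 1/2. The sketch below computes $(b, c)$ from two aligned 0/1 lists and then evaluates Eq. (6) exactly with integer binomial coefficients; the function names are illustrative.

```python
from math import comb


def discordant_counts(y_a, y_b):
    """Return (b, c): prompts flipping 0 -> 1 and 1 -> 0 from A to B."""
    b = sum(1 for a, bb in zip(y_a, y_b) if a == 0 and bb == 1)
    c = sum(1 for a, bb in zip(y_a, y_b) if a == 1 and bb == 0)
    return b, c


def mcnemar_exact_p(b, c):
    """Eq. (6): exact two-sided McNemar p-value from discordant counts."""
    n = b + c
    # Upper tail of Binomial(n, 1/2) starting at max(b, c); exact integer sums.
    tail = sum(comb(n, k) for k in range(max(b, c), n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(mcnemar_exact_p(8, 2))  # 0.109375 = 2 * (45 + 10 + 1) / 1024
```

Concordant prompts (same decision under both conditions) do not enter the test at all, which is why only $b$ and $c$ appear in Eq. (6); with no discordant pairs the function returns $p = 1$.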
Our analysis script also estimates uncertainty with 2,000 paired bootstrap resamples (Efron, [1979](https://arxiv.org/html/2605.30443#bib.bib63)). Each resample draws the prompt-level differences $y_i^B - y_i^A$ with replacement, and the 2.5th and 97.5th percentiles of the resampled mean differences form a 95% confidence interval. The default base seed is 13, with a deterministic comparison-specific offset. For individual cell-level comparisons, we additionally report $q$-values adjusted with the Benjamini–Hochberg procedure (Benjamini and Hochberg, [1995](https://arxiv.org/html/2605.30443#bib.bib65)) to control the false discovery rate within each model–intervention-family set. For pooled summaries, correction is applied within the displayed comparison family.
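Both procedures can be sketched with NumPy. This is an illustration under assumptions: the comparison-specific seed offset is not specified here, so the function simply takes a seed (default 13); percentiles use NumPy's default linear interpolation; and the differences are scaled by 100 so the interval is in percentage points, matching Eq. (5).

```python
import numpy as np


def paired_bootstrap_ci(y_a, y_b, n_resamples=2000, seed=13):
    """95% percentile CI for the paired difference, in percentage points."""
    d = 100.0 * (np.asarray(y_b, dtype=float) - np.asarray(y_a, dtype=float))
    rng = np.random.default_rng(seed)
    # Resample prompt indices with replacement; row r is one bootstrap sample.
    idx = rng.integers(0, d.size, size=(n_resamples, d.size))
    means = d[idx].mean(axis=1)
    lo, hi = np.percentile(means, [2.5, 97.5])
    return float(lo), float(hi)


def benjamini_hochberg(pvals):
    """Benjamini–Hochberg adjusted q-values for one comparison family."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each q is the minimum over itself and larger ranks.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Because the bootstrap resamples prompts rather than decisions independently, it preserves the pairing between conditions, which is the same dependence that McNemar's test exploits.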

### B.3 Detector prompt definitions

The detector prompt definitions table lists the shared detector instruction and the definitions for the evaluated language–category cells. In addition to target-category detection, we evaluate whether each generated continuation is coherent with the user prompt; the coherence-evaluator prompt table gives the rubric. Coherence is scored on a 0–4 scale, where 4 denotes a fully coherent continuation and 0 denotes an incoherent or failed output.

### B.4 Validation-based layer selection

We select one intervention depth per model on the validation split and reuse that layer for the monolingual, cross-lingual, random-vector, and geometry-vector test experiments. The validation sweep uses captions as prompt inputs. We evaluate relative depths 0.40, 0.47, and 0.55 with steering strength fixed at 1.0. For a model with $L$ transformer layers, relative depth $d$ is converted to an integer layer by

$$\ell(d) = \min\left(L-1, \max\left(0, \left\lfloor Ld + 0.5 \right\rfloor\right)\right) \tag{7}$$
which is half\-up rounding clipped to the valid layer range\. For each available language–category cell, we summarize the target\-category rate and mean coherence on the validation examples\.
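Eq. (7) translates directly into code. The sketch below uses a hypothetical 32-layer model to show how the three candidate depths map to layer indices; it does not reproduce the layers actually selected in Table 9.

```python
import math


def depth_to_layer(d, num_layers):
    """Eq. (7): half-up rounding of L * d, clipped to [0, L - 1]."""
    return min(num_layers - 1, max(0, math.floor(num_layers * d + 0.5)))


# Hypothetical 32-layer model and the three candidate relative depths.
print([depth_to_layer(d, 32) for d in (0.40, 0.47, 0.55)])  # [13, 15, 18]
```

The clip matters only at the extremes: without it, $d = 1$ would map to index $L$, which does not exist in a zero-indexed stack of $L$ layers.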

For each language–category cell, we rank candidate depths by target-category rate (descending), then mean coherence (descending), then depth (ascending). We then aggregate by depth using the mean target-category rate, the mean coherence, and the number of cell-level wins. Before the final choice, we apply a coherence gate that requires a mean coherence above 2.5 on the 0–4 scale. If at least one candidate depth passes this gate, only eligible depths are considered; otherwise, selection falls back to the full candidate set. The selected depth is the one with the highest aggregate mean target-category rate, with ties broken by higher mean coherence, then by more cell wins, and finally by the smaller depth.
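The selection rule can be sketched as follows. The nested-dictionary layout `{cell: {depth: (target_rate, mean_coherence)}}`, the function name `select_depth`, and the numbers in the example are hypothetical; the example is chosen so that the coherence gate changes the outcome.

```python
from statistics import mean


def select_depth(results, gate=2.5):
    """results: {cell: {depth: (target_rate, mean_coherence)}} (hypothetical layout)."""
    depths = sorted({d for per_cell in results.values() for d in per_cell})
    wins = {d: 0 for d in depths}
    for per_cell in results.values():
        # Cell-level ranking: rate descending, coherence descending, depth ascending.
        best = min(per_cell, key=lambda d: (-per_cell[d][0], -per_cell[d][1], d))
        wins[best] += 1
    agg = {}
    for d in depths:
        scores = [per_cell[d] for per_cell in results.values() if d in per_cell]
        agg[d] = (mean(s[0] for s in scores), mean(s[1] for s in scores), wins[d])
    # Coherence gate on the aggregate mean; fall back to all depths if none pass.
    eligible = [d for d in depths if agg[d][1] > gate] or depths
    # Highest rate, then coherence, then cell wins, then the smallest depth.
    return min(eligible, key=lambda d: (-agg[d][0], -agg[d][1], -agg[d][2], d))


results = {
    "en-metaphor": {0.40: (0.60, 3.1), 0.47: (0.72, 2.9), 0.55: (0.80, 2.2)},
    "zh-simile": {0.40: (0.50, 3.3), 0.47: (0.58, 3.0), 0.55: (0.70, 2.4)},
}
print(select_depth(results))  # 0.47: depth 0.55 wins both cells but fails the gate
```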

Table [9](https://arxiv.org/html/2605.30443#A2.T9) reports the intervention depth selected on the validation split for each model. These fixed layers are reused for all subsequent monolingual, cross-lingual, random-vector, and geometry-vector test experiments.

Table 9: Validation-selected intervention depths and corresponding transformer layers used in the test experiments.

### B.5 Random-vector control

The random\-vector control tests whether changes in Target Category Rate can be explained by an arbitrary hidden\-state perturbation rather than by the learned direction\. For each model and evaluated language–category condition, we apply the random\-vector intervention at the same selected layer, prompt\-token positions, and strength as the corresponding learned\-vector intervention\.

For each random\-vector run, we sample a random direction in the model’s hidden\-state space and normalize it before intervention:

$$\hat{v}_{\mathrm{rand}}^{(l)} = \frac{v_{\mathrm{rand}}^{(l)}}{\|v_{\mathrm{rand}}^{(l)}\|_2}, \qquad h_t^{(l)\prime} = h_t^{(l)} + \alpha\,\hat{v}_{\mathrm{rand}}^{(l)}. \tag{8}$$

We run the random-vector experiment with three seeds (0, 11, and 22) and report the mean Target Category Rate across the runs. As in all learned-vector interventions, we use $\alpha = 1.0$. This baseline matches layer, intervention strength, application positions, and perturbation magnitude, while averaging over seeds reduces dependence on a single sampled direction.
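Eq. (8) can be sketched on a NumPy array standing in for one layer's hidden states of shape `(seq_len, hidden_dim)`. Drawing the raw direction from a standard Gaussian before normalizing is an assumption (the appendix does not state the sampling distribution), and in a real model the addition would happen inside a forward hook at the selected layer, which is not shown. The helper names are hypothetical.

```python
import numpy as np


def random_unit_vector(hidden_dim, seed):
    """Sample a direction and rescale it to unit L2 norm (first half of Eq. 8)."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(hidden_dim)
    return v / np.linalg.norm(v)


def add_steering(hidden, direction, alpha=1.0, positions=None):
    """Second half of Eq. (8): h' = h + alpha * v_hat at the given token positions."""
    steered = np.array(hidden, dtype=np.float64, copy=True)
    if positions is None:
        steered += alpha * direction
    else:
        steered[positions] += alpha * direction
    return steered


# Steer only the first three (prompt) positions of a toy 5-token sequence.
h = np.zeros((5, 8))
v = random_unit_vector(8, seed=11)
out = add_steering(h, v, alpha=1.0, positions=[0, 1, 2])
# One control direction per seed; rates from the three runs would be averaged.
control_dirs = [random_unit_vector(8, s) for s in (0, 11, 22)]
```

Because every sampled direction has unit norm, the perturbation added with $\alpha = 1.0$ has the same magnitude as a unit-norm learned direction, which is what makes the random control a magnitude-matched baseline.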

## Appendix C Detailed monolingual steering results

Table [15](https://arxiv.org/html/2605.30443#A4.T15) reports confidence intervals and multiplicity-adjusted significance statistics for the model–language aggregates summarized in Table [3](https://arxiv.org/html/2605.30443#S3.T3). Tables [11](https://arxiv.org/html/2605.30443#A4.T11)–[14](https://arxiv.org/html/2605.30443#A4.T14) then report the full monolingual steering results for each available language–category configuration. Each cell gives the Target Category Rate under steering together with its percentage-point change relative to unsteered generation.

Across the four models, metaphor is the most consistently responsive category, with substantial gains in several languages\. Simile also responds strongly where it is available, but its evaluation is limited to English and Chinese\. Irony and sarcasm show greater model\- and language\-specific variation\.

The detailed tables also clarify the low Bengali averages in Table [3](https://arxiv.org/html/2605.30443#S3.T3). These averages do not indicate a universal failure of steering in Bengali: Qwen3-32B shows a strong improvement for Bengali irony, and both Qwen models improve Bengali idiom. Instead, the lower aggregate result reflects uneven category-level behavior, particularly weak sarcasm steering and smaller gains for some model–category combinations.

## Appendix D Auxiliary matched-literal construction diagnostic

The primary experiments construct steering directions by contrasting category-specific figurative examples with literal captions in the same language.

Here we run an auxiliary construction-sensitivity diagnostic in which the primary caption negatives are replaced with source-aligned negatives, to examine how the choice of negative contrast affects steering.

We run monolingual steering for Llama-3.1-8B-Instruct and Qwen3-8B across the 24 available language–category cells. For each cell, the figurative positive examples remain unchanged, while the negative examples used to construct the steering direction are replaced. The resulting caption-built and matched-literal-built vectors are evaluated on 100 same-language held-out samples from WikiMatrix (Schwenk et al., [2019](https://arxiv.org/html/2605.30443#bib.bib4)). We hold fixed the intervention layer inherited from the primary pipeline, the steering strength ($\alpha = 1.0$), the generation settings, and the automatic evaluation procedure.

Table [10](https://arxiv.org/html/2605.30443#A4.T10) reports the complete cell-level comparison. Across all 24 cells, Llama-3.1-8B-Instruct exhibits similar mean gains under the two construction choices: +2.4 percentage points for the caption-built vector and +2.6 points for the matched-literal-built vector. The matched-literal construction preserves whether the primary effect is non-negative or negative in 19 of 24 cells. Qwen3-8B remains positive on average under both constructions, improving from +4.0 points for the caption-built vector to +5.3 points for the matched-literal-built vector, with the pattern retained in 20 of 24 cells.
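The "pattern retained" criterion from the Table 10 caption (non-negative versus negative gain relative to baseline) and the per-construction mean gains can be computed as in the sketch below. The gains in the example are made-up illustrative numbers, not values from Table 10, and the function names are hypothetical.

```python
def pattern_retained(caption_gain, matched_gain):
    """True if both gains fall on the same side: non-negative vs. negative."""
    return (caption_gain >= 0) == (matched_gain >= 0)


def summarize(gains):
    """gains: list of (caption_gain, matched_gain) pairs in percentage points."""
    n = len(gains)
    mean_caption = sum(c for c, _ in gains) / n
    mean_matched = sum(m for _, m in gains) / n
    retained = sum(pattern_retained(c, m) for c, m in gains)
    return mean_caption, mean_matched, retained


# Illustrative cells only; a zero gain counts as non-negative.
example = [(4.0, 6.0), (0.0, -2.0), (-3.0, -1.0), (5.0, 3.0)]
print(summarize(example))  # (1.5, 1.5, 3)
```

Note that equal mean gains can coexist with a flipped sign in individual cells, which is why the diagnostic reports both the means and the retained-pattern count.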

Aggregating the results in Table [10](https://arxiv.org/html/2605.30443#A4.T10) shows that construction sensitivity is concentrated in specific figurative categories rather than spread uniformly. For Llama-3.1-8B-Instruct, the average effects are broadly similar across constructions, including positive mean gains for idiom, metaphor, simile, and sarcasm. For Qwen3-8B, matched-literal construction substantially strengthens irony, increasing its mean gain from +2.3 to +13.5 points, while metaphor is more construction-sensitive: its mean changes from +5.8 points under caption-built vectors to −1.8 points under matched-literal-built vectors, with the pattern retained in only 3 of 6 metaphor cells. Overall, the diagnostic indicates that positive monolingual steering effects are not specific to caption-built vectors.

Table 10: Cell-level comparison of caption-built and matched-literal-built steering-vector gains under the held-out WikiMatrix literal-validation protocol. All evaluations are restricted to monolingual steering. Gains are percentage-point changes in Target Category Rate relative to the matching unsteered WikiMatrix baseline, evaluated at the intervention layer inherited from the primary pipeline with $\alpha = 1.0$. For every baseline and steered output file, rates are computed from the first 100 valid evaluated rows. *Pattern retained* indicates whether the matched-literal-built vector preserves the caption-built vector's non-negative versus negative result relative to baseline.

Table 11: Detailed monolingual steering results for Qwen3-8B. Each cell shows the steered target-category rate followed by the percentage-point change relative to the unsteered baseline. Color intensity shows effect direction and magnitude; stars mark unadjusted exact two-sided McNemar comparisons as defined in Appendix [B.2](https://arxiv.org/html/2605.30443#A2.SS2): \*\*\* $p<0.001$, \*\* $p<0.01$, \* $p<0.05$.

Table 12: Detailed monolingual steering results for Qwen3-32B. Each cell shows the steered target-category rate followed by the percentage-point change relative to the unsteered baseline. Color intensity shows effect direction and magnitude; stars mark unadjusted exact two-sided McNemar comparisons as defined in Appendix [B.2](https://arxiv.org/html/2605.30443#A2.SS2): \*\*\* $p<0.001$, \*\* $p<0.01$, \* $p<0.05$.

Table 13: Detailed monolingual steering results for Llama-3.1-8B-Instruct. Each cell shows the steered target-category rate followed by the percentage-point change relative to the unsteered baseline. Color intensity shows effect direction and magnitude; stars mark unadjusted exact two-sided McNemar comparisons as defined in Appendix [B.2](https://arxiv.org/html/2605.30443#A2.SS2): \*\*\* $p<0.001$, \*\* $p<0.01$, \* $p<0.05$.

Table 14: Detailed monolingual steering results for Ministral-3-8B-Instruct. Each cell shows the steered target-category rate followed by the percentage-point change relative to the unsteered baseline. Color intensity shows effect direction and magnitude; stars mark unadjusted exact two-sided McNemar comparisons as defined in Appendix [B.2](https://arxiv.org/html/2605.30443#A2.SS2): \*\*\* $p<0.001$, \*\* $p<0.01$, \* $p<0.05$.

Table 15: Inference details for the monolingual summary in Table [3](https://arxiv.org/html/2605.30443#S3.T3). $\Delta$ is the pooled paired percentage-point change in Target Category Rate relative to unsteered generation. Confidence intervals use 2,000 paired bootstrap resamples of prompt-level decisions. The $p$ column reports exact two-sided McNemar values; BH $q$ reports the Benjamini–Hochberg adjustment across the six language aggregates within each model. *Win* counts the available categories in which steering exceeds the unsteered baseline.

| Language | Category | Example | Explanation |
|---|---|---|---|
| English | Literal | A plane taking off in front of the ocean. | A direct scene description with no figurative comparison, idiomatic usage, or ironic intent. This serves as the non-figurative contrast class. |
| English | Idiom | I was staying with him *through thick and thin*. | *Through thick and thin* is a fixed expression meaning to remain loyal through hardship. Its meaning is conventional rather than literal. |
| English | Irony | A £718 phone bill is a lovely email to wake up to. | The speaker calls an expensive phone bill a *lovely* email, even though receiving such a bill is normally unpleasant. The positive wording signals ironic meaning rather than literal enjoyment. |
| English | Metaphor | The moon *smiled* at the stars in the sky. | This is metaphorical because it assigns a human action, *smiled*, to the moon. The sentence is not meant literally. |
| English | Simile | The jazz solo sounded *as smooth as* sandpaper. | This is a simile because it makes an explicit comparison using *as … as*. The comparison is figurative rather than literal. |
| English | Sarcasm | The fact that I had to spend my entire day at the DMV and then use a sick day *makes me really happy*! | This is sarcasm because the positive evaluation, *makes me really happy*, clashes with an obviously unpleasant situation, signaling the opposite intended meaning. |
| Chinese | Literal | 一只黑白相间的狗站在草地上。 | This literally means “A black-and-white dog stands on the grass.” It directly describes a visible scene and serves as the non-figurative contrast class. |
| Chinese | Idiom | 他在关键时刻总是临阵磨枪。 | Literally, this sentence means “he always sharpens his spear only when approaching the battlefield at a critical moment.” It is categorized as a Chinese idiom because “临阵磨枪” is a conventionalized expression meaning to make last-minute preparations right before something important. |
| Chinese | Irony | 智能折叠型爆反装甲，三星最新力作，甚至能用来打电话。 | This says, roughly, “Smart folding reactive armor, Samsung’s latest masterpiece, can even be used to make phone calls.” The exaggerated product description and the phrase “even make phone calls” ironically mock the phone rather than sincerely praising it. |
| Chinese | Metaphor | 他压了压心头的怒火。 | Literally, “He pressed down the anger-fire in his heart.” The key metaphorical word is “怒火” (“anger-fire”), which maps the abstract emotion of anger onto the concrete image of fire. |
| Chinese | Simile | 这时我急得就像热锅上的蚂蚁。 | Literally, “I was as anxious as an ant on a hot pan.” It is a simile because it uses “像” (“like/as”) to make an explicit comparison. |
| Chinese | Sarcasm | 苹果iPhone XI配置曝光： 配备三摄、水下、黑暗模式，不如直接使用墨水屏。超省电。 | This says, roughly, “The iPhone XI is reported to include triple cameras, an underwater mode, and a dark mode; it might as well use an e-ink screen, since that would be very power-saving.” The suggestion is not intended literally; it playfully exaggerates the product-feature discussion, making the sentence sarcastic. |
| Bengali | Literal | dujn oelak brph O oemoeghr majhkhaoen EkoiT pathoerr Upr daNNoirhoey Aaoech. | This literally describes two people standing on a rock among snow and clouds. It is a direct visual description and serves as the Bengali non-figurative contrast class. |
| Bengali | Idiom | EI bYapaoermatha glaoenaUoict ny. | This means roughly, “One should not stick one’s head into this matter.” The expression “putting one’s head into” is idiomatic and conventionally means interfering or getting involved. |
| Bengali | Irony | tuoim oepas/TTa na oidoelAajoekr taoirkh janoetI partam na! | The speaker pretends to thank someone for providing obvious information, namely today’s date. The apparent appreciation is not sincere, so the sentence is ironic. |
| Bengali | Metaphor | koeb Hoeb Oraduhkhsagr par? | The phrase “sea of sorrow” maps hardship or suffering onto the concrete image of a sea that must be crossed. The sentence is metaphorical rather than literal. |
| Bengali | Sarcasm | oik bhaboesa?oes/Taoir oedI naI maoen salaoim paI naI?oiThkI bhaboesa. | This says, roughly, “What did you think? That I did not post a story because I did not get a gift? You thought correctly.” The speaker humorously confirms the obvious selfish motive, making the sentence sarcastic rather than a plain statement. |
| Spanish | Literal | Un perro negro con una correa y un frisbi en la boca. | This literally means “A black dog with a leash and a frisbee in its mouth.” It directly describes a visible scene and contains no figurative meaning. |
| Spanish | Idiom | Cuando la niña le preguntó por el perro, él, sin pensar, *metió la pata*. | *Meter la pata* is a Spanish idiom meaning to make a mistake or say something inappropriate. Its meaning is conventional rather than literal. |
| Spanish | Irony | ¿Las cajitas son de oro? Porque por lo que valen parece que sí. | The speaker asks whether the boxes are made of gold to criticize their high price. The literal question is not sincere; it ironically implies that the price is excessive. |
| Spanish | Metaphor | El dinero es energía, igual que todo lo que hay en el universo. | The sentence maps money onto energy, treating an economic concept as a physical force. This is metaphorical because money is not literally energy. |
| Italian | Literal | Un semplice bagno ha una toilette bianca e una vasca da bagno. | This literally means “A simple bathroom has a white toilet and a bathtub.” It is a direct scene description with no figurative or ironic intent. |
| Italian | Idiom | Ma non serve *piangere sul latte versato*. | *Piangere sul latte versato* is an Italian idiom meaning to regret something that has already happened and cannot be changed. Its meaning is conventional rather than literal. |
| Italian | Irony | Oggi tutti esperti di scuola su Twitter, come di nazionale durante i mondiali. | The sentence ironically says that everyone on Twitter has become a school expert, just as everyone becomes a football expert during the World Cup. The comparison signals a skeptical, non-literal evaluation. |
| Italian | Metaphor | Un funebre *lenzuolo di neve* copriva il mondo a perdita di vista. | The snow is described as a funeral sheet covering the world. This maps snow onto a shroud-like image, making the description metaphorical. |
| Italian | Sarcasm | su facebook scrivono le stesse cose di twitter…SI MA QUELLE DI 5 MESI FA…caro facebook aggiornati.. | The speaker sarcastically complains that Facebook contains the same things as Twitter, but months late. The closing command to “update” Facebook is a mocking criticism rather than a literal software instruction. |
| German | Literal | Ein Mann, der auf dem Boden sitzt und einen offenen Laptop hält. | This literally means “A man sitting on the floor and holding an open laptop.” It directly describes a visible scene and contains no figurative meaning. |
| German | Idiom | Ich *drücke dir die Daumen* bei der Prüfung. | *Die Daumen drücken* is a German idiom meaning to wish someone good luck. The intended meaning is conventional rather than literal. |
| German | Irony | Im Zweifelsfall geht doch auch Fax? | The sentence suggests using fax as if it were an adequate fallback. In context, the dry suggestion of an outdated technology signals ironic intent. |
| German | Metaphor | Das ist der richtige *Weg*, an dieses Problem heranzugehen. | The sentence uses *Weg* (“path/way”) to describe a method for solving a problem. It is metaphorical because a method is not literally a physical path. |

Table 16: Representative examples by language and category.

ScopePrompt textAll detectors*Role and ordering*You are a careful linguistic annotation model\.Read and follow these instructions in order\.All detectors*\[1\] Goal*Your job is to decide whether the input text contains the TARGET CATEGORY\.The input text is written in the evaluation language\.All detectors*\[2\] Core labeling principle*\- Categories are NOT mutually exclusive\.\- The same text may contain multiple figurative categories at once\.\- For this task, check ONLY whether the TARGET CATEGORY is present\.\- Output YES if the TARGET CATEGORY is present anywhere in the text\.\- Output NO if the TARGET CATEGORY is absent\.All detectors*\[3\] Evidence scope*\- Judge only from the given text unless extra context is explicitly provided\.\- Do not assume missing context\.\- Do not infer hidden intent unless it is reasonably supported by the text\.All detectors*\[4\] Ambiguity policy*\- Do not ask follow\-up questions\.\- Do not list multiple possible answers\.\- Make the best single 
decision from the text alone\.\- If the case is uncertain, output YES only when there is clear textual evidence for the TARGET CATEGORY; otherwise output NO\.All detectors*\[5\] Output rule*Output exactly 2 lines and nothing else:Reason: <one short sentence\>Label: <YES or NO\>English \(en\)*Idiom*DefinitionAn idiom is a conventionalized, multi\-word expression whose intended meaning cannot be fully derived from the literal meanings of its individual words\. It acts as its own established explanation to convey ideas implicitly\. This includes entirely figurative phrases \(e\.g\., “spill the beans”\), established “frozen metaphors” used in everyday speech, AND highly fixed conventional expressions \(e\.g\., “born and bred”\)\.English \(en\)*Metaphor*DefinitionA metaphor is any non\-literal use of a word or phrase where language from a physical or concrete domain is used to describe something abstract, conceptual, or non\-physical\. Expressions with explicit comparison markers such as “like” or “as” should not be labeled as metaphor for this task\.English \(en\)*Simile*DefinitionA simile is a figure of speech that directly compares two distinct, fundamentally different things to create a figurative image\. It must explicitly use comparative connecting words, most commonly “like”, “as”, “than”, or “resembles”\.English \(en\)*Irony*DefinitionIrony includes not only direct opposite\-meaning statements, but also sarcastic praise, mock agreement, deadpan understatement, rhetorical disbelief, and humorous incongruity where the surface wording conflicts with the likely attitude or situation\. 
In social media text, irony may be signaled by hashtags, emojis, scare quotes, exaggerated enthusiasm, or obviously implausible praise\.English \(en\)*Sarcasm*DefinitionSarcasm is present when the text uses words whose surface sentiment, evaluation, or emotional stance is clearly inappropriate for the described situation, so that the likely intended meaning is the opposite or sharply different from the literal wording\.Chinese \(zh\)*Idiom*定义成语是公认的固定习惯用语，通常具有整体意义，往往不能仅凭组成成分的字面义完全推出其实际意义。它通常具有较强的凝练性和约定俗成性。*Latin\-script transliteration \(not part of the prompt\):*Dingyi\. Chengyu shi gongren de guding xiguan yongyu, tongchang juyou zhengti yiyi, wangwang buneng jin ping zucheng chengfen de zimian yi wanquan tuichu qi shiji yiyi\. Ta tongchang juyou jiao qiang de ninglianxing he yueding suchengxing\.*English translation \(not part of the prompt\):*Definition\. An idiom is a recognized fixed conventional expression that normally has a holistic meaning, which often cannot be completely inferred from the literal meanings of its components\. It is typically concise and established through conventional usage\.Chinese \(zh\)*Metaphor*定义判断输入文本中是否存在“非字面、跨语义领域”的表达：即用一个来源领域的词语、动作、性质或结构，描述另一个目标领域的对象、状态、事件或抽象概念。*Latin\-script transliteration \(not part of the prompt\):*Dingyi\. Panduan shuru wenben zhong shi fou cunzai “fei zimian, kua yuyi lingyu” de biaoda: ji yong yige laiyuan lingyu de ciyu, dongzuo, xingzhi huo jiegou, miaoshu ling yige mubiao lingyu de duixiang, zhuangtai, shijian huo chouxiang gainian\.*English translation \(not part of the prompt\):*Definition\. Determine whether the input text contains a non\-literal expression that crosses semantic domains: words, actions, properties, or structures from a source domain are used to describe an object, state, event, or abstract concept in another target domain\.Chinese \(zh\)*Simile*定义明喻是指使用“像、好像、如、如同、仿佛、犹如、宛如、似的”等显性比较词，引出一个可识别的喻体，用来形象化描写本体的与喻体共通的抽象特征。*Latin\-script transliteration \(not part of the prompt\):*Dingyi\. 
Mingyu shi zhi shiyong “xiang, haoxiang, ru, rutong, fangfu, youru, wanru, shide” deng xianxing bijiao ci, yinchu yige ke shibie de yuti, yong lai xingxianghua miaoxie benti de yu yuti gongtong de chouxiang tezheng\.*English translation \(not part of the prompt\):*Definition\. A simile uses explicit comparison markers such as “like,” “as if,” “as,” “just as,” or “resembling” to introduce an identifiable vehicle and vividly describe an abstract feature shared by the subject and the vehicle\.Chinese \(zh\)*Irony*定义反讽是指字面表达或预期情况与实际情况、真实意图之间存在明显矛盾或不协调的修辞手法。它不仅包括针对特定对象的挖苦（狭义的讽刺），还包括正话反说（用正面词汇描述负面遭遇）、情境反讽（事情的发展与预期截然相反），以及语气与客观事实的强烈错位。*Latin\-script transliteration \(not part of the prompt\):*Dingyi\. Fanfeng shi zhi zimian biaoda huo yuqi qingkuang yu shiji qingkuang, zhenshi yitu zhijian cunzai mingxian maodun huo bu xietiao de xiuci shoufa\. Ta bu jin baokuo zhendui teding duixiang de waku, hai baokuo zhenghua fanshuo, qingjing fanfeng, yiji yuqi yu keguan shishi de qianglie cuowei\.*English translation \(not part of the prompt\):*Definition\. Irony is a rhetorical device in which the literal expression or expected situation clearly conflicts with the actual situation or the speaker’s true intention\. It includes mockery directed at a particular target, positive words used for a negative experience, situational irony in which events unfold contrary to expectation, and a strong mismatch between tone and objective facts\.Chinese \(zh\)*Sarcasm*定义反讽（Sarcasm）是指文本的“字面表达”与说话者的“真实意图”之间存在截然相反的结构，即“正话反说”或“反话正说”。这种结构通常被用来挖苦或嘲弄。*Latin\-script transliteration \(not part of the prompt\):*Dingyi\. Fanfeng \(Sarcasm\) shi zhi wenben de “zimian biaoda” yu shuohuazhe de “zhenshi yitu” zhijian cunzai jieran xiangfan de jiegou, ji “zhenghua fanshuo” huo “fanhua zhengshuo\.” Zhezhong jiegou tongchang bei yong lai waku huo chaonong\.*English translation \(not part of the prompt\):*Definition\. 
Sarcasm is present when the text’s literal expression and the speaker’s true intention are structurally opposed, such as saying something positive to convey a negative meaning or vice versa\. This structure is commonly used to mock or ridicule\.Bengali \(bn\)*Idiom*sNNGj/NJabagdhara Hoela baNNGlar EkoiT pRcoilt ois/thr ba Aadha\-ois/thr bHu\-shoeb/dr AoibhbYoik/t,Jar Ar/th sadharNt shb/dguoelar Aak/Shoirk Ar/th oethoek puoerapuoir oebajha Jay na\.*Latin\-script transliteration \(not part of the prompt\):*Sangya\. Bagdhara holo Banglar ekti procholito sthir ba adha\-sthir bohu\-shobder obhibyakti, jar ortho sadharonoto shobdogulor akshorik ortho theke puropuri bojha jay na\.*English translation \(not part of the prompt\):*Definition\. An idiom is a conventional fixed or semi\-fixed multi\-word expression in Bengali whose meaning generally cannot be fully understood from the literal meanings of its individual words\.Bengali \(bn\)*Metaphor*ruupoekr muul sNNGj/NJaruupk Hoela Emn bhaSha oeJkhaoen oekaoena lk/ShYbs/tu oibShy,manuSh,Abs/tha,Anubhuuoit ba dharNaoek oibhn/n oekaoena Ut//s dharNa,bs/tu,s/than,shoik/t ba pRoikRyar ooiboishSh/TY oidoey oebajhaoena Hy\.*Latin\-script transliteration \(not part of the prompt\):*Rupoker mul sangya\. Rupok holo emon bhasha jekhane kono lokkhyobostu, bishoy, manush, obostha, onubhuti ba dharonake bhinno kono utsodharona, bostu, sthan, shokti ba prokriyar boishishtyo diye bojhano hoy\.*English translation \(not part of the prompt\):*Core definition of metaphor\. A metaphor is language in which a target object, topic, person, state, feeling, or concept is understood through properties of a different source concept, object, location, force, or process\.Bengali \(bn\)*Irony*sNNGj/NJaoibdRuup Hoela Emn bhaSha oeJkhaoen kthar srasoir Ar/th,oelkhoekr Aasl IoiNG/gt,EbNNG bas/tb ba pRtYaoisht poirois/thoitr moedhY EkoiT s/pSh/T Aoiml thaoek\.*Latin\-script transliteration \(not part of the prompt\):*Sangya\. 
Bidrup holo emon bhasha jekhane kothar shorasori ortho, lekhoker ashol ingit, ebong bastob ba protyashito poristhitir modhye ekti sposhto omil thake\.*English translation \(not part of the prompt\):*Definition\. Irony is language in which there is a clear mismatch among the direct meaning of the words, the writer’s intended implication, and the actual or expected situation\.Bengali \(bn\)*Sarcasm*sNNGj/NJabYoeNG/gaoik/t Hoela Emn bk/tbY oeJkhaoen bk/ta srasoir Aoer/thr baIoer oitrYk,oibdRuupatMk ba UpHasmuulk Ar/th pRkash koern\.*Latin\-script transliteration \(not part of the prompt\):*Sangya\. Byangokti holo emon boktobbo jekhane bokta shorasori orther baire tirjok, bidrupattok ba upohasmulok ortho prokash koren\.*English translation \(not part of the prompt\):*Definition\. Sarcasm is an utterance in which the speaker communicates, beyond the direct meaning, a cutting, ironic, or mocking meaning\.Spanish \(es\)*Idiom*DefiniciónUn modismo o locución es una expresión convencional de varias palabras cuyo significado previsto no puede derivarse completamente de los significados literales de sus palabras individuales\. Actúa como su propia explicación establecida para transmitir ideas de forma implícita\. Esto incluye frases completamente figurativas \(ej\. “tomar el pelo”\), “metáforas congeladas” establecidas que se usan en el habla cotidiana, Y expresiones convencionales altamente fijas \(ej\. “sano y salvo”\)\.*English translation \(not part of the prompt\):*Definition\. An idiom or idiomatic phrase is a conventional multi\-word expression whose intended meaning cannot be completely derived from the literal meanings of its individual words\. It functions as an established expression that conveys ideas implicitly\. 
This includes fully figurative phrases \(e\.g\., “pull someone’s leg”\), established “frozen metaphors” used in everyday speech, and highly fixed conventional expressions \(e\.g\., “safe and sound”\)\.Spanish \(es\)*Metaphor*DefiniciónUna metáfora es cualquier uso no literal de una palabra o frase donde el lenguaje de un dominio físico o concreto se usa para describir algo abstracto, conceptual o no físico\. Esto incluye metáforas convencionales, muertas y altamente comunes del día a día\.*English translation \(not part of the prompt\):*Definition\. A metaphor is any non\-literal use of a word or phrase in which language from a physical or concrete domain is used to describe something abstract, conceptual, or non\-physical\. This includes conventional, dead, and highly common everyday metaphors\.Spanish \(es\)*Irony*DefiniciónLa ironía es una forma de expresión en la que el significado real no coincide completamente con el significado literal, o donde el hablante transmite burla, crítica o incredulidad de forma indirecta\.*English translation \(not part of the prompt\):*Definition\. Irony is a form of expression in which the intended meaning does not completely coincide with the literal meaning, or in which the speaker indirectly communicates mockery, criticism, or disbelief\.Italian \(it\)*Idiom*DefinizioneUn idioma è un’espressione italiana convenzionalizzata, fissa o semi\-fissa, il cui significato nel contesto non è pienamente ricavabile dalla somma letterale dei significati delle singole parole\.*English translation \(not part of the prompt\):*Definition\. An idiom is a conventionalized, fixed or semi\-fixed Italian expression whose meaning in context cannot be fully derived from the literal sum of the meanings of its individual words\.Italian \(it\)*Metaphor*DefinizioneLa metafora è un uso non letterale in cui una parola o espressione descrive qualcosa attraverso un altro dominio di significato\.*English translation \(not part of the prompt\):*Definition\. 
A metaphor is a non\-literal use in which a word or expression describes something through another domain of meaning\.Italian \(it\)*Irony*DefinizioneL’ironia è una strategia comunicativa in cui il testo non va interpretato solo in modo letterale: il parlante costruisce un contrasto, una distanza o una incongruenza tra ciò che viene detto e ciò che si intende comunicare\.*English translation \(not part of the prompt\):*Definition\. Irony is a communicative strategy in which the text should not be interpreted only literally: the speaker creates a contrast, distance, or incongruity between what is said and what is intended to be communicated\.Italian \(it\)*Sarcasm*DefinizioneIl sarcasmo è una forma di critica o presa in giro espressa in modo indiretto, ironico, satirico o retorico\.*English translation \(not part of the prompt\):*Definition\. Sarcasm is a form of criticism or mockery expressed in an indirect, ironic, satirical, or rhetorical manner\.German \(de\)*Idiom*DefinitionEin Idiom oder eine phraseologische Wendung ist eine feste oder teilfeste Mehrwortverbindung, die im Deutschen als sprachliche Einheit konventionalisiert ist\.*English translation \(not part of the prompt\):*Definition\. An idiom or phraseological expression is a fixed or partially fixed multi\-word combination that is conventionalized as a linguistic unit in German\.German \(de\)*Metaphor*DefinitionEine Metapher liegt vor, wenn ein Ausdruck im gegebenen Kontext eine Bedeutung hat, die von einer konkreteren, körperlichen, räumlichen oder domänentypischen Bedeutung abweicht und auf einen anderen Zielbereich \(oft abstrakt\) übertragen wird\.*English translation \(not part of the prompt\):*Definition\. 
A metaphor occurs when, in the given context, an expression has a meaning that departs from a more concrete, bodily, spatial, or domain\-typical meaning and is transferred to another target domain, often an abstract one\.German \(de\)*Irony*DefinitionEine ironische Äußerung liegt vor, wenn die Sprecherin oder der Sprecher etwas sagt, dessen gemeinte Bedeutung im gegebenen Kontext erkennbar von der wörtlichen Bedeutung abweicht\.*English translation \(not part of the prompt\):*Definition\. An ironic utterance occurs when a speaker says something whose intended meaning, in the given context, is recognizably different from its literal meaning\.Table 17:Detector prompt instruction and target\-category definitions\.Prompt sectionPrompt textSetupYou are a careful multilingual coherence evaluation model\.Read and follow these instructions in order\.GoalYour job is to evaluate how coherent the MODEL OUTPUT is as a response to the USER PROMPT\.The USER PROMPT is written in \{input\_language\}\.The MODEL OUTPUT should be written in \{input\_language\}\.Write your reason in \{input\_language\}\.Definition of coherenceCoherence means the MODEL OUTPUT works logically and clearly as an answer to the USER PROMPT\.Evaluate coherence using three criteria:\- Contextual Alignment: The MODEL OUTPUT directly addresses the USER PROMPT without irrelevant tangents\.\- Internal Logic: The MODEL OUTPUT does not contradict itself, and its claims, assumptions, and ideas remain consistent\.\- Structural Flow: The MODEL OUTPUT is organized clearly, with ideas connected in a logical order\.Evaluation scope\- Judge only coherence\.\- Do not judge factual accuracy unless a factual error creates a contradiction, breaks the logic, or makes the response difficult to follow\.\- Do not judge helpfulness, completeness, politeness, safety, or writing style unless they affect coherence\.\- Judge only from the USER PROMPT and MODEL OUTPUT\.\- Do not assume missing context\.\- Do not reward or penalize the 
MODEL OUTPUT for being long or short unless length affects coherence\.Multilingual policy\- Evaluate the MODEL OUTPUT in its original language\.\- Do not penalize the MODEL OUTPUT only because it is written in a different language from the USER PROMPT, unless the language mismatch prevents it from addressing the USER PROMPT\.\- If translation is needed for understanding, translate internally only\.\- Do not include translations in the final output\.*Score 4*\- Score 4: Fully coherent\.The MODEL OUTPUT directly addresses the USER PROMPT, is internally consistent, and flows logically from start to finish\. Any minor wording issue does not affect understanding\.*Score 3*\- Score 3: Mostly coherent\.The MODEL OUTPUT addresses the USER PROMPT and is generally logical, but has a small coherence issue, such as a slightly abrupt transition, minor organizational weakness, or one mildly unclear point\.Also give Score 3 when the MODEL OUTPUT does not follow the USER PROMPT’s instruction directly or exactly, but still preserves the general meaning, stays related to the requested task, and remains clear and logically organized\.*Score 2*\- Score 2: Partially coherent\.The MODEL OUTPUT is understandable and related to the USER PROMPT, but has a clear coherence problem, such as confusing organization, a noticeable logical gap, partial contradiction, significant drift from the prompt, or indirect task handling that makes the response harder to understand or substantially weaker as an answer\.*Score 1*\- Score 1: Mostly incoherent\.The MODEL OUTPUT contains some relevant or understandable content, but it is difficult to follow, poorly connected to the USER PROMPT, or has major contradictions\.*Score 0*\- Score 0: Incoherent or failing\.The MODEL OUTPUT is fundamentally hard to follow, substantially contradicts itself, largely fails to address the USER PROMPT, or is disorganized enough that the intended meaning is unclear\.Ambiguity policy\- Do not ask follow\-up questions\.\- Do 
not list multiple possible scores\.\- Make the best single decision from the given text alone\.\- If uncertain between two adjacent scores, choose the lower score when the coherence issue affects understanding; otherwise choose the higher score\.Input dataUSER PROMPT:\{user\_prompt\}MODEL OUTPUT:\{model\_output\}Output ruleOutput exactly 2 lines and nothing else:Reason: <one short sentence in \{input\_language\}\>Score: <0, 1, 2, 3, or 4\>Table 18:Coherence evaluator prompt\.
## Appendix E Detailed cross-lingual steering results

Figures [4](https://arxiv.org/html/2605.30443#A7.F4)–[7](https://arxiv.org/html/2605.30443#A7.F7) give the route-level values underlying Figure [2](https://arxiv.org/html/2605.30443#S4.F2). Rows identify the language and category used to estimate a direction, and columns identify the language of the evaluation prompts. Each cell reports the percentage-point change in Target Category Rate relative to the unsteered baseline; black outlines mark monolingual applications.
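The cell computation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes Target Category Rates are available as fractions in [0, 1], keyed by the (source language, category) route for each row and by target language for each column, with the unsteered baseline keyed by (target language, category). The function name, data layout, and toy numbers are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical layout: a heatmap row is the (source language, category) of a direction.
Route = Tuple[str, str]


def heatmap_deltas(
    steered: Dict[Route, Dict[str, float]],
    baseline: Dict[Tuple[str, str], float],
) -> Dict[Route, Dict[str, Tuple[float, bool]]]:
    """Compute heatmap cells for one model.

    steered[(src, cat)][tgt] is the Target Category Rate (a fraction in [0, 1])
    when the (src, cat) direction steers prompts in language tgt;
    baseline[(tgt, cat)] is the unsteered rate for that target and category.
    Each cell is (percentage-point change, is_monolingual).
    """
    cells: Dict[Route, Dict[str, Tuple[float, bool]]] = {}
    for (src, cat), by_target in steered.items():
        row: Dict[str, Tuple[float, bool]] = {}
        for tgt, rate in by_target.items():
            delta_pp = 100.0 * (rate - baseline[(tgt, cat)])
            # Monolingual cells (black outlines in the figures) have src == tgt.
            row[tgt] = (round(delta_pp, 1), src == tgt)
        cells[(src, cat)] = row
    return cells


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not results from the paper.
    toy_steered = {("en", "metaphor"): {"en": 0.30, "de": 0.25, "bn": 0.16}}
    toy_baseline = {("en", "metaphor"): 0.20, ("de", "metaphor"): 0.15, ("bn", "metaphor"): 0.15}
    print(heatmap_deltas(toy_steered, toy_baseline))
```

Reading one row across its columns (for example, the range of a fixed category's gains over target languages) is then a simple lookup over the returned dictionary.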

For Llama-3.1-8B-Instruct, metaphor directions estimated in English, Chinese, Spanish, and Italian yield only +0.6 to +1.0 point gains on Bengali prompts but +8.0 to +12.8 point gains on German prompts (Figure [6](https://arxiv.org/html/2605.30443#A7.F6)). Because the category and the set of source languages are held fixed while only the target language changes, this contrast illustrates a target-dependent effect.

The pattern differs across architectures. With Ministral-3-8B-Instruct, metaphor directions estimated in English, Chinese, Spanish, Italian, and German yield +12.0 to +15.5 point gains on Bengali prompts (Figure [7](https://arxiv.org/html/2605.30443#A7.F7)), so the weak Bengali receptivity seen with Llama-3.1-8B-Instruct does not carry over to this model.

## Appendix F Cross-Category Internal Geometry

Motivation. Section [6](https://arxiv.org/html/2605.30443#S6) shows that steering vectors for a given category (e.g., metaphor) share a robust geometric core across languages. An alternative hypothesis must also be considered: rather than relying on category-specific geometry, the model might use a single generic “figurative” or “style” subspace within each language. To test whether the observed alignment is category-dependent, we run a parallel aggregation experiment: instead of pooling the same category across different languages, we pool *different categories* within the *same language*.

Cross-category formulation. Following the methodology of the main text, we hold all generation and evaluation parameters constant. Let $\hat{v}_{g,c}^{(l)}$ denote the normalized monolingual steering direction for language $g$ and category $c$. For a subset of available categories $S \subseteq \mathcal{C}$ within a single language $g$, we define the normalized cross-category aggregate direction as:

$$\bar{u}_{g,S}^{(l)} = \frac{\sum_{k\in S}\hat{v}_{g,k}^{(l)}}{\left\lVert\sum_{k\in S}\hat{v}_{g,k}^{(l)}\right\rVert_{2}} \tag{9}$$
We evaluate two aggregates: the complete *Category Mean* ($S=\mathcal{C}$), which pools all figurative categories within the language, and the *Leave-Target-Out (LTO) Category Mean* ($S=\mathcal{C}\setminus\{c_t\}$), which excludes the target category $c_t$.
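Eq. (9) and the two aggregates can be sketched in NumPy as below. This is a minimal sketch under assumptions: one language's per-category directions at a single layer are stored in a dict mapping category names to 1-D arrays, and the function names are hypothetical. Re-normalizing each input is a no-op when the directions are already unit-norm, as $\hat{v}$ is defined to be.

```python
import numpy as np


def unit(v: np.ndarray) -> np.ndarray:
    """Return v scaled to unit L2 norm."""
    norm = np.linalg.norm(v)
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return v / norm


def category_aggregate(directions: dict, subset) -> np.ndarray:
    """Eq. (9): normalized sum of unit-norm category directions for one language and layer."""
    subset = list(subset)
    if not subset:
        raise ValueError("an aggregate needs at least one category")
    total = np.sum([unit(directions[k]) for k in subset], axis=0)
    return unit(total)


def category_means(directions: dict, target: str):
    """Return (Category Mean over all categories, LTO Category Mean excluding `target`)."""
    all_cats = list(directions)
    lto_cats = [k for k in all_cats if k != target]
    return category_aggregate(directions, all_cats), category_aggregate(directions, lto_cats)
```

Summing unit vectors before the final normalization gives each category equal weight regardless of the raw magnitude of its activation difference.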

To test how much native steering relies on this cross-category alignment, we project the aggregate out of the native vector to obtain a residual direction. Letting $\bar{u}^{(l)}_{g,M}$ denote the chosen aggregate, where $M\in\{\mathrm{All},\mathrm{LTO}\}$, the residual is defined as:

$$r^{(l)}_{g,c_t,M} = \hat{v}^{(l)}_{g,c_t} - \left\langle \hat{v}^{(l)}_{g,c_t}, \bar{u}^{(l)}_{g,M} \right\rangle \bar{u}^{(l)}_{g,M} \tag{10}$$
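Eq. (10) is a single projection step. The sketch below is a minimal NumPy version that assumes both inputs are unit-norm, as in the text; the function name is hypothetical, and the result is returned unnormalized exactly as the formula is written (whether the residual is rescaled before steering is not stated in this section).

```python
import numpy as np


def residual_direction(v_native: np.ndarray, u_agg: np.ndarray) -> np.ndarray:
    """Eq. (10): subtract from the native direction its projection onto the aggregate.

    Both inputs are assumed unit-norm. The output is orthogonal to u_agg and
    is not renormalized.
    """
    coeff = float(np.dot(v_native, u_agg))  # <v_hat, u_bar>
    return v_native - coeff * u_agg
```

Because both inputs are unit vectors, $\lVert r\rVert_2^2 = 1 - \langle \hat{v}, \bar{u} \rangle^2$: the more the native direction aligns with the aggregate, the less of it survives the ablation.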
Results: Geometry is category-specific, not generically figurative. Table [19](https://arxiv.org/html/2605.30443#A6.T19) presents the behavioral impact of these cross-category interventions. Unlike the cross-lingual Language Mean in the main text, which reliably matched or exceeded native steering, the cross-category LTO Mean substantially underperforms native steering. Across nearly all models and languages, replacing the native direction for a target category with an aggregate of the remaining categories (Mean-LTO) produces widespread, statistically significant drops in Target Category Rate, visible as the high concentration of dark red cells.

For example, German metaphor readily accepts cross-lingual metaphor vectors (main text), yet steering German metaphor with a pooled vector of the other German categories (idiom and irony; simile is annotated only for English and Chinese) changes Target Category Rate by $-6.4$ to $-11.4$ percentage points (Qwen3-8B and Qwen3-32B).

Takeaway. Comparing these results with the main text suggests a structural conclusion: the model does not appear to have a generic, swappable “figurative language” direction. Instead, the shared geometry is tied to the specific figurative category. Same-category geometry transfers across languages, consistent with the category itself (e.g., metaphor) being shared across languages; cross-category geometry within one language does not, consistent with different figurative tropes relying on mechanisms too distinct to pool.

Table 19: Behavioral impact of cross-category aggregate and residual steering. Values show the percentage-point change in Target Category Rate relative to monolingual steering. Cell color encodes the direction of change (green: outperforms; red: underperforms). Shading intensity denotes statistical significance (adjusted $q$-values): light, $q\geq 0.05$; medium, $q<0.05$; darker, $q<0.01$; darkest, $q<0.001$.
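The color and shading rule in the Table 19 caption amounts to a small lookup. The sketch below only encodes that rule; the function name is hypothetical, the $q$-values are taken as given (the adjustment procedure is not specified in this section), and the zero-change case, which the caption does not define, is labeled "neutral" here as an assumption.

```python
def cell_style(delta_pp: float, q: float) -> str:
    """Map a Table 19 cell to its color and shading tier.

    delta_pp: change in Target Category Rate versus monolingual steering (points).
    q: adjusted q-value for that cell, taken as given.
    """
    if delta_pp > 0:
        color = "green"    # outperforms monolingual steering
    elif delta_pp < 0:
        color = "red"      # underperforms monolingual steering
    else:
        color = "neutral"  # assumption: the caption defines no color for zero change
    if q < 0.001:
        shade = "darkest"
    elif q < 0.01:
        shade = "darker"
    elif q < 0.05:
        shade = "medium"
    else:
        shade = "light"
    return f"{shade} {color}"
```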
## Appendix G Statistical Ranking of All Steering Vectors

Motivation and Setup. To quantify how the synthesized geometric directions perform relative to all available alternatives, we ran exact McNemar tests at a significance threshold of $p<0.05$. For every target language and category, we compared the synthesized aggregate and residual vectors against every other valid candidate, including the native monolingual vector, all cross-lingual vectors, and the unsteered baseline. A candidate records a “Win” if it significantly outperforms the competitor, a “Loss” if it significantly underperforms, and a “Tie” if the difference is not significant. Tables [23](https://arxiv.org/html/2605.30443#A8.T23) through [26](https://arxiv.org/html/2605.30443#A8.T26) give the cell-by-cell tier rankings.
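The comparison and tier-ranking procedures can be sketched in pure Python. This is a minimal sketch, not the authors' implementation: it assumes each candidate's per-item outcomes (whether the target category was detected for each shared evaluation input) are available as boolean lists over the same inputs. The function names are hypothetical; no multiple-comparison correction is applied, and rate ties are broken by input order, neither of which the text specifies.

```python
from math import comb
from typing import Dict, List, Sequence


def mcnemar_exact_p(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value for paired binary outcomes.

    Only discordant pairs matter: under H0 each is equally likely to favor
    either side, so the test is a two-sided Binomial(n, 0.5) test on them.
    """
    if len(a) != len(b):
        raise ValueError("outcomes must be paired item by item")
    only_a = sum(1 for x, y in zip(a, b) if x and not y)
    only_b = sum(1 for x, y in zip(a, b) if y and not x)
    n = only_a + only_b
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def compare(candidate: Sequence[bool], competitor: Sequence[bool], alpha: float = 0.05) -> str:
    """Win / Loss / Tie for a candidate vector against one competitor."""
    if mcnemar_exact_p(candidate, competitor) >= alpha:
        return "Tie"
    # Concordant items cancel, so comparing totals compares discordant counts.
    return "Win" if sum(candidate) > sum(competitor) else "Loss"


def rank_tiers(outcomes: Dict[str, Sequence[bool]], alpha: float = 0.05) -> List[List[str]]:
    """Tier ranking in the style of the Tables 23-26 captions.

    Candidates are sorted by target-category rate; a candidate opens a new tier
    only if it differs significantly from the current tier's leader.
    """
    order = sorted(outcomes, key=lambda name: sum(outcomes[name]) / len(outcomes[name]), reverse=True)
    tiers: List[List[str]] = []
    for name in order:
        if tiers and mcnemar_exact_p(outcomes[tiers[-1][0]], outcomes[name]) >= alpha:
            tiers[-1].append(name)
        else:
            tiers.append([name])
    return tiers
```

The Win and Tie rates discussed below are then the fractions of `compare` calls returning "Win" or "Tie" for a given vector type.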

Takeaway 1: Language Means consistently dominate. Figure [3](https://arxiv.org/html/2605.30443#S6.F3) shows that the synthesized cross-lingual aggregates rarely lose to any alternative. For the strict zero-shot *LangMean-LTO* vector, which excludes the target language, the combined Win and Tie rate ranges from 84.7% (Qwen3-8B) to 97.6% (Llama-3.1-8B-Instruct). Because these comparisons include the target language’s own native monolingual vector, this rate indicates that aggregating independently learned vectors from other languages recovers the target-category direction well: in the large majority of comparisons, the resulting vector is not significantly worse than, and sometimes significantly better than, natively sourced directions.

Takeaway 2: Ablation causes statistical collapse. The summary table strongly supports the residual ablation results. When the shared cross-lingual component is projected out of the native vector, the resulting *ResLangMean* vectors lose far more often. Under Qwen3-32B, for instance, the loss rate rises from 14.0% for the LTO Mean to 46.7% for the LTO Residual. The detailed ranking tables show these residual vectors consistently falling to the bottom tiers, frequently performing no better than the unsteered baseline.

![Refer to caption](https://arxiv.org/html/2605.30443v1/x4.png)

Figure 4: Detailed cross-lingual steering heatmap for Qwen3-8B. Rows show the language/category direction, and columns show the input language where it is applied. For example, an en/idiom row is the English idiom direction applied across input languages. Each cell reports its percentage-point change in Target Category Rate relative to the unsteered baseline. Black outlines mark monolingual applications.

![Refer to caption](https://arxiv.org/html/2605.30443v1/x5.png)

Figure 5: Detailed cross-lingual steering heatmap for Qwen3-32B. Rows show the language/category direction, and columns show the input language where it is applied. For example, an en/idiom row is the English idiom direction applied across input languages. Each cell reports its percentage-point change in Target Category Rate relative to the unsteered baseline. Black outlines mark monolingual applications.

![Refer to caption](https://arxiv.org/html/2605.30443v1/x6.png)

Figure 6: Detailed cross-lingual steering heatmap for Llama-3.1-8B-Instruct. Rows show the language/category direction, and columns show the input language where it is applied. For example, an en/idiom row is the English idiom direction applied across input languages. Each cell reports its percentage-point change in Target Category Rate relative to the unsteered baseline. Black outlines mark monolingual applications.

![Refer to caption](https://arxiv.org/html/2605.30443v1/x7.png)

Figure 7: Detailed cross-lingual steering heatmap for Ministral-3-8B-Instruct. Rows show the language/category direction, and columns show the input language where it is applied. For example, an en/idiom row is the English idiom direction applied across input languages. Each cell reports its percentage-point change in Target Category Rate relative to the unsteered baseline. Black outlines mark monolingual applications.

## Appendix H Qualitative Examples and Failure Modes

To complement the quantitative results, we show steering outputs for different vectors: selected English-target continuations generated by Qwen3-8B, discussed below. As the following tables illustrate, for the same input sentences some steering directions successfully induce the target category while others do not. Ellipses indicate omitted portions of a generated continuation, and bracketed content in the tables contains our comments rather than generated text.

### H.1 Successful Steering: Metaphor and Idiom

Table [20](https://arxiv.org/html/2605.30443#A8.T20) presents examples in which steering succeeds. For both metaphor and idiom, the unsteered baseline produces a literal description, while the native, cross-lingual, and LTO aggregate steering vectors all induce the target figurative category.

Table 20: Selected English-target continuation excerpts illustrating clear steering successes for metaphor and idiom in Qwen3-8B.
### H.2 Successful Steering: Simile

Table [21](https://arxiv.org/html/2605.30443#A8.T21) shows simile examples. Because our dataset contains simile annotations only for English and Chinese, the leave-target-out (LTO) aggregate for English is mathematically identical to the cross-lingual $\mathrm{zh}\rightarrow\mathrm{en}$ direction. Both the native and cross-lingual steering vectors shift the literal baseline toward explicit comparative language using “like”.
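The identity follows in one line. Assuming the cross-lingual LTO Language Mean is normalized like Eq. (9) but sums one category's unit-norm directions over the remaining source languages (the notation below is ours), leaving out English for simile leaves only Chinese:

$$\bar{u}^{(l)}_{\mathrm{LTO}}(\mathrm{en},\mathrm{simile}) = \frac{\sum_{g\in\{\mathrm{zh}\}}\hat{v}^{(l)}_{g,\mathrm{simile}}}{\left\lVert\sum_{g\in\{\mathrm{zh}\}}\hat{v}^{(l)}_{g,\mathrm{simile}}\right\rVert_2} = \frac{\hat{v}^{(l)}_{\mathrm{zh},\mathrm{simile}}}{\left\lVert\hat{v}^{(l)}_{\mathrm{zh},\mathrm{simile}}\right\rVert_2} = \hat{v}^{(l)}_{\mathrm{zh},\mathrm{simile}}$$

The last equality uses the fact that each $\hat{v}$ is already unit-norm.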

Table 21: Selected English-target continuation excerpts for simile in Qwen3-8B. Both steering directions successfully induce comparative language.

### H.3 Mixed Success and Failure Modes: Irony and Sarcasm

Table [22](https://arxiv.org/html/2605.30443#A8.T22) presents irony and sarcasm examples where steering is unreliable. Despite occasional successes (such as the native sarcastic response to the guard-bear prompt), the cross-lingual and LTO routes frequently fail to induce the target category. Instead, these vectors often produce descriptive whimsy, ambiguous performative language, or repetitive semantic artifacts (e.g., “embracing the meme, the meme, the *meme*”).

Table 22: Selected English-target continuations illustrating mixed success and failure modes for irony and sarcasm in Qwen3-8B.
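Repetitive artifacts like the one quoted for irony and sarcasm can be flagged with a simple heuristic. The sketch below is a hypothetical diagnostic, not part of the paper's evaluation: it lowercases the text, splits it into word tokens (ignoring markdown emphasis), and reports word n-grams that occur at least `min_count` times. The default values of `n` and `min_count` are illustrative assumptions.

```python
import re
from collections import Counter
from typing import List, Tuple


def repeated_ngrams(text: str, n: int = 2, min_count: int = 3) -> List[Tuple[str, int]]:
    """Return word n-grams occurring at least `min_count` times, most frequent first."""
    # \w+ keeps word characters only, so "*meme*" and "meme," both become "meme".
    tokens = re.findall(r"\w+", text.lower())
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    return [(g, c) for g, c in counts.most_common() if c >= min_count]


flags = repeated_ngrams("embracing the meme, the meme, the *meme*")
```

Here the bigram "the meme" is flagged three times. A surface heuristic like this catches only lexical loops; it says nothing about whether the continuation is ironic or sarcastic.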

Table 23: Ranked language-category steering vectors for Qwen3-8B on test caption inputs. Rows are input languages and columns are figurative categories. Within each cell, candidates are sorted by aligned target-category rate; a new rank is assigned only when a lower candidate differs significantly from the current rank leader under a paired McNemar test ($p < .05$). Entries show the candidate label and its target-category rate in percent; the direct same-language vector is in bold. This version also includes geometry vectors and the unsteered baseline.

Table 24: Ranked language-category steering vectors for Qwen3-32B on test caption inputs. Rows are input languages and columns are figurative categories. Within each cell, candidates are sorted by aligned target-category rate; a new rank is assigned only when a lower candidate differs significantly from the current rank leader under a paired McNemar test ($p < .05$). Entries show the candidate label and its target-category rate in percent; the direct same-language vector is in bold. This version also includes geometry vectors and the unsteered baseline.

Table 25: Ranked language-category steering vectors for Llama-3.1-8B-Instruct on test caption inputs. Rows are input languages and columns are figurative categories. Within each cell, candidates are sorted by aligned target-category rate; a new rank is assigned only when a lower candidate differs significantly from the current rank leader under a paired McNemar test ($p < .05$). Entries show the candidate label and its target-category rate in percent; the direct same-language vector is in bold. This version also includes geometry vectors and the unsteered baseline.

Table 26: Ranked language-category steering vectors for Ministral-3-8B-Instruct on test caption inputs. Rows are input languages and columns are figurative categories. Within each cell, candidates are sorted by aligned target-category rate; a new rank is assigned only when a lower candidate differs significantly from the current rank leader under a paired McNemar test ($p < .05$). Entries show the candidate label and its target-category rate in percent; the direct same-language vector is in bold. This version also includes geometry vectors and the unsteered baseline.
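The ranking rule described in these captions can be sketched as follows. This is a hedged illustration, assuming each candidate has a per-item binary outcome (whether the continuation for that input was judged to be in the target category) on the same inputs. It uses the exact binomial form of McNemar's test; the captions do not specify the exact or chi-square variant, so that choice is an assumption, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def mcnemar_exact_p(a: List[bool], b: List[bool]) -> float:
    """Two-sided exact McNemar test on paired binary outcomes."""
    # Discordant pairs: a succeeded where b failed, and vice versa.
    n01 = sum(1 for x, y in zip(a, b) if x and not y)
    n10 = sum(1 for x, y in zip(a, b) if y and not x)
    n = n01 + n10
    if n == 0:
        return 1.0
    k = min(n01, n10)
    # Under H0 each discordant pair is a fair coin flip: Binomial(n, 0.5).
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def rank_candidates(outcomes: Dict[str, List[bool]],
                    alpha: float = 0.05) -> List[Tuple[int, str, float]]:
    """Sort by target-category rate (percent); open a new rank only when a
    candidate differs significantly from the current rank leader."""
    rates = {name: 100.0 * sum(v) / len(v) for name, v in outcomes.items()}
    ordered = sorted(outcomes, key=lambda name: rates[name], reverse=True)
    ranked = []
    rank, leader = 1, ordered[0]
    for name in ordered:
        if name != leader and mcnemar_exact_p(outcomes[leader], outcomes[name]) < alpha:
            rank += 1
            leader = name  # this candidate now leads the new rank
        ranked.append((rank, name, rates[name]))
    return ranked
```

Comparing each candidate against the current rank leader, rather than against its immediate neighbor, means a run of small, individually non-significant drops can stay in one rank until the cumulative gap from the leader becomes significant.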