Choosing features for classifying multiword expressions
Summary
This paper discusses methods for selecting features to improve the classification of multiword expressions.
Cached at: 05/13/26, 06:18 AM
Source: [https://arxiv.org/abs/2605.11779](https://arxiv.org/abs/2605.11779)
Similar Articles
Document Classification Pattern Recognition via Information Fusion: A Systematic Review of Multimodal and Multiview Representation Approaches
This systematic review of 139 studies proposes a unified framework and meta-analysis for document classification via multimodal and multiview information fusion, finding that fusion improves accuracy (mean gain of +5.28 percentage points) but highlights reproducibility challenges.
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
This paper demonstrates that sparse autoencoders can extract interpretable features from Claude 3 Sonnet, a production-scale language model, addressing scalability concerns for dictionary learning. The features are multilingual, multimodal, and include safety-relevant concepts like deception and sycophancy, with causal influence on model outputs.
Improving Selective Classification with Pairwise Queries for Binary Classification
This paper proposes using pairwise queries to improve selective classification for binary classification, particularly where confidence estimates are inconsistent, as in LLM in-context learning. Theoretical conditions and experiments on synthetic and real datasets show that pairwise query-based algorithms achieve better accuracy-cost tradeoffs than raw confidence estimates.
A Data-Driven Approach to Idiomaticity Based on Experts' Criteria in Theoretical Linguistics
This paper presents a data-driven analysis of multi-word expressions (MWEs) based on 16 theoretical criteria, annotated by linguistics experts, finding that no expressions are absolutely idiomatic and that lexical criteria are most influential.
EDU-CIRCUIT-HW: Evaluating Multimodal Large Language Models on Real-World University-Level STEM Student Handwritten Solutions
This paper introduces the EDU-CIRCUIT-HW dataset for evaluating multimodal large language models on real-world university-level STEM handwritten solutions, revealing significant recognition limitations and proposing a hybrid approach that combines automated recognition with minimal human oversight to enhance grading robustness.