Choosing features for classifying multiword expressions

arXiv cs.CL Papers

Summary

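This paper discusses how to choose features when designing a classification of multiword expressions (MWEs). It favors features that let MWEs be assigned to classes reliably and that apply across many languages.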

arXiv:2605.11779v1 (Announce Type: new)

Abstract: Multiword expressions (MWEs) are a heterogeneous set with a glaring need for classifications. Designing a satisfactory classification involves choosing features. In the case of MWEs, many features are available a priori. Not all features are equal in terms of how reliably MWEs can be assigned to classes. Accordingly, the resulting classifications may be more or less fruitful for computational use. I outline an enhanced classification. To increase its suitability for many languages, I draw on previous work covering various languages.

Cached at: 05/13/26, 06:18 AM

Source: [https://arxiv.org/abs/2605.11779](https://arxiv.org/abs/2605.11779)
