Choosing features for classifying multiword expressions
Summary
This paper discusses methods for selecting features to improve the classification of multiword expressions.
Source: [https://arxiv.org/abs/2605.11779](https://arxiv.org/abs/2605.11779)
Similar Articles
EDU-CIRCUIT-HW: Evaluating Multimodal Large Language Models on Real-World University-Level STEM Student Handwritten Solutions
This paper introduces the EDU-CIRCUIT-HW dataset for evaluating multimodal large language models on real-world university-level STEM handwritten solutions. The evaluation reveals significant recognition limitations, and the authors propose a hybrid approach that combines automated recognition with minimal human oversight to make grading more robust.
XPERT: Expert Knowledge Transfer for Effective Training of Language Models
The paper introduces XPERT, a framework that extracts and reuses expert knowledge from pre-trained Mixture-of-Experts (MoE) language models to improve training efficiency and performance in downstream models.
Parameter-Efficient Multi-View Proficiency Estimation: From Discriminative Classification to Generative Feedback
This paper introduces three parameter-efficient methods for multi-view proficiency estimation on the Ego-Exo4D dataset, shifting from discriminative classification to generative feedback. The proposed models achieve state-of-the-art accuracy with significantly fewer parameters and training epochs than video-transformer baselines.
Data Mixing for Large Language Models Pretraining: A Survey and Outlook
This paper presents a comprehensive survey of data mixing methods for LLM pretraining, formalizing the problem as bilevel optimization and introducing a taxonomy that distinguishes static (rule-based and learning-based) from dynamic (adaptive and externally guided) mixing approaches. The authors analyze trade-offs, identify cross-cutting challenges, and outline future research directions including finer-grained domain partitioning and pipeline-aware designs.
MEDSYN: Benchmarking Multi-Evidence Synthesis in Complex Clinical Cases for Multimodal Large Language Models
MEDSYN is a multilingual multimodal benchmark for evaluating MLLMs on complex clinical cases with up to 7 distinct visual evidence types per case. The study reveals that while frontier models match human experts on differential diagnosis generation, all MLLMs show significant gaps in final diagnosis selection due to poor synthesis of heterogeneous clinical evidence.