computational-linguistics

#computational-linguistics

Svarna: An Open Corpus Workbench for Modern Greek

arXiv cs.CL ↗ · 3d ago Cached

Svarna is an open-source, web-based corpus workbench for Modern Greek that integrates multiple databases with over 507 million words and provides a range of linguistic analysis tools. It is released under the MIT license.

How Ethos and Pathos Appeals Resonate in Reader Interpretations of Social Media Messages

arXiv cs.CL ↗ · 3d ago Cached

This paper investigates how ethos and pathos appeals in social media messages resonate with silent readers, finding that rhetorical content leads to greater interpretive divergence and can predict audience attitudes toward the author.

Meet UD_Czech-PDTC: A Large and Genre-Rich Treebank in Universal Dependencies

arXiv cs.CL ↗ · 2026-06-24 Cached

This paper introduces UD_Czech-PDTC, a large and genre-diverse treebank for Czech in the Universal Dependencies framework, derived from the Prague Dependency Treebank-Consolidated. It describes the conversion process and differences between annotation schemes.

A Pāṇinian Foundation for Indic Language Processing

arXiv cs.CL ↗ · 2026-06-24 Cached

This paper proposes a benchmark suite grounded in Pāṇinian grammar to unify Indic language processing across languages, aiming to improve accuracy, data efficiency, and transferability.

@omershapira: TIL Jurafsky & Martin, the textbook I used for Computational Linguistics in undergrad many years ago (when TAU didn't o…

X AI KOLs Following ↗ · 2026-06-21 Cached

The third edition of the Speech and Language Processing textbook by Jurafsky and Martin was released in January 2026, featuring a clear explanation of Transformers and various updates including new chapters on ASR, TTS, and DPO.

Dango: A Strictly L1-Only Large Language Model for Studying Second Language Acquisition

arXiv cs.CL ↗ · 2026-06-18 Cached

Dango is a 1.8B-parameter LLM trained strictly on Japanese (L1) then fine-tuned on English (L2) to study language transfer effects in second language acquisition. The model filters English contamination from the pretraining corpus and shows human-like L2 production patterns.

Agent-based models for the evolution of morphological alternation patterns

arXiv cs.CL ↗ · 2026-06-12 Cached

This paper presents multi-agent simulations of the emergence of morphological alternation patterns (like 'go/went') in language, using an AI Historical Linguist (LLM-driven) to evaluate plausibility of evolved morphologies against real languages.

CAF-Gen: A Multi-Agent System for Enriching Argumentation Structures

arXiv cs.CL ↗ · 2026-06-08 Cached

CAF-Gen is a multi-agent LLM-driven framework that enriches shallow argument structures into formal Carneades Argumentation Framework models using an iterative Creator-Reviewer pipeline, achieving improved structural alignment and quality.

GlossAssist -- A Tool to Simplify Corpus Creation and Study the Effect of NLP Models in Low-Resource Documentation Settings

arXiv cs.CL ↗ · 2026-06-04 Cached

GlossAssist is a tool for creating interlinear glossed text (IGT) corpora in low-resource language documentation settings, built around the CWoMP retrieval-based architecture with an active learning feedback loop that improves predictions as annotators make corrections without retraining the model.

French parsing enhanced with a word clustering method based on a syntactic lexicon

arXiv cs.CL ↗ · 2026-06-02 Cached

This article evaluates the integration of data from the French syntactic lexicon Lexicon-Grammar into a probabilistic parser, using word clustering methods on verbs to improve parsing accuracy for French.

A Modular Architecture for Typologically Controlled Lexicon Generation

arXiv cs.CL ↗ · 2026-05-29 Cached

This paper presents a modular framework for generating artificial lexicons that are pronounceable, typologically plausible, and semantically structured, using phoneme inventories from PHOIBLE and probabilistic grammars, outperforming deterministic baselines.

Scene Abstraction for Lexical Semantics: Structured Representations of Situated Meaning

arXiv cs.CL ↗ · 2026-05-22 Cached

This paper proposes Scene Abstraction, a framework for constructing structured representations of the interpretive scenes that words evoke in context, using few-shot prompting of large language models. The authors introduce COCA-Scenes, a dataset of 520 usage instances, and provide empirical evidence that scenes are reliably identifiable and align better with human interpretation than alternatives.

Pattern-and-root inflectional morphology: the Arabic broken plural

arXiv cs.CL ↗ · 2026-05-22 Cached

This paper presents a novel pattern-and-root model for describing Arabic noun inflection, focusing on broken plurals. It defines a taxonomy of 160 classes and an encoding scheme applied to 3,200 entries, with the aim of improving computational language resources.

A Data-Driven Approach to Idiomaticity Based on Experts' Criteria in Theoretical Linguistics

arXiv cs.CL ↗ · 2026-05-20 Cached

This paper presents a data-driven analysis of multi-word expressions (MWEs) based on 16 theoretical criteria, annotated by linguistics experts, finding that no expressions are absolutely idiomatic and that lexical criteria are most influential.

IMLJD: A Computational Dataset for Indian Matrimonial Litigation Analysis

arXiv cs.CL ↗ · 2026-05-20 Cached

The paper introduces IMLJD, a computational dataset designed for analyzing Indian matrimonial litigation, supporting natural language processing and legal analytics research.

A Scalable Tool for Measuring Manner and Result Verbs in Developmental Language Research

arXiv cs.CL ↗ · 2026-05-19 Cached

This paper presents a computational approach using large language models and RoBERTa to identify manner and result verbs in sentence context, achieving up to 89.6% accuracy. It aims to provide a scalable measurement tool for developmental language research.

A Computational Operationalisation of Competing Maturational Theories of Syntactic Development via Statistical Grammar Induction

arXiv cs.CL ↗ · 2026-05-12 Cached

This paper presents a computational framework to test competing maturational theories of syntactic development in children, specifically comparing bottom-up versus inward accounts using statistical grammar induction.

A Reproducible Multi-Architecture Baseline for Token-Level Chinese Metaphor Identification under the MIPVU Framework

arXiv cs.CL ↗ · 2026-05-11 Cached

This paper establishes a reproducible multi-architecture baseline for token-level Chinese metaphor identification using the MIPVU framework and the PSU Chinese Metaphor Corpus. It compares encoder models like RoBERTa and MelBERT against the Qwen3.5-9B generative model, releasing code and data to facilitate future research.

A Community-Based Approach for Stance Distribution and Argument Organization

arXiv cs.CL ↗ · 2026-04-21 Cached

Researchers from the University of British Columbia propose an unsupervised graph-based system for organizing arguments from online debates by constructing interaction graphs and applying community detection to reveal diverse viewpoint distributions. The approach requires no training data and aims to help users navigate complex argumentative landscapes and combat filter bubbles.

Measuring the Semantic Structure and Evolution of Conspiracy Theories

arXiv cs.CL ↗ · 2026-04-20 Cached

This paper measures the semantic structure and evolution of conspiracy theories using 169.9M Reddit comments from r/politics (2012-2022). It introduces the concept of "semantic objects", bounded by semantic neighborhoods, to track how the meanings of conspiracy theories change over time in ways that simple keyword-based approaches miss.
