linguistics

#linguistics

The Grammar Does the Work: Functional vs. Lexical Dependency Length Minimization Across Universal Dependencies

arXiv cs.CL · 18h ago

This paper analyzes Universal Dependencies data for 122 languages and finds that dependency length minimization operates differently for functional dependencies (short and invariant) than for lexical dependencies (longer and variable), suggesting that grammar provides local scaffolding for processing.
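
The length of a dependency is usually measured as the linear distance between a head and its dependent. The sketch below is an illustration under stated assumptions, not the paper's pipeline. It takes (id, head, deprel) triples as found in CoNLL-U files (ignoring multiword-token ranges and empty nodes) and averages lengths separately for a hypothetical set of "functional" relations, FUNCTIONAL_DEPRELS, and for all other relations. The paper's actual functional/lexical split may differ.

```python
from statistics import mean

# Hypothetical "functional" relations for illustration; the paper's split may differ.
FUNCTIONAL_DEPRELS = {"aux", "cop", "mark", "case", "det", "clf", "cc"}

def dependency_lengths(rows):
    """rows: (id, head, deprel) triples for one sentence, 1-based ids, head 0 = root.
    Returns (deprel, length) pairs with length = |head - id|; the root is skipped."""
    return [(deprel, abs(head - idx)) for idx, head, deprel in rows if head != 0]

def mean_length_by_class(rows):
    """Mean dependency length for functional vs. all other (here: "lexical") relations."""
    functional, lexical = [], []
    for deprel, length in dependency_lengths(rows):
        base = deprel.split(":")[0]  # drop UD subtypes, e.g. "aux:pass" -> "aux"
        (functional if base in FUNCTIONAL_DEPRELS else lexical).append(length)
    return (mean(functional) if functional else None,
            mean(lexical) if lexical else None)

# "The cat sat on the mat", with simplified UD-style heads
sentence = [
    (1, 2, "det"),    # The -> cat
    (2, 3, "nsubj"),  # cat -> sat
    (3, 0, "root"),   # sat
    (4, 6, "case"),   # on -> mat
    (5, 6, "det"),    # the -> mat
    (6, 3, "obl"),    # mat -> sat
]
print(mean_length_by_class(sentence))
```

This only shows the raw measure on one sentence; studies of dependency length minimization typically aggregate over whole treebanks and compare observed lengths against baseline orderings.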

Svarna: An Open Corpus Workbench for Modern Greek

arXiv cs.CL · yesterday

Svarna is an open-source, web-based corpus workbench for Modern Greek. It integrates multiple databases totaling over 507 million words, provides various linguistic analysis tools, and is released under the MIT license.

When transformers learn "impossible" languages, what do they learn?

arXiv cs.CL · 2d ago

This paper investigates how transformer language models learn "impossible" languages with unnatural properties. It finds that grammatical sensitivity degrades only gradually while generative production shows pronounced failures, suggesting a linking hypothesis for why such languages are not attested.

@sentient_agency: 10 FREE TOOLS BUILT BY UNIVERSITIES THAT BEAT MOST PAID SAAS Bookmark every single one. Universities quietly fund softw…

X AI KOLs Timeline · 6d ago

A tweet highlights 10 free, open-source software tools developed by universities that outperform or rival expensive paid alternatives, covering reference management, text analysis, network visualization, GIS, statistics, speech analysis, biological networks, data cleaning, research archiving, and note-taking.

Phonetic and semantic analyses of spoken corpora of Beijing and Taiwan Mandarin indicate that the neutral tone is a lexical tone

arXiv cs.CL · 2026-06-26

This corpus-based study argues that the neutral tone in Mandarin Chinese is a lexical tone with its own tonal target. The evidence comes from phonetic and semantic analyses of spoken corpora of Beijing and Taiwan Mandarin, using generalized additive models and contextualized embeddings.

MorfFlex: Handling Rich Morphology

arXiv cs.CL · 2026-06-24

This paper presents MorfFlex, a morphological dictionary architecture for languages with rich inflection and derivation, exemplified by MorfFlex CZ for Czech, which contains over 100 million wordforms and supports annotation consistency and NLP tools.

AI Engineer Claims to Have Cracked Linear A (6 minute read)

TLDR AI · 2026-06-22

Tom Di Mino, an AI engineer and amateur linguist, claims to have deciphered the ancient Minoan script Linear A, which has eluded experts for over a century. His proposed decipherment maps Linear A to an extinct Semitic language and is currently under review by linguistics experts at Rutgers and Cambridge.

Translating the Untranslatable: An Operationalizable Ontology for Untranslatability

arXiv cs.CL · 2026-06-17

This paper introduces a structured ontology for untranslatability in machine translation, along with a taxonomy of compensation strategies and a multilingual dataset. Human preference studies show that perceived translation quality depends on the strategy used, with explanatory translations preferred.

The Linguistics Olympiads: Towards a New Corpus for Linguistics Research?

arXiv cs.CL · 2026-06-15

This paper proposes using data from the Linguistics Olympiads to build a new corpus for linguistics research.

The Transformer Pill

Reddit r/ArtificialInteligence · 2026-06-12

A reflection on the broad implications of transformer architectures beyond LLMs, including potential impacts on linguistics, genetics, and causal modeling, comparing their significance to the Haber-Bosch process.

Large Language Models as Modal Models in Linguistics

arXiv cs.CL · 2026-06-10

This paper applies philosophy of science to argue that LLMs offer epistemic value as minimal models for how-possibly explanations in linguistics, but do not yet qualify as how-actually explanations of human language.

Word Class Representations Spontaneously Emerge from Successor Representations Trained on Natural Language

arXiv cs.CL · 2026-05-26

This paper applies successor representations from reinforcement learning to natural language, training a neural network to predict the expected distribution of future words. It shows that linguistic categories like parts of speech and lexical subclasses emerge spontaneously without explicit supervision.
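
A successor representation (SR) row for a word w estimates the discounted number of times each word is expected to occur after w. The paper trains a neural network; the sketch below swaps in the simpler tabular TD(0) estimator over word types to show what is being learned. The function names, the toy corpus, and the hyperparameters gamma, alpha and epochs are illustrative assumptions.

```python
import math
from collections import defaultdict

def learn_successor_representation(tokens, gamma=0.9, alpha=0.1, epochs=200):
    """Tabular TD(0) estimate of the successor representation over word types.

    M[w][v] approximates the expected discounted count of future occurrences of v
    starting from an occurrence of w (w itself counts once at step 0).
    """
    M = {w: defaultdict(float) for w in set(tokens)}
    for _ in range(epochs):
        for cur, nxt in zip(tokens, tokens[1:]):
            # TD target for row `cur`: one-hot(cur) + gamma * M[nxt]
            target = defaultdict(float, {v: gamma * x for v, x in M[nxt].items()})
            target[cur] += 1.0
            for v in set(target) | set(M[cur]):
                M[cur][v] += alpha * (target[v] - M[cur][v])
    return M

def cosine(row_a, row_b):
    """Cosine similarity between two sparse SR rows."""
    keys = set(row_a) | set(row_b)
    dot = sum(row_a.get(k, 0.0) * row_b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(x * x for x in row_a.values()))
    norm_b = math.sqrt(sum(x * x for x in row_b.values()))
    return dot / (norm_a * norm_b)

# Hypothetical toy corpus: nouns and determiners occupy distinct positions
corpus = "the cat sleeps . a dog sleeps . the dog runs . a cat runs .".split()
M = learn_successor_representation(corpus)
print(round(cosine(M["cat"], M["dog"]), 3), round(cosine(M["cat"], M["the"]), 3))
```

Words whose rows are similar tend to be followed by similar continuations, which is the kind of distributional signal from which categories such as parts of speech could emerge without explicit labels.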

A Data-Driven Approach to Idiomaticity Based on Experts' Criteria in Theoretical Linguistics

arXiv cs.CL · 2026-05-20

This paper presents a data-driven analysis of multi-word expressions (MWEs) based on 16 theoretical criteria, annotated by linguistics experts, finding that no expressions are absolutely idiomatic and that lexical criteria are most influential.

DiscoExplorer: An Open Interface for the Study of Multilingual Discourse Relations

arXiv cs.CL · 2026-05-18

This paper presents DiscoExplorer, an open-source web interface for searching and visualizing discourse relation datasets across 16 languages, making the DISRPT shared task data publicly accessible.

Concordance Comparison as a Means of Assembling Local Grammars

arXiv cs.CL · 2026-05-13

This paper presents a method for comparing concordances of local grammars to optimize Named Entity Recognition for person names in Portuguese, achieving improved F-measure scores on the HAREM dataset.

Improving understanding with language

MIT News — Artificial Intelligence · 2026-05-01

This article profiles MIT senior Olivia Honeycutt, highlighting her interdisciplinary research at the intersection of linguistics, computation, and cognition, with a focus on comparing human language processing with large language models.

Markov reads Pushkin, again: A statistical journey into the poetic world of Evgenij Onegin

arXiv cs.CL · 2026-04-23

Researchers use four-state Markov chains to model vowel/consonant patterns in Pushkin’s Evgenij Onegin and its Italian translation, revealing structural asymmetries and narrative-linked phonological cues.
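
Markov's original 1913 study of Evgenij Onegin classified letters as vowels or consonants. One common way to obtain four states, assumed here rather than taken from the paper, is to use overlapping pairs of classes (VV, VC, CV, CC), which recasts a second-order vowel/consonant chain as a first-order one. The sketch estimates transition probabilities from raw counts; the vowel set and the sample sentence are illustrative assumptions covering Latin-script Italian only.

```python
from collections import Counter, defaultdict

# Assumption: Latin-script vowels, including accented Italian vowels. A Russian text
# would need its own Cyrillic vowel set; the paper's classification may differ.
VOWELS = set("aeiouàèéìíòóù")

def vc_sequence(text):
    """Map each letter to 'V' or 'C', dropping spaces, digits and punctuation."""
    return ["V" if ch in VOWELS else "C" for ch in text.lower() if ch.isalpha()]

def four_state_transitions(text):
    """Estimate P(next state | current state), where a state is the V/C class of
    two consecutive letters (VV, VC, CV, CC); consecutive states overlap by one letter."""
    seq = vc_sequence(text)
    states = [seq[i] + seq[i + 1] for i in range(len(seq) - 1)]
    counts = defaultdict(Counter)
    for cur, nxt in zip(states, states[1:]):
        counts[cur][nxt] += 1
    # Normalize each row of counts into conditional probabilities
    return {cur: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for cur, c in counts.items()}

print(four_state_transitions("la casa è mia"))
```

Because consecutive states share a letter, a state can only be followed by states that begin with its second letter (VC can be followed only by CV or CC), so half of the 16 cells of the transition matrix are structural zeros.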

A Linguistics-Aware LLM Watermarking via Syntactic Predictability

arXiv cs.CL · 2026-04-20

This paper introduces STELA, a linguistics-aware watermarking framework for LLMs that leverages syntactic predictability via POS n-grams to balance text quality and detection robustness. The method enables publicly verifiable watermark detection without requiring access to model logits, demonstrating superior performance across typologically diverse languages (English, Chinese, Korean).
