Tag
Presents DiscoExplorer, an open-source web interface for searching and visualizing discourse relation datasets across 16 languages, making DISRPT shared task data publicly accessible.
This paper presents a method for comparing concordances of local grammars to optimize Named Entity Recognition for person names in Portuguese, achieving improved F-measure scores on the HAREM dataset.
This article profiles MIT senior Olivia Honeycutt, highlighting her interdisciplinary research at the intersection of linguistics, computation, and cognition, with a focus on comparing human language processing with large language models.
Researchers use four-state Markov chains to model vowel/consonant patterns in Pushkin’s Evgenij Onegin and its Italian translation, revealing structural asymmetries and narrative-linked phonological cues.
This paper introduces STELA, a linguistics-aware watermarking framework for LLMs that leverages syntactic predictability via POS n-grams to balance text quality and detection robustness. The method enables publicly verifiable watermark detection without requiring access to model logits, demonstrating superior performance across typologically diverse languages (English, Chinese, Korean).