ConlangCrafter: Constructing Languages with a Multi-Hop LLM Pipeline

arXiv cs.CL Papers

Summary

ConlangCrafter is a multi-hop LLM pipeline that automates constructed language (conlang) creation by decomposing the process into modular stages including phonology, morphology, syntax, lexicon generation, and translation. The system leverages LLMs' metalinguistic reasoning with randomness injection and self-refinement to produce coherent and typologically diverse constructed languages.

arXiv:2508.06094v4 (announce type: replace)

Cached at: 04/20/26, 08:31 AM

# ConlangCrafter: Constructing Languages with a Multi-Hop LLM Pipeline
Source: https://arxiv.org/abs/2508.06094
View PDF (https://arxiv.org/pdf/2508.06094)

> Abstract: Constructed languages (conlangs) such as Esperanto and Quenya have played diverse roles in art, philosophy, and international communication. Meanwhile, foundation models have revolutionized creative generation in text, images, and beyond. In this work, we leverage modern LLMs as computational creativity aids for end-to-end conlang creation. We introduce ConlangCrafter, a multi-hop pipeline that decomposes language design into modular stages -- phonology, morphology, syntax, lexicon generation, and translation. At each stage, our method leverages LLMs' metalinguistic reasoning capabilities, injecting randomness to encourage diversity and leveraging self-refinement feedback to encourage consistency in the emerging language description. We construct a novel, scalable evaluation framework for this task, evaluating metrics measuring consistency and typological diversity. Automatic and manual evaluations demonstrate ConlangCrafter's ability to produce coherent and varied conlangs without human linguistic expertise.
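The staged design described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: only the stage order comes from the abstract, while the prompt wording, the `RANDOM_FEATURES` table, the `run_stage` and `build_conlang` helpers, and the `stub_llm` stand-in are all hypothetical names chosen for illustration. The sketch assumes the LLM is any text-in, text-out callable, that randomness is injected by sampling a typological hint outside the model, and that self-refinement is a critique-then-revise loop with a fixed budget.

```python
import random
from typing import Callable, Dict, List

# Stage order follows the abstract; everything else below is an assumption.
STAGES: List[str] = ["phonology", "morphology", "syntax", "lexicon", "translation"]

# Hypothetical pool of typological hints, sampled outside the LLM so that
# different seeds push the design in different directions.
RANDOM_FEATURES: Dict[str, List[str]] = {
    "phonology": ["small consonant inventory", "tonal", "ejectives"],
    "morphology": ["isolating", "agglutinative", "fusional"],
    "syntax": ["SOV", "VSO", "SVO"],
}

LLM = Callable[[str], str]  # any text-in, text-out model wrapper


def run_stage(llm: LLM, stage: str, spec: Dict[str, str],
              rng: random.Random, max_refinements: int = 2) -> str:
    """Draft one stage, then critique and revise it against the spec so far."""
    hint = rng.choice(RANDOM_FEATURES[stage]) if stage in RANDOM_FEATURES else "none"
    context = "\n".join(f"[{name}] {text}" for name, text in spec.items())
    draft = llm(
        f"TASK: write the {stage} section.\n"
        f"RANDOM FEATURE: {hint}\nSPEC SO FAR:\n{context}"
    )
    for _ in range(max_refinements):
        critique = llm(
            f"TASK: critique the {stage} draft for contradictions.\n"
            f"SPEC SO FAR:\n{context}\nDRAFT:\n{draft}\n"
            "Reply OK if the draft is consistent."
        )
        if critique.strip() == "OK":
            break  # no inconsistencies reported, keep the draft
        draft = llm(
            f"TASK: revise the {stage} draft.\n"
            f"CRITIQUE:\n{critique}\nDRAFT:\n{draft}"
        )
    return draft


def build_conlang(llm: LLM, seed: int) -> Dict[str, str]:
    """Run every stage in order; each hop sees the output of earlier hops."""
    rng = random.Random(seed)
    spec: Dict[str, str] = {}
    for stage in STAGES:
        spec[stage] = run_stage(llm, stage, spec, rng)
    return spec


def stub_llm(prompt: str) -> str:
    """Offline stand-in so the sketch runs without a real model."""
    if prompt.startswith("TASK: critique"):
        return "OK"
    return f"<draft for: {prompt.splitlines()[0]}>"


if __name__ == "__main__":
    for name, text in build_conlang(stub_llm, seed=0).items():
        print(name, "->", text)
```

Replacing `stub_llm` with a wrapper around a real model is the only change needed to try the loop end to end. Sampling the hint in ordinary code rather than asking the model to "be random" is one plausible way to get reproducible diversity across seeds; the paper may do this differently.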

## Submission history

From: Morris Alper ([view email](https://arxiv.org/show-email/b54f65b4/2508.06094))

- **[[v1]](https://arxiv.org/abs/2508.06094v1)** Fri, 8 Aug 2025 07:36:48 UTC (1,348 KB)
- **[[v2]](https://arxiv.org/abs/2508.06094v2)** Thu, 9 Oct 2025 22:34:49 UTC (1,272 KB)
- **[[v3]](https://arxiv.org/abs/2508.06094v3)** Thu, 22 Jan 2026 13:54:42 UTC (1,433 KB)
- **[v4]** Fri, 17 Apr 2026 16:51:16 UTC (1,423 KB)

Similar Articles

Learning to reason with LLMs

OpenAI Blog

OpenAI publishes an article exploring reasoning techniques with LLMs through cipher-decoding examples, demonstrating step-by-step problem-solving approaches and pattern recognition in language models.

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

Hugging Face Daily Papers

This paper introduces AutoTTS, an environment-driven framework that automates the discovery of test-time scaling strategies for LLMs by formulating it as controller synthesis. It demonstrates improved accuracy-cost tradeoffs on mathematical reasoning benchmarks with minimal computational overhead.