ConlangCrafter: Constructing Languages with a Multi-Hop LLM Pipeline

arXiv cs.CL 04/20/26, 04:00 AM Papers

Summary

ConlangCrafter is a multi-hop LLM pipeline that automates constructed language (conlang) creation by decomposing the process into modular stages including phonology, morphology, syntax, lexicon generation, and translation. The system leverages LLMs' metalinguistic reasoning with randomness injection and self-refinement to produce coherent and typologically diverse constructed languages.
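The staged design described above can be illustrated with a minimal sketch. This is not the authors' implementation: the stage names come from the summary, while `stub_llm`, `HINTS`, `run_stage`, `build_conlang`, and the prompt strings are hypothetical names introduced here. A real system would replace `stub_llm` with an actual model call.

```python
import random
from typing import Callable, Dict, List

# Stage order taken from the summary; everything else below is illustrative.
STAGES: List[str] = ["phonology", "morphology", "syntax", "lexicon", "translation"]

# Hypothetical design hints used to inject randomness into each stage.
HINTS: List[str] = [
    "favor typologically rare features",
    "favor typologically common features",
    "mix rare and common features",
]

# Hypothetical stand-in type for an LLM call: prompt in, text out.
LLM = Callable[[str], str]


def stub_llm(prompt: str) -> str:
    """Offline placeholder model: approves every critique, echoes design prompts."""
    if prompt.startswith("CRITIQUE"):
        return "OK"
    return f"<draft for: {prompt.splitlines()[0]}>"


def run_stage(stage: str, spec: Dict[str, str], llm: LLM,
              rng: random.Random, max_refinements: int = 2) -> str:
    """Draft one stage, then self-refine until the critic replies 'OK'."""
    hint = rng.choice(HINTS)  # randomness injection
    # Earlier stages are passed as context so later stages stay consistent.
    context = "\n".join(f"{name}: {text}" for name, text in spec.items())
    draft = llm(f"DESIGN {stage} ({hint})\n{context}")
    for _ in range(max_refinements):
        feedback = llm(f"CRITIQUE {stage}\n{context}\n{draft}")
        if feedback.strip() == "OK":
            break
        draft = llm(f"REVISE {stage} ({hint})\nFeedback: {feedback}\n{draft}")
    return draft


def build_conlang(llm: LLM = stub_llm, seed: int = 0) -> Dict[str, str]:
    """Run every stage in order, accumulating the language description."""
    rng = random.Random(seed)
    spec: Dict[str, str] = {}
    for stage in STAGES:
        spec[stage] = run_stage(stage, spec, llm, rng)
    return spec


if __name__ == "__main__":
    for stage, text in build_conlang(seed=1).items():
        print(stage, "->", text)
```

Passing the accumulated `spec` into each later stage is what makes the sketch multi-hop: morphology sees the phonology, syntax sees both, and so on. The fixed `seed` keeps a run reproducible while still varying the hints between stages.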

arXiv:2508.06094v4. Abstract: Constructed languages (conlangs) such as Esperanto and Quenya have played diverse roles in art, philosophy, and international communication. Meanwhile, foundation models have revolutionized creative generation in text, images, and beyond. In this work, we leverage modern LLMs as computational creativity aids for end-to-end conlang creation. We introduce ConlangCrafter, a multi-hop pipeline that decomposes language design into modular stages -- phonology, morphology, syntax, lexicon generation, and translation. At each stage, our method leverages LLMs' metalinguistic reasoning capabilities, injecting randomness to encourage diversity and leveraging self-refinement feedback to encourage consistency in the emerging language description. We construct a novel, scalable evaluation framework for this task, evaluating metrics measuring consistency and typological diversity. Automatic and manual evaluations demonstrate ConlangCrafter's ability to produce coherent and varied conlangs without human linguistic expertise.
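The abstract does not say how typological diversity is computed. As one possible proxy, assumed here purely for illustration and not taken from the paper, each generated language can be described by a small set of categorical typological features and diversity scored as the mean pairwise fraction of features on which two languages differ. The feature names and values below are made up for the example.

```python
from itertools import combinations
from typing import Dict, List

# A language is represented as feature name -> categorical value (hypothetical schema).
Features = Dict[str, str]


def hamming_fraction(a: Features, b: Features) -> float:
    """Fraction of shared features on which two languages disagree."""
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("languages share no features to compare")
    return sum(a[f] != b[f] for f in shared) / len(shared)


def mean_pairwise_diversity(langs: List[Features]) -> float:
    """Average disagreement over all unordered pairs; 0.0 if fewer than two languages."""
    pairs = list(combinations(langs, 2))
    if not pairs:
        return 0.0
    return sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    sample = [
        {"word_order": "SOV", "case_marking": "yes"},
        {"word_order": "SVO", "case_marking": "yes"},
        {"word_order": "VSO", "case_marking": "no"},
    ]
    print(round(mean_pairwise_diversity(sample), 3))
```

A score near 0 means the generated languages are typologically alike, and a score near 1 means they differ on almost every compared feature. The consistency side of the evaluation is a separate question that this sketch does not address.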


# ConlangCrafter: Constructing Languages with a Multi-Hop LLM Pipeline
Source: https://arxiv.org/abs/2508.06094
View PDF (https://arxiv.org/pdf/2508.06094)


## Submission history

From: Morris Alper [view email (https://arxiv.org/show-email/b54f65b4/2508.06094)]

- **[[v1]](https://arxiv.org/abs/2508.06094v1)** Fri, 8 Aug 2025 07:36:48 UTC (1,348 KB)
- **[[v2]](https://arxiv.org/abs/2508.06094v2)** Thu, 9 Oct 2025 22:34:49 UTC (1,272 KB)
- **[[v3]](https://arxiv.org/abs/2508.06094v3)** Thu, 22 Jan 2026 13:54:42 UTC (1,433 KB)
- **[v4]** Fri, 17 Apr 2026 16:51:16 UTC (1,423 KB)
