ConlangCrafter: Constructing Languages with a Multi-Hop LLM Pipeline
Summary
ConlangCrafter is a multi-hop LLM pipeline that automates constructed language (conlang) creation by decomposing the process into modular stages: phonology, morphology, syntax, lexicon generation, and translation. At each stage, the system leverages LLMs' metalinguistic reasoning, injecting randomness to encourage typological diversity and applying self-refinement feedback to encourage consistency, producing coherent and varied conlangs without human linguistic expertise.
# ConlangCrafter: Constructing Languages with a Multi-Hop LLM Pipeline

Source: https://arxiv.org/abs/2508.06094 — [View PDF](https://arxiv.org/pdf/2508.06094)

> Abstract: Constructed languages (conlangs) such as Esperanto and Quenya have played diverse roles in art, philosophy, and international communication. Meanwhile, foundation models have revolutionized creative generation in text, images, and beyond. In this work, we leverage modern LLMs as computational creativity aids for end-to-end conlang creation. We introduce ConlangCrafter, a multi-hop pipeline that decomposes language design into modular stages -- phonology, morphology, syntax, lexicon generation, and translation. At each stage, our method leverages LLMs' metalinguistic reasoning capabilities, injecting randomness to encourage diversity and leveraging self-refinement feedback to encourage consistency in the emerging language description. We construct a novel, scalable evaluation framework for this task, evaluating metrics measuring consistency and typological diversity. Automatic and manual evaluations demonstrate ConlangCrafter's ability to produce coherent and varied conlangs without human linguistic expertise.

## Submission history

From: Morris Alper [view email](https://arxiv.org/show-email/b54f65b4/2508.06094)

- **[v1](https://arxiv.org/abs/2508.06094v1)** Fri, 8 Aug 2025 07:36:48 UTC (1,348 KB)
- **[v2](https://arxiv.org/abs/2508.06094v2)** Thu, 9 Oct 2025 22:34:49 UTC (1,272 KB)
- **[v3](https://arxiv.org/abs/2508.06094v3)** Thu, 22 Jan 2026 13:54:42 UTC (1,433 KB)
- **[v4]** Fri, 17 Apr 2026 16:51:16 UTC (1,423 KB)
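The staged design described in the abstract can be sketched as a simple loop in which each stage conditions on the accumulated language description, injects a random design choice, and self-refines its draft. This is an illustrative sketch only, not the authors' code: `call_llm`, the prompt wording, and the typology hints are all invented stand-ins for whatever model and prompts the paper actually uses.

```python
import random

# Stage names taken from the abstract; everything else is hypothetical.
STAGES = ["phonology", "morphology", "syntax", "lexicon generation", "translation"]

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call; echoes a stub 'description'."""
    return f"[description for: {prompt[:60]}]"

def generate_stage(stage: str, spec_so_far: str, rng: random.Random) -> str:
    # Randomness injection: seed the stage prompt with a random design choice
    # so repeated runs yield typologically diverse languages.
    hint = rng.choice(["agglutinative", "fusional", "isolating", "polysynthetic"])
    draft = call_llm(
        f"Design the {stage} (hint: {hint}) consistent with:\n{spec_so_far}"
    )
    # Self-refinement: ask the model to revise the draft against the spec so
    # far, encouraging internal consistency in the emerging description.
    return call_llm(f"Revise this {stage} draft for consistency:\n{draft}")

def craft_conlang(seed: int = 0) -> dict:
    rng = random.Random(seed)
    spec, description = {}, ""
    for stage in STAGES:
        section = generate_stage(stage, description, rng)
        spec[stage] = section
        # Each "hop" conditions on all prior stages via the growing description.
        description += f"\n## {stage}\n{section}"
    return spec

spec = craft_conlang(seed=42)
print(list(spec))  # the five stage sections, in pipeline order
```

With a real LLM backend, `call_llm` would be replaced by an API call, and the refinement step could loop until the model reports no remaining inconsistencies.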
Similar Articles
Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs
This paper proposes Multi-Stream LLMs, which move from sequential message-based instruction tuning to parallel stream processing. The approach lets language models simultaneously read, think, and generate across multiple concurrent data flows, addressing bottlenecks in autonomous agent applications.
Learning to reason with LLMs
OpenAI published an article exploring reasoning techniques with LLMs through cipher-decoding examples, demonstrating step-by-step problem solving and pattern recognition in language models.
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
This paper introduces AutoTTS, an environment-driven framework that automates the discovery of test-time scaling strategies for LLMs by formulating it as controller synthesis. It demonstrates improved accuracy-cost tradeoffs on mathematical reasoning benchmarks with minimal computational overhead.
@tom_doerr: Fine-tunes LLMs with a no-code GUI https://github.com/h2oai/h2o-llmstudio…
H2O LLM Studio is an open-source framework and no-code GUI that simplifies the fine-tuning of large language models, supporting techniques like LoRA, DPO, and integration with Hugging Face.
@0xSero: Here's everything you need to know about inference and hosting LLMs. Have you ever seen: - vllm - sglang - llama.cpp - …
An overview of popular open-source inference engines including vLLM, SGLang, llama.cpp, and ExLlamaV3 for hosting and running large language models.