No Resource, No Benchmarks, No Problem? Evaluating and Improving LLMs for Code Generation in No-Resource Languages
Summary
This paper tackles code generation for no-resource programming languages by building benchmarks and proposing a method that combines further pre-training with weight difference transfer to create specialized instruction-following models at reduced cost.
Cached at: 06/20/26, 02:27 PM
Source: https://huggingface.co/papers/2606.16827
Abstract
Research addresses code generation challenges for no-resource programming languages by developing benchmarks and proposing a method that combines further pre-training with weight difference transfer to create specialized instruction-following models at reduced computational cost.
Large Language Models (LLMs) have significantly advanced the automation of software engineering tasks. One prominent example is code generation, where an LLM produces code in a specified programming language based on a natural language description. Most research in this area has focused on high-resource languages, such as Python or Java, which benefit from abundant training data. A smaller body of work has explored low-resource languages, which are underrepresented in training corpora. In contrast, no-resource languages, for which LLMs have seen virtually no training data, remain largely unstudied. These languages often emerge in industry, where organizations develop proprietary or domain-specific languages unsupported by commercial tools like GitHub Copilot. As a result, companies need to deploy their own in-house code recommenders. To investigate possible solutions in this context, we build and release three code generation benchmarks for no-resource languages, based on two recently proposed programming languages for which very little training data is available. Using these benchmarks, we experiment with several solutions to teach LLMs about no-resource languages, including prompt-based techniques as well as pre-training and fine-tuning that exploit the little data available. While further pre-training gives the largest performance gains for no-resource languages, applying it directly to instruction-tuned models harms their ability to follow instructions. To address this, we start from a base model, further pre-train it on the target language, and then inject instruction-following capabilities via weight diff transfer from an instruction model. Such an approach significantly improves code generation capabilities in no-resource settings, allowing companies to cheaply deploy a specialized instruct model without dealing with the computational cost of instruction fine-tuning.
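The abstract does not spell out how the weight diff transfer is computed. A common reading, assumed here and not confirmed by the source, is that the difference between the instruction model and its base model is added to the further pre-trained base model, parameter by parameter. The sketch below illustrates that assumption on toy data; the `Weights` type, the function name `transfer_weight_diff`, and the example values are all hypothetical, and a real checkpoint would hold tensors rather than lists of floats.

```python
from typing import Dict, List

# Hypothetical toy stand-in for a model checkpoint: parameter name -> flat values.
Weights = Dict[str, List[float]]


def transfer_weight_diff(base: Weights, instruct: Weights, specialized: Weights) -> Weights:
    """Return specialized + (instruct - base), computed per parameter.

    Assumption: all three checkpoints share one architecture, so they have
    the same parameter names and the same number of values per parameter.
    """
    if not (base.keys() == instruct.keys() == specialized.keys()):
        raise ValueError("all three checkpoints must share the same parameter names")

    merged: Weights = {}
    for name in base:
        b, i, s = base[name], instruct[name], specialized[name]
        if not (len(b) == len(i) == len(s)):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # (iv - bv) is what instruction tuning changed; add it to the
        # further pre-trained value sv.
        merged[name] = [sv + (iv - bv) for bv, iv, sv in zip(b, i, s)]
    return merged


if __name__ == "__main__":
    base = {"w": [1.0, 2.0]}          # original base model
    instruct = {"w": [1.5, 2.0]}      # instruction-tuned version of the base
    specialized = {"w": [3.0, 4.0]}   # base after further pre-training
    print(transfer_weight_diff(base, instruct, specialized))
```

Under this reading the merge is a single pass of element-wise arithmetic over the parameters, with no gradient updates, which fits the abstract's point about avoiding the computational cost of instruction fine-tuning.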
Similar Articles
Testing Local LLMs in Practice: Code Generation, Quality vs. Speed
The author built a benchmark harness to evaluate local LLMs for autonomous Go code generation, focusing on log parser generation for SIEM pipelines, and published results comparing quality vs. speed.
Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages
This tutorial paper provides an overview of building multilingual and multimodal LLMs for low-resource languages, covering data creation, model alignment, fine-tuning, and evaluation, with a focus on practical recipes and hands-on resources.
Off-the-Shelf LLMs as Process Scorers: Training-Free Alternative to PRMs for Mathematical Reasoning
Proposes Chunk-Level Guided Generation, a training-free method using off-the-shelf LLMs as process scorers to select fixed-length candidate chunks during small model generation, significantly improving mathematical reasoning accuracy over majority voting and PRM guided search.
SkillLearnBench: Benchmarking Continual Learning Methods for Agent Skill Generation on Real-World Tasks
SkillLearnBench introduces the first benchmark for evaluating continual skill learning in LLM agents across 20 real-world tasks, revealing that no method dominates and scaling LLMs does not guarantee better skills.
@polynoamial: https://x.com/polynoamial/status/2064210146558136827
This article argues that LLM benchmark performance is increasingly a function of test-time compute, and that current evaluation methods fail to capture capability improvements when controlling for inference budget. It advocates for plotting performance vs. tokens, cost, or time, and discusses implications for safety evaluations.