Tag
This paper evaluates six frontier coding agents on esoteric programming languages and finds that stronger agents use metaprogramming—writing Python programs to generate and debug code in the unfamiliar target language. Forbidding this strategy causes large performance drops, while providing Python helper code improves weaker agents.