The Institute of the Estonian Language has released an open benchmark for evaluating LLM performance in Estonian, covering language proficiency, reasoning, factual accuracy, and resistance to propaganda. The results suggest that models that score well on English benchmarks may perform differently in smaller language environments.
A benchmark study by the Institute of the Estonian Language evaluates LLMs on their ability to resist Russian propaganda. It finds that Nvidia's Nemotron, Alibaba's Qwen, and OpenAI's GPT-5.4 perform well, while Google's Gemini models show notable weaknesses, especially when prompted in Russian.