Tag
A reflection arguing that in multi-model setups, the consensus output is less valuable than the disagreements, which reveal genuinely contested parts of a problem. The post questions whether consensus should be the goal and how to distinguish productive disagreement from noise.
This paper compares multiple machine learning and transformer models for sentiment classification on movie reviews, finding that RoBERTa achieves 93.02% accuracy and that a soft voting ensemble improves performance.
This paper presents the winning system for SemEval-2026 Task 8's generation subtask, using a heterogeneous ensemble of seven LLMs with dual prompting strategies and a GPT-4o-mini judge to select the best response. The system achieved first place with a conditioned harmonic mean of 0.7827, outperforming all baselines and demonstrating the value of model diversity.