Tag
The paper proposes Propagational Proxy Voting (PPV), a delegation-based aggregator that uses letter entropy and reasoning geometry to improve on majority voting for multi-sample LLM inference. It achieves gains on MMLU-Pro without requiring gold labels or auxiliary training.
This arXiv paper presents a protocol for evaluating ChatGPT's ability to generate and verify biomedical associations using a RAG-enabled, cross-model majority voting workflow to address hallucination and ontology limitations.
This paper analyzes inference-time optimization techniques for AIMO 3, finding that model capability dominates prompt engineering and diverse sampling strategies. The study reveals that high-temperature sampling already decorrelates errors maximally, leaving no room for prompt-based improvements. It also identifies a 6-point selection loss: the gap between an individual model's pass@20 and majority-voting consensus.
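
The selection loss mentioned in the last summary, the gap between pass@k and majority-vote accuracy, can be illustrated with a minimal sketch. The function names and the toy data below are hypothetical and are not taken from any of the papers. The sketch assumes each problem comes with k sampled answers and one gold answer, and that ties in the vote go to the answer sampled first.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def pass_at_k(answers, gold):
    """True if at least one of the sampled answers matches the gold answer."""
    return any(a == gold for a in answers)


def selection_loss(problems):
    """Compute (pass@k rate, majority-vote accuracy, gap) over a problem set.

    `problems` is a list of (sampled_answers, gold_answer) pairs.
    The gap is the share of problems where a correct sample existed
    but the vote did not select it.
    """
    n = len(problems)
    pass_rate = sum(pass_at_k(s, g) for s, g in problems) / n
    vote_rate = sum(majority_vote(s) == g for s, g in problems) / n
    return pass_rate, vote_rate, pass_rate - vote_rate


# Hypothetical toy data: three samples per problem.
toy_problems = [
    (["A", "A", "B"], "A"),  # correct answer sampled and selected
    (["C", "D", "D"], "C"),  # correct answer sampled but outvoted
    (["B", "B", "B"], "A"),  # correct answer never sampled
    (["E", "E", "E"], "E"),  # correct answer sampled and selected
]

pass_rate, vote_rate, gap = selection_loss(toy_problems)
print(f"pass@3={pass_rate:.2f} majority={vote_rate:.2f} gap={gap:.2f}")
```

On the toy data, the second problem is the only one that contributes to the gap: a correct sample exists but is outvoted. Aggregators that reweight or delegate votes target exactly this kind of case.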