open-ended-tasks

#open-ended-tasks

More Is Not More: What Matters for Diversity in LLM Opinions?

arXiv cs.CL · 3d ago

A factorial experiment shows that adding persona detail does not monotonically increase the diversity of LLM opinions, that different interaction architectures explore non-overlapping regions of opinion space, and that low-cost interventions such as temperature scaling have negligible effects.

LLM-as-a-Coach: Experiential Learning for Non-Verifiable Tasks

Hugging Face Daily Papers · 2026-07-20

This paper introduces Experiential Learning (EL), a method that repurposes an LLM-as-a-Judge as an LLM-as-a-Coach that provides rich textual feedback instead of scalar rewards, improving performance and generalization on open-ended, non-verifiable tasks.

VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech

Hugging Face Daily Papers · 2026-07-03

VIBE is a framework that evaluates generative bias in Large Audio-Language Models using open-ended tasks with human-recorded speech, revealing systematic biases triggered by gender and accent cues.

Prompt-Level Reward Specifications for Open-Ended Post-Training

arXiv cs.CL · 2026-05-29

This paper proposes a prompt-level reward specification framework that separates specifying a reward from computing it. Offline, it constructs reusable, task-adaptive rubrics and executable constraint checkers, which together produce a hybrid reward for open-ended post-training without human annotations or a separate reward model.
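The summary does not give the paper's actual reward formula. As a generic illustration only, the sketch below assumes that executable constraint checkers are plain Python predicates over the response and that the hybrid reward is a linear blend of a rubric score in [0, 1] with the fraction of constraints satisfied. `PromptRewardSpec`, `constraint_score`, `hybrid_reward`, and the weight `alpha` are hypothetical names, not the authors' API.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical: a constraint checker takes the response text and returns
# True when the constraint is satisfied.
Checker = Callable[[str], bool]


@dataclass
class PromptRewardSpec:
    """Hypothetical per-prompt reward specification, built once offline."""
    checkers: List[Checker]
    alpha: float = 0.5  # assumed weight on the rubric score vs. constraints


def constraint_score(spec: PromptRewardSpec, response: str) -> float:
    """Fraction of executable constraints the response satisfies."""
    if not spec.checkers:
        return 1.0
    passed = sum(1 for check in spec.checkers if check(response))
    return passed / len(spec.checkers)


def hybrid_reward(spec: PromptRewardSpec, response: str, rubric_score: float) -> float:
    """Linearly blend a rubric score in [0, 1] (assumed to come from a grader)
    with the constraint-satisfaction fraction."""
    return spec.alpha * rubric_score + (1 - spec.alpha) * constraint_score(spec, response)


# Example spec for a prompt asking for a short answer that mentions "entropy".
spec = PromptRewardSpec(
    checkers=[
        lambda r: len(r.split()) <= 50,    # word limit
        lambda r: "entropy" in r.lower(),  # required keyword
    ],
    alpha=0.5,
)
print(hybrid_reward(spec, "Entropy measures uncertainty.", rubric_score=0.8))
```

Because the checkers are ordinary functions, they can be written once per prompt and reused at every training step without calling a separate reward model, which is the kind of reuse the summary describes.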

SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks

Hugging Face Daily Papers · 2026-05-29

SCOPE is a self-play framework for open-ended tasks that co-evolves a Challenger policy and a Solver policy, achieving gains of up to +10.4 points on benchmarks without external supervision.

ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning

arXiv cs.CL · 2026-05-25

ARES is a framework that automatically constructs rubric-based RL data from pretraining documents. It generates question-answer pairs with weighted rubrics that provide instance-level reward supervision for open-ended LLM responses, and it outperforms existing methods on multi-dimensional open-ended tasks.
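The summary does not specify how ARES turns weighted rubrics into a reward. A common generic choice, assumed here rather than taken from the paper, is a normalized weighted sum: each rubric criterion carries a non-negative weight, a grader scores the response on each criterion in [0, 1], and the reward is the weighted score divided by the total weight. `RubricItem` and `weighted_rubric_reward` are hypothetical names, and the grader itself is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RubricItem:
    criterion: str
    weight: float  # hypothetical importance weight, assumed non-negative


def weighted_rubric_reward(rubric: List[RubricItem], judgments: Dict[str, float]) -> float:
    """Normalized weighted rubric score in [0, 1].

    `judgments` maps each criterion to a score in [0, 1]; in practice this would
    come from a grader model (assumed, not shown). Missing criteria count as 0.
    """
    total = sum(item.weight for item in rubric)
    if total == 0:
        return 0.0
    earned = sum(item.weight * judgments.get(item.criterion, 0.0) for item in rubric)
    return earned / total


# Hypothetical rubric for one generated question.
rubric = [
    RubricItem("states the main claim of the source passage", 3.0),
    RubricItem("cites supporting evidence", 2.0),
    RubricItem("is concise", 1.0),
]
judgments = {
    "states the main claim of the source passage": 1.0,
    "cites supporting evidence": 0.5,
    "is concise": 1.0,
}
print(weighted_rubric_reward(rubric, judgments))
```

Normalizing by the total weight keeps rewards comparable across questions whose rubrics have different numbers of criteria.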

Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR

Hugging Face Daily Papers · 2026-05-19

This paper introduces POW3R, a policy-aware rubric reward framework for reinforcement learning with verifiable rewards (RLVR). It shows that static rubric aggregation misallocates the learning signal, and that POW3R converges faster and performs better across multiple settings.

Bootstrapping Post-training Signals for Open-ended Tasks via Rubric-based Self-play on Pre-training Text

arXiv cs.CL · 2026-04-23

Cornell researchers propose POP, a self-play framework that lets an LLM generate its own rubrics and training pairs for open-ended tasks, improving Qwen-2.5-7B on healthcare QA, creative writing, and instruction following without human labels.
