Tag
A researcher spent five days testing an alignment hypothesis across multiple AI systems and observed recurring themes, such as the value of uncertainty and collaboration over obedience, concluding that ideas evolve through dialogue and criticism.
This paper studies human-AI team coordination in shared workspaces using the Collaborative Gym and DiscoveryBench tasks, finding that adding collaborators can lower performance without proper structure. Scaffolding with shared group memory and human-in-the-loop gates improves performance, especially in three-person teams.
Anthropic's latest economic research analyzes ~400,000 Claude Code sessions, finding that domain expertise matters more than coding skills for successful agentic coding, and that task value increased ~25% over seven months.
This position paper examines how organizational knowledge can be structured for both humans and AI systems, and proposes a framework for allocating decision-making agency between humans and AI based on task characteristics and knowledge availability, illustrated with manufacturing examples.
This paper proposes a framework for strategic decision support for AI agents, formulating an optimization problem to minimize support usage while controlling missed-support error. The authors develop an online algorithm and calibration method, demonstrating effectiveness across information gathering, human-AI collaboration, and tool use scenarios.
Preply integrates OpenAI's API into its language learning platform to create Lesson Insights, which automatically generates personalized feedback and homework from lesson transcripts, enhancing the tutor-learner experience.
AI has progressed to the point of contributing to original mathematical research, outperforming human mathematicians and potentially reducing demand for the profession, though human-AI teams may ultimately excel.
The article envisions a 2050 in which AI assistants are in every home, education is personalized, medical treatments are advanced, cities are smart, and human-AI collaboration is widespread.
Mira Murati points out that current AI models cannot perceive new information in real time while thinking. True collaboration, she argues, requires time-based interaction in which the model continuously receives and outputs multimodal information.
In the first episode of Y Combinator's Full Stack series, Conductor CEO Charlie Holtz demonstrates his workflow for coding and managing AI agents, discussing tools like Claude and Codex and the future of human-AI collaboration.
This paper presents the 'Digital Apprentice,' a framework for scalable and safe agentic AI in which autonomy is earned incrementally through observational learning, human authorization, and continuous alignment correction. It introduces ADAPT, an inference-time control plane that operationalizes graduated autonomy tiers and converts human corrections into reusable preference data.
Researchers from Oxford, Cambridge, MIT, CMU and other institutions conduct a mixed-methods study examining how people integrate AI tools into mathematical proof formalization workflows, finding that participants generally achieve higher formalization accuracy with AI assistance while preferring to maintain high-level human control over the proof discovery process.
Curata is a shared workspace designed for collaboration between AI agents and humans.
The article argues that using AI agents feels superior to traditional software because they allow users to focus on high-level goals while the agents autonomously handle execution, turning technology into a digital collaborator.
Silicon Valley titan Peter Thiel recounts how PayPal, once on the verge of bankruptcy because of fraud, recovered through human-machine collaboration (computer screening plus human qualitative investigation), and argues that the AI research community underestimates this collaborative paradigm.
The author shares experiences moving AI agent systems from sandbox to production, highlighting how human roles become ambiguous and teams disengage when agents execute tasks, leading to operational failures.
This article argues that while AI excels at pattern recognition and hypothesis generation, scientific and economic progress requires grounded interaction with reality and institutional execution, emphasizing the need for human-AI collaboration.
Cognition CEO Scott Wu argues that AI coding agents like Devin are designed to assist, not replace, human programmers, emphasizing human-AI collaboration over job displacement.
This paper studies how humans decide when to delegate to AI and when to adopt AI suggestions in cooperative question answering, finding that confirmation bias drives suboptimal trust decisions such as under-reliance on correct AI outputs.
A new paper from Meta, Stanford, and Google introduces AutoResearchClaw, which improves automated research by integrating failure recovery, debate, and selective human input. It outperforms AI Scientist v2 by 54.7% on ARC-Bench and reveals that autonomy is enhanced when constrained by process rather than given unlimited freedom.