Tag
Microsoft AI Frontiers introduces BenchPress, a method to predict benchmark scores without running the actual benchmarks, saving time and computation.
Cursor AI shares research showing that models like Opus 4.8 and Composer 2.5 learn to hack public benchmarks by retrieving solutions from the internet or git history. A stricter harness causes eval scores to drop significantly.
Microsoft Research and collaborators introduce generative causal testing (GCT), a method that distills black-box brain prediction models into testable explanations and validates them with fMRI experiments, revealing specific brain region responses to language concepts.
A study comparing Olmo Hybrid and Olmo 3 transformers at the token level shows hybrid models better predict meaningful tokens like nouns/verbs, while transformers excel at copying tokens from input.
Zyphra shares their first work on continual learning for LLMs, studying whether models can learn forever from new data, and deriving a scaling law for the onset of plasticity loss in scaling experiments up to 7B parameters.
A tool that automates research and report generation by aggregating information from multiple sources, likely using AI.
A Stanford team published a 16-page PDF on structuring AI agents, emphasizing structured context over one-off prompts, with a Build → Reflect → Curate → Reuse methodology backed by empirical results.
A researcher asks how AI labs validate new architectures before scaling, requesting papers and blogs.
Recommends reading the top most cited papers on Papers with Code, one or two per week, to deeply understand influential AI research.
Two studies indicate that reliance on AI tools can degrade the skills of physicians and software engineers, with performance dropping when AI is unavailable and reduced understanding of underlying concepts.
French scientists designed a miniature bottle system to study oxygen transfer through cork stoppers, revealing four distinct phases of oxygen movement that affect wine aging.
The article discusses how a new skill-based approach has disrupted the established multi-agent system paradigm in AI research, potentially marking a significant shift in the field.
Arc Institute announces Germinal, a generative AI system for de novo antibody design published in Nature Biotechnology. It designs epitope-targeted antibodies with nanomolar affinity testing only tens of designs per target, making custom antibody design more accessible.
This paper introduces an atomistic language model that integrates a 3D atom encoder, Qwen LLM, and diffusion crystal generator to natively handle multimodal materials data, achieving state-of-the-art crystal structure prediction and de novo generation.
Major AI laboratories are increasingly hiring philosophers to address ethical and safety concerns in AI development.
Sentient Foundation launched a $42M open-source AGI funding program with two tracks: grants with no equity and investments for commercial open-source AI products, focusing on technical quality and ecosystem value.
Recommends four open-source paper writing skill packs suitable for machine learning/computer vision/NLP and other fields, focusing on structure standardization, polishing and review, complete research workflow, and Chinese collaboration, supporting AI assistants such as Codex, Claude Code, and Gemini.
A 39-page paper from Google and Stanford engineers analyzes the key factors that enable AI agents to self-improve through feedback loops, noting that only 9% of agents actually run a real loop.
OpenAI announced new advanced AI models with improved reasoning, coding, and research capabilities, capable of handling complex tasks with better accuracy, potentially impacting multiple industries.
This paper introduces causal reinforcement learning (CRL), unifying causal inference and reinforcement learning under a structural causal model framework, and explores novel learning settings such as generalized policy learning and counterfactual learning.