Tag
This paper introduces BehaviorBench, a comprehensive benchmark for evaluating foundation models on behavioral science tasks including behavior prediction, strategic decision-making, subject-trait inference, and behavioral knowledge application. It also presents Be.FM-1.5, a fine-tuned model that achieves strong distributional alignment, highlighting the gap between general-purpose and behaviorally adapted models.
PORTER is a language-grounded structured EHR foundation model that represents clinical events through text descriptions and numeric values, enabling vocabulary-independent transfer across institutions without retraining. On pediatric prediction tasks, PORTER matches fixed-vocabulary models and recovers 97.1% of AUROC when transferred to unseen event descriptions.
The NAIRR pilot program, powered by NVIDIA AI infrastructure, has supported over 700 research projects, including the development of the Walrus foundation model for fluid simulations and the MIST molecular foundation models for energy storage.
This article analyzes why AI's sample efficiency is far lower than that of humans: frontier models require massive amounts of domain-specific data, while humans can learn from just a few examples. It describes this "data black hole" as a core bottleneck in current AI development. Through several comparisons (annotation volume, robot manipulation, driving) and rebuttals of common objections, the article argues that the gap is severe and explores its implications for the goals of AI automation.
Poolside introduces Laguna, a foundation model for agentic coding and long-horizon work.
An AI researcher announces joining AmiLabs as Director of Research in Paris, working with Yann LeCun and a team focused on world modeling and foundation models.
This paper introduces regime-stratified evaluation for time series foundation models, revealing that aggregate metrics hide severe failures during traffic regime transitions, and proposes bimodal mixture augmentation to improve coverage while preserving overall accuracy.
This paper introduces DeFAb, a verifiable benchmark for defeasible abduction in foundation models. It comprises over 372K instances and reveals that current frontier models perform poorly on this form of logical reasoning, with accuracy as low as 23.5% under robust evaluation.
This paper finds that egocentric human video, when processed with a filtering and labeling pipeline, can outperform teleoperated real-robot data for pretraining embodied foundation models, achieving lower validation loss and higher success rates on real-robot tasks.
This paper introduces DeepInsight, a unified evaluation infrastructure for Physical AI stacks that spans from foundation model decoding to whole-body control, preserving heterogeneity through three narrow abstractions to enable cross-layer diagnostics.
This paper systematically evaluates foundation model representations for multimodal cancer analysis, benchmarking unimodal and multimodal fusion strategies on real-world cohorts, and assessing trustworthiness via conformal prediction.
fm-proxy is a drop-in proxy that lets any app accepting an OpenAI API URL run macOS 27's local and Private Cloud Compute Foundation models, with no extra servers or keys.
This paper formalizes the 'Impedance Mismatch' between foundation models and knowledge graphs, and proposes a theoretical roadmap for neuro-symbolic fusion using structured residual streams, vector symbolic architectures, and orthogonal subspace editing.
This paper systematically surveys the core components of medical embodied AI, emphasizing the coordinated integration of perception, decision-making, and action in clinical environments, and reviews representative applications, datasets, and future research directions.
This paper presents The AI Scientist, a system that automates the entire research lifecycle from idea generation to peer review, demonstrating AI's growing capacity for scientific contribution.
This paper investigates explicit encoding of ICD-10-CM hierarchy in EHR foundation models, using hierarchical token augmentation and graph-based code representations. Experiments on MIMIC-IV and eICU show improvements over flat code representations for in-domain and cross-dataset prediction tasks.
Apple has developed its own foundation models, signaling its entry into the large language model space with proprietary technology.
This paper proposes ORCA, a method for black-box online adaptation of time series foundation models by learning the context of predictive errors. It demonstrates effectiveness across five TSFMs and eight datasets, addressing the challenge of adapting closed-source API-based models.
This paper introduces a Multi-Modal Agent framework for power distribution defect detection, evaluating foundation models on perception, reasoning, and tool usage capabilities, with a new domain-specific dataset and benchmark.
This tutorial presents a coherent framework unifying diverse world modeling approaches for physical AI, covering explicit and implicit world models and their role in prediction, reasoning, and planning.