Introduces TableVista, a comprehensive benchmark for evaluating foundation models on multimodal table reasoning under visual and structural complexity, comprising 3,000 problems expanded into 30,000 multimodal samples. Evaluation of 29 models reveals performance degradation on complex layouts and vision-only settings.
This survey paper provides a comprehensive review of audio-visual intelligence within large foundation models, establishing a unified taxonomy, synthesizing core methodologies, and outlining key datasets, benchmarks, and open research challenges.
Discussion on how compute access is becoming the primary driver of AI progress, creating a divide between organizations that can train large models and those limited to fine-tuning existing foundation models.
The Swiss AI Initiative, launched in December 2023 with more than 10 million GPU hours and CHF 20 million in funding, is a major open-science effort to develop AI foundation models, involving 800+ researchers across Swiss institutions. Backed by the Alps supercomputer and jointly supported by ETH Zurich and EPFL, it aims to provide transparent models and datasets for Swiss stakeholders.
OpenProtein.AI, founded by MIT researchers Tristan Bepler and Tim Lu, has launched a no-code platform that gives biologists access to advanced AI models for protein design and engineering.
This paper introduces Shesha, a geometric stability metric that quantifies directional coherence of single-cell CRISPR perturbation responses using mean cosine similarity, revealing regulatory architecture and predicting cellular stress across 2,200+ perturbations in five CRISPR datasets.
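A directional-coherence score of this kind can be illustrated as the mean pairwise cosine similarity over a set of per-cell response vectors. The sketch below is a minimal, hedged illustration of that computation; the function and variable names are illustrative and not taken from the paper.

```python
# Minimal sketch: mean pairwise cosine similarity as a directional-coherence
# score over per-cell perturbation response vectors. Names are illustrative,
# not the paper's actual implementation.
import numpy as np

def mean_cosine_similarity(responses: np.ndarray) -> float:
    """responses: (n_cells, n_genes) matrix, one response vector per cell.
    Returns the mean cosine similarity over all distinct cell pairs."""
    norms = np.linalg.norm(responses, axis=1, keepdims=True)
    unit = responses / np.clip(norms, 1e-12, None)  # normalize each row
    sim = unit @ unit.T                             # pairwise cosine similarities
    iu = np.triu_indices(sim.shape[0], k=1)         # exclude self-pairs
    return float(sim[iu].mean())

# Perfectly aligned responses score 1.0; incoherent ones score near 0.
aligned = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
print(mean_cosine_similarity(aligned))  # → 1.0
```

A score near 1 indicates that cells respond to a perturbation in a consistent direction in expression space, while a score near 0 indicates incoherent, scattered responses.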
RoboLab is a high-fidelity simulation benchmarking framework for evaluating task-generalist robotic policies, introducing the RoboLab-120 benchmark with 120 tasks across visual, procedural, and relational competency axes. It enables scalable, realistic task generation and systematic analysis of policy behavior under controlled perturbations to assess true generalization capabilities.
NVIDIA highlights breakthroughs in physical AI and robotics during National Robotics Week, announcing new technologies including NVIDIA Isaac GR00T open models for natural language instruction understanding, Cosmos world models for synthetic data generation, Newton 1.0 physics engine, and expanded simulation capabilities with Isaac Sim 6.0 and Isaac Lab 3.0 to accelerate robot development from training to real-world deployment.
NVIDIA CEO Jensen Huang argues that the future of AI requires both open and proprietary models working together as orchestrated systems. NVIDIA announced the Nemotron Coalition, a global collaboration advancing open frontier models, with a base model co-developed by Mistral AI and NVIDIA.
DeepMind announces Genie 3, a general-purpose world model capable of generating interactive environments from text prompts at 24 fps and 720p resolution, with improved consistency and real-time interactivity compared to previous versions.
OpenAI and UC Berkeley's workshop on Confidence-Building Measures for Artificial Intelligence brought together stakeholders to develop strategies for mitigating geopolitical risks from foundation models, identifying six key CBMs including crisis hotlines, incident sharing, model transparency, content provenance, red teaming, and dataset sharing.
OpenAI introduced Video PreTraining (VPT), a semi-supervised method that trains neural networks to play Minecraft by learning from 70,000 hours of unlabeled human gameplay video combined with a small labeled dataset. The model learns complex sequential tasks using the native human interface (keyboard and mouse) and demonstrates capabilities like crafting diamond tools and pillar jumping, representing progress toward general computer-using agents.