Introduces horizon-constrained Rashomon sets to characterize how model multiplicity evolves in chaotic systems. The paper proves exponential contraction of predictive equivalence and develops decision-aligned algorithms that improve decision quality by 18-34%.
This paper presents a nationwide EHR-based chronic rhinosinusitis prediction model using demographic-stratified models and a hybrid feature-selection pipeline, achieving an overall AUC of 0.8461 on data from the All of Us Research Program.
A researcher describes being harassed by an independent researcher demanding specific citations and phrasing in their paper, raising concerns about aggressive solicitation tactics in the academic community.
Google DeepMind has taken a minority stake in EVE Online's developer (now Fenris Creations) to use the game as a testbed for AI models, studying intelligence in complex, dynamic systems without affecting live players.
A new JAMA paper finds that nonprofit hospitals spent billions on management consultants with no significant impact on financial or patient outcomes.
Hugging Face Hub has surpassed 4,000 public reinforcement learning environments, positioning itself as potentially the largest platform for RL environments.
Microsoft Research introduced Agentic-iModels, a framework where coding agents evolve scikit-learn regressors optimized for LLM interpretability rather than human readability, outperforming traditional interpretable ML methods across 65 datasets.
Anthropic has introduced a new 'sleep' mechanism for AI agents inspired by biological hippocampal replay and dreaming to extract patterns and reorganize memories, aiming to prevent capability plateaus associated with raw context window reliance.
This paper introduces an auto-research framework using specialist agents to iteratively refine training recipes through an empirical loop of code execution and feedback. The system autonomously improves performance on tasks like Parameter Golf and NanoChat without human intervention by leveraging lineage feedback.
This paper introduces MMDG-Bench, a unified benchmark for multimodal domain generalization that reveals limited progress in current methods and significant robustness challenges across diverse tasks.
ClearMesh is a new platform offering Git-like versioning capabilities for datasets, AI models, and binary folders.
Stream-T1 is a proposed framework for test-time scaling in streaming video generation, improving temporal consistency and quality through mechanisms like noise propagation and reward pruning. The paper addresses the high computational costs of existing diffusion-based methods by leveraging chunk-level synthesis.
This paper introduces Side-by-Side Interleaved Reasoning, a method for controlling disclosure timing in autoregressive models to improve accuracy and efficiency. It demonstrates improved performance on benchmarks using Qwen3 models by interleaving private reasoning with partial disclosures.
This paper introduces the Neural Rule Inducer (NRI), a foundation model for zero-shot logical rule induction that uses domain-agnostic statistical properties to generalize across tasks without retraining.
This paper introduces TabEmbed, a generalist embedding model for tabular data that unifies classification and retrieval tasks, along with TabBench, a new benchmark for evaluating tabular understanding.
Hugging Face has released version 5.8.0 of the Transformers library, a widely used open-source framework for natural language processing and deep learning.
This article introduces a polynomial autoencoder that improves upon PCA for compressing transformer embeddings by using a quadratic decoder to capture nonlinear variance. Benchmarks on BEIR show it significantly outperforms standard PCA and Matryoshka embeddings in retrieval quality while maintaining high compression ratios.
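The core idea can be illustrated with a minimal sketch: pair a standard linear PCA encoder with a decoder that also uses quadratic (pairwise-product) features of the latent code. Everything below is hypothetical toy data, not the article's architecture or BEIR setup, and the decoder is fit in closed form by least squares rather than trained, purely to show why a quadratic decoder can recover variance that PCA's linear reconstruction leaves in the residual.

```python
import numpy as np

# Illustrative sketch (not the article's implementation): linear PCA encoder
# plus a quadratic decoder, on toy data whose ambient coordinates depend
# quadratically on a 2-D latent variable.
rng = np.random.default_rng(0)

z = rng.normal(size=(500, 2))                        # true 2-D latent
X = np.column_stack([
    z[:, 0], z[:, 1],                                # linear coordinates
    0.5 * z[:, 0] ** 2, 0.5 * z[:, 1] ** 2,          # quadratic coordinates
    0.5 * z[:, 0] * z[:, 1],
]) + 0.01 * rng.normal(size=(500, 5))                # small observation noise

k = 2                                                # compressed dimension
mu = X.mean(axis=0)
Xc = X - mu

# Linear encoder: project onto the top-k principal directions.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:k].T                                         # (d, k) projection
Z = Xc @ W                                           # latent codes

# Quadratic decoder: regress Xc on [1, z, z_i * z_j] features of the code.
def quad_features(Z):
    cross = [Z[:, i] * Z[:, j]
             for i in range(Z.shape[1]) for j in range(i, Z.shape[1])]
    return np.column_stack([np.ones(len(Z)), Z, *cross])

B, *_ = np.linalg.lstsq(quad_features(Z), Xc, rcond=None)

err_pca = np.mean((Xc - Z @ W.T) ** 2)               # linear reconstruction
err_quad = np.mean((Xc - quad_features(Z) @ B) ** 2) # quadratic reconstruction
print(f"linear (PCA) decoder MSE:  {err_pca:.4f}")
print(f"quadratic decoder MSE:     {err_quad:.4f}")
```

On this toy data the quadratic decoder reconstructs the curved coordinates that the rank-k linear decoder cannot, which is the same gap the article's learned polynomial autoencoder targets for transformer embeddings.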
Beacon Biosignals is using a lightweight EEG headband and machine learning to map brain activity during sleep, aiming to create a foundation model for brain health and accelerate clinical trials for neurological disorders.
vLLM v0.20.0 is released, an open-source library for high-throughput LLM inference and serving, featuring PagedAttention and support for various hardware architectures.
Hugging Face open-sourced ml-intern, an autonomous agent that reads ML papers, discovers datasets, trains models, debugs failures, and ships production-ready models to the Hub, automating the entire post-training workflow.