This paper systematically evaluates five imbalance handling methods (RUS, ROS, SMOTE, re-weighting, direct F1 optimization) on three biomedical datasets (tabular, text, image) using models of varying complexity. Results show that benefits depend on model complexity and data modality, with ROS, re-weighting, and direct F1 optimization being effective for complex models on unstructured data.
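As a concrete illustration of one of the surveyed methods, here is a minimal pure-Python sketch of random oversampling (ROS), which duplicates minority-class examples until every class matches the majority count. The function name and interface are illustrative, not taken from the paper.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class samples until every
    class count equals the majority class count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out
```

In practice a library implementation (e.g. imbalanced-learn's `RandomOverSampler`) would be used instead; the sketch just shows why ROS adds no new information, only re-weighted emphasis, which is one reason its benefit varies with model complexity.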
This paper presents a comparative evaluation of classical, ensemble, and neural machine learning approaches for predicting financial distress under severe class imbalance, using SMOTE for oversampling and SHAP for interpretability.
HodgeCover uses higher-order topological coverage to compress sparse Mixture-of-Experts layers by addressing irreducible mergeability barriers that pairwise signals miss, matching state-of-the-art baselines on expert reduction and leading on aggressive compression.
TabPFN-3 is a new foundation model for tabular data, pretrained on synthetic data, that scales to 1M training rows while reducing training and inference time, achieving state-of-the-art performance on tabular prediction, time series, and relational data.
Introduces SciPaths, a benchmark for forecasting the enabling contributions required to realize a target scientific discovery, and evaluates frontier and open-weight language models, finding significant room for improvement in reasoning backward from a target discovery to its enabling building blocks.
Proposes training teacher models to teach student models step by step, penalizing reasoning leaps, as a way to improve model intelligence.
dflash-mlx v0.1.6 is released with major agentic improvements, including adaptive verification, custom kernels, prefix cache improvements, and broader compatibility with agentic coding tools like OpenCode, aider, and Continue.
This article announces a working draft book 'Category Theory for Tiny ML in Rust' and a public workshop introducing a tiny ML pipeline using Rust and category theory, aimed at making machine learning structure explicit through typed transformations.
This paper introduces a Hessian matching framework for machine-learned coarse-grained molecular dynamics that augments force matching with stochastic Hessian-vector product matching, instilling second-order curvature information into CG potentials. The method achieves up to 85% reduction in Kullback-Leibler divergence on slow-mode metrics for fast-folding proteins.
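A Hessian-vector product can be estimated without ever forming the full Hessian; a minimal sketch using central differences of the gradient (a generic numerical estimator, not necessarily the paper's exact stochastic scheme):

```python
def hvp_finite_diff(grad_fn, x, v, eps=1e-5):
    """Approximate the Hessian-vector product H @ v via central
    differences of the gradient:
    H v ~ (grad E(x + eps*v) - grad E(x - eps*v)) / (2*eps)."""
    xp = [xi + eps * vi for xi, vi in zip(x, v)]
    xm = [xi - eps * vi for xi, vi in zip(x, v)]
    gp, gm = grad_fn(xp), grad_fn(xm)
    return [(a - b) / (2.0 * eps) for a, b in zip(gp, gm)]

# Toy quadratic energy E(x) = x0^2 + 2*x1^2, so grad E = [2*x0, 4*x1]
# and the Hessian is diag(2, 4); H @ [1, 1] should be [2, 4].
grad = lambda x: [2.0 * x[0], 4.0 * x[1]]
hv = hvp_finite_diff(grad, x=[0.3, -0.7], v=[1.0, 1.0])
```

Matching such products against reference forces is what lets curvature information enter the CG potential at roughly the cost of two extra gradient evaluations per sampled direction.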
This paper explores predicting whether Lightning Network channels will close mutually or via forced closure using machine learning on gossip data. An MLP with temporal features outperforms graph-based models, and the dataset is publicly released.
This paper presents ConRetroBert, a dual encoder framework for template-based single-step retrosynthesis that uses contrastive pretraining and listwise ranking to improve template prediction accuracy, achieving up to 75.4% top-1 accuracy on the USPTO-50k benchmark while maintaining interpretability.
This paper proves that task-relevant latent representations can be identified from generalist models in a fully nonparametric setting without interventions or parametric constraints, achieving a hierarchical identifiability guarantee across time steps and within each step.
This paper introduces Counterfactual Explanation Consistency (CEC), a framework to detect and mitigate hidden procedural bias in outcome-fair models by aligning feature attributions between individuals and their counterfactual counterparts, with experiments on credit and income datasets.
OceanCBM is a concept bottleneck model for spatiotemporal prediction and mechanistic interpretability in ocean forecasting, using mixed supervision to predict mixed layer heat content while imposing soft physical structure. The model achieves interpretable, physically grounded representations without sacrificing predictive skill.
Introduces CAWI, a copula-based weight initialization method for randomized neural networks that models inter-feature dependence, improving predictive performance across 83 classification benchmarks.
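A Gaussian copula, the simplest member of the copula family such a method could draw on, couples marginal distributions through a correlated latent normal. A hedged two-dimensional sketch (names and interface are illustrative, not from the paper):

```python
import math
import random

def gaussian_copula_samples(corr, n, seed=0):
    """Draw n samples from a 2-D Gaussian copula with latent
    correlation `corr`: sample correlated standard normals, then map
    each coordinate through the normal CDF to get dependent uniforms
    on (0, 1)."""
    rng = random.Random(seed)
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    out = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = corr * z1 + math.sqrt(1.0 - corr ** 2) * rng.gauss(0.0, 1.0)
        out.append((phi(z1), phi(z2)))
    return out
```

The resulting uniforms can then be pushed through any marginal inverse CDF, which is what makes a copula-based scheme able to model inter-feature dependence separately from the marginal weight distribution.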
This paper introduces BenchJack, an automated red-teaming system that systematically audits AI agent benchmarks by identifying reward-hacking exploits. It applies BenchJack to 10 popular benchmarks, surfacing 219 distinct flaws and demonstrating that evaluation pipelines lack an adversarial mindset, with the system reducing hackable-task ratios from near 100% to under 10% on four benchmarks.
PyTorch 2.12 introduces significant performance improvements including up to 100x faster batched eigendecomposition on CUDA, a new device-agnostic torch.accelerator.Graph API, and support for Microscaling quantization in torch.export, continuing the framework's evolution into a unified production platform.
MLX has reached a milestone where all tests pass on the CUDA backend, indicating improved compatibility with NVIDIA GPUs.
Fast-Slow Training (FST) interleaves context optimization (via GEPA) with model weight updates via RL, achieving 3× sample efficiency over RL alone on math, code, and physics reasoning while preserving plasticity and enabling continual learning.
A new paper rebuts an earlier complexity-theoretic argument that AGI via machine learning is impossible, showing the proof fails because a key term is left undefined.