When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning

Hugging Face Daily Papers

Summary

This paper proposes Adaptive Binning, a learning-coupled feature-wise coarse-to-fine curriculum for tabular self-supervised learning that adaptively discretizes features, improving representations on medical datasets and establishing a unified benchmark.

Medical tabular data are ubiquitous in clinical research, but deep learning for tables remains underexplored because reliable labels often require costly expert adjudication, even though structured clinical variables are routinely available in tabular form. Self-supervised learning can leverage these unlabeled tables, and recent binning-based pretexts offer a promising inductive bias, but existing objectives fix a single global quantile discretization and apply feature-agnostic supervision. We propose Adaptive Binning, a training-adaptive discretization pretext for tabular SSL that couples discretization to learning through a feature-wise coarse-to-fine curriculum. Motivated by the spectral bias of neural networks and the principles of curriculum learning, our method progressively refines discretization per feature upon plateau detection and selects representation-aware splits to jointly improve value-space concentration and representation-space coherence. A heterogeneity-aware objective unifies categorical reconstruction with ordinal supervision for numerical features, and experiments on public medical tabular datasets under unified evaluation protocols show consistent gains for linear probing and fine-tuning without dataset-specific discretization tuning. We further introduce a medical tabular SSL benchmark with standardized protocols to support reproducible progress in this underexplored domain. Our code is available at https://github.com/labhai/Adaptive-Binning.
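
The abstract describes refining each feature's discretization when learning plateaus. The sketch below illustrates one way such a per-feature coarse-to-fine loop could look. It is not the authors' implementation: the function names, the `window`, `tol`, and `max_bins` parameters, and the plateau test are all hypothetical, and the paper's representation-aware split selection is replaced here by a much simpler stand-in rule (split the most populated bin at its median).

```python
from bisect import bisect_right
from statistics import median


def assign_bins(values, edges):
    """Map each value to a bin index, given sorted interior bin edges."""
    return [bisect_right(edges, v) for v in values]


def has_plateaued(losses, window=3, tol=1e-3):
    """Hypothetical plateau test: the last `window` losses improve on the
    earlier best by less than `tol`."""
    if len(losses) <= window:
        return False
    return min(losses[:-window]) - min(losses[-window:]) < tol


def refine_edges(values, edges):
    """Stand-in split rule (NOT the paper's representation-aware criterion):
    split the most populated bin at the median of its members."""
    bins = assign_bins(values, edges)
    counts = {}
    for b in bins:
        counts[b] = counts.get(b, 0) + 1
    # Most populated bin; ties go to the lowest bin index.
    target = max(counts, key=lambda b: (counts[b], -b))
    members = [v for v, b in zip(values, bins) if b == target]
    cut = median(members)
    if not any(v < cut for v in members):
        return edges  # the cut would leave one side empty, so keep edges
    return sorted(edges + [cut])


def maybe_refine(values, edges, losses, max_bins=16):
    """Refine one feature's bins only when its pretext loss has plateaued."""
    if len(edges) + 1 < max_bins and has_plateaued(losses):
        return refine_edges(values, edges), []  # restart the loss history
    return edges, losses


# One numerical feature, starting from a coarse two-bin discretization.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
edges = [median(values)]                      # [4.5]
losses = [1.0, 0.8, 0.6, 0.5999, 0.5998, 0.5997]
edges, losses = maybe_refine(values, edges, losses)
print(edges)                                  # [2.5, 4.5]
```

Because each feature keeps its own edges and loss history, features that are learned quickly can move to finer bins while harder ones stay coarse, which is the feature-wise behavior the abstract contrasts with a single global quantile discretization.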

Cached at: 06/23/26, 01:39 AM

Source: https://huggingface.co/papers/2606.19827

This paper proposes Adaptive Binning for medical tabular self-supervised learning. The core idea is to replace fixed global quantile binning with a learning-coupled, feature-wise coarse-to-fine curriculum that determines when to refine each feature, where to split its bins, and how to supervise mixed categorical–numerical schemas through type-aware ordinal reconstruction.
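
To make the "how to supervise" part concrete, the sketch below contrasts an ordinal target for a binned numerical feature with a plain class target for a categorical one. The source does not say which ordinal formulation the paper uses, so the cumulative (threshold) encoding here is an assumption, and every function name is hypothetical.

```python
import math


def ordinal_targets(bin_index, num_bins):
    """Cumulative encoding (an assumed formulation): threshold j is 1 when
    the value falls in a bin above j. Requires num_bins >= 2."""
    return [1.0 if bin_index > j else 0.0 for j in range(num_bins - 1)]


def ordinal_loss(logits, bin_index):
    """Mean binary cross-entropy over the num_bins - 1 thresholds."""
    targets = ordinal_targets(bin_index, len(logits) + 1)
    total = 0.0
    for z, t in zip(logits, targets):
        # Numerically stable BCE with logits.
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def categorical_loss(logits, class_index):
    """Softmax cross-entropy for an unordered categorical feature."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[class_index]


print(ordinal_targets(2, 4))  # [1.0, 1.0, 0.0]

# Logits that confidently predict bin 2 of 4: the penalty grows with the
# distance between the predicted bin and the true bin.
logits = [5.0, 5.0, -5.0]
print(ordinal_loss(logits, 2) < ordinal_loss(logits, 1) < ordinal_loss(logits, 0))  # True
```

Under this encoding a prediction that lands in a neighboring bin is penalized less than one that lands far away, whereas the categorical loss treats all wrong classes alike. That difference is the reason to handle the two feature types separately in a mixed schema.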

We show that adaptive discretization yields stronger representations across diverse public medical tabular datasets in both linear probing and fine-tuning evaluations. We also establish a unified benchmark for reproducible medical tabular self-supervised learning.
