When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning

Hugging Face Daily Papers 06/18/26, 12:00 AM Papers

tabular-data self-supervised-learning medical-ai binning representation-learning benchmark

Summary

This paper proposes Adaptive Binning, a learning-coupled feature-wise coarse-to-fine curriculum for tabular self-supervised learning that adaptively discretizes features, improving representations on medical datasets and establishing a unified benchmark.

Medical tabular data are ubiquitous in clinical research, but deep learning for tables remains underexplored because reliable labels often require costly expert adjudication, even though structured clinical variables are routinely available in tabular form. Self-supervised learning can leverage these unlabeled tables, and recent binning-based pretext tasks offer a promising inductive bias, but existing objectives fix a single global quantile discretization and apply feature-agnostic supervision. We propose Adaptive Binning, a training-adaptive discretization pretext task for tabular SSL that couples discretization to learning through a feature-wise coarse-to-fine curriculum. Motivated by the spectral bias of neural networks and the principles of curriculum learning, our method progressively refines the discretization of each feature upon plateau detection and selects representation-aware splits to jointly improve value-space concentration and representation-space coherence. A heterogeneity-aware objective unifies categorical reconstruction with ordinal supervision for numerical features. Experiments on public medical tabular datasets under unified evaluation protocols show consistent gains for linear probing and fine-tuning without dataset-specific discretization tuning. We further introduce a medical tabular SSL benchmark with standardized protocols to support reproducible progress in this underexplored domain. Our code is available at https://github.com/labhai/Adaptive-Binning.
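
To make the coarse-to-fine idea concrete, here is a minimal sketch of a per-feature curriculum that starts from a few quantile bins and adds one bin whenever that feature's pretext loss stops improving. It is not the authors' implementation: the class name `FeatureBinner`, the helper `step_curriculum`, the patience-based plateau test, and all default values are hypothetical. In particular, the abstract describes representation-aware split selection, which this summary does not specify, so the sketch swaps in a much simpler rule (split the most populated bin at its median).

```python
import numpy as np


class FeatureBinner:
    """Coarse-to-fine bin edges for one numerical feature (illustrative only)."""

    def __init__(self, values, start_bins=2, max_bins=16, patience=3, min_delta=1e-3):
        self.values = np.asarray(values, dtype=float)
        # Start coarse: interior edges at evenly spaced quantiles.
        qs = np.linspace(0.0, 1.0, start_bins + 1)[1:-1]
        self.edges = np.unique(np.quantile(self.values, qs))
        self.max_bins = max_bins
        self.patience = patience      # assumed plateau rule: epochs without improvement
        self.min_delta = min_delta    # assumed minimum improvement that counts
        self.best_loss = np.inf
        self.stale_epochs = 0

    @property
    def n_bins(self):
        return len(self.edges) + 1

    def assign(self, x):
        """Map raw values to bin indices in [0, n_bins)."""
        return np.searchsorted(self.edges, x, side="right")

    def plateaued(self, loss):
        """Record this epoch's loss; True once it has stalled for `patience` epochs."""
        if loss < self.best_loss - self.min_delta:
            self.best_loss = loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience

    def refine(self):
        """Split one bin in two. Returns True if a new edge was added.

        Stand-in rule (NOT the paper's representation-aware criterion):
        split the most populated bin at the median of its members.
        """
        if self.n_bins >= self.max_bins:
            return False
        ids = self.assign(self.values)
        counts = np.bincount(ids, minlength=self.n_bins)
        for b in np.argsort(-counts, kind="stable"):
            members = self.values[ids == b]
            if members.size < 2:
                continue
            new_edge = np.median(members)
            if members.min() < new_edge:  # the edge must separate some members
                self.edges = np.sort(np.append(self.edges, new_edge))
                # The pretext target changed, so restart plateau tracking.
                self.best_loss = np.inf
                self.stale_epochs = 0
                return True
        return False


def step_curriculum(binners, per_feature_loss):
    """Call once per epoch; refines only the features whose loss has plateaued."""
    refined = []
    for name, loss in per_feature_loss.items():
        binner = binners[name]
        if binner.plateaued(loss) and binner.refine():
            refined.append(name)
    return refined
```

Because each feature keeps its own loss history and bin edges, a feature that is still improving stays coarse while a stalled one moves to a finer resolution, which is the feature-wise behavior the abstract contrasts with a single global quantile discretization.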

Source: https://huggingface.co/papers/2606.19827

This paper proposes Adaptive Binning for medical tabular self-supervised learning. The core idea is to replace fixed global quantile binning with a learning-coupled, feature-wise coarse-to-fine curriculum that determines when to refine each feature, where to split its bins, and how to supervise mixed categorical–numerical schemas through type-aware ordinal reconstruction.
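
The "how" part, supervising numerical and categorical columns differently, can be illustrated with a small sketch. The page does not give the paper's exact loss, so everything here is an assumption: numerical features are supervised with a cumulative (threshold) encoding of the bin index and a binary cross-entropy over thresholds, while categorical features use ordinary softmax cross-entropy. The function names `ordinal_targets`, `ordinal_bce`, and `categorical_ce` are hypothetical.

```python
import numpy as np


def ordinal_targets(bin_ids, n_bins):
    """Cumulative encoding: target[j] = 1 if the bin index is greater than j.

    A feature with n_bins bins gets n_bins - 1 binary threshold targets.
    """
    bin_ids = np.asarray(bin_ids)
    thresholds = np.arange(n_bins - 1)
    return (bin_ids[:, None] > thresholds[None, :]).astype(float)


def ordinal_bce(logits, targets):
    """Mean binary cross-entropy over thresholds, computed stably from logits."""
    z = np.asarray(logits, dtype=float)
    t = np.asarray(targets, dtype=float)
    return float(np.mean(np.maximum(z, 0.0) - z * t + np.log1p(np.exp(-np.abs(z)))))


def categorical_ce(logits, labels):
    """Mean softmax cross-entropy for an unordered categorical feature."""
    z = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    z = z - z.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


# A sample in the top bin of a 4-bin numerical feature.
target = ordinal_targets([3], n_bins=4)
near_miss = np.array([[5.0, 5.0, -5.0]])    # prediction consistent with bin 2
far_miss = np.array([[-5.0, -5.0, -5.0]])   # prediction consistent with bin 0
print(ordinal_bce(near_miss, target) < ordinal_bce(far_miss, target))
```

The design point of a threshold encoding is that predicting a neighboring bin violates fewer thresholds than predicting a distant one, so the loss respects the ordering of numerical bins. Plain cross-entropy, used here for categorical columns, treats every wrong class as equally wrong.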

We show that adaptive discretization yields stronger representations across diverse public medical tabular datasets in both linear probing and fine-tuning evaluations. We also establish a unified benchmark for reproducible medical tabular self-supervised learning.

Similar Articles

TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding

MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image

GOTabPFN: From Feature Ordering to Compact Tokenization for Tabular Foundation Models on High-Dimensional Data

Correcting Class Imbalance in Prior-Data Fitted Networks for Tabular Classification

Disentangling Sampling from Training Budget in Class-Imbalanced CT Body Composition Segmentation
