When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning
Summary
This paper proposes Adaptive Binning, a learning-coupled feature-wise coarse-to-fine curriculum for tabular self-supervised learning that adaptively discretizes features, improving representations on medical datasets and establishing a unified benchmark.
Source: https://huggingface.co/papers/2606.19827

This paper proposes Adaptive Binning for medical tabular self-supervised learning. The core idea is to replace fixed global quantile binning with a learning-coupled, feature-wise coarse-to-fine curriculum that determines when to refine each feature, where to split its bins, and how to supervise mixed categorical–numerical schemas through type-aware ordinal reconstruction.
The paper reports that adaptive discretization yields stronger representations than fixed binning across diverse public medical tabular datasets, in both linear probing and fine-tuning evaluations. It also establishes a unified benchmark for reproducible medical tabular self-supervised learning.
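To make the "when" and "where" questions concrete, here is a minimal sketch in Python with NumPy. It is not the paper's algorithm: the summary above does not state the refinement criteria, so the plateau test in `should_refine`, the highest-loss split rule in `refine_once`, and all function names and parameters (`patience`, `tol`, `min_count`) are illustrative assumptions. Only `quantile_edges` corresponds to something the abstract names, the fixed global quantile binning that the method replaces. The type-aware ordinal reconstruction objective is not sketched here.

```python
import numpy as np

def quantile_edges(x, n_bins):
    """Baseline: fixed global quantile binning (interior edges only)."""
    qs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.unique(np.quantile(x, qs))

def assign_bins(x, edges):
    """Map each value to an ordinal bin index in [0, len(edges)]."""
    return np.searchsorted(edges, x, side="right")

def should_refine(loss_history, patience=3, tol=1e-3):
    """ASSUMED 'when' rule: refine once this feature's loss has plateaued."""
    if len(loss_history) <= patience:
        return False
    recent = loss_history[-(patience + 1):]
    return (recent[0] - recent[-1]) < tol

def refine_once(x, edges, per_sample_loss, min_count=8):
    """ASSUMED 'where' rule: split the bin with the highest mean loss at its median."""
    bins = assign_bins(x, edges)
    best_bin, best_loss = None, -np.inf
    for b in range(len(edges) + 1):
        mask = bins == b
        if mask.sum() < min_count:
            continue  # too few samples to split reliably
        loss = per_sample_loss[mask].mean()
        if loss > best_loss:
            best_bin, best_loss = b, loss
    if best_bin is None:
        return edges
    new_edge = np.median(x[bins == best_bin])
    return np.unique(np.append(edges, new_edge))

# Toy usage on one synthetic numerical feature.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
edges = quantile_edges(x, 4)      # coarse start: 4 bins
stand_in_loss = np.abs(x)         # placeholder for a per-sample reconstruction loss
if should_refine([1.0, 0.5, 0.5, 0.5, 0.5]):
    edges = refine_once(x, edges, stand_in_loss)
```

In this sketch each feature keeps its own `edges` array and loss history, so different features can be refined at different times, which is the feature-wise, coarse-to-fine behavior the abstract describes.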
Similar Articles
TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding
This paper introduces TabEmbed, a generalist embedding model for tabular data that unifies classification and retrieval tasks, along with TabBench, a new benchmark for evaluating tabular understanding.
MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
Introduces MulTaBench, a benchmark of 40 datasets for multimodal tabular learning with text and image modalities, demonstrating that task-specific embedding tuning improves performance over frozen pretrained embeddings, particularly when modalities provide complementary predictive signals.
GOTabPFN: From Feature Ordering to Compact Tokenization for Tabular Foundation Models on High-Dimensional Data
This paper introduces GOTabPFN, a method that combines Graph-guided Ordering with Local Refinement (GO-LR) and Neuro-Inspired Subunit Compression (NSC) to make small tabular foundation models effective for high-dimensional, low-sample-size prediction without retraining large backbones.
Correcting Class Imbalance in Prior-Data Fitted Networks for Tabular Classification
This paper adapts classical class imbalance techniques to Prior-Data Fitted Networks (PFNs) for tabular classification, finding that thresholding and downsampling perform well due to PFNs' calibration and limited-data capabilities.
Disentangling Sampling from Training Budget in Class-Imbalanced CT Body Composition Segmentation
This paper investigates episodic sampling from few-shot learning for class-balanced batch construction in medical image segmentation, showing improved performance under low-data conditions due to reduced overfitting and extended training iterations, with code available on GitHub.