Tag
This paper derives a scaling law for sketched linear contrastive learning under a Gaussian latent-variable model. It analyzes how the risk decomposes into approximation, optimization, and statistical terms and provides theoretical guidance for balancing model size, data, and compute in contrastive learning.
This paper introduces BitEmbed, an extreme low-bit framework for LLM-based text embeddings that converts pretrained LLM backbones into BitNet-style encoders with ternary weights and quantized activations. It achieves performance comparable to full-precision models while substantially reducing encoding and storage costs.
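For readers unfamiliar with ternary weights, the following is a minimal sketch of the absmean quantizer described for BitNet b1.58: scale a weight matrix by its mean absolute value, then round and clip each entry to {-1, 0, +1}. It assumes BitEmbed's "BitNet-style" encoders quantize weights in roughly this way; activation quantization is omitted, and the helper names `ternarize` and `dequantize` are hypothetical.

```python
def ternarize(weights, eps=1e-8):
    """Quantize a 2-D weight matrix (list of rows) to values in {-1, 0, +1}.

    Returns the ternary matrix and the per-tensor scale gamma, so that
    gamma * ternary approximates the original weights.
    """
    flat = [w for row in weights for w in row]
    # absmean scale: mean absolute value over the whole tensor
    gamma = sum(abs(w) for w in flat) / len(flat)

    def round_clip(w):
        # scale, round to the nearest integer, then clip into [-1, 1]
        return max(-1, min(1, round(w / (gamma + eps))))

    return [[round_clip(w) for w in row] for row in weights], gamma


def dequantize(ternary, gamma):
    """Map ternary codes back to real values using the stored scale."""
    return [[t * gamma for t in row] for row in ternary]


W = [[0.9, -0.05, -1.2], [0.3, 0.0, -0.4]]
T, gamma = ternarize(W)
print(T)  # [[1, 0, -1], [1, 0, -1]]
print(dequantize(T, gamma))
```

Each weight then carries only log2(3) ≈ 1.58 bits of information plus one shared scale per tensor, which is where the storage savings of ternary models come from.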
This paper introduces three datasets (Hell-Char, PaLit-Char, Med-Char) for diachronic representation learning of ancient Greek letterforms and proposes a similarity-weighted supervised contrastive loss with lacuna-driven augmentation to robustly learn character embeddings across centuries of handwriting variation.
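The paper's exact weighting scheme is not described in this summary, so the sketch below only illustrates the general shape of a similarity-weighted supervised contrastive loss: the standard SupCon objective (Khosla et al., 2020) in which each positive pair's term is scaled by a weight. The `pair_weight` matrix (for example, a visual-similarity score between two letterforms) and the function name are hypothetical; with all weights equal to 1 the loss reduces to plain SupCon.

```python
import math


def weighted_supcon_loss(z, labels, pair_weight, tau=0.1):
    """Supervised contrastive loss with per-positive-pair weights.

    z: list of L2-normalized embeddings; labels: class id per embedding;
    pair_weight[i][j]: hypothetical non-negative weight for positive pair (i, j).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    n = len(z)
    total, n_anchors = 0.0, 0
    for i in range(n):
        # log-sum-exp over every other sample (positives and negatives)
        logits = {j: dot(z[i], z[j]) / tau for j in range(n) if j != i}
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))

        positives = [j for j in logits if labels[j] == labels[i]]
        w_sum = sum(pair_weight[i][j] for j in positives)
        if w_sum == 0:
            continue  # anchor has no (weighted) positives; skip it

        # weighted average of -log p(j | i) over this anchor's positives
        loss_i = -sum(pair_weight[i][j] * (logits[j] - log_denom)
                      for j in positives) / w_sum
        total += loss_i
        n_anchors += 1
    return total / max(n_anchors, 1)


z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
ones = [[1.0] * 4 for _ in range(4)]
aligned = weighted_supcon_loss(z, [0, 0, 1, 1], ones)      # labels match geometry
mismatched = weighted_supcon_loss(z, [0, 1, 0, 1], ones)   # labels contradict geometry
print(aligned < mismatched)  # True
```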
This paper introduces the MELD dataset for evaluating whether text embedding models capture mathematical equivalence across different terminologies, and finds that current models fail to do so. It then proposes a contrastive learning approach that aligns informal and formal mathematical statements, improving retrieval on both informal-formal and natural-language tasks.
V-Zero is a label-free framework for fine-grained visual reasoning that uses contrastive evidence gating and on-policy distillation to improve performance without annotated answer labels, and it trains faster than conventional methods.
This paper introduces REVEAL++, a differentiable phenotypic grouping method for vision-language contrastive learning, applied to retinal fundus images and clinical risk narratives for Alzheimer's disease risk prediction, outperforming discrete grouping baselines.
This paper introduces CADE, a framework for time-series question answering that maps each timestep directly into the LLM embedding space and uses a one-directional supervised contrastive loss to align time-series representations with frozen text anchors, outperforming existing baselines on the Time-MQA benchmark.
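A one-directional objective scores only the time-series-to-text direction, unlike the symmetric loss used in CLIP. The sketch below assumes one frozen text-anchor embedding per class and treats each series embedding's own class anchor as the positive in a softmax over all anchors; because the anchors are constants, only the time-series side would receive gradients in a real training loop. The function names and the one-anchor-per-class setup are assumptions, not CADE's published formulation.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def series_to_anchor_loss(series_emb, labels, anchors, tau=0.07):
    """Mean cross-entropy of each series embedding against frozen text anchors.

    Only the series -> text direction is scored; there is no text -> series term.
    """
    total = 0.0
    for z, y in zip(series_emb, labels):
        logits = [cosine(z, a) / tau for a in anchors]
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_norm - logits[y]  # -log softmax probability of the true anchor
    return total / len(series_emb)


anchors = [[1.0, 0.0], [0.0, 1.0]]      # frozen text anchors, one per class
series = [[0.9, 0.1], [0.2, 0.8]]       # time-series embeddings
correct = series_to_anchor_loss(series, [0, 1], anchors)
swapped = series_to_anchor_loss(series, [1, 0], anchors)
print(correct < swapped)  # True
```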
This paper proposes ImpSH, a triplet-based framework for implicit hate speech classification that aligns posts with implied statements and uses context-bounded semi-hard negative mining to improve cross-dataset generalization.
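Semi-hard negative mining, as introduced for FaceNet, selects negatives that lie farther from the anchor than the positive but still within the margin, so they yield a non-zero loss without relying on the very hardest negatives, which can destabilize training. The sketch below shows only that standard rule for a single triplet, with the post as anchor and its implied statement as positive; ImpSH's additional context bound is not modeled, and all names are hypothetical.

```python
import math


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def semi_hard_negative(anchor, positive, negatives, margin=0.2):
    """Index of the closest negative with d(a, p) < d(a, n) < d(a, p) + margin, or None."""
    d_ap = euclid(anchor, positive)
    candidates = [(euclid(anchor, n), i) for i, n in enumerate(negatives)
                  if d_ap < euclid(anchor, n) < d_ap + margin]
    return min(candidates)[1] if candidates else None


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard hinge triplet loss on Euclidean distances."""
    return max(0.0, euclid(anchor, positive) - euclid(anchor, negative) + margin)


post, implied = [0.0, 0.0], [0.5, 0.0]
negatives = [[0.3, 0.0],    # hard: closer than the positive, excluded
             [0.6, 0.0],    # semi-hard
             [0.65, 0.0],   # semi-hard
             [2.0, 0.0]]    # easy: outside the margin, excluded
idx = semi_hard_negative(post, implied, negatives)
loss = triplet_loss(post, implied, negatives[idx])
print(idx)  # 1
```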
Proposes TMR-GGNN, a time-aware multi-relational graph neural network for credit card fraud detection that handles imbalanced data and evolving fraud patterns via contrastive learning and focal loss.
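Focal loss (Lin et al., 2017) handles class imbalance by down-weighting well-classified examples, which in fraud detection are mostly the abundant legitimate transactions. The sketch below is the standard binary form FL(p_t) = -α_t (1 - p_t)^γ log(p_t); the defaults α = 0.25 and γ = 2 come from the original focal-loss paper, not from TMR-GGNN, whose settings are not given here.

```python
import math


def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Focal loss for one example.

    p: predicted probability of the positive (fraud) class; y: 1 for fraud, 0 otherwise.
    """
    p_t = p if y == 1 else 1.0 - p                 # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha     # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of confident, correct predictions
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


easy = binary_focal_loss(0.9, 1)   # confident and correct: heavily down-weighted
hard = binary_focal_loss(0.1, 1)   # confident and wrong: keeps most of its loss
print(easy < hard)  # True
```

Setting γ = 0 recovers α-weighted binary cross-entropy, which makes γ a single knob for how strongly easy examples are suppressed.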
This paper proposes a post-training refinement approach that uses interventional contrastive learning to disentangle speech foundation model representations into separate content and speaker subspaces. The method improves out-of-domain speaker verification and shows evidence that the two subspaces are successfully separated.
MoCo-AIS is a unified contrastive learning framework for computing the similarity of vessel trajectories, evaluated on large-scale AIS (Automatic Identification System) datasets.
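The name suggests MoCo (Momentum Contrast), whose two core mechanisms are a key encoder updated as an exponential moving average of the query encoder, θ_k ← m·θ_k + (1 − m)·θ_q, and a fixed-size FIFO queue of past keys that serves as a large pool of negatives. The sketch below shows both on flat parameter lists; whether MoCo-AIS uses them in exactly this form is an assumption, and the names are hypothetical.

```python
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """EMA step: move key-encoder parameters slightly toward the query encoder."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class KeyQueue:
    """Fixed-size FIFO of past key embeddings used as negatives."""

    def __init__(self, max_size):
        self.keys = deque(maxlen=max_size)

    def enqueue(self, batch):
        # a deque with maxlen silently drops the oldest entries when full
        self.keys.extend(batch)

    def negatives(self):
        return list(self.keys)


key = momentum_update([1.0, 2.0], [0.0, 0.0], m=0.9)
queue = KeyQueue(max_size=3)
queue.enqueue(["k1", "k2"])
queue.enqueue(["k3", "k4"])
print(queue.negatives())  # ['k2', 'k3', 'k4']
```

A large m keeps the key encoder slowly varying, so keys already in the queue remain roughly consistent with newly encoded ones.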
Selective Synergistic Learning (SSync) improves video object-centric learning by selectively distilling reliable cues via pseudo-labeling and transitive merging, avoiding error propagation from indiscriminate dense alignment.
SkillCAT is a training-free framework for LLM agent skill self-evolution. It addresses three limitations, namely single-trace bias, unverified merging, and full-corpus loading, through three stages: Contrastive Causal Extraction, Assessment-Augmented Evolution, and Topology-Aware Task Execution. It achieves up to a 40.40% improvement on benchmarks.
This paper proposes a probabilistic contrastive pretraining framework for molecular graph transformers to improve multi-task ADME property prediction in drug discovery, achieving significant gains on three benchmarks.
This paper introduces GLACIER, a multimodal student-teacher foundation model that integrates molecular graphs, SMILES strings, and physicochemical descriptors to predict molecular properties efficiently. It leverages Finsler geometry-aware fusion and knowledge distillation from larger teacher models (MiniMol, MolFormer) to achieve high performance with a lightweight architecture.
OSMGraphCLIP is a model that learns global location embeddings from OpenStreetMap data using a graph-based encoder and contrastive alignment with a spherical-harmonics location encoder. It achieves strong performance across diverse geospatial tasks, often matching or exceeding satellite-based methods.
Proposes a POI-aware contrastive training framework using LLM-generated near-misses to improve ASR robustness at code-switching regions, achieving consistent error reductions on two benchmarks.
Proposes MSAIC-Net, a multi-scale attention-enhanced convolutional network for detecting myocardial substrate abnormalities from ECG signals, using imbalance-aware contrastive learning and lead-wise permutation importance for interpretability.
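Lead-wise permutation importance can be read as ordinary permutation importance applied to whole ECG leads: shuffle one lead across recordings, leave the other leads intact, and measure how much a performance metric drops. The sketch below assumes that reading; the toy model, the accuracy metric, and all names are hypothetical stand-ins for MSAIC-Net and its evaluation metric.

```python
import random


def lead_permutation_importance(model, X, y, metric, n_repeats=5, seed=0):
    """X: list of recordings, each a list of leads (each lead a list of floats).

    Returns, per lead, the mean drop in metric when that lead is shuffled across recordings.
    """
    rng = random.Random(seed)
    base = metric(y, [model(x) for x in X])
    importances = []
    for lead in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            perm = list(range(len(X)))
            rng.shuffle(perm)
            # swap in this lead from another recording; all other leads stay put
            X_perm = [x[:lead] + [X[perm[i]][lead]] + x[lead + 1:]
                      for i, x in enumerate(X)]
            drops.append(base - metric(y, [model(x) for x in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(x):
    # hypothetical classifier that looks only at lead 0
    return 1 if sum(x[0]) / len(x[0]) > 0 else 0


def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


X = [[[1.0, 0.5], [0.3, -0.2]], [[0.8, 0.2], [-0.4, 0.1]], [[0.6, 0.9], [0.0, 0.7]],
     [[-1.0, -0.5], [0.2, 0.2]], [[-0.7, -0.3], [-0.5, 0.9]], [[-0.2, -0.9], [0.6, -0.1]]]
y = [1, 1, 1, 0, 0, 0]
imp = lead_permutation_importance(toy_model, X, y, accuracy)
print(imp[1])  # 0.0, since the toy model ignores lead 1
```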
This paper develops a measure-theoretic framework for analyzing when contrastive learning recovers meaningful latent geometry. It introduces a 'diversity condition' on positive-pair sampling and a support-corrected InfoNCE variant, and its experiments show that sampling diversity and architectural inductive bias interact critically in contrastive representation learning.
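The support-corrected variant is specific to the paper and is not reproduced here; for reference, the sketch below implements the standard InfoNCE loss it modifies, in which view_a[i] and view_b[i] form a positive pair and every other view_b[j] in the batch acts as a negative. Embeddings are assumed to be L2-normalized, and the function name is hypothetical.

```python
import math


def info_nce(view_a, view_b, tau=0.1):
    """Standard batch InfoNCE over dot-product (cosine) similarities."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [sum(x * y for x, y in zip(view_a[i], view_b[j])) / tau
                  for j in range(n)]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_denom - logits[i]  # -log softmax score of the positive pair
    return total / n


same = [[1.0, 0.0]] * 4                 # no diversity: every pair looks identical
print(info_nce(same, same))             # ≈ log(4), the chance-level value
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The first call shows why sampling diversity matters for this objective: when positives and negatives are indistinguishable, the loss sits at log n and carries no learning signal about the latent geometry.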
This paper introduces KODA (Kernel Optimization for Discrepancy Analysis), a kernel-based framework for comparing and aligning vision-language model representations by identifying sample subsets that are clustered differently across models like CLIP, SigLIP, and BLIP. The method uses contrastive embedding clustering and randomized low-dimensional approximations to scale to large datasets while providing interpretable structural differences between representations.