GenCAD is an image-conditional model that generates full parametric CAD command histories using transformers and diffusion priors, enabling precise, modifiable 3D modeling from images.
AudioMosaic is a contrastive-learning-based audio encoder that applies structured time-frequency masking to spectrogram patches for efficient large-batch training, achieving state-of-the-art performance on audio benchmarks and improving audio-language models.
This paper introduces a unified geometric framework showing that weighted InfoNCE objectives can be interpreted as Distance Geometry Problems, providing exact characterizations of optimal embeddings for supervised and weakly supervised contrastive learning methods and revealing when such embeddings are geometrically realizable, degenerate, or inconsistent.
This paper presents ConRetroBert, a dual-encoder framework for template-based single-step retrosynthesis that uses contrastive pretraining and listwise ranking to improve template prediction accuracy, achieving up to 75.4% top-1 accuracy on the USPTO-50k benchmark while maintaining interpretability.
This paper proposes a unified contrastive framework for learning graph representations across multiple abstraction levels (node, proximity, cluster, graph) with a parameter-free self-weighting mechanism that adaptively assigns weights to similarity scores, outperforming state-of-the-art methods on downstream tasks such as classification, clustering, and link prediction.
This paper proposes CTO, a method that improves code translation by combining syntax-guided and semantic-aware preference optimization through contrastive learning and direct preference optimization, achieving significant improvements over existing baselines on C++, Java, and Python translation tasks.
This paper introduces Context-Aligned Contrastive Regression to improve lexical difficulty prediction by addressing cross-lingual alignment and ordinal structure challenges in language learning datasets.
This article introduces ProtSent, a contrastive fine-tuning framework for protein language models that improves embedding quality for downstream tasks like remote homology detection and structural retrieval.
This paper introduces GCCM, a graph contrastive consistency model that improves generative graph prediction by mitigating shortcut solutions in consistency training through negative pairs and feature perturbation.
This paper introduces TabEmbed, a generalist embedding model for tabular data that unifies classification and retrieval tasks, along with TabBench, a new benchmark for evaluating tabular understanding.
Alibaba researchers propose AFMRL, a two-stage framework that uses MLLMs to extract product attributes and enhance fine-grained multimodal representation learning for e-commerce retrieval tasks.
Researchers from KTH Royal Institute of Technology propose a two-stage framework that fine-tunes LLMs on dialogue transcripts and uses contrastive learning to create joint embeddings for aligning backchannel signals with conversational context, demonstrating improved context-backchannel retrieval compared to previous methods.
Researchers propose Brain-CLIPLM, a two-stage EEG-to-text decoding framework using contrastive learning for semantic anchor extraction and a retrieval-grounded LLM with Chain-of-Thought reasoning, achieving 67.55% top-5 sentence retrieval accuracy and suggesting EEG-to-text decoding should focus on recovering compressed semantic content rather than full sentence reconstruction.
LLMSniffer is a detection framework that fine-tunes GraphCodeBERT with supervised contrastive learning to distinguish AI-generated code from human-written code, achieving 78% accuracy on the GPTSniffer benchmark and 94.65% on the Whodunit benchmark. The approach addresses critical challenges in academic integrity and code quality assurance by combining code-structure-aware embeddings with contrastive learning and comment-removal preprocessing.
SCHK-HTC is a novel method for few-shot hierarchical text classification that combines sibling contrastive learning with hierarchical knowledge-aware prompt tuning to better distinguish semantically similar classes at deeper hierarchy levels. The approach achieves state-of-the-art performance across three benchmark datasets by enhancing model perception of subtle differences between sibling classes.
Slipform is a training framework that uses lexical concreteness to select harder negatives and a margin-based Cement loss, boosting compositional reasoning in vision-language models.
OpenAI presents a contrastive pre-training approach for generating high-quality text and code embeddings at scale without supervision, achieving state-of-the-art results on linear-probe classification, semantic search, and code search benchmarks.
CLIP is OpenAI's vision-language model that learns from text-image pairs from the internet, enabling zero-shot visual classification without task-specific training data. It addresses major limitations in traditional computer vision by reducing dependence on expensive labeled datasets and improving real-world generalization.
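Several of the entries above, such as CLIP and the weighted InfoNCE framework, build on contrastive objectives over paired embeddings. As a rough illustration only, and not the implementation of any listed paper, a symmetric InfoNCE-style loss over a batch of paired embeddings might be sketched in Python with NumPy as follows. The function names, the temperature value of 0.07, and the toy data are assumptions chosen for this sketch.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE-style loss for two batches of paired embeddings.

    Row i of `a` and row i of `b` are treated as a positive pair; every
    other row in the batch acts as a negative. The temperature default
    is an assumed, illustrative value.
    """
    a = l2_normalize(a)
    b = l2_normalize(b)
    logits = a @ b.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(a.shape[0])             # positives sit on the diagonal
    loss_a_to_b = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_b_to_a = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_a_to_b + loss_b_to_a)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(8, 16))          # toy embeddings, not real data
    print("aligned pairs :", symmetric_info_nce(emb, emb))
    print("shuffled pairs:", symmetric_info_nce(emb, emb[::-1]))
```

In this toy setup the loss for correctly aligned pairs should come out much lower than for the deliberately mismatched pairs, which is the behavior a contrastive objective is meant to reward. Real systems differ in how they form batches, choose negatives, and set or learn the temperature.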