#clip

TTL: Test-time Textual Learning for OOD Detection with Pretrained Vision-Language Models

arXiv cs.CL · 2026-04-20

TTL introduces a test-time textual learning framework for OOD detection with pretrained vision-language models such as CLIP. The framework dynamically learns OOD semantics from unlabeled test streams without requiring external OOD labels, using pseudo-labeled samples and an OOD knowledge purification strategy to keep detection robust across diverse and evolving OOD distributions.
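The pseudo-labeling and purification steps are specific to the paper, but the zero-shot CLIP OOD score such methods build on can be sketched: embed the test image, compare it against text embeddings of the known in-distribution (ID) classes, and treat a low maximum softmax similarity as evidence of OOD. A minimal sketch with simulated embeddings (the function and the toy data below are illustrative, not the paper's TTL algorithm):

```python
import numpy as np

def clip_ood_score(image_emb, id_text_embs, temperature=0.01):
    """Zero-shot OOD score: max softmax over cosine similarities to
    in-distribution class text embeddings. Higher = more ID-like."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = id_text_embs / np.linalg.norm(id_text_embs, axis=1, keepdims=True)
    sims = txt @ img                      # cosine similarity per ID class
    probs = np.exp(sims / temperature)
    probs /= probs.sum()
    return float(probs.max())             # maximum-softmax-style score

# Toy demo with simulated 768-d embeddings (real ones would come from CLIP).
rng = np.random.default_rng(0)
id_text_embs = rng.normal(size=(5, 768))                   # 5 ID class prompts
id_image = id_text_embs[2] + 0.1 * rng.normal(size=768)    # image near class 2
ood_image = rng.normal(size=768)                           # unrelated direction

assert clip_ood_score(id_image, id_text_embs) > clip_ood_score(ood_image, id_text_embs)
```

Thresholding this score gives a simple ID/OOD decision; test-time methods like TTL then adapt the textual side as the unlabeled test stream arrives.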

Hierarchical text-conditional image generation with CLIP latents

OpenAI Blog · 2022-04-13

OpenAI proposes a hierarchical two-stage model for text-conditional image generation using CLIP latents: a prior that generates CLIP image embeddings from text captions, and a diffusion-based decoder that generates images from embeddings. The approach improves image diversity and enables zero-shot language-guided image manipulations.
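Structurally, the two stages compose as embedding-to-embedding followed by embedding-to-pixels. A deliberately toy sketch of that composition (the random linear maps below are stand-ins for the learned prior and diffusion decoder, which are large neural networks; only the data flow is faithful):

```python
import numpy as np

rng = np.random.default_rng(0)
D_TEXT, D_IMG, H, W = 512, 768, 8, 8    # illustrative dimensions

# Stand-ins for the learned components (NOT the real models).
prior = rng.normal(size=(D_IMG, D_TEXT)) / np.sqrt(D_TEXT)
decoder = rng.normal(size=(H * W, D_IMG)) / np.sqrt(D_IMG)

def generate(caption_emb):
    """Two-stage generation: caption embedding -> CLIP image embedding -> pixels."""
    image_emb = prior @ caption_emb      # stage 1: prior predicts image embedding
    pixels = decoder @ image_emb         # stage 2: decoder renders from embedding
    return pixels.reshape(H, W)

caption_emb = rng.normal(size=D_TEXT)    # pretend CLIP text embedding
img = generate(caption_emb)
assert img.shape == (H, W)
```

The intermediate CLIP image embedding is what makes the design modular: the same decoder can also render variations of an existing image by re-decoding its embedding.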

Alien Dreams: An Emerging Art Scene

ML at Berkeley · 2021-06-30

The article highlights the emerging scene of AI-generated art using OpenAI's CLIP model as a steering mechanism for generative models, showcasing various examples of text-to-image outputs.

Multimodal neurons in artificial neural networks

OpenAI Blog · 2021-03-04

OpenAI discovers multimodal neurons in CLIP that respond to the same concept across different modalities (visual, symbolic, textual), mirroring biological neurons and explaining the model's robustness on challenging vision tasks. This interpretability research provides insights into how vision-language models organize and represent abstract concepts.

krthr/clip-embeddings

Replicate Explore · 20h ago

A CLIP-based embedding model hosted on Replicate that generates 768-dimensional embeddings for both images and text using the clip-vit-large-patch14 architecture, costing ~$0.00022 per run.
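Embeddings like these are typically consumed by comparing them with cosine similarity, which works the same way for image-image, text-text, and image-text pairs. A minimal sketch, using simulated 768-d vectors in place of the model's actual output:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (range [-1, 1])."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Simulated 768-d embeddings; in practice these would be returned by the
# hosted clip-vit-large-patch14 model for an image and a caption.
rng = np.random.default_rng(1)
text_emb = rng.normal(size=768)
image_emb_match = text_emb + 0.2 * rng.normal(size=768)   # related image
image_emb_other = rng.normal(size=768)                    # unrelated image

assert cosine_similarity(text_emb, image_emb_match) > cosine_similarity(text_emb, image_emb_other)
```

Because both modalities land in the same 768-d space, a single scalar threshold on this similarity is often enough for matching or deduplication tasks.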

andreasjansson/clip-features

Replicate Explore · yesterday

A model on Replicate that outputs CLIP ViT-L/14 features for text and images, allowing similarity computation between inputs.
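For batches of inputs, the usual pattern is to normalize the features and build a text-by-image similarity matrix, so each row ranks the candidate images for one text. A sketch with simulated ViT-L/14-sized features (the real features would come from the model's output):

```python
import numpy as np

def similarity_matrix(text_feats, image_feats):
    """Pairwise cosine similarities: rows index texts, columns index images."""
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    v = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    return t @ v.T

rng = np.random.default_rng(2)
texts = rng.normal(size=(3, 768))                   # simulated text features
images = texts + 0.2 * rng.normal(size=(3, 768))    # matched images, in order

sims = similarity_matrix(texts, images)
# Each text's matched image should score highest in its row.
assert (sims.argmax(axis=1) == np.arange(3)).all()
```

Taking the row-wise argmax of this matrix is the core of CLIP-style zero-shot retrieval and classification.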
