vision

#vision

@PrajwalTomar_: Most Flash models stop at cheaper and faster. This one is built to actually finish the job. I ran Step 3.7 Flash on a r…

X AI KOLs Timeline ↗ · 18h ago

Step 3.7 Flash is a compact model that combines vision, live data retrieval, and code generation. In the author's test it autonomously built a working dashboard from a screenshot in minutes, at a cost of about 50 cents per session.

#vision

Claude vision v/s Gemini vision (Gemini is much better in vision and world knowledge)

Reddit r/singularity ↗ · 2d ago

A comparison claiming that Google's Gemini outperforms Anthropic's Claude in vision and world knowledge tasks.

#vision

@stevibe: Mistral OCR 4 just dropped with bounding boxes (their most-requested feature) so I plugged it into my form-filling test…

X AI KOLs Timeline ↗ · 3d ago

Mistral OCR 4 has been released with bounding boxes, a highly requested feature. The user tested it for form filling and found it works well, though not perfectly.

#vision

Best local model for vision - 2nd benchmark update - 21 Jun 2026

Reddit r/LocalLLaMA ↗ · 5d ago

This post presents the second update of a benchmark for local vision language models, comparing 23 models across 30 images with revised settings, and provides performance recommendations for different VRAM tiers. Key findings include that thinking mode hurts vision performance and that MoE models underperform dense models for perception tasks.

#vision

DeepSeek Introduces Vision

Hacker News Top ↗ · 2026-06-18

DeepSeek announces a new vision capability, likely a vision-language model, expanding its AI offerings.

#vision

@AlexiGlad: Progress in AI is driven by approaches that make weaker assumptions, which allows for better scaling But representation…

X AI KOLs Following ↗ · 2026-06-16

Introduces Temporal Difference in Vision (TDV), a new paradigm for representation learning that relies solely on causality, eliminating the need for augmentations, masking, or cropping, and matches state-of-the-art methods like DINO and iBOT on dense spatial tasks.

#vision

@ninaddaithankar: Can a vision model learn to see with no augmentations, no masking, no cropping, no reconstruction? It can! Introducing …

X AI KOLs Timeline ↗ · 2026-06-16

Introduces Temporal Difference in Vision (TDV), a novel visual representation learning paradigm that learns useful representations without augmentations, masking, cropping, or reconstruction, and matches state-of-the-art methods on dense spatial tasks.

#vision

Claude Fable has caught up with GPT on ZeroBench (hard vision benchmark)

Reddit r/singularity ↗ · 2026-06-10

Claude Fable has matched GPT's performance on the challenging ZeroBench vision benchmark, with comparable pass@5 and pass^5 scores.

#vision

@heyshrutimishra: 1. Fable 5 is state-of-the-art on nearly every benchmark that matters. Software engineering. Science. Knowledge work. V…

X AI KOLs Following ↗ · 2026-06-09

Anthropic releases Fable 5, claiming it is state-of-the-art on key benchmarks in software engineering, science, knowledge work, and vision, exceeding all previously available models.

#vision

I'm building a parallel internet, and it's called The Thinnernet

Hacker News Top ↗ · 2026-06-08

The author announces a personal project to build a parallel internet called The Thinnernet, drawing inspiration from Steve Jobs and previous work on knowledge bases and low-power operating systems.

#vision

Built to benefit everyone: our plan

OpenAI Blog ↗ · 2026-06-08

OpenAI outlines its plan to make AI broadly beneficial, drawing parallels to the transformative impact of electricity. The company emphasizes building AI that empowers people, distributes power, and remains aligned with human intent.

#vision

MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training

Hugging Face Daily Papers ↗ · 2026-06-07

MaskAlign proposes a token-subset representation alignment method that improves diffusion transformer training by reducing reliance on complete token sets and maintaining stable alignment under perturbations.

#vision

@victormustar: Before the week ends, let's acknowledge one of the most INSANE week ever for open AI, with 25+ notable open-weight drop…

X AI KOLs Following ↗ · 2026-06-05

A recap of an extraordinary week for open-weight AI, featuring over 25 notable open-weight model releases across LLMs, image generation, audio/speech, vision, and video/3D, with contributions from NVIDIA, Google, and others.

#vision

Gemma 4 Unified is coming

Reddit r/LocalLLaMA ↗ · 2026-06-03

A merged PR in llama.cpp implements a new 'Gemma 4 Unified' model type, which suggests an upcoming Google release that uses a transformer-less vision tower.

#vision

@NielsRogge: NEPA has now been added here: Check the evals at the bottom to compare to other models

X AI KOLs Following ↗ · 2026-06-02

NEPA is a new method for visual self-supervised learning and generative pretraining that predicts the next embedding autoregressively, and has been added to a benchmark for evaluation.

#vision

Llama.cpp B9406 MTP mmproj fix

Reddit r/LocalLLaMA ↗ · 2026-05-29

Llama.cpp release B9406 fixes a crash (GGML_ASSERT) when using MTP with MoE vision models like Qwen3.6-35B-A3B.

#vision

ChainzRule: Sample-Efficient, Robust Deep Learning Across Tabular, NLP, and Vision Tasks

arXiv cs.LG ↗ · 2026-05-26

ChainzRule introduces a neural architecture with learnable polynomial layers and differential regularization, achieving sample-efficient, robust performance across tabular, NLP, and vision tasks with results on Pima Diabetes, SST-5, Yelp Full, and CIFAR-10-C.

#vision

@xsser_w: Lu Qi is still amazing. A year ago he told me to work on sandbox/container security, and I didn't realize what he meant. Now looking back... I was so stupid. He had many far-sighted ideas, many of which have been validated now. Damn. Looking at it now, the core of making a harness is sandbox and validation. In the sandbox, you can see all trajectories and boundary explorations.

X AI KOLs Timeline ↗ · 2026-05-23

The author praises Lu Qi for advice given a year ago to work on sandbox/container security, which has since been validated, and argues that sandboxing and validation are the core of building a harness because a sandbox exposes all trajectories and boundary explorations.

#vision

@elonmusk: True

X AI KOLs Timeline ↗ · 2026-05-21

Elon Musk's original goal for SpaceX was to increase NASA's budget, not to start a launch company.

#vision

@Tesla: The legacy of Model S & X will live on in our vision for autonomy

X AI KOLs Following ↗ · 2026-05-21

Tesla states that the legacy of Model S and Model X will continue in its autonomy vision.
