This article presents a new paper on Elastic Attention Cores for Vision Transformers, proposing a core-periphery block-sparse attention structure that improves scalability and accuracy compared to dense self-attention models like DINOv3.
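The summary only names the core-periphery structure without detailing it. As a minimal illustrative sketch (not the paper's actual construction), one way to realize such a block-sparse pattern is a mask where a few hypothetical "core" tokens attend globally while peripheral tokens attend only to the core and to a local block; the parameters `n_core` and `block` below are assumptions for illustration:

```python
import numpy as np

def core_periphery_mask(n_tokens: int, n_core: int, block: int) -> np.ndarray:
    """Boolean attention mask: core tokens attend everywhere; peripheral
    tokens attend to the core and to their own local block."""
    mask = np.zeros((n_tokens, n_tokens), dtype=bool)
    mask[:n_core, :] = True               # core rows: attend to all tokens
    mask[:, :n_core] = True               # every token attends to the core
    for start in range(n_core, n_tokens, block):
        end = min(start + block, n_tokens)
        mask[start:end, start:end] = True  # local block among peripheral tokens
    return mask

m = core_periphery_mask(16, 2, 4)
```

The resulting mask has far fewer active entries than a dense 16x16 attention map, which is where the scalability claim would come from.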
Interfaze introduces a new hybrid AI model architecture that combines DNN/CNN encoders with transformers to achieve superior accuracy and cost-efficiency for deterministic tasks such as OCR, vision, and STT, compared to generalist models.
Wink Engineering evaluates the efficacy of neural super-resolution as a pre-filter for license plate OCR, concluding that it fails to improve accuracy and often leads to hallucinated characters compared to training directly on low-resolution data.
This paper investigates the geometric structure of intermediate feature representations in deep neural networks by analyzing how various image manipulations map into feature space. It suggests that feature spaces are organized in linear structures to a first approximation, using generative image editing models to probe these representations.
This paper introduces a protocol for fair comparison of diffusion-based OOD detectors and proposes Canonical Feature Snapshots (CFS), which leverage sparse internal activations for efficient detection.
The open-source project RuView uses Wi-Fi signals and AI to achieve camera-free through-wall sensing, supporting real-time human pose recognition, breathing monitoring, and fall detection; it has drawn significant attention on GitHub. With an emphasis on privacy and security, the project processes all data locally and deploys easily via ESP32 or Docker.
A GitHub repository providing minimal, standalone PyTorch reimplementations of JEPA family models (I-JEPA, V-JEPA, V-JEPA 2, C-JEPA) for educational purposes, including tutorials and visualization tools.
This paper introduces LSAMD, a method for extracting 'learngenes' across multiple datasets to initialize variable-sized Vision Transformer models, significantly reducing training costs and storage while maintaining performance comparable to pretrain-finetune methods.
This paper introduces a two-stage neuro-symbolic framework that uses weak supervision (as little as 1% labels) with a slot-based VAE to learn interpretable symbols for object-centric visual reasoning, outperforming foundation models in domain generalization.
This paper introduces WildRelight, a new real-world benchmark dataset for single-image relighting that addresses the gap between synthetic and natural scenes. It proposes a physics-guided adaptation framework using diffusion posterior sampling and test-time adaptation to improve model performance on real-world data.
Lite3R is a model-agnostic framework that improves the efficiency of transformer-based 3D reconstruction using sparse linear attention and FP8-aware quantization. It reduces latency and memory usage by up to 2.4x while maintaining geometric accuracy on backbones like VGGT and DA3-Large.
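The blurb names sparse linear attention as Lite3R's efficiency mechanism; its exact kernel and sparsity pattern are not specified here. A minimal sketch of the (non-sparse) kernelized linear-attention building block, assuming an elu+1 feature map, which replaces the O(n²) softmax with an O(n·d²) computation:

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Kernelized linear attention: phi(Q) @ (phi(K)^T @ V).
    The positive feature map phi(x) = elu(x) + 1 keeps weights non-negative."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                    # (d, d_v) summary: one pass over keys/values
    z = Qf @ Kf.sum(axis=0)          # per-query normalizer
    return (Qf @ kv) / (z[:, None] + eps)

rng = np.random.default_rng(0)
Q = rng.standard_normal((8, 4))
K = rng.standard_normal((8, 4))
V = rng.standard_normal((8, 3))
out = linear_attention(Q, K, V)
```

Because the key-value summary `kv` is computed once, cost grows linearly in the number of tokens, which is the kind of saving the reported 2.4x latency/memory reduction builds on.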
MoCam is a research paper introducing a diffusion-based framework for unified novel view synthesis that dynamically coordinates geometric and appearance priors to improve robustness against geometric errors.
This paper introduces LychSim, a controllable simulation framework built on Unreal Engine 5 to facilitate vision research, synthetic data generation, and agentic LLM evaluation via MCP integration.
This paper introduces DRoRAE, a method that improves visual tokenization by fusing multi-layer features from pretrained vision encoders rather than relying solely on the last layer. It demonstrates significant improvements in reconstruction and generation quality on ImageNet and establishes a scaling law between fusion capacity and performance.
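As a toy illustration of the fusion idea (the convex weighting scheme below is an assumption for illustration, not DRoRAE's actual mechanism), multi-layer features can be combined with softmax-normalized weights instead of taking only the last layer:

```python
import numpy as np

def fuse_layers(layer_feats, logits):
    """Fuse per-layer encoder features with softmax-normalized weights.
    layer_feats: list of (n_tokens, d) arrays from different depths."""
    w = np.exp(logits - logits.max())
    w = w / w.sum()                          # convex weights over layers
    return sum(wi * f for wi, f in zip(w, layer_feats))

feats = [np.full((4, 8), float(i)) for i in range(3)]  # toy "layers" 0, 1, 2
fused = fuse_layers(feats, np.zeros(3))                # zero logits: equal weights
# fused[0, 0] is 1.0, the mean of the three layers
```

Making the logits learnable would let the tokenizer choose how much each depth contributes, which is the knob a fusion-capacity scaling law could vary.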
This paper introduces WebEye, a benchmark for object localization requiring external knowledge resolution, and Pixel-Searcher, an agentic approach that connects search results to visual annotations.
VidSplat is a training-free generative reconstruction framework that uses video diffusion priors to recover complete 3D scenes from sparse inputs by synthesizing novel views.
A maker built a drone that uses a laser to track targets, with Claude AI assisting in its development or processing.
ByteDance released Agent TARS, a free, open-source multimodal AI agent stack that reached #1 on GitHub Trending with 31,400 stars. The tool enables GUI control, computer vision, and browser automation across terminals and desktops.
This paper presents a Transformer-based model for classifying wildlife species using only daily GPS movement trajectories, demonstrating superior accuracy over LSTM and CNN baselines across different studies and regions.
ByteDance's open-source desktop AI automation tool, UI-TARS Desktop, supports local execution and visual understanding of the screen. It can autonomously control your computer to handle everyday tasks via natural-language commands.