Cards List · Tag: #computer-vision

Elastic Attention Cores for Scalable Vision Transformers [R]

Reddit r/MachineLearning · 7h ago

This post presents a new paper on Elastic Attention Cores for Vision Transformers, proposing a core-periphery block-sparse attention structure that improves scalability and accuracy over dense self-attention baselines such as DINOv3.
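The paper itself isn't reproduced here, but the core-periphery idea can be sketched: a few "core" tokens attend globally while "periphery" tokens see only the core plus a local window. A minimal NumPy sketch; all names, sizes, and the window shape are illustrative, not taken from the paper:

```python
import numpy as np

def core_periphery_mask(n_tokens, n_core, window):
    """Boolean attention mask: core tokens attend everywhere,
    periphery tokens attend to the core and a local window."""
    mask = np.zeros((n_tokens, n_tokens), dtype=bool)
    mask[:n_core, :] = True          # core rows: global attention
    mask[:, :n_core] = True          # every token can see the core
    idx = np.arange(n_tokens)
    local = np.abs(idx[:, None] - idx[None, :]) <= window
    return mask | local

def masked_attention(q, k, v, mask):
    """Standard scaled dot-product attention under a boolean mask."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)   # block disallowed pairs
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
mask = core_periphery_mask(n, n_core=2, window=1)
out = masked_attention(q, k, v, mask)
print(out.shape)                       # (16, 8)
print(mask.sum(), "of", n * n, "pairs kept")
```

The point of the block-sparse structure is visible in the second print: only a fraction of the token pairs are ever scored, which is where the scalability gain over dense attention comes from.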

@berryxia: Guys, my back isn’t chilling. But I’m thrilled after seeing this model architecture! While everyone is still frantically stacking parameters and competing with general-purpose large models, Interfaze has introduced a brand-new hybrid architecture. It achieves OCR, vision, STT, and structured output accuracy for deterministic tasks that crushes Gemini-3-Flash…

X AI KOLs Timeline · 14h ago

Interfaze introduces a new hybrid AI model architecture that combines DNN/CNN encoders with transformers to achieve superior accuracy and cost-efficiency for deterministic tasks such as OCR, vision, and STT, compared to generalist models.

We tested super-resolution pre-filter for LPR OCR. It did nothing

Hacker News Top · 15h ago

Wink Engineering evaluates the efficacy of neural super-resolution as a pre-filter for license plate OCR, concluding that it fails to improve accuracy and often leads to hallucinated characters compared to training directly on low-resolution data.

FeatMap: Understanding image manipulation in the feature space and its implications for feature space geometry

arXiv cs.LG · 15h ago

This paper investigates the geometric structure of intermediate feature representations in deep neural networks by analyzing how various image manipulations map in feature space. It suggests that feature spaces are organized in linear structures to a first approximation, using generative image editing models to probe these representations.
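The "linear to a first approximation" claim can be illustrated with a toy probe: if an image edit shifts encoder features by a roughly constant direction, that direction estimated from a few pairs should transfer to held-out images. A synthetic NumPy sketch; the feature dimension, data, and noise level are all invented stand-ins, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32                                  # feature dimension (illustrative)
direction = rng.standard_normal(d)      # "true" effect of one edit

# Stand-ins for encoder features of original / edited images: to first
# order, editing shifts features by a fixed direction plus noise.
feats = rng.standard_normal((20, d))
edited = feats + direction + 0.01 * rng.standard_normal((20, d))

# Estimate the edit direction from 10 training pairs ...
est = (edited[:10] - feats[:10]).mean(axis=0)

# ... and test the linear hypothesis on the held-out pairs.
pred = feats[10:] + est
err = np.linalg.norm(pred - edited[10:], axis=1)
base = np.linalg.norm(feats[10:] - edited[10:], axis=1)
print(err.mean() < base.mean())   # prints True: linear shift transfers
```

If feature space were not approximately linear under the edit, the averaged direction would not beat the do-nothing baseline on held-out pairs.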

Backbone-Equated Diffusion OOD via Sparse Internal Snapshots

arXiv cs.LG · 15h ago

This paper introduces a protocol for fair comparison of diffusion-based OOD detectors and proposes Canonical Feature Snapshots (CFS), which leverage sparse internal activations for efficient detection.
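This is not the paper's CFS method, but the general idea of scoring OOD from a few sparse internal activations can be sketched: record the top-k activations at each layer as a compact "snapshot" and score inputs by distance from the training-set snapshot mean. The network, layer count, and scoring rule below are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stand-in for a network: three random linear+ReLU layers.
Ws = [rng.standard_normal((16, 16)) * 0.3 for _ in range(3)]

def snapshot(x, k=4):
    """Run the toy network, keeping only the top-k activations per layer
    (a sparse internal snapshot), concatenated into one vector."""
    parts = []
    h = x
    for W in Ws:
        h = np.maximum(h @ W, 0.0)       # ReLU layer
        parts.append(np.sort(h)[-k:])    # keep the k largest activations
    return np.concatenate(parts)

# 'In-distribution' reference snapshot from training-like inputs.
train = rng.standard_normal((50, 16))
ref = np.mean([snapshot(x) for x in train], axis=0)

def ood_score(x):
    """Distance of an input's snapshot from the training mean."""
    return np.linalg.norm(snapshot(x) - ref)

in_score = np.mean([ood_score(x) for x in rng.standard_normal((20, 16))])
out_score = np.mean([ood_score(5.0 * x) for x in rng.standard_normal((20, 16))])
print(in_score < out_score)   # prints True: OOD inputs score higher
```

The snapshot is far smaller than the full activation tensor, which is the efficiency argument the summary alludes to.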

@IndieDevHailey: Terrifying! You can tell what people are doing behind the next wall just using home Wi-Fi! The open-source project RuView has skyrocketed to over 50k stars on GitHub, totally blowing up! No cameras needed, no wearable devices required: just ordinary home Wi-Fi signals let you see through walls:
- How many people are next door, their exact locations, and whether they’re walking or lying down
- Real-time human pose estimation (17 keypoints)
- Automatic breathing and heart-rate measurement during sleep
- Instant alerts for falls, with highly accurate action recognition

X AI KOLs Timeline · 16h ago

The open-source project RuView leverages Wi-Fi signals and AI technology to achieve camera-free through-wall sensing, capable of real-time human pose recognition, breathing monitoring, and fall detection. It has garnered significant attention on GitHub. Emphasizing privacy and security, the project processes all data locally, supporting easy deployment via ESP32 or Docker.

GitHub - keon/jepa: implementing minimal versions of joint-embedding predictive architecture (JEPA)

Reddit r/ArtificialInteligence · 19h ago

A GitHub repository providing minimal, standalone PyTorch reimplementations of JEPA family models (I-JEPA, V-JEPA, V-JEPA 2, C-JEPA) for educational purposes, including tutorials and visualization tools.
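The defining JEPA trait, which the repo's minimal reimplementations center on, is that the loss is computed between *embeddings* of two views rather than pixels, with the target encoder updated as an EMA of the context encoder. A toy linear version in NumPy; the data, sizes, and closed-form predictor are illustrative simplifications, not the repo's code (real JEPAs use ViT encoders and an SGD-trained predictor):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear JEPA: predict the embedding of a target view from the
# embedding of a context view. All sizes and weights are illustrative.
D, E, N = 8, 4, 200
ctx_enc = rng.standard_normal((D, E)) * 0.5   # context encoder (linear)
tgt_enc = ctx_enc.copy()                      # target encoder: EMA copy

context = rng.standard_normal((N, D))
target = context + 0.1 * rng.standard_normal((N, D))  # correlated view

# The JEPA loss lives in embedding space, not pixel space:
z_ctx = context @ ctx_enc
z_tgt = target @ tgt_enc          # computed without gradients in practice

# Fit a linear predictor z_ctx -> z_tgt in closed form (least squares).
P, *_ = np.linalg.lstsq(z_ctx, z_tgt, rcond=None)
loss = np.mean((z_ctx @ P - z_tgt) ** 2)
naive = np.mean((z_tgt - z_tgt.mean(axis=0)) ** 2)
print(loss < naive)               # prints True: beats predicting the mean

# The target encoder tracks the context encoder by EMA, as in I-JEPA:
tau = 0.99
tgt_enc = tau * tgt_enc + (1 - tau) * ctx_enc
```

Predicting in embedding space is what lets JEPA ignore pixel-level noise that a reconstruction loss would be forced to model.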

Learngene Search Across Multiple Datasets for Building Variable-Sized Models

arXiv cs.LG · yesterday

This paper introduces LSAMD, a method for extracting 'learngenes' across multiple datasets to initialize variable-sized Vision Transformer models, significantly reducing training costs and storage while maintaining performance comparable to pretrain-finetune methods.

Weakly Supervised Concept Learning for Object-centric Visual Reasoning

arXiv cs.LG · yesterday

This paper introduces a two-stage neuro-symbolic framework that uses weak supervision (as little as 1% labels) with a slot-based VAE to learn interpretable symbols for object-centric visual reasoning, outperforming foundation models in domain generalization.

WildRelight: A Real-World Benchmark and Physics-Guided Adaptation for Single-Image Relighting

Hugging Face Daily Papers · yesterday

This paper introduces WildRelight, a new real-world benchmark dataset for single-image relighting that addresses the gap between synthetic and natural scenes. It proposes a physics-guided adaptation framework using diffusion posterior sampling and test-time adaptation to improve model performance on real-world data.

Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction

Hugging Face Daily Papers · yesterday

Lite3R is a model-agnostic framework that improves the efficiency of transformer-based 3D reconstruction using sparse linear attention and FP8-aware quantization. It reduces latency and memory usage by up to 2.4x while maintaining geometric accuracy on backbones like VGGT and DA3-Large.
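Lite3R's specific sparse variant isn't detailed in this summary, but the generic linear-attention trick it builds on is standard: replace softmax with a positive feature map so attention factorizes into one O(N·d²) pass instead of an O(N²·d) score matrix. A NumPy sketch using the common elu+1 feature map; this is the textbook formulation, not Lite3R's actual kernel:

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized linear attention: O(N·d²) instead of softmax's O(N²·d).
    Uses the common elu(x)+1 positive feature map."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x)+1 > 0
    Q, K = phi(q), phi(k)
    kv = K.T @ v                      # (d, d_v): one pass over tokens
    z = K.sum(axis=0)                 # (d,): normalizer accumulator
    return (Q @ kv) / (Q @ z + eps)[:, None]

rng = np.random.default_rng(0)
n, d = 64, 16
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = linear_attention(q, k, v)
print(out.shape)    # (64, 16)

# Rows are convex-combination-like mixes of v, so magnitudes stay bounded:
print(np.abs(out).max() <= np.abs(v).max() + 1e-9)
```

Because `kv` and `z` are accumulated once, memory no longer scales with the square of the token count, which is where the latency/memory reductions in such frameworks come from.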

MoCam: Unified Novel View Synthesis via Structured Denoising Dynamics

Hugging Face Daily Papers · yesterday

MoCam is a research paper introducing a diffusion-based framework for unified novel view synthesis that dynamically coordinates geometric and appearance priors to improve robustness against geometric errors.

LychSim: A Controllable and Interactive Simulation Framework for Vision Research

Hugging Face Daily Papers · yesterday

This paper introduces LychSim, a controllable simulation framework built on Unreal Engine 5 to facilitate vision research, synthetic data generation, and agentic LLM evaluation via MCP integration.

Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization

Hugging Face Daily Papers · yesterday

This paper introduces DRoRAE, a method that improves visual tokenization by fusing multi-layer features from pretrained vision encoders rather than relying solely on the last layer. It demonstrates significant improvements in reconstruction and generation quality on ImageNet and establishes a scaling law between fusion capacity and performance.
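The fusion mechanism described, combining features from several encoder layers instead of only the last, reduces in its simplest form to a softmax-weighted sum over layers. A NumPy sketch; the tensor sizes and weighting scheme are illustrative, not DRoRAE's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for per-layer features of one image from a pretrained encoder:
# L layers, T tokens, C channels (all sizes illustrative).
L, T, C = 4, 10, 8
layer_feats = rng.standard_normal((L, T, C))

def fuse(feats, logits):
    """Fuse multi-layer features with softmax-normalized layer weights,
    instead of taking only the last layer."""
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return np.tensordot(w, feats, axes=1)   # (T, C): weighted sum over layers

# Putting all weight on the final layer recovers last-layer tokenization;
# uniform weights mix all depths equally.
last_only = fuse(layer_feats, np.array([-1e9, -1e9, -1e9, 0.0]))
uniform = fuse(layer_feats, np.zeros(L))
print(np.allclose(last_only, layer_feats[-1]))   # prints True
print(uniform.shape)                             # (10, 8)
```

Last-layer-only tokenization is thus a special case of the fused tokenizer, so learned weights can only add representational capacity.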

From Web to Pixels: Bringing Agentic Search into Visual Perception

Hugging Face Daily Papers · yesterday

This paper introduces WebEye, a benchmark for object localization requiring external knowledge resolution, and Pixel-Searcher, an agentic approach that connects search results to visual annotations.

VidSplat: Gaussian Splatting Reconstruction with Geometry-Guided Video Diffusion Priors

Hugging Face Daily Papers · yesterday

VidSplat is a training-free generative reconstruction framework that uses video diffusion priors to recover complete 3D scenes from sparse inputs by synthesizing novel views.

This guy built a drone that tracks targets with a laser using Claude

Reddit r/ArtificialInteligence · 2d ago

A maker built a drone that tracks targets with a laser, using Claude to assist in its development.

@neil_xbt: ByteDance just dropped the number one trending repo on all of GitHub! 31,400 stars. Still climbing. Agent TARS. A free …

X AI KOLs Timeline · 2d ago

ByteDance released Agent TARS, a free open-source multimodal AI agent stack that achieves #1 trending on GitHub with 31,400 stars. The tool enables GUI control, computer vision, and browser automation across terminals and desktops.

Transformer-Based Wildlife Species Classification from Daily Movement Trajectories

arXiv cs.LG · 2d ago

This paper presents a Transformer-based model for classifying wildlife species using only daily GPS movement trajectories, demonstrating superior accuracy over LSTM and CNN baselines across different studies and regions.

@VincentLogic: Found an incredible open-source desktop AI tool from ByteDance! UI-TARS Desktop, with 31k stars, truly lives up to the hype. It can actually understand your screen and automate computer operations for you. Just tell it "Enable auto-save in VS Code and set the delay to 500ms", and it will automatically: -…

X AI KOLs Timeline · 2d ago

ByteDance's open-source desktop AI automation tool, UI-TARS Desktop, supports local execution and screen visual understanding. It can autonomously control your computer to handle daily tasks through natural language commands.
