perception

#perception

Watch, Remember, Reason: Human-View Video Understanding with MLLMs

Hugging Face Daily Papers · 4d ago

A survey presenting a human-view perspective on video understanding with multimodal large language models, organized around watching, remembering, and reasoning abilities, covering challenges, methods, and applications.

@benhylak: there was a time when an openai launch was heralded as a startup-killer. every company would quake in their boots. it's…

X AI KOLs Following · 6d ago

A tweet discusses how OpenAI launches are no longer seen as startup-killers, referencing a new Codex feature that deploys websites using Cloudflare's Sites, D1, and R2.

Representation Forcing for Bottleneck-Free Unified Multimodal Models

Hugging Face Daily Papers · 2026-05-29

Introduces Representation Forcing (RF), a technique that enables unified multimodal models to perform both perception and generation end-to-end without external VAE latent spaces, matching state-of-the-art VAE-based models in image generation while improving understanding.

DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation

Hugging Face Daily Papers · 2026-05-28

DynaFLIP is a dynamics-aware multimodal pre-training framework that integrates motion understanding into visual perception for robot manipulation. It uses image-language-3D flow triplets and geometric regularization to improve representation learning, achieving significant gains in out-of-distribution scenarios.

Toward Enactive Artificial Intelligence

arXiv cs.AI · 2026-05-26

This paper advocates for incorporating enactive approaches to perception and cognition into AI, highlighting four key concepts: experience, action-perception inseparability, autonomy, and embodiment. It finds resonance with reinforcement learning but suggests broader integration of enactive ideas.

at what point do ai-generated images stop feeling ai-generated?

Reddit r/artificial · 2026-05-24

A reflection on the improving quality of AI-generated images, questioning at what point they become indistinguishable from real photography or digital art.

DexHoldem: Playing Texas Hold'em with Dexterous Embodied System

Hugging Face Daily Papers · 2026-05-18

DexHoldem is a real-world benchmark for evaluating embodied agents in dexterous manipulation tasks, using Texas Hold'em with a ShadowHand to test primitive execution, perception, and decision-making in a closed-loop setting.

Bad Seeing or Bad Thinking? Rewarding Perception for Vision-Language Reasoning

arXiv cs.AI · 2026-05-15

This paper introduces a reinforcement learning framework that improves perception-reasoning synergy in vision-language models by explicitly rewarding perceptual fidelity, using a 'blindfolded reasoning' proxy and structured verbal verification to address ambiguity in modality credit assignment.

Runtime Monitoring of Perception-Based Autonomous Systems via Embedding Temporal Logic

arXiv cs.LG · 2026-05-14

This paper proposes Embedding Temporal Logic (ETL), a temporal logic that monitors perception-based autonomous systems directly in learned embedding spaces, enabling specification of high-level perceptual concepts and achieving strong empirical agreement with ground-truth semantics.
