vision-language-models

#vision-language-models

SPRITE: From Static Mockups to Engine-Ready Game UI

Hugging Face Daily Papers · 2026-03-18

SPRITE introduces a pipeline that converts static game UI screenshots into editable, engine-ready assets, using vision-language models together with a YAML representation to handle complex layouts and nested elements.
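As a rough illustration of why an indentation-based format like YAML suits nested UI layouts, the sketch below serializes a hypothetical widget tree (a menu panel containing a button and a label) into YAML-style text using only the Python standard library. The `UINode` class and its fields (`type`, `name`, `rect`, `children`) are assumptions made for illustration; they are not SPRITE's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class UINode:
    """Hypothetical UI element; not SPRITE's real schema."""
    type: str
    name: str
    rect: tuple[int, int, int, int]  # x, y, width, height relative to the parent
    children: list["UINode"] = field(default_factory=list)


def to_yaml(node: UINode, indent: int = 0) -> str:
    """Emit a YAML block sequence item; nesting depth maps to indentation."""
    pad = "  " * indent
    x, y, w, h = node.rect
    lines = [
        f"{pad}- type: {node.type}",
        f"{pad}  name: {node.name}",
        f"{pad}  rect: [{x}, {y}, {w}, {h}]",
    ]
    if node.children:
        lines.append(f"{pad}  children:")
        # Children sit one level deeper than the "children:" key.
        for child in node.children:
            lines.append(to_yaml(child, indent + 2))
    return "\n".join(lines)


# Usage: a hypothetical main menu with two nested elements.
menu = UINode("Panel", "main_menu", (0, 0, 800, 600), [
    UINode("Button", "play", (300, 250, 200, 60)),
    UINode("Label", "title", (250, 100, 300, 80)),
])
print(to_yaml(menu))
```

Because parent/child relationships are carried by indentation alone, the same recursive function handles arbitrarily deep nesting without extra bookkeeping.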

A better method for planning complex visual tasks

MIT News (Artificial Intelligence) · 2026-03-11

MIT researchers developed VLMFP, a two-stage generative AI approach that combines vision-language models with formal planning software. It achieves a 70% success rate on complex visual planning tasks such as robot navigation, nearly 2.3 times the success rate of existing baselines. The method automatically translates visual scenarios into planning files that classical solvers can process, enabling effective long-horizon planning in novel environments.
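The last step described above, handing a generated planning file to a classical solver, can be pictured with a toy STRIPS-style problem and a breadth-first forward-search planner. This is a minimal sketch under stated assumptions: the navigation domain, the action names, and the `bfs_plan` and `move` helpers are hypothetical, and a real pipeline would more likely emit a standard planning language such as PDDL and call an off-the-shelf planner rather than hand-rolled search.

```python
from collections import deque
from typing import FrozenSet, NamedTuple


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]     # facts that must hold before the action
    add: FrozenSet[str]     # facts the action makes true
    delete: FrozenSet[str]  # facts the action makes false


def bfs_plan(init, goal, actions):
    """Breadth-first forward search over STRIPS states; returns a shortest plan or None."""
    start, goal = frozenset(init), frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:  # every goal fact holds in this state
            return plan
        for a in actions:
            if a.pre <= state:
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [a.name]))
    return None  # goal unreachable


def move(src, dst):
    # Hypothetical robot move between two connected cells.
    return Action(f"move-{src}-{dst}", frozenset({f"at-{src}"}),
                  frozenset({f"at-{dst}"}), frozenset({f"at-{src}"}))


# Hypothetical map: a-c and c-d are connected; there is no direct a-d link.
edges = [("a", "c"), ("c", "a"), ("c", "d"), ("d", "c")]
actions = [move(s, d) for s, d in edges]
print(bfs_plan({"at-a"}, {"at-d"}, actions))
```

Separating "describe the problem" from "search for a plan" is the point of such a design: once the scene is expressed symbolically, a generic solver handles the long-horizon search.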

PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Papers with Code Trending · 2025-10-16

PaddleOCR-VL is a compact 0.9B-parameter vision-language model that achieves state-of-the-art performance in multilingual document parsing and element recognition. It pairs a NaViT-style dynamic-resolution visual encoder with the lightweight ERNIE-4.5-0.3B language model.
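NaViT-style dynamic resolution means the visual encoder tokenizes an image close to its native size and aspect ratio instead of resizing every input to one fixed square. The sketch below computes such a patch grid; the 14-pixel patch size, the `max_patches` budget, and the `dynamic_patch_grid` helper are illustrative assumptions, not PaddleOCR-VL's published settings.

```python
import math


def dynamic_patch_grid(width: int, height: int, patch: int = 14,
                       max_patches: int = 1024) -> tuple[int, int]:
    """Return (cols, rows) of patches for an image kept near native resolution.

    Both sides are rounded up to whole patches. If the token count exceeds
    `max_patches`, both sides are scaled by the same factor so the aspect
    ratio is (approximately) preserved.
    """
    cols = math.ceil(width / patch)
    rows = math.ceil(height / patch)
    if cols * rows > max_patches:
        scale = math.sqrt(max_patches / (cols * rows))
        cols = max(1, math.floor(cols * scale))
        rows = max(1, math.floor(rows * scale))
        # Extreme aspect ratios can still overshoot after clamping a side to 1;
        # trim the longer side to stay within the budget.
        if cols * rows > max_patches:
            if cols >= rows:
                cols = max_patches // rows
            else:
                rows = max_patches // cols
    return cols, rows


# A wide page and a tall form get differently shaped grids.
print(dynamic_patch_grid(448, 224))
print(dynamic_patch_grid(1000, 2000))
```

A fixed-resolution encoder would squash both inputs into the same square; keeping the native aspect ratio avoids distorting small text in wide or tall documents.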
