shanghai-jiao-tong-university

#shanghai-jiao-tong-university

Causal Probing for Internal Visual Representations in Multimodal Large Language Models

arXiv cs.AI ↗ · 2026-05-08 Cached

This paper proposes a causal framework for probing internal visual representations in Multimodal Large Language Models, revealing differences in how entities and abstract concepts are encoded. The study highlights that increasing model depth is crucial for encoding abstract concepts and uncovers a disconnect between perception and reasoning in current MLLMs.

@billtheinvestor: Shanghai Jiao Tong University open-sources F5-TTS speech generation model. The model is trained on 100,000 hours of data and supports bilingual synthesis in Chinese and English. Technical features include zero-shot voice cloning, total-duration-based speed control, emotion expression control, and long text synthesis. Commercial use is allowed.

X AI KOLs Timeline ↗ · 2026-05-08 Cached

Shanghai Jiao Tong University has open-sourced the F5-TTS speech generation model, trained on 100,000 hours of data, supporting bilingual synthesis in Chinese and English and zero-shot voice cloning, and allowing commercial use.
