Count Anything is a generalist model for text-guided object counting that unifies counting across multiple visual domains. It is supported by the new CLOC dataset, which contains 220K images across six visual domains, and it achieves strong accuracy and multi-domain generalization.
Count Anything is a generalist vision model for text-guided object counting across multiple domains, using dual-granularity instance enumeration and complementary counting fusion. It achieves strong accuracy and cross-domain generalization, outperforming existing open-world counting methods.
This paper establishes nonparametric identifiability guarantees for extracting task-relevant representations from generalist models, proving that task structure is identifiable across time steps and that latent representations are identifiable within each step under sparsity regularization.