Tag
Count Anything is a generalist model that unifies text-guided object counting across multiple domains, supported by the new CLOC dataset of 220K images spanning six visual domains. It achieves strong accuracy and generalizes across domains.
Count Anything is a generalist vision model for text-guided object counting across multiple domains, using dual-granularity instance enumeration and complementary counting fusion. It achieves strong accuracy and cross-domain generalization, outperforming existing open-world counting methods.
UniSteer introduces text-guided activation flow matching, which learns a universal conditional velocity field in activation space. This enables versatile LLM behavior control and classification without task-specific intervention modules.
Grounding DINO is an open-vocabulary object detection model that can detect arbitrary objects based on text descriptions, now available on Replicate.