One Click per Cell Type Suffices: Training-free Group Interaction for Cell Instance Segmentation
Summary
Group Prompting introduces a training-free framework for cell instance segmentation that requires only one click per cell type. It uses the feature space of the Segment Anything Model to recursively expand that click into further prompts. Without any training, it retains over 90% of per-instance prompting performance on cell-type-annotated benchmarks.
Cached at: 06/01/26, 03:20 PM
Source: https://huggingface.co/papers/2605.29429
Abstract
Group Prompting enables efficient cell instance segmentation by leveraging per-type prompting through a training-free framework that uses multi-scale encoder features and recursive prompt expansion.
Cell instance segmentation models trained on cell-specific datasets suffer severe performance drops on out-of-distribution cell types. Interactive foundation models overcome this through per-instance prompting, but at a cost that is prohibitively expensive for histopathology images containing hundreds to thousands of densely packed instances. We introduce Group Prompting, a new paradigm that shifts interactive segmentation from per-instance O(N) clicks, where N is the number of instances, to per-type O(T) clicks, where T is the number of cell types: a single click per cell type suffices to segment all instances of that type. Our key observation is that the frozen image encoder of the Segment Anything Model (SAM) already clusters same-type cells in its feature space before any prompt is given. Exploiting this property, we propose Chain-of-Prompts (CoP), a training-free framework that recursively expands a single user click by (1) identifying reliable same-type locations through non-parametric gating of multi-scale encoder features, and (2) selecting the most spatially distant reliable point as the next prompt to maximize coverage. On three cell-type-annotated benchmarks, CoP with one click per type retains over 90% of per-instance performance and surpasses fully supervised methods without any additional training. On four morphologically homogeneous benchmarks, a single click retains over 99% of per-instance performance. Project Page: https://shjo-april.github.io/Chain-of-Prompts/
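The two-step expansion described above can be sketched in a few lines of NumPy. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes the SAM image-encoder features have already been extracted and resized to a common H×W grid (`feats`, a list of (C, H, W) arrays, one per scale), and it omits the SAM mask decoder that would turn each point prompt into an instance mask. The gating rule (mean cosine similarity to the current prompts exceeding a fixed threshold `tau` at every scale) and the farthest-point criterion (distance to the nearest existing prompt) are assumptions about details the abstract does not specify. All function and parameter names are hypothetical.

```python
import numpy as np


def cosine_map(feat, y, x):
    """Cosine similarity between the feature at (y, x) and every location of a (C, H, W) map."""
    c, h, w = feat.shape
    flat = feat.reshape(c, -1)
    flat = flat / (np.linalg.norm(flat, axis=0, keepdims=True) + 1e-8)
    query = flat[:, y * w + x]
    return (query @ flat).reshape(h, w)


def reliable_mask(feats, prompts, tau=0.8):
    """Hypothetical training-free gate (assumption): a location is reliable only if,
    at every scale, its mean cosine similarity to the current prompts exceeds tau."""
    gate = np.ones(feats[0].shape[1:], dtype=bool)
    for feat in feats:
        sim = np.mean([cosine_map(feat, y, x) for (y, x) in prompts], axis=0)
        gate &= sim > tau
    return gate


def chain_of_prompts(feats, click, steps=5, tau=0.8):
    """Expand one user click into a chain of same-type point prompts (sketch only)."""
    prompts = [click]
    h, w = feats[0].shape[1:]
    ys, xs = np.mgrid[0:h, 0:w]
    for _ in range(steps):
        gate = reliable_mask(feats, prompts, tau)
        # Distance from every location to its nearest existing prompt.
        dist = np.min([np.hypot(ys - y, xs - x) for (y, x) in prompts], axis=0)
        dist = np.where(gate, dist, -1.0)  # unreliable locations can never be chosen
        y, x = np.unravel_index(np.argmax(dist), dist.shape)
        if dist[y, x] <= 0:
            break  # no reliable location left that is not already a prompt
        prompts.append((int(y), int(x)))
    return prompts


# Toy two-scale feature map: type A in the left three columns, type B in the right three.
h, w = 4, 6
toy = np.zeros((2, h, w))
toy[0, :, :3] = 1.0
toy[1, :, 3:] = 1.0
feats = [toy, toy.copy()]

print(chain_of_prompts(feats, click=(0, 0), steps=2))
```

On the toy map, a click in the type A region produces further prompts only inside that region, each placed as far as possible from the prompts already chosen. The gate has no learned weights, in keeping with the training-free framing, but in this sketch `tau` must still be set by hand.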