Tag
Proposes Slipform, a training framework that uses lexical concreteness to select harder negatives and trains with a margin-based Cement loss, improving compositional reasoning in vision-language models.