Tag
This paper investigates explicit encoding of ICD-10-CM hierarchy in EHR foundation models, using hierarchical token augmentation and graph-based code representations. Experiments on MIMIC-IV and eICU show improvements over flat code representations for in-domain and cross-dataset prediction tasks.
Re2Pix is a hierarchical video prediction framework that improves future video generation by first predicting semantic representations using frozen vision foundation models, then conditioning a latent diffusion model on these predictions to generate photorealistic frames. The approach addresses train-test mismatches through nested dropout and mixed supervision strategies, achieving improved temporal semantic consistency and perceptual quality on autonomous driving benchmarks.