This paper introduces WebRISE, a benchmark for evaluating web artifacts generated by multimodal large language models (MLLMs). It uses Interaction Contract Graphs (ICGs) to assess requirement-induced states and transitions across five input modalities. Experiments show that even the strongest models achieve limited validity and coverage, and that video input provides the strongest interaction signal.