interaction-testing

WebRISE: Requirement-Induced State Evaluation for MLLM-Generated Web Artifacts

arXiv cs.CL · 5d ago

This paper introduces WebRISE, a benchmark for evaluating MLLM-generated web artifacts using Interaction Contract Graphs (ICGs) to assess requirement-induced states and transitions across five input modalities. Experiments show even the strongest models achieve limited validity and coverage, with video input providing the strongest interaction signal.
