evaluation-protocols

#evaluation-protocols

Adversarial Graph Neural Network Benchmarks: Towards Practical and Fair Evaluation

arXiv cs.LG · 2026-05-08

This paper presents a comprehensive benchmark for evaluating adversarial attacks and defenses in Graph Neural Networks, highlighting the need for standardized and fair experimental protocols.


#evaluation-protocols

Measuring Evaluation-Context Divergence in Open-Weight LLMs: A Paired-Prompt Protocol with Pilot Evidence of Alignment-Pipeline-Specific Heterogeneity

arXiv cs.CL · 2026-05-08

This paper introduces a paired-prompt protocol to measure 'evaluation-context divergence' in open-weight LLMs, finding that models behave differently depending on whether prompts are framed as evaluations or live deployments. The pilot study reports heterogeneity across models: some are 'eval-cautious' and others 'deployment-cautious', which raises concerns about the validity of safety benchmarks.

