automated-evaluation

AdversaBench: Automated LLM Red-Teaming with Multi-Judge Confirmation and Cross-Model Transferability

arXiv cs.AI · yesterday

AdversaBench introduces an automated LLM red-teaming pipeline that mutates prompts with five mutation operators and confirms failures with a three-judge panel, using a meta-judge as a tiebreaker. Its results show that attack difficulty varies by category and that adversarial prompts transfer from smaller to larger models.
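
The summary does not say how the three judges' verdicts are combined or exactly when the meta-judge is consulted. A minimal sketch, assuming (hypothetically) that each judge returns "failure", "safe", or abstains, that the panel takes a simple majority, and that the meta-judge is called only on a tie, could look like this. `confirm_failure` and the `Judge` type are illustrative names, not AdversaBench's API.

```python
from typing import Callable, Optional, Sequence

# Hypothetical judge signature: (prompt, response) -> True (failure), False (safe), None (abstain).
Judge = Callable[[str, str], Optional[bool]]


def confirm_failure(prompt: str, response: str,
                    judges: Sequence[Judge], meta_judge: Judge) -> bool:
    """Assumed rule: majority vote over the panel, meta-judge breaks ties."""
    votes = [judge(prompt, response) for judge in judges]
    fails = sum(1 for v in votes if v is True)
    passes = sum(1 for v in votes if v is False)
    if fails != passes:
        # A clear majority among non-abstaining judges decides.
        return fails > passes
    # Tie (e.g. one failure, one safe, one abstain): defer to the meta-judge.
    # An abstaining meta-judge is treated as "failure not confirmed".
    return meta_judge(prompt, response) is True


if __name__ == "__main__":
    # Stub judges: one flags a failure, one says safe, one abstains, so the panel ties.
    panel = [lambda p, r: True, lambda p, r: False, lambda p, r: None]
    meta = lambda p, r: True  # stub meta-judge breaks the tie
    print(confirm_failure("attack prompt", "model response", panel, meta))  # prints True
```

With three judges that always return a verdict, a tie cannot occur, so under this sketch's assumptions the tiebreaker only matters when some judge abstains.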

I built an open-source hyperparameter search tool for diffusion fine-tunes: pick the winner based on scoring

Reddit r/LocalLLaMA · 2026-05-10

The author introduces Bracket, an open-source tool that automates hyperparameter search for diffusion-model fine-tuning: it runs training trials in parallel, scores each trial's outputs with a vision-language model (VLM), and selects the highest-scoring configuration.
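
The selection loop can be sketched as follows, assuming an exhaustive grid over the hyperparameters and treating training and VLM scoring as caller-supplied functions. `grid`, `pick_winner`, `train_and_sample`, and `score` are hypothetical names, not Bracket's actual interface, and its real search strategy may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Any, Callable, Dict, List, Tuple

Config = Dict[str, float]


def grid(space: Dict[str, List[float]]) -> List[Config]:
    """Expand candidate values per hyperparameter into every combination (full grid)."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


def pick_winner(configs: List[Config],
                train_and_sample: Callable[[Config], Any],
                score: Callable[[Any], float],
                max_workers: int = 4) -> Tuple[Config, float]:
    """Run one trial per config in parallel, score each, and return the best config."""
    def run(cfg: Config) -> float:
        samples = train_and_sample(cfg)  # hypothetical: fine-tune, then generate samples
        return score(samples)            # hypothetical stand-in for a VLM-based rating

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(run, configs))  # map preserves input order
    best = max(range(len(configs)), key=lambda i: scores[i])
    return configs[best], scores[best]


if __name__ == "__main__":
    space = {"lr": [1e-5, 1e-4], "rank": [8, 16]}
    # Toy stand-ins: each "sample" is a number and the score is the mean.
    fake_train = lambda cfg: [cfg["lr"] * 1e4 + cfg["rank"]]
    fake_score = lambda samples: sum(samples) / len(samples)
    best_cfg, best_score = pick_winner(grid(space), fake_train, fake_score)
    print(best_cfg)
```

Threads keep the sketch short; real fine-tuning trials would more likely run as separate processes or jobs, one per GPU.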