EMNLP workshop any good? Or any other NLP venue good for VLM eval work? [D]
Summary
A PhD student asks whether an EMNLP workshop is a worthwhile venue for vision-language model (VLM) evaluation work that was rejected from a top imaging venue, and whether another NLP venue would be a better fit.
Similar Articles
@DanKornas: LLM eval is where most AI demos start becoming real systems. LLM-Evaluation is a public GitHub resource with workshop s…
A tweet announces LLM-Evaluation, a public GitHub repository containing workshop slides, sample notebooks, prompts, and reference links for evaluating LLMs, generative AI, and RAG systems, aiming to provide a practical map of evaluation workflows.
Do VLMs Reason Like Engineers? A Benchmark and a Stage-wise Evaluation
This paper introduces EngVQA, a multimodal benchmark for evaluating engineering reasoning in vision-language models, along with an 8-stage automatic evaluation framework that enables fine-grained analysis of reasoning failures. It reveals substantial limitations in current VLMs' engineering reasoning capabilities.
@ArizePhoenix: A comprehensive 2-hour evaluations workshop, for free! At AI Engineer: Europe, head of DevRel Laurie Voss gave this wor…
Arize Phoenix announces a free 2-hour evaluations workshop given at the AI Engineer: Europe conference by head of DevRel Laurie Voss, covering manual data examination and both built-in and custom evals.
EnvSimBench: A Benchmark for Evaluating and Improving LLM-Based Environment Simulation
This paper introduces EnvSimBench, a benchmark for evaluating Large Language Models' ability to simulate environments for agent training. It identifies a 'state change cliff' in current LLMs and proposes a constraint-driven pipeline to reduce hallucinations and costs.
Already 11 000 submissions for EMNLP? [D]
EMNLP 2024 has already received 11,000 submissions, up from 8,000 last year, highlighting the rapid growth of the NLP field.