I benchmarked models sized 2B to 35B on hard HTML data extraction

Reddit r/LocalLLaMA Papers

Summary

A benchmark comparing AI models from 2B to 35B parameters on a hard task, extracting structured data from HTML, and evaluating their accuracy.

Similar Articles

Benchmarking Large Language Models for Safety Data Extraction

arXiv cs.CL

This paper benchmarks four large language models (Gemini 1.5 Pro, GPT-4o, Claude 3.7 Sonnet, Llama 3.1-70B) for extracting structured information from Safety Data Sheets, finding that text-based extraction with chain-of-thought prompting yields the highest accuracy (84% by Gemini 1.5 Pro) but no model surpasses the 90% threshold required for reliable industrial deployment.

Why there is a lack of new 100B-120B models?

Reddit r/LocalLLaMA

Analysis of the trend in AI model sizes, noting a gap in the 100-120B parameter range with recent releases focusing on smaller (25-35B) or larger (200B+) models.