open-weight-models

#open-weight-models

Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention [P]

Reddit r/MachineLearning ↗ · 2h ago Cached

Sebastian Raschka reviews recent innovations in LLM architectures focused on long-context efficiency, including KV sharing, compressed convolutional attention, and layer-wise attention budgeting from models like Gemma 4, ZAYA1, Laguna XS.2, and DeepSeek V4.

0 favorites 0 likes

#open-weight-models

Ran the same models across Strix Halo, RTX 3090, and RTX 5070 because I wanted my own numbers

Reddit r/LocalLLaMA ↗ · 15h ago

The author ran 55 inference benchmark runs across Strix Halo, RTX 3090, and RTX 5070 with multiple backends, revealing that memory bandwidth dominates decode speed, the RTX 5070 beats the 3090 on small models, and reasoning models appear ~5x slower due to hidden reasoning content.

0 favorites 0 likes

#open-weight-models

Hallucination Detection via Activations of Open-Weight Proxy Analyzers

arXiv cs.CL ↗ · 6d ago Cached

This paper introduces a proxy-analyzer framework that detects hallucinations in large language models by analyzing internal activations of small, open-weight models rather than the generator itself. The method achieves superior performance on benchmarks like RAGTruth compared to existing methods like ReDeEP, demonstrating that model size is less critical than the analysis approach.

0 favorites 0 likes

#open-weight-models

Measuring Evaluation-Context Divergence in Open-Weight LLMs: A Paired-Prompt Protocol with Pilot Evidence of Alignment-Pipeline-Specific Heterogeneity

arXiv cs.CL ↗ · 2026-05-08 Cached

This paper introduces a paired-prompt protocol to measure 'evaluation-context divergence' in open-weight LLMs, finding that models behave differently depending on whether prompts are framed as evaluations or live deployments. The study highlights heterogeneity across models, with some being 'eval-cautious' and others 'deployment-cautious', raising concerns about the validity of safety benchmarks.

0 favorites 0 likes

open-weight-models

Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention [P]

Ran the same models across Strix Halo, RTX 3090, and RTX 5070 because I wanted my own numbers

Hallucination Detection via Activations of Open-Weight Proxy Analyzers

Measuring Evaluation-Context Divergence in Open-Weight LLMs: A Paired-Prompt Protocol with Pilot Evidence of Alignment-Pipeline-Specific Heterogeneity

Submit Feedback