Tag
A new study published in PNAS shows that advanced LLMs like GPT-4.5 can pass the Turing Test, with participants judging them to be human more often than the actual humans, prompting a reevaluation of what the test measures.
Meta's new paper presents an agentic system that autonomously discovers neural architectures outperforming Llama 3.2 at 350M, 1B, and 3B scales within a 24-hour compute budget.
A pull request for MTP (most likely multi-token prediction) related to LLaMA models has been merged, marking a milestone.
A researcher trained small language models on their own self-generated coding mistakes and corrections, achieving 80% on HumanEval and surpassing GPT-3.5 on math, demonstrating effective self-improvement with minimal resources.
Cyankiwi introduced an updated version of their AWQ 4-bit quantization method that jointly optimizes scales and quantization ranges, achieving lower KL divergence than existing methods on Llama-3 models.
A user converted Nvidia's Llama-Embed-Nemotron-8B model to MLX format with fp16, 8-bit, 4-bit, and 2-bit quantizations, enabling in-process embedding loading on Apple Silicon via mlx-embeddings.
Book publishers and author Scott Turow have filed a class-action lawsuit against Meta and CEO Mark Zuckerberg, alleging the company illegally copied millions of copyrighted works to train its Llama AI models, circumventing licensing and copyright protections.
This research paper investigates how Large Language Models encode social role granularity as a structured latent dimension. It demonstrates that this 'Granularity Axis' is consistent across architectures like Qwen3 and Llama-3, and can be causally manipulated via activation steering.
UniPool introduces a shared expert pool architecture for Mixture-of-Experts models, reducing parameter growth with depth while improving efficiency and performance over standard MoE baselines.
A post questions why no startup has shipped a $200-300 consumer inference chip with Llama 3 baked in, suggesting the industry prefers API subscription revenue over one-time hardware sales.
Researchers from Bangladesh University of Engineering and Technology present CBRS, a multi-platform framework that filters and parses blood donation requests from social media using a dual-layer architecture and a novel 11K bilingual dataset in Bengali and English. Their LoRA fine-tuned Llama-3.2-3B model achieves 99% filtering accuracy and 92% zero-shot parsing accuracy, outperforming GPT-4o-mini and other LLMs while using 35× fewer tokens.
A user reports successfully deploying Qwen 3.6 with ik_llama quantization, achieving 50+ tokens/second on consumer hardware (16GB VRAM, 32GB RAM) with a 200k context window.