Tag
Jerry Liu presents a framework for weighing accuracy, cost, and latency tradeoffs in document parsing, introducing LiteParse as an open-source, low-latency parsing tool for AI agent loops, alongside LlamaParse for high-accuracy modes.
Meta was secretly using Google's Gemini for customer service, ad tools, and content moderation because it outperformed Meta's own Llama models, until Google cut off access due to excessive capacity usage.
This paper proposes a tree-of-thoughts-inspired extractive-abstractive approach to legal case judgement summarization using LLMs; experiments on DeepSeek and Llama show improved summaries over purely extractive or purely abstractive methods.
This paper presents a multi-stage explainable framework that combines SHAP-based token attribution, theory-informed linguistic features, and reasoning from LLaMA-3.1-70B-Instruct to interpret transformer-based speech models for cognitive impairment detection, achieving strong clinical alignment and high usability scores.
An analysis of political leanings in six major AI models, showing that four of the six lean left of center on the economic axis, with some models unaware of their own bias.
Discussion about the significant gap between Llama model benchmark scores and actual real-world performance, with the author seeking assistance.
AutoMegaKernel is an open-source agent harness that compiles any HuggingFace model into a single persistent megakernel, fusing the entire forward pass into one GPU launch to reduce overhead. It achieves up to 1.33x speedup over CUDA-graphed cuBLAS on inference-class GPUs like L4 and L40S, while proving schedules deadlock- and race-free.
This paper uses mechanistic interpretability to audit ethical reasoning in LLaMA 3.1-8B-Instruct, finding a 'Situational Anchor Effect' where domain-specific representations dominate moral computation, and proposing 'Mechanistic Alignment' as a research program.
A Stanford professor delivered a public lecture providing a comprehensive breakdown of how modern LLMs like GPT, Claude, and LLaMA are built under the hood, making advanced architecture accessible to the public.
InfiniteKV is an open-source KV cache technique that compresses old tokens into 104-byte searchable records stored in RAM or on disk, enabling models to handle million-token contexts beyond their trained window without discarding data. Verified working with Mistral-7B and SmolLM2.
This paper investigates sequential fine-tuning of LLaMA-3.1-8B for automated essay scoring using a curriculum aligned with discourse structure, showing improved coherence and performance compared to independent or randomized training.
Meta has abandoned its open-weight Llama model family in favor of a fully proprietary model called Muse Spark, developed by Alexandr Wang's team, marking the end of Meta's role as a champion of open-source AI.
This paper presents ImmigrationQA, a source-grounded dataset of 17,058 QA pairs for U.S. immigration law, and fine-tunes a Llama 3.2 3B model using LoRA, achieving a 27% improvement over the base model on a held-out evaluation set.
Llama Surgery injects learned block-sparse attention topologies into pre-trained Llama 3.1 8B without retraining from scratch, using a Dynamic Topology Router with Gumbel-Softmax routing, temperature annealing, and a Straight-Through Estimator to avoid gradient collapse, achieving stable convergence and coherent output.
This paper explores using LLMs to predict state changes within rule-based interactive storytelling systems, aiming to improve coherence and player expression. Experiments with Llama 3 70B and Gemini 1.5 Flash show that world-state transformations can maintain consistency while encouraging creative player input.
Steeve Morin reports running Llama 3.1 3B on Tenstorrent hardware via ZML, achieving 26 tok/s, close to Tenstorrent's claimed 33 tok/s.
The Heretic LLM de-censorship project received a legal notice from Meta, leading to the removal of derivative Llama models; the project has since moved to a Codeberg mirror and plans technological measures to preserve access.
Meta served a legal notice to the Heretic Project over derivatives of its Llama AI models, prompting the project to remove the weights and announce plans to diversify infrastructure with an official Codeberg mirror.
Miso Labs releases Miso TTS 8B, a text-to-speech model based on the Sesame CSM architecture with a Llama 3.2-style backbone, designed for high-quality conversational speech generation and voice continuation.
A new study published in PNAS shows that advanced LLMs like GPT-4.5 can pass the Turing Test, with participants finding them more human than actual humans, prompting a reevaluation of what the test measures.