OpenAI's GPT-5.5 costs 49–92% more than GPT-5.4 in practice despite claimed token efficiency improvements, while Anthropic's Claude Opus 4.7 also raised effective costs by 12–27% for longer prompts, reflecting a broader trend of rising frontier model prices as both companies face massive projected losses.
OpenSeeker fully open-sources training data and models for 30B-scale ReAct-based search agents, achieving state-of-the-art performance on multiple benchmarks including BrowseComp and Humanity's Last Exam. It is the first purely academic project to reach frontier search benchmark performance while releasing complete training data.
A tutorial blog post explaining LLM Routing — the practice of directing user queries to the most appropriate LLM based on cost, latency, and quality. Covers routing strategies, anatomy of an LLM router, and comparisons with Mixture of Experts.
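The core idea of routing by cost, latency, and quality can be sketched with a minimal rule-based router. The model names, complexity heuristic, and thresholds below are illustrative assumptions, not the blog post's actual strategy:

```python
# Minimal sketch of a rule-based LLM router. Model names and the
# complexity heuristic are hypothetical placeholders.

def estimate_complexity(query: str) -> float:
    """Crude complexity proxy: longer queries and reasoning keywords score higher."""
    keywords = ("prove", "derive", "debug", "analyze", "step by step")
    score = min(len(query) / 500, 1.0)
    if any(k in query.lower() for k in keywords):
        score += 0.5
    return min(score, 1.0)

def route(query: str, cost_sensitive: bool = True) -> str:
    """Pick a model tier by trading quality against cost and latency."""
    c = estimate_complexity(query)
    if c < 0.3:
        return "small-fast-model"   # cheap, low latency
    if c < 0.7 and cost_sensitive:
        return "mid-tier-model"
    return "frontier-model"         # highest quality, highest cost

print(route("What is 2+2?"))  # → small-fast-model
print(route("Prove the convergence of this series step by step"))  # → mid-tier-model
```

Production routers typically replace the keyword heuristic with a learned classifier or an embedding-similarity lookup, but the cost/quality trade-off structure stays the same.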
Assistant Professor Ernest K. Ryu at UCLA offers the open course 'Reinforcement Learning for Large Language Models,' comprehensively analyzing key LLM training techniques like RLHF, PPO, and DPO alongside their supporting resources through a blend of theory and practice. The course provides developers and researchers with a systematic learning path from foundational algorithms to practical deployment.
Shares early benchmark scores and evaluation metrics for an open-weight model stack run on a single AMD MI300X, noting competitive performance against closed-source alternatives.
This article promotes the open-source book 'Foundations of LLMs', which systematically explains the fundamentals of large language models, and introduces the multi-agent development framework Agent-Kernel.
This paper presents an automated diagnostic system for grading knee osteoarthritis severity using an optimized ResNet-18 model deployed on edge devices via TensorFlow Lite. It integrates an LLM interface using Gemini 2.0 Flash to provide structured interpretive findings while maintaining offline capability for resource-constrained environments.
The article introduces SAGE, a multi-agent LLM framework for time-series anomaly detection that uses specialized analyzers to improve interpretability and reliability. It demonstrates superior performance over baselines on three benchmarks and enhances diagnostic reporting through structured evidence consolidation.
This paper proposes Saliency-Aware Regularized Quantization Calibration (SARQC), a unified framework that improves Post-Training Quantization (PTQ) for LLMs by adding a regularization term to preserve weight proximity, enhancing generalization and performance.
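The general shape of such a calibration objective, reconstruction error plus a weight-proximity regularizer, can be sketched as follows. The loss form, grid search, and the lambda weight are assumptions for illustration; the paper's actual saliency-aware formulation will differ:

```python
import numpy as np

def fake_quant(w, scale, bits=4):
    """Round-to-nearest quantize-dequantize at a given per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def calibrate_scale(w, x, lam=0.1, bits=4, grid=50):
    """Grid-search a scale minimizing layer output error plus a
    weight-proximity penalty (illustrative stand-in for SARQC's objective)."""
    base = np.abs(w).max() / (2 ** (bits - 1) - 1)
    ref = x @ w.T
    best_scale, best_loss = base, np.inf
    for s in np.linspace(0.5 * base, 1.5 * base, grid):
        wq = fake_quant(w, s, bits)
        loss = np.mean((x @ wq.T - ref) ** 2) + lam * np.mean((wq - w) ** 2)
        if loss < best_loss:
            best_loss, best_scale = loss, s
    return best_scale
```

The second term is the "weight proximity" regularizer mentioned in the summary: it discourages calibrations that fit the calibration batch well but drift far from the original weights, which is the claimed source of better generalization.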
This paper introduces AgenticRAG, a framework from Microsoft that enhances enterprise knowledge base retrieval by equipping LLMs with tools for iterative search, document navigation, and analysis. It demonstrates significant improvements in recall and factuality over standard RAG pipelines on multiple benchmarks.
This paper introduces PRISM, a framework that integrates Vision-Language Models and Large Language Models through a dynamic question-answering pipeline to improve sequential decision-making in embodied AI tasks.
This position paper analyzes sycophancy in LLMs as a boundary failure between social alignment and epistemic integrity, proposing a new framework and taxonomy to classify and mitigate these behaviors.
This paper identifies feature starvation in sparse autoencoders as a geometric instability and proposes adaptive elastic net SAEs (AEN-SAEs) to mitigate it without heuristics.
This paper presents PCNet, a probabilistic circuit trained as a tractable density estimator on LLM residual streams to detect hallucinations as geometric anomalies. It also introduces PC-LDCD, a dynamic correction method that only intervenes on hallucinated tokens, achieving near-perfect detection and reduced corruption rates.
This paper proposes an attribution-guided continual fine-tuning framework for large language models that estimates task-specific parameter importance in Transformer layers and modulates gradients accordingly, mitigating catastrophic forgetting while maintaining performance on new tasks.
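The mechanism of importance-weighted gradient modulation can be sketched in a few lines. A Fisher-style squared-gradient estimate stands in for the paper's attribution method here, and the gating function is an assumption:

```python
import numpy as np

def estimate_importance(grads_old_task):
    """Accumulate squared gradients from the previous task as a
    per-parameter importance estimate (Fisher-style proxy)."""
    return sum(g ** 2 for g in grads_old_task) / len(grads_old_task)

def modulated_update(param, grad_new, importance, lr=0.01, beta=5.0):
    """Shrink updates on parameters deemed important for the old task,
    leaving unimportant parameters free to adapt to the new task."""
    gate = 1.0 / (1.0 + beta * importance)  # important params -> small gate
    return param - lr * gate * grad_new
```

Parameters with high old-task importance receive near-zero updates, which is the basic lever by which such methods trade plasticity on the new task against retention of the old one.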
This paper proposes Distribution-Aligned Adversarial Distillation (DisAAD), a method that estimates uncertainty in black-box LLMs using a lightweight proxy model only 1% of the original model's size, achieving reliable quantification without access to internal parameters or multiple sampling passes.
Introduces IOP, a framework that internalizes outcome supervision into process supervision for reasoning reinforcement learning, enabling fine-grained credit assignment without external annotations.
BioTool introduces a comprehensive biomedical tool-calling dataset with 34 tools and 7,040 human-verified query-API pairs, enabling fine-tuned LLMs to outperform GPT-5.1 on biomedical tool use and significantly enhance answer quality.
This paper proposes Badit, a method that decomposes large language model parameters into orthogonal high-singular-value LoRA experts to mitigate cross-task interference during multi-task instruction tuning.
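The decomposition idea can be illustrated with plain SVD: splitting the top singular directions of a weight matrix into disjoint low-rank factor pairs yields "experts" with mutually orthogonal column spaces. Expert count and rank below are arbitrary choices, and this is only the geometric intuition, not Badit's actual training procedure:

```python
import numpy as np

def svd_experts(W, n_experts=2, rank=4):
    """Partition the top singular directions of W into disjoint
    rank-`rank` LoRA-style factor pairs (A, B) with A @ B summing
    back toward W. Disjoint slices of U make the experts orthogonal."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    experts = []
    for e in range(n_experts):
        idx = slice(e * rank, (e + 1) * rank)
        A = U[:, idx] * S[idx]   # (out, rank) scaled left factors
        B = Vt[idx, :]           # (rank, in)
        experts.append((A, B))
    return experts
```

Because each expert occupies a distinct slice of the orthonormal left singular vectors, gradients flowing through one expert cannot rotate into another's subspace, which is the kind of cross-task interference the summary says the method mitigates.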
Nathan Lambert shares observations from visiting top Chinese AI labs, highlighting cultural differences in research focus and ego compared to US counterparts, while noting parity in hardware and model capabilities.