large-language-models

#large-language-models

Human understanding is still needed more than ever

Reddit r/ArtificialInteligence · 4h ago

A commentary emphasizing that despite AI advances, human understanding remains crucial for safe and humane deployment, urging users to verify AI outputs and treat AI with respect.

Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It

arXiv cs.CL · 6h ago

This paper investigates why multi-step tool-use reinforcement learning (RL) often collapses or yields limited gains, identifying probability spikes in control tokens as a key cause. It shows that interleaving supervised fine-tuning with RL improves stability and explores various supervisory signals to guide robust training.

Weave of Formal Thought

arXiv cs.CL · 6h ago

Weave of Formal Thought (WoFT) introduces a sound and complete constrained decoder for code generation that guarantees syntactic validity relative to the full Tree-sitter specification, and a fine-tuning method that trains models to interleave grammar symbols using reweighted wake-sleep, improving perplexity on Python code generation.

SFL-MTSC: Leveraging Semantic Frame-Level Multi-Task Self-Consistency for Robust Multi-Intent Spoken Language Understanding

arXiv cs.CL · 6h ago

Introduces SFL-MTSC, a structured aggregation framework for robust multi-intent spoken language understanding using LLM self-consistency at the semantic frame level, showing improved slot F1 and overall accuracy on the MAC-SLU benchmark.

A Red Teaming Framework for Large Language Models: A Case Study on Faithfulness Evaluation

arXiv cs.CL · 6h ago

This paper presents a red teaming framework for LLMs that uses a multi-role architecture to systematically uncover vulnerabilities, particularly in faithfulness. The framework demonstrated a 7.9% increase in attack success rate in QA tasks and highlights the impact of architectural choices over parameter scaling on model safety.

Hybrid-IR: Dual-Path Hybrid Retrieval with Iterative Reasoning for Complex Medical Question Answering

arXiv cs.CL · 6h ago

Hybrid-IR introduces a dual-path retrieval framework combining graph-based and dense retrieval with iterative reasoning to improve complex medical QA, addressing limitations in existing RAG methods. Experiments on three benchmarks indicate that the approach is effective.

The cognitive, affective, and behavioral expression of self-stigma among people who use drugs in online substance use communities

arXiv cs.CL · 6h ago

This paper develops a codebook for self-stigma among people who use drugs and analyzes 72,115 Reddit posts to examine prevalence, co-occurrence, and temporal patterns of cognitive, affective, and behavioral stigma indicators, finding that self-stigma is expressed as an integrated phenomenon with behavioral indicators often preceding core indicators.

LLM Evolution as an Industry-Scale Ecosystem: A Lifecycle Perspective on Continual Learning

arXiv cs.LG · 6h ago

This survey reformulates industrial continual learning for LLMs as a closed-loop update-and-release problem in a versioned ecosystem, identifying key challenges and proposing five lifecycle design principles for sustainable model evolution.

ScaleToT: Generalizing Structured LLM Reasoning for Billion-Scale Low-Activity User Modeling

arXiv cs.AI · yesterday

ScaleToT proposes a method to generalize structured LLM reasoning for low-activity user modeling at billion scale, using tree-of-thought refinement and training a student model to reduce cost. An online A/B test in advertising deployment showed a 6.738% increase in LT30.

Cross-Lingual Exploration for Parametric Knowledge

arXiv cs.CL · yesterday

This paper explores cross-lingual prompting strategies to improve access to parametric knowledge in large language models, demonstrating significant gains in knowledge transfer and factual recall across 17 languages on multilingual benchmarks.

On the Smallness of the Large Language Models Scaling Exponents

arXiv cs.AI · yesterday

The paper discusses the small scaling exponents of large language models, arguing that they indicate an unsustainable regime in terms of energy resources. It also examines the 'pedestal effect' and draws analogies with fluid turbulence to comment on data smoothness.

CompressKV: Semantic-Retrieval-Guided KV-Cache Compression for Resource-Efficient Long-Context LLM Inference

arXiv cs.AI · yesterday

CompressKV proposes a semantic-retrieval-guided KV-cache compression method for GQA-based LLMs, identifying Semantic Retrieval Heads to retain critical tokens. It preserves over 97% of full-cache performance while using only 3% of the KV cache on LongBench tasks.

ATRIA: Adaptive Traceable ECG Reporting with Iterative Agents

arXiv cs.AI · yesterday

ATRIA is a multi-agent system for ECG report generation that mirrors the clinician's iterative workflow, enabling bidirectional editing, evidence grounding, and clinician-in-the-loop verification.

AVOC: Enhancing Hour-Level Audio-Video Understanding in Omni-Modal LLMs via Retrieval-Inspired Token Compression

arXiv cs.CL · yesterday

AVOC introduces a retrieval-inspired token compression method for omni-modal LLMs that effectively handles hour-long audio-video inputs by selecting informative tokens based on relevance, importance, and diversity. The framework achieves state-of-the-art results on long-form audio-video understanding benchmarks, surpassing prior methods by significant margins.

Pigeonholing: Bad prompts hurt models to collapse and make mistakes

arXiv cs.CL · yesterday

This paper introduces 'pigeonholing,' a phenomenon in which bad prompts cause LLMs to collapse and repeat errors, leading to a 38-40% performance drop. Experiments across 10 tasks and 10 models show that the effect worsens with more conversation turns, and the authors propose RLVR with synthetic errors as a mitigation.

Probing the Misaligned Thinking Process of Language Models

arXiv cs.AI · yesterday

This paper proposes monitoring LLM misalignment by decomposing it into fine-grained cognitive processes (misalignment indicators) and detecting them via linear probes on internal activations, achieving high AUROC on out-of-distribution transcripts.

CAVEWOMAN: How Large Language Models Behave Under Linguistic Input and Output Compression

arXiv cs.CL · yesterday

This paper introduces CAVEWOMAN, a two-channel evaluation protocol for assessing the effects of linguistic input and output compression on LLMs. It finds that output compression reduces costs, while input compression increases costs and degrades accuracy, challenging the common 'caveman style' advice.

Sentence-Level Contextual Entrainment in Large Language Models

arXiv cs.CL · yesterday

This paper extends contextual entrainment from the token level to the sentence level, showing that sentences appearing in a prompt, even counterfactual ones, become more probable during inference. The effect decreases with model size and is driven by 2-4% of attention heads, which can be ablated without performance loss.

Towards Version-aware Operations and Transaction Memories for Multi-layer MeMo

arXiv cs.CL · yesterday

This paper proposes version-aware operations and transaction memories for the MeMo architecture, enabling direct editing of explicit correlation matrix memories instead of full retraining when knowledge changes.

Faithful by Construction: Claim-Anchored Attribution for Multi-Document Summarization

arXiv cs.CL · yesterday

This paper introduces CAMS, a modular multi-document summarization framework that extracts atomic claims with token-level provenance, clusters equivalent claims, and rewrites them into summaries with fine-grained, multi-source traceability, significantly improving faithfulness and citation precision.
