Cards List
#research-paper

EnergyLens: Predictive Energy-Aware Exploration for Multi-GPU LLM Inference Optimization

arXiv cs.LG · 6h ago

EnergyLens is an end-to-end framework for predictive energy-aware optimization of multi-GPU LLM inference, validated on Llama3 and Qwen3-MoE, achieving mean absolute percentage errors between 9.25% and 13.19% and revealing significant energy variation across configurations.
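The quoted errors are MAPE values; as a reminder of how that metric is computed, here is a minimal sketch (the energy readings below are illustrative, not from the paper):

```python
def mape(actual, predicted):
    """Mean absolute percentage error between measured and predicted values."""
    assert len(actual) == len(predicted) and actual
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

# Illustrative per-configuration energy readings (joules) vs. a model's predictions.
measured = [120.0, 95.0, 210.0]
predicted = [110.0, 100.0, 190.0]
print(round(mape(measured, predicted), 2))  # → 7.71
```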

Moltbook Moderation: Uncovering Hidden Intent Through Multi-Turn Dialogue

arXiv cs.AI · yesterday

This paper introduces Bot-Mod, a moderation framework that identifies malicious intent in multi-agent systems through multi-turn dialogue and Gibbs-based sampling, and presents a dataset from Moltbook for evaluation.

Context Is Not Control: A Source-Boundary Eval for LLMs

Reddit r/LocalLLaMA · yesterday

A paper introducing 'Context Is Not Control', a benchmark for assessing source-boundary failures in how LLMs handle controlled, text-mediated evidence. Replication packages are included for both open-weight and frontier API models.

Spatial Priming Outperforms Semantic Prompting: A Grid-Based Approach to Improving LLM Accuracy on Chart Data Extraction

arXiv cs.AI · 3d ago

This paper investigates methods for improving LLM accuracy in chart data extraction, finding that spatial priming via coordinate grids significantly outperforms semantic prompting strategies.

Fitting Is Not Enough: Smoothness in Extremely Quantized LLMs

arXiv cs.CL · 3d ago

This paper investigates smoothness degradation in extremely quantized Large Language Models, arguing that preserving smoothness is crucial for maintaining performance beyond numerical accuracy.

Do Benchmarks Underestimate LLM Performance? Evaluating Hallucination Detection With LLM-First Human-Adjudicated Assessment

arXiv cs.CL · 3d ago

This paper investigates whether standard benchmarks underestimate LLM performance by re-evaluating hallucination detection datasets using an LLM-first, human-adjudicated assessment method. The study finds that incorporating LLM reasoning into the adjudication process improves agreement and suggests that model-assisted re-evaluation yields more reliable benchmarks for ambiguity-prone tasks.

Towards Customized Multimodal Role-Play

arXiv cs.LG · 3d ago

This paper introduces UniCharacter, a two-stage training framework for Customized Multimodal Role-Play (CMRP) that enables unified customization of persona, dialogue style, and visual identity. It presents the RoleScape-20 dataset and demonstrates that the model can achieve coherent cross-modal generation with minimal data.

Magis-Bench: Evaluating LLMs on Magistrate-Level Legal Tasks

arXiv cs.CL · 3d ago

This article introduces Magis-Bench, a benchmark for evaluating large language models on magistrate-level legal tasks such as judicial reasoning and sentence drafting, using data from Brazilian judicial exams.

@rwayne: Yesterday an interesting paper dropped on arXiv that directly translates the 'consciousness' mechanism from cognitive science into long-context engineering.

X AI KOLs Timeline · 2026-05-08

Researchers propose applying the "global ignition" consciousness mechanism from cognitive science to long-context engineering, introducing the MiA-Signature method that uses submodular selection of high-level concepts to cover the activation space. Applied to RAG and agentic systems, it delivers consistent performance improvements across multiple long-context tasks.
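Submodular selection for coverage objectives is typically done greedily; a minimal sketch of that generic pattern (the `concepts` sets are hypothetical, standing in for regions of an activation space, and this is not the paper's MiA-Signature implementation):

```python
def greedy_cover(items, k):
    """Greedy maximization of a coverage (submodular) objective:
    pick up to k items whose union of covered elements is largest.
    `items` maps item name -> set of elements it covers."""
    covered, chosen = set(), []
    for _ in range(k):
        # Pick the item with the largest marginal gain; already-chosen items get -1.
        best = max(items, key=lambda i: -1 if i in chosen else len(items[i] - covered))
        if not items[best] - covered:
            break  # no marginal gain left
        chosen.append(best)
        covered |= items[best]
    return chosen

# Hypothetical "concepts", each covering part of an activation space.
concepts = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
print(greedy_cover(concepts, 2))  # → ['c', 'a']
```

The classic result is that this greedy loop achieves a (1 − 1/e) approximation of the optimal coverage for monotone submodular objectives.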

PRISM: Perception Reasoning Interleaved for Sequential Decision Making

arXiv cs.AI · 2026-05-08

This paper introduces PRISM, a framework that integrates Vision-Language Models and Large Language Models through a dynamic question-answering pipeline to improve sequential decision-making in embodied AI tasks.

When Helpfulness Becomes Sycophancy: Sycophancy is a Boundary Failure Between Social Alignment and Epistemic Integrity in Large Language Models

arXiv cs.AI · 2026-05-08

This position paper analyzes sycophancy in LLMs as a boundary failure between social alignment and epistemic integrity, proposing a new framework and taxonomy to classify and mitigate these behaviors.

Don't Lose Focus: Activation Steering via Key-Orthogonal Projections

arXiv cs.CL · 2026-05-08

This paper introduces Steering via Key-Orthogonal Projections (SKOP), a method to control LLM behavior by preventing attention rerouting, thereby reducing utility degradation while maintaining steering efficacy.
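One plausible reading of "key-orthogonal projection" is removing a steering vector's component along attention key directions, so that adding the vector cannot reroute attention scores; a generic linear-algebra sketch of that idea (not necessarily the paper's exact formulation):

```python
import numpy as np

def project_orthogonal(v, K):
    """Remove from steering vector v any component lying in the row space of K
    (e.g. attention key directions), so adding v leaves key dot-products intact."""
    # Orthonormal basis for K's row space via reduced QR on K^T.
    Q, _ = np.linalg.qr(K.T)
    return v - Q @ (Q.T @ v)

rng = np.random.default_rng(0)
K = rng.normal(size=(4, 16))   # 4 hypothetical key directions in a 16-dim space
v = rng.normal(size=16)        # a steering vector
v_perp = project_orthogonal(v, K)
print(np.allclose(K @ v_perp, 0))  # → True: steering no longer perturbs key scores
```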

Logic-Regularized Verifier Elicits Reasoning from LLMs

arXiv cs.CL · 2026-05-08

Introduces LoVer, an unsupervised verifier that uses logical rules (negation consistency, intra-group and inter-group consistency) to improve LLM reasoning without labeled data, achieving performance close to supervised verifiers on reasoning benchmarks.
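Negation consistency, the first rule listed, can be illustrated with a toy check: a model's "yes" probabilities on a question and on its negation should sum to roughly 1 (the threshold and function name here are illustrative, not LoVer's):

```python
def negation_consistent(p_yes_q, p_yes_neg_q, tol=0.15):
    """Toy negation-consistency rule: P(yes | q) + P(yes | not-q) should be ~1.
    Large deviations flag logically inconsistent (unreliable) answers."""
    return abs((p_yes_q + p_yes_neg_q) - 1.0) <= tol

print(negation_consistent(0.8, 0.25))  # → True  (0.8 + 0.25 ≈ 1)
print(negation_consistent(0.8, 0.70))  # → False (0.8 + 0.70 = 1.5)
```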

Rubric-based On-policy Distillation

Hugging Face Daily Papers · 2026-05-08

This paper introduces ROPD, a rubric-based on-policy distillation framework that achieves superior sample efficiency compared to traditional logit-based methods. It enables model alignment in black-box scenarios by using structured semantic rubrics instead of teacher logits.

CBRS: Cognitive Blood Request System with Bilingual Dataset and Dual-Layer Filtering for Multi-Platform Social Streams

arXiv cs.CL · 2026-04-21

Researchers from Bangladesh University of Engineering and Technology present CBRS, a multi-platform framework that filters and parses blood donation requests from social media using a dual-layer architecture and a novel 11K bilingual dataset in Bengali and English. Their LoRA fine-tuned Llama-3.2-3B model achieves 99% filtering accuracy and 92% zero-shot parsing accuracy, outperforming GPT-4o-mini and other LLMs while using 35× fewer tokens.

HalluSAE: Detecting Hallucinations in Large Language Models via Sparse Auto-Encoders

arXiv cs.CL · 2026-04-21

Researchers from Beihang University and other institutions propose HalluSAE, a framework using sparse autoencoders and phase transition theory to detect hallucinations in LLMs by modeling generation as trajectories through a potential energy landscape and identifying critical transition zones where factual errors occur.
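The sparse-autoencoder building block is standard: a ReLU encoder produces a sparse, overcomplete feature vector and a linear decoder reconstructs the activation. A minimal forward pass (dimensions and weights are illustrative, not HalluSAE's):

```python
import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """One forward pass of a plain sparse autoencoder over an activation x."""
    f = np.maximum(0.0, W_enc @ x + b_enc)   # sparse features (ReLU zeros many out)
    x_hat = W_dec @ f + b_dec                # linear reconstruction of x
    return f, x_hat

rng = np.random.default_rng(1)
d, m = 8, 32                                 # activation dim, overcomplete feature dim
W_enc = rng.normal(size=(m, d)) * 0.1
W_dec = rng.normal(size=(d, m)) * 0.1
x = rng.normal(size=d)
f, x_hat = sae_forward(x, W_enc, np.zeros(m), W_dec, np.zeros(d))
print(f.shape, x_hat.shape)  # → (32,) (8,)
```

Training adds a reconstruction loss plus an L1 penalty on `f` to enforce sparsity; detection methods then inspect which features fire along a generation.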

@omarsar0: Nice paper combining the strength of Skills and RAG. Most RAG systems retrieve on every query, whether the model needs …

X AI KOLs Following · 2026-04-20

Research introduces Skill-RAG, a novel approach that combines Skills with Retrieval-Augmented Generation to address inefficiencies in traditional RAG systems that retrieve on every query regardless of whether the model actually needs the information.
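The control flow being described, retrieving only when the model actually needs outside information, can be sketched generically (all function names here are hypothetical, not Skill-RAG's API):

```python
def answer(query, model_confident, retrieve, generate):
    """Gated RAG sketch: call the retriever only when the model signals
    it lacks the needed knowledge; otherwise generate from parametric memory."""
    context = "" if model_confident(query) else retrieve(query)
    return generate(query, context)

# Toy plumbing to show that retrieval is skipped when the model is confident.
calls = []
out = answer(
    "capital of France?",
    model_confident=lambda q: True,
    retrieve=lambda q: calls.append("retrieve") or "docs",
    generate=lambda q, c: f"answer(ctx={c!r})",
)
print(out, calls)  # → answer(ctx='') []
```

The gate (`model_confident`) could be a calibrated self-assessment prompt or a learned classifier; either way the saving comes from the retrieval branch never executing on easy queries.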
