large-language-models

#large-language-models

The End of Code Review: Coding Agents Supersede Human Inspection

Hacker News Top · 4h ago

This paper argues that LLM-based coding agents have reached a capability threshold making human code review redundant, and proposes replacing human inspection with agent-driven verification to reduce costs and latency.

Gemini and AI Hallucination

Reddit r/artificial · 6h ago

A discussion of hallucination in Google's Gemini model, highlighting the challenges of reliability and accuracy in large language models.

AI is the Ultimate Bullshitter

Reddit r/artificial · 17h ago

An opinion piece arguing that AI systems, especially large language models, are fundamentally bullshitters because they generate plausible but false information without understanding or intent to deceive.

Causal Discovery in the Era of Agents

Hugging Face Daily Papers · yesterday

This paper argues that language model agents should assist causal discovery workflows by providing contextual support and explanations rather than generating causal conclusions, and introduces the causal-learn+ platform to demonstrate this principle.

Qwen 27B for planning, Qwen 35B-A3B for execution?

Reddit r/LocalLLaMA · 2d ago

Discusses using Qwen 27B for planning tasks and Qwen 35B-A3B for execution tasks, suggesting a specialized model approach.

@seclink: Tsinghua University Language Processing Lab: Welcoming postdocs, researchers, and interns to join—you'll have the opportunity to work on cutting-edge large model research and development, with freedom to choose based on your interests. The team provides ample computing power, data, funding, and competitive salaries. Join the research team to work on large models together! No profit or self-sufficiency pressure—just do…

X AI KOLs Following · 2d ago

Tsinghua University Language Processing Lab is recruiting postdocs, researchers, and interns to work on cutting-edge large model research and development. It offers ample computing power, data, funding, and competitive salaries, with a focus on research and open source.

@seclink: Meituan recently released AI browser Tabbit 1.0. It seems Perplexity's Comet doesn't have much of a barrier; anyone can easily make a similar (or even better) product. Indians really talk more and do less... https://meituan.com/news…

X AI KOLs Following · 2d ago

Meituan's GN06 team officially launched AI browser Tabbit 1.0, which integrates multiple top large language models, supports automatic execution of complex tasks across software and web pages, and adds a memory function.

BIM-Edit: Benchmarking Large Language Models for IFC-Based Building Information Modeling

arXiv cs.AI · 3d ago

BIM-Edit is a benchmark for evaluating LLMs on natural-language editing of Building Information Models (BIM) in IFC format. Results show a substantial gap, with the best model achieving only 49.5% average score across geometric, semantic, and topological metrics.

A Systematic Evaluation of Black-Box Uncertainty Estimation Methods for Large Language Models

arXiv cs.AI · 3d ago

This paper presents a systematic review and benchmark of 24 black-box uncertainty estimation methods for large language models across 4 models and 4 dataset settings, finding that no single method dominates but hybrid methods that combine multiple uncertainty signals perform well.

Diffusion Language Models: An Experimental Analysis

arXiv cs.AI · 3d ago

A systematic experimental analysis evaluating eight state-of-the-art Diffusion Language Models across multiple benchmarks, analyzing trade-offs between generation quality and computational efficiency.

Deeper is Not Always Better: Mitigating the Alignment Tax via Confident Layer Decoding

Hugging Face Daily Papers · 3d ago

This paper introduces Confident Decoding, a training-free decoding strategy that dynamically selects the most reliable intermediate layer in LLMs using entropy-guided search, mitigating the alignment tax and improving reasoning performance on benchmarks like GPQA-Diamond and Omni-MATH with negligible overhead.

@h100envy: Ying Sheng co-wrote SGLang, the inference engine now serving Grok at xAI on a hundred thousand GPUs. She also built Fle…

X AI KOLs Timeline · 4d ago

Ying Sheng co-wrote SGLang, the inference engine now serving Grok at xAI on a hundred thousand GPUs, achieving 5x cost cuts over DeepSeek's API; she also built FlexGen and helped build Chatbot Arena.

@seclink: https://x.com/seclink/status/2067968283492712846

X AI KOLs Following · 4d ago

This article, based on material shared by researcher Victoria Lin, systematically reviews the mainstream technical approaches to native multimodal large models (Chameleon, Transfusion, MOT) and their pros and cons. It points out that multimodal AI is still at an early, exploratory stage, with open problems such as gaps in scaling laws, inconsistency between the encodings used for image understanding and image generation, and grounding in the physical world.

@aiwithmayank: THE BEST EXPLANATION OF HOW LLMS ACTUALLY WORK IS A FREE STANFORD LECTURE AND IT STARTS WITH A MOUSE EATING CHEESE it's…

X AI KOLs Timeline · 5d ago

A tweet promotes Stanford's free CS324 course on large language models, which uses a simple example of a mouse eating cheese to explain how LLMs work, and includes interactive demos.

As Easy as Rocket Science: Assessing the Ability of Large Language Models to Interpret Negation in Figurative Language

arXiv cs.CL · 5d ago

This paper investigates how large language models handle the combination of negation and figurative language, finding that this combination poses a particular challenge and that performance depends heavily on prompt style. The authors develop new annotations for the Fig-QA dataset and analyze embedding spaces to uncover additional linguistic factors like tense and concreteness.

SAGE: Stochastic Prompt Optimization via Agent-Guided Exploration

arXiv cs.CL · 5d ago

This paper introduces SPO, a stochastic search framework for automatic prompt optimization, with three strategies including SAGE, an agent-guided multi-agent pipeline. The framework is evaluated on benchmarks and deployed on a mental-health chatbot, where continuous optimization showed improvements in retention.

Output Vector Editing for Memorization Mitigation in Large Language Models

arXiv cs.CL · 5d ago

Presents output vector editing, a constrained-optimization weight edit to mitigate memorization in LLMs by modifying MLP neuron output vectors instead of zeroing activations, achieving up to 87.9% suppression with minimal locality failures.

RegMix-D: Dynamic Data Mixing via Proxy Training Trajectories

arXiv cs.CL · 5d ago

RegMix-D extends RegMix to dynamic data mixing by using loss trajectories from proxy runs to predict optimal mixtures at multiple training stages, achieving improvements over static methods.

The Wrong Kind of Right: Quantifying and Localizing Misfired Alignment in LLMs

arXiv cs.CL · 5d ago

This paper introduces VETO, a benchmark to quantify 'misfired alignment' where LLMs avoid correct inferences due to safety training, and finds that all tested models exhibit such failures while humans do not.

PEC-Home: Interpretation of Progressively Elliptical Commands in Smart Homes

arXiv cs.CL · 5d ago

This paper introduces PEC-Home, a simulated home dataset for interpreting progressively elliptical commands in smart homes, and finds that current LLM-based assistants struggle with such commands due to referential and intention ambiguity.
