llm-deployment

#llm-deployment

From Human Guidance to Autonomy: Agent Skill System for End-to-End LLM Deployment on Spatial NPUs

arXiv cs.LG · 16h ago

This paper presents a two-stage methodology for end-to-end LLM deployment on spatial NPUs, progressing from human-guided development to an autonomous agent skill system. The system achieves speedups of 2.2× on prefill and 4.0× on decode for a reference model, and autonomously deploys eight additional LLMs on the AMD XDNA 2 NPU with minimal human guidance.

How are people keeping OpenClaw/Hermes agents running 24/7 without blowing through their API budget?

Reddit r/AI_Agents · 2026-05-21

A practitioner asks how to run AI agents 24/7 without high API costs, comparing local models, cloud GPUs, and hosted APIs, and looking for cost-efficient setups that balance reliability and reasoning quality.

@ickma2311: Efficient AI Lecture 13: LLM Deployment Techniques The lecture helped me understand AWQ, vLLM, and FlashAttention very …

X AI KOLs Timeline · 2026-05-13

A lecture on LLM deployment techniques covering AWQ, vLLM, FlashAttention, quantization, and activation smoothing for efficient serving.

When a client wants to deploy an LLM internally but their data governance is a mess, do you take the engagement and fix the data first, or walk away?

Reddit r/AI_Agents · 2026-05-13

A discussion on the challenges consultants face when clients want to deploy LLMs despite having poor data governance, weighing the risks of fixing data first versus deploying quickly on messy data.

Taiwanese company Skymizer announces HTX301 - PCIE inference card with 384GB of Memory at ~240 Watts

Reddit r/LocalLLaMA · 2026-05-08

Skymizer announces the HTX301, a PCIe inference card with 384 GB of memory that draws roughly 240 W and is capable of running 700B-parameter LLMs on-premises.

@anyscalecompute: Most agent frameworks solve orchestration and leave infrastructure completely unresolved. New blog: production-ready AI…

X AI KOLs Following · 2026-05-07

Anyscale published a technical guide on deploying production-ready AI agents using Ray Serve, MCP, and A2A protocols. The article addresses common infrastructure bottlenecks by proposing a decoupled microservices architecture that enables independent scaling of LLMs, tools, and agents.

The Geometric Canary: Predicting Steerability and Detecting Drift via Representational Stability

Hugging Face Daily Papers · 2026-04-20

This paper introduces geometric stability measures, based on pairwise distance consistency in representations, to predict language model steerability and detect structural drift. Supervised variants achieve near-perfect correlation (ρ = 0.89–0.97) with linear steerability across 35–69 embedding models, while unsupervised variants outperform CKA and Procrustes for post-deployment drift detection.
