@vllm_project: We just shipped a major redesign of http://recipes.vllm.ai. "How do I run model X on hardware Y for task Z?" now has a …
Summary
vLLM launched a redesigned recipes site that turns any HuggingFace model URL into a ready-to-run inference recipe for specific hardware and tasks.
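The recipes the site generates generally reduce to serving the model with vLLM's OpenAI-compatible server and posting a chat-completions request to it. A minimal sketch of such a request body follows; the model name, port, and prompt are illustrative placeholders, not taken from the site:

```python
import json

# Request body for a vLLM OpenAI-compatible endpoint, e.g.
#   POST http://localhost:8000/v1/chat/completions
# after starting a server with something like:
#   vllm serve Qwen/Qwen2.5-7B-Instruct
payload = {
    "model": "Qwen/Qwen2.5-7B-Instruct",  # illustrative model name
    "messages": [
        {"role": "user", "content": "Summarize PagedAttention in one sentence."}
    ],
    "max_tokens": 128,
    "temperature": 0.2,
}

# Serialize for the HTTP request; any OpenAI-compatible client works here.
body = json.dumps(payload)
```

The same body can be sent with `curl`, the `openai` Python client pointed at the local base URL, or any HTTP library.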
Similar Articles
vllm-project/vllm v0.19.1
vLLM v0.19.1 released: a fast, easy-to-use open-source library for LLM inference and serving with state-of-the-art throughput, supporting 200+ model architectures and diverse hardware, including NVIDIA and AMD GPUs and CPUs.
vllm-project/vllm v0.20.0
vLLM v0.20.0 released: an open-source library for high-throughput LLM inference and serving, built on PagedAttention and supporting a range of hardware architectures.
@RedHat_AI: GuideLLM just hit 1,000 GitHub stars. Benchmarking tool for LLM inference under @vllm_project. Test your deployment wit…
GuideLLM, a benchmarking tool for LLM inference maintained under the vLLM project, reached 1,000 GitHub stars. It lets developers test deployments with realistic workloads and measure throughput and latency before going to production.
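GuideLLM's core outputs are throughput and latency statistics for a given workload. As a toy illustration of the kind of metrics it reports (this is not GuideLLM's API; a real run would use its CLI against a live vLLM endpoint):

```python
import math

def summarize(latencies_s, total_tokens):
    """Toy benchmark summary: latencies_s is per-request wall time in
    seconds; total_tokens is the number of tokens generated overall."""
    wall = sum(latencies_s)  # serial approximation of total wall time
    sorted_l = sorted(latencies_s)
    # Nearest-rank p95 over the observed latencies.
    idx = max(0, math.ceil(0.95 * len(sorted_l)) - 1)
    return {
        "requests_per_s": len(latencies_s) / wall,
        "tokens_per_s": total_tokens / wall,
        "p95_latency_s": sorted_l[idx],
    }

# Hypothetical timings from four requests producing 512 tokens total.
stats = summarize([0.8, 1.0, 1.2, 2.0], total_tokens=512)
```

A real benchmark would also separate time-to-first-token from inter-token latency and drive requests concurrently, which is exactly the bookkeeping a dedicated tool handles for you.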
@0xSero: Here's everything you need to know about inference and hosting LLMs. Have you ever seen: - vllm - sglang - llama.cpp - …
An overview of popular open-source inference engines for hosting and running large language models, including vLLM, SGLang, llama.cpp, and ExLlamaV3.
@socialwithaayan: HUGGING FACE JUST OPEN-SOURCED THE ML INTERN EVERY RESEARCHER HAS DREAMED OF No more spending days reading papers and w…
Hugging Face open-sourced ml-intern, an autonomous agent that reads ML papers, discovers datasets, trains models, debugs failures, and ships production-ready models to the Hub, automating the entire post-training workflow.