@raydistributed: We just released Ray 2.56! This includes - Ray Data stability improvements: reduced object store spilling, automatic ba…
Summary
Ray 2.56 has been released with improvements to Ray Data, Ray Serve for LLMs, GPU-domain-aware placement groups, and Kubernetes integration.
Similar Articles
@robertnishihara: Try Ray 2.56!
Ray 2.56 is released with stability improvements for Ray Data and a re-architecture of Ray Serve for better LLM serving performance.
@raydistributed: Ray Serve LLM now offers 4.4x higher request throughput on prefill-heavy workloads, and 24.8x higher request throughput…
Ray Serve LLM achieves 4.4x and 24.8x higher request throughput on prefill-heavy and decode-heavy workloads, respectively, via direct streaming, a new vLLM V2 executor backend, and HAProxy ingress. The improvements are available in Ray 2.56, in partnership with Google Cloud and vLLM.
@seiji_________: Today we are excited to announce, in partnership with the GKE team at Google Cloud (@googlecloud), a major milestone in…
Ray Serve LLM achieves up to 4x higher throughput on prefill-heavy workloads and 24x on decode-heavy workloads in Ray 2.56, matching Rust-based routing frameworks like vllm-router in production benchmarks. The release was announced in partnership with the Google Cloud GKE team.
@raydistributed: Try out Ray-powered batch inference on Snowflake
Snowflake now supports job-based batch inference powered by Ray, enabling distributed GPU execution to scale model inference over millions of unstructured data points with a single API call.
@anyscalecompute: In this session, you'll learn: - Build and scale data pipelines with Ray - What is video data curation - Stream large d…
Anyscale is hosting a hands-on virtual lab session teaching developers how to build and scale data pipelines with Ray, covering video data curation, distributed GPU inference, and CPU/GPU streaming pipelines.