ray

#ray

@robertnishihara: Try Ray 2.56!

X AI KOLs Following ↗ · 3d ago

Ray 2.56 is released with stability improvements for Ray Data and a re-architecture of Ray Serve for better LLM serving performance.

#ray

@raydistributed: We just released Ray 2.56! This includes - Ray Data stability improvements: reduced object store spilling, automatic ba…

X AI KOLs Following ↗ · 3d ago

Ray 2.56 has been released with improvements to Ray Data, Ray Serve for LLMs, GPU-domain-aware placement groups, and Kubernetes integration.

#ray

@robertnishihara: A great example of the importance of disaggregation in RL. From the paper LLM generation alternates between prefill and…

X AI KOLs Following ↗ · 2026-06-20

Robert Nishihara highlights a paper on disaggregating RL workloads. It reports that using compute-optimized H800s for prefill and bandwidth-optimized H20s for decode can cut rollout times by 21-51% and 47% respectively, underscoring that no single hardware type fits all stages.

#ray

@seiji_________: Today we are excited to announce, in partnership with the GKE team at Google Cloud (@googlecloud), a major milestone in…

X AI KOLs Following ↗ · 2026-06-18

In Ray 2.56, Ray Serve LLM achieves up to 4x higher throughput on prefill-heavy workloads and 24x on decode-heavy workloads, matching Rust-based routing frameworks like vllm-router in production benchmarks. The release was announced in partnership with the GKE team at Google Cloud.

#ray

@robertnishihara: Some intuition about PD disaggregation from the blog - PD doesn't speed up prefill and can actually hurt TTFT - PD's re…

X AI KOLs Following ↗ · 2026-06-17

This blog post from Anyscale explains the intuition behind Prefill-Decode (PD) disaggregation for LLM serving. It shows how separating the prefill and decode phases onto dedicated GPUs can achieve up to 2.7x better goodput and 67% cost savings when using Ray and vLLM on AMD MI325X, and it also discusses when PD disaggregation does not help.

#ray

@anyscalecompute: Anyscale on Azure is now in public preview, and we're going deep on how it works. Join Daniel Arrizza (Field Engineer, …

X AI KOLs Following ↗ · 2026-06-09

Anyscale on Azure is now in public preview. Daniel Arrizza and Paul Yu will host a working session on building and deploying production AI workloads within an Azure tenant, integrating with existing Azure services.

#ray

@raydistributed: Congratulations to the Microsoft AI team on MAI-Thinking-1! Exciting to see Ray used in multiple parts of frontier-mode…

X AI KOLs Following ↗ · 2026-06-04

Microsoft AI announces MAI-Thinking-1, a 35B active/1T total MoE reasoning model competitive on STEM and coding tasks, developed using Ray for distributed training and orchestration.

#ray

@raydistributed: Try out Ray-powered batch inference on Snowflake

X AI KOLs Following ↗ · 2026-05-21

Snowflake now supports job-based batch inference powered by Ray, enabling distributed GPU execution for scaling model inference over millions of unstructured data points with a single API call.

#ray

@anyscalecompute: In this session, you'll learn: - Build and scale data pipelines with Ray - What is video data curation - Stream large d…

X AI KOLs Following ↗ · 2026-05-07

Anyscale is hosting a hands-on virtual lab session teaching developers how to build and scale data pipelines with Ray, covering video data curation, distributed GPU inference, and CPU/GPU streaming pipelines.

#ray

@anyscalecompute: Most coding agents can write Python, but that does not mean they know how to deploy Ray workloads. They still miss GPU …

X AI KOLs Following ↗ · 2026-04-22

Anyscale releases Agent Skills to help coding agents correctly deploy Ray workloads with proper GPU memory handling and up-to-date APIs.
