Tag
A tweet discusses the idea of building a transactional database with infinite throughput using S3-like object storage and content-addressing, where blocks are written in parallel and the root hash is updated periodically.
Amazon unveiled 'Resilient Network Graphs' (RNG), a data center network design that reduces hardware needs by 69% and increases throughput by 33%, now default for most AWS workloads after quiet deployment since last year.
Benchmarks for Qwen 3.6 27B and 35B models on dual RTX PRO 6000 GPUs using VLLM, showing generation throughput up to 3500 tokens per second.
Open-dLLM adapts Qwen3.6 to use diffusion-based generation, achieving over 3,000 tok/s on an RTX 5090 for short sequences, with code released on GitHub.
Kimi K2.6 achieves 56 tokens per second on a single MI300X GPU; user plans further throughput benchmarking.
K2.6 successfully downloaded and deployed the Qwen3.5-0.8B model locally on a Mac, using the niche Zig language to implement and optimize inference, demonstrating the new model’s generalization ability. After 4,000+ tool calls and 12+ hours of continuous operation, K2.6 iterated 14 times, boosting throughput from ~15 tokens/s to ~193 tokens/s, ultimately achieving 20% faster inference than LM Studio.