convex twin
Summary
The author built a deterministic replay engine for Convex backends that enables local debugging against production snapshots and controlled anomaly testing, and is seeking feedback from users.
Similar Articles
@GergelyOrosz: Did a deepdive on Antithesis back in 2024, and their multiverse debugger that took years to build. It's now a free arti…
A deep dive on Antithesis, a multiverse debugger for large distributed systems that offers deterministic replay and fault injection, now available as a free article.
2X tk/s (from 19.4 -> 38.1 tk/s on 1 x MI50): Playing with a hypothesis like speculative decoding, but instead of an additional side model, exploiting that I can run multiple computations side by side as if I had Qwen3.6-27B loaded twice in memory; small quants don't use all the available compute.
Packed Twin Inference (PTI) is a technique that achieves ~2× LLM throughput by running multiple token sequences in a single batch decode, exploiting weight sharing in llama.cpp without needing a draft model or additional VRAM.
@no_stp_on_snek: got it here if ya want to try it out:
A fork of llama.cpp integrating TurboQuant+ for advanced KV-cache and weight quantization, with cross-backend kernel support (Apple Silicon, NVIDIA CUDA, AMD ROCm, Vulkan) and used in production by LocalAI, Chronara, and AtomicChat.
Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion
Pantheon360 introduces a 3D-aware 360° video diffusion framework that uses an explicit 3D cache to enforce geometric consistency, enabling high-fidelity digital twin generation from sparse 360° inputs.
Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding
Domino is a speculative decoding framework that decouples causal dependency modeling from autoregressive drafting, using a parallel backbone and a lightweight causal refinement head to achieve up to 5.49× end-to-end speedup on Qwen3 models.