EMiX: Emulating Beyond Single-FPGA Limits
Summary
Introduces EMiX, a scalable multi-FPGA framework for emulating multi-core RISC-V architectures beyond single-FPGA resource limits, demonstrated with a 64-core system across eight FPGAs.
# EMiX: Emulating Beyond Single-FPGA Limits

Source: [https://arxiv.org/abs/2604.27012](https://arxiv.org/abs/2604.27012) [View PDF](https://arxiv.org/pdf/2604.27012) [HTML (experimental)](https://arxiv.org/html/2604.27012v1)

> Abstract: FPGA-level emulation is a key step in pre-silicon chip design validation. However, emulating large-scale multi-core systems increasingly exceeds the hardware resource capacity of a single FPGA, limiting the feasibility of full-system emulation. To address this challenge, we introduce EMiX, a scalable multi-FPGA framework that enables distributed emulation of multi-core RISC-V architectures beyond single-FPGA resource limits. EMiX systematically partitions a monolithic multi-core design into multiple components and deploys them across multiple interconnected FPGAs, effectively exploiting inter-FPGA interconnects to balance scalability and performance without requiring fundamental RTL redesign. We prototype EMiX with a 64-core architecture across eight interconnected Alveo U55c FPGAs (scalable in both core and FPGA counts), successfully demonstrating full-system execution including Linux boot. EMiX will be released as an open-source platform.

## Submission history

From: Behzad Salami \[[view email](https://arxiv.org/show-email/5c931f29/2604.27012)\]

**\[v1\]** Wed, 29 Apr 2026 10:32:10 UTC (704 KB)
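To make the partitioning idea in the abstract concrete, here is a minimal Python sketch of spreading cores across FPGAs. It is an illustration only and not EMiX's actual method: the abstract does not say how components are grouped, so the contiguous, near-equal split and the names `partition_cores` and `fpga_of` are assumptions made for this example.

```python
from typing import List


def partition_cores(num_cores: int, num_fpgas: int) -> List[List[int]]:
    """Split core IDs 0..num_cores-1 into contiguous, near-equal groups.

    Hypothetical helper: an even split is an assumption, not the
    partitioning strategy described by the EMiX authors.
    """
    if num_fpgas <= 0:
        raise ValueError("num_fpgas must be positive")
    if num_cores < 0:
        raise ValueError("num_cores must be non-negative")

    base, extra = divmod(num_cores, num_fpgas)
    groups: List[List[int]] = []
    start = 0
    for fpga in range(num_fpgas):
        # The first `extra` FPGAs take one additional core each.
        size = base + (1 if fpga < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def fpga_of(core_id: int, groups: List[List[int]]) -> int:
    """Return the index of the FPGA group that holds `core_id`."""
    for fpga, cores in enumerate(groups):
        if core_id in cores:
            return fpga
    raise KeyError(f"core {core_id} is not assigned to any FPGA")


if __name__ == "__main__":
    # The prototype scale from the abstract: 64 cores on eight FPGAs.
    layout = partition_cores(64, 8)
    for fpga, cores in enumerate(layout):
        print(f"FPGA {fpga}: cores {cores[0]}-{cores[-1]}")
```

Under this assumed even split, the 64-core, eight-FPGA configuration places eight cores on each device. Any traffic between cores in different groups would have to cross an inter-FPGA link, which is the scalability-versus-performance trade-off the abstract refers to.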
Similar Articles
FP8 is All You Need (Part 1): Debunking Hardware FP64 as the HPC Holy Grail
This paper argues that using FP8 tensor cores with Ozaki Scheme II can replace native FP64 hardware for high-performance scientific computing on AI-optimized GPUs like NVIDIA's B300, achieving full double-precision accuracy at much higher throughput. The authors present a Tensor-Memory Equilibrium model and show that emulated FP64 performance can exceed native FP64 by orders of magnitude across all workloads.
One man, two kernels, and a lot of RISC-V
Yuri Zaporozhets of QRV Systems has built a RISC-V-based personal computer and a mainframe on an FPGA, and rewritten QNX twice. His latest OS QSOE is gaining attention in the FOSS world.
REAP-pruned Nemotron-3-Super (512 -> 256 experts) + GRPO fine-tune + FP8/AWQ. AIME 2026 90%+. Benchmark inside.
Community release of REAP-pruned Nemotron-3-Super-120B to 64B, GRPO fine-tuned on math, quantized to AWQ/FP8, hitting 90%+ on AIME 2026 and runnable on a single H100/RTX PRO 6000.
@onusoz: 16x parallel Gemma-4-26B-A4B-NVFP4 runs 18 output tokens/s, aggregate 300 tok/s 1 DGX Spark with 128 GB unified memo…
@onusoz demonstrates running 16 parallel instances of NVIDIA's quantized Gemma-4-26B-A4B-NVFP4 model on a single DGX Spark with 128GB unified memory, achieving 300 tok/s aggregate, showcasing high concurrency without flashinfer.
Programmable Probabilistic Computer with 1M p-bits
This paper presents a programmable probabilistic computer with one million p-bits by networking FPGAs, achieving Gibbs sampling at over a trillion flips per second for Ising models while introducing a design rule for scaling beyond single-chip limits.