EMiX: Emulating Beyond Single-FPGA Limits
Summary
Introduces EMiX, a scalable multi-FPGA framework for emulating multi-core RISC-V architectures beyond single-FPGA resource limits, demonstrated with a 64-core system across eight FPGAs.
# EMiX: Emulating Beyond Single-FPGA Limits

Source: [https://arxiv.org/abs/2604.27012](https://arxiv.org/abs/2604.27012) · [View PDF](https://arxiv.org/pdf/2604.27012) · [HTML (experimental)](https://arxiv.org/html/2604.27012v1)

> Abstract: FPGA-level emulation is a key step in pre-silicon chip design validation. However, emulating large-scale multi-core systems increasingly exceeds the hardware resource capacity of a single FPGA, limiting the feasibility of full-system emulation. To address this challenge, we introduce EMiX, a scalable multi-FPGA framework that enables distributed emulation of multi-core RISC-V architectures beyond single-FPGA resource limits. EMiX systematically partitions a monolithic multi-core design into multiple components and deploys them across multiple interconnected FPGAs, effectively exploiting inter-FPGA interconnects to balance scalability and performance without requiring fundamental RTL redesign. We prototype EMiX with a 64-core architecture across eight interconnected Alveo U55C FPGAs (scalable in core and FPGA counts), successfully demonstrating full-system execution, including Linux boot. EMiX will be released as an open-source platform.

## Submission history

From: Behzad Salami [[view email](https://arxiv.org/show-email/5c931f29/2604.27012)]

**[v1]** Wed, 29 Apr 2026 10:32:10 UTC (704 KB)
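The abstract gives the core idea only at a high level: split a monolithic N-core design into per-FPGA components and connect them over inter-FPGA links. Since the code has not yet been released, the sketch below is a hedged illustration of that partitioning step, not EMiX's actual implementation: the `Partition` type, the `partition_cores` helper, and the ring link topology are all assumptions made here for illustration.

```python
# Minimal sketch (NOT the authors' code) of the partitioning step the EMiX
# abstract describes: evenly assign the cores of a monolithic design to
# FPGAs and record which boards exchange inter-FPGA traffic.
from dataclasses import dataclass


@dataclass
class Partition:
    fpga_id: int
    core_ids: list[int]   # cores emulated on this FPGA
    neighbors: list[int]  # FPGAs this board exchanges traffic with


def partition_cores(num_cores: int, num_fpgas: int) -> list[Partition]:
    """Evenly split cores across FPGAs; a ring of inter-FPGA links is assumed."""
    assert num_cores % num_fpgas == 0, "uneven splits need a smarter policy"
    per_fpga = num_cores // num_fpgas
    parts = []
    for f in range(num_fpgas):
        cores = list(range(f * per_fpga, (f + 1) * per_fpga))
        # Hypothetical ring topology: each FPGA links to its two neighbors.
        parts.append(Partition(fpga_id=f,
                               core_ids=cores,
                               neighbors=[(f - 1) % num_fpgas,
                                          (f + 1) % num_fpgas]))
    return parts


if __name__ == "__main__":
    # The prototype in the paper: 64 cores over eight Alveo U55C boards.
    for p in partition_cores(num_cores=64, num_fpgas=8):
        print(f"FPGA {p.fpga_id}: cores {p.core_ids[0]}-{p.core_ids[-1]}, "
              f"links to FPGAs {p.neighbors}")
```

A real partitioner would presumably weigh FPGA resource usage (LUTs, BRAM, inter-FPGA link bandwidth) rather than just core counts, but an even split matches the symmetric 64-core, eight-FPGA prototype the abstract reports.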
Similar Articles
REAP-pruned Nemotron-3-Super (512 -> 256 experts) + GRPO fine-tune + FP8/AWQ. AIME 2026 90%+. Benchmark inside.
Community release of REAP-pruned Nemotron-3-Super-120B to 64B, GRPO fine-tuned on math, quantized to AWQ/FP8, hitting 90%+ on AIME 2026 and runnable on a single H100/RTX PRO 6000.
@bastani_behnam: We just published how we unlocked +50% inference capacity on a 27B model — no new GPUs, no new nodes, at a fraction of …
OpenInfer demonstrates "vertical disaggregation" that boosts Qwen 3.5 27B throughput by ~50% by co-executing quantized layers across a single node’s AMD EPYC CPU and Nvidia L40S GPU with a custom SLA-aware scheduler.
Metal-Sci: A Scientific Compute Benchmark for Evolutionary LLM Kernel Search on Apple Silicon
Metal-Sci introduces a 10-task benchmark for optimizing scientific computing kernels on Apple Silicon, paired with an evolutionary search framework driven by large language models. The study evaluates models like Claude Opus 4.7, Gemini 3.1 Pro, and GPT 5.5, demonstrating significant speedups while using out-of-distribution testing to catch silent performance regressions.
@LinQingV: When exploring LLM inference chip architectures previously, I reviewed the architectures of the four major AI inference ASIC companies: Groq, SambaNova, Tenstorrent, and Cerebras. While the first three have different emphases, their underlying logic falls within the same framework: large on-chip SRAM + dataflow architecture + deterministic scheduling...
The article analyzes the AI inference ASIC architectures of Groq, SambaNova, Tenstorrent, and Cerebras, highlighting Cerebras's unique wafer-scale engine design. It discusses the benefits of deterministic latency and high bandwidth for LLM inference, while noting challenges like yield, cost, and KV cache bottlenecks.
Sipeed's K3 RISC-V SBCs can run 30B-parameter LLMs on a 60 TOPS (INT4) NPU, support BF16/FP16/INT4
Sipeed's new K3 RISC-V single-board computers feature 32GB LPDDR5 and a 60 TOPS NPU, enabling local inference of large language models at up to 15 tokens per second.