@bingxu_: I started INT21 two months ago, and I’m proud to announce that we’re coming out of stealth today with our first product…
Summary
INT21 announced PTX Kernel Factory, a self-improving agent swarm that autonomously generates expert-level PTX GPU kernels, with open-source proof-of-concept implementations and beta access.
Cached at: 06/17/26, 03:47 AM
I started INT21 two months ago, and I’m proud to announce that we’re coming out of stealth today with our first product: PTX Kernel Factory. We’ve created the first self-improving agent swarm for producing expert-level PTX kernels autonomously and at massive scale. https://int21.ai
INT21 | Self-Improving Compute Infrastructure
Source: https://int21.ai/
Use compute to improve compute.
INT21 builds self-improving AI systems for the software beneath modern AI. Our first product generates and optimizes low-level GPU software, then proves its work with tests and benchmarks.
PTX Kernel Factory / First public release / June 16, 2026 / Company launch
Introducing INT21 and PTX Kernel Factory
We are building self-improving AI systems for the software beneath modern AI. The first four implementations produced by PTX Kernel Factory are open source, and the product is entering beta.
01 / First proof
Generated by PTX Factory. Measured against best baseline.
PTX Kernel Factory produced four GPU kernel artifacts across Hopper and Blackwell. We tested them on matching hardware and inputs against established expert implementations, with correctness verified before timing.
- GH200 / Hopper: 1.59x. Peak measured performance.
- B200 / Blackwell: 1.52x. Optimized integration.
- GH200 / Hopper: 8.17%. Faster on geometric mean.
- B200 / Blackwell: 126 / 126. Faster in every comparable case.
Operator-level benchmark results, not full-model speedup claims.
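As a rough illustration of how summary figures of this kind (peak speedup, geometric-mean gain, wins out of comparable cases) can be derived from per-case timings, here is a minimal Python sketch. The function name `summarize_speedups` and the sample timings are hypothetical. The page does not say how INT21 computes its numbers, so this is one common reading, not their method.

```python
import math
from typing import Dict, Sequence


def summarize_speedups(
    baseline_ms: Sequence[float], candidate_ms: Sequence[float]
) -> Dict[str, float]:
    """Summarize per-case timings (hypothetical helper, not INT21's code)."""
    if not baseline_ms or len(baseline_ms) != len(candidate_ms):
        raise ValueError("need two non-empty timing lists of equal length")

    # Speedup per case: values above 1.0 mean the candidate is faster.
    ratios = [b / c for b, c in zip(baseline_ms, candidate_ms)]

    # Geometric mean of the ratios, computed in log space.
    geomean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))

    return {
        "peak_speedup": max(ratios),
        "geomean_gain_pct": (geomean - 1.0) * 100.0,
        "wins": sum(r > 1.0 for r in ratios),
        "cases": len(ratios),
    }


if __name__ == "__main__":
    # Made-up timings in milliseconds, for illustration only.
    print(summarize_speedups([4.0, 3.0, 2.2], [1.0, 3.0, 2.0]))
```

The geometric mean is the usual choice for averaging ratios because a 2x gain and a 2x loss cancel out, which an arithmetic mean would not reflect.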
Figure: PTX Kernel Factory autonomous improvement loop. An autonomous agent swarm generates GPU kernels, evaluates them on target hardware, retains the evidence, and feeds that knowledge into the next generation. Agent swarm stages: Plan, Write, Review, Tune. Target GPU stages: 01 Compile, 02 Correctness, 03 Benchmark. Best valid candidate: +18.7%. Generations: Gen 01, Gen 08, Gen N+1. Retained evidence (search memory) starts the next generation.
- 01 / Fully autonomous swarm: Specialized agents plan, implement, review, and optimize each kernel end to end.
- 02 / Grounded in real hardware: Every candidate is compiled, verified, and benchmarked on the target GPU.
- 03 / Improvement compounds: Reusable evidence improves both the search process and the kernels it produces.
Cloud-native swarm control plane
Live system model
- 01 / Elastic scale: Thousands of agents. Cloud-native scheduling expands the swarm around available compute.
- 02 / Shared direction: One measurable goal. Every agent works against the same constraints and acceptance criteria.
- 03 / Generational memory: Experience carries forward. Results, failures, and strategies become the next generation’s starting point.
- 01 / Agents
- 02 / Models
- 03 / GPU
- 04 / Infrastructure
- 05 / Cloud
AI-native operating model
Engineering capacity scales with compute, not headcount.
Our experts set direction, constraints, and acceptance criteria. Autonomous agent swarms execute, evaluate, and retain the work, so adding compute expands how much engineering INT21 can perform.
Founders / Cross-stack operators
Research, systems, and infrastructure experience carried into one company.
### Bing Xu
Founder & CEO
- Agents
- Models
- GPU
Bing co-authored the original Generative Adversarial Nets paper, created XGBoost’s Python package, and co-created MXNet and AITemplate. Before founding INT21, he was a Distinguished Engineer at NVIDIA following its acquisition of HippoML, the GPU inference company he founded.
### Qingye Jiang
Founding Partner
- Infrastructure
- Cloud
Qingye has spent more than a decade building and tuning high-performance computing and distributed systems at AWS. His work spans workload analysis, performance engineering, cloud infrastructure, and real-time systems.
PTX Kernel Factory / Beta
Bring us a hard GPU workload.
Start with an operation that is too slow, a new architecture without a mature kernel, or an important workload that has not justified weeks of specialist time.