Show HN: Z-Jail – A 130 KB Linux sandbox-C99 with 7 defense layers and zero deps
Summary
Z-Jail is a lightweight Linux sandbox written in C99 with seven defense layers, no external dependencies, and a tiny binary size (~130 KiB), designed for secure code execution in CI, CTF, and lightweight evaluation.
View Cached Full Text
Cached at: 07/01/26, 08:01 PM
Division-36/Z-Jail
Source: https://github.com/Division-36/Z-Jail
Z-Jail
Multi-layer sandbox for native code execution on Linux.
Seven independent defence layers — no external dependencies, ~130 KiB PIE binary.
┌──────────────────────────────────────────────────────┐
│ Z-Jail │
├──────────────────────────────────────────────────────┤
│ Truthimatics Public Version (evidence-based verdict engine) │
│ Namespaces (mount, pid, net, ipc, uts) │
│ pivot_root (chroot on steroids) │
│ Capabilities (drop all, lock securebits) │
│ NO_NEW_PRIVS (no privilege escalation) │
│ seccomp-BPF (whitelist: 15 syscalls only) │
│ Audit (JSON logging + BLAKE2b hashing) │
└──────────────────────────────────────────────────────┘
Table of Contents
- Quick Start
- Why Z-Jail
- Architecture
- Layers
- Usage
- Build & Install
- Testing
- Performance
- Threat Model
- Documentation
- Roadmap
- License
Quick Start
git clone https://github.com/Division-36/Z-Jail.git
cd Z-Jail
make
sudo ./z_jail --root=/path/to/rootfs --seccomp-enforce -- /bin/ls
The --root directory should contain a minimal filesystem with the target binary and its dependencies (for static binaries, just the binary is enough).
Why Z-Jail
Existing sandboxing solutions make trade-offs:
| Z-Jail | Firecracker | gVisor | bwrap | nsjail | |
|---|---|---|---|---|---|
| External deps | zero | libc, seccomp | Go runtime | libc | libc, protobuf |
| Binary size | ~130 KiB | 20+ MiB | 40+ MiB | ~70 KiB | ~1 MiB |
| VM isolation | no | yes (microVM) | no (sandbox) | no | no |
| seccomp whitelist | yes | no | yes | optional | yes |
| Content hashing | yes | no | no | no | no |
| Audit JSON | yes | no | yes | no | partial |
| Build complexity | one make | complex | complex | trivial | moderate |
Z-Jail fills the niche between bwrap (minimal, no seccomp-by-default) and nsjail (featureful, heavy deps). It is designed for CI pipelines, CTF jail challenges, and lightweight code evaluation where you need defence-in-depth without pulling in a container runtime.
Architecture
Data Flow
flowchart LR
CLI[CLI args] --> P[parse_args]
P --> C{clone namespaces}
C -->|child| CR[child_run]
C -->|parent| W[waitpid]
CR --> RL[setrlimit]
RL --> FD[close fds >= 3]
FD --> DUMP[PR_SET_DUMPABLE=0]
DUMP --> PV[pivot_root]
PV --> NNP[PR_SET_NO_NEW_PRIVS]
NNP --> CAP[drop capabilities]
CAP --> SC[seccomp-BPF]
SC --> SIG[signal parent]
SIG --> EX[execve target]
W --> A[audit JSON]
A --> EXIT[exit]
Layer Ordering
Each layer is ordered so that a later layer can’t be undone by an earlier one:
- setrlimit — cap CPU, address space, file count, processes before anything else
- fd scrub — close all inherited fds except the report pipe
- PR_SET_DUMPABLE=0 — core dumps disabled, /proc/self/mem locked down
- pivot_root — detach from host filesystem; old root unmounted lazily
- PR_SET_NO_NEW_PRIVS — no setuid, no
capsetescalation after this point - drop_caps — zero out all capabilities, lock securebits
- seccomp-BPF — restrict syscalls to whitelist only
- signal parent — tell the parent the sandbox is ready
- execve — replace process with the target binary
sequenceDiagram
participant P as Parent
participant C as Child
P->>C: clone (NEWNS|NEWPID|NEWNET|NEWIPC|NEWUTS)
Note over C: setrlimit(CPU, AS, NOFILE, NPROC)
Note over C: close(all fds > 2)
Note over C: PR_SET_DUMPABLE=0
Note over C: pivot_root → chdir("/") → umount -l
Note over C: PR_SET_NO_NEW_PRIVS
Note over C: capset(all zero) + securebits
Note over C: seccomp(SECCOMP_MODE_FILTER, whitelist)
C->>P: write(pipe, ready=1)
Note over C: execve(target)
P->>P: waitpid
P->>P: write audit JSON
Layers
1. Truthimatics Public Version
Evidence-based verdict engine. Collects weighted observations about the executed binary and determines a final verdict (DETERMINISTIC, REJECT, or UNCERTAIN). Each observation carries a weight; any single observation with weight >50% of total decides the verdict.
2. Namespaces
Five namespaces are created via clone():
| Namespace | Flag | Purpose |
|---|---|---|
| Mount | CLONE_NEWNS | Isolated filesystem tree |
| PID | CLONE_NEWPID | Process ID space (child is pid 1) |
| Net | CLONE_NEWNET | No network interfaces |
| IPC | CLONE_NEWIPC | No shared memory / semaphores |
| UTS | CLONE_NEWUTS | Separate hostname |
Requires CAP_SYS_ADMIN in the initial namespace.
3. pivot_root
Replaces the mount namespace root with the --root directory:
- Bind-mount the root directory onto itself (
MS_BIND|MS_REC) pivot_root(new_root, put_old)— swap the mount treechdir("/")— move into the new rootumount2("/.pivot_old", MNT_DETACH)— detach old rootrmdir("/.pivot_old")— clean up
This is strictly stronger than chroot(2) — there is no way for the sandboxed process to escape back to the host root, even with CLONE_NEWNS from inside the sandbox (which is already blocked by seccomp).
4. Capabilities
All capabilities are dropped via:
capset(hdr, data) // data = {0, 0, 0}
prctl(SECBIT_KEEP_CAPS_LOCKED | SECBIT_NO_SETUID_FIXUP | ...)
The process drops setuid/setgid before capset so the uid change takes effect while CAP_SETUID is still held. After capset, all caps are gone and the securebits are locked — no re-enablement is possible.
5. NO_NEW_PRIVS
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
Prevents the process or its children from gaining new privileges via setuid binaries, file capabilities, or LSM transitions. Irreversible.
6. seccomp-BPF (whitelist-v1)
Allow-list of 15 syscalls — anything not on the list gets SECCOMP_RET_KILL:
| Syscall | Number | Notes |
|---|---|---|
read | 0 | stdin |
write | 1 | stdout/stderr + report pipe |
openat | 257 | file access (not open) |
close | 3 | — |
lseek | 8 | — |
brk | 12 | heap management |
mmap | 9 | arg-restricted: flags & 4 == 0 (no MAP_SHARED), flags == 0x22 (MAP_PRIVATE|MAP_ANONYMOUS) |
munmap | 11 | — |
execve | 59 | single exec at startup |
exit_group | 231 | clean process exit |
rt_sigaction | 13 | signal handlers |
rt_sigprocmask | 14 | signal masking |
getrandom | 318 | random number source |
clock_gettime | 228 | timing |
fstat | 5 | file metadata |
The BPF filter is generated dynamically: for each whitelist entry, a jump chain is emitted that either allows (if syscall matches) or falls through to KILL. Architecture is checked first (AUDIT_ARCH_X86_64).
The filter is verified independently by a standalone test (tests/seccomp_filter_test.c, 8/8 pass) that fork+execves test cases against a real prctl(PR_SET_SECCOMP) without needing root.
7. Audit
Every execution produces a JSON audit record:
{
"schema": "z-jail.audit/v1",
"build_id": "Z-Jail/v1+dev",
"timestamp": 1749000000,
"duration_ns": 8500000,
"executable": "/bin/ls",
"verdict": "DETERMINISTIC",
"exit_code": 0,
"sandbox": {
"seccomp_filter": "whitelist-v1",
"seccomp_whitelist_size": 15,
"seccomp_arg_rules_size": 2,
"namespaces": ["mount","pid","net","ipc","uts"],
"pivot_root": "/var/run/z-jail/roots/default",
"no_new_privs": true,
"capabilities_dropped": true
},
"content_fingerprint": "0e5751c026e543b2e8ab2eb06099daa1..."
}
Written to build/audits/<binary-name>.audit.json. The content_fingerprint is a BLAKE2b-256 hash of the target binary, computed by the parent after the child finishes.
Usage
z_jail --root=<dir> [--seccomp-enforce] [--self-hash=<hex>]
[--quiet] [--verbose] -- <program> [args...]
| Flag | Description |
|---|---|
--root=<dir> | Sandbox root directory (required) |
--seccomp-enforce | Enable seccomp-BPF syscall whitelist |
--self-hash=<hex> | Verify binary matches expected BLAKE2b-256 hash |
--quiet | Suppress audit output |
--verbose | Enable debug logging |
--version | Show build ID (Z-Jail/v1+dev) |
--help | Show usage and exit |
Examples
# Run a static binary with all protections
sudo z_jail --root=./roots --seccomp-enforce -- bin/hello_static
# Run with binary integrity verification
sudo z_jail --root=./roots --seccomp-enforce \
--self-hash=$(sha256sum z_jail | cut -c1-64) -- bin/program
# Quiet mode (no audit JSON)
sudo z_jail --root=./roots --quiet -- bin/program
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Child exited normally (verdict: DETERMINISTIC) |
| 1 | Child was killed by signal (verdict: REJECT) |
| 2 | Self-hash: bad hex string or file unreadable |
| 3 | Self-hash: mismatch (binary has been tampered with) |
| 101 | Child setup error (rlimit, etc.) |
| 102 | Child seccomp filter installation failed |
| 103 | Child execve failed (binary not found, no exec permission) |
| 104 | Child pivot_root failed |
| 105 | Child capability drop failed |
| 125 | Namespace creation failed (run as root? kernel support?) |
Build & Install
Requirements
- Linux kernel ≥ 5.4 (namespaces, seccomp-BPF, pivot_root)
- GCC ≥ 11 (tested on 11.4, 13.2, 15.2)
- No external libraries — just the standard C toolchain
Commands
make # build z_jail (~130 KiB PIE binary)
make install # install to /usr/local/bin + man page
make clean # remove build artifacts
make dist # create release tarball
make check # smoke test (--version + --help)
The binary is built as a Position Independent Executable with -fstack-protector-strong, -D_FORTIFY_SOURCE=2, full RELRO, and -z now.
Compile-time Options
make CC=clang CFLAGS="-O3 -march=native" # custom compiler/flags
Testing
Quick Test (no root)
# seccomp filter logic (8 tests)
tests/build/seccomp_filter_test
# BLAKE2b known-answer test
tests/build/blake2b_known
These don’t need root and run in under 100 ms.
Full Test Suite
make -C tests setup # build payloads + test roots
sudo bash tests/run_tests.sh # 17 scenarios
Requires root for namespace creation. The test suite covers:
| # | Scenario | Type | What it tests |
|---|---|---|---|
| 0 | blake2b_regress | known-answer | BLAKE2b implementation correctness |
| 1 | seccomp_filter | standalone BPF | 8 sub-tests of the BPF filter logic |
| 2 | hello_static | ok | Basic static binary execution |
| 3 | hello_dynamic | ok | Dynamic binary with ld-linux + libc |
| 4 | execve_replacement | ok | execve in sandbox (blocked by seccomp) |
| 5 | fd_inherited_read | ok | stdin/stdout inherited correctly |
| 6 | mmap_bad_flags | killed | mmap with MAP_SHARED blocked |
| 7 | mmap_good_allowed | ok | mmap with MAP_PRIVATE|ANONYMOUS allowed |
| 8 | mmap_prot_exec | killed | mmap with PROT_EXEC blocked |
| 9 | mmap_self_modify | killed | Self-modifying code blocked |
| 10 | ptrace | killed | ptrace blocked |
| 11 | socket | killed | socket creation blocked |
| 12 | chroot_escape | killed | chroot syscall blocked |
| 13 | double_chroot | killed | Double chroot blocked |
| 14 | mount_replay | killed | Mount syscall blocked |
| 15 | cpu_exhaust | killed | RLIMIT_NPROC blocks fork bomb |
| 16 | signal_parent | killed | Signal to parent blocked |
| 17 | self_hash | ok | Binary integrity verification |
Performance
Numbers from WSL2 (Kali Linux, GCC 15.2.0, -O2 -g):
| Metric | Value |
|---|---|
| Binary size | ~130 KiB |
| Mean sandbox latency | ~8 ms |
| Peak RSS | ~4 MiB |
| Lines of code (core) | ~900 |
| Test suite runtime | ~5 s |
Latency breakdown (approx): clone + namespaces ~3 ms, pivot_root ~2 ms, seccomp + caps ~1 ms, execve ~1 ms, waitpid + audit ~1 ms.
Threat Model
In Scope
- Arbitrary native code execution by an untrusted payload
- Escape via
chroot,mount,ptrace,socket,process_vm_writev - Fork bombs, CPU exhaustion (
RLIMIT_CPU), memory exhaustion (RLIMIT_AS) - File descriptor leaks across
execve setuid/ dynamic linker /LD_PRELOADescalation- seccomp filter removal or capability re-enablement
Out of Scope
- Kernel zero-days outside the permitted syscall surface
- Hardware side channels (Spectre, Meltdown)
- Co-located VM escape via shared
/proc,/sysmounts - Network egress beyond what
CLONE_NEWNET+ blockedsocketprovides - Resource starvation of sibling sandboxes (needs cgroup support)
Assumptions
- Host kernel is unmodified Linux ≥ 5.4
clone(CLONE_NEWNS|CLONE_NEWPID|...)succeeds (requiresCAP_SYS_ADMIN)- Target binary is statically linked (or dynamic libraries are available in
--root) --self-hash=<hex>is configured in production deployments
Documentation
| File | Description |
|---|---|
README.md | This file |
docs/ARCHITECTURE.md | Architecture overview |
docs/SANDBOX.md | Layer-by-layer sandbox internals |
docs/SECCOMP.md | seccomp-BPF whitelist design |
docs/AUDIT_SCHEMA.md | Audit JSON schema reference |
docs/THREAT_MODEL.md | Security assumptions and scope |
docs/BLAKE2B.md | BLAKE2b implementation details |
docs/BENCHMARKS.md | Performance benchmarks |
docs/BUILD.md | Build instructions |
docs/adr/ | Architecture Decision Records (4 docs) |
man/z_jail.1 | Man page |
SECURITY.md | Security policy and reporting |
CONTRIBUTING.md | How to contribute |
CHANGELOG.md | Release history |
ROADMAP.md | Future plans |
TODO.md | Known gaps and planned work |
Roadmap
v1 (current)
- 7-layer defence-in-depth sandbox
- BLAKE2b-256 content fingerprinting
- Audit JSON output
- 17 test scenarios
- man page, completions (bash, zsh, fish)
v2 (planned)
- External seccomp policy file (JSON or BPF source)
- Custom namespace flags per sandbox instance
- Configurable syscall whitelist via CLI
- Performance profiling hooks for CI integration
- Release signing (minisign/signify)
Status
License
Axiom Public License v1.0 — see LICENSE for the full text.
Free for independent researchers and small-scale laboratories (budget ≤ $1M USD). Commercial use, government use, and reverse engineering are strictly prohibited without written authorization.
Z-Jail was built on WSL2 (Kali Linux, GCC 15.2.0), targeting Linux 5.4+. Maintained by Division-36. Report issues at the issue tracker.
Similar Articles
Linux application sandboxing - old tech for the future
Article advocates Firejail as a mature Linux sandboxing tool to restrict program network, filesystem and hardware access without needing new display tech like Wayland.
@ghumare64: https://x.com/ghumare64/status/2055329887431393309
A deep dive into why local coding agents like Claude Code and Codex are converging on libkrun instead of Firecracker for sandboxing, as Firecracker cannot run natively on macOS. The article also introduces iii-sandbox, an open-source hardware-isolated execution layer built on libkrun.
modulejail: Proactively shrink a Linux host's kernel-module attack surface by blacklisting every module not currently in use
ModuleJail is a POSIX shell script that shrinks a Linux host's kernel-module attack surface by blacklisting every module not currently in use, helping sysadmins reduce risk from upcoming kernel module vulnerabilities.
zinnia: a modular 64-bit Unix-like kernel written in Rust
Zinnia is a modular 64-bit Unix-like kernel written in Rust, designed to boot on UEFI systems and run a modern desktop with Wayland or X11. It implements POSIX APIs and common Linux/BSD extensions.
I built Capsule Bash, a sandboxed Bash made for agents without setup
Capsule Bash is a sandboxed Bash environment for AI agents, offering secure WebAssembly-based isolation and enriched command feedback without complex setup.