Show HN: Z-Jail – A 130 KB Linux sandbox in C99 with 7 defense layers and zero deps

Hacker News Top 07/01/26, 07:18 PM Tools

linux sandbox security c99 namespaces seccomp-bpf open-source

Summary

Z-Jail is a lightweight Linux sandbox written in C99 with seven defense layers, no external dependencies, and a tiny binary size (~130 KiB), designed for secure code execution in CI, CTF, and lightweight evaluation.

Cached at: 07/01/26, 08:01 PM

Division-36/Z-Jail

Source: https://github.com/Division-36/Z-Jail

Z-Jail

Multi-layer sandbox for native code execution on Linux.
Seven independent defence layers — no external dependencies, ~130 KiB PIE binary.

┌──────────────────────────────────────────────────────┐
│                    Z-Jail                            │
├──────────────────────────────────────────────────────┤
│  Truthimatics Public Version                         │
│                (evidence-based verdict engine)       │
│  Namespaces    (mount, pid, net, ipc, uts)           │
│  pivot_root    (chroot on steroids)                  │
│  Capabilities  (drop all, lock securebits)           │
│  NO_NEW_PRIVS  (no privilege escalation)             │
│  seccomp-BPF   (whitelist: 15 syscalls only)         │
│  Audit         (JSON logging + BLAKE2b hashing)      │
└──────────────────────────────────────────────────────┘

Quick Start
Why Z-Jail
Architecture
Layers
Usage
Build & Install
Testing
Performance
Threat Model
Documentation
Roadmap
License

Quick Start

git clone https://github.com/Division-36/Z-Jail.git
cd Z-Jail
make
sudo ./z_jail --root=/path/to/rootfs --seccomp-enforce -- /bin/ls

The --root directory should contain a minimal filesystem with the target binary and its dependencies (for static binaries, just the binary is enough).

Why Z-Jail

Existing sandboxing solutions make trade-offs:

	Z-Jail	Firecracker	gVisor	bwrap	nsjail
External deps	zero	libc, seccomp	Go runtime	libc	libc, protobuf
Binary size	~130 KiB	20+ MiB	40+ MiB	~70 KiB	~1 MiB
VM isolation	no	yes (microVM)	no (sandbox)	no	no
seccomp whitelist	yes	no	yes	optional	yes
Content hashing	yes	no	no	no	no
Audit JSON	yes	no	yes	no	partial
Build complexity	one `make`	complex	complex	trivial	moderate

Z-Jail fills the niche between bwrap (minimal, no seccomp-by-default) and nsjail (featureful, heavy deps). It is designed for CI pipelines, CTF jail challenges, and lightweight code evaluation where you need defence-in-depth without pulling in a container runtime.

Architecture

Data Flow

flowchart LR
    CLI[CLI args] --> P[parse_args]
    P --> C{clone namespaces}
    C -->|child| CR[child_run]
    C -->|parent| W[waitpid]
    CR --> RL[setrlimit]
    RL --> FD[close fds >= 3]
    FD --> DUMP[PR_SET_DUMPABLE=0]
    DUMP --> PV[pivot_root]
    PV --> NNP[PR_SET_NO_NEW_PRIVS]
    NNP --> CAP[drop capabilities]
    CAP --> SC[seccomp-BPF]
    SC --> SIG[signal parent]
    SIG --> EX[execve target]
    W --> A[audit JSON]
    A --> EXIT[exit]

Layer Ordering

The steps run in this order so that each privileged operation happens while the privilege it needs is still held, and no later step can undo an earlier restriction:

setrlimit — cap CPU, address space, file count, processes before anything else
fd scrub — close all inherited fds except the report pipe
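PR_SET_DUMPABLE=0 — no core dumps or ptrace attach during setup, and /proc/<pid> entries become root-owned; execve(2) resets the attribute to 1 unless the target is setuid/setgid or has file capabilities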
pivot_root — detach from host filesystem; old root unmounted lazily
PR_SET_NO_NEW_PRIVS — from here on, execve cannot grant privileges: setuid/setgid bits and file capabilities are ignored
drop_caps — zero out all capabilities, lock securebits
seccomp-BPF — restrict syscalls to whitelist only
signal parent — tell the parent the sandbox is ready
execve — replace process with the target binary
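A minimal sketch of the setrlimit step, assuming illustrative limit values (the README does not list the values Z-Jail actually uses). `struct jail_limit`, `k_limits` and `apply_limits` are hypothetical names. Each limit is clamped to the current hard limit, because only a privileged process may raise a hard limit.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical limit table: the values are illustrative, not Z-Jail's defaults. */
struct jail_limit {
    int    resource;   /* RLIMIT_* constant */
    rlim_t value;      /* applied as both soft and hard limit */
};

static const struct jail_limit k_limits[] = {
    { RLIMIT_CPU,    5 },             /* seconds of CPU time */
    { RLIMIT_AS,     256UL << 20 },   /* 256 MiB of address space */
    { RLIMIT_NOFILE, 32 },            /* open file descriptors */
    { RLIMIT_NPROC,  1 },             /* processes for this real UID */
};

/* Lower each limit to the requested value (soft == hard), clamped to the
   current hard limit so the call never tries to raise it.
   Returns 0 on success, -1 if any getrlimit()/setrlimit() call fails. */
static int apply_limits(const struct jail_limit *lims, size_t n)
{
    assert(lims != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct rlimit cur, want;
        if (getrlimit(lims[i].resource, &cur) != 0)
            return -1;
        rlim_t v = lims[i].value;
        if (cur.rlim_max != RLIM_INFINITY && v > cur.rlim_max)
            v = cur.rlim_max;
        want.rlim_cur = v;
        want.rlim_max = v;
        if (setrlimit(lims[i].resource, &want) != 0)
            return -1;
    }
    return 0;
}
```

RLIMIT_NPROC is not enforced for a process that has CAP_SYS_ADMIN or CAP_SYS_RESOURCE or runs with real user ID 0, so that limit only takes effect once the capabilities are gone and the target runs under a non-root UID.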

sequenceDiagram
    participant P as Parent
    participant C as Child
    P->>C: clone (NEWNS|NEWPID|NEWNET|NEWIPC|NEWUTS)
    Note over C: setrlimit(CPU, AS, NOFILE, NPROC)
    Note over C: close(all fds > 2)
    Note over C: PR_SET_DUMPABLE=0
    Note over C: pivot_root → chdir("/") → umount -l
    Note over C: PR_SET_NO_NEW_PRIVS
    Note over C: capset(all zero) + securebits
    Note over C: seccomp(SECCOMP_MODE_FILTER, whitelist)
    C->>P: write(pipe, ready=1)
    Note over C: execve(target)
    P->>P: waitpid
    P->>P: write audit JSON
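The fd scrub can be sketched as follows, assuming the host's `/proc` is still visible (the step runs before pivot_root). `scrub_fds` is a hypothetical name. It enumerates `/proc/self/fd` instead of looping up to RLIMIT_NOFILE, because lowering that limit does not close descriptors already open above it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/* Close every descriptor >= 3 except keep_fd (e.g. the report pipe).
   keep_fd may be -1 to keep nothing above stderr. */
static void scrub_fds(int keep_fd)
{
    assert(keep_fd >= -1);
    DIR *d = opendir("/proc/self/fd");
    if (d != NULL) {
        int self = dirfd(d);              /* the fd backing the listing itself */
        struct dirent *e;
        while ((e = readdir(d)) != NULL) {
            char *end;
            long fd = strtol(e->d_name, &end, 10);
            if (end == e->d_name || *end != '\0')
                continue;                 /* skips "." and ".." */
            if (fd >= 3 && fd != keep_fd && fd != self)
                close((int)fd);
        }
        closedir(d);
        return;
    }
    /* Fallback without /proc: bounded by the current soft limit, so it can
       miss descriptors that were opened before the limit was lowered. */
    long max = sysconf(_SC_OPEN_MAX);
    if (max < 0)
        max = 1024;
    for (long fd = 3; fd < max; fd++)
        if (fd != keep_fd)
            close((int)fd);
}
```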

Layers

1. Truthimatics Public Version

Evidence-based verdict engine. Collects weighted observations about the executed binary and determines a final verdict (DETERMINISTIC, REJECT, or UNCERTAIN). Each observation carries a weight; any single observation with weight >50% of total decides the verdict.
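A sketch of the majority rule in C. The README names the three verdicts and the >50% rule but not the data structures, so `struct observation` and `decide` are hypothetical, and returning UNCERTAIN when no observation dominates is an assumption. The test `2 * weight > total` keeps the comparison in integer arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types: only the verdict names come from the README. */
enum verdict { DETERMINISTIC, REJECT, UNCERTAIN };

struct observation {
    const char  *label;     /* what was observed, for the audit log */
    unsigned     weight;    /* relative evidence weight */
    enum verdict suggests;  /* verdict this observation argues for */
};

/* Return the verdict of the observation holding more than half of the total
   weight. At most one observation can exceed half, so the first match is the
   only one. If none does, return UNCERTAIN (an assumption of this sketch). */
static enum verdict decide(const struct observation *obs, size_t n)
{
    assert(obs != NULL || n == 0);
    unsigned long long total = 0;
    for (size_t i = 0; i < n; i++)
        total += obs[i].weight;
    for (size_t i = 0; i < n; i++)
        if (2ULL * obs[i].weight > total)   /* strictly more than 50% */
            return obs[i].suggests;
    return UNCERTAIN;
}
```

For example, weights 6 and 3 let the weight-6 observation decide (6 of 9 is more than half), while weights 5 and 5 leave the verdict UNCERTAIN.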

2. Namespaces

Five namespaces are created via clone():

Namespace	Flag	Purpose
Mount	`CLONE_NEWNS`	Isolated filesystem tree
PID	`CLONE_NEWPID`	Process ID space (child is pid 1)
Net	`CLONE_NEWNET`	No network interfaces
IPC	`CLONE_NEWIPC`	No shared memory / semaphores
UTS	`CLONE_NEWUTS`	Separate hostname

Requires CAP_SYS_ADMIN in the initial user namespace, since no user namespace (CLONE_NEWUSER) is created.
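A minimal sketch of the clone() call with the five flags from the table, using the glibc `clone()` wrapper. `run_in_namespaces`, `child_is_pid1` and the 256 KiB child stack are illustrative choices, not taken from Z-Jail's source.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* The five namespace flags; SIGCHLD lets the parent reap the child with waitpid(). */
#define JAIL_CLONE_FLAGS (CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWNET | \
                          CLONE_NEWIPC | CLONE_NEWUTS | SIGCHLD)
#define CHILD_STACK_SIZE (256 * 1024)

/* Run fn(arg) in a child created in fresh namespaces and return its exit
   status, or -1 if clone() fails (typically EPERM without CAP_SYS_ADMIN)
   or the child did not exit normally. */
static int run_in_namespaces(int (*fn)(void *), void *arg)
{
    assert(fn != NULL);
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;
    /* The stack grows down on x86_64 and aarch64: pass the top of the block. */
    pid_t pid = clone(fn, stack + CHILD_STACK_SIZE, JAIL_CLONE_FLAGS, arg);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    int status;
    pid_t w;
    do {
        w = waitpid(pid, &status, 0);
    } while (w == -1 && errno == EINTR);
    free(stack);
    if (w == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Example child: inside CLONE_NEWPID the first process is PID 1. */
static int child_is_pid1(void *arg)
{
    (void)arg;
    return getpid() == 1 ? 0 : 1;
}
```

Without CAP_SYS_ADMIN the clone() call fails and the helper returns -1, the situation the documented exit code 125 covers.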

3. pivot_root

Replaces the mount namespace root with the --root directory:

Bind-mount the root directory onto itself (MS_BIND|MS_REC), because pivot_root requires new_root to be a mount point
pivot_root(new_root, put_old) — swap the mount tree
chdir("/") — move into the new root
umount2("/.pivot_old", MNT_DETACH) — detach old root
rmdir("/.pivot_old") — clean up

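This is stronger than chroot(2): chroot only changes the directory used to resolve absolute paths, and a process that can call chroot again can climb back out, whereas pivot_root replaces the root mount of the namespace and the old root is then detached, so no path inside the sandbox leads back to the host root. Creating a new mount namespace from inside the sandbox is also blocked by seccomp.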
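The steps above, sketched in C under a few assumptions: the function runs inside the new mount namespace, and it first remounts `/` as MS_PRIVATE, a step the list does not show but pivot_root(2) needs when host mounts are shared (the default on systemd hosts). glibc provides no pivot_root wrapper, so the raw syscall is used. `enter_new_root` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Must run inside a fresh mount namespace (CLONE_NEWNS): in the host
   namespace these calls would change the host's mount table. */
static int enter_new_root(const char *new_root)
{
    struct stat st;
    char put_old[PATH_MAX];

    assert(new_root != NULL);
    /* Validate before touching any mount. */
    if (stat(new_root, &st) != 0 || !S_ISDIR(st.st_mode))
        return -1;
    /* Stop propagation to the host; pivot_root fails on shared mounts. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0)
        return -1;
    /* 1. Make new_root a mount point. */
    if (mount(new_root, new_root, NULL, MS_BIND | MS_REC, NULL) != 0)
        return -1;
    /* put_old must live underneath new_root. */
    if (snprintf(put_old, sizeof put_old, "%s/.pivot_old", new_root) >= (int)sizeof put_old)
        return -1;
    if (mkdir(put_old, 0700) != 0 && errno != EEXIST)
        return -1;
    /* 2. Swap the mount tree. */
    if (syscall(SYS_pivot_root, new_root, put_old) != 0)
        return -1;
    /* 3. Move into the new root. */
    if (chdir("/") != 0)
        return -1;
    /* 4. Lazily detach the old root, then 5. remove the mount point. */
    if (umount2("/.pivot_old", MNT_DETACH) != 0)
        return -1;
    return rmdir("/.pivot_old");
}
```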

4. Capabilities

All capabilities are dropped via:

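prctl(PR_SET_SECUREBITS, SECBIT_KEEP_CAPS_LOCKED | SECBIT_NO_SETUID_FIXUP | ...);  // needs CAP_SETPCAP, so it runs first
capset(&hdr, data);  // hdr.version = _LINUX_CAPABILITY_VERSION_3; data[0] and data[1] all zero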

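The process switches IDs with setgid()/setuid() before capset, so the change happens while CAP_SETGID and CAP_SETUID are still held. Setting securebits requires CAP_SETPCAP, so that prctl must also come before capset. Once capset has cleared all sets, no capabilities remain and the locked securebits cannot be changed, so nothing can be re-enabled.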
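A sketch of the capability drop using the raw capset syscall through syscall(2), which avoids depending on libcap. Emptying the bounding set is an addition not mentioned above, included because capset alone leaves it intact. `drop_all_caps` is a hypothetical name, and the setgid()/setuid() switch is omitted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/capability.h>
#include <linux/securebits.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Drop every capability irreversibly. PR_CAPBSET_DROP and PR_SET_SECUREBITS
   both need CAP_SETPCAP, so they run before capset() clears the sets.
   Returns 0 on success, -1 with errno set (EPERM when unprivileged). */
static int drop_all_caps(void)
{
    /* 1. Empty the bounding set so execve can never regain a capability.
          PR_CAPBSET_READ fails with EINVAL past the last known capability. */
    for (int cap = 0; ; cap++) {
        if (prctl(PR_CAPBSET_READ, cap, 0, 0, 0) < 0)
            break;
        if (prctl(PR_CAPBSET_DROP, cap, 0, 0, 0) != 0)
            return -1;
    }
    /* 2. Disable and lock the root-based capability rules. */
    if (prctl(PR_SET_SECUREBITS,
              SECBIT_KEEP_CAPS_LOCKED |
              SECBIT_NO_SETUID_FIXUP | SECBIT_NO_SETUID_FIXUP_LOCKED |
              SECBIT_NOROOT | SECBIT_NOROOT_LOCKED, 0, 0, 0) != 0)
        return -1;
    /* 3. Clear effective, permitted and inheritable sets (two 32-bit words
          each in version 3); clearing permitted also clears ambient. */
    struct __user_cap_header_struct hdr = { _LINUX_CAPABILITY_VERSION_3, 0 };
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3] = {{0}};
    return (int)syscall(SYS_capset, &hdr, data);
}
```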

5. NO_NEW_PRIVS

prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);

Prevents the process or its children from gaining new privileges via setuid binaries, file capabilities, or LSM transitions. Irreversible.
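A short sketch of both properties: the flag is set and read back with PR_GET_NO_NEW_PRIVS, and the kernel rejects any value other than 1 for PR_SET_NO_NEW_PRIVS, which is why the flag cannot be cleared. `set_no_new_privs` is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

/* Set no_new_privs and confirm it took effect. Returns 0 on success. */
static int set_no_new_privs(void)
{
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0) == 1 ? 0 : -1;
}
```

The flag is inherited across fork and execve, and it is also what allows an unprivileged process to install a seccomp filter.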

6. seccomp-BPF (whitelist-v1)

Allow-list of 15 syscalls — anything not on the list gets SECCOMP_RET_KILL:

Syscall	Number	Notes
`read`	0	stdin
`write`	1	stdout/stderr + report pipe
`openat`	257	file access (not `open`)
`close`	3	—
`lseek`	8	—
`brk`	12	heap management
`mmap`	9	arg-restricted: `prot & 0x4 == 0` (no PROT_EXEC), `flags == 0x22` (MAP_PRIVATE|MAP_ANONYMOUS only, which excludes MAP_SHARED)
`munmap`	11	—
`execve`	59	initial exec of the target (the filter is stateless, so later execve calls also pass)
`exit_group`	231	clean process exit
`rt_sigaction`	13	signal handlers
`rt_sigprocmask`	14	signal masking
`getrandom`	318	random number source
`clock_gettime`	228	timing
`fstat`	5	file metadata

The BPF filter is generated dynamically: for each whitelist entry, a jump chain is emitted that either allows (if syscall matches) or falls through to KILL. Architecture is checked first (AUDIT_ARCH_X86_64).

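The filter is verified independently by a standalone test (tests/seccomp_filter_test.c, 8/8 pass) that fork+execs each test case under a real prctl(PR_SET_SECCOMP) filter. Root is not needed, because the kernel lets an unprivileged process install a filter once PR_SET_NO_NEW_PRIVS is set.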
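A sketch of the jump-chain construction plus a fork-per-case check, using the kernel headers' BPF_STMT/BPF_JUMP macros. It kills with SECCOMP_RET_KILL_PROCESS (Linux 4.14+), rejects x32 syscall numbers on x86_64, and adds an aarch64 branch; these are choices of the sketch, not documented Z-Jail behaviour, and the `mmap` argument rules are left out. `install_allowlist` and `probe_is_killed` are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#  define JAIL_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#  define JAIL_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#  error "this sketch only covers x86_64 and aarch64"
#endif
#define MAX_ALLOWED 64

/* Check arch, then one JEQ/RET ALLOW pair per allowed syscall; anything that
   falls through the chain hits the final KILL. Caller must already have
   set PR_SET_NO_NEW_PRIVS (or hold CAP_SYS_ADMIN). */
static int install_allowlist(const int *nrs, size_t n)
{
    struct sock_filter prog[6 + 2 * MAX_ALLOWED + 1];
    size_t k = 0;

    assert(nrs != NULL || n == 0);
    if (n > MAX_ALLOWED) { errno = E2BIG; return -1; }
    prog[k++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch));
    prog[k++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, JAIL_AUDIT_ARCH, 1, 0);
    prog[k++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS);
    prog[k++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr));
#if defined(__x86_64__)
    /* x32 syscalls also report AUDIT_ARCH_X86_64; reject them by number. */
    prog[k++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, 0x40000000, 0, 1);
    prog[k++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS);
#endif
    for (size_t i = 0; i < n; i++) {
        prog[k++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned)nrs[i], 0, 1);
        prog[k++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW);
    }
    prog[k++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS);

    struct sock_fprog fprog = { .len = (unsigned short)k, .filter = prog };
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog, 0, 0);
}

/* Fork a child that installs the filter, then calls syscall probe_nr.
   Returns 1 if the child died with SIGSYS, 0 if it exited cleanly, -1 otherwise. */
static int probe_is_killed(const int *nrs, size_t n, long probe_nr)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 || install_allowlist(nrs, n) != 0)
            _exit(2);
        syscall(probe_nr, 0, 0, 0);
        _exit(0);                         /* exit_group must be on the list */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status) == SIGSYS ? 1 : -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Each probe runs in its own child, so a killed case does not take the test runner down with it.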

7. Audit

Every execution produces a JSON audit record:

{
  "schema": "z-jail.audit/v1",
  "build_id": "Z-Jail/v1+dev",
  "timestamp": 1749000000,
  "duration_ns": 8500000,
  "executable": "/bin/ls",
  "verdict": "DETERMINISTIC",
  "exit_code": 0,
  "sandbox": {
    "seccomp_filter": "whitelist-v1",
    "seccomp_whitelist_size": 15,
    "seccomp_arg_rules_size": 2,
    "namespaces": ["mount","pid","net","ipc","uts"],
    "pivot_root": "/var/run/z-jail/roots/default",
    "no_new_privs": true,
    "capabilities_dropped": true
  },
  "content_fingerprint": "0e5751c026e543b2e8ab2eb06099daa1..."
}

Written to build/audits/<binary-name>.audit.json. The content_fingerprint is a BLAKE2b-256 hash of the target binary, computed by the parent after the child finishes.
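For reference, a compact one-shot BLAKE2b following RFC 7693 (unkeyed, digest length 1 to 64 bytes). This is an illustrative sketch, not Z-Jail's own implementation; `blake2b` and its helpers are hypothetical names. A 32-byte digest gives BLAKE2b-256, and inputs are assumed shorter than 2^64 bytes so the high half of the byte counter stays zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const uint64_t B2B_IV[8] = {
    0x6a09e667f3bcc908ULL, 0xbb67ae8584caa73bULL, 0x3c6ef372fe94f82bULL,
    0xa54ff53a5f1d36f1ULL, 0x510e527fade682d1ULL, 0x9b05688c2b3e6c1fULL,
    0x1f83d9abfb41bd6bULL, 0x5be0cd19137e2179ULL };

/* Message schedule; rounds 10 and 11 reuse rows 0 and 1. */
static const uint8_t B2B_SIGMA[10][16] = {
    {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 },
    { 14, 10,  4,  8,  9, 15, 13,  6,  1, 12,  0,  2, 11,  7,  5,  3 },
    { 11,  8, 12,  0,  5,  2, 15, 13, 10, 14,  3,  6,  7,  1,  9,  4 },
    {  7,  9,  3,  1, 13, 12, 11, 14,  2,  6,  5, 10,  4,  0, 15,  8 },
    {  9,  0,  5,  7,  2,  4, 10, 15, 14,  1, 11, 12,  6,  8,  3, 13 },
    {  2, 12,  6, 10,  0, 11,  8,  3,  4, 13,  7,  5, 15, 14,  1,  9 },
    { 12,  5,  1, 15, 14, 13,  4, 10,  0,  7,  6,  3,  9,  2,  8, 11 },
    { 13, 11,  7, 14, 12,  1,  3,  9,  5,  0, 15,  4,  8,  6,  2, 10 },
    {  6, 15, 14,  9, 11,  3,  0,  8, 12,  2, 13,  7,  1,  4, 10,  5 },
    { 10,  2,  8,  4,  7,  6,  1,  5, 15, 11,  9, 14,  3, 12, 13,  0 } };

#define ROTR64(x, n) (((x) >> (n)) | ((x) << (64 - (n))))

/* The G mixing function on state words a, b, c, d with message words x, y. */
#define B2B_G(a, b, c, d, x, y) do {                    \
    v[a] += v[b] + (x); v[d] = ROTR64(v[d] ^ v[a], 32); \
    v[c] += v[d];       v[b] = ROTR64(v[b] ^ v[c], 24); \
    v[a] += v[b] + (y); v[d] = ROTR64(v[d] ^ v[a], 16); \
    v[c] += v[d];       v[b] = ROTR64(v[b] ^ v[c], 63); \
} while (0)

static uint64_t load64_le(const uint8_t *p)
{
    uint64_t x = 0;
    for (int i = 7; i >= 0; i--)
        x = (x << 8) | p[i];
    return x;
}

/* Compress one 128-byte block; t is the number of bytes hashed so far. */
static void b2b_compress(uint64_t h[8], const uint8_t blk[128], uint64_t t, int last)
{
    uint64_t v[16], m[16];
    for (int i = 0; i < 8; i++) { v[i] = h[i]; v[i + 8] = B2B_IV[i]; }
    v[12] ^= t;
    if (last)
        v[14] = ~v[14];
    for (int i = 0; i < 16; i++)
        m[i] = load64_le(blk + 8 * i);
    for (int r = 0; r < 12; r++) {
        const uint8_t *s = B2B_SIGMA[r % 10];
        B2B_G(0, 4,  8, 12, m[s[0]],  m[s[1]]);
        B2B_G(1, 5,  9, 13, m[s[2]],  m[s[3]]);
        B2B_G(2, 6, 10, 14, m[s[4]],  m[s[5]]);
        B2B_G(3, 7, 11, 15, m[s[6]],  m[s[7]]);
        B2B_G(0, 5, 10, 15, m[s[8]],  m[s[9]]);
        B2B_G(1, 6, 11, 12, m[s[10]], m[s[11]]);
        B2B_G(2, 7,  8, 13, m[s[12]], m[s[13]]);
        B2B_G(3, 4,  9, 14, m[s[14]], m[s[15]]);
    }
    for (int i = 0; i < 8; i++)
        h[i] ^= v[i] ^ v[i + 8];
}

/* Unkeyed BLAKE2b with an outlen-byte digest (1..64). */
static void blake2b(uint8_t *out, size_t outlen, const void *data, size_t len)
{
    const uint8_t *in = data;
    uint64_t h[8], t = 0;
    uint8_t last[128];

    assert(outlen >= 1 && outlen <= 64);
    for (int i = 0; i < 8; i++)
        h[i] = B2B_IV[i];
    h[0] ^= 0x01010000ULL ^ (uint64_t)outlen;  /* depth 1, fanout 1, no key */
    while (len > 128) {                        /* keep the final block for last=1 */
        t += 128;
        b2b_compress(h, in, t, 0);
        in += 128;
        len -= 128;
    }
    memset(last, 0, sizeof last);              /* final block is zero-padded */
    if (len > 0)
        memcpy(last, in, len);
    t += len;
    b2b_compress(h, last, t, 1);
    for (size_t i = 0; i < outlen; i++)        /* little-endian output */
        out[i] = (uint8_t)(h[i / 8] >> (8 * (i % 8)));
}
```

For example, the 32-byte digest of the empty input is 0e5751c026e543b2e8ab2eb06099daa1d1e5df47778f7787faab45cdf12fe3a8.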

Usage

z_jail --root=<dir> [--seccomp-enforce] [--self-hash=<hex>]
       [--quiet] [--verbose] -- <program> [args...]

Flag	Description
`--root=<dir>`	Sandbox root directory (required)
`--seccomp-enforce`	Enable seccomp-BPF syscall whitelist
`--self-hash=<hex>`	Verify binary matches expected BLAKE2b-256 hash
`--quiet`	Suppress audit output
`--verbose`	Enable debug logging
`--version`	Show build ID (`Z-Jail/v1+dev`)
`--help`	Show usage and exit

Examples

# Run a static binary with all protections
sudo z_jail --root=./roots --seccomp-enforce -- bin/hello_static

# Run with binary integrity verification
sudo z_jail --root=./roots --seccomp-enforce \
  --self-hash=$(b2sum -l 256 z_jail | cut -c1-64) -- bin/program

# Quiet mode (no audit JSON)
sudo z_jail --root=./roots --quiet -- bin/program

Exit Codes

Code	Meaning
0	Child exited normally (verdict: DETERMINISTIC)
1	Child was killed by signal (verdict: REJECT)
2	Self-hash: bad hex string or file unreadable
3	Self-hash: mismatch (binary has been tampered with)
101	Child setup error (rlimit, etc.)
102	Child seccomp filter installation failed
103	Child execve failed (binary not found, no exec permission)
104	Child pivot_root failed
105	Child capability drop failed
125	Namespace creation failed (run as root? kernel support?)
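A sketch of how `--self-hash` could map onto exit codes 2 and 3, assuming the BLAKE2b-256 digest of the running binary has already been computed (reading the file, and the "unreadable" case of code 2, are omitted). `check_self_hash` and `hex_nibble` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return values mirroring the exit-code table. */
enum { SELF_HASH_OK = 0, SELF_HASH_BAD_HEX = 2, SELF_HASH_MISMATCH = 3 };

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Compare a 64-digit hex string against a 32-byte digest. */
static int check_self_hash(const char *hex, const uint8_t digest[32])
{
    uint8_t expected[32];
    unsigned diff = 0;

    assert(hex != NULL && digest != NULL);
    for (size_t i = 0; i < 32; i++) {
        int hi = hex_nibble(hex[2 * i]);
        /* Only read the next char if this one was valid (stops at '\0'). */
        int lo = hi < 0 ? -1 : hex_nibble(hex[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return SELF_HASH_BAD_HEX;     /* bad digit or shorter than 64 */
        expected[i] = (uint8_t)(hi << 4 | lo);
    }
    if (hex[64] != '\0')
        return SELF_HASH_BAD_HEX;         /* longer than 64 digits */
    for (size_t i = 0; i < 32; i++)       /* check every byte before deciding */
        diff |= expected[i] ^ digest[i];
    return diff == 0 ? SELF_HASH_OK : SELF_HASH_MISMATCH;
}
```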

Build & Install

Requirements

Linux kernel ≥ 5.4 (namespaces, seccomp-BPF, pivot_root)
GCC ≥ 11 (tested on 11.4, 13.2, 15.2)
No external libraries — just the standard C toolchain

Commands

make              # build z_jail (~130 KiB PIE binary)
make install      # install to /usr/local/bin + man page
make clean        # remove build artifacts
make dist         # create release tarball
make check        # smoke test (--version + --help)

The binary is built as a Position Independent Executable with -fstack-protector-strong, -D_FORTIFY_SOURCE=2, and full RELRO (-z relro -z now).

Compile-time Options

make CC=clang CFLAGS="-O3 -march=native"   # custom compiler/flags

Testing

Quick Test (no root)

# seccomp filter logic (8 tests)
tests/build/seccomp_filter_test

# BLAKE2b known-answer test
tests/build/blake2b_known

These don’t need root and run in under 100 ms.

Full Test Suite

make -C tests setup          # build payloads + test roots
sudo bash tests/run_tests.sh # 17 scenarios

Requires root for namespace creation. The test suite covers:

#	Scenario	Type	What it tests
0	blake2b_regress	known-answer	BLAKE2b implementation correctness
1	seccomp_filter	standalone BPF	8 sub-tests of the BPF filter logic
2	hello_static	ok	Basic static binary execution
3	hello_dynamic	ok	Dynamic binary with ld-linux + libc
4	execve_replacement	ok	execve in sandbox (blocked by seccomp)
5	fd_inherited_read	ok	stdin/stdout inherited correctly
6	mmap_bad_flags	killed	mmap with MAP_SHARED blocked
7	mmap_good_allowed	ok	mmap with MAP_PRIVATE|ANONYMOUS allowed
8	mmap_prot_exec	killed	mmap with PROT_EXEC blocked
9	mmap_self_modify	killed	Self-modifying code blocked
10	ptrace	killed	ptrace blocked
11	socket	killed	socket creation blocked
12	chroot_escape	killed	chroot syscall blocked
13	double_chroot	killed	Double chroot blocked
14	mount_replay	killed	Mount syscall blocked
15	cpu_exhaust	killed	RLIMIT_NPROC blocks fork bomb
16	signal_parent	killed	Signal to parent blocked
17	self_hash	ok	Binary integrity verification

Performance

Numbers from WSL2 (Kali Linux, GCC 15.2.0, -O2 -g):

Metric	Value
Binary size	~130 KiB
Mean sandbox latency	~8 ms
Peak RSS	~4 MiB
Lines of code (core)	~900
Test suite runtime	~5 s

Latency breakdown (approx): clone + namespaces ~3 ms, pivot_root ~2 ms, seccomp + caps ~1 ms, execve ~1 ms, waitpid + audit ~1 ms.

Threat Model

In Scope

Arbitrary native code execution by an untrusted payload
Escape via chroot, mount, ptrace, socket, process_vm_writev
Fork bombs, CPU exhaustion (RLIMIT_CPU), memory exhaustion (RLIMIT_AS)
File descriptor leaks across execve
setuid / dynamic linker / LD_PRELOAD escalation
seccomp filter removal or capability re-enablement

Out of Scope

Kernel zero-days outside the permitted syscall surface
Hardware side channels (Spectre, Meltdown)
Co-located VM escape via shared /proc, /sys mounts
Network egress beyond what CLONE_NEWNET + blocked socket provides
Resource starvation of sibling sandboxes (needs cgroup support)

Assumptions

Host kernel is unmodified Linux ≥ 5.4
clone(CLONE_NEWNS|CLONE_NEWPID|...) succeeds (requires CAP_SYS_ADMIN)
Target binary is statically linked (or dynamic libraries are available in --root)
--self-hash=<hex> is configured in production deployments

Documentation

File	Description
`README.md`	This file
`docs/ARCHITECTURE.md`	Architecture overview
`docs/SANDBOX.md`	Layer-by-layer sandbox internals
`docs/SECCOMP.md`	seccomp-BPF whitelist design
`docs/AUDIT_SCHEMA.md`	Audit JSON schema reference
`docs/THREAT_MODEL.md`	Security assumptions and scope
`docs/BLAKE2B.md`	BLAKE2b implementation details
`docs/BENCHMARKS.md`	Performance benchmarks
`docs/BUILD.md`	Build instructions
`docs/adr/`	Architecture Decision Records (4 docs)
`man/z_jail.1`	Man page
`SECURITY.md`	Security policy and reporting
`CONTRIBUTING.md`	How to contribute
`CHANGELOG.md`	Release history
`ROADMAP.md`	Future plans
`TODO.md`	Known gaps and planned work

Roadmap

v1 (current)

7-layer defence-in-depth sandbox
BLAKE2b-256 content fingerprinting
Audit JSON output
17 test scenarios
man page, completions (bash, zsh, fish)

v2 (planned)

External seccomp policy file (JSON or BPF source)
Custom namespace flags per sandbox instance
Configurable syscall whitelist via CLI
Performance profiling hooks for CI integration
Release signing (minisign/signify)

License

Axiom Public License v1.0 — see LICENSE for the full text.

Free for independent researchers and small-scale laboratories (budget ≤ $1M USD). Commercial use, government use, and reverse engineering are strictly prohibited without written authorization.

Z-Jail was built on WSL2 (Kali Linux, GCC 15.2.0), targeting Linux 5.4+. Maintained by Division-36. Report issues at the issue tracker.