Show HN: We post-trained a model that pen tests instead of refusing

Hacker News Top Tools

Summary

ArgusRed is a CLI tool that uses a post-trained AI model to perform security scanning and penetration testing on codebases, outputting detailed markdown reports. It offers two modes: security scan (read-only) and pen test (active exploits) with optional exploit verification.

Anthropic's and OpenAI's publicly available models are explicitly guard-railed to refuse offensive tasks, and their cyber-focussed models are gated for enterprises. This leaves SMEs and mid-market companies open to major vulnerabilities.

AI can be used as both an adversarial and a defensive tool in cyber. The worst-case outcome is that only the adversaries have access.

Meanwhile, most existing AI cyber tools are just wrappers: they keep all the guardrails of the underlying foundation model and inherit its refusals.

For this project we've post-trained a model on a decade of capture-the-flag contests. It won't be made available to anyone and everyone, but we do believe that responsible SMEs and mid-market companies, not just enterprises, need access to these tools to identify key vulnerabilities in their systems.

We have developed two modes that run from a CLI:

• Security scan: a read-only audit of your local codebase for vulnerabilities. It only reports what it can tie to a specific file and line, so you're not wading through vibes-based findings.

• Pen test: an active adversarial mode that will try to break a live system in a sandboxed environment. It proves each vulnerability by running the exploit and showing the request it sent and the response your code gave back, not a confidence score. Currently gated.

To show what the scan does, we pointed it at Bank of Anthos and it found an integer overflow in the transfer path: amount is an int, and amount + fee can overflow negative, so the balance check passes and you move funds you don't have. It also found the usual auth and secrets issues. (Bank of Anthos is Google's open-source demo banking app. It's a known app and some of it is intentionally weak, which is the point: you can clone it and re-run the scan yourself instead of trusting a screenshot.)

The base model is Kimi K2.6 (open weights). We didn't pretrain from scratch. We post-trained it ourselves: SFT on CTF writeups, then RL with verifiable rewards against actual exploit checks.

How the harness works:

Along with the model, we built a harness to support it. The harness runs a multi-agent swarm: an orchestrator splits the job across subagents running in parallel, each owning a slice, then synthesises one report.

The CLI is a local binary (brew/curl). It reads your code locally, then sends context to our inference API over TLS; tcpdump it and you'll see exactly what leaves and where. Install is free, and you can scan for free up to 2M tokens; beyond that you pay for tokens.

Full disclosure: this product is part of Cosine (YC W23).

Up for debate: tool safety. For example, domain verification proves control but not necessarily permission. How would you gate a pen-test tool given that?
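The integer overflow described in the post can be illustrated with a minimal sketch. This is not Bank of Anthos code: the function names and the 32-bit width are assumptions made for illustration, and Python integers do not overflow, so the wraparound is emulated explicitly.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Emulate two's-complement wraparound of a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31


def can_transfer_unsafe(balance: int, amount: int, fee: int) -> bool:
    """Hypothetical balance check that adds in fixed-width arithmetic."""
    total = wrap_int32(amount + fee)  # may wrap to a negative number
    return total <= balance


def can_transfer_checked(balance: int, amount: int, fee: int) -> bool:
    """Hypothetical fix: validate inputs and reject sums that would overflow."""
    if amount <= 0 or fee < 0:
        return False
    total = amount + fee  # exact, because Python ints are arbitrary precision
    if total > INT32_MAX:
        return False
    return total <= balance


if __name__ == "__main__":
    balance, amount, fee = 100, INT32_MAX, 1
    print("unsafe check passes:", can_transfer_unsafe(balance, amount, fee))
    print("checked version passes:", can_transfer_checked(balance, amount, fee))
```

With a balance of 100, an amount of `INT32_MAX` and a fee of 1, the wrapped sum is negative, so the unsafe comparison succeeds even though the account cannot cover the transfer. The checked version rejects the same request before comparing.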

Cached at: 06/20/26, 08:17 PM

# argusred: security scan and pen test · ArgusRed

Source: [https://www.argusred.com/cli](https://www.argusred.com/cli)

## Audit your code. Or attack it.

Two modes in one CLI. **Security Scan** reads the code. **Pen Test** attempts the exploits against systems you authorise.

$ `brew install CosineAI/argusred/argusred && argusred`

$ `curl -fsSL https://raw.githubusercontent.com/CosineAI/argusred-dist/main/install.sh | sh`

PS> Windows support is coming soon.

Pick modules, set the agent’s permissions, optionally turn on exploit verification, run. Output is a markdown report: location, severity, cause, and fix direction for every finding it could ground in your code.

**Free install.** The first run opens a quick **Cosine sign-up** (the same login that runs Cosine’s coding agent), and new accounts start with **2M free tokens**.

$ `cd path/to/your/repo`

$ `argusred`

→ first run opens a Cosine sign-up; you start with 2M free tokens

### Before the scan runs.

argusred v2.0.19 · Security Scan · setup

Scan Scope: 5 of 8 active

- [×] Dependency Vulnerability Analysis
- [×] Secret & Credential Detection
- [×] SQL Injection / XSS Vectors
- [ ] Authentication & Session Flows
- [×] Input Validation & Sanitisation
- [ ] CORS & CSP Misconfigurations
- [ ] Cryptographic Weakness Scan
- [×] File Permission & Access Controls

Exploit Verification

Optionally verify reported findings by attempting safe exploit reproduction after the initial report.

- Exploit Verification: (•) Disabled ( ) Docker ( ) Live FS

Agent Permissions

- Terminal Access: ( ) Enabled (•) Disabled ( ) Sandboxed
- Network Requests: ( ) Enabled (•) Disabled ( ) Sandboxed
- File Write: ( ) Enabled ( ) Disabled (•) Sandboxed

scroll 0% · a start · tab next · shift+tab prev · q quit

### Verify the findings.

Don’t just report a vulnerability: **prove it**.
Turn on **Exploit Verification** and the agent attempts a safe reproduction of each finding *after* the initial report, so what lands in front of you is confirmed, not theoretical.

- **Docker**: reproduction runs inside an ephemeral, isolated container spun up from your repo. Nothing touches your host; the container is torn down when it finishes.
- **Live FS**: reproduction runs against your actual checkout, for findings that only manifest in a real environment. Your code stays read-only; the Go harness still blocks writes.
- **Disabled** (default): report only, no reproduction attempts.

### See the output.

Read a sample report: `.argusred/scan-2026-06-05.md`

\# Bank of Anthos: Security Audit Report

29,846 LOC / 391 files · 6 of 8 modules

---

\#\# 1. Executive Summary

Overall risk rating: **CRITICAL**

Multiple critical and high-severity vulnerabilities:

- **Forgeable tokens across every ledger service**: `balancereader`, `transactionhistory`, and `ledgerwriter` verify JWTs against a single shared RSA public key with no issuer or audience claim binding. Combined with the hardcoded private key in the repo (see below), a token signed off-cluster passes verification at every service and authorises any account; per-service trust collapses to “do you have the repo.”
- **Disabled JWT signature verification** in the frontend authentication helper
- **Integer overflow** in financial transaction validation allowing balance bypass
- **SSRF and open redirect** in the OAuth consent flow
- **Credentials transmitted in URL query strings** on the login flow
- **Hardcoded secrets in version control**, including an RSA private key used to sign JWTs

[ trimmed: full report includes per-module findings ]

### Won’t do.
- **Won’t modify your code.** Read-only is enforced by the Go harness below the model: every tool call is intercepted before execution, and mutating ones (file writes, command execution) are deterministically blocked, regardless of what the model wants.
- **No fuzzing, no DAST, no live exploitation.** Active testing lives in [Pen Test mode](https://www.argusred.com/cli#).
- **Won’t include findings it can’t ground in your code.** No vibes-based vulnerabilities.

### Quick answers.

**How long does a scan take?** Two data points: a 6-module scan of **Bank of Anthos** (~30k LOC) finished in ~10 minutes; a full scan of **Symfony** (~1.5M LOC) took ~40 minutes. Time scales sub-linearly with codebase size because modules run as a parallel swarm; the TUI shows a live estimate before you start.

**What’s the output file?** A single markdown file at `.argusred/scan-<date>.md` with executive summary, per-module findings, location, severity, cause, and fix direction. The file stays on your machine.

**What does it cost?** Install is **free**, and your first run drops **2M free tokens** in a new Cosine account, enough to try it on a real repo. After that, scans run on Cosine usage under the same login that runs Cosine’s coding agent. One account, both products.

Same CLI, second tab.

The swarm goes offensive against systems you authorise: not just reading the code, but attempting the exploits. Gated because the security implications are real; access is via booking, with scope and authorisation written down before anything runs.

### Before the pen test runs.

argusred vnightly-906 · Pen Test · setup

Targets

Only add systems you are authorised to test. Press `a` to add a host or URL.

○ No targets added yet

Effort (Passive → Aggressive): Recon · Light · ▲ Moderate · Deep · Aggressive

Active probing with crafted payloads. May trigger WAF rules or rate limits. No destructive actions. Suitable for staging environments.
- [×] Port & service fingerprinting
- [×] Header & TLS analysis
- [×] Directory & endpoint enumeration
- [×] Payload injection (SQLi, XSS, SSTI)
- [ ] Brute-force credential spraying (Deep)
- [ ] Exploit chain construction (Aggressive)
- [ ] Denial-of-service resilience testing (Aggressive)

Agent Permissions

- Terminal Access: (•) Enabled ( ) Disabled ( ) Sandboxed
- Network Requests: (•) Enabled ( ) Disabled ( ) Sandboxed
- File Write: ( ) Enabled ( ) Disabled (•) Sandboxed

Estimate

- Targets: 0 hosts
- Effort: Moderate (4 technique classes)
- Est. Time: ~1 min
- Agent Cycles: ~2–3 iterations

▶ Start Pentest · Cancel

s start · tab next · shift+tab prev · 1/2 mode · a add target · q quit

### See the output.

Read a sample engagement summary: `.argusred/pentest-2026-06-08.md`

\# api.your-app.com: Pen Test Engagement

booking 2A4F · 2026-06-08 · 4h22m · Moderate effort

---

\#\# Executive Summary

Status: **2 critical, 1 high, 3 medium**, all reproducible. Scope: 2 hosts, 47 endpoints. Out-of-scope items deferred and flagged for next engagement.

---

\#\# Confirmed Exploits

**1. JWT signature bypass** (CRITICAL · CVSS 8.6)
`POST /v1/sessions/refresh`: forged token with disabled signature verification, returned 200 OK with admin scope. Reproduction script included.

**2. SSRF via OAuth consent redirect** (HIGH · CVSS 7.4)
Open redirect on `/oauth/authorize` resolved arbitrary internal URLs. Reproduction included.

[ trimmed: full summary includes evidence and remediation per finding ]

### Won’t do.

- **Won’t run without signed authorisation.** Booking is the legal step: targets, time-box, and what’s allowed are written down before anything runs.
- **Won’t expand scope.** Authorised targets only, even if interesting ones show up next door.
- **Won’t keep going past the booked time-box.** Effort ramps stop where the booking says they stop.
- **Doesn’t escalate.** If a finding needs deeper access than booked, it stops and notes it in the engagement summary.

### Quick answers.
**How is this different from the scan?** The scan reads code and infers from what’s there. The pen test actually attempts the exploits against running systems you authorise: different binary mode, different agent behaviour, different deliverable (engagement summary, not audit report).

**How does scoping work?** You provide hosts/endpoints plus written consent at booking. The agent’s network is scoped to that list; it can’t reach anything else, even if a finding suggests it should.

**What does it cost?** Decided per engagement at booking. Scope and effort level determine the time-box; the time-box determines the price.

## It’s a closed binary, built on Cosine’s own model.

`argusred` runs on a model **Cosine post-trained for offensive security**, not an off-the-shelf API behind a prompt wrapper. We trained it because off-the-shelf models refuse the work this product does; a security scanner that won’t read the parts of your code worth attacking isn’t a security scanner.

Safety isn’t a layer of refusals you can talk the model out of. **It’s a Go harness sitting below the model that intercepts every tool call before execution.** In Security Scan mode, the harness deterministically blocks mutating tools (file writes, command execution) regardless of what the model wants; read-only is a guard, not a flag. In Pen Test mode, the same harness limits network egress to the targets you authorised at booking.

The binary you install with `brew` or `curl` is the same one we run internally. It is **not open source**. It runs locally on your machine. You can run `argusred` behind a firewall and `tcpdump` what it does before trusting it on real code.
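The guard described above, a deterministic check that sits between the model and its tools, can be sketched in a few lines. This is an illustration only, not the ArgusRed harness: the real one is written in Go and is closed source, and the tool names (`write_file`, `run_command`, `http_request`), the mode strings, and the `guard` function are all hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit

# Hypothetical tool names; the real harness's tool set is not public.
MUTATING_TOOLS = frozenset({"write_file", "run_command"})
NETWORK_TOOLS = frozenset({"http_request"})


class BlockedToolCall(Exception):
    """Raised when the guard refuses a tool call."""


@dataclass(frozen=True)
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


def guard(call: ToolCall, mode: str,
          allowed_hosts: frozenset = frozenset()) -> ToolCall:
    """Return the call unchanged if permitted, otherwise raise BlockedToolCall.

    The decision depends only on the call and the configuration, never on
    the model's stated intent, which is what makes it deterministic.
    """
    if mode == "scan" and call.name in MUTATING_TOOLS:
        raise BlockedToolCall(f"{call.name} is blocked in scan mode")
    if mode == "pentest" and call.name in NETWORK_TOOLS:
        host = urlsplit(call.args.get("url", "")).hostname
        if host not in allowed_hosts:
            raise BlockedToolCall(f"{host} is not an authorised target")
    return call


def is_allowed(call: ToolCall, mode: str,
               allowed_hosts: frozenset = frozenset()) -> bool:
    """Convenience wrapper that reports the guard's decision as a boolean."""
    try:
        guard(call, mode, allowed_hosts)
        return True
    except BlockedToolCall:
        return False
```

Because the check runs outside the model, a prompt that persuades the model to request a write changes nothing: the call is refused at the boundary. In this sketch, `is_allowed(ToolCall("write_file"), "scan")` is false no matter what text produced the call.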
