@jakevin7: Sharing something interesting that Maka is currently working on: letting agents automatically optimize their own system prompts, fully closed-loop, without any human intervention. Karpathy's autoresearch, AEGIS, and others have explored similar directions: goal-driven, self-reinforcing learning systems.

X AI KOLs Following 06/20/26, 03:25 PM Tools

agent system-prompt self-optimization open-source benchmark automation ai-workbench

Summary

Maka is a local-first desktop AI workbench. Its new feature lets agents optimize their own system prompts automatically: it generates prompt variants, evaluates them in Harbor containers, and applies an acceptance policy so that only measured improvements are kept, iterating without human intervention.

Original Article

Cached at: 06/20/26, 04:18 PM

Let me share something interesting that Maka is currently working on: enabling agents to automatically optimize their own system prompts in a fully closed loop, without any human intervention. Karpathy’s autoresearch, AEGIS, and others have explored similar directions: goal-oriented, self-reinforcing learning systems. https://github.com/Maka-Agent/maka-agent…

The full pipeline: generate N system prompt variants → evaluate them in Harbor containers → record the results in a write-ahead log (WAL) → filter them through the acceptance policy → update the prompt → keep iterating until manually stopped. This turns writing system prompts into an optimization problem driven by benchmark feedback.
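The loop above can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not Maka's implementation: the names `Evaluation`, `scorePrompt`, and `optimizePrompt` are hypothetical, variant generation and evaluation are injected functions standing in for the agent and the Harbor containers, the WAL is reduced to an in-memory array, and a fixed round budget replaces "until manually stopped".

```typescript
// Hypothetical sketch of the closed-loop prompt optimizer described above.
// Not Maka's code: generation, evaluation, and logging are injected stand-ins.

interface Evaluation {
  prompt: string;
  rewards: number[]; // one 0/1 reward per benchmark task
  score: number; // mean reward across tasks
}

type GenerateVariants = (basePrompt: string, n: number) => string[];
type EvaluatePrompt = (prompt: string) => Promise<number[]>;

function meanReward(rewards: number[]): number {
  if (rewards.length === 0) return 0;
  return rewards.reduce((sum, r) => sum + r, 0) / rewards.length;
}

async function scorePrompt(prompt: string, evaluate: EvaluatePrompt): Promise<Evaluation> {
  const rewards = await evaluate(prompt);
  return { prompt, rewards, score: meanReward(rewards) };
}

async function optimizePrompt(
  initialPrompt: string,
  generate: GenerateVariants,
  evaluate: EvaluatePrompt,
  variantsPerRound: number,
  maxRounds: number, // a budget standing in for "until manually stopped"
): Promise<{ best: Evaluation; log: Evaluation[] }> {
  const log: Evaluation[] = []; // in-memory stand-in for the append-only WAL
  let best = await scorePrompt(initialPrompt, evaluate);
  log.push(best);

  for (let round = 0; round < maxRounds; round++) {
    const variants = generate(best.prompt, variantsPerRound);
    // Evaluate all variants concurrently (Harbor containers in the real pipeline).
    const results = await Promise.all(variants.map((p) => scorePrompt(p, evaluate)));
    log.push(...results);

    // Acceptance policy: a candidate replaces the incumbent only if its
    // benchmark score is strictly higher; the agent's own opinion is never consulted.
    for (const candidate of results) {
      if (candidate.score > best.score) best = candidate;
    }
  }
  return { best, log };
}
```

Requiring a strict improvement means ties keep the current prompt, which avoids churning between variants that score the same.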

Together, four consecutive PRs form the complete system:

● PR #67 built the foundation: a fixed-prompt WAL controller. Every experimental result for a prompt candidate is recorded in an append-only log, so an interrupted run can resume from where it stopped instead of repeating finished experiments. It also labels infra failures (container crashes) separately from benchmark failures (the prompt performed poorly), so infrastructure noise does not pollute the evaluation results.

● PR #68 added the prompt candidate loop: the agent automatically generates multiple system prompt variants and runs them in parallel in Harbor containers; each variant independently executes real Terminal Bench tasks and receives a 0/1 reward.

● PR #69 is the acceptance policy, which defines what counts as a better prompt. It is not enough for the agent to claim improvement; the benchmark score must actually increase before a prompt is accepted. An accepted prompt becomes the starting point for the next iteration, which generates new candidates from it.

● PR #70 hardened the entire evaluation flow: the prompt hash is written into each trajectory, and the controller checks that it matches the current round's prompt, ensuring the agent is actually running that prompt and not an older version. The agent itself is unaware it is being optimized; it simply completes its tasks as usual. Rewards come from an independent verifier; self-evaluation is not accepted.
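The bookkeeping from PRs #67 and #70 can be illustrated with a small append-only log. This is a hypothetical sketch, not code from the Maka repository: the record shape (`WalRecord`), the `kind` values, and the helper names are assumptions, and SHA-256 is an assumed choice of prompt hash. It shows three ideas from the post: infra failures are kept out of the benchmark results, a restarted controller skips tasks that already have a result, and a trajectory is rejected when its prompt hash does not match the prompt the controller intended to run.

```typescript
import { createHash } from "node:crypto";

// Hypothetical WAL record; Maka's real schema may differ.
type WalRecord =
  | { kind: "result"; promptHash: string; taskId: string; reward: 0 | 1 }
  | { kind: "infra_failure"; promptHash: string; taskId: string; error: string };

function hashPrompt(prompt: string): string {
  return createHash("sha256").update(prompt, "utf8").digest("hex");
}

/** Serialize one record as a JSON line; a real WAL would append it to disk and fsync. */
function toWalLine(record: WalRecord): string {
  return JSON.stringify(record) + "\n";
}

/** Replay the log into rewards per prompt hash and task, ignoring infra failures. */
function replay(walText: string): Map<string, Map<string, 0 | 1>> {
  const rewards = new Map<string, Map<string, 0 | 1>>();
  for (const line of walText.split("\n")) {
    if (line.trim() === "") continue;
    const record = JSON.parse(line) as WalRecord;
    if (record.kind !== "result") continue; // infra noise never counts against a prompt
    let perTask = rewards.get(record.promptHash);
    if (!perTask) {
      perTask = new Map();
      rewards.set(record.promptHash, perTask);
    }
    perTask.set(record.taskId, record.reward);
  }
  return rewards;
}

/** Tasks that still need a run for this prompt after a restart (infra failures get retried). */
function pendingTasks(walText: string, prompt: string, taskIds: string[]): string[] {
  const done = replay(walText).get(hashPrompt(prompt)) ?? new Map<string, 0 | 1>();
  return taskIds.filter((id) => !done.has(id));
}

/** Reject a trajectory that ran a different prompt than the controller expects. */
function verifyTrajectory(expectedPrompt: string, trajectoryPromptHash: string): boolean {
  return hashPrompt(expectedPrompt) === trajectoryPromptHash;
}
```

Because infra failures never enter the reward map, a crashed container leads to a retry rather than a 0 that would unfairly penalize the prompt.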

Maka-Agent/maka-agent

Source: https://github.com/Maka-Agent/maka-agent

Maka

Maka is a local-first desktop AI workbench. It integrates model connections, conversations, tool permissions, file reading/writing, terminal execution, search, bot gateways, and run recovery into a single Electron application, with the goal of letting users run an observable, controllable agent on their own machine whose runs persist and can be recovered.

This repository is under active development. This README serves two audiences:

First-time Maka users: learn why a model connection must be configured first, where data is stored, and which capabilities are already available.
Engineers continuing Maka development: get started quickly, run verification, and locate key packages and design documents.

What You Will See

Upon first launch of Maka, if no usable model connection exists, the first screen will guide you through AI configuration rather than presenting an empty chat box that cannot send messages. The recommended path is:

Open Settings → Models.
Select a model provider and enter an API key, or complete sign-in for an already integrated account.
Test the connection and choose a default model.
Return to the main screen and start your first conversation using the quick input.
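First-run guidance picks one of four states based on actual connection status ("configure missing settings / select default connection / select default model / start conversation", as listed under Current Capabilities). One hypothetical way to model that is a pure function from connection state to the next step. The type names, the `tested` flag, and the order of the checks are assumptions for illustration, not Maka's implementation.

```typescript
// Hypothetical model of the first-run guidance; names are illustrative only.
type FirstRunStep =
  | "configure-missing-settings"
  | "select-default-connection"
  | "select-default-model"
  | "start-conversation";

interface ConnectionInfo {
  id: string;
  tested: boolean; // assumption: a connection counts as usable once its test passed
  models: string[];
}

interface FirstRunState {
  connections: ConnectionInfo[];
  defaultConnectionId?: string;
  defaultModel?: string;
}

function nextFirstRunStep(state: FirstRunState): FirstRunStep {
  const usable = state.connections.filter((c) => c.tested);
  if (usable.length === 0) return "configure-missing-settings";

  const chosen = usable.find((c) => c.id === state.defaultConnectionId);
  if (!chosen) return "select-default-connection";

  if (!state.defaultModel || !chosen.models.includes(state.defaultModel)) {
    return "select-default-model";
  }
  return "start-conversation";
}
```

Deriving the step from current state, instead of persisting a wizard position, keeps the guidance correct if a connection is later deleted or fails its test.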

Integrated model types include:

Overseas APIs: Anthropic, OpenAI, Google Gemini.
Domestic APIs: DeepSeek, Moonshot, Z.AI Coding Plan, Kimi Coding Plan.
Local models: Ollama.
Custom gateways: OpenAI Compatible endpoint.
Account subscription entry points: Claude Subscription, Codex Subscription, Gemini CLI, and others are listed separately and labeled as experimental or available. Entry points not yet wired into the message-sending path are never presented as usable.

Current Capabilities

Maka is more than a simple chat demo; it already covers these core areas:

Desktop conversations: create, switch, archive, search, rename, stop, retry, regenerate, branch from turn.
Model runtime: based on the Vercel AI SDK provider runtime, supporting model streaming output, tool calls, usage recording, error classification, and startup recovery.
Local tools: Read, Write, Edit, Bash, Glob, Grep. File writes and command execution go through permission policies.
First-run guidance: displays different states based on real connection status — “configure missing settings / select default connection / select default model / start conversation”.
Settings center: models, accounts, usage statistics, daily review, local memory, voice models, open gateways, bot conversations, web search, network proxy, permissions & capabilities, health status, data & about.
Local memory: MEMORY.md management, manual addition, archive/restore, agent read toggle.
Web search: Tavily credential configuration, testing, and agent tool boundaries.
Bot gateways: configuration/testing/runtime status framework for Telegram, Feishu, WeCom, WeChat iLink, Discord, DingTalk, QQ.
Open gateways: a token-protected local HTTP/SSE API that lets external clients read session state, events, capabilities, and a health summary.
Office document workflow: when a local officecli is detected, Maka can read and validate Office documents; edits require authorization for each use.
Runtime kernel: AgentRun ledger, RuntimeEvent read model, ToolRuntime, ModelAdapter, RunTrace, and recovery logic.
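The open gateway streams events over SSE. A client could parse the `text/event-stream` body with a small parser like the one below, which follows the standard SSE framing: `event:`, `data:`, and `id:` fields, comment lines starting with `:`, and a blank line that ends each event. It is a generic sketch; Maka's gateway URL paths, event names, and how the token is sent are not covered here, so any of those would be assumptions.

```typescript
// Minimal text/event-stream parser following the standard SSE framing; illustrative only.
interface SseEvent {
  event: string; // defaults to "message" when no event: field is given
  data: string; // multiple data: lines are joined with "\n"
  id?: string; // last seen id, which persists across events per the SSE spec
}

function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  let eventType = "";
  let dataLines: string[] = [];
  let id: string | undefined;

  for (const line of text.split(/\r\n|\r|\n/)) {
    if (line === "") {
      // A blank line dispatches the pending event, if it carried any data.
      if (dataLines.length > 0) {
        events.push({ event: eventType || "message", data: dataLines.join("\n"), id });
      }
      eventType = "";
      dataLines = [];
      continue;
    }
    if (line.startsWith(":")) continue; // comment or keep-alive line

    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // one leading space is stripped

    if (field === "event") eventType = value;
    else if (field === "data") dataLines.push(value);
    else if (field === "id") id = value;
    // Other fields (e.g. retry) are ignored in this sketch.
  }
  return events;
}
```

In practice the raw text arrives incrementally from a streamed HTTP response; a production client would buffer partial lines across chunks before handing complete events to a parser like this.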

Local & Privacy Boundaries

By default, Maka places working data in the workspace directory under Electron userData:

/workspaces/default/
  llm-connections.json
  credentials.json
  settings.json
  sessions/

Important boundaries:

Provider connection metadata and session JSONL are stored on the local filesystem.
Sensitive values such as API keys, OAuth tokens, bot tokens, proxy passwords, gateway tokens, Tavily keys, etc., are encrypted via Electron safeStorage and written to credentials.json.
The renderer does not directly access plaintext keys; Settings only displays masked status and test results.
File reads/writes, shell commands, and dangerous operations require passing through the permission engine.
Incognito / privacy context, memory, voice, workspace instructions, and other capabilities are constrained by their own contract documents.
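The credential boundary can be sketched as a small store that encrypts each secret before it reaches `credentials.json` and only gives the renderer a masked view. Electron's `safeStorage` provides `isEncryptionAvailable()`, `encryptString(plainText)` (which returns a `Buffer`), and `decryptString(encrypted)`; the sketch depends only on that shape through a `SecretCipher` interface so it can run outside Electron. The file layout (base64 strings keyed by credential id), the masking rule, and the refusal to store plaintext when encryption is unavailable are assumptions, not Maka's actual format or behavior.

```typescript
// Hypothetical credential store; Maka's real credentials.json format may differ.
// Electron's `safeStorage` object satisfies this interface.
interface SecretCipher {
  isEncryptionAvailable(): boolean;
  encryptString(plainText: string): Buffer;
  decryptString(encrypted: Buffer): string;
}

type CredentialsFile = Record<string, string>; // credential id -> base64 ciphertext

function putSecret(file: CredentialsFile, cipher: SecretCipher, id: string, secret: string): CredentialsFile {
  if (!cipher.isEncryptionAvailable()) {
    // Sketch choice: never fall back to writing plaintext to disk.
    throw new Error(`OS-level encryption unavailable; not storing ${id}`);
  }
  return { ...file, [id]: cipher.encryptString(secret).toString("base64") };
}

/** Main-process only: decrypt a secret for use in a provider request. */
function getSecret(file: CredentialsFile, cipher: SecretCipher, id: string): string | undefined {
  const stored = file[id];
  return stored === undefined ? undefined : cipher.decryptString(Buffer.from(stored, "base64"));
}

/** What Settings may display: whether a key exists plus a masked tail, never the plaintext. */
function maskedStatus(file: CredentialsFile, cipher: SecretCipher, id: string): { configured: boolean; hint?: string } {
  const secret = getSecret(file, cipher, id);
  if (!secret) return { configured: false };
  return { configured: true, hint: "****" + secret.slice(-4) };
}
```

Keeping `getSecret` in the main process and sending only `maskedStatus` results over IPC matches the boundary described above: the renderer never handles plaintext keys.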

Quick Start

The repository uses npm workspaces. Although pnpm-workspace.yaml exists, current scripts and lockfile are based on npm.

npm install
npm run dev

npm run dev first builds all workspaces, then launches the Electron desktop app.

If ELECTRON_SKIP_BINARY_DOWNLOAD=1 was set during dependency installation, you need to install the Electron platform binary before launching:

node node_modules/electron/install.js

Common development commands:

npm run build
npm run typecheck
npm --workspace @maka/desktop run test
npm --workspace @maka/runtime run test
npm --workspace @maka/core run test

Desktop visual and real-window verification:

npm --workspace @maka/desktop run screenshots
npm --workspace @maka/desktop run screenshots:diff:stable
npm --workspace @maka/desktop run smoke:real-window

Basic pre-release checks:

npm run check:release

Optional Environment Variables

These variables only affect local development or specific capabilities:

Variable	Purpose
`ANTHROPIC_API_KEY`	Can be used to bootstrap an Anthropic connection on first launch.
`OPENAI_API_KEY`	Can be used to bootstrap an OpenAI connection on first launch.
`TAVILY_API_KEY` / `MAKA_TAVILY_API_KEY`	Source of Tavily credentials for web search.
`MAKA_RIVE_BIN` / `RIVE_BIN`	Specifies the `rive` CLI used for Rive workflows.
`MAKA_VISUAL_SMOKE_FIXTURE`	Enables a deterministic visual fixture; dev/test builds only.
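The Tavily key and the `rive` binary can each come from one of two variables. A small resolver for that lookup might look like the sketch below. The precedence (the `MAKA_`-prefixed name first) and treating blank values as unset are assumptions; the table above does not say which variable wins.

```typescript
// Hypothetical helper: return the first non-blank variable from an ordered list.
// The precedence order below is an assumption, not documented Maka behavior.
function resolveFirstEnv(
  env: Record<string, string | undefined>,
  names: readonly string[],
): { name: string; value: string } | undefined {
  for (const name of names) {
    const value = env[name]?.trim();
    if (value) return { name, value };
  }
  return undefined;
}

const TAVILY_KEY_VARS = ["MAKA_TAVILY_API_KEY", "TAVILY_API_KEY"] as const;
const RIVE_BIN_VARS = ["MAKA_RIVE_BIN", "RIVE_BIN"] as const;

// In the main process, e.g.: resolveFirstEnv(process.env, TAVILY_KEY_VARS)
```

Returning the variable name alongside the value lets a settings or health screen report where a credential came from without exposing the value itself.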

Project Structure

apps/desktop/
  src/main/          Electron main process, IPC, settings, OAuth, bot, gateway
  src/preload/       window.maka preload bridge
  src/renderer/      React desktop UI and Settings surfaces
packages/core/       Pure contracts: sessions, events, settings, permissions, model connections
packages/storage/    File-backed session, settings, connection, run-ledger stores
packages/runtime/    SessionManager, AgentRun, AI SDK runtime, tools, bots, telemetry
packages/ui/         Shared rendering components, markdown, artifacts, redaction helpers
docs/                Product, runtime, design-system, privacy and test-plan contracts
scripts/             Build hygiene, screenshot, smoke and release helpers

Runtime Architecture

The runtime has been refactored from a single large flow into a kernel with clearer boundaries:

SessionManager -> AgentRun -> AiSdkBackend -> ModelAdapter -> ToolRuntime -> RunTrace -> AgentRunStore

Key principles:

SessionManager remains the public runtime API exposed to desktop, bots, and gateways.
AgentRun records the durable facts of a single turn's run and handles its startup recovery.
ToolRuntime handles tool input validation, permissions, watchdog, abort, telemetry, artifact candidates, and error classification.
ModelAdapter isolates provider stream / error / usage normalization.
RunTrace is best-effort; a failed trace write must not affect user conversation.
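Two of these principles lend themselves to a short sketch: a tool call passes through input validation, a permission check, and a watchdog that aborts it after a timeout, and its outcome is classified; the trace write afterwards is wrapped so a failure cannot break the turn. This is a hypothetical TypeScript illustration. `ToolSpec`, `executeTool`, `runTool`, and the status names are invented for the example and are not the real `ToolRuntime` or `RunTrace` interfaces in `packages/runtime`.

```typescript
// Hypothetical sketch of a tool-call pipeline; not the real ToolRuntime/RunTrace API.
type ToolStatus = "ok" | "invalid_input" | "denied" | "timeout" | "error";
type ToolOutcome =
  | { status: "ok"; output: string }
  | { status: Exclude<ToolStatus, "ok">; message: string };

interface ToolSpec<I> {
  name: string;
  validate(input: unknown): input is I;
  run(input: I, signal: AbortSignal): Promise<string>;
}

type PermissionCheck = (toolName: string, input: unknown) => Promise<boolean>;
type TraceWriter = (entry: { tool: string; status: ToolStatus }) => Promise<void>;

async function executeTool<I>(
  tool: ToolSpec<I>,
  input: unknown,
  allow: PermissionCheck,
  timeoutMs: number,
): Promise<ToolOutcome> {
  if (!tool.validate(input)) return { status: "invalid_input", message: `bad input for ${tool.name}` };
  if (!(await allow(tool.name, input))) return { status: "denied", message: `${tool.name} not permitted` };

  // Watchdog: abort after timeoutMs, and stop waiting even if the tool ignores the signal.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const aborted = new Promise<never>((_, reject) => {
    controller.signal.addEventListener("abort", () => reject(new Error("aborted")), { once: true });
  });
  try {
    const output = await Promise.race([tool.run(input, controller.signal), aborted]);
    return { status: "ok", output };
  } catch (err) {
    if (controller.signal.aborted) return { status: "timeout", message: `${tool.name} exceeded ${timeoutMs} ms` };
    return { status: "error", message: err instanceof Error ? err.message : String(err) };
  } finally {
    clearTimeout(timer);
  }
}

async function runTool<I>(
  tool: ToolSpec<I>,
  input: unknown,
  allow: PermissionCheck,
  trace: TraceWriter,
  timeoutMs: number,
): Promise<ToolOutcome> {
  const outcome = await executeTool(tool, input, allow, timeoutMs);
  try {
    await trace({ tool: tool.name, status: outcome.status });
  } catch {
    // Best-effort trace: a failed write is ignored so the conversation is unaffected.
  }
  return outcome;
}
```

Returning a classified outcome instead of throwing lets the caller feed denials, timeouts, and tool errors back to the model as ordinary results.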

More details can be found in:

docs/runtime-kernel.md
docs/runtime-v2-architecture-evolution.md
docs/runtime-v2-implementation-notes.md

UI & Product Quality Contract

Maka’s UI is not built ad hoc; it already has a dedicated design system and test plans:

docs/design-system.md: color, density, states, motion, Settings IA, copy, and a11y contracts.
docs/ui-quality-plan.md: real window, visual screenshots, interaction states, regression verification strategy.
docs/full-product-test-plan.md: complete QA route from first run, settings, conversations, tools, search, bots, gateways to failure paths.

When changing UI, do not rely on TypeScript checks alone. At a minimum, pair them with:

node:test contract tests for the affected surface.
Passing check-console / check-a11y.
When necessary, additional visual fixtures or real-window smoke tests.

Pre-contribution Checks

For typical code changes, it’s recommended to run at least:

npm run typecheck --workspaces --if-present
npm run build
git diff --check

For changes involving the desktop renderer / Settings / IPC, also run the corresponding focused suite, e.g.:

npm --workspace @maka/desktop run test -- settings-form-a11y-contract visible-copy-hygiene-contract

For changes involving runtime / storage, also run the corresponding workspace tests:

npm --workspace @maka/runtime run test
npm --workspace @maka/storage run test

Similar Articles

@jakevin7: I increasingly feel that Maka is very suitable for learning Agent. For example, recently a Maka core dev raised an issue discussing DeepSeek's cache optimization. The whole process is transparent: 1 issue + 8 PRs pushed through, from usage normalization → …

@jakevin7: Open-sourced my previous agent: maka, still under intensive development: https://github.com/jackwener/maka-agent… With so many open-source agents already out there, why build another one? - First, maka is a...

@jakevin7: Maka has been sprinting hard in the past two days, and the most noteworthy thing is out. Autonomous Task Loop v1 is live. Previously, Maka would run an agent and be done. Now it's a persistent loop: preflight → runtime → SelfCheck…

@jakevin7: Today, OpenCLI's app was completely rebuilt with a component library, which took a lot of effort. I'm also planning to have the Agent fully refactor the UI of MakeAgent—this might be the final version of the UI. https://github.com/jackwener/maka-…

@sitinme: Saw Karpathy open-sourced a very interesting project autoresearch, which gives a real but small-scale LLM training task to an AI Agent, letting it do research, modify code, run experiments, look at results, and then decide whether to keep or discard the changes. The project is based on a single NVIDIA…