@jakevin7: Sharing something interesting Maka is currently working on: letting agents automatically optimize their own system prompt, fully closed-loop, without any human intervention. Karpathy's autoresearch, AEGIS, etc. have explored similar directions—a goal-driven self-reinforcement learning system.
Summary
Maka is a local-first desktop AI workbench. Its new feature lets agents optimize their own system prompts automatically: it generates prompt variants, evaluates them in Harbor containers, and applies an acceptance policy to improve iteratively, all without human intervention.
Cached at: 06/20/26, 04:18 PM
Let me share something interesting that Maka is currently working on: enabling agents to automatically optimize their own system prompts in a fully closed loop, without any human intervention. Similar directions have been explored by Karpathy’s autoresearch, AEGIS, etc. — a goal-oriented self-reinforcing learning system. https://github.com/Maka-Agent/maka-agent…
The full pipeline: generate N system prompt variants → evaluate in Harbor containers → WAL logging → acceptance policy filtering → update prompt → continue iterating, until manually stopped. This turns writing system prompts into an optimization problem with benchmark feedback signals.
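The loop above can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not Maka's implementation: the names `optimizePrompt`, `LoopDeps`, `generateVariants`, and `evaluate` are hypothetical, evaluation is reduced to a synchronous function returning a score in [0, 1] (standing in for runs in Harbor containers), and the loop runs a fixed number of rounds rather than until manually stopped.

```typescript
// Hypothetical types: none of these names come from the Maka codebase.
interface Candidate {
  prompt: string;
  score: number;
}

interface LoopDeps {
  // Produce n variants of the current prompt (in Maka, the agent does this).
  generateVariants: (current: string, n: number) => string[];
  // Fraction of benchmark tasks passed, in [0, 1]. Assumed stand-in for
  // container-based evaluation.
  evaluate: (prompt: string) => number;
}

function optimizePrompt(
  initialPrompt: string,
  deps: LoopDeps,
  variantsPerRound: number,
  rounds: number
): { best: Candidate; log: string[] } {
  let best: Candidate = {
    prompt: initialPrompt,
    score: deps.evaluate(initialPrompt),
  };
  const log: string[] = [];

  for (let round = 0; round < rounds; round++) {
    // Variants are always derived from the most recently accepted prompt.
    const variants = deps.generateVariants(best.prompt, variantsPerRound);
    for (const prompt of variants) {
      const score = deps.evaluate(prompt);
      // Append-only log: one JSON line per evaluated candidate.
      log.push(JSON.stringify({ round, prompt, score }));
      // Acceptance policy (assumed): accept only a strict improvement.
      if (score > best.score) {
        best = { prompt, score };
      }
    }
  }
  return { best, log };
}
```

Requiring a strict improvement is only one possible acceptance rule; a noisy benchmark might call for a margin or repeated runs before a candidate replaces the incumbent.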
Four consecutive PRs combined form the complete system:
● PR #67 built the foundation: a fixed-prompt WAL controller. All experimental results from prompt candidates are recorded in an append-only log, so an interrupted run can resume where it left off without repeating experiments. It also marks infra failures (container crashes) separately from benchmark failures (poor prompt performance), so that infrastructure noise does not pollute evaluation results.
● PR #68 added the prompt candidate loop: the agent automatically generates multiple system prompt variants, runs them in parallel within Harbor containers, and each variant independently executes real tasks on Terminal Bench to obtain a reward of 0/1.
● PR #69 is the acceptance policy — defining what constitutes a better prompt. It’s not enough for the agent to claim improvement; the benchmark score must actually increase for acceptance. An accepted prompt becomes the starting point for the next iteration, continuing to generate new candidates.
● PR #70 hardened the entire evaluation flow: the prompt hash is written into each trajectory, and the controller verifies that the hash matches, ensuring the agent is actually running the current prompt and not an old version. The agent itself is unaware it’s being optimized; it simply completes its tasks normally. Rewards come from an independent verifier, not self-evaluation.
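The bookkeeping described in these PRs might look roughly like the following. This is a sketch under assumptions, not Maka's schema: the record shape, the field names, and the functions `hashPrompt`, `isScorable`, `scoreCandidate`, and `pendingCandidates` are hypothetical, the djb2 hash is a tiny stand-in for a real digest such as SHA-256, and excluding hash-mismatched runs from scoring is an assumed policy.

```typescript
// Hypothetical record shape; field names are illustrative only.
type Outcome = "passed" | "benchmark_failure" | "infra_failure";

interface WalRecord {
  candidateId: string;
  promptHash: string; // hash of the prompt the controller intended to run
  trajectoryHash: string; // hash written into the trajectory by the run
  outcome: Outcome;
}

// djb2 string hash: a small stand-in for a cryptographic digest.
function hashPrompt(prompt: string): string {
  let hash = 5381;
  for (let i = 0; i < prompt.length; i++) {
    hash = ((hash << 5) + hash + prompt.charCodeAt(i)) | 0;
  }
  return (hash >>> 0).toString(16);
}

// A record counts toward the score only if the run used the expected prompt
// and did not fail for infrastructure reasons.
function isScorable(record: WalRecord): boolean {
  return (
    record.outcome !== "infra_failure" &&
    record.promptHash === record.trajectoryHash
  );
}

// Mean 0/1 reward over scorable records; null when nothing valid exists yet.
function scoreCandidate(records: WalRecord[], candidateId: string): number | null {
  const valid = records.filter(
    (r) => r.candidateId === candidateId && isScorable(r)
  );
  if (valid.length === 0) return null;
  const passed = valid.filter((r) => r.outcome === "passed").length;
  return passed / valid.length;
}

// Resumption: candidates that already have a scorable record are skipped.
function pendingCandidates(records: WalRecord[], allIds: string[]): string[] {
  return allIds.filter(
    (id) => !records.some((r) => r.candidateId === id && isScorable(r))
  );
}
```

Because infra failures never enter the score, a crashed container leaves its candidate in the pending set, so it would be retried on resume instead of being counted as a poor prompt.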
Maka-Agent/maka-agent
Source: https://github.com/Maka-Agent/maka-agent
Maka
Maka is a local-first desktop AI workbench. It integrates model connections, conversations, tool permissions, file reading/writing, terminal execution, search, bot gateways, and run recovery into a single Electron application, with the goal of letting users run an observable, controllable, and persistently recoverable agent on their own machine.
This repository is under active development. This README serves two audiences:
- First-time Maka users: understand why AI must be configured first, where data is stored, and which capabilities are already available.
- Engineers continuing Maka development: quickly start, verify, and locate key packages and design documents.
What You Will See
Upon first launch of Maka, if no usable model connection exists, the first screen will guide you through AI configuration rather than presenting an empty chat box that cannot send messages. The recommended path is:
- Open Settings → Models.
- Select a real model provider, fill in an API key, or complete login for an already integrated account.
- Test the connection and choose a default model.
- Return to the main screen and start your first conversation using the quick input.
Integrated model types include:
- Overseas APIs: Anthropic, OpenAI, Google Gemini.
- Domestic APIs: DeepSeek, Moonshot, Z.AI Coding Plan, Kimi Coding Plan.
- Local models: Ollama.
- Custom gateways: OpenAI Compatible endpoint.
- Account subscription entry points: Claude Subscription, Codex Subscription, Gemini CLI, etc., are listed separately and labeled as experimental or available. Entry points that are not yet wired into the message-sending path are not presented as usable.
Current Capabilities
Maka is more than a simple chat demo; it already provides these core capabilities:
- Desktop conversations: create, switch, archive, search, rename, stop, retry, regenerate, branch from turn.
- Model runtime: based on the Vercel AI SDK provider runtime, supporting model streaming output, tool calls, usage recording, error classification, and startup recovery.
- Local tools: Read, Write, Edit, Bash, Glob, Grep. File writes and command execution go through permission policies.
- First-run guidance: displays a different state depending on the real connection status (“configure missing settings / select default connection / select default model / start conversation”).
- Settings center: models, accounts, usage statistics, daily review, local memory, voice models, open gateways, bot conversations, web search, network proxy, permissions & capabilities, health status, data & about.
- Local memory: MEMORY.md management, manual addition, archive/restore, agent read toggle.
- Web search: Tavily credential configuration, testing, and agent tool boundaries.
- Bot gateways: configuration/testing/runtime status framework for Telegram, Feishu, WeCom, WeChat iLink, Discord, DingTalk, QQ.
- Open gateways: local HTTP/SSE API, protected by token, allowing external reading of session state, events, capabilities, and health summary.
- Office document workflow: detection via local officecli enables reading, validation, and per-use authorization for editing.
- Runtime kernel: AgentRun ledger, RuntimeEvent read model, ToolRuntime, ModelAdapter, RunTrace, and recovery logic.
Local & Privacy Boundaries
By default, Maka places working data in the workspace directory under Electron userData:
/workspaces/default/
llm-connections.json
credentials.json
settings.json
sessions/
Important boundaries:
- Provider connection metadata and session JSONL are stored on the local filesystem.
- Sensitive values such as API keys, OAuth tokens, bot tokens, proxy passwords, gateway tokens, Tavily keys, etc., are encrypted via Electron safeStorage and written to credentials.json.
- The renderer does not directly access plaintext keys; Settings only displays masked status and test results.
- File reads/writes, shell commands, and dangerous operations require passing through the permission engine.
- Incognito / privacy context, memory, voice, workspace instructions, and other capabilities have separate contract documents with constraints.
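The credential boundary in the list above can be illustrated with a small sketch. Everything named here is an assumption for illustration: `StringCipher`, `CredentialFile`, `storeSecret`, `maskSecret`, and `describeForRenderer` are hypothetical, the on-disk shape of credentials.json is not documented in this README, and showing the last four characters as a hint is just one possible masking choice. In an Electron main process, the cipher could wrap `safeStorage.encryptString` and `safeStorage.decryptString` with base64 encoding.

```typescript
// Hypothetical interface, injected so the logic can run outside Electron.
interface StringCipher {
  isEncryptionAvailable(): boolean;
  encryptToBase64(plainText: string): string;
  decryptFromBase64(cipherText: string): string;
}

// Assumed on-disk shape: credential name -> encrypted, encoded value.
type CredentialFile = { [name: string]: string };

// What the renderer is allowed to see: status and a masked hint only.
interface RendererView {
  configured: boolean;
  masked: string;
}

function storeSecret(
  file: CredentialFile,
  cipher: StringCipher,
  name: string,
  secret: string
): CredentialFile {
  if (!cipher.isEncryptionAvailable()) {
    // Refuse to fall back to plaintext storage.
    throw new Error("encryption unavailable; refusing to store plaintext");
  }
  return { ...file, [name]: cipher.encryptToBase64(secret) };
}

// Never reveals more than the last four characters (assumed policy).
function maskSecret(secret: string): string {
  return secret.length <= 4 ? "****" : "****" + secret.slice(-4);
}

// Runs in the main process; only the masked view crosses to the renderer.
function describeForRenderer(
  file: CredentialFile,
  cipher: StringCipher,
  name: string
): RendererView {
  const stored = file[name];
  if (stored === undefined) {
    return { configured: false, masked: "" };
  }
  return {
    configured: true,
    masked: maskSecret(cipher.decryptFromBase64(stored)),
  };
}
```

Keeping decryption inside `describeForRenderer` means the plaintext never has to be serialized across the process boundary; the renderer receives only the `RendererView` object.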
Quick Start
The repository uses npm workspaces. Although pnpm-workspace.yaml exists, current scripts and lockfile are based on npm.
npm install
npm run dev
npm run dev first builds all workspaces, then launches the Electron desktop app.
If ELECTRON_SKIP_BINARY_DOWNLOAD=1 was set during dependency installation, you need to install the Electron platform binary before launching:
node node_modules/electron/install.js
Common development commands:
npm run build
npm run typecheck
npm --workspace @maka/desktop run test
npm --workspace @maka/runtime run test
npm --workspace @maka/core run test
Desktop visual and real window verification:
npm --workspace @maka/desktop run screenshots
npm --workspace @maka/desktop run screenshots:diff:stable
npm --workspace @maka/desktop run smoke:real-window
Pre-release basic checks:
npm run check:release
Optional Environment Variables
These variables only affect local development or specific capabilities:
| Variable | Purpose |
|---|---|
| ANTHROPIC_API_KEY | Can be used to bootstrap an Anthropic connection on first launch. |
| OPENAI_API_KEY | Can be used to bootstrap an OpenAI connection on first launch. |
| TAVILY_API_KEY / MAKA_TAVILY_API_KEY | Source of Tavily credentials for web search. |
| MAKA_RIVE_BIN / RIVE_BIN | Specifies the rive CLI used for Rive workflows. |
| MAKA_VISUAL_SMOKE_FIXTURE | Enables deterministic visual fixture; only in dev/test builds. |
Project Structure
apps/desktop/
src/main/ Electron main process, IPC, settings, OAuth, bot, gateway
src/preload/ window.maka preload bridge
src/renderer/ React desktop UI and Settings surfaces
packages/core/ Pure contracts: sessions, events, settings, permissions, model connections
packages/storage/ File-backed session, settings, connection, run-ledger stores
packages/runtime/ SessionManager, AgentRun, AI SDK runtime, tools, bots, telemetry
packages/ui/ Shared rendering components, markdown, artifacts, redaction helpers
docs/ Product, runtime, design-system, privacy and test-plan contracts
scripts/ Build hygiene, screenshot, smoke and release helpers
Runtime Architecture
The runtime has been refactored from a single large flow into clearer kernel boundaries:
SessionManager -> AgentRun -> AiSdkBackend -> ModelAdapter -> ToolRuntime -> RunTrace -> AgentRunStore
Key principles:
- SessionManager remains the public runtime API exposed to desktop, bots, and gateways.
- AgentRun handles the durable run facts and startup recovery for a single turn.
- ToolRuntime handles tool input validation, permissions, watchdog, abort, telemetry, artifact candidates, and error classification.
- ModelAdapter isolates provider stream / error / usage normalization.
- RunTrace is best-effort; a failed trace write must not affect user conversation.
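The best-effort principle for RunTrace can be shown with a short sketch. The names `TraceSink`, `BestEffortTrace`, and `runTurn` are hypothetical and do not reflect Maka's actual classes; the model call is replaced by a placeholder string so the example stays self-contained.

```typescript
// Hypothetical sink: anything that can persist a trace event and may fail.
interface TraceSink {
  write(event: string): void;
}

class BestEffortTrace {
  private sink: TraceSink;
  private dropped = 0;

  constructor(sink: TraceSink) {
    this.sink = sink;
  }

  // Swallow write failures: tracing must never break the turn it observes.
  record(event: string): void {
    try {
      this.sink.write(event);
    } catch {
      this.dropped++;
    }
  }

  // Number of events lost to sink failures, useful for health reporting.
  droppedCount(): number {
    return this.dropped;
  }
}

// A stand-in for one conversation turn; the reply is a placeholder,
// not a real model call.
function runTurn(userMessage: string, trace: BestEffortTrace): string {
  trace.record("turn:start");
  const reply = "echo: " + userMessage;
  trace.record("turn:end");
  return reply;
}
```

Counting dropped events rather than silently discarding them keeps the failure observable (for example on a health page) without letting it propagate into the conversation.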
More details can be found in:
- docs/runtime-kernel.md
- docs/runtime-v2-architecture-evolution.md
- docs/runtime-v2-implementation-notes.md
UI & Product Quality Contract
Maka’s UI is not built ad hoc; it is governed by a dedicated design system and test plans:
- docs/design-system.md: color, density, states, motion, Settings IA, copy, and a11y contracts.
- docs/ui-quality-plan.md: real window, visual screenshots, interaction states, regression verification strategy.
- docs/full-product-test-plan.md: complete QA route from first run, settings, conversations, tools, search, bots, gateways to failure paths.
When changing UI, do not rely on the TypeScript typecheck alone. At a minimum, pair it with:
- node:test contracts for the corresponding surface.
- Passing check-console / check-a11y.
- When necessary, supplementary visual fixtures or real-window smoke tests.
Pre-contribution Checks
For typical code changes, it’s recommended to run at least:
npm run typecheck --workspaces --if-present
npm run build
git diff --check
For changes involving the desktop renderer / Settings / IPC, also run the corresponding focused suite, e.g.:
npm --workspace @maka/desktop run test -- settings-form-a11y-contract visible-copy-hygiene-contract
For changes involving runtime / storage, also run the corresponding workspace tests:
npm --workspace @maka/runtime run test
npm --workspace @maka/storage run test
Related Documentation
- CHANGELOG.md: summary of unpublished changes.
- SECURITY.md: security boundaries and reporting methods.
- docs/workspace-privacy-context.md: workspace privacy context.
- docs/search-service-threat-model.md: search service threat model.
- docs/memory-threat-model.md: local memory threat model.
- docs/voice-threat-model.md: voice capability boundaries.
- docs/maka-capability-audit-v1.md: capability maturity audit and subsequent roadmap.