@akshay_pachaar: Hermes Mixture of Agents (MoA) explained. Every agent commits to a single model, and every model has blind spots the ot…

X AI KOLs Following Tools

Summary

Hermes Agent by Nous Research introduces Mixture of Agents (MoA), allowing users to define presets that combine multiple models for consultation and a final answer model, improving performance by covering blind spots. The feature integrates seamlessly into the existing agent loop, maintaining tools, memory, and context.

Hermes Mixture of Agents (MoA) explained. Every agent commits to a single model, and every model has blind spots the others would have caught. The usual workaround is to run the same prompt through a few models by hand and reconcile the answers. It works, but it lives outside the agent, so the tools, the memory, and the session are gone the moment that detour starts. Hermes Agent by Nous Research just shipped Mixture of Agents, which folds that whole process back inside the agent. The unit you work with is a preset. Think of it as a recipe that names a few models to consult and one model to write the final answer, saved under a label you can reuse. So a preset might list GPT-5.5 and DeepSeek as the models to consult, with Opus as the one that replies. You set it up once, give it a name, and pick it later like any other model. The models you consult run first and quietly hand their analysis to the one writing the answer. That final model is the one that actually replies and makes the tool calls, now informed by several perspectives instead of one. Here is the part that makes it click. The preset shows up as a model, not as a framework to wire together. So everything that already works in Hermes keeps working. Tool calls, follow-up iterations, memory, and the same session context behave exactly as they do with a single model, because to the agent loop it is a single model. The models can come from anywhere. One preset can mix OpenAI, Anthropic, DeepSeek, and Google, and it is not capped at two. A few things follow from that design. → It composes a model instead of choosing one. Several models covering each other's blind spots can beat the strongest one on its own. → It stays cheap to run. The models you consult see a stripped-down view of the conversation, so the extra calls stay light and the main context keeps its cache. → It reaches past any single frontier model. Combining the providers already on hand assembles a composite that can outscore the best one available alone. → It is a dial, not a default. It turns on for the hard ten percent of tasks where a second opinion matters, and stays off for routine work where speed wins. Nous reports the effect on its own benchmark. A preset running Opus-4.8 over a GPT-5.5 reference scored higher than either model alone, by roughly six points and eight to eleven percent. The lesson is not that one model has to win. It is that the best answer rarely comes from a single model, and the agent should make blending them as easy as picking one. That said, if you're looking to set up Hermes, I wrote a full deep dive covering the Hermes agent's architecture, memory system, self-evolving skills, GEPA optimization, and how to set up multiple specialized agents. The article is quoted below. You can also watch my YouTube crash course on the Hermes agent: https://youtube.com/watch?v=bNp6YcKBLgY…
Original Article
View Cached Full Text

Cached at: 06/28/26, 04:04 PM

Hermes Mixture of Agents (MoA) explained.

Every agent commits to a single model, and every model has blind spots the others would have caught.

The usual workaround is to run the same prompt through a few models by hand and reconcile the answers. It works, but it lives outside the agent, so the tools, the memory, and the session are gone the moment that detour starts.

Hermes Agent by Nous Research just shipped Mixture of Agents, which folds that whole process back inside the agent.

The unit you work with is a preset. Think of it as a recipe that names a few models to consult and one model to write the final answer, saved under a label you can reuse.

So a preset might list GPT-5.5 and DeepSeek as the models to consult, with Opus as the one that replies. You set it up once, give it a name, and pick it later like any other model.

The models you consult run first and quietly hand their analysis to the one writing the answer. That final model is the one that actually replies and makes the tool calls, now informed by several perspectives instead of one.

Here is the part that makes it click. The preset shows up as a model, not as a framework to wire together.

So everything that already works in Hermes keeps working. Tool calls, follow-up iterations, memory, and the same session context behave exactly as they do with a single model, because to the agent loop it is a single model.

The models can come from anywhere. One preset can mix OpenAI, Anthropic, DeepSeek, and Google, and it is not capped at two.

A few things follow from that design.

→ It composes a model instead of choosing one. Several models covering each other’s blind spots can beat the strongest one on its own.

→ It stays cheap to run. The models you consult see a stripped-down view of the conversation, so the extra calls stay light and the main context keeps its cache.

→ It reaches past any single frontier model. Combining the providers already on hand assembles a composite that can outscore the best one available alone.

→ It is a dial, not a default. It turns on for the hard ten percent of tasks where a second opinion matters, and stays off for routine work where speed wins.

Nous reports the effect on its own benchmark. A preset running Opus-4.8 over a GPT-5.5 reference scored higher than either model alone, by roughly six points and eight to eleven percent.

The lesson is not that one model has to win. It is that the best answer rarely comes from a single model, and the agent should make blending them as easy as picking one.

That said, if you’re looking to set up Hermes, I wrote a full deep dive covering the Hermes agent’s architecture, memory system, self-evolving skills, GEPA optimization, and how to set up multiple specialized agents.

The article is quoted below.

You can also watch my YouTube crash course on the Hermes agent: https://youtube.com/watch?v=bNp6YcKBLgY…


TL;DR: Hermes 智能代理通过三层记忆系统、自我进化技能和 GAPA 技术实现“越用越好”,本文覆盖从架构到实战构建三个 24x7 代理(程序员 Neo、设计师 Pixel、深度研究员)的全部内容。

Hermes 智能代理的核心架构

所有请求都流经一个统一的 AI agent 类(script run agent.py 的一部分),通过 CLI、Telegram、批处理或 ID 进入。终端无关性(platform-agnostic)由此实现。转换层支持几乎所有模型(GPT、Gemini、本地 Ollama),通过三种 API 格式之一路由。每个任务有 90 轮硬上限,子代理共享同一预算,避免无限循环消耗 API 积分。内部运行“思考-行动-观察”的 ReAct 循环。

身份层:soul.md

soul.md 位于根 Hermes 文件夹,定义代理的角色与个性。系统提示槽位按顺序是:

  1. soul.md(身份,固定框架)
  2. 记忆
  3. 技能文件
  4. 对话历史

memory.mduser.md 作为快照被纳入系统提示(每轮均在上下文中)。soul.md 是一次性写入、随时间调整的固定框架,代理之后的所有行为都通过这个个性视角发生。

三层记忆系统

第一层:始终在上下文的小型备忘录

  • memory.md:保持在 2200 字符,存储代理关于环境、项目约定、工具、艰难学到的笔记。
  • user.md:保持在 1375 字符,存储用户个人资料(名字、沟通偏好、技能水平、想避免的事情)。

两个文件在会话开始时作为冻结快照纳入系统根目录,每轮都在上下文中。

第二层:按需搜索的 SQLite 数据库

所有对话(CLI、Telegram 等)存储到启用全文搜索的 SQLite 数据库中。可以搜索数周前的聊天记录,但需要显式搜索调用 + LLM 总结。

第三层:即插即用的外部记忆提供商

支持知识图谱、时序知识图谱等外部记忆源。集成方式可参考文档。

核心规则:关键事实存在于第一层,其他一切可搜索,所有会话存储在 SQLite 中;需要更深持久化时连接外部提供商。

技能与自我进化机制

技能是一个 markdown 文件,以 YAML 前置元数据开头,包含名称和描述。采用 渐进式技能披露 机制:

  • 零级:加载所有可用技能的 YAML 前置元数据(极小 token 消耗)。
  • 一级:代理根据描述选择合适技能,然后逐步披露技能的步骤、陷阱、验证等完整内容。
  • 二级:仅在技能引用其他内容时触发。

这样避免将所有技能全部加载到上下文,节省 token。代理可以 自我进化技能,即根据经验不断优化技能文件,这是 Hermes 区别于其他开源代理的关键特性。GAPA 技术(无需改变权重即可改进提示词)被 ICLR 2026 接收,进一步推动技能进化。

实战:构建三个 24x7 工作代理

1. Neo:云端程序员

配置:让其实时访问云端代码(如 GitHub 仓库)。当委托项目时,先创建计划、问几个问题确定规格,然后开始处理。演示中,Neo 对用户进行深度研究(职业、平台、公司等),然后构建了一个个人登录页面(index.html),包含写作、课程、GitHub 等标签链接。

2. Pixel:品牌设计师

通过自定义技能理解用户的设计风格(背景、插图风格、图标等)。只需给几个示例,它就能学会并坚持一致的品牌设计。演示中,Pixel 创建了一张解释 LLM 推理中“预填充阶段 vs 解码阶段”的手绘示意图,风格统一。

3. 深度研究员

扫描最新 GitHub 仓库、论文、AI/ML 趋势新闻,提供汇总信息。

快速上手建议

如果时间紧张,可直接跳到“入门”章节,命令可独立运行。但理解理论(技能进化、记忆组成、GAPA 何时发挥作用)能区分“把 Hermes 当作带节点的聊天工具”和“将其用作随时间累积价值的系统”。


Source: YouTube 视频链接

Similar Articles

NousResearch/hermes-agent

GitHub Trending (daily)

Hermes Agent is an open-source, self-improving AI agent framework by Nous Research featuring a closed learning loop, cross-platform deployment, and compatibility with hundreds of LLMs. It provides a terminal interface, persistent memory, automated scheduling, and research-ready tooling for scaling AI workflows.