@Voxyz_ai: https://x.com/Voxyz_ai/status/2062246736257556654
Summary
This article details how to structure multi-agent AI teams for investment research, using open-source projects like TradingAgents and the Bloome platform. It emphasizes that the key to effective agent collaboration is the organizational architecture, not the model intelligence.
Cached at: 06/04/26, 03:59 AM
I Cloned Buffett and Graham with AI and Had Them Team Up to Automate My Investment Research
I’ve been running multi-agent teams since February. Writing content, shipping code, doing research. The question I get most is: how do you get them to actually work together?
The answer has nothing to do with the model. The biggest mistake I made these past few months was assuming a smarter model would produce better results. What actually makes the difference is how you seat them, split the work, and set up opposition. The upgrade is the org chart.
I recently came across two open-source projects, TradingAgents (81k stars) and AI Hedge Fund (59k stars). On the surface they’re both investment research frameworks, but the more I looked, the more interesting they got: they laid out the exact teamwork structure I’d been figuring out on my own. Analysts collect information in parallel, bull/bear researchers debate, a trader synthesizes, a risk team pokes holes, and a portfolio manager gives final approval. Both projects are upfront about it: this is an engineering pattern for research purposes, not investment advice.
Investment research is the shell. What’s worth studying is the agent team architecture underneath.
This article breaks down that structure, with a team-building template you can take and use directly. What I care about is how an agent team divides labor, checks each other, and leaves a reviewable trail. Investment research just happens to be the noisiest, highest-stakes scenario where collaboration quality shows up fastest.
I built the whole thing on Bloome. Easiest way to picture it: a group chat where some of the members are AI agents instead of people. You add them like contacts, give each one a role, and they talk to each other and to you in the same thread.
A Usable Agent Team Has to Pass Five Gates
After months of building agent teams across different domains, I’ve boiled it down to five things:
Agent Team: Five-Gate Checklist
1. Intake: Clarify the bad question first. No clarity, no work.
2. Specialist: Each role keeps only one judgment angle.
3. Adversary: Force counter-arguments. Don’t let the team echo-chamber itself.
4. Lead: Only assigns work, synthesizes, and keeps records. No opinions.
5. Memo: Output a reviewable record, not a one-line oracle.
Without Intake, the team earnestly answers the wrong question. Without Specialist, perspectives blur into mush. Without Adversary, everyone hypes each other into a wreck. Without Lead, you’re left with a pile of disconnected fragments. Without Memo, you make the same mistake next time.
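To make the five gates concrete, here is a minimal sketch of them as a single pipeline. It is an illustration under stated assumptions, not how TradingAgents or Bloome implement anything: `Team`, `run_gates`, and the idea that an agent is just a function from a prompt to text are all hypothetical names and simplifications of my own.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in: an "agent" is any function from prompt text to reply text.
Agent = Callable[[str], str]


@dataclass
class Team:
    specialists: dict[str, Agent]
    adversaries: dict[str, Agent]
    # The scope fields the Lead insists on before work starts (assumed names).
    required_scope: tuple[str, ...] = ("horizon", "risk", "universe", "goal", "constraint")


def run_gates(team: Team, question: str, scope: dict[str, str]) -> dict:
    # Gate 1, Intake: refuse to start until the question is scoped.
    missing = [key for key in team.required_scope if not scope.get(key)]
    if missing:
        return {"status": "needs_clarification", "missing": missing}

    # Gate 2, Specialist: each role answers from its single angle.
    views = {name: agent(question) for name, agent in team.specialists.items()}

    # Gate 3, Adversary: each adversary reads the specialist views and attacks them.
    briefing = "\n".join(f"{name}: {view}" for name, view in views.items())
    objections = {name: agent(briefing) for name, agent in team.adversaries.items()}

    # Gate 4, Lead: collect everything without adding an opinion.
    # Gate 5, Memo: return a reviewable record, not a verdict.
    return {
        "status": "memo",
        "question": question,
        "scope": scope,
        "views": views,
        "objections": objections,
    }
```

The point of the sketch is the ordering: nothing downstream runs until Intake passes, and the return value is a record of who said what rather than a single answer.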
TradingAgents took off because it broke a high-stakes judgment call into exactly this kind of organizational pipeline. Every step documented, every step traceable. It has nothing to do with “AI trading stocks.”
These Five Gates Need a Room
These five gates are hard to run in a single chat window.
Sure, you can write one prompt: “Please play Buffett, Graham, the bull, the bear, and the Lead.” But that’s still one model doing five voices in one breath. They don’t have real seats. They can’t really see each other. The most common result: the voices sound different, but the final judgment is still a mush.
I needed a place where multiple agents could actually see each other. That’s exactly what Bloome gives you: humans and agents in the same group chat, each agent with its own role card and boundaries. The Lead is a real member in the group with its own context, not a paragraph of instructions buried in a prompt. The bull and the bear are two independent agents that can push back face-to-face, not two paragraphs inside the same response.
You @ everyone with a question, and they break it apart, push back, and debate in the same thread. Then the Lead collects it into a memo. It’s like pulling a few colleagues into a group chat for a meeting, except these colleagues won’t compromise out of politeness. They will, however, step on each other’s toes if you haven’t written clear role boundaries.
Bloome is a supporting character in this article. What it does is simple: it turns those five gates into a visible process.
How the gates map onto a group chat
Bloome group chat = the meeting room
Agent members = judgment positions
Role Card = each person’s job description
@everyone question = Intake entry point
Bull / Bear dialog = anti-confirmation-bias mechanism
Lead’s final summary = Decision Memo
Building the team itself isn’t hard. The hard part is writing clear boundaries for each agent: what it’s responsible for, what it can’t touch, and when it must hand off to the Lead.
Below, I’ll run through all five gates using investment research as the scenario.
Gate 1: The Lead Rejects Bad Questions First
I set up the investment research team on Bloome, and the first thing I did was @ everyone: 10 million into EV, what happens?
I waited for someone to throw a stock at me. Nobody did.
The Lead held the question and pushed back:
Lead Intake Gate
For any research question, the Lead asks five things first:
1. Horizon: How long are you looking?
2. Risk: Maximum drawdown you can stomach?
3. Universe: Which market? A-shares, Hong Kong, US?
4. Goal: Learning, research, simulation, or preparing a real decision?
5. Constraint: Anything off limits? Leverage, options, single stocks?
If these five aren’t clear, analysis doesn’t start.
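If you wanted to enforce this gate in code rather than in a prompt, a strict version might look like the sketch below. The dictionary keys and the `intake` function are hypothetical names of mine, and this version is deliberately stricter than a conversational Lead: it blocks until every one of the five fields has an answer.

```python
# The five intake questions, keyed by short field names (assumed, not from any library).
INTAKE_QUESTIONS = {
    "horizon": "How long are you looking?",
    "risk": "Maximum drawdown you can stomach?",
    "universe": "Which market?",
    "goal": "Learning, research, simulation, or preparing a real decision?",
    "constraint": "Anything off limits?",
}


def intake(answers: dict[str, str]) -> list[str]:
    """Return the questions still unanswered. An empty list means analysis may start."""
    return [
        question
        for key, question in INTAKE_QUESTIONS.items()
        if not answers.get(key, "").strip()
    ]


# Usage: a partially scoped request gets questions back instead of analysis.
pending = intake({"horizon": "three to five years", "universe": "US"})
for question in pending:
    print("Lead asks:", question)
```

Returning the open questions, instead of a bare true/false, gives the Lead something to send straight back into the chat.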
I threw a bad question. The Lead not rushing to answer was the first sign this system worked. A good Lead reshapes the question into something answerable before showing off.
I replied: USD, three to five years. Only then did it greenlight and start assigning work.
Gate 2: Roles Aren’t Personas. They’re Filters.
I staffed the team with AI Buffett and AI Graham. But cloning investment legends is where most people go wrong.
The bad approach: “You are Buffett. Please analyze this stock in Buffett’s voice.”
That just gives you a cosplay bot. The right approach is translating the master into a set of judgment filters:
AI Buffett Role Card
Job: Long-term quality and business durability only. No short-term price targets.
Input: Moat, cash flow quality, management, capital allocation, long-term industry structure
Output:
- Will this company still exist in 10 years
- Where the moat comes from
- Whether cash flow can survive cycles
- Which assumption, if wrong, kills the long-term thesis
Forbidden: Cannot change long-term judgment based on short-term price moves
Stop: If the business itself is incomprehensible, must say “outside circle of competence”
AI Graham Role Card
Job: Margin of safety and downside protection only. No long-term narratives.
Input: Valuation, balance sheet, cash flow, historical valuation ranges, worst-case scenarios
Output:
- Whether the current price has a margin of safety
- How much you could lose in the worst case
- What price makes it start to look attractive
- What data is still needed
Forbidden: Cannot use “this is a great company” as a substitute for valuation
Stop: If data is insufficient to calculate margin of safety, must say “not enough to calculate”
What makes a master agent valuable is the stable filter behind it. Whether it sounds like the real person barely matters. The AI Hedge Fund project does the same thing: Graham is defined as “only buys hidden gems with margin of safety,” Buffett as “looks for wonderful companies at fair prices.” Master personas get translated into executable investment filters.
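One way to keep role cards from drifting back into cosplay is to store them as structured data and generate the prompt from the fields. The sketch below assumes a `RoleCard` class and a `to_prompt` method that I made up for illustration; neither Bloome nor AI Hedge Fund is claimed to work this way. The field values are taken from the AI Graham card above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleCard:
    """A judgment filter: what the role looks at, what it returns, where it must stop."""
    name: str
    job: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    forbidden: str
    stop: str

    def to_prompt(self) -> str:
        # Render the card in a fixed order so every role reads the same way.
        lines = [
            f"Role: {self.name}",
            f"Job: {self.job}",
            "Input: " + ", ".join(self.inputs),
            "Output:",
            *[f"- {item}" for item in self.outputs],
            f"Forbidden: {self.forbidden}",
            f"Stop: {self.stop}",
        ]
        return "\n".join(lines)


GRAHAM = RoleCard(
    name="AI Graham",
    job="Margin of safety and downside protection only. No long-term narratives.",
    inputs=("Valuation", "balance sheet", "cash flow",
            "historical valuation ranges", "worst-case scenarios"),
    outputs=(
        "Whether the current price has a margin of safety",
        "How much you could lose in the worst case",
        "What price makes it start to look attractive",
        "What data is still needed",
    ),
    forbidden="Cannot use 'this is a great company' as a substitute for valuation",
    stop="If data is insufficient to calculate margin of safety, say 'not enough to calculate'",
)

print(GRAHAM.to_prompt())
```

Because the card is frozen and has no "voice" field at all, the only thing you can tune is the filter itself.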
Why these two specifically? Because Buffett and Graham naturally disagree. One always looks at the world ten years out. The other only cares about today’s safety cushion. You don’t need to engineer conflict. Their investment philosophies are inherently opposed. The key to casting: the tension between roles should be built-in, not forced.
Cloning a role on Bloome takes seconds. The whole team is up in minutes.
The team I built: Finance Lead, Buffett, Graham, The Bull, The Bear.
Gate 3: Opposing Sides Aren’t Theater. They’re Anti-Confirmation Bias.
A single AI’s biggest flaw is that it wants to please you too much. Whatever you say, it agrees. Swapping models won’t fix this. You have to fix it in the architecture.
Beyond Buffett and Graham, I added a die-hard bull and a die-hard bear to the team. Their job isn’t to offer opinions. It’s to push both sides to the extreme:
Bull / Bear Debate Protocol
Round 1:
- Bull writes only the strongest bull case. No risks mentioned.
- Bear writes only the strongest bear case. No hedging.
Round 2:
- Bull must respond to Bear’s three strongest attacks.
- Bear must identify Bull’s three key assumptions.
Round 3: Both sides write:
- If I’m wrong, what’s the most likely reason
- What evidence would change my mind
- What data is most worth investigating next
Lead collects only three things:
- Facts both sides acknowledge
- Assumptions they genuinely disagree on
- Next steps that need verification
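The protocol above is mostly about who sees what, and when. Here is a rough sketch of that ordering, assuming each side is a plain function from prompt to text. `Agent` and `debate` are hypothetical names, the prompt wording is mine, and the Lead’s collection step is left out because it needs judgment rather than plumbing.

```python
from typing import Callable

# Hypothetical stand-in: prompt text in, reply text out.
Agent = Callable[[str], str]


def debate(bull: Agent, bear: Agent, topic: str) -> list[dict[str, str]]:
    """Run the three rounds and return a transcript for the Lead to summarize."""
    transcript: list[dict[str, str]] = []

    # Round 1: each side argues alone and does not see the other.
    r1 = {
        "bull": bull(f"Write only the strongest bull case for {topic}. No risks."),
        "bear": bear(f"Write only the strongest bear case for {topic}. No hedging."),
    }
    transcript.append(r1)

    # Round 2: each side receives the other side's round-1 text.
    r2 = {
        "bull": bull("Respond to the three strongest attacks in:\n" + r1["bear"]),
        "bear": bear("Identify the three key assumptions in:\n" + r1["bull"]),
    }
    transcript.append(r2)

    # Round 3: both sides answer the same self-critique questions.
    reflect = (
        "If you are wrong, what is the most likely reason? "
        "What evidence would change your mind? "
        "What data is most worth investigating next?"
    )
    r3 = {
        "bull": bull(reflect + "\nYour opponent said:\n" + r2["bear"]),
        "bear": bear(reflect + "\nYour opponent said:\n" + r2["bull"]),
    }
    transcript.append(r3)
    return transcript
```

Keeping round 1 blind is the design choice that matters: if either side sees the other’s opening, it starts hedging before it has made its strongest case.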
In practice, the Buffett agent kept pulling the conversation back to one point: will this company still be here in ten years? The Graham agent wouldn’t engage with that. It only cared about one thing: whether the current price leaves enough safety cushion, and how far it could fall. One looks at the decade, the other looks at the downside. They went back and forth on the same stock, neither convincing the other.
By the end, the disagreement had shifted from emotion to verifiable hypotheses. That’s far more useful than picking a winner.
The Lead synthesized both sides into a conclusion I could actually understand.
Buffett judges long-term quality, Graham judges margin of safety, the Lead synthesizes in between.
Gate 4: Why Single-Assistant Mode Hits a Wall Here
I put a similar question to a single default assistant. It quickly went into compliance mode: I can’t give investment advice.
That’s not a bad thing. Financial questions really shouldn’t get snap answers from a single assistant.
The issue is: single-assistant mode only has one exit. It either answers directly or refuses. An agent team adds a middle layer: it turns “give me a stock pick” into “organize a research process.”
So the Lead asks constraints, Specialists break down perspectives, Bull/Bear lay out disagreements, and the final Memo only gives next research steps, never a buy/sell order.
The underlying model didn’t change. What changed is that the question got placed inside an organizational process.
Gate 5: The Output Isn’t an Answer. It’s a Decision Memo.
A good agent team should output a reviewable decision record:
Decision Memo Template
Question: Original question
Scope: Time horizon / market / risk tolerance / data scope
Base Case: Most likely scenario
Bull Case: Strongest bull argument
Bear Case: Strongest bear argument
Key Assumptions:
- …
- …
- …
Invalidation: What would have to happen for this conclusion to be void
Risks:
- Market risk
- Data risk
- Model hallucination risk
Next Action: Next research step (not a buy/sell recommendation)
Confidence: Low / Medium / High, with reasoning
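A template only helps if incomplete memos get bounced. As a sketch, the template could be held in a small data class with a check the Lead runs before posting. `DecisionMemo` and `problems` are hypothetical names, and the buy/sell check is a crude keyword match that I am assuming is good enough for illustration, not a real compliance filter.

```python
from dataclasses import dataclass


@dataclass
class DecisionMemo:
    question: str
    scope: str
    base_case: str
    bull_case: str
    bear_case: str
    key_assumptions: list[str]
    invalidation: str
    risks: list[str]
    next_action: str
    confidence: str
    confidence_reason: str

    def problems(self) -> list[str]:
        """List reasons this memo is not yet reviewable. Empty means it passes."""
        found = []
        if self.confidence not in ("Low", "Medium", "High"):
            found.append("confidence must be Low, Medium, or High")
        if not self.confidence_reason.strip():
            found.append("confidence needs reasoning")
        if not self.key_assumptions:
            found.append("at least one key assumption is required")
        if not self.invalidation.strip():
            found.append("invalidation condition is required")
        # Crude keyword check: the memo gives research steps, never an order.
        words = self.next_action.lower().split()
        if "buy" in words or "sell" in words:
            found.append("next action must not be a buy/sell recommendation")
        return found
```

A memo that comes back with a non-empty `problems()` list goes back to the team instead of to you.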
When I asked about putting 10 million into EV, what I got back wasn’t “buy” or “don’t buy.” It was a document laying out both sides’ arguments, key assumptions, and the conditions under which the conclusion would be invalid.
The latest version of TradingAgents added a persistent decision log. AI Hedge Fund also emphasizes that agent reasoning must be debuggable. They independently arrived at the same conclusion: whatever an agent team outputs, you have to be able to review it after the fact.
This Works Beyond Investment Research
Bloome was just the room this time, not the boundary of the method.
I used investment research as a stress test because it’s noisy, high-risk, and the fastest way to see whether a team is actually checking each other. Swap in content, code, product, or sales, and it’s still the same five gates:
- Investing. Collector: news, filings, technicals. Specialist: Buffett, Graham. Adversary: bull, bear, risk. Lead: PM.
- Content. Collector: material gathering. Specialist: writer. Adversary: fact-check, pushback. Lead: editor.
- Code. Collector: repo reader. Specialist: implementer. Adversary: reviewer, security. Lead: tech lead.
- Product. Collector: user feedback. Specialist: PM agent. Adversary: skeptical user. Lead: founder.
- Sales. Collector: lead research. Specialist: account strategist. Adversary: objection handler. Lead: sales lead.
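Since every domain in that list fills the same four seats, the mapping can be written as plain data with a check for empty seats. This is only a sketch: `SEATS`, `TEAMS`, and `missing_seats` are names I made up, and only two of the five domains are filled in to keep it short.

```python
# The four seats every team in the list above fills (assumed key names).
SEATS = ("collector", "specialist", "adversary", "lead")

TEAMS = {
    "investing": {
        "collector": ["news", "filings", "technicals"],
        "specialist": ["Buffett", "Graham"],
        "adversary": ["bull", "bear", "risk"],
        "lead": ["PM"],
    },
    "code": {
        "collector": ["repo reader"],
        "specialist": ["implementer"],
        "adversary": ["reviewer", "security"],
        "lead": ["tech lead"],
    },
}


def missing_seats(team: dict[str, list[str]]) -> list[str]:
    """Return the seats that nobody on this team is filling."""
    return [seat for seat in SEATS if not team.get(seat)]


# Usage: flag any team that was set up without, say, an adversary.
for domain, team in TEAMS.items():
    print(domain, "missing:", missing_seats(team))
```

Running the check before you launch a team is a cheap way to catch the most common failure: a group with several specialists and nobody assigned to push back.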
The value of multi-agent isn’t having more agents. It’s making a question pass through multiple judgment positions.
Come @ the Team Yourself
I didn’t hide this team behind a screenshot. It’s the exact group I built. Same Lead, same AI Buffett and Graham, same die-hard bull and bear, all sitting in one chat. Step inside, @ the team, and throw it a question of your own.
I haven’t put it into the Arena yet. This piece is a team-building notebook, not a competition recap. But if you want to see fully automated teams go head to head under the same tasks, the same data, and the same simulated constraints, go watch the Bloome Arena. The point isn’t who made the most in one round. It’s watching how different teams divide work, make mistakes, and synthesize.
The Bloome Live Trading Arena: agent teams competing on the same capital, in public.
Investment research is just the stress test. What’s really worth watching is the organizational capacity an agent team reveals in a public environment.
To Be Clear
This is an experiment running on a paper trading simulator. I don’t know investing, and this article isn’t stock advice. What I cloned are AI roles built on publicly available investment philosophies. They have nothing to do with the real people. From start to finish, what I’m curious about is how the team collaborates. Which stock it ended up picking? I honestly didn’t pay much attention.
A single AI is like a well-spoken intern. A good agent team is more like a small meeting room. Your job isn’t to make them all smarter. It’s to arrange who gathers information, who plays devil’s advocate, who flags risk, and who cleans up the table at the end.
References
- TradingAgents: Open-source multi-agent investment research framework, 81k stars. Breaks high-stakes judgment into an orchestratable, traceable collaboration pipeline (research purposes).
- AI Hedge Fund: Open-source investment legend agent system, 59k stars. Proves master personas can be translated into executable filters (educational purposes).
- Bloome (bloome.im): Multi-agent messaging platform, agents join your group chat as teammates.
- Alpaca Paper Trading: Paper trading simulator, test strategies without real money.