@itsclelia: Had a blast yesterday attending at @techeurope_'s Applied AI Conference in Berlin! I had a talk about building document…

X AI KOLs Following 05/29/26, 08:25 AM Events

Summary

Attended the Applied AI Conference in Berlin and gave a talk on building document agents, including a detailed walkthrough of LobsterX, a document-processing agent built with LlamaIndex that uses structured outputs and event-driven workflows.

Had a blast yesterday attending @techeurope_'s Applied AI Conference in Berlin! I gave a talk about building document agents and agentic development in general, which you can find here: https://astrabert.github.io/agent-anatomy-presentation… Beyond my talk, I met lots of builders with an incredible energy and

Cached at: 05/30/26, 04:04 AM

The Anatomy of LobsterX

Source: https://astrabert.github.io/agent-anatomy-presentation/

Applied AI Conference · Berlin, 05/28/2026 · 01 / 17

The Anatomy of LobsterX

a document processing agent

Redefine Document Workflows with AI Agents

Intro

Brain

Loop

World

Recap

Introduction · 02 / 17

Hi, I’m Clelia

Member of Technical Staff @LlamaIndex, where I work on agents, retrieval systems and OSS
Background in computational biology, then slowly drifted into AI and engineering
I build small, opinionated agents to stress-test the framework I work on
Today: a guided tour of one of them → LobsterX 🦞

Introduction · 03 / 17

What is LobsterX?

A document-processing AI agent that lives in a Telegram chat. You send it a PDF and a task; it parses, extracts, classifies, reasons, and replies asynchronously when it’s done.

~600 LOC of agent implementation

~1.5k LOC of workflow orchestration underneath

Swappable LLM providers (OpenAI, Anthropic, Google)

Small enough to dissect on stage. Real enough to be interesting.

Introduction · 04 / 17

Why dissect an agent?

Most agents look like a black box: prompt in, answer out. The interesting engineering lives in the gap between those two. We’ll walk through it using four anatomical metaphors:

Brain: the LLM, steered into structured behaviour
Loop: the event-driven workflow that drives it
Eyes & Limbs: the filesystem and tools it touches
Ears & Mouth: how it talks to a human

The Brain · 05 / 17

The Brain: an LLM with a problem

The LLM is the only decision-maker: it picks what to think, what to call, when to stop
But LLMs are non-deterministic: same prompt, different shapes of output
For an agent, that’s fatal. You can’t parse “well I think maybe call the tool with...” with a regex and hope for the best
We need a way to constrain the model’s output to something the rest of the system can rely on

If the brain is unreliable, every downstream step inherits that unreliability. The fix has to start at the LLM call itself.

The Brain · 06 / 17

Steering: Structured Outputs

Every LLM call in LobsterX is constrained by a typed JSON schema. The model cannot reply with free-form prose; it must fill in a known shape.

One schema per operation: a Think looks different from an Act
Forces a clean separation between reasoning and action
The LLM wrapper exposes only structured-generation methods; there is no “raw chat” escape hatch in the agent code
Same schema works across OpenAI, Anthropic and Google providers
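
The "one schema per operation" idea can be sketched with the standard library alone. This is an illustration, not LobsterX's code: in practice the schema is usually enforced by the provider's structured-generation feature, and the field names below (reasoning, done, tool_name, tool_args) are assumptions made up for the example.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Think:
    """Reasoning step (hypothetical fields)."""
    reasoning: str
    done: bool


@dataclass(frozen=True)
class Act:
    """Action step (hypothetical fields)."""
    tool_name: str
    tool_args: dict


def parse_structured(raw: str, schema: type):
    """Parse a JSON reply and reject anything that does not match the schema."""
    data = json.loads(raw)  # free-form prose fails here with a ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name: f.type for f in fields(schema)}
    if set(data) != set(expected):
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    for name, typ in expected.items():
        if not isinstance(data[name], typ):
            raise ValueError(f"field {name!r} must be of type {typ.__name__}")
    return schema(**data)
```

Calling parse_structured('{"reasoning": "need to parse the PDF", "done": false}', Think) yields a Think instance, while a reply such as "well I think maybe call the tool" raises a ValueError before any downstream step sees it.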

The Loop · 07 / 17

The Loop: Agent Workflows

LobsterX is built on **LlamaIndex Agent Workflows**: an event-driven, async-first stepwise execution engine.

Each step is a typed Python function that consumes one event type and emits another
No central orchestrator: the event types wire the steps together implicitly
Async by construction: a long-running tool call doesn’t block anything else
Loops are not special cases; they’re just steps that re-emit upstream event types
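
To make "the event types wire the steps together" concrete, here is a toy dispatcher written with asyncio and type hints only. It is a sketch of the idea and does not use the LlamaIndex API; the event classes, step functions and build_routes helper are all invented for the example.

```python
import asyncio
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class StartEvent:
    text: str


@dataclass
class WordsEvent:
    words: list


@dataclass
class StopEvent:
    result: int


async def split_text(ev: StartEvent) -> WordsEvent:
    return WordsEvent(words=ev.text.split())


async def count_words(ev: WordsEvent) -> StopEvent:
    return StopEvent(result=len(ev.words))


def build_routes(*steps):
    """Map each consumed event type to its step, read from the type hints."""
    routes = {}
    for step in steps:
        hints = get_type_hints(step)
        hints.pop("return", None)
        (event_type,) = hints.values()  # each step takes exactly one event
        routes[event_type] = step
    return routes


async def run(start, routes):
    """Keep dispatching on the current event's type until a StopEvent appears."""
    event = start
    while not isinstance(event, StopEvent):
        event = await routes[type(event)](event)
    return event.result


ROUTES = build_routes(split_text, count_words)
```

Nothing in run names a step explicitly: adding a step means writing a function with a new input event type and registering it, which is the property the slide describes.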

The Loop · 08 / 17

Event-Driven Execution

Input (user prompt) → Think (reason about next step) → Act (call a tool) → Observe (process result) ↺ Think

Think → Stop (final answer)

Each arrow is a typed event. Observe re-enters Think until Think decides the task is done and emits Stop — at which point the workflow terminates and the answer goes back to the user.
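
The cycle described above can be sketched as three steps that hand typed events to each other, with Observe re-emitting the event that Think consumes. This is a minimal stand-in, not the LobsterX implementation: the PLAN list replaces the LLM's decisions, the tool call is faked, and the max_steps guard is an assumption added so the sketch cannot loop forever.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ThinkEvent:
    history: list


@dataclass
class ActEvent:
    tool: str
    history: list


@dataclass
class ObserveEvent:
    result: str
    history: list


@dataclass
class StopEvent:
    answer: str


# Scripted stand-in for the LLM: the tools it "decides" to call, in order.
PLAN = ["parse", "classify"]


async def think(ev: ThinkEvent):
    next_index = len(ev.history)
    if next_index < len(PLAN):
        return ActEvent(tool=PLAN[next_index], history=ev.history)
    return StopEvent(answer=" -> ".join(ev.history))


async def act(ev: ActEvent):
    result = f"{ev.tool}:ok"  # a real agent would call the tool here
    return ObserveEvent(result=result, history=ev.history)


async def observe(ev: ObserveEvent):
    # Re-emitting the upstream event type is what closes the loop.
    return ThinkEvent(history=ev.history + [ev.result])


ROUTES = {ThinkEvent: think, ActEvent: act, ObserveEvent: observe}


async def run_agent(max_steps: int = 20) -> str:
    event = ThinkEvent(history=[])
    for _ in range(max_steps):
        if isinstance(event, StopEvent):
            return event.answer
        event = await ROUTES[type(event)](event)
    raise RuntimeError("agent did not stop within max_steps")
```

Running asyncio.run(run_agent()) walks Think, Act, Observe twice and then stops once the scripted plan is exhausted.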

World · 09 / 17

Three windows on the world

The brain+loop is a generalist scaffolding. What makes LobsterX a document agent are the three interfaces it exposes to the world.

Filesystem: where documents live and where the agent writes its outputs

Document Tools: parse, extract and classify unstructured content via LlamaCloud

Chat Interface: Telegram bot with async upload and notification

World · Filesystem · 10 / 17

The Eyes: a virtual filesystem

File ops route through **AgentFS**, a virtualized layer, not the real machine FS
The agent gets read / write / edit / grep / glob, with no delete and no shell execution
Scope is bounded to a working directory; common credential files (.env and others) are excluded entirely
Telegram-uploaded PDFs are written into AgentFS, never your real disk

If the agent is jailbroken into writing something destructive, the damage stays inside the virtual FS. Nothing leaks to the host unless you explicitly sync it.
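
A rough picture of what "bounded primitives" means, using an in-memory dictionary as the virtual filesystem. This is not the AgentFS API; the class name, method signatures and the one-entry deny-list are assumptions chosen to mirror the five operations listed above.

```python
import fnmatch
import posixpath
import re


class VirtualFS:
    """In-memory store exposing read/write/edit/grep/glob and no delete."""

    EXCLUDED_NAMES = {".env"}  # assumption: illustrative deny-list

    def __init__(self):
        self._files = {}  # normalised path -> text content

    def _resolve(self, path):
        # Normalise against a virtual root so ".." cannot climb out of it.
        norm = posixpath.normpath("/" + path).lstrip("/")
        if posixpath.basename(norm) in self.EXCLUDED_NAMES:
            raise PermissionError(f"{norm} is excluded from the virtual FS")
        return norm

    def write(self, path, content):
        self._files[self._resolve(path)] = content

    def read(self, path):
        norm = self._resolve(path)
        if norm not in self._files:
            raise FileNotFoundError(norm)
        return self._files[norm]

    def edit(self, path, old, new):
        # Replace the first occurrence of `old`, failing loudly if it is absent.
        content = self.read(path)
        if old not in content:
            raise ValueError(f"{old!r} not found in {path}")
        self.write(path, content.replace(old, new, 1))

    def grep(self, pattern):
        regex = re.compile(pattern)
        return [
            (path, number, line)
            for path in sorted(self._files)
            for number, line in enumerate(self._files[path].splitlines(), 1)
            if regex.search(line)
        ]

    def glob(self, pattern):
        return sorted(p for p in self._files if fnmatch.fnmatchcase(p, pattern))
```

Because every path is resolved against the virtual root and nothing touches the host disk, a write to "../../etc/passwd" simply lands at "etc/passwd" inside the dictionary.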

World · Tools · 11 / 17

The Limbs: document tools

Filesystem ops alone only see plain text. To actually understand unstructured documents, the agent calls three **LlamaCloud** tools, each with its own typed input schema.

LlamaParse: full-text parsing of PDFs, Office docs and more via OCR, VLMs and agentic pipelines

LlamaExtract: schema-driven extraction, where you define the JSON shape and the tool fills it in

LlamaClassify: classification into user-defined categories with confidence signals

World · Tools · 12 / 17

Why these tools change the game

A document agent is only as good as its eyes on unstructured content. Generic OCR isn’t enough: layout, tables and figures all carry meaning that naive text extraction loses.

LlamaParse: layout-aware parsing for PDFs, DOCX, PPTX, XLSX, images. Tables stay as tables; figures get described by VLMs; reading order is preserved across columns.

LlamaExtract: you hand it a JSON schema, it hands you back populated objects that are typed, citation-linked and validated. No glue prompt engineering on the agent side.

LlamaClassify: user-defined categories with confidence signals. The agent uses it to route documents (invoice? contract? report?) before deciding what to do next.

Each tool exposes a typed input schema, so the Act step can call them with full structured-output guarantees end to end.
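
The routing role of classification (invoice? contract? report?) might look roughly like this. The classifier below is a keyword stub standing in for a real classification tool, not the LlamaClassify API; the action names, the confidence values and the 0.6 threshold with its "ask_user" fallback are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    category: str
    confidence: float


def classify_stub(text: str) -> Classification:
    """Hypothetical keyword classifier standing in for a real tool call."""
    lowered = text.lower()
    if "invoice" in lowered:
        return Classification("invoice", 0.9)
    if "agreement" in lowered:
        return Classification("contract", 0.8)
    return Classification("report", 0.4)


# Hypothetical follow-up action per category.
NEXT_ACTION = {
    "invoice": "extract_invoice_fields",
    "contract": "extract_contract_clauses",
    "report": "summarise",
}


def route(text: str, min_confidence: float = 0.6) -> str:
    """Pick the next action, falling back when the confidence signal is weak."""
    result = classify_stub(text)
    if result.confidence < min_confidence:
        return "ask_user"
    return NEXT_ACTION[result.category]
```

With this stub, route("Invoice #12 from ACME") selects the invoice extraction path, while an unrecognised document falls below the threshold and is handed back to the user.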

World · Chat · 13 / 17

Ears & Mouth: async by default

Telegram was chosen specifically because messaging is async: no spinner, no held-open HTTP connection
Documents come in as message attachments and land in AgentFS. Text messages are dispatched as workflow inputs
Document workflows can take minutes to half an hour — the agent pings you back when done
This maps cleanly onto the workflow engine, which is async-first already

The right interface for a long-running agent isn’t a chatbot — it’s a colleague who replies when they’re finished.

World · API Mode · 14 / 17

Same agent, different shell

The Telegram bot is one frontend. The same agent core also runs as a FastAPI server, with the workflow’s async-first shape carried all the way through.

Task manager: in-memory dict of task_id → asyncio.Task, guarded by a lock. POST /task spawns, GET polls, DELETE cancels.

Rate limiting: per-endpoint per-minute limits via fastapi-throttle; uploads, creates, polls and deletes each have their own budget

Auth & CORS: Starlette middlewares for bearer-token auth and configurable allowed origins
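
The task manager described above can be approximated in a few lines of asyncio. This sketch leaves out FastAPI, rate limiting and auth entirely; the class and method names and the status dictionaries are assumptions, with spawn, poll and cancel standing in for what the POST, GET and DELETE handlers would call.

```python
import asyncio
import uuid


class TaskManager:
    """In-memory registry of task_id -> asyncio.Task, guarded by a lock."""

    def __init__(self):
        self._tasks = {}
        self._lock = asyncio.Lock()

    async def spawn(self, coro) -> str:
        # What a POST handler would call: schedule the work, return an id.
        task_id = uuid.uuid4().hex
        async with self._lock:
            self._tasks[task_id] = asyncio.create_task(coro)
        return task_id

    async def poll(self, task_id: str) -> dict:
        # What a GET handler would call: report status without blocking.
        async with self._lock:
            task = self._tasks.get(task_id)
        if task is None:
            return {"status": "unknown"}
        if not task.done():
            return {"status": "running"}
        if task.cancelled():
            return {"status": "cancelled"}
        if task.exception() is not None:
            return {"status": "failed", "error": str(task.exception())}
        return {"status": "done", "result": task.result()}

    async def cancel(self, task_id: str) -> bool:
        # What a DELETE handler would call: forget the task and cancel it.
        async with self._lock:
            task = self._tasks.pop(task_id, None)
        if task is None:
            return False
        task.cancel()
        return True
```

The manager should be created inside a running event loop (for example in an application startup hook), since spawn relies on asyncio.create_task. Being in-memory, the registry is lost on restart, which matches the slide's description but would need a persistent store for anything longer-lived.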

Recap · Safety · 15 / 17

A note on safety

Virtual filesystem: no exposure of the host FS
No shell: the agent cannot run arbitrary bash
Read / write / edit only: no delete primitive at all
No skills: custom behaviour comes in via an AGENTS.md file, not via potentially unvetted instructions
Credential files excluded from the virtual FS: the agent can’t even read them

None of this prevents prompt injection from a malicious document the agent has been asked to read. The mitigations bound the blast radius; they don’t eliminate it.

Recap · Anatomy · 16 / 17

The full anatomy

The Brain: LLM steered into Think / Act / Observe / Stop via structured outputs

The Eyes: AgentFS, a sandboxed virtual filesystem with bounded primitives

The Limbs: LlamaParse, LlamaExtract, LlamaClassify, each a typed tool call

Ears & Mouth: Telegram (or FastAPI), async-first and notification-driven

Recap · Takeaways · 17 / 17

Key takeaways

Structured outputs are the single biggest lever for turning an LLM into a reliable agent component
**Layout-aware doc tools (Parse / Extract / Classify)** are what allow the agent to really understand unstructured documents
Event-driven workflows give you loops, branches and async for free
Virtual filesystems let you grant filesystem-style tools without the filesystem-style risk
Async interfaces are the right shape for long-running document work

Thank you! Questions?
