@itsclelia: Had a blast yesterday attending @techeurope_'s Applied AI Conference in Berlin! I gave a talk about building document…
Summary
Attended the Applied AI Conference in Berlin and gave a talk on building document agents, including a detailed walkthrough of LobsterX, a document-processing agent built with LlamaIndex that uses structured outputs and event-driven workflows.
Cached at: 05/30/26, 04:04 AM
Had a blast yesterday attending @techeurope_’s Applied AI Conference in Berlin! I gave a talk about building document agents and agentic development in general, which you can find here: https://astrabert.github.io/agent-anatomy-presentation… Beyond my talk, I met lots of builders with incredible energy and
The Anatomy of LobsterX
Source: https://astrabert.github.io/agent-anatomy-presentation/
Applied AI Conference · Berlin, 05/28/2026
A document-processing agent
Redefine Document Workflows with AI Agents
Intro
Brain
Loop
World
Recap
Introduction
Hi, I’m Clelia
- Member of Technical Staff @LlamaIndex, where I work on agents, retrieval systems and OSS
- Background in computational biology, then slowly drifted into AI and engineering
- I build small, opinionated agents to stress-test the framework I work on
- Today: a guided tour of one of them → LobsterX 🦞
Introduction
What is LobsterX?
A document-processing AI agent that lives in a Telegram chat. You send it a PDF and a task; it parses, extracts, classifies, reasons, and replies asynchronously when it’s done.
- ~600 LOC of agent implementation
- ~1.5k LOC of workflow orchestration underneath
- 3 swappable LLM providers (OpenAI, Anthropic, Google)
Small enough to dissect on stage. Real enough to be interesting.
Introduction
Why dissect an agent?
Most agents look like a black box: prompt in, answer out. The interesting engineering lives in the gap between those two. We’ll walk through it using four anatomical metaphors:
- Brain: the LLM, steered into structured behaviour
- Loop: the event-driven workflow that drives it
- Eyes & Limbs: the filesystem and tools it touches
- Ears & Mouth: how it talks to a human
The Brain
The Brain: an LLM with a problem
- The LLM is the only decision-maker: it picks what to think, what to call, when to stop
- But LLMs are non-deterministic: same prompt, different shapes of output
- For an agent, that’s fatal. You can’t parse “well I think maybe call the tool with...” with a regex and hope for the best
- We need a way to constrain the model’s output to something the rest of the system can rely on
If the brain is unreliable, every downstream step inherits that unreliability. The fix has to start at the LLM call itself.
The Brain
Steering: Structured Outputs
Every LLM call in LobsterX is constrained by a typed JSON schema. The model cannot reply with free-form prose; it must fill in a known shape.
- One schema per operation: a Think looks different from an Act
- Forces a clean separation between reasoning and action
- The LLM wrapper exposes only structured-generation methods; there is no “raw chat” escape hatch in the agent code
- Same schema works across OpenAI, Anthropic and Google providers
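To make the "one schema per operation, no raw chat" idea concrete, here is a minimal sketch using only the standard library. It assumes dataclasses as the schema language and a hypothetical `Provider` interface standing in for the OpenAI, Anthropic and Google clients; `Think`, `Act`, `StructuredLLM` and `FakeProvider` are illustrative names, not LobsterX or LlamaIndex code (a real implementation would more likely use Pydantic models and each provider's native structured-output mode). The point is the shape: the wrapper's only public method either returns a typed object or raises.

```python
import json
from dataclasses import dataclass, fields
from typing import Protocol, Type, TypeVar

T = TypeVar("T")


@dataclass
class Think:
    """Hypothetical schema for a reasoning step."""
    thought: str
    done: bool


@dataclass
class Act:
    """Hypothetical schema for a tool call."""
    tool: str
    arguments: dict


class Provider(Protocol):
    """Any backend that returns raw JSON text for a prompt and a target shape."""

    def complete_json(self, prompt: str, shape: dict) -> str: ...


class StructuredLLM:
    """Wrapper with a single structured-generation method and no free-form chat."""

    def __init__(self, provider: Provider) -> None:
        self._provider = provider

    def generate(self, prompt: str, schema: Type[T]) -> T:
        # Tell the provider which fields and types the reply must have.
        shape = {f.name: f.type.__name__ for f in fields(schema)}
        data = json.loads(self._provider.complete_json(prompt, shape))  # prose fails here
        if set(data) != set(shape):
            raise ValueError(f"expected keys {sorted(shape)}, got {sorted(data)}")
        for f in fields(schema):
            if not isinstance(data[f.name], f.type):
                raise TypeError(f"{f.name!r} should be {f.type.__name__}")
        return schema(**data)


class FakeProvider:
    """Offline stand-in for an OpenAI, Anthropic or Google client."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def complete_json(self, prompt: str, shape: dict) -> str:
        return self.reply


llm = StructuredLLM(FakeProvider('{"tool": "parse", "arguments": {"path": "invoice.pdf"}}'))
print(llm.generate("Parse the uploaded invoice", Act))
# Act(tool='parse', arguments={'path': 'invoice.pdf'})
```

Swapping providers only means passing a different object with a `complete_json` method; the schemas and the validation stay the same.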
The Loop
The Loop: Agent Workflows
LobsterX is built on **LlamaIndex Agent Workflows**, an event-driven, async-first stepwise execution engine.
- Each step is a typed Python function that consumes one event type and emits another
- No central orchestrator: the event types wire the steps together implicitly
- Async by construction: a long-running tool call doesn’t block anything else
- Loops are not special cases — they’re just steps that re-emit upstream event types
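To make "the event types wire the steps together" concrete, here is a toy engine in plain `asyncio`. It is a hypothetical sketch, not the LlamaIndex Workflows API (there, steps are typically `@step`-decorated methods on a `Workflow` subclass, and a run begins with a `StartEvent` and ends with a `StopEvent`); it only illustrates the routing idea. Each step is registered under the event type annotated on its first parameter, and the engine keeps dispatching whatever a step returns until it sees a `StopEvent`. Unlike the real engine, this toy processes one event at a time.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, Type, get_type_hints


class Event:
    """Base class for all events."""


@dataclass
class StartEvent(Event):
    prompt: str


@dataclass
class UpperEvent(Event):
    text: str


@dataclass
class StopEvent(Event):
    result: str


Step = Callable[[Event], Awaitable[Event]]


class ToyWorkflow:
    """Routes each event to the step whose first parameter is annotated with its type."""

    def __init__(self) -> None:
        self._routes: Dict[Type[Event], Step] = {}

    def step(self, fn: Step) -> Step:
        # Wiring comes from the signature alone: no explicit graph is declared.
        hints = get_type_hints(fn)
        first_param = list(inspect.signature(fn).parameters)[0]
        self._routes[hints[first_param]] = fn
        return fn

    async def run(self, start: Event) -> str:
        event = start
        while not isinstance(event, StopEvent):
            event = await self._routes[type(event)](event)
        return event.result


wf = ToyWorkflow()


@wf.step
async def shout(ev: StartEvent) -> UpperEvent:
    await asyncio.sleep(0)  # stand-in for a slow, awaitable tool call
    return UpperEvent(text=ev.prompt.upper())


@wf.step
async def finish(ev: UpperEvent) -> StopEvent:
    return StopEvent(result=ev.text + "!")


print(asyncio.run(wf.run(StartEvent(prompt="hello"))))  # HELLO!
```

A loop needs no special support in this scheme: a step that returns an event type handled upstream simply sends control back to that earlier step.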
The Loop
Event-Driven Execution
Input (user prompt) → Think (reason about next step) → Act (call a tool) → Observe (process result) ↺ → Stop (final answer)
Each arrow is a typed event. Observe re-enters Think until Think decides the task is done and emits Stop — at which point the workflow terminates and the answer goes back to the user.
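The same Think → Act → Observe → Stop cycle can be sketched as a plain loop, with a scripted stand-in for the LLM so it runs offline. `ScriptedBrain`, the `TOOLS` table and the `word_count` tool are hypothetical; in LobsterX the decisions come from structured LLM calls and the transitions are workflow events rather than a `for` loop. The `max_steps` guard is an added assumption, since a real model might never decide it is done.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Union


@dataclass
class Act:
    tool: str
    argument: str


@dataclass
class Stop:
    answer: str


Decision = Union[Act, Stop]

# Hypothetical tool table: name -> function taking one string argument.
TOOLS: Dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
}


class ScriptedBrain:
    """Stands in for the LLM: acts once, then stops using the observation."""

    def think(self, task: str, observations: List[str]) -> Decision:
        if not observations:
            return Act(tool="word_count", argument=task)
        return Stop(answer=f"The task has {observations[-1]} words.")


def run_agent(task: str, brain: ScriptedBrain, max_steps: int = 10) -> Optional[str]:
    observations: List[str] = []
    for _ in range(max_steps):
        decision = brain.think(task, observations)       # Think
        if isinstance(decision, Stop):                    # Stop ends the run
            return decision.answer
        result = TOOLS[decision.tool](decision.argument)  # Act
        observations.append(result)                       # Observe, then back to Think
    return None  # gave up: the brain never emitted Stop


print(run_agent("summarise this invoice please", ScriptedBrain()))
# The task has 4 words.
```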
World
Three windows on the world
The brain + loop is generalist scaffolding. The three interfaces it exposes to the world are what make LobsterX a document agent.
- Filesystem: where documents live and where the agent writes its outputs
- Document Tools: parse, extract and classify unstructured content via LlamaCloud
- Chat Interface: Telegram bot, with async upload and notification
World · Filesystem
The Eyes: a virtual filesystem
- File ops route through **AgentFS**, a virtualized layer, not the real machine FS
- The agent gets read / write / edit / grep / glob; no delete, no shell execution
- Scope is bounded to a working directory; common credential files (such as `.env`) are excluded entirely
- Telegram-uploaded PDFs are written into AgentFS, never to your real disk
If the agent is jailbroken into writing something destructive, the damage stays inside the virtual FS. Nothing leaks to the host unless you explicitly sync it.
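The contract above (bounded primitives, no delete, credential files invisible) can be sketched as a small in-memory class. This is a hypothetical illustration assuming a dict-backed store; `VirtualFS` and `BLOCKED_NAMES` are made-up names, this is not AgentFS's API, and every blocked filename other than `.env` is an assumption.

```python
import fnmatch
import posixpath
import re
from typing import Dict, List

# ".env" comes from the slide; the other names are illustrative assumptions.
BLOCKED_NAMES = {".env", ".netrc", "credentials.json"}


class VirtualFS:
    """Dict-backed filesystem exposing read/write/edit/grep/glob and nothing else."""

    def __init__(self) -> None:
        self._files: Dict[str, str] = {}

    def _resolve(self, path: str) -> str:
        # Refuse to walk out of the working directory.
        if ".." in path.replace("\\", "/").split("/"):
            raise PermissionError(f"path escapes the working directory: {path}")
        full = posixpath.normpath(posixpath.join("/", path))
        # Credential files are invisible: they cannot even be read.
        if posixpath.basename(full) in BLOCKED_NAMES:
            raise PermissionError(f"credential file is excluded: {path}")
        return full

    def read(self, path: str) -> str:
        return self._files[self._resolve(path)]

    def write(self, path: str, content: str) -> None:
        self._files[self._resolve(path)] = content

    def edit(self, path: str, old: str, new: str) -> None:
        # Require exactly one match so an ambiguous edit fails loudly.
        key = self._resolve(path)
        text = self._files[key]
        if text.count(old) != 1:
            raise ValueError(f"expected exactly one occurrence of {old!r} in {path}")
        self._files[key] = text.replace(old, new, 1)

    def grep(self, pattern: str) -> List[str]:
        # Returns "path:line_no:line" for each matching line, like `grep -n`.
        rx = re.compile(pattern)
        return [
            f"{path}:{no}:{line}"
            for path in sorted(self._files)
            for no, line in enumerate(self._files[path].splitlines(), start=1)
            if rx.search(line)
        ]

    def glob(self, pattern: str) -> List[str]:
        # fnmatch's "*" also matches "/", so this is looser than a shell glob.
        pat = self._resolve(pattern)
        return sorted(p for p in self._files if fnmatch.fnmatchcase(p, pat))

    # Deliberately no delete(), rename() or shell access.


fs = VirtualFS()
fs.write("docs/invoice.md", "Total: 120 EUR\nDue: 2026-06-01")
fs.edit("docs/invoice.md", "120 EUR", "130 EUR")
print(fs.grep(r"^Total"))  # ['/docs/invoice.md:1:Total: 130 EUR']
```

Leaving destructive operations out of the class entirely, rather than guarding them with checks, means a jailbroken prompt has nothing to call.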
World · Tools
The Limbs: document tools
Filesystem ops alone only see plain text. To actually understand unstructured documents, the agent calls three **LlamaCloud** tools, each with its own typed input schema.
- LlamaParse: full-text parsing of PDFs, Office docs and more via OCR, VLMs and agentic pipelines
- LlamaExtract: schema-driven extraction; you define the JSON shape, the tool fills it in
- LlamaClassify: classification into user-defined categories with confidence signals
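Here is a sketch of what "each with its own typed input schema" can look like on the agent side, assuming dataclasses as the schema language. The Act step produces a tool name plus JSON arguments, and a dispatcher validates those arguments against the tool's schema before anything runs. The field names (`path`, `json_schema`, `categories`) and the stub handlers are hypothetical; they do not mirror the LlamaCloud SDKs, which are not called here.

```python
import json
from dataclasses import dataclass, fields
from typing import Callable, Dict, Tuple, Type


@dataclass
class ParseInput:
    path: str


@dataclass
class ExtractInput:
    path: str
    json_schema: dict


@dataclass
class ClassifyInput:
    path: str
    categories: list


def _stub(name: str) -> Callable[[object], str]:
    # Placeholder handler; a real one would call the corresponding LlamaCloud service.
    return lambda args: f"{name} called with {args}"


# Hypothetical registry: tool name -> (input schema, handler).
TOOLS: Dict[str, Tuple[Type, Callable[[object], str]]] = {
    "parse": (ParseInput, _stub("parse")),
    "extract": (ExtractInput, _stub("extract")),
    "classify": (ClassifyInput, _stub("classify")),
}


def call_tool(name: str, raw_args: str) -> str:
    schema, handler = TOOLS[name]
    data = json.loads(raw_args)
    for f in fields(schema):
        if f.name not in data:
            raise ValueError(f"{name}: missing field {f.name!r}")
        if not isinstance(data[f.name], f.type):
            raise TypeError(f"{name}: {f.name!r} should be {f.type.__name__}")
    unknown = set(data) - {f.name for f in fields(schema)}
    if unknown:
        raise ValueError(f"{name}: unexpected fields {sorted(unknown)}")
    return handler(schema(**data))


print(call_tool("classify", '{"path": "a.pdf", "categories": ["invoice", "contract"]}'))
# classify called with ClassifyInput(path='a.pdf', categories=['invoice', 'contract'])
```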
World · Tools
Why these tools change the game
A document agent is only as good as its eyes on unstructured content. Generic OCR isn’t enough: layout, tables and figures all carry meaning that naive text extraction loses.
- LlamaParse: layout-aware parsing for PDFs, DOCX, PPTX, XLSX and images. Tables stay as tables, figures get described by VLMs, and reading order is preserved across columns.
- LlamaExtract: you hand it a JSON schema, it hands you back populated objects (typed, citation-linked, validated). No glue prompt engineering on the agent side.
- LlamaClassify: user-defined categories with confidence signals; the agent uses it to route documents (invoice? contract? report?) before deciding what to do next.
Each tool exposes a typed input schema, so the Act step can call them with full structured-output guarantees end to end.
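The routing step described under LlamaClassify can be sketched with a stubbed classification result: take the highest-confidence category, and hand the decision back to the user when confidence is too low. The `Classification` shape, the 0.7 threshold and the route table are all assumptions for illustration; they are not LlamaClassify's output format.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Classification:
    """Hypothetical classifier output: category -> confidence in [0, 1]."""
    scores: Dict[str, float]


# Hypothetical next step for each document type.
ROUTES = {
    "invoice": "extract line items and totals",
    "contract": "extract parties and dates",
    "report": "parse and summarise",
}


def route(result: Classification, threshold: float = 0.7) -> str:
    category, confidence = max(result.scores.items(), key=lambda kv: kv[1])
    if confidence < threshold or category not in ROUTES:
        # Low confidence: ask the human instead of guessing.
        return "ask the user what this document is"
    return ROUTES[category]


print(route(Classification({"invoice": 0.92, "contract": 0.05, "report": 0.03})))
# extract line items and totals
print(route(Classification({"invoice": 0.40, "contract": 0.35, "report": 0.25})))
# ask the user what this document is
```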
World · Chat
Ears & Mouth: async by default
- Telegram was chosen specifically because messaging is async: no spinner, no held-open HTTP connection
- Documents come in as message attachments and land in AgentFS. Text messages are dispatched as workflow inputs
- Document workflows can take minutes to half an hour — the agent pings you back when done
- This maps cleanly onto the workflow engine, which is async-first already
The right interface for a long-running agent isn’t a chatbot — it’s a colleague who replies when they’re finished.
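The async shape described above can be sketched with the Telegram client replaced by a plain callback so it runs standalone. Attachments are written into a virtual store, text messages start a background task, and the user is notified when that task finishes instead of waiting on the reply. `Message`, `ChatAgent`, `notify` and the fake `run_workflow` are hypothetical names; a real bot would register handlers with a Telegram library instead.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Optional, Set


@dataclass
class Message:
    chat_id: int
    text: str = ""
    filename: Optional[str] = None
    data: bytes = b""


Notify = Callable[[int, str], Awaitable[None]]


class ChatAgent:
    def __init__(self, notify: Notify) -> None:
        self.files: Dict[str, bytes] = {}  # stand-in for AgentFS
        self.notify = notify
        self._tasks: Set[asyncio.Task] = set()

    async def run_workflow(self, prompt: str) -> str:
        await asyncio.sleep(0.01)  # pretend this is minutes of document work
        return f"done: {prompt} ({len(self.files)} file(s) available)"

    async def _run_and_notify(self, msg: Message) -> None:
        answer = await self.run_workflow(msg.text)
        await self.notify(msg.chat_id, answer)  # ping the user back when finished

    async def on_message(self, msg: Message) -> None:
        if msg.filename is not None:
            self.files[msg.filename] = msg.data  # attachments land in the virtual FS
            return
        # Text starts a workflow in the background; the handler returns immediately.
        task = asyncio.create_task(self._run_and_notify(msg))
        self._tasks.add(task)  # keep a reference so the task is not garbage-collected
        task.add_done_callback(self._tasks.discard)

    async def drain(self) -> None:
        await asyncio.gather(*self._tasks)


async def main() -> List[str]:
    sent: List[str] = []

    async def notify(chat_id: int, text: str) -> None:
        sent.append(f"[{chat_id}] {text}")

    agent = ChatAgent(notify)
    await agent.on_message(Message(chat_id=1, filename="invoice.pdf", data=b"%PDF"))
    await agent.on_message(Message(chat_id=1, text="extract the total"))
    assert sent == []  # nothing yet: the workflow is still running
    await agent.drain()
    return sent


print(asyncio.run(main()))
# ['[1] done: extract the total (1 file(s) available)']
```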
World · API Mode
Same agent, different shell
The Telegram bot is one frontend. The same agent core also runs as a FastAPI server, with the workflow’s async-first shape carried all the way through.
- Task manager: in-memory dict of `task_id → asyncio.Task`, guarded by a lock. `POST /task` spawns, GET polls, DELETE cancels.
- Rate limiting: per-endpoint, per-minute limits via `fastapi-throttle`; uploads, creates, polls and deletes each have their own budget
- Auth & CORS: Starlette middlewares for bearer-token auth and configurable allowed origins
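The task-manager row maps onto a small class. This sketch assumes plain `asyncio` and leaves out FastAPI routing, rate limiting and auth; it only shows the three operations that sit behind POST, GET and DELETE: spawn a task under a lock and return its id, poll its status, and cancel it. `TaskManager`, its method names and the status strings are hypothetical.

```python
import asyncio
import uuid
from typing import Any, Coroutine, Dict, List


class TaskManager:
    """In-memory map of task_id -> asyncio.Task, guarded by an asyncio.Lock."""

    def __init__(self) -> None:
        self._tasks: Dict[str, asyncio.Task] = {}
        self._lock = asyncio.Lock()

    async def spawn(self, coro: Coroutine[Any, Any, Any]) -> str:  # POST
        task_id = uuid.uuid4().hex
        async with self._lock:
            self._tasks[task_id] = asyncio.create_task(coro)
        return task_id

    async def poll(self, task_id: str) -> Dict[str, Any]:  # GET
        async with self._lock:
            task = self._tasks.get(task_id)
        if task is None:
            return {"status": "not_found"}
        if not task.done():
            return {"status": "running"}
        if task.cancelled():
            return {"status": "cancelled"}
        if task.exception() is not None:
            return {"status": "failed", "error": str(task.exception())}
        return {"status": "completed", "result": task.result()}

    async def cancel(self, task_id: str) -> bool:  # DELETE
        async with self._lock:
            task = self._tasks.get(task_id)
        if task is None or task.done():
            return False
        task.cancel()
        return True


async def demo() -> List[Any]:
    manager = TaskManager()  # created inside the running event loop

    async def slow_job(x: int) -> int:
        await asyncio.sleep(0.01)
        return x * 2

    job = await manager.spawn(slow_job(21))
    states: List[Any] = [await manager.poll(job)]  # still running
    await asyncio.sleep(0.05)
    states.append(await manager.poll(job))  # finished

    long_job = await manager.spawn(asyncio.sleep(60))
    states.append(await manager.cancel(long_job))
    await asyncio.sleep(0.01)  # let the cancellation land
    states.append(await manager.poll(long_job))
    return states


for state in asyncio.run(demo()):
    print(state)
# {'status': 'running'}
# {'status': 'completed', 'result': 42}
# True
# {'status': 'cancelled'}
```

Keeping cancelled tasks in the map (rather than popping them) lets a later GET report `cancelled` instead of `not_found`; a long-running server would also need some eviction policy, which is left out here.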
Recap · Safety
A note on safety
- Virtual filesystem: no exposure of the host FS
- No shell: the agent cannot run arbitrary bash
- Read / write / edit only: no delete primitive at all
- No skills: custom behaviour comes in via an `AGENTS.md` file, not via potentially unvetted instructions
- Credential files excluded from the virtual FS: the agent can’t even read them
None of this prevents prompt injection from a malicious document the agent has been asked to read. The mitigations bound the blast radius; they don’t eliminate it.
Recap · Anatomy
The full anatomy
- The Brain: an LLM steered into Think / Act / Observe / Stop via structured outputs
- The Eyes: AgentFS, a sandboxed virtual filesystem with bounded primitives
- The Limbs: LlamaParse, LlamaExtract, LlamaClassify, each a typed tool call
- Ears & Mouth: Telegram (or FastAPI), async-first and notification-driven
Recap · Takeaways
Key takeaways
- Structured outputs are the single biggest lever for turning an LLM into a reliable agent component
- **Layout-aware doc tools (Parse / Extract / Classify)** are what allow the agent to really understand unstructured documents
- Event-driven workflows give you loops, branches and async for free
- Virtual filesystems let you grant filesystem-style tools without the filesystem-style risk
- Async interfaces are the right shape for long-running document work
Thank you! Questions?