@itsclelia: Had a blast yesterday attending @techeurope_'s Applied AI Conference in Berlin! I gave a talk about building document…


Summary

Attended the Applied AI Conference in Berlin and gave a talk on building document agents, including a detailed walkthrough of LobsterX, a document-processing agent built with LlamaIndex that uses structured outputs and event-driven workflows.

Had a blast yesterday attending @techeurope_'s Applied AI Conference in Berlin! I gave a talk about building document agents and agentic development in general, which you can find here: https://astrabert.github.io/agent-anatomy-presentation… Beyond my talk, I met lots of builders with an incredible energy and


The Anatomy of LobsterX

Source: https://astrabert.github.io/agent-anatomy-presentation/
Applied AI Conference · Berlin, 05/28/2026 · 01 / 17

The Anatomy of LobsterX

a document processing agent

Redefine Document Workflows with AI Agents

Intro

Brain

Loop

World

Recap

Introduction · 02 / 17

Hi, I’m Clelia

  • Member of Technical Staff @LlamaIndex, where I work on agents, retrieval systems and OSS
  • Background in computational biology, then slowly drifted into AI and engineering
  • I build small, opinionated agents to stress-test the framework I work on
  • Today: a guided tour of one of them → LobsterX 🦞

Introduction · 03 / 17

What is LobsterX?

A document-processing AI agent that lives in a Telegram chat. You send it a PDF and a task; it parses, extracts, classifies, reasons, and replies asynchronously when it’s done.

  • ~600 LOC of agent implementation
  • ~1.5k LOC of workflow orchestration underneath
  • 3 swappable LLM providers (OpenAI, Anthropic, Google)

Small enough to dissect on stage. Real enough to be interesting.

Introduction · 04 / 17

Why dissect an agent?

Most agents look like a black box: prompt in, answer out. The interesting engineering lives in the gap between those two. We’ll walk through it using four anatomical metaphors:

  • Brain: the LLM, steered into structured behaviour
  • Loop: the event-driven workflow that drives it
  • Eyes & Limbs: the filesystem and tools it touches
  • Ears & Mouth: how it talks to a human

The Brain · 05 / 17

The Brain: an LLM with a problem

  • The LLM is the only decision-maker: it picks what to think, what to call, when to stop
  • But LLMs are non-deterministic: same prompt, different shapes of output
  • For an agent, that’s fatal. You can’t parse “well I think maybe call the tool with...” with a regex and hope for the best
  • We need a way to constrain the model’s output to something the rest of the system can rely on

If the brain is unreliable, every downstream step inherits that unreliability. The fix has to start at the LLM call itself.

The Brain · 06 / 17

Steering: Structured Outputs

Every LLM call in LobsterX is constrained by a typed JSON schema. The model cannot reply with free-form prose; it must fill in a known shape.

  • One schema per operation: a Think looks different from an Act
  • Forces a clean separation between reasoning and action
  • The LLM wrapper exposes only structured-generation methods; there is no “raw chat” escape hatch in the agent code
  • Same schema works across OpenAI, Anthropic and Google providers
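
A minimal sketch of the "one schema per operation" idea, using only the standard library. The `Think` and `Act` field names and the `parse_as` helper are assumptions made for illustration, not LobsterX's actual schemas. Providers usually also accept the schema in the request itself; that part is not shown, only the validation of what comes back.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Think:
    """Reasoning step: free text is allowed, but only inside a known field."""
    reasoning: str
    done: bool


@dataclass(frozen=True)
class Act:
    """Action step: names a tool and carries its arguments."""
    tool: str
    arguments: dict


def parse_as(schema, raw: str):
    """Parse raw model output into `schema`, rejecting anything off-shape."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on prose
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = schema.__annotations__  # field name -> Python type
    if set(data) != set(expected):
        raise ValueError(f"expected fields {sorted(expected)}, got {sorted(data)}")
    for name, typ in expected.items():
        if not isinstance(data[name], typ):
            raise ValueError(f"field {name!r} must be {typ.__name__}")
    return schema(**data)


act = parse_as(Act, '{"tool": "parse", "arguments": {"path": "report.pdf"}}')
print(act.tool)  # prints "parse"

try:
    parse_as(Act, "well I think maybe call the tool with...")
except ValueError as err:
    print("rejected:", err)
```

Because every call goes through a function like `parse_as`, downstream steps only ever see a `Think` or an `Act`, never a string they have to interpret.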

The Loop · 07 / 17

The Loop: Agent Workflows

LobsterX is built on LlamaIndex Agent Workflows: an event-driven, async-first, stepwise execution engine.

  • Each step is a typed Python function that consumes one event type and emits another
  • No central orchestrator: the event types wire the steps together implicitly
  • Async by construction: a long-running tool call doesn’t block anything else
  • Loops are not special cases — they’re just steps that re-emit upstream event types
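
To make the "event types wire the steps together" point concrete, here is a toy engine written with plain asyncio. It is not the LlamaIndex API: the `step` decorator, the `STEPS` registry and the event classes are all hypothetical stand-ins that only mimic the idea of routing by the type of the incoming event.

```python
import asyncio
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class Start:
    text: str


@dataclass
class Tokens:
    words: list


@dataclass
class Stop:
    result: int


STEPS = {}  # event type -> the step function that consumes it


def step(fn):
    """Register `fn` under the annotated type of its `ev` parameter."""
    STEPS[get_type_hints(fn)["ev"]] = fn
    return fn


@step
async def tokenize(ev: Start) -> Tokens:
    return Tokens(words=ev.text.split())


@step
async def count(ev: Tokens) -> Stop:
    return Stop(result=len(ev.words))


async def run(event):
    """Follow the chain of emitted events until a Stop is produced."""
    while not isinstance(event, Stop):
        event = await STEPS[type(event)](event)
    return event.result


print(asyncio.run(run(Start("parse this invoice please"))))  # prints 4
```

Nothing in `run` names `tokenize` or `count`; adding a step means declaring a function with a new input type, and a loop is just a step whose return type is one that an earlier step consumes.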

The Loop · 08 / 17

Event-Driven Execution

  • Input: user prompt
  • Think: reason about next step
  • Act: call a tool
  • Observe: process result
  • Stop: final answer

Each arrow is a typed event. Observe re-enters Think until Think decides the task is done and emits Stop — at which point the workflow terminates and the answer goes back to the user.
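
The cycle described above can be sketched as a small state machine. This is an illustration under stated assumptions: the `think` method is a scripted stand-in for the LLM, Observe handling is folded into it for brevity, and the `max_turns` budget is an addition of this sketch rather than something the talk describes.

```python
from dataclasses import dataclass, field


@dataclass
class Input:
    prompt: str


@dataclass
class Act:
    tool: str
    argument: str


@dataclass
class Observe:
    result: str


@dataclass
class Stop:
    answer: str


@dataclass
class Agent:
    tools: dict
    notes: list = field(default_factory=list)  # scratchpad of observations

    def think(self, ev):
        """Stand-in for the LLM: act once, then stop with what was observed."""
        if isinstance(ev, Observe):
            self.notes.append(ev.result)
            return Stop(answer=f"Done: {ev.result}")
        return Act(tool="word_count", argument=ev.prompt)

    def act(self, ev: Act) -> Observe:
        return Observe(result=str(self.tools[ev.tool](ev.argument)))

    def run(self, prompt: str, max_turns: int = 10) -> str:
        ev = Input(prompt)
        for _ in range(max_turns):
            if isinstance(ev, Stop):
                return ev.answer
            # Input and Observe both route to Think; Act routes to the tool call.
            ev = self.act(ev) if isinstance(ev, Act) else self.think(ev)
        raise RuntimeError("turn budget exhausted")


agent = Agent(tools={"word_count": lambda text: len(text.split())})
print(agent.run("summarise this contract"))  # prints "Done: 3"
```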

World · 09 / 17

Three windows on the world

The brain and the loop are general-purpose scaffolding. What makes LobsterX a document agent is the three interfaces it exposes to the world.

  • Filesystem: where documents live and where the agent writes its outputs
  • Document Tools: parse, extract and classify unstructured content via LlamaCloud
  • Chat Interface: Telegram bot with async upload and notification

World · Filesystem · 10 / 17

The Eyes: a virtual filesystem

  • File ops route through AgentFS, a virtualized layer, not the real machine FS
  • The agent gets read / write / edit / grep / glob; no delete, no shell execution
  • Scope is bounded to a working directory; common credential files (.env and others) are excluded entirely
  • Telegram-uploaded PDFs are written into AgentFS, never your real disk

If the agent is jailbroken into writing something destructive, the damage stays inside the virtual FS. Nothing leaks to the host unless you explicitly sync it.
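
A rough in-memory sketch of a filesystem with those properties. This is not AgentFS and does not use its API; the `VirtualFS` class, the `/work` root and the `*.pem` entry in the deny-list are assumptions made for illustration (the talk only names .env).

```python
import fnmatch
import posixpath
import re


class VirtualFS:
    """In-memory file store exposing read / write / edit / grep / glob only."""

    EXCLUDED = (".env", "*.pem")  # assumed deny-list; the talk only names .env

    def __init__(self, root: str = "/work"):
        self.root = root
        self.files = {}  # normalised absolute path -> text content

    def _resolve(self, path: str) -> str:
        full = posixpath.normpath(posixpath.join(self.root, path))
        if not full.startswith(self.root + "/"):
            raise PermissionError(f"{path!r} escapes the working directory")
        name = posixpath.basename(full)
        if any(fnmatch.fnmatch(name, pattern) for pattern in self.EXCLUDED):
            raise PermissionError(f"{path!r} is an excluded credential file")
        return full

    def write(self, path: str, content: str) -> None:
        self.files[self._resolve(path)] = content

    def read(self, path: str) -> str:
        return self.files[self._resolve(path)]

    def edit(self, path: str, old: str, new: str) -> None:
        full = self._resolve(path)
        self.files[full] = self.files[full].replace(old, new, 1)

    def grep(self, pattern: str) -> list:
        rx = re.compile(pattern)
        return sorted(p for p, text in self.files.items() if rx.search(text))

    def glob(self, pattern: str) -> list:
        target = posixpath.join(self.root, pattern)
        return sorted(p for p in self.files if fnmatch.fnmatch(p, target))


fs = VirtualFS()
fs.write("invoice.txt", "Total: 42 EUR")
fs.edit("invoice.txt", "42", "43")
print(fs.read("invoice.txt"))  # prints "Total: 43 EUR"
print(fs.glob("*.txt"))        # prints ['/work/invoice.txt']
```

There is deliberately no `delete` method and no call that touches the host disk, so the worst a misbehaving agent can do here is overwrite entries in the `files` dict.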

World · Tools · 11 / 17

The Limbs: document tools

Filesystem ops alone only see plain text. To actually understand unstructured documents, the agent calls three LlamaCloud tools, each with its own typed input schema.

  • LlamaParse: full-text parsing of PDFs, Office docs and more via OCR, VLMs and agentic pipelines
  • LlamaExtract: schema-driven extraction; you define the JSON shape, the tool fills it in
  • LlamaClassify: classification into user-defined categories with confidence signals

World · Tools · 12 / 17

Why these tools change the game

A document agent is only as good as its eyes on unstructured content. Generic OCR isn’t enough — layout, tables and figures all carry meaning that naive text extraction loses.

  • LlamaParse: layout-aware parsing for PDFs, DOCX, PPTX, XLSX, images. Tables stay as tables; figures get described by VLMs; reading order is preserved across columns.
  • LlamaExtract: you hand it a JSON schema, it hands you back populated objects that are typed, citation-linked and validated. No glue prompt engineering on the agent side.
  • LlamaClassify: user-defined categories with confidence signals. The agent uses it to route documents (invoice? contract? report?) before deciding what to do next.

Each tool exposes a typed input schema, so the Act step can call them with full structured-output guarantees end to end.
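
The routing use of classification mentioned above might look like the following. Everything here is hypothetical: the `Classification` shape, the 0.8 threshold and the operation names in `NEXT_STEP` are illustrative assumptions, and no LlamaCloud call is made or modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    category: str      # one of the user-defined categories
    confidence: float  # assumed to be a score in [0, 1]


# Hypothetical mapping from category to the next operation the agent runs.
NEXT_STEP = {
    "invoice": "extract_invoice_fields",
    "contract": "extract_contract_clauses",
    "report": "parse_full_text",
}


def route(result: Classification, threshold: float = 0.8) -> str:
    """Pick the next operation, falling back to full parsing when unsure."""
    if result.confidence < threshold or result.category not in NEXT_STEP:
        return "parse_full_text"
    return NEXT_STEP[result.category]


print(route(Classification("invoice", 0.93)))   # prints "extract_invoice_fields"
print(route(Classification("contract", 0.41)))  # prints "parse_full_text"
```

Falling back to the most general operation on low confidence is one reasonable policy; asking the user to confirm the category would be another.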

World · Chat · 13 / 17

Ears & Mouth: async by default

  • Telegram was chosen specifically because messaging is async: no spinner, no held-open HTTP connection
  • Documents come in as message attachments and land in AgentFS. Text messages are dispatched as workflow inputs
  • Document workflows can take minutes to half an hour — the agent pings you back when done
  • This maps cleanly onto the workflow engine, which is async-first already

The right interface for a long-running agent isn’t a chatbot — it’s a colleague who replies when they’re finished.

World · API Mode · 14 / 17

Same agent, different shell

The Telegram bot is one frontend. The same agent core also runs as a FastAPI server, with the workflow’s async-first shape carried all the way through.

  • Task manager: in-memory dict of task_id → asyncio.Task, guarded by a lock. POST /task spawns, GET polls, DELETE cancels.
  • Rate limiting: per-endpoint, per-minute limits via fastapi-throttle; uploads, creates, polls and deletes each have their own budget
  • Auth & CORS: Starlette middlewares for bearer-token auth and configurable allowed origins
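
A sketch of the task-manager idea in plain asyncio, leaving out the FastAPI routes, rate limiting and auth. The `TaskManager` class, its method names and `fake_agent_run` are assumptions for illustration; the comments note which HTTP verb each method would sit behind.

```python
import asyncio
import uuid


class TaskManager:
    """Tracks running agent jobs as task_id -> asyncio.Task."""

    def __init__(self):
        self._tasks = {}
        self._lock = asyncio.Lock()

    async def spawn(self, coro) -> str:
        """Would back POST: start the job and return its id immediately."""
        task_id = uuid.uuid4().hex
        async with self._lock:
            self._tasks[task_id] = asyncio.create_task(coro)
        return task_id

    async def poll(self, task_id: str) -> dict:
        """Would back GET: report status without waiting for the job."""
        async with self._lock:
            task = self._tasks[task_id]
        if not task.done():
            return {"status": "running"}
        if task.cancelled():
            return {"status": "cancelled"}
        if task.exception() is not None:
            return {"status": "failed", "error": str(task.exception())}
        return {"status": "done", "result": task.result()}

    async def cancel(self, task_id: str) -> None:
        """Would back DELETE: forget the job and request cancellation."""
        async with self._lock:
            task = self._tasks.pop(task_id)
        task.cancel()


async def fake_agent_run(prompt: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a long document workflow
    return f"processed: {prompt}"


async def demo():
    manager = TaskManager()
    task_id = await manager.spawn(fake_agent_run("invoice.pdf"))
    first = await manager.poll(task_id)   # job has not finished yet
    await asyncio.sleep(0.05)
    second = await manager.poll(task_id)  # job has finished by now
    return first, second


print(asyncio.run(demo()))
```

Being in-memory, a registry like this loses all task state on restart, which is an acceptable trade-off for a small agent but worth knowing before relying on it.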

Recap · Safety · 15 / 17

A note on safety

  • Virtual filesystem: no exposure of the host FS
  • No shell: the agent cannot run arbitrary bash
  • Read / write / edit only: no delete primitive at all
  • No skills: custom behaviour comes in via an AGENTS.md file, not via potentially unvetted instructions
  • Credential files excluded from the virtual FS: the agent can’t even read them

None of this prevents prompt injection from a malicious document the agent has been asked to read. The mitigations bound the blast radius; they don’t eliminate it.

Recap · Anatomy · 16 / 17

The full anatomy

  • The Brain: LLM steered into Think / Act / Observe / Stop via structured outputs
  • The Eyes: AgentFS, a sandboxed virtual filesystem with bounded primitives
  • The Limbs: LlamaParse, LlamaExtract, LlamaClassify, each a typed tool call
  • Ears & Mouth: Telegram (or FastAPI), async-first and notification-driven

Recap · Takeaways · 17 / 17

Key takeaways

  • Structured outputs are the single biggest lever for turning an LLM into a reliable agent component
  • Layout-aware doc tools (Parse / Extract / Classify) are what allow the agent to really understand unstructured documents
  • Event-driven workflows give you loops, branches and async for free
  • Virtual filesystems let you grant filesystem-style tools without the filesystem-style risk
  • Async interfaces are the right shape for long-running document work

Thank you! Questions?
