Tag
This paper presents a microservice architecture for production document AI pipelines that combine classification, OCR, and LLM extraction, sharing design decisions and batch profiling insights that reveal OCR, not LLM parsing, dominates latency.