Tag
A developer argues that agentic workflows are expensive because they treat all inference as real-time, and asks the community for frameworks or patterns that natively support batch APIs to reduce costs.
The author describes how running multiple persistent AI agent profiles under Hermes led to high API costs. Setting a tiered model policy per profile, pre-processing inputs, and routing calls through an API gateway for cost visibility reduced daily costs from $14-18 to $7-10.
A discussion post about the high cost of running LLM agents, in which users share their frustrations and seek advice on tracking token spending and improving efficiency.