
PAPER 003 JULY 2026 · UPCOMING

Latency budgets for production RAG.

Lessons from architecting a RAG system serving 10M+ daily requests. How to reason about end-to-end latency, what to optimize first, and where the surprising bottlenecks usually are.

  • Infrastructure
  • ML systems
