Skip to main content

Cost Accountability

The Problem

Without cost visibility, engineers cannot make quality-cost tradeoffs. Without per-stage cost tracking, engineers cannot identify which stage is responsible for cost explosions.

Common anti-pattern: increasing top-k to improve retrieval quality without realising it drives quadratic LLM context processing costs. No mainstream RAG library surfaces this until after the bill arrives.

The RAG Axis Approach

Every pipeline result carries a CostReport. This is not a logging feature. It is part of the core return type.

CostReport contains:

  • tokens_consumed per stage (retrieval, reranking, context assembly, generation)
  • latency_ms per stage
  • estimated_cost_gbp (configurable currency)
  • total_tokens
  • cache_hit (bool, was this query served from cache)

Why Per-Stage

Aggregate cost per query is insufficient. Per-stage cost tells you:

  • Whether reranking a large candidate set is worth the latency
  • Whether your chunking strategy is producing oversized chunks
  • Whether semantic caching would pay for itself at your query volume

Non-Optional

CostReport is non-optional because optional cost tracking is cost tracking that does not exist in production.