Skip to main content

Your First RAG

This walkthrough builds a small retrieval pipeline from scratch using the core RAG Axis packages, then inspects the resulting RunResult in detail.

1. Ingest a Document

from ragaxis.core import Document
from ragaxis.ingestion import load_document

doc = load_document("notes/architecture.md")

load_document returns a frozen Document with metadata such as source path and content hash already populated.

2. Chunk It

from ragaxis.chunking import StructuralChunker

chunker = StructuralChunker()
chunks = chunker.split(doc)

Each Chunk carries parent_doc_id, position, and (once embedded) embedding_model_id, per invariant I1. This provenance is what lets RAG Axis trace a generated answer back to its source.

3. Embed and Index

from ragaxis.embedding import BatchEmbedder
from ragaxis.adapters.reference import InMemoryVectorStore

embedder = BatchEmbedder(model="text-embedding-3-small")
embedded = embedder.embed(chunks)

store = InMemoryVectorStore()
store.upsert(embedded)

The BatchEmbedder records the embedding model version on every chunk, so a later mismatch between index time and query time raises EmbeddingModelMismatchError instead of failing silently.

4. Retrieve

from ragaxis.retrieval import HybridRetriever

retriever = HybridRetriever(store=store)
result = retriever.query("How does RAG Axis assign chunk provenance?")

result is a RetrievalResult. It always carries a confidence signal, either a score or ConfidenceUnknown, per invariant I3.

5. Assemble Context and Generate

from ragaxis.context import ContextBudget, assemble
from ragaxis.generation import generate

budget = ContextBudget(max_tokens=4000)
context = assemble(result.chunks, budget=budget)

run_result = generate(query=result.query, context=context)

If any chunk had to be dropped to fit the budget, assemble emits a ContextTruncationEvent rather than dropping it silently, per invariant I2.

6. Inspect the RunResult

print(run_result.answer)
print(run_result.cost_report.tokens_consumed)
print(run_result.cost_report.estimated_cost_gbp)
print(run_result.cost_report.latency_ms)

Every field on run_result is immutable after construction (invariant I7). The cost_report breaks down tokens, latency, and estimated cost per stage, so you can see exactly where a query spent its time and budget.

Next Steps