How consolidation works

Consolidation is the process that transforms raw friction events into actionable organizational intelligence. It’s not summarization — it’s synthesis.

The pipeline

Events (raw friction data)
  → Batch builder (groups by repo + time window)
    → LLM analysis (resolve contradictions, score confidence)
      → Learnings (structured, queryable, ranked)
        → Backlog items (evidence-ranked PBIs)

1. Event collection

As agents work, they emit structured events via the MCP write tools — decisions, observations, deviations, completions, explorations. These are stored as immutable, append-only rows. An event looks like:

{
  "event_type": "decision",
  "payload": {
    "decision": "Using pgvector for semantic search",
    "rationale": "Keeps the stack simple...",
    "alternatives": "Pinecone, Qdrant",
    "confidence": 0.85
  },
  "session_id": "sess_abc123",
  "timestamp": "2026-04-09T10:30:00Z"
}

Events are the raw material. Individually they’re noisy. In aggregate they reveal where your organization is systematically stuck.
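
As a rough illustration of the event shape above, here is a minimal Python sketch of an immutable event record. Only the four top-level fields and the "decision" type come from the example; the other event type identifiers are assumed from the prose, and the `Event` class and `parse_event` helper are hypothetical names, not part of Oka's API.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from types import MappingProxyType
from typing import Any, Mapping

# Only "decision" appears in the example; the other identifiers are assumed.
EVENT_TYPES = {"decision", "observation", "deviation", "completion", "exploration"}


@dataclass(frozen=True)  # frozen mirrors the immutable, append-only storage model
class Event:
    event_type: str
    payload: Mapping[str, Any]
    session_id: str
    timestamp: datetime


def parse_event(raw: str) -> Event:
    """Parse one JSON event and reject unknown event types."""
    data = json.loads(raw)
    if data["event_type"] not in EVENT_TYPES:
        raise ValueError(f"unknown event_type: {data['event_type']!r}")
    # Rewrite the trailing "Z" so older Python versions can parse the timestamp.
    ts = datetime.fromisoformat(data["timestamp"].replace("Z", "+00:00"))
    # MappingProxyType gives a read-only view of the payload.
    return Event(data["event_type"], MappingProxyType(data["payload"]), data["session_id"], ts)
```

Freezing the record and wrapping the payload in a read-only view means a consumer cannot alter an event after it has been parsed, which matches the append-only intent.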

2. Batch building

Consolidation groups recent events by repository and time window (default: last 24 hours, up to 500 events per batch). The batch builder estimates each batch's token count to stay within the LLM's context limit and automatically splits batches that would exceed it.
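
The grouping and splitting described above might look roughly like the following sketch. The 24-hour window and 500-event cap are the documented defaults; the `repo` field on each event, the token budget value, and the characters-per-token heuristic are assumptions made for illustration, and `build_batches` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

MAX_EVENTS_PER_BATCH = 500      # default from the text
WINDOW = timedelta(hours=24)    # default from the text
TOKEN_BUDGET = 100_000          # assumed value; the real budget is not documented here


def estimate_tokens(event: dict) -> int:
    # Crude heuristic of about 4 characters per token; an assumption, not Oka's estimator.
    return max(1, len(str(event["payload"])) // 4)


def build_batches(events, now, token_budget=TOKEN_BUDGET, max_events=MAX_EVENTS_PER_BATCH):
    """Group recent events by repo, then split groups that exceed either limit."""
    by_repo = defaultdict(list)
    for ev in events:
        if now - ev["timestamp"] <= WINDOW:  # drop events outside the time window
            by_repo[ev["repo"]].append(ev)

    batches = []
    for repo, group in by_repo.items():
        group.sort(key=lambda ev: ev["timestamp"])
        current, used = [], 0
        for ev in group:
            cost = estimate_tokens(ev)
            # Close the current batch before it exceeds the event cap or the token budget.
            if current and (len(current) >= max_events or used + cost > token_budget):
                batches.append((repo, current))
                current, used = [], 0
            current.append(ev)
            used += cost
        if current:
            batches.append((repo, current))
    return batches
```

Sorting each group by timestamp before splitting keeps every batch a contiguous slice of time, so a later analysis step sees events in the order they happened.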

3. LLM analysis

An LLM reviews each batch alongside existing learnings (to avoid duplication). The consolidation prompt is a 148-line engineered template with few-shot examples. It instructs the LLM to:

Resolve contradictions. If Agent A chose approach X and Agent B later chose Y, produce a single learning explaining the transition with evidence from both.
Score confidence. Assign each learning a score from 0.0 to 1.0 based on evidence strength and consistency. A single event gets low confidence; twelve sessions confirming the same finding get high confidence.
Detect patterns. When multiple agents independently discover the same thing, consolidate into one high-confidence learning citing all source sessions.
Decay stale knowledge. Old learnings lose confidence if new evidence contradicts them.
Generate backlog items. High-confidence friction patterns that suggest a fix become evidence-ranked PBIs.

4. Output validation

Raw LLM output is validated: confidence clamped to [0, 1], categories normalized, duplicates rejected against existing learnings. Malformed entries are logged and dropped — consolidation never corrupts the learning store.
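
A minimal sketch of these validation rules, assuming the LLM returns a list of dictionaries with `summary`, `confidence`, and `category` keys. The function names are hypothetical, and the duplicate check here is a simple case-insensitive match on the summary; the real pipeline may compare learnings differently.

```python
from typing import Optional

CATEGORIES = {"architecture", "bug_pattern", "convention", "domain_knowledge", "pattern", "decision"}


def normalize_category(raw: str) -> Optional[str]:
    """Lowercase and snake_case a category; return None if it is not a known one."""
    cat = raw.strip().lower().replace("-", "_").replace(" ", "_")
    return cat if cat in CATEGORIES else None


def validate(candidates, existing_summaries):
    """Return (accepted, dropped) without raising on malformed entries."""
    seen = {s.strip().lower() for s in existing_summaries}
    accepted, dropped = [], []
    for item in candidates:
        try:
            summary = item["summary"].strip()
            confidence = min(1.0, max(0.0, float(item["confidence"])))  # clamp to [0, 1]
            category = normalize_category(item["category"])
        except (KeyError, TypeError, ValueError, AttributeError):
            dropped.append(item)  # malformed entry; a real pipeline would also log it
            continue
        key = summary.lower()
        if not summary or category is None or key in seen:
            dropped.append(item)  # empty, unknown category, or duplicate
            continue
        seen.add(key)
        accepted.append({**item, "summary": summary, "confidence": confidence, "category": category})
    return accepted, dropped
```

Returning the dropped entries instead of raising keeps one bad item from aborting the whole run, which is the property the paragraph above describes.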

5. Storage

Valid learnings are stored with:

summary — one sentence capturing the insight
confidence — evidence-based score
category — architecture, bug_pattern, convention, domain_knowledge, pattern, decision
status — active, reinforced, contradicted, obsolete
tags — area tags for filtering
file_patterns — glob patterns for affected files
source_sessions — which sessions contributed
evidence_count — how many independent sources confirm this
suggested_action — what to do about it (if applicable)
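
To make the record concrete, the fields above could be modeled as follows. This is a sketch, not Oka's schema: the Python types, the default values, and the `relevant_learnings` helper are assumptions, and glob matching uses the standard library's `fnmatchcase`, where `*` also matches path separators.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class Learning:
    summary: str
    confidence: float
    category: str
    status: str = "active"  # active, reinforced, contradicted, obsolete
    tags: List[str] = field(default_factory=list)
    file_patterns: List[str] = field(default_factory=list)
    source_sessions: List[str] = field(default_factory=list)
    evidence_count: int = 1
    suggested_action: Optional[str] = None


def relevant_learnings(learnings, path, min_confidence=0.5):
    """Live learnings whose globs match the path, highest confidence first."""
    live = {"active", "reinforced"}  # assumed: contradicted and obsolete are excluded
    hits = [
        item for item in learnings
        if item.status in live
        and item.confidence >= min_confidence
        and any(fnmatchcase(path, pattern) for pattern in item.file_patterns)
    ]
    return sorted(hits, key=lambda item: item.confidence, reverse=True)
```

A lookup like `relevant_learnings(store, "src/auth/login.py")` shows how `status`, `confidence`, and `file_patterns` can work together as filters when an agent asks what is known about a file.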

Why LLM, not ETL

Reasoning events are inherently messy:

Contradictions are common. Agent A decides to use approach X; Agent B in a later session discovers Y is better. A SQL pipeline would store both. An LLM resolves the contradiction with evidence.
Context collapse. A deviation event in isolation (“used REST instead of gRPC”) is ambiguous. The LLM cross-references the completion event from the same session (“outcome: success”) and derives meaning.
Patterns emerge. When 12 agents all struggle with the auth module, a pipeline stores 12 events. The LLM consolidates them into one high-confidence learning citing 12 sources.

The feedback loop

Consolidation quality improves over time through three mechanisms:

Explicit feedback. When agents call feedback(learning_id, useful: true/false), that signal feeds back into future consolidation runs and trains the learning-to-rank model that powers context results.
Implicit feedback. When context returns learnings and the agent uses some but ignores others, the used learning IDs are tracked as positive signals.
Override events. When a human PM rejects or modifies a backlog item that consolidation generated, that override is captured as a structured event and fed back into future runs — tuning the consolidation prompt for that area.

The more your team uses Oka, the better consolidation gets. This isn’t a platitude — it’s a mechanical consequence of the feedback data accumulating.

Scheduling

Consolidation runs automatically on a configurable interval (default: every hour). You can also trigger it manually via the consolidate MCP tool or the POST /api/reasoning/consolidate HTTP endpoint.

After a burst of activity (end of sprint, major incident), triggering a manual consolidation run ensures insights are available immediately rather than waiting for the next scheduled cycle.
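
A manual trigger over HTTP could be prepared along these lines. Only the method and path come from the text above. The base URL, the bearer-token header, and the empty JSON body are assumptions, since the request schema and authentication are not described here, and `build_consolidate_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_consolidate_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the consolidation endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/reasoning/consolidate",
        data=json.dumps({}).encode("utf-8"),  # empty body: assumed, schema not documented here
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # auth scheme is an assumption
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Check the actual API reference for the required headers and body before relying on this shape.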

Monitoring

The consolidation pipeline emits OpenTelemetry metrics:

Runs completed/failed
Events processed per run
Learnings produced per run
LLM tokens consumed / latency / cost
Backlog items generated

These metrics are exposed at the metrics endpoint on reason.oka.so and can be scraped into your existing observability stack.