How consolidation works
Consolidation is the process that transforms raw friction events into actionable organizational intelligence. It’s not summarization — it’s synthesis.
The pipeline
Events (raw friction data)
→ Batch builder (groups by repo + time window)
→ LLM analysis (resolve contradictions, score confidence)
→ Learnings (structured, queryable, ranked)
→ Backlog items (evidence-ranked PBIs)
1. Event collection
As agents work, they emit structured events via the MCP write tools — decisions, observations, deviations, completions, explorations. These are stored as immutable, append-only rows. An event looks like:
{
  "event_type": "decision",
  "payload": {
    "decision": "Using pgvector for semantic search",
    "rationale": "Keeps the stack simple...",
    "alternatives": "Pinecone, Qdrant",
    "confidence": 0.85
  },
  "session_id": "sess_abc123",
  "timestamp": "2026-04-09T10:30:00Z"
}
Events are the raw material. Individually they’re noisy. In aggregate they reveal where your organization is systematically stuck.
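To make the "immutable, append-only" property concrete, here is a minimal Python sketch of an event record and an in-memory log. It is an illustration, not Oka's storage layer: the `Event` and `EventLog` names are hypothetical, and the singular event type strings other than "decision" are assumed from the list of event kinds above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any

# "decision" appears in the sample event; the other names are assumed
# singular forms of the event kinds listed in the text.
EVENT_TYPES = {"decision", "observation", "deviation", "completion", "exploration"}


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Event:
    event_type: str
    payload: dict[str, Any]
    session_id: str
    timestamp: str  # ISO 8601, as in the sample event

    def __post_init__(self) -> None:
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event_type: {self.event_type!r}")


class EventLog:
    """Hypothetical in-memory stand-in for an append-only event store."""

    def __init__(self) -> None:
        self._rows: list[str] = []

    def append(self, event: Event) -> None:
        # Serialize on write so later changes to the payload dict
        # cannot alter a row that has already been stored.
        self._rows.append(json.dumps(asdict(event), sort_keys=True))

    def rows(self) -> tuple[str, ...]:
        # Expose a read-only view; there is no update or delete method.
        return tuple(self._rows)
```

The log only offers `append` and `rows`, which is the whole point: history is added to and never rewritten.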
2. Batch building
Consolidation groups recent events by repository and time window (default: last 24 hours, up to 500 events per batch). The batch builder estimates token budgets to stay within LLM context limits and splits large batches automatically.
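The grouping and splitting logic can be sketched roughly as follows. The 24-hour window and the 500-event cap come from the text; everything else is an assumption made for illustration, including the `repo` field on each event, the 8,000-token budget, and the four-characters-per-token estimate.

```python
import json
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)     # default window from the text
MAX_EVENTS_PER_BATCH = 500       # default cap from the text
TOKEN_BUDGET = 8_000             # assumed value, not documented
CHARS_PER_TOKEN = 4              # assumed rough heuristic


def estimate_tokens(event: dict) -> int:
    """Crude token estimate based on serialized length."""
    return max(1, len(json.dumps(event)) // CHARS_PER_TOKEN)


def build_batches(events, now, token_budget=TOKEN_BUDGET):
    """Group recent events by repo, then split on event count or token budget."""
    cutoff = now - WINDOW
    by_repo = defaultdict(list)
    for ev in events:
        ts = datetime.fromisoformat(ev["timestamp"].replace("Z", "+00:00"))
        if cutoff <= ts <= now:
            by_repo[ev["repo"]].append((ts, ev))  # "repo" is a hypothetical field

    batches = []
    for repo, stamped in by_repo.items():
        stamped.sort(key=lambda pair: pair[0])  # oldest first
        current, used = [], 0
        for _, ev in stamped:
            cost = estimate_tokens(ev)
            full = len(current) >= MAX_EVENTS_PER_BATCH or used + cost > token_budget
            if current and full:
                batches.append((repo, current))
                current, used = [], 0
            current.append(ev)
            used += cost
        if current:
            batches.append((repo, current))
    return batches
```

An event that alone exceeds the budget still gets its own batch in this sketch; a real builder would need a policy for truncating or skipping it.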
3. LLM analysis
An LLM reviews each batch alongside existing learnings (to avoid duplication). The consolidation prompt is a 148-line engineered template with few-shot examples. It instructs the LLM to:
- Merge contradictions. If Agent A chose approach X and Agent B later chose Y, produce a single learning explaining the transition with evidence from both.
- Score confidence. 0.0–1.0 based on evidence strength and consistency. A single event gets low confidence; twelve sessions confirming the same finding get high confidence.
- Detect patterns. When multiple agents independently discover the same thing, consolidate into one high-confidence learning citing all source sessions.
- Decay stale knowledge. Old learnings lose confidence if new evidence contradicts them.
- Generate backlog items. High-confidence friction patterns that suggest a fix become evidence-ranked PBIs.
4. Output validation
Raw LLM output is validated: confidence clamped to [0, 1], categories normalized, duplicates rejected against existing learnings. Malformed entries are logged and dropped — consolidation never corrupts the learning store.
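A validation pass along these lines might look like the sketch below. The category list is taken from the storage fields in this document; the function names are hypothetical, and duplicate detection by case-insensitive summary match is a simplifying assumption (the real check may well be semantic).

```python
import logging
import math
from typing import Optional

logger = logging.getLogger("consolidation.validation")

CATEGORIES = {
    "architecture", "bug_pattern", "convention",
    "domain_knowledge", "pattern", "decision",
}


def normalize_category(raw: str) -> Optional[str]:
    """Map variants such as 'Bug-Pattern' onto a known category, else None."""
    key = raw.strip().lower().replace("-", "_").replace(" ", "_")
    return key if key in CATEGORIES else None


def validate_learnings(raw_entries, existing_summaries):
    """Return (valid, dropped). Malformed or duplicate entries never reach storage."""
    seen = {s.strip().lower() for s in existing_summaries}
    valid, dropped = [], []
    for entry in raw_entries:
        try:
            summary = str(entry["summary"]).strip()
            confidence = float(entry["confidence"])
            category = normalize_category(str(entry["category"]))
        except (KeyError, TypeError, ValueError):
            logger.warning("dropping malformed entry: %r", entry)
            dropped.append(entry)
            continue
        if not summary or category is None or math.isnan(confidence):
            logger.warning("dropping invalid entry: %r", entry)
            dropped.append(entry)
            continue
        if summary.lower() in seen:  # assumed duplicate rule: same summary text
            logger.info("dropping duplicate: %r", summary)
            dropped.append(entry)
            continue
        seen.add(summary.lower())
        clamped = min(1.0, max(0.0, confidence))  # clamp to [0, 1]
        valid.append({**entry, "summary": summary,
                      "confidence": clamped, "category": category})
    return valid, dropped
```

Returning the dropped entries alongside the valid ones keeps the function side-effect free apart from logging, which makes the drop rate easy to monitor.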
5. Storage
Valid learnings are stored with:
- summary — one sentence capturing the insight
- confidence — evidence-based score
- category — architecture, bug_pattern, convention, domain_knowledge, pattern, decision
- status — active, reinforced, contradicted, obsolete
- tags — area tags for filtering
- file_patterns — glob patterns for affected files
- source_sessions — which sessions contributed
- evidence_count — how many independent sources confirm this
- suggested_action — what to do about it (if applicable)
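As an illustration of how these fields fit together, the sketch below models a learning as a Python dataclass and looks up the learnings relevant to a file via `file_patterns`. The field names follow the list above; the defaults, the `learnings_for_file` helper, and the choice to skip obsolete learnings are assumptions for the example.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class Learning:
    summary: str
    confidence: float
    category: str
    status: str = "active"  # active, reinforced, contradicted, obsolete
    tags: list[str] = field(default_factory=list)
    file_patterns: list[str] = field(default_factory=list)
    source_sessions: list[str] = field(default_factory=list)
    evidence_count: int = 1
    suggested_action: Optional[str] = None


def learnings_for_file(learnings, path, min_confidence=0.0):
    """Hypothetical query: non-obsolete learnings whose globs match `path`,
    highest confidence first."""
    hits = [
        item for item in learnings
        if item.status != "obsolete"
        and item.confidence >= min_confidence
        and any(fnmatchcase(path, pattern) for pattern in item.file_patterns)
    ]
    return sorted(hits, key=lambda item: item.confidence, reverse=True)
```

Note that `fnmatchcase` lets `*` match across `/`, so `src/*` also matches `src/auth/token.py`; a stricter glob dialect would behave differently.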
Why LLM, not ETL
Reasoning events are inherently messy:
- Contradictions are common. Agent A decides to use approach X; Agent B in a later session discovers Y is better. A SQL pipeline would store both. An LLM resolves the contradiction with evidence.
- Context collapse. A deviation event in isolation (“used REST instead of gRPC”) is ambiguous. The LLM cross-references the completion event from the same session (“outcome: success”) and derives meaning.
- Patterns emerge. When 12 agents all struggle with the auth module, a pipeline stores 12 events. The LLM consolidates them into one high-confidence learning citing 12 sources.
The feedback loop
Consolidation quality improves over time through three mechanisms:
- Explicit feedback. When agents call feedback(learning_id, useful: true/false), that signal feeds back into future consolidation runs and trains the learning-to-rank model that powers context results.
- Implicit feedback. When context returns learnings and the agent uses some but ignores others, the used learning IDs are tracked as positive signals.
- Override events. When a human PM rejects or modifies a backlog item that consolidation generated, that override is captured as a structured event and fed back into future runs, tuning the consolidation prompt for that area.
The more your team uses Oka, the better consolidation gets. This isn’t a platitude — it’s a mechanical consequence of the feedback data accumulating.
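The signal-collection side of this loop can be pictured with a small Python sketch. It only tallies explicit and implicit signals per learning ID; how Oka actually stores them or trains its ranking model is not described here, so the `FeedbackSignals` class and its methods are purely hypothetical.

```python
from collections import Counter


class FeedbackSignals:
    """Hypothetical per-learning tally of feedback signals."""

    def __init__(self) -> None:
        self.positive: Counter = Counter()
        self.negative: Counter = Counter()

    def explicit(self, learning_id: str, useful: bool) -> None:
        # Mirrors a feedback(learning_id, useful) call from an agent.
        target = self.positive if useful else self.negative
        target[learning_id] += 1

    def implicit(self, returned_ids, used_ids) -> None:
        # Only learnings that were both returned and used count as positive.
        # Ignored learnings are not penalized in this sketch.
        for learning_id in set(returned_ids) & set(used_ids):
            self.positive[learning_id] += 1

    def net(self, learning_id: str) -> int:
        return self.positive[learning_id] - self.negative[learning_id]
```

For example, after `signals.implicit(["l1", "l2", "l3"], ["l2"])`, only `l2` gains a positive signal.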
Scheduling
Consolidation runs automatically on a configurable interval (default: every hour). You can also trigger it manually via the consolidate MCP tool or the POST /api/reasoning/consolidate HTTP endpoint.
After a burst of activity (end of sprint, major incident), triggering a manual consolidation run ensures insights are available immediately rather than waiting for the next scheduled cycle.
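A manual trigger over HTTP could be scripted along these lines. Only the method and path (POST /api/reasoning/consolidate) come from this page; the base URL, the bearer-token authentication, and the empty JSON body are assumptions, so check them against your deployment before relying on this.

```python
import json
import urllib.request


def build_consolidate_request(base_url: str, token: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/reasoning/consolidate",
        data=json.dumps({}).encode("utf-8"),      # assumed: empty JSON body
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",   # assumed auth scheme
        },
    )


def trigger_consolidation(base_url: str, token: str, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code."""
    request = build_consolidate_request(base_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `trigger_consolidation("https://oka.example.invalid", "my-token")` would issue the request; the host shown is a placeholder, not a real endpoint.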
Monitoring
The consolidation pipeline emits OpenTelemetry metrics:
- Runs completed/failed
- Events processed per run
- Learnings produced per run
- LLM tokens consumed / latency / cost
- Backlog items generated
These are available at reason.oka.so’s metrics endpoint and can be scraped into your existing observability stack.