Appearance
ADR-006: Compaction Pipeline Architecture
Status
Accepted (v0.5.0)
Context
Agent memory grows unbounded in long-running sessions. A single conversation can accumulate thousands of context entries, eventually exceeding model context windows and degrading response quality. Academic research (Morph FlashCompact) demonstrates that multi-stage compression achieves significantly better retention (37% loss) compared to single-pass approaches (98% loss), validating a layered strategy.
The initial approach of truncating old messages was too aggressive — important context from early in the conversation was lost, causing the agent to repeat questions or forget constraints the user had already provided.
Decision
Implement a 6-stage layered compaction pipeline with pressure-based triggers:
- Snip — Remove tool call/result pairs that are no longer referenced. Lightest operation, no LLM involvement. Triggered at 80% memory pressure.
- Micro — Summarize individual tool results into one-line summaries. Lightweight LLM call per result. Triggered at 84% memory pressure.
- Collapse — Merge consecutive same-role messages into single messages. No LLM involvement. Triggered at 88% memory pressure.
- Auto — LLM-generated summary of a sliding window of messages, replacing the window with a summary message. Triggered at 92% memory pressure.
- Partial — Summarize everything except the last N messages (configurable). Aggressive LLM summarization. Triggered at 95% memory pressure.
- Full — Emergency compaction: summarize the entire conversation into a single context message. Only triggered at 98% memory pressure as a last resort.
Pressure-based triggers start at 80% of the configured token budget. Each stage is progressively more aggressive. The pipeline short-circuits when pressure drops below the threshold for the next stage.
Circuit breaker protects against LLM-backed compaction failures (stages 2, 4, 5, 6). After 3 consecutive failures, the circuit opens and the pipeline falls back to non-LLM stages only. The circuit resets after a configurable cooldown period.
All thresholds are configurable via the CompactionThresholds record (9 fields): snipThreshold, microThreshold, collapseThreshold, autoThreshold, partialThreshold, fullThreshold, retainCount, circuitBreakerFailures, circuitBreakerCooldown.
Consequences
- Positive: Memory pressure is handled gracefully with proportional response — light operations first, heavy operations only when necessary.
- Positive: Each stage is independently configurable and testable. Operators can tune thresholds per use case (e.g., customer support may want higher
retainCount). - Positive: The 80% trigger threshold is validated by industry research (Morph FlashCompact) as the optimal point to begin compaction before quality degradation.
- Positive: Circuit breaker prevents cascading failures when the LLM is unavailable — the agent continues operating with reduced but functional context management.
- Negative: 6 stages add complexity. Operators must understand the pipeline to tune it effectively. Mitigated by sensible defaults in
CompactionThresholds. - Negative: LLM-backed stages (Micro, Auto, Partial, Full) incur additional API costs. The pressure-based triggering minimizes unnecessary calls.
References
CompactionStrategy.java— SPI for compaction stagesCompactionConfig.java— Threshold configurationCompactionResult.java— Stage execution results- Morph FlashCompact research — Multi-stage compression validation