Appearance
ADR-022 — KairoEventBus → OpenTelemetry Exporter (v0.9)
Status
Accepted — implemented in v0.9.0. Exporter lives in kairo-observability (io.kairo.observability.event.KairoEventOTelExporter); auto-configuration ships in kairo-spring-boot-starter-observability. Supersedes the stub note in the v0.7 security-observability schema ("An OTel-based SecurityEventSink implementation is planned for v0.8") — v0.9 turns that plan into a concrete exporter that covers every KairoEventBus domain, not just security.
Context
ADR-018 introduced KairoEventBus as the single publication surface for execution, evolution, security, and team-orchestration envelopes. With the bus in place, shipping observability is no longer each emitter's concern — but we still needed a concrete bridge to OpenTelemetry for operators who run Grafana / Datadog / Splunk on the other end.
The v0.9 remaining-P0 scoping doc (docs/roadmap/v0.9-remaining-p0-scoping.md, item P0#3 / D3) locked the shape:
- Bridge every
KairoEventenvelope to an OTelLogRecord(not span, not metric) — logs match the "point-in-time structured event" model every existing Kairo sink already uses. - Security envelopes are always-on; execution/team/evolution honour a configurable
samplingRatioso sustained throughput does not drown the backend. - Attribute namespace is
kairo.<domain>.*— every bus attribute gets flattened under this prefix so dashboards can pivot across domains without colliding with unrelated OTel resource attributes. - The exporter must never throw across the bus boundary: a downstream OTel failure must not break Kairo's publish path.
Decision
Component surface
KairoEventOTelExporter(final class,@Experimentalthrough v0.9):- Constructor:
(KairoEventBus, LoggerProvider, Set<String> includeDomains, double samplingRatio, @Nullable Pattern redactKeys). - Lifecycle:
start()subscribes tobus.subscribe();stop()disposes the subscription. Both idempotent. - Counters:
exportedCount(),droppedByDomainCount(),droppedBySamplingCount()— exposed for tests and lightweight dashboards. Production deployments should prefer their own OTel SDK metrics. - Errors inside
export()are caught, logged at WARN, and swallowed — the bus never sees them.
- Constructor:
Mapping contract
- Envelope-level attributes on every record:
kairo.event.id,kairo.domain,kairo.event.type. - Per-domain attributes: for each entry in
event.attributes(), the flat key iskairo.<domain>.<originalKey>and the value isString.valueOf(v). Nested payload objects are not flattened — publishers are responsible for producing bus-safe attributes. LogRecord.body=event.eventType();LogRecord.severityText= domain name;LogRecord.observedTimestamp=event.timestamp().- Severity =
WARNforsecurity,INFOfor every other domain.
Filtering + sampling
includeDomainsdrops any envelope whosedomain()is not in the set. Counted indroppedByDomainCount.- Non-security envelopes are additionally gated by
ThreadLocalRandom.nextDouble() < samplingRatio. Ratio is validated on construction; only[0.0, 1.0]is accepted. Security envelopes skip this check entirely.
Redaction
- Optional
Patternmatched against the flat key (e.g.kairo.execution.prompt). On match, the value is replaced with the literal<redacted>before being attached to the record. Redaction is key-driven because attribute values are already opaque to the exporter; operators compose the pattern from the starter'sredactAttributePatternslist.
Spring Boot starter
kairo-spring-boot-starter-observability:
- Opt-in via
kairo.observability.event-otel.enabled=true(defaultfalse, consistent with every other v0.9 starter). - Wires a single
KairoEventOTelExporterbean when aLoggerProviderbean is available. WithoutLoggerProviderthe starter deliberately stays inert — applications bring their own OTel SDK (typically the OTel Spring Boot starter), matching the deny-safe posture in ADR-021. auto-start=true(default) drivesexporter.start()in the bean factory; operators who want manual lifecycle flip it off.- Default
include-domains=[security],sampling-ratio=1.0, no redaction. Execution/team/evolution require explicit opt-in.
Consequences
- Pros
- One contract covers every bus domain — operators dashboard-once, not per-subsystem.
- Sampling keeps Kairo honest under load: a chatty execution log does not DoS the OTel backend, but security decisions always make it out.
- Redaction is enforced in the exporter, not the publisher, so the bus keeps full fidelity for in-process subscribers while external observability is sanitised.
- Deny-safe starter posture — no auto-configured transport unless the application brings a
LoggerProvider.
- Cons
- Flattening-only: nested structures in
event.attributes()are stringified, which loses structure. Publishers that care must pre-flatten (documented in the security-observability-schema guide). - Uniform sampler: one ratio across execution/team/evolution. Per-domain sampling is deferred — adapters that need it can wrap the bus with a domain-specific filter before publishing.
- Sampling uses
ThreadLocalRandom, not a deterministic hash over trace id. That is fine for log-record sampling but means two different Kairo processes won't agree on which envelope to drop. Future work: parent-based sampling once Kairo emits trace context.
- Flattening-only: nested structures in
- Deferred to post-v0.9
- Span emission for execution envelopes (OTel
Tracerintegration). Logs were deliberately chosen first because every existing Kairo sink expects a point-in-time record, not a span. - Parent-based / trace-id sampling (needs Kairo to emit trace context on the bus first).
- Metric aggregates (e.g.
kairo.events.dropped.totalas an OTel counter). Today we exposedroppedBySamplingCount()/droppedByDomainCount()as plain Java getters; migrating them toLongCounteris a drop-in change.
- Span emission for execution envelopes (OTel
Non-goals (v0.9)
- Replacing
LoggingSecurityEventSink. The SLF4J sink keeps running in parallel for legacy dashboards; the OTel path is strictly additive. - Shipping an OTLP exporter inside this module. Applications wire the concrete OTel exporter (OTLP / Jaeger / Zipkin / File) themselves via their
LoggerProvider. - A metrics pipeline. Logs only in v0.9; metrics and spans come with ADR-023/024 when we have real telemetry from v0.9 adopters to justify their shape.