Trama

Observability

Track Trama runtime health with Prometheus metrics, Grafana dashboards, and OpenTelemetry traces across queue claims, failures, latency, and Redis shard ownership. Use this page when you need to answer “is the orchestrator healthy?” and “where are workflows getting stuck?”

Best for SREs · Metrics and tracing · Production operations

What you can answer with this data

Use Trama observability to answer whether workflows are flowing through the queue, whether failures are climbing, whether Redis shard ownership is healthy, and whether end-to-end latency is staying within expectations.
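The queue-flow and failure questions above can be roughed out from a single scrape. The following is a minimal sketch, not Trama tooling: it parses Prometheus text exposition output and uses counter names from the Metrics Catalog on this page. The sample values, the `reason` label values, and the choice to divide failures by `saga_processed_total` are assumptions made for illustration. Counters also reset when a process restarts, so production dashboards would normally compare rates over a time window rather than raw totals.

```python
from collections import defaultdict

# Hypothetical scrape output. Values and label values are invented for illustration.
SAMPLE = """\
# TYPE saga_enqueue_total counter
saga_enqueue_total 1200
saga_dequeue_total 1180
saga_processed_total 1150
saga_failed_total{reason="timeout"} 12
saga_failed_total{reason="handler_error"} 11
"""


def sum_by_metric(exposition: str) -> dict[str, float]:
    """Sum sample values per metric name, ignoring labels and comment lines."""
    totals: dict[str, float] = defaultdict(float)
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # A sample looks like "name{labels} value" or "name value".
        # Assumes no trailing timestamp, so the value is the last field.
        series, value = line.rsplit(None, 1)
        name = series.split("{", 1)[0]
        totals[name] += float(value)
    return dict(totals)


def flow_summary(totals: dict[str, float]) -> dict[str, float]:
    """Derive a rough backlog and failure ratio from cumulative counters."""
    enqueued = totals.get("saga_enqueue_total", 0.0)
    dequeued = totals.get("saga_dequeue_total", 0.0)
    processed = totals.get("saga_processed_total", 0.0)
    failed = totals.get("saga_failed_total", 0.0)
    return {
        # Work that entered the queue but has not been claimed yet.
        "unclaimed": enqueued - dequeued,
        # Assumption: processed outcomes are the right denominator for failures.
        "failure_ratio": failed / processed if processed else 0.0,
    }


totals = sum_by_metric(SAMPLE)
summary = flow_summary(totals)
print(summary)
```

A steadily growing `unclaimed` value suggests claimers are falling behind ingress, while a rising `failure_ratio` points at the failure reasons as the next place to look.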

Metrics Catalog

| Metric | Type | Description |
| --- | --- | --- |
| saga_enqueue_total | Counter | Queue ingress |
| saga_dequeue_total | Counter | Queue claims |
| saga_processed_total | Counter | Processed outcomes |
| saga_failed_total | Counter | Failures by reason |
| saga_retried_total | Counter | Retry scheduling rate |
| saga_rate_limited_total | Counter | Rate-limited executions |
| saga_redis_claim_scans_total | Counter | Redis shard claim scans performed by queue claimers |
| saga_redis_active_pods | Gauge | Healthy worker pods seen in Redis membership |
| saga_redis_owned_shards | Gauge | Virtual shards currently assigned to this worker |
| saga_redis_membership_refresh_age_ms | Gauge | Age of the last successful membership refresh |
| saga_duration_seconds | Histogram | End-to-end duration |
| saga_step_duration_success_seconds | Histogram | Successful step duration (v1) |
| saga_node_duration_seconds | Histogram | Per-node execution duration (v2) |
| saga_callback_received_total | Counter | Async callbacks delivered to the orchestrator |
| saga_callback_rejected_total | Counter | Async callbacks rejected (expired, bad signature, replay, attempt mismatch) |
| saga_callback_timeout_total | Counter | Async tasks that timed out waiting for a callback |
| saga_switch_evaluations_total | Counter | Switch node evaluations |
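The histogram metrics in the catalog, such as saga_duration_seconds, expose cumulative buckets rather than raw durations, so percentiles have to be estimated. The sketch below shows the usual linear-interpolation approach, the same idea PromQL's `histogram_quantile` function applies. The bucket boundaries and counts are hypothetical; this page does not state which buckets Trama actually exports.

```python
import math

# Hypothetical cumulative buckets as (upper bound in seconds, cumulative count).
# Real bucket boundaries are not given in this page.
BUCKETS = [(0.5, 40), (1.0, 70), (2.5, 90), (5.0, 98), (math.inf, 100)]


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    Interpolates linearly inside the bucket that contains the target rank.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets:
        return math.nan
    ordered = sorted(buckets)
    total = ordered[-1][1]
    if total == 0:
        return math.nan  # no observations yet

    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound):
                # Target falls in the overflow bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return prev_bound  # empty bucket, nothing to interpolate
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


p50 = histogram_quantile(0.50, BUCKETS)
p95 = histogram_quantile(0.95, BUCKETS)
print(f"p50={p50:.2f}s p95={p95:.2f}s")
```

Because the estimate interpolates within a bucket, its precision is limited by bucket width: a p95 that lands in a wide bucket can move noticeably without any single workflow changing much.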

The Redis cluster rollout adds worker-coordination metrics so you can observe shard ownership, claim pressure, and membership freshness without inspecting Redis directly.
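As an illustration of how these worker-coordination gauges might be read together, here is a minimal sketch of a per-worker health check. The `WorkerSnapshot` type, the `coordination_warnings` helper, the expected pod count, and the 30-second staleness threshold are all hypothetical; appropriate thresholds depend on how often membership is refreshed in your deployment.

```python
from dataclasses import dataclass


@dataclass
class WorkerSnapshot:
    """Gauge values read from one worker's scrape (hypothetical container type)."""
    active_pods: float                # saga_redis_active_pods
    owned_shards: float               # saga_redis_owned_shards
    membership_refresh_age_ms: float  # saga_redis_membership_refresh_age_ms


def coordination_warnings(
    snap: WorkerSnapshot,
    expected_pods: int,
    max_refresh_age_ms: float = 30_000,  # assumed threshold, not a Trama default
) -> list[str]:
    """Return human-readable warnings for one worker's coordination state."""
    warnings: list[str] = []
    if snap.membership_refresh_age_ms > max_refresh_age_ms:
        warnings.append("membership refresh is stale")
    if snap.active_pods < expected_pods:
        warnings.append(
            f"only {snap.active_pods:g} of {expected_pods} expected pods are active"
        )
    if snap.owned_shards == 0:
        # May be transient during rebalancing; worth alerting only if it persists.
        warnings.append("worker owns no shards")
    return warnings


degraded = WorkerSnapshot(active_pods=3, owned_shards=0, membership_refresh_age_ms=45_000)
healthy = WorkerSnapshot(active_pods=4, owned_shards=16, membership_refresh_age_ms=1_200)
print(coordination_warnings(degraded, expected_pods=4))
print(coordination_warnings(healthy, expected_pods=4))
```

Checking the three gauges together helps separate a single stuck worker (stale refresh, no shards) from a cluster-wide capacity drop (fewer active pods on every worker).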

Definition-Level Labels

Grafana Dashboard

Import grafana/trama-saga-dashboard.json.

Suggested Alerts

Tracing

OpenTelemetry spans cover request handling and saga processing when telemetry is enabled.