Trama

Configuration

Review the Trama configuration surface for workers, Redis topology, queue sharding, telemetry, and deployment profiles. Start with local defaults, then move into worker scaling and Redis Cluster settings when preparing production environments.

Reference page | Operators and platform teams | Local to production

Start here

For local development, focus on runtime enablement, Redis/Postgres connectivity, and metrics. For production, focus on worker count, Redis topology, queue behavior, telemetry, and shard ownership settings.

Core Keys

| Key | Default | Purpose |
|---|---|---|
| RUNTIME | | |
| runtime.enabled | true | Enable queue workers on this instance. |
| runtime.workerCount | 4 | Number of concurrent worker coroutines. |
| runtime.bufferSize | 200 | Internal dispatch channel buffer size. |
| runtime.emptyPollDelayMillis | 50 | Delay between polls when the queue is empty. |
| runtime.maxStepsPerExecution | 25 | Maximum nodes processed per worker cycle before re-enqueue. |
| runtime.store | REDIS | Execution state backend: REDIS or POSTGRES. |
| runtime.callback.baseUrl | "" | Public base URL injected as {{runtime.callback.url}} in async task requests. Required for async tasks. |
| runtime.callback.hmacSecret | "" | HMAC-SHA256 secret for signing and validating callback tokens. Required for async tasks. |
| runtime.callback.hmacKid | default | Key ID embedded in the token header for future key rotation. |
| REDIS | | |
| redis.topology | STANDALONE | Redis client mode: STANDALONE or CLUSTER. |
| redis.url | redis://localhost:6379 | Connection URL used in standalone mode. |
| redis.pool.maxTotal | 16 | Maximum total connections in the Redis connection pool. |
| redis.pool.maxIdle | 16 | Maximum idle connections kept in the pool. |
| redis.pool.minIdle | 0 | Minimum idle connections maintained in the pool. |
| redis.cluster.nodes | [] | Seed nodes used when redis.topology=CLUSTER. |
| redis.queue.keyPrefix | saga:executions | Prefix for shard-aware ready and in-flight queue keys. |
| redis.consumer.batchSize | 50 | Messages claimed per poll cycle. |
| redis.consumer.processingTimeoutMillis | 60000 | In-flight lease timeout before a message is requeued. |
| redis.consumer.requeueIntervalMillis | 5000 | How often expired in-flight messages are recovered. |
| redis.sharding.virtualShardCount | 1024 | Virtual shard fan-out used by rendezvous allocation. |
| redis.sharding.membershipKey | saga:runtime:pods | Redis ZSET tracking live worker pods. |
| redis.sharding.membershipTtlMillis | 10000 | Worker membership entry TTL. |
| redis.sharding.heartbeatIntervalMillis | 3000 | How often each worker refreshes its membership entry. |
| redis.sharding.refreshIntervalMillis | 2000 | How often each worker recalculates shard ownership. |
| redis.sharding.claimerCount | 4 | Parallel shard claimers per worker process. |
| DATABASE | | |
| database.host | db | Postgres host. Used when runtime.store=POSTGRES. |
| database.port | 5432 | Postgres port. |
| database.database | saga | Postgres database name. |
| database.user | saga | Postgres user. |
| database.password | saga | Postgres password. |
| database.pool.maxPoolSize | 10 | Maximum JDBC connection pool size. |
| database.pool.minIdle | 1 | Minimum idle JDBC connections. |
| HTTP CLIENT | | |
| http.connectTimeoutMillis | 10000 | Outbound HTTP connect timeout for workflow task requests. |
| http.requestTimeoutMillis | 30000 | Total outbound HTTP request timeout. |
| http.socketTimeoutMillis | 30000 | Outbound HTTP socket read timeout. |
| RATE LIMIT | | |
| rateLimit.enabled | true | Enable per-saga rate limiting. |
| rateLimit.maxFailures | 5 | Failure count within the window before an execution is blocked. |
| rateLimit.windowMillis | 60000 | Sliding window duration for failure counting. |
| rateLimit.blockMillis | 60000 | How long a blocked execution is held before retry is allowed. |
| rateLimit.keyPrefix | saga:rate | Redis key prefix for rate-limit counters. |
| MAINTENANCE | | |
| maintenance.enabled | true | Enable the background partition maintenance job (Postgres only). |
| maintenance.partitionLookaheadMonths | 13 | How many months ahead to pre-create Postgres partitions. |
| maintenance.partitionStartOffsetMonths | 1 | How many months back to start partition creation. |
| maintenance.retentionDays | 15 | Days of execution history to retain before purging. |
| maintenance.intervalMillis | 3600000 | How often the maintenance job runs (1 hour default). |
| METRICS & TELEMETRY | | |
| metrics.enabled | true | Expose Prometheus metrics at /metrics. |
| telemetry.enabled | false | Enable OpenTelemetry span export. |
| telemetry.serviceName | trama | Service name attached to all exported spans. |
| telemetry.otlpEndpoint | http://localhost:4317 | OTLP gRPC endpoint for trace export. |
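
The rateLimit.* keys above describe a failure counter over a sliding window, followed by a block period. The Python sketch below illustrates how those three numbers could interact. It is an in-memory approximation only: the class FailureRateLimiter and its methods are hypothetical, it assumes the block starts once the failure count reaches rateLimit.maxFailures, and it does not reproduce how Trama stores counters in Redis under rateLimit.keyPrefix.

```python
from collections import defaultdict, deque


class FailureRateLimiter:
    """Hypothetical in-memory model of maxFailures / windowMillis / blockMillis."""

    def __init__(self, max_failures=5, window_millis=60_000, block_millis=60_000):
        self.max_failures = max_failures
        self.window_millis = window_millis
        self.block_millis = block_millis
        self._failures = defaultdict(deque)  # key -> failure timestamps in ms
        self._blocked_until = {}             # key -> timestamp in ms

    def record_failure(self, key, now_millis):
        failures = self._failures[key]
        failures.append(now_millis)
        # Drop failures that have slid out of the window.
        while failures and failures[0] <= now_millis - self.window_millis:
            failures.popleft()
        # Assumption: reaching the threshold (not exceeding it) triggers the block.
        if len(failures) >= self.max_failures:
            self._blocked_until[key] = now_millis + self.block_millis
            failures.clear()

    def is_blocked(self, key, now_millis):
        return now_millis < self._blocked_until.get(key, 0)
```

With the defaults, five failures inside any 60-second span would block that key for the following 60 seconds; failures spaced further apart never accumulate.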

Async Callback

Async task nodes pause a workflow and wait for an external system to POST to a callback URL. For this to work, runtime.callback.baseUrl and runtime.callback.hmacSecret must be set explicitly, because both default to an empty string; runtime.callback.hmacKid can stay at its default value of default until you rotate keys. The runtime injects the {{runtime.callback.url}} and {{runtime.callback.token}} template variables into the outbound request body, so the external service knows where to call back and which token to present.

runtime:
  callback:
    baseUrl: "https://trama.internal"   # must be reachable by downstream services
    hmacSecret: "change-me-in-production"
    hmacKid: "default"
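
This page does not specify the callback token format. As a rough illustration of HMAC-SHA256 signing with a key ID in the token header, the Python sketch below assumes a JWT-like header.payload.signature layout. The function names, the payload fields executionId and nodeId, and the layout itself are hypothetical and may differ from what Trama actually emits.

```python
import base64
import hashlib
import hmac
import json


def _b64(data):
    # URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(payload, secret, kid="default"):
    """Sign a payload; the kid in the header names the secret that was used."""
    header = _b64(json.dumps({"alg": "HS256", "kid": kid}, sort_keys=True).encode())
    body = _b64(json.dumps(payload, sort_keys=True).encode())
    signing_input = f"{header}.{body}".encode("ascii")
    signature = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{body}.{_b64(signature)}"


def verify_token(token, secrets_by_kid):
    """Return the payload if the signature is valid, otherwise None."""
    try:
        header_b64, body_b64, sig_b64 = token.split(".")
        kid = json.loads(_unb64(header_b64))["kid"]
        secret = secrets_by_kid[kid]
        signing_input = f"{header_b64}.{body_b64}".encode("ascii")
        expected = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(expected, _unb64(sig_b64)):
            return None
        return json.loads(_unb64(body_b64))
    except (ValueError, KeyError, TypeError):
        return None
```

Looking the secret up by key ID is what would make rotation possible in this sketch: a verifier can hold the old and new secrets side by side while tokens signed with the old one are still in flight.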

Redis Cluster

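Trama supports Redis Cluster-aware queue sharding. Ready/in-flight keys, execution metadata, and rate-limit keys are written with hash tags, the {...} portion of a key, which is the only part Redis Cluster hashes when it picks a slot. Keys that share a hash tag therefore land in the same slot, so multi-key Lua operations stay inside one Redis slot.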

redis:
  topology: "CLUSTER"
  cluster:
    nodes:
      - "redis://redis-cluster-0.redis:6379"
      - "redis://redis-cluster-1.redis:6379"
      - "redis://redis-cluster-2.redis:6379"
  queue:
    keyPrefix: "saga:executions"
  consumer:
    batchSize: 100
    processingTimeoutMillis: 60000
    requeueIntervalMillis: 5000
  sharding:
    podId: "${HOSTNAME}"
    virtualShardCount: 1024
    membershipKey: "saga:runtime:pods"
    membershipTtlMillis: 10000
    heartbeatIntervalMillis: 3000
    refreshIntervalMillis: 2000
    claimerCount: 4
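
To make the hash-tag behavior concrete, the Python sketch below computes a Redis Cluster key slot the way the Redis Cluster specification describes it: CRC16 (XMODEM variant) of the key, or of the hash-tag content when a non-empty {...} is present, modulo 16384. The example key names in the usage note are hypothetical; this page does not document Trama's actual key layout.

```python
def crc16_xmodem(data):
    """CRC16 with polynomial 0x1021, initial value 0, MSB first."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key):
    """Return the cluster slot (0..16383) for a key, honoring hash tags."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag between the first "{" and the next "}" counts.
        if end != -1 and end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384
```

For example, the hypothetical keys saga:executions:{shard-17}:ready and saga:executions:{shard-17}:inflight both hash only shard-17, so key_slot returns the same slot for both and a Lua script may touch them together.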

Environment Overrides

Supported overrides include:

RUNTIME_ENABLED
METRICS_ENABLED
TELEMETRY_ENABLED
REDIS_URL
REDIS_TOPOLOGY
REDIS_CLUSTER_NODES
DATABASE_HOST
DATABASE_PORT
DATABASE_DATABASE
DATABASE_USER
DATABASE_PASSWORD
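
The Python sketch below shows one way such overrides could be layered on top of file-based configuration. The variable names come from the list above, but the dotted keys they map to are inferred from the naming pattern, and the comma-separated format for REDIS_CLUSTER_NODES is an assumption; neither is confirmed by this page. The helper apply_overrides is hypothetical.

```python
import copy

# Env var -> dotted config key (targets are assumed from the naming pattern).
OVERRIDES = {
    "RUNTIME_ENABLED": "runtime.enabled",
    "METRICS_ENABLED": "metrics.enabled",
    "TELEMETRY_ENABLED": "telemetry.enabled",
    "REDIS_URL": "redis.url",
    "REDIS_TOPOLOGY": "redis.topology",
    "REDIS_CLUSTER_NODES": "redis.cluster.nodes",
    "DATABASE_HOST": "database.host",
    "DATABASE_PORT": "database.port",
    "DATABASE_DATABASE": "database.database",
    "DATABASE_USER": "database.user",
    "DATABASE_PASSWORD": "database.password",
}


def _parse(dotted_key, raw):
    """Convert the raw string to the type the key is expected to hold."""
    if dotted_key.endswith(".enabled"):
        return raw.strip().lower() == "true"
    if dotted_key.endswith(".port"):
        return int(raw)
    if dotted_key == "redis.cluster.nodes":
        # Assumption: nodes are passed as a comma-separated list.
        return [node.strip() for node in raw.split(",") if node.strip()]
    return raw


def apply_overrides(config, env):
    """Return a copy of config with any recognized env vars applied."""
    result = copy.deepcopy(config)
    for env_name, dotted_key in OVERRIDES.items():
        if env_name not in env:
            continue
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = _parse(dotted_key, env[env_name])
    return result
```

In a real process the env argument would typically be os.environ; passing it explicitly keeps the sketch easy to test.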

Profiles

Dev profile

runtime:
  workerCount: 2
  emptyPollDelayMillis: 100
metrics:
  enabled: true
telemetry:
  enabled: false

Prod profile (split API and Workers)

In production, run API pods separately from worker pods. API instances should not process queue jobs.

API deployment profile

runtime:
  enabled: false
metrics:
  enabled: true
telemetry:
  enabled: true
database:
  pool:
    maxPoolSize: 20

Worker deployment profile

runtime:
  enabled: true
  workerCount: 8
  bufferSize: 500
  maxStepsPerExecution: 25
redis:
  consumer:
    batchSize: 100
    processingTimeoutMillis: 60000
rateLimit:
  enabled: true
metrics:
  enabled: true
telemetry:
  enabled: true
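
The redis.consumer settings in this profile describe a claim-and-lease pattern: a poll claims up to batchSize messages, each claimed message holds an in-flight lease for processingTimeoutMillis, and a periodic sweep returns expired leases to the ready queue. The Python sketch below models that lifecycle in memory to show how the values relate. The LeasedQueue class is hypothetical and is not Trama's Redis implementation.

```python
from collections import deque


class LeasedQueue:
    """Hypothetical in-memory model of ready and in-flight queues with leases."""

    def __init__(self, processing_timeout_millis=60_000):
        self.processing_timeout_millis = processing_timeout_millis
        self.ready = deque()
        self.in_flight = {}  # message -> lease deadline in ms

    def enqueue(self, message):
        self.ready.append(message)

    def claim(self, batch_size, now_millis):
        """Move up to batch_size messages from ready to in-flight."""
        batch = []
        while self.ready and len(batch) < batch_size:
            message = self.ready.popleft()
            self.in_flight[message] = now_millis + self.processing_timeout_millis
            batch.append(message)
        return batch

    def ack(self, message):
        """Mark a message as done so it is never requeued."""
        self.in_flight.pop(message, None)

    def requeue_expired(self, now_millis):
        """Sweep run on a timer: return expired leases to the ready queue."""
        expired = [m for m, deadline in self.in_flight.items() if deadline <= now_millis]
        for message in expired:
            del self.in_flight[message]
            self.ready.append(message)
        return expired
```

Under this model, a message whose worker dies is picked up again after at most processingTimeoutMillis plus one sweep interval, which is why the timeout should comfortably exceed the longest expected processing time for a claimed message.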

Worker deployment profile with Redis Cluster

runtime:
  enabled: true
  workerCount: 8
  bufferSize: 500
redis:
  topology: "CLUSTER"
  cluster:
    nodes:
      - "redis://redis-cluster-0.redis:6379"
      - "redis://redis-cluster-1.redis:6379"
      - "redis://redis-cluster-2.redis:6379"
  consumer:
    batchSize: 100
    processingTimeoutMillis: 60000
    requeueIntervalMillis: 5000
  sharding:
    podId: "${HOSTNAME}"
    virtualShardCount: 1024
    membershipKey: "saga:runtime:pods"
    membershipTtlMillis: 10000
    heartbeatIntervalMillis: 3000
    refreshIntervalMillis: 2000
    claimerCount: 4
metrics:
  enabled: true
telemetry:
  enabled: true

Scale workers horizontally by running multiple worker deployments/replicas with this same worker profile.
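
The key table notes that virtual shards are assigned by rendezvous allocation. As a minimal sketch of that general technique (highest-random-weight hashing), the Python code below scores every pod for every shard and gives each shard to the top-scoring pod. The hash function, the scoring input format, and the function names are assumptions for illustration and are not taken from Trama.

```python
import hashlib


def _score(pod_id, shard):
    # Assumption: SHA-256 over "pod:shard"; any well-mixed hash would do.
    digest = hashlib.sha256(f"{pod_id}:{shard}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner_of(shard, pods):
    """The pod with the highest score for this shard owns it."""
    return max(pods, key=lambda pod: _score(pod, shard))


def owned_shards(pod_id, pods, virtual_shard_count=1024):
    """All virtual shards that pod_id owns given the live pod list."""
    return [s for s in range(virtual_shard_count) if owner_of(s, pods) == pod_id]
```

The useful property of this scheme is that when a pod joins or leaves, only the shards whose top-scoring pod changed move; every other shard keeps its owner, so adding a replica rebalances roughly its fair share of shards without reshuffling the rest.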