# Configuration

Review the Trama configuration surface for workers, Redis topology, queue sharding, telemetry, and deployment profiles. Start with local defaults, then move into worker scaling and Redis Cluster settings when preparing production environments.

## Start here

For local development, focus on runtime enablement, Redis/Postgres connectivity, and metrics. For production, focus on worker count, Redis topology, queue behavior, telemetry, and shard ownership settings.
## Core Keys

| Key | Default | Purpose |
|---|---|---|
| **RUNTIME** | | |
| `runtime.enabled` | `true` | Enable queue workers on this instance. |
| `runtime.workerCount` | `4` | Number of concurrent worker coroutines. |
| `runtime.bufferSize` | `200` | Internal dispatch channel buffer size. |
| `runtime.emptyPollDelayMillis` | `50` | Delay between polls when the queue is empty. |
| `runtime.maxStepsPerExecution` | `25` | Maximum nodes processed per worker cycle before re-enqueue. |
| `runtime.store` | `REDIS` | Execution state backend: `REDIS` or `POSTGRES`. |
| `runtime.callback.baseUrl` | `""` | Public base URL injected as `{{runtime.callback.url}}` in async task requests. Required for async tasks. |
| `runtime.callback.hmacSecret` | `""` | HMAC-SHA256 secret for signing and validating callback tokens. Required for async tasks. |
| `runtime.callback.hmacKid` | `default` | Key ID embedded in the token header for future key rotation. |
| **REDIS** | | |
| `redis.topology` | `STANDALONE` | Redis client mode: `STANDALONE` or `CLUSTER`. |
| `redis.url` | `redis://localhost:6379` | Connection URL used in standalone mode. |
| `redis.pool.maxTotal` | `16` | Maximum total connections in the Redis connection pool. |
| `redis.pool.maxIdle` | `16` | Maximum idle connections kept in the pool. |
| `redis.pool.minIdle` | `0` | Minimum idle connections maintained in the pool. |
| `redis.cluster.nodes` | `[]` | Seed nodes used when `redis.topology=CLUSTER`. |
| `redis.queue.keyPrefix` | `saga:executions` | Prefix for shard-aware ready and in-flight queue keys. |
| `redis.consumer.batchSize` | `50` | Messages claimed per poll cycle. |
| `redis.consumer.processingTimeoutMillis` | `60000` | In-flight lease timeout before a message is requeued. |
| `redis.consumer.requeueIntervalMillis` | `5000` | How often expired in-flight messages are recovered. |
| `redis.sharding.virtualShardCount` | `1024` | Virtual shard fan-out used by rendezvous allocation. |
| `redis.sharding.membershipKey` | `saga:runtime:pods` | Redis ZSET tracking live worker pods. |
| `redis.sharding.membershipTtlMillis` | `10000` | Worker membership entry TTL. |
| `redis.sharding.heartbeatIntervalMillis` | `3000` | How often each worker refreshes its membership entry. |
| `redis.sharding.refreshIntervalMillis` | `2000` | How often each worker recalculates shard ownership. |
| `redis.sharding.claimerCount` | `4` | Parallel shard claimers per worker process. |
| **DATABASE** | | |
| `database.host` | `db` | Postgres host. Used when `runtime.store=POSTGRES`. |
| `database.port` | `5432` | Postgres port. |
| `database.database` | `saga` | Postgres database name. |
| `database.user` | `saga` | Postgres user. |
| `database.password` | `saga` | Postgres password. |
| `database.pool.maxPoolSize` | `10` | Maximum JDBC connection pool size. |
| `database.pool.minIdle` | `1` | Minimum idle JDBC connections. |
| **HTTP CLIENT** | | |
| `http.connectTimeoutMillis` | `10000` | Outbound HTTP connect timeout for workflow task requests. |
| `http.requestTimeoutMillis` | `30000` | Total outbound HTTP request timeout. |
| `http.socketTimeoutMillis` | `30000` | Outbound HTTP socket read timeout. |
| **RATE LIMIT** | | |
| `rateLimit.enabled` | `true` | Enable per-saga rate limiting. |
| `rateLimit.maxFailures` | `5` | Failure count within the window before an execution is blocked. |
| `rateLimit.windowMillis` | `60000` | Sliding window duration for failure counting. |
| `rateLimit.blockMillis` | `60000` | How long a blocked execution is held before retry is allowed. |
| `rateLimit.keyPrefix` | `saga:rate` | Redis key prefix for rate-limit counters. |
| **MAINTENANCE** | | |
| `maintenance.enabled` | `true` | Enable the background partition maintenance job (Postgres only). |
| `maintenance.partitionLookaheadMonths` | `13` | How many months ahead to pre-create Postgres partitions. |
| `maintenance.partitionStartOffsetMonths` | `1` | How many months back to start partition creation. |
| `maintenance.retentionDays` | `15` | Days of execution history to retain before purging. |
| `maintenance.intervalMillis` | `3600000` | How often the maintenance job runs (1 hour default). |
| **METRICS & TELEMETRY** | | |
| `metrics.enabled` | `true` | Expose Prometheus metrics at `/metrics`. |
| `telemetry.enabled` | `false` | Enable OpenTelemetry span export. |
| `telemetry.serviceName` | `trama` | Service name attached to all exported spans. |
| `telemetry.otlpEndpoint` | `http://localhost:4317` | OTLP gRPC endpoint for trace export. |
## Async Callback

Async task nodes pause a workflow and wait for an external system to POST to a callback URL. `runtime.callback.baseUrl` and `runtime.callback.hmacSecret` are required for this to work; `runtime.callback.hmacKid` falls back to `default`. The runtime injects `{{runtime.callback.url}}` and `{{runtime.callback.token}}` template variables into the outbound request body, so the external service knows where to call back and which token to present.
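The exact wire format of the callback token is not documented here; the sketch below assumes a JWT-like `header.payload.signature` layout purely to illustrate HMAC-SHA256 signing with a `kid` carried in the token header for key rotation:

```python
import base64
import hashlib
import hmac
import json

# Assumed secret -- corresponds to runtime.callback.hmacSecret.
SECRET = b"change-me-in-production"


def sign_token(payload: dict, kid: str = "default") -> str:
    """Sign a callback token. The header carries the key ID (hmacKid) so a
    verifier can select the right secret during key rotation."""
    header = base64.urlsafe_b64encode(json.dumps({"kid": kid}).encode()).decode()
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    sig = hmac.new(SECRET, f"{header}.{body}".encode(), hashlib.sha256).hexdigest()
    return f"{header}.{body}.{sig}"


def verify_token(token: str) -> bool:
    """Recompute the signature and compare in constant time."""
    header, body, sig = token.rsplit(".", 2)
    expected = hmac.new(SECRET, f"{header}.{body}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)


token = sign_token({"executionId": "abc-123"})
print(verify_token(token))               # True
print(verify_token(token[:-1] + "x"))    # False: tampered signature
```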
```yaml
runtime:
  callback:
    baseUrl: "https://trama.internal"   # must be reachable by downstream services
    hmacSecret: "change-me-in-production"
    hmacKid: "default"
```

## Redis Cluster
Trama now supports Redis Cluster-aware queue sharding. Ready/in-flight keys, execution metadata, and rate-limit keys are written with hash tags so multi-key Lua operations stay inside one Redis slot.
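The hash-tag rule itself comes from the Redis Cluster specification: when a key contains a non-empty `{...}` section, only the text between the first `{` and the next `}` is hashed, so keys sharing a tag always map to the same slot and node. A small sketch of that rule — the key shapes shown are illustrative, not Trama's exact key layout:

```python
def hash_tag(key: str) -> str:
    """Return the substring of a key that Redis Cluster actually hashes.
    Per the cluster spec, a non-empty {...} section restricts hashing to
    its contents; otherwise the whole key is hashed."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # tag must be non-empty ("{}" does not count)
            return key[start + 1:end]
    return key


# Hypothetical key shapes: the same {shard} tag means both keys land in the
# same slot, so one Lua script can atomically move a message from the
# ready queue to the in-flight queue.
ready = "saga:executions:{42}:ready"
inflight = "saga:executions:{42}:inflight"
print(hash_tag(ready) == hash_tag(inflight))  # True
```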
- Use `redis.topology: CLUSTER` to enable the Lettuce cluster client.
- Set `redis.cluster.nodes` to one or more reachable cluster seed nodes.
- Keep the same `redis.queue.keyPrefix` on every worker pod.
- Give each worker process a unique `redis.sharding.podId` so rendezvous ownership is stable.
- Tune `redis.sharding.virtualShardCount` and `redis.sharding.claimerCount` together when scaling out.
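Rendezvous (highest-random-weight) allocation can be sketched as follows: every live pod scores each virtual shard, and the highest score wins. Removing a pod only reassigns the shards that pod owned, which is why ownership stays stable as pods churn. This is a generic sketch of the technique, not Trama's implementation:

```python
import hashlib


def rendezvous_owner(shard: int, pod_ids: list[str]) -> str:
    """Pick the owning pod for a virtual shard via rendezvous hashing:
    each pod scores the shard deterministically; the highest score wins."""
    def score(pod: str) -> int:
        digest = hashlib.sha256(f"{pod}:{shard}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(pod_ids, key=score)


pods = ["worker-0", "worker-1", "worker-2"]
owners = {s: rendezvous_owner(s, pods) for s in range(1024)}  # virtualShardCount

# Remove one pod: shards it did NOT own keep their owner, because the
# previous maximum score is still present among the survivors.
survivors = ["worker-0", "worker-2"]
moved = sum(
    1 for s in range(1024)
    if owners[s] != "worker-1" and rendezvous_owner(s, survivors) != owners[s]
)
print(moved)  # 0
```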
```yaml
redis:
  topology: "CLUSTER"
  cluster:
    nodes:
      - "redis://redis-cluster-0.redis:6379"
      - "redis://redis-cluster-1.redis:6379"
      - "redis://redis-cluster-2.redis:6379"
  queue:
    keyPrefix: "saga:executions"
  consumer:
    batchSize: 100
    processingTimeoutMillis: 60000
    requeueIntervalMillis: 5000
  sharding:
    podId: "${HOSTNAME}"
    virtualShardCount: 1024
    membershipKey: "saga:runtime:pods"
    membershipTtlMillis: 10000
    heartbeatIntervalMillis: 3000
    refreshIntervalMillis: 2000
    claimerCount: 4
```

## Environment Overrides
Supported overrides include:

- `RUNTIME_ENABLED`
- `METRICS_ENABLED`
- `TELEMETRY_ENABLED`
- `REDIS_URL`
- `REDIS_TOPOLOGY`
- `REDIS_CLUSTER_NODES`
- `DATABASE_HOST`
- `DATABASE_PORT`
- `DATABASE_DATABASE`
- `DATABASE_USER`
- `DATABASE_PASSWORD`
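The override names appear to follow a simple convention: underscores separate config path segments. A sketch of that assumed mapping — inferred from the list above, so the actual parser may differ:

```python
def env_to_key(name: str) -> str:
    """Translate an environment override into its dotted config key,
    assuming '_' separates path segments (e.g. DATABASE_HOST -> database.host).
    This convention is inferred from the supported-overrides list; the real
    configuration loader may differ."""
    return name.lower().replace("_", ".")


for var in ("RUNTIME_ENABLED", "REDIS_CLUSTER_NODES", "DATABASE_DATABASE"):
    print(f"{var} -> {env_to_key(var)}")
```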
## Profiles

### Dev profile
```yaml
runtime:
  workerCount: 2
  emptyPollDelayMillis: 100
metrics:
  enabled: true
telemetry:
  enabled: false
```

### Prod profile (split API and Workers)
In production, run API pods separately from worker pods. API instances should not process queue jobs.
#### API deployment profile
```yaml
runtime:
  enabled: false
metrics:
  enabled: true
telemetry:
  enabled: true
database:
  pool:
    maxPoolSize: 20
```

#### Worker deployment profile
```yaml
runtime:
  enabled: true
  workerCount: 8
  bufferSize: 500
  maxStepsPerExecution: 25
redis:
  consumer:
    batchSize: 100
    processingTimeoutMillis: 60000
rateLimit:
  enabled: true
metrics:
  enabled: true
telemetry:
  enabled: true
```

#### Worker deployment profile with Redis Cluster
```yaml
runtime:
  enabled: true
  workerCount: 8
  bufferSize: 500
redis:
  topology: "CLUSTER"
  cluster:
    nodes:
      - "redis://redis-cluster-0.redis:6379"
      - "redis://redis-cluster-1.redis:6379"
      - "redis://redis-cluster-2.redis:6379"
  consumer:
    batchSize: 100
    processingTimeoutMillis: 60000
    requeueIntervalMillis: 5000
  sharding:
    podId: "${HOSTNAME}"
    virtualShardCount: 1024
    membershipKey: "saga:runtime:pods"
    membershipTtlMillis: 10000
    heartbeatIntervalMillis: 3000
    refreshIntervalMillis: 2000
    claimerCount: 4
metrics:
  enabled: true
telemetry:
  enabled: true
```

Scale workers horizontally by running multiple worker deployments/replicas with this same worker profile.