John Weldon

Why Your JetStream Handler Runs Twice

A common JetStream debugging scenario: your message handler processes a message, but before it finishes, the same message arrives again. The handler runs twice. Sometimes three or four times. Throughput drops. Downstream systems get duplicate requests.

This is JetStream working as designed. The AckWait timer expired before your handler acknowledged the message, so the server redelivered it. The fix is to change the timer, signal progress while working, or rethink how you're consuming.

The Default Problem

JetStream push consumers ship with these defaults:

Setting         Default   Effect
AckWait         30s       Time before redelivery
MaxAckPending   1000      Max unacked messages in flight

For a single-threaded handler processing messages sequentially, the worst case is revealing. If the server pushes 1,000 messages to the client (filling MaxAckPending), the first message has 30 seconds before the server considers it timed out. But the handler has to process all 1,000 sequentially. That gives each message an effective budget of:

AckWait / MaxAckPending = 30,000ms / 1000 = 30ms

If your handler takes more than 30ms on average, earlier messages will be redelivered while the handler is still working through the queue. Each redelivery adds another copy to the pending queue, compounding the problem.

This worst case only applies when messages are arriving faster than the handler can process them. For low-throughput streams, AckWait is the effective budget per message.

Fix 1: Tune the Consumer

Match AckWait and MaxAckPending to your actual processing characteristics:

AckWait = 3x your maximum expected processing time
MaxAckPending = AckWait / average processing time

For handlers that take 100ms to 1 second per message:

consumerConfig := &nats.ConsumerConfig{
    AckWait:       60 * time.Second,
    MaxAckPending: 100,
}
// Effective budget: 600ms per message

Handler Speed       AckWait   MaxAckPending   Effective Budget
Fast (<100ms)       30s       1000            30ms (default)
Medium (100ms-1s)   60s       100-500         120-600ms
Slow (1-10s)        120s      10-50           2.4-12s
Very slow (10s+)    300s      5-10            30-60s

Fix 2: Signal Progress

For handlers with unpredictable processing times, call msg.InProgress() periodically to reset the server’s ack timer:

func processWithHeartbeat(msg *nats.Msg, handler func(*nats.Msg) error) {
    done := make(chan struct{})

    // Send progress signals every 10 seconds
    go func() {
        ticker := time.NewTicker(10 * time.Second)
        defer ticker.Stop()
        for {
            select {
            case <-ticker.C:
                msg.InProgress()
            case <-done:
                return
            }
        }
    }()

    err := handler(msg)
    close(done)

    if err != nil {
        msg.Nak()
        return
    }
    msg.Ack()
}

Set the progress interval to roughly 1/3 of AckWait. If AckWait is 30 seconds, signal progress every 10 seconds. This provides margin for network latency.

Note: InProgress() is fire-and-forget. If the network drops the signal, the server timer continues and redelivery may still occur. For critical workloads, make your handlers idempotent regardless.

Fix 3: Use Pull Consumers

Pull consumers give you explicit control over message fetching:

cons, _ := js.CreateOrUpdateConsumer(ctx, "ORDERS", jetstream.ConsumerConfig{
    FilterSubject: "orders.*",
    AckWait:       60 * time.Second,
})

for {
    batch, err := cons.Fetch(10, jetstream.FetchMaxWait(30*time.Second))
    if err != nil {
        log.Println("fetch:", err)
        continue
    }
    for msg := range batch.Messages() {
        if err := processOrder(msg); err != nil {
            msg.Nak()
            continue
        }
        msg.Ack()
    }
}

With pull consumers, you control when to fetch more messages. This naturally provides back-pressure: if processing is slow, you simply don’t fetch the next batch until you’re ready. The redelivery problem is significantly reduced (though not eliminated – if you fetch a batch of 10 and each takes 30 seconds, later messages in the batch still face AckWait pressure).

For complete elimination of in-batch timeout risk, fetch one message at a time.
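With the same consumer as above, that can be done with Next, which fetches a single message. A sketch with minimal error handling; processOrder is the same handler as before:

```go
for {
    msg, err := cons.Next(jetstream.FetchMaxWait(30 * time.Second))
    if err != nil {
        // Usually just the wait expiring with no message; loop and retry.
        continue
    }
    if err := processOrder(msg); err != nil {
        msg.Nak()
        continue
    }
    msg.Ack()
}
```

Each message now gets the full AckWait to itself, at the cost of one fetch round-trip per message.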

Why This Is Application-Level

These patterns are implemented in your code rather than the client library by design. Deduplication requires knowing what “successfully processed” means for your specific use case. Automatic progress signals could mask genuinely broken handlers, delaying failure detection. These are policy decisions that belong in application code.

BackOff Interaction

If you configure BackOff on a consumer, be aware that it replaces AckWait with the first BackOff value:

consumerConfig := &nats.ConsumerConfig{
    AckWait: 60 * time.Second,                          // overridden by BackOff[0]
    BackOff: []time.Duration{10*time.Second, 30*time.Second, 120*time.Second},
}
// Effective first-delivery timeout: 10 seconds, not 60

If you tune AckWait per Fix 1 and later add BackOff, the effective timeout drops to BackOff[0], potentially reintroducing the redelivery problem.

Note: BackOff applies only to timer-based redeliveries. If your handler calls msg.Nak(), the message is redelivered immediately regardless of BackOff configuration. Use msg.NakWithDelay() to control nak redelivery timing. [1]

Monitoring

Track redelivery through message metadata:

meta, _ := msg.Metadata()
if meta.NumDelivered > 1 {
    log.Printf("redelivery #%d for stream seq %d",
        meta.NumDelivered, meta.Sequence.Stream)
}

Configure MaxDeliver to limit retry attempts for poison messages:

consumerConfig := &nats.ConsumerConfig{
    MaxDeliver: 5, // stop after 5 attempts
}

When a message exceeds MaxDeliver, JetStream publishes an advisory to $JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES.<stream>.<consumer>. Subscribe to this subject for visibility into exhausted messages. [2]


[1] JetStream Consumers - BackOff: BackOff defines escalating delays for timer-based redeliveries. Explicit nak() calls bypass BackOff and trigger immediate redelivery.

[2] Consumer Configuration: AckWait defaults to 30 seconds, MaxAckPending to 1000, MaxDeliver to -1 (unlimited). Advisory events are published for max delivery exceedances.