Retry Strategies

Retry suspends invocation

When a step throws an exception, the SDK consults the step's retry strategy to decide whether to retry and how long to wait. If the strategy calls for a retry, the SDK checkpoints the error and the scheduled resume time, then ends the Lambda invocation. At the scheduled resume time, the backend starts a new invocation for the execution and the SDK replays the step body.

Because the invocation ends between attempts, waiting for the next retry does not consume Lambda execution time.

When a step exhausts all retry attempts, the SDK checkpoints the final error and throws it to your handler. If you configure no retry strategy on a step, any exception propagates immediately without retrying.

Configure a retry strategy

A retry strategy is a function that takes the error and the current attempt number, and returns a decision. The decision is either to retry after a given delay, or to stop. You can write a retry strategy yourself or use the built-in helper to build one from configuration.

RetryStrategy helper

Use createRetryStrategy() to build a strategy, then pass it as retryStrategy in StepConfig.

import {
  withDurableExecution,
  createRetryStrategy,
  StepConfig,
} from "@aws/durable-execution-sdk-js";

const retryStrategy = createRetryStrategy({
  maxAttempts: 5,
  initialDelay: { seconds: 2 },
  maxDelay: { minutes: 1 },
  backoffRate: 2,
});

const stepConfig: StepConfig<string> = { retryStrategy };

export const handler = withDurableExecution(async (event, context) => {
  const result = await context.step(
    "call-external-api",
    async () => callExternalApi(),
    stepConfig,
  );
  return result;
});

async function callExternalApi(): Promise<string> {
  return "ok";
}

Use create_retry_strategy() with a RetryStrategyConfig, then pass it as retry_strategy in StepConfig.

from aws_durable_execution_sdk_python.config import Duration, StepConfig
from aws_durable_execution_sdk_python.context import DurableContext, StepContext, durable_step
from aws_durable_execution_sdk_python.execution import durable_execution
from aws_durable_execution_sdk_python.retries import RetryStrategyConfig, create_retry_strategy

retry_strategy = create_retry_strategy(
    RetryStrategyConfig(
        max_attempts=5,
        initial_delay=Duration.from_seconds(2),
        max_delay=Duration.from_minutes(1),
        backoff_rate=2.0,
    )
)

step_config = StepConfig(retry_strategy=retry_strategy)


@durable_step
def call_external_api(step_context: StepContext) -> str:
    return "ok"


@durable_execution
def lambda_handler(event: dict, context: DurableContext) -> str:
    result = context.step(call_external_api(), config=step_config)
    return result

Use RetryStrategies.exponentialBackoff() to build a strategy, then pass it to StepConfig.builder().retryStrategy().

import java.time.Duration;
import software.amazon.lambda.durable.DurableContext;
import software.amazon.lambda.durable.DurableHandler;
import software.amazon.lambda.durable.StepContext;
import software.amazon.lambda.durable.config.StepConfig;
import software.amazon.lambda.durable.retry.JitterStrategy;
import software.amazon.lambda.durable.retry.RetryStrategies;

public class ExponentialBackoffExample extends DurableHandler<Object, String> {
    @Override
    public String handleRequest(Object input, DurableContext context) {
        StepConfig config = StepConfig.builder()
            .retryStrategy(RetryStrategies.exponentialBackoff(
                3,
                Duration.ofSeconds(1),
                Duration.ofSeconds(10),
                2.0,
                JitterStrategy.FULL))
            .build();

        String result = context.step("retry_step", String.class,
            (StepContext ctx) -> "Step with exponential backoff",
            config);

        return "Result: " + result;
    }
}

RetryStrategyConfig signature

import {
  RetryStrategyConfig,
  JitterStrategy,
  Duration,
} from "@aws/durable-execution-sdk-js";

// RetryStrategyConfig shape:
// {
//   maxAttempts?: number          // default: 3
//   initialDelay?: Duration       // default: { seconds: 5 }
//   maxDelay?: Duration           // default: { minutes: 5 }
//   backoffRate?: number          // default: 2
//   jitter?: JitterStrategy       // default: JitterStrategy.FULL
//   retryableErrors?: (string | RegExp)[]
//   retryableErrorTypes?: (new () => Error)[]
// }

Parameters:

  • maxAttempts (optional) Total attempts including the initial attempt. Default: 3.
  • initialDelay (optional) Delay before the first retry. Default: { seconds: 5 }.
  • maxDelay (optional) Maximum delay between retries. Default: { minutes: 5 }.
  • backoffRate (optional) Multiplier applied to the delay on each retry. Default: 2.
  • jitter (optional) A JitterStrategy value. Default: JitterStrategy.FULL.
  • retryableErrors (optional) Array of strings or RegExp patterns matched against the error message. The SDK retries all errors when you set neither retryableErrors nor retryableErrorTypes.
  • retryableErrorTypes (optional) Array of error classes. The SDK retries only errors that are instances of these classes. When you set both filters, the SDK retries an error if it matches either (OR logic).

import re
from dataclasses import dataclass
from aws_durable_execution_sdk_python.config import Duration, JitterStrategy
from aws_durable_execution_sdk_python.retries import RetryDecision


@dataclass
class RetryStrategyConfig:
    max_attempts: int = 3
    initial_delay: Duration = Duration.from_seconds(5)
    max_delay: Duration = Duration.from_minutes(5)
    backoff_rate: float = 2.0
    jitter_strategy: JitterStrategy = JitterStrategy.FULL
    retryable_errors: list[str | re.Pattern] | None = None
    retryable_error_types: list[type[Exception]] | None = None

Parameters:

  • max_attempts (optional) Total attempts including the initial attempt. Default: 3.
  • initial_delay (optional) A Duration. Default: Duration.from_seconds(5).
  • max_delay (optional) A Duration. Default: Duration.from_minutes(5).
  • backoff_rate (optional) Multiplier applied to the delay on each retry. Default: 2.0.
  • jitter_strategy (optional) A JitterStrategy value. Default: JitterStrategy.FULL.
  • retryable_errors (optional) List of strings or compiled re.Pattern objects matched against the error message. The SDK retries all errors when you set neither retryable_errors nor retryable_error_types.
  • retryable_error_types (optional) List of exception classes. The SDK retries only exceptions that are instances of these classes. When you set both filters, the SDK retries an error if it matches either (OR logic).

RetryStrategy RetryStrategies.exponentialBackoff(
    int maxAttempts,
    Duration initialDelay,
    Duration maxDelay,
    double backoffRate,
    JitterStrategy jitter
)

RetryStrategy RetryStrategies.fixedDelay(
    int maxAttempts,
    Duration fixedDelay
)

Parameters:

  • maxAttempts Total attempts including the initial attempt.
  • initialDelay A java.time.Duration. Minimum 1 second.
  • maxDelay A java.time.Duration. Minimum 1 second.
  • backoffRate Multiplier applied to the delay on each retry.
  • jitter A JitterStrategy value: FULL, HALF, or NONE.

Java does not have built-in error type filtering. Filter by error type manually inside the RetryStrategy lambda. See Retry only specific errors.
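The filter rules described for retryableErrors and retryableErrorTypes can be sketched as a stand-alone predicate. This is an illustration of the documented OR logic, not the SDK's implementation: is_retryable is a hypothetical helper, and the way plain strings are matched (as substrings of the error message) is an assumption that the text above does not confirm.

```python
import re
from typing import Optional, Sequence, Type, Union

MessagePattern = Union[str, re.Pattern]


def is_retryable(
    error: Exception,
    retryable_errors: Optional[Sequence[MessagePattern]] = None,
    retryable_error_types: Optional[Sequence[Type[Exception]]] = None,
) -> bool:
    """Hypothetical helper: return True if the error passes the configured filters."""
    # Neither filter set: every error is retryable.
    if retryable_errors is None and retryable_error_types is None:
        return True

    message = str(error)
    for pattern in retryable_errors or []:
        if isinstance(pattern, str):
            # ASSUMPTION: plain strings match as substrings of the message.
            if pattern in message:
                return True
        elif pattern.search(message):
            return True

    # OR logic: a type match is enough even if no message pattern matched.
    return isinstance(error, tuple(retryable_error_types or ()))
```

With only a message filter set, an error of any type is retried as long as its message matches; with both filters set, matching either one is sufficient.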

JitterStrategy

import { JitterStrategy } from "@aws/durable-execution-sdk-js";

enum JitterStrategy {
  NONE = "NONE", // exact calculated delay
  FULL = "FULL", // random between 0 and base_delay
  HALF = "HALF", // random between 50% and 100% of base_delay
}

from aws_durable_execution_sdk_python.config import JitterStrategy

class JitterStrategy(StrEnum):
    NONE = "NONE"  # exact calculated delay
    FULL = "FULL"  # random between 0 and base_delay
    HALF = "HALF"  # random between 50% and 100% of base_delay

import software.amazon.lambda.durable.retry.JitterStrategy;

enum JitterStrategy {
    NONE, // exact calculated delay
    FULL, // random between 0 and base_delay
    HALF  // random between 50% and 100% of base_delay
}

Delay calculation

The SDK calculates the delay before each retry using exponential backoff with jitter:

base_delay = min(initial_delay × backoff_rate ^ (attempt - 1), max_delay)
final_delay = jitter(base_delay), minimum 1 second

  • JitterStrategy.FULL randomizes the delay between 0 and base_delay. This spreads retries across time and avoids many clients retrying simultaneously after a shared failure.
  • JitterStrategy.HALF randomizes between 50% and 100% of base_delay.
  • JitterStrategy.NONE uses the exact calculated delay.
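The formula above can be written out as a short stand-alone sketch. It does not use the SDK: base_delay and final_delay are hypothetical helper names, delays are plain seconds, and the uniform random distribution is an assumption (the text only says the delay is randomized within a range).

```python
import random


def base_delay(attempt: int, initial_delay: float, max_delay: float, backoff_rate: float) -> float:
    """Exponential backoff capped at max_delay; attempt is one-indexed."""
    return min(initial_delay * backoff_rate ** (attempt - 1), max_delay)


def final_delay(
    attempt: int,
    initial_delay: float = 5.0,
    max_delay: float = 300.0,
    backoff_rate: float = 2.0,
    jitter: str = "FULL",
    rng=random,
) -> float:
    """Apply the jitter strategy to the base delay, then enforce the 1-second minimum."""
    base = base_delay(attempt, initial_delay, max_delay, backoff_rate)
    if jitter == "FULL":
        delay = rng.uniform(0, base)  # ASSUMPTION: uniform between 0 and base
    elif jitter == "HALF":
        delay = rng.uniform(base / 2, base)  # ASSUMPTION: uniform between 50% and 100%
    elif jitter == "NONE":
        delay = base
    else:
        raise ValueError(f"unknown jitter strategy: {jitter}")
    return max(delay, 1.0)


# Without jitter the schedule is deterministic:
print([final_delay(a, 2, 60, 2, "NONE") for a in range(1, 7)])
# -> [2, 4, 8, 16, 32, 60]
```

The last value shows the cap: the uncapped delay for attempt 6 would be 64 seconds, which max_delay reduces to 60.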

Write a custom strategy

You can write your own retry strategy directly. The SDK calls it with the error and the current attempt number after each failure. The attempt number is one-indexed.

RetryStrategy signature

import { RetryDecision } from "@aws/durable-execution-sdk-js";

// (error: Error, attemptCount: number) => RetryDecision
// attemptCount is one-indexed: 1 on the first retry, 2 on the second, etc.

type RetryStrategy = (error: Error, attemptCount: number) => RetryDecision;

from collections.abc import Callable
from aws_durable_execution_sdk_python.retries import RetryDecision

# retry_strategy: Callable[[Exception, int], RetryDecision]
# attempt_count is one-indexed: 1 on the first retry, 2 on the second, etc.

import software.amazon.lambda.durable.retry.RetryDecision;
import software.amazon.lambda.durable.retry.RetryStrategy;

// @FunctionalInterface
// interface RetryStrategy {
//     RetryDecision makeRetryDecision(Throwable error, int attempt);
// }
// attempt is one-indexed: 1 on the first retry, 2 on the second, etc.

Example

Return { shouldRetry: false } to stop, or { shouldRetry: true, delay: { seconds: N } } to retry.

import {
  withDurableExecution,
  StepConfig,
  RetryDecision,
} from "@aws/durable-execution-sdk-js";

// A retry strategy is a plain function: (error: Error, attemptCount: number) => RetryDecision
// attemptCount is 1-based: 1 on the first retry, 2 on the second, etc.
const customRetryStrategy = (error: Error, attemptCount: number): RetryDecision => {
  if (attemptCount >= 4) {
    return { shouldRetry: false };
  }
  // Fixed 2-second delay regardless of attempt number
  return { shouldRetry: true, delay: { seconds: 2 } };
};

const stepConfig: StepConfig<string> = { retryStrategy: customRetryStrategy };

export const handler = withDurableExecution(async (event, context) => {
  const result = await context.step(
    "call-api",
    async () => callApi(),
    stepConfig,
  );
  return result;
});

async function callApi(): Promise<string> {
  return "ok";
}

Use RetryDecision.retry(Duration) or RetryDecision.no_retry().

from aws_durable_execution_sdk_python.config import Duration, StepConfig
from aws_durable_execution_sdk_python.context import DurableContext, StepContext, durable_step
from aws_durable_execution_sdk_python.execution import durable_execution
from aws_durable_execution_sdk_python.retries import RetryDecision


# A retry strategy is a plain callable: (Exception, int) -> RetryDecision
# attempt_count is 1-based: 1 on the first retry, 2 on the second, etc.
def custom_retry_strategy(error: Exception, attempt_count: int) -> RetryDecision:
    if attempt_count >= 4:
        return RetryDecision.no_retry()
    # Fixed 2-second delay regardless of attempt number
    return RetryDecision.retry(Duration.from_seconds(2))


step_config = StepConfig(retry_strategy=custom_retry_strategy)


@durable_step
def call_api(step_context: StepContext) -> str:
    return "ok"


@durable_execution
def lambda_handler(event: dict, context: DurableContext) -> str:
    return context.step(call_api(), config=step_config)

Use RetryDecision.retry(Duration) or RetryDecision.fail().

import java.time.Duration;
import java.util.Map;
import software.amazon.lambda.durable.DurableContext;
import software.amazon.lambda.durable.DurableHandler;
import software.amazon.lambda.durable.config.StepConfig;
import software.amazon.lambda.durable.retry.RetryDecision;
import software.amazon.lambda.durable.retry.RetryStrategy;

public class CustomRetryStrategyExample extends DurableHandler<Map<String, Object>, String> {

    // RetryStrategy is a functional interface: (Throwable error, int attempt) -> RetryDecision
    // attempt is 1-based: 1 on the first retry, 2 on the second, etc.
    private static final RetryStrategy customRetryStrategy = (error, attempt) -> {
        if (attempt >= 4) {
            return RetryDecision.fail();
        }
        // Fixed 2-second delay regardless of attempt number
        return RetryDecision.retry(Duration.ofSeconds(2));
    };

    @Override
    public String handleRequest(Map<String, Object> event, DurableContext context) {
        return context.step("call-api", String.class,
                stepCtx -> callApi(),
                StepConfig.builder().retryStrategy(customRetryStrategy).build());
    }

    private String callApi() {
        return "ok";
    }
}
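To check how many attempts a custom strategy actually allows, it can help to drive it with a small local harness before deploying. The sketch below is independent of the SDK: Decision is a hypothetical stand-in for RetryDecision, simulate is a hypothetical helper, and it assumes the strategy is called with 1 after the first failed attempt, 2 after the second, and so on.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Decision:
    """Hypothetical stand-in for the SDK's RetryDecision."""
    should_retry: bool
    delay_seconds: float = 0.0


Strategy = Callable[[Exception, int], Decision]


def simulate(strategy: Strategy, error: Exception, limit: int = 100) -> List[float]:
    """Return the delays the strategy schedules if every attempt fails with `error`."""
    delays: List[float] = []
    for attempt in range(1, limit + 1):
        decision = strategy(error, attempt)
        if not decision.should_retry:
            return delays
        delays.append(decision.delay_seconds)
    raise RuntimeError(f"strategy still retrying after {limit} attempts")


def fixed_delay_strategy(error: Exception, attempt_count: int) -> Decision:
    # Same logic as the example above: stop once attempt_count reaches 4.
    if attempt_count >= 4:
        return Decision(should_retry=False)
    return Decision(should_retry=True, delay_seconds=2.0)


print(simulate(fixed_delay_strategy, RuntimeError("boom")))
# -> [2.0, 2.0, 2.0]
```

Under that numbering assumption, the strategy schedules three retries, so the step body runs at most four times before the error reaches the handler.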

Retry presets

The SDK ships with preset strategies for common cases:

import { withDurableExecution, retryPresets } from "@aws/durable-execution-sdk-js";

export const handler = withDurableExecution(async (event, context) => {
  // Default: 6 attempts, 5s initial delay, 60s max, 2x backoff, full jitter
  const result = await context.step(
    "call-api",
    async () => callApi(),
    { retryStrategy: retryPresets.default },
  );

  // No retry: fail immediately on first error
  const critical = await context.step(
    "charge-payment",
    async () => chargePayment(),
    { retryStrategy: retryPresets.noRetry },
  );

  return { result, critical };
});

async function callApi(): Promise<string> { return "ok"; }
async function chargePayment(): Promise<string> { return "charged"; }

retryPresets.default 6 attempts, 5s initial delay, 60s max, 2x backoff, full jitter.

retryPresets.noRetry 1 attempt, fails immediately on error.

from aws_durable_execution_sdk_python.retries import RetryPresets
from aws_durable_execution_sdk_python.config import StepConfig

# No retries
step_config = StepConfig(retry_strategy=RetryPresets.none())

# Default retries (6 attempts, 5s initial delay)
step_config = StepConfig(retry_strategy=RetryPresets.default())

# Quick retries for transient errors (3 attempts)
step_config = StepConfig(retry_strategy=RetryPresets.transient())

# Longer retries for resource availability (5 attempts, up to 5 minutes)
step_config = StepConfig(retry_strategy=RetryPresets.resource_availability())

# Aggressive retries for critical operations (10 attempts)
step_config = StepConfig(retry_strategy=RetryPresets.critical())

RetryPresets.default() 6 attempts, 5s initial delay, 60s max, 2x backoff, full jitter.

RetryPresets.none() 1 attempt, fails immediately on error.

RetryPresets.transient() 3 attempts, 2x backoff, half jitter.

RetryPresets.resource_availability() 5 attempts, 5s initial delay, 5 min max, 2x backoff.

RetryPresets.critical() 10 attempts, 1s initial delay, 60s max, 1.5x backoff, no jitter.

import java.util.Map;
import software.amazon.lambda.durable.DurableContext;
import software.amazon.lambda.durable.DurableHandler;
import software.amazon.lambda.durable.config.StepConfig;
import software.amazon.lambda.durable.retry.RetryStrategies;

public class RetryPresetsExample extends DurableHandler<Map<String, Object>, Map<String, Object>> {

    @Override
    public Map<String, Object> handleRequest(Map<String, Object> event, DurableContext context) {
        // Default: 6 attempts, 5s initial delay, 60s max, 2x backoff, full jitter
        String result = context.step("call-api", String.class,
                stepCtx -> callApi(),
                StepConfig.builder().retryStrategy(RetryStrategies.Presets.DEFAULT).build());

        // No retry: fail immediately on first error
        String critical = context.step("charge-payment", String.class,
                stepCtx -> chargePayment(),
                StepConfig.builder().retryStrategy(RetryStrategies.Presets.NO_RETRY).build());

        return Map.of("result", result, "critical", critical);
    }

    private String callApi() { return "ok"; }
    private String chargePayment() { return "charged"; }
}

RetryStrategies.Presets.DEFAULT 6 attempts, 5s initial delay, 60s max, 2x backoff, full jitter.

RetryStrategies.Presets.NO_RETRY Fails immediately on first error.
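As a worked example, the default preset's parameters (6 attempts, 5s initial delay, 60s max, 2x backoff) give the following un-jittered schedule. The sketch is plain arithmetic over the delay formula and does not call the SDK; backoff_schedule is a hypothetical helper name.

```python
def backoff_schedule(max_attempts: int, initial_delay: int, max_delay: int, backoff_rate: int) -> list:
    """Base delay before each retry; max_attempts includes the initial attempt."""
    return [
        min(initial_delay * backoff_rate ** (attempt - 1), max_delay)
        for attempt in range(1, max_attempts)
    ]


schedule = backoff_schedule(max_attempts=6, initial_delay=5, max_delay=60, backoff_rate=2)
print(schedule)       # -> [5, 10, 20, 40, 60]
print(sum(schedule))  # -> 135
```

Six attempts mean five retries. With full jitter each actual delay falls between the 1-second minimum and the base delay, so the total time spent waiting is somewhere between 5 and 135 seconds.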

Retry only specific errors

You can retry only certain error types and fail immediately on others.

Use retryableErrorTypes to specify which error classes to retry.

import {
  withDurableExecution,
  createRetryStrategy,
} from "@aws/durable-execution-sdk-js";

class RateLimitError extends Error {}
class ServiceUnavailableError extends Error {}

const retryStrategy = createRetryStrategy({
  maxAttempts: 5,
  initialDelay: { seconds: 2 },
  // Only retry these specific error types; all other errors fail immediately
  retryableErrorTypes: [RateLimitError, ServiceUnavailableError],
});

export const handler = withDurableExecution(async (event, context) => {
  const result = await context.step(
    "call-api",
    async () => {
      // Throws RateLimitError or ServiceUnavailableError on transient failures
      return callApi();
    },
    { retryStrategy },
  );
  return result;
});

async function callApi(): Promise<string> {
  return "ok";
}

Use retryable_error_types to specify which exception classes to retry.

from aws_durable_execution_sdk_python.config import StepConfig
from aws_durable_execution_sdk_python.context import DurableContext, StepContext, durable_step
from aws_durable_execution_sdk_python.execution import durable_execution
from aws_durable_execution_sdk_python.retries import RetryStrategyConfig, create_retry_strategy


class RateLimitError(Exception):
    pass


class ServiceUnavailableError(Exception):
    pass


retry_strategy = create_retry_strategy(
    RetryStrategyConfig(
        max_attempts=5,
        # Only retry these specific error types; all other errors fail immediately
        retryable_error_types=[RateLimitError, ServiceUnavailableError],
    )
)

step_config = StepConfig(retry_strategy=retry_strategy)


@durable_step
def call_api(step_context: StepContext) -> str:
    # Raises RateLimitError or ServiceUnavailableError on transient failures
    return "ok"


@durable_execution
def lambda_handler(event: dict, context: DurableContext) -> str:
    return context.step(call_api(), config=step_config)

RetryStrategy is a functional interface. Check the error type in the lambda and return RetryDecision.fail() for errors you do not want to retry.

import java.time.Duration;
import java.util.Map;
import software.amazon.lambda.durable.DurableContext;
import software.amazon.lambda.durable.DurableHandler;
import software.amazon.lambda.durable.config.StepConfig;
import software.amazon.lambda.durable.retry.JitterStrategy;
import software.amazon.lambda.durable.retry.RetryDecision;
import software.amazon.lambda.durable.retry.RetryStrategies;
import software.amazon.lambda.durable.retry.RetryStrategy;

public class RetrySpecificErrorsExample extends DurableHandler<Map<String, Object>, String> {

    static class RateLimitException extends RuntimeException {
        public RateLimitException(String message) { super(message); }
    }

    static class ServiceUnavailableException extends RuntimeException {
        public ServiceUnavailableException(String message) { super(message); }
    }

    // RetryStrategy is a functional interface: (Throwable error, int attempt) -> RetryDecision
    // Build the backoff strategy once and reuse it for every retry decision.
    private static final RetryStrategy backoff = RetryStrategies.exponentialBackoff(
            5, Duration.ofSeconds(2), Duration.ofMinutes(1), 2.0, JitterStrategy.FULL);

    // Filter by error type manually, then delegate to exponential backoff for the delay.
    private static final RetryStrategy retryStrategy = (error, attempt) -> {
        if (!(error instanceof RateLimitException) && !(error instanceof ServiceUnavailableException)) {
            return RetryDecision.fail(); // all other errors fail immediately
        }
        return backoff.makeRetryDecision(error, attempt);
    };

    @Override
    public String handleRequest(Map<String, Object> event, DurableContext context) {
        return context.step("call-api", String.class,
                stepCtx -> callApi(),
                StepConfig.builder().retryStrategy(retryStrategy).build());
    }

    private String callApi() { return "ok"; }
}
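The "filter first, then delegate" pattern in the Java example generalizes to a small wrapper around any strategy. The sketch below shows the idea in isolation; Decision and retry_only are hypothetical names used for illustration and are not part of the SDK.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, Type


@dataclass
class Decision:
    """Hypothetical stand-in for the SDK's RetryDecision."""
    should_retry: bool
    delay_seconds: float = 0.0


Strategy = Callable[[Exception, int], Decision]


def retry_only(error_types: Tuple[Type[Exception], ...], inner: Strategy) -> Strategy:
    """Wrap `inner` so that only errors of the given types are ever retried."""
    def strategy(error: Exception, attempt: int) -> Decision:
        if not isinstance(error, error_types):
            return Decision(should_retry=False)  # all other errors fail immediately
        return inner(error, attempt)  # delegate delay and attempt limits
    return strategy


class RateLimitError(Exception):
    pass


def two_retries(error: Exception, attempt: int) -> Decision:
    # Inner strategy: retry with a 2-second delay until the third failure.
    return Decision(should_retry=attempt < 3, delay_seconds=2.0)


strategy = retry_only((RateLimitError,), two_retries)
```

Keeping the type filter separate from the backoff logic lets the same inner strategy be reused across steps that treat different errors as transient.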

See also