Retry Strategies
Retry suspends invocation
When a step throws an exception, the SDK consults the step's retry strategy to decide whether to retry and how long to wait. When the strategy decides to retry, the SDK checkpoints the error and the scheduled resume time, and then ends the Lambda invocation. At the scheduled resume time, the backend starts a new invocation for the execution and the SDK runs the step body again.
Because the invocation ends, waiting for the next retry does not consume Lambda execution time.
When a step exhausts all retry attempts, the SDK checkpoints the final error and throws it to your handler. If you don't configure a retry strategy on a step, any exception propagates immediately without a retry.
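The suspend-and-resume cycle described above can be modelled as a toy, in-memory simulation. This is a sketch under stated assumptions, not the SDK's implementation: RetryDecision, Checkpoint and run_invocation are hypothetical stand-ins defined here, and an integer clock in seconds replaces real time. Each run_invocation call plays the part of one Lambda invocation: it runs the step once and, on failure, either records the error and a resume time and returns (the invocation ends), or re-raises once the strategy says to stop.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RetryDecision:
    # Hypothetical stand-in: retry after delay_seconds, or stop when should_retry is False.
    should_retry: bool
    delay_seconds: int = 0


@dataclass
class Checkpoint:
    # Hypothetical shape of what a durable backend persists between invocations.
    attempt: int = 0                  # number of attempts that have failed so far
    last_error: Optional[str] = None
    resume_at: Optional[int] = None   # clock value at which the next invocation starts
    result: Optional[str] = None
    history: list = field(default_factory=list)


def run_invocation(
    now: int,
    step: Callable[[], str],
    strategy: Callable[[Exception, int], RetryDecision],
    cp: Checkpoint,
) -> None:
    """Simulate one invocation: run the step once, then checkpoint and end."""
    try:
        cp.result = step()
        cp.resume_at = None
    except Exception as exc:
        cp.attempt += 1               # one-indexed: 1 after the first failure
        decision = strategy(exc, cp.attempt)
        cp.history.append((now, cp.attempt, str(exc)))
        cp.last_error = str(exc)
        if not decision.should_retry:
            raise                     # retries exhausted: the error reaches the handler
        # Checkpoint the resume time and end; nothing runs while waiting.
        cp.resume_at = now + decision.delay_seconds


# Usage: a step that fails twice, then succeeds.
calls = {"n": 0}


def flaky_step() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError(f"transient failure {calls['n']}")
    return "ok"


def fixed_delay(error: Exception, attempt: int) -> RetryDecision:
    # Retry every 2 seconds, for at most 5 failed attempts.
    return RetryDecision(should_retry=attempt < 5, delay_seconds=2)


cp = Checkpoint()
clock = 0
run_invocation(clock, flaky_step, fixed_delay, cp)
while cp.result is None:
    clock = cp.resume_at              # the "backend" starts a new invocation at resume time
    run_invocation(clock, flaky_step, fixed_delay, cp)

print(cp.result, clock, cp.history)
# ok 4 [(0, 1, 'transient failure 1'), (2, 2, 'transient failure 2')]
```

In this run the step fails at clock 0 and clock 2 and succeeds at clock 4. Between those points no invocation is running, which mirrors why waiting between retries does not consume Lambda execution time.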
Configure a retry strategy
A retry strategy is a function that takes the error and the current attempt number, and returns a decision. The decision is either to retry after a given delay, or to stop. You can write a retry strategy yourself, or use the built-in helper to build one from configuration.
RetryStrategy helper
Use createRetryStrategy() to build a strategy, then pass it as retryStrategy in StepConfig.
```typescript
import {
  withDurableExecution,
  createRetryStrategy,
  StepConfig,
} from "@aws/durable-execution-sdk-js";

const retryStrategy = createRetryStrategy({
  maxAttempts: 5,
  initialDelay: { seconds: 2 },
  maxDelay: { minutes: 1 },
  backoffRate: 2,
});

const stepConfig: StepConfig<string> = { retryStrategy };

export const handler = withDurableExecution(async (event, context) => {
  const result = await context.step(
    "call-external-api",
    async () => callExternalApi(),
    stepConfig,
  );
  return result;
});

async function callExternalApi(): Promise<string> {
  return "ok";
}
```
Use create_retry_strategy() with a RetryStrategyConfig, then pass it as retry_strategy in StepConfig.
```python
from aws_durable_execution_sdk_python.config import Duration, StepConfig
from aws_durable_execution_sdk_python.context import DurableContext, StepContext, durable_step
from aws_durable_execution_sdk_python.execution import durable_execution
from aws_durable_execution_sdk_python.retries import RetryStrategyConfig, create_retry_strategy

retry_strategy = create_retry_strategy(
    RetryStrategyConfig(
        max_attempts=5,
        initial_delay=Duration.from_seconds(2),
        max_delay=Duration.from_minutes(1),
        backoff_rate=2.0,
    )
)

step_config = StepConfig(retry_strategy=retry_strategy)


@durable_step
def call_external_api(step_context: StepContext) -> str:
    return "ok"


@durable_execution
def lambda_handler(event: dict, context: DurableContext) -> str:
    result = context.step(call_external_api(), config=step_config)
    return result
```
Use RetryStrategies.exponentialBackoff() to build a strategy, then pass it to StepConfig.builder().retryStrategy().
```java
import java.time.Duration;
import software.amazon.lambda.durable.DurableContext;
import software.amazon.lambda.durable.DurableHandler;
import software.amazon.lambda.durable.StepContext;
import software.amazon.lambda.durable.config.StepConfig;
import software.amazon.lambda.durable.retry.JitterStrategy;
import software.amazon.lambda.durable.retry.RetryStrategies;

public class ExponentialBackoffExample extends DurableHandler<Object, String> {
    @Override
    public String handleRequest(Object input, DurableContext context) {
        StepConfig config = StepConfig.builder()
                .retryStrategy(RetryStrategies.exponentialBackoff(
                        3,                       // maxAttempts
                        Duration.ofSeconds(1),   // initialDelay
                        Duration.ofSeconds(10),  // maxDelay
                        2.0,                     // backoffRate
                        JitterStrategy.FULL))    // jitter
                .build();
        String result = context.step("retry_step", String.class,
                (StepContext ctx) -> "Step with exponential backoff",
                config);
        return "Result: " + result;
    }
}
```
RetryStrategyConfig signature
```typescript
import {
  RetryStrategyConfig,
  JitterStrategy,
  Duration,
} from "@aws/durable-execution-sdk-js";

// RetryStrategyConfig shape:
// {
//   maxAttempts?: number                      // default: 3
//   initialDelay?: Duration                   // default: { seconds: 5 }
//   maxDelay?: Duration                       // default: { minutes: 5 }
//   backoffRate?: number                      // default: 2
//   jitter?: JitterStrategy                   // default: JitterStrategy.FULL
//   retryableErrors?: (string | RegExp)[]
//   retryableErrorTypes?: (new () => Error)[]
// }
```
Parameters:
- maxAttempts (optional): Total attempts, including the initial attempt. Default: 3.
- initialDelay (optional): Delay before the first retry. Default: { seconds: 5 }.
- maxDelay (optional): Maximum delay between retries. Default: { minutes: 5 }.
- backoffRate (optional): Multiplier applied to the delay on each retry. Default: 2.
- jitter (optional): A JitterStrategy value. Default: JitterStrategy.FULL.
- retryableErrors (optional): Array of strings or RegExp patterns matched against the error message. The SDK retries all errors when you set neither retryableErrors nor retryableErrorTypes.
- retryableErrorTypes (optional): Array of error classes. The SDK retries only errors that are instances of these classes. When you set both filters, the SDK retries an error if it matches either (OR logic).
```python
import re
from dataclasses import dataclass

from aws_durable_execution_sdk_python.config import Duration, JitterStrategy
from aws_durable_execution_sdk_python.retries import RetryDecision


@dataclass
class RetryStrategyConfig:
    max_attempts: int = 3
    initial_delay: Duration = Duration.from_seconds(5)
    max_delay: Duration = Duration.from_minutes(5)
    backoff_rate: float = 2.0
    jitter_strategy: JitterStrategy = JitterStrategy.FULL
    retryable_errors: list[str | re.Pattern] | None = None
    retryable_error_types: list[type[Exception]] | None = None
```
Parameters:
- max_attempts (optional): Total attempts, including the initial attempt. Default: 3.
- initial_delay (optional): A Duration. Default: Duration.from_seconds(5).
- max_delay (optional): A Duration. Default: Duration.from_minutes(5).
- backoff_rate (optional): Multiplier applied to the delay on each retry. Default: 2.0.
- jitter_strategy (optional): A JitterStrategy value. Default: JitterStrategy.FULL.
- retryable_errors (optional): List of strings or compiled re.Pattern objects matched against the error message. The SDK retries all errors when you set neither retryable_errors nor retryable_error_types.
- retryable_error_types (optional): List of exception classes. The SDK retries only exceptions that are instances of these classes. When you set both filters, the SDK retries an error if it matches either (OR logic).
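The filtering rules above amount to a small predicate. The sketch below is a hypothetical Python restatement of the documented behaviour (retry everything when no filter is set; otherwise retry when the message matches any pattern OR the error is an instance of any listed class), not the SDK's source. The function name is_retryable is invented for illustration, and the documentation does not say how a plain string is matched against the message, so this sketch assumes substring matching and uses re.search for compiled patterns.

```python
import re
from typing import Optional, Sequence, Union

# Hypothetical restatement of the documented filter rules; not the SDK's source.
ErrorPattern = Union[str, re.Pattern]


def is_retryable(
    error: Exception,
    retryable_errors: Optional[Sequence[ErrorPattern]] = None,
    retryable_error_types: Optional[Sequence[type]] = None,
) -> bool:
    # Neither filter set: every error is retryable.
    if retryable_errors is None and retryable_error_types is None:
        return True
    message = str(error)
    # Message filter. Assumption: plain strings match as substrings;
    # compiled patterns match anywhere in the message via re.search.
    for pattern in retryable_errors or ():
        if isinstance(pattern, re.Pattern):
            if pattern.search(message):
                return True
        elif pattern in message:
            return True
    # Type filter: isinstance() also accepts subclasses of the listed classes.
    if retryable_error_types and isinstance(error, tuple(retryable_error_types)):
        return True
    # The two filters are OR-ed, so reaching this point means neither matched.
    return False


class RateLimitError(Exception):
    pass


print(is_retryable(ValueError("anything")))                    # True
print(is_retryable(RateLimitError("slow down"),
                   retryable_error_types=[RateLimitError]))   # True
print(is_retryable(ValueError("request timed out"),
                   retryable_errors=[re.compile(r"time(d )?out")],
                   retryable_error_types=[RateLimitError]))   # True
print(is_retryable(ValueError("bad input"),
                   retryable_errors=["timeout"],
                   retryable_error_types=[RateLimitError]))   # False
```

The third call shows the OR logic: the error is not a RateLimitError, but its message matches the pattern, so it is still retried.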
```java
RetryStrategy RetryStrategies.exponentialBackoff(
    int maxAttempts,
    Duration initialDelay,
    Duration maxDelay,
    double backoffRate,
    JitterStrategy jitter
)

RetryStrategy RetryStrategies.fixedDelay(
    int maxAttempts,
    Duration fixedDelay
)
```
Parameters:
- maxAttempts: Total attempts, including the initial attempt.
- initialDelay: A java.time.Duration. Minimum 1 second.
- maxDelay: A java.time.Duration. Minimum 1 second.
- backoffRate: Multiplier applied to the delay on each retry.
- jitter: A JitterStrategy value: FULL, HALF, or NONE.
The Java SDK has no built-in error-type filtering. To retry only specific errors, check the error type inside your RetryStrategy lambda. See Retry only specific errors.
JitterStrategy
Delay calculation
The SDK calculates the delay before each retry using exponential backoff with jitter:
base_delay = min(initial_delay × backoff_rate ^ (attempt - 1), max_delay)
final_delay = jitter(base_delay), minimum 1 second
- JitterStrategy.FULL randomizes the delay between 0 and base_delay. This spreads retries across time and avoids many clients retrying simultaneously after a shared failure.
- JitterStrategy.HALF randomizes the delay between 50% and 100% of base_delay.
- JitterStrategy.NONE uses the exact calculated delay.
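The formula and jitter modes above translate directly into code. The following is a hypothetical Python restatement for illustration, not the SDK's implementation: the local Jitter enum stands in for JitterStrategy, delays are plain floats in seconds, and the defaults mirror the documented RetryStrategyConfig defaults (5 s initial delay, 5 min maximum, 2x backoff, full jitter). The SDK may round delays differently.

```python
import random
from enum import Enum
from typing import Optional


class Jitter(Enum):
    # Local stand-in for the SDK's JitterStrategy values.
    FULL = "FULL"
    HALF = "HALF"
    NONE = "NONE"


def retry_delay_seconds(
    attempt: int,
    initial_delay: float = 5.0,
    max_delay: float = 300.0,
    backoff_rate: float = 2.0,
    jitter: Jitter = Jitter.FULL,
    rng: Optional[random.Random] = None,
) -> float:
    """Delay before the retry that follows failed attempt `attempt` (one-indexed)."""
    rng = rng or random.Random()
    # base_delay = min(initial_delay * backoff_rate ^ (attempt - 1), max_delay)
    base = min(initial_delay * backoff_rate ** (attempt - 1), max_delay)
    if jitter is Jitter.FULL:
        delay = rng.uniform(0, base)          # anywhere between 0 and base
    elif jitter is Jitter.HALF:
        delay = rng.uniform(base / 2, base)   # between 50% and 100% of base
    else:
        delay = base                          # exact exponential delay
    return max(delay, 1.0)                    # documented 1-second minimum


print([retry_delay_seconds(n, jitter=Jitter.NONE) for n in range(1, 8)])
# [5.0, 10.0, 20.0, 40.0, 80.0, 160.0, 300.0]
```

With the defaults and full jitter, the first retry waits somewhere between 1 and 5 seconds (the floor lifts any smaller draw to 1 second); with no jitter it waits exactly 5 seconds, and from the seventh retry onward the delay is capped at max_delay.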
Write a custom strategy
You can write your own retry strategy. After each failed attempt, the SDK calls it with the error and the current attempt number. The attempt number is one-indexed.
RetryStrategy signature
```java
import software.amazon.lambda.durable.retry.RetryDecision;
import software.amazon.lambda.durable.retry.RetryStrategy;

// @FunctionalInterface
// interface RetryStrategy {
//     RetryDecision makeRetryDecision(Throwable error, int attempt);
// }
// attempt is one-indexed: 1 on the first retry, 2 on the second, etc.
```
Example
Return { shouldRetry: false } to stop, or { shouldRetry: true, delay: { seconds: N } } to retry.
```typescript
import {
  withDurableExecution,
  StepConfig,
  RetryDecision,
} from "@aws/durable-execution-sdk-js";

// A retry strategy is a plain function: (error: Error, attemptCount: number) => RetryDecision
// attemptCount is 1-based: 1 on the first retry, 2 on the second, etc.
const customRetryStrategy = (error: Error, attemptCount: number): RetryDecision => {
  if (attemptCount >= 4) {
    return { shouldRetry: false };
  }
  // Fixed 2-second delay regardless of attempt number
  return { shouldRetry: true, delay: { seconds: 2 } };
};

const stepConfig: StepConfig<string> = { retryStrategy: customRetryStrategy };

export const handler = withDurableExecution(async (event, context) => {
  const result = await context.step(
    "call-api",
    async () => callApi(),
    stepConfig,
  );
  return result;
});

async function callApi(): Promise<string> {
  return "ok";
}
```
Use RetryDecision.retry(Duration) or RetryDecision.no_retry().
```python
from aws_durable_execution_sdk_python.config import Duration, StepConfig
from aws_durable_execution_sdk_python.context import DurableContext, StepContext, durable_step
from aws_durable_execution_sdk_python.execution import durable_execution
from aws_durable_execution_sdk_python.retries import RetryDecision


# A retry strategy is a plain callable: (Exception, int) -> RetryDecision
# attempt_count is 1-based: 1 on the first retry, 2 on the second, etc.
def custom_retry_strategy(error: Exception, attempt_count: int) -> RetryDecision:
    if attempt_count >= 4:
        return RetryDecision.no_retry()
    # Fixed 2-second delay regardless of attempt number
    return RetryDecision.retry(Duration.from_seconds(2))


step_config = StepConfig(retry_strategy=custom_retry_strategy)


@durable_step
def call_api(step_context: StepContext) -> str:
    return "ok"


@durable_execution
def lambda_handler(event: dict, context: DurableContext) -> str:
    return context.step(call_api(), config=step_config)
```
Use RetryDecision.retry(Duration) or RetryDecision.fail().
```java
import java.time.Duration;
import java.util.Map;
import software.amazon.lambda.durable.DurableContext;
import software.amazon.lambda.durable.DurableHandler;
import software.amazon.lambda.durable.config.StepConfig;
import software.amazon.lambda.durable.retry.RetryDecision;
import software.amazon.lambda.durable.retry.RetryStrategy;

public class CustomRetryStrategyExample extends DurableHandler<Map<String, Object>, String> {
    // RetryStrategy is a functional interface: (Throwable error, int attempt) -> RetryDecision
    // attempt is 1-based: 1 on the first retry, 2 on the second, etc.
    private static final RetryStrategy customRetryStrategy = (error, attempt) -> {
        if (attempt >= 4) {
            return RetryDecision.fail();
        }
        // Fixed 2-second delay regardless of attempt number
        return RetryDecision.retry(Duration.ofSeconds(2));
    };

    @Override
    public String handleRequest(Map<String, Object> event, DurableContext context) {
        return context.step("call-api", String.class,
                stepCtx -> callApi(),
                StepConfig.builder().retryStrategy(customRetryStrategy).build());
    }

    private String callApi() {
        return "ok";
    }
}
```
Retry presets
The SDK ships with preset strategies for common cases:
```typescript
import { withDurableExecution, retryPresets } from "@aws/durable-execution-sdk-js";

export const handler = withDurableExecution(async (event, context) => {
  // Default: 6 attempts, 5s initial delay, 60s max, 2x backoff, full jitter
  const result = await context.step(
    "call-api",
    async () => callApi(),
    { retryStrategy: retryPresets.default },
  );

  // No retry: fail immediately on first error
  const critical = await context.step(
    "charge-payment",
    async () => chargePayment(),
    { retryStrategy: retryPresets.noRetry },
  );

  return { result, critical };
});

async function callApi(): Promise<string> { return "ok"; }
async function chargePayment(): Promise<string> { return "charged"; }
```
- retryPresets.default: 6 attempts, 5s initial delay, 60s max, 2x backoff, full jitter.
- retryPresets.noRetry: 1 attempt; fails immediately on error.
```python
from aws_durable_execution_sdk_python.retries import RetryPresets
from aws_durable_execution_sdk_python.config import StepConfig

# No retries
step_config = StepConfig(retry_strategy=RetryPresets.none())

# Default retries (6 attempts, 5s initial delay)
step_config = StepConfig(retry_strategy=RetryPresets.default())

# Quick retries for transient errors (3 attempts)
step_config = StepConfig(retry_strategy=RetryPresets.transient())

# Longer retries for resource availability (5 attempts, up to 5 minutes)
step_config = StepConfig(retry_strategy=RetryPresets.resource_availability())

# Aggressive retries for critical operations (10 attempts)
step_config = StepConfig(retry_strategy=RetryPresets.critical())
```
- RetryPresets.default(): 6 attempts, 5s initial delay, 60s max, 2x backoff, full jitter.
- RetryPresets.none(): 1 attempt; fails immediately on error.
- RetryPresets.transient(): 3 attempts, 2x backoff, half jitter.
- RetryPresets.resource_availability(): 5 attempts, 5s initial delay, 5 min max, 2x backoff.
- RetryPresets.critical(): 10 attempts, 1s initial delay, 60s max, 1.5x backoff, no jitter.
```java
import java.util.Map;
import software.amazon.lambda.durable.DurableContext;
import software.amazon.lambda.durable.DurableHandler;
import software.amazon.lambda.durable.config.StepConfig;
import software.amazon.lambda.durable.retry.RetryStrategies;

public class RetryPresetsExample extends DurableHandler<Map<String, Object>, Map<String, Object>> {
    @Override
    public Map<String, Object> handleRequest(Map<String, Object> event, DurableContext context) {
        // Default: 6 attempts, 5s initial delay, 60s max, 2x backoff, full jitter
        String result = context.step("call-api", String.class,
                stepCtx -> callApi(),
                StepConfig.builder().retryStrategy(RetryStrategies.Presets.DEFAULT).build());

        // No retry: fail immediately on first error
        String critical = context.step("charge-payment", String.class,
                stepCtx -> chargePayment(),
                StepConfig.builder().retryStrategy(RetryStrategies.Presets.NO_RETRY).build());

        return Map.of("result", result, "critical", critical);
    }

    private String callApi() { return "ok"; }
    private String chargePayment() { return "charged"; }
}
```
- RetryStrategies.Presets.DEFAULT: 6 attempts, 5s initial delay, 60s max, 2x backoff, full jitter.
- RetryStrategies.Presets.NO_RETRY: Fails immediately on first error.
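To estimate how long a preset can keep a step pending, you can add up the base delays from the formula in Delay calculation. The sketch below does this for two presets whose parameters are fully listed above: the default preset and the Python RetryPresets.critical() preset. It assumes the presets use exactly that formula, ignores any rounding the SDK may apply, and excludes the time the step itself runs; total_base_wait_seconds is a hypothetical helper, not an SDK function. With full jitter each real wait is at most its base delay, so the sum is an upper bound; with no jitter it is the total wait under these assumptions.

```python
def total_base_wait_seconds(
    max_attempts: int,
    initial_delay: float,
    max_delay: float,
    backoff_rate: float,
) -> float:
    """Sum of base delays over all retries (max_attempts includes the first attempt)."""
    retries = max_attempts - 1
    return sum(
        # base_delay for retry n, with the documented 1-second minimum applied
        max(min(initial_delay * backoff_rate ** (n - 1), max_delay), 1.0)
        for n in range(1, retries + 1)
    )


# Default preset: 6 attempts, 5 s initial, 60 s max, 2x backoff (full jitter: upper bound).
print(total_base_wait_seconds(6, 5.0, 60.0, 2.0))    # 135.0
# Python RetryPresets.critical(): 10 attempts, 1 s initial, 60 s max, 1.5x, no jitter.
print(total_base_wait_seconds(10, 1.0, 60.0, 1.5))   # 74.88671875
```

For the default preset the five retries wait at most 5, 10, 20, 40 and 60 seconds; the last one is capped by the 60-second maximum. A preset with a single attempt, such as noRetry, never waits.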
Retry only specific errors
You can retry only certain error types and fail immediately on others.
Use retryableErrorTypes to specify which error classes to retry.
```typescript
import {
  withDurableExecution,
  createRetryStrategy,
} from "@aws/durable-execution-sdk-js";

class RateLimitError extends Error {}
class ServiceUnavailableError extends Error {}

const retryStrategy = createRetryStrategy({
  maxAttempts: 5,
  initialDelay: { seconds: 2 },
  // Only retry these specific error types; all other errors fail immediately
  retryableErrorTypes: [RateLimitError, ServiceUnavailableError],
});

export const handler = withDurableExecution(async (event, context) => {
  const result = await context.step(
    "call-api",
    async () => {
      // Throws RateLimitError or ServiceUnavailableError on transient failures
      return callApi();
    },
    { retryStrategy },
  );
  return result;
});

async function callApi(): Promise<string> {
  return "ok";
}
```
Use retryable_error_types to specify which exception classes to retry.
```python
from aws_durable_execution_sdk_python.config import StepConfig
from aws_durable_execution_sdk_python.context import DurableContext, StepContext, durable_step
from aws_durable_execution_sdk_python.execution import durable_execution
from aws_durable_execution_sdk_python.retries import RetryStrategyConfig, create_retry_strategy


class RateLimitError(Exception):
    pass


class ServiceUnavailableError(Exception):
    pass


retry_strategy = create_retry_strategy(
    RetryStrategyConfig(
        max_attempts=5,
        # Only retry these specific error types; all other errors fail immediately
        retryable_error_types=[RateLimitError, ServiceUnavailableError],
    )
)

step_config = StepConfig(retry_strategy=retry_strategy)


@durable_step
def call_api(step_context: StepContext) -> str:
    # Raises RateLimitError or ServiceUnavailableError on transient failures
    return "ok"


@durable_execution
def lambda_handler(event: dict, context: DurableContext) -> str:
    return context.step(call_api(), config=step_config)
```
RetryStrategy is a functional interface. Check the error type in the lambda and return RetryDecision.fail() for errors you do not want to retry.
```java
import java.time.Duration;
import java.util.Map;
import software.amazon.lambda.durable.DurableContext;
import software.amazon.lambda.durable.DurableHandler;
import software.amazon.lambda.durable.config.StepConfig;
import software.amazon.lambda.durable.retry.JitterStrategy;
import software.amazon.lambda.durable.retry.RetryDecision;
import software.amazon.lambda.durable.retry.RetryStrategies;
import software.amazon.lambda.durable.retry.RetryStrategy;

public class RetrySpecificErrorsExample extends DurableHandler<Map<String, Object>, String> {
    static class RateLimitException extends RuntimeException {
        public RateLimitException(String message) { super(message); }
    }

    static class ServiceUnavailableException extends RuntimeException {
        public ServiceUnavailableException(String message) { super(message); }
    }

    // Built once and reused to compute the delay for retryable errors.
    private static final RetryStrategy backoff = RetryStrategies.exponentialBackoff(
            5, Duration.ofSeconds(2), Duration.ofMinutes(1), 2.0, JitterStrategy.FULL);

    // RetryStrategy is a functional interface: (Throwable error, int attempt) -> RetryDecision
    // Filter by error type manually, then delegate to exponential backoff for the delay.
    private static final RetryStrategy retryStrategy = (error, attempt) -> {
        if (!(error instanceof RateLimitException) && !(error instanceof ServiceUnavailableException)) {
            return RetryDecision.fail(); // all other errors fail immediately
        }
        return backoff.makeRetryDecision(error, attempt);
    };

    @Override
    public String handleRequest(Map<String, Object> event, DurableContext context) {
        return context.step("call-api", String.class,
                stepCtx -> callApi(),
                StepConfig.builder().retryStrategy(retryStrategy).build());
    }

    private String callApi() { return "ok"; }
}
```