Getting started
Overview
This guide explains the fundamental concepts behind durable execution and how the SDK works. You'll understand:
- The difference between aws-durable-execution-sdk-python and aws-durable-execution-sdk-python-testing
- How checkpoints and replay enable reliable workflows
- Why your function code runs multiple times but side effects happen once
- The development workflow from writing to testing to deployment
The two SDKs
The durable execution ecosystem has two separate packages:
Execution SDK (aws-durable-execution-sdk-python)
This is the core SDK that runs in your Lambda functions. It provides:
- DurableContext - The main interface for durable operations
- Operations - Steps, waits, callbacks, parallel, map, child contexts
- Decorators - @durable_execution, @durable_step, etc.
- Configuration - StepConfig, CallbackConfig, retry strategies
- Serialization - How data is saved in checkpoints
Install it in your Lambda deployment package:
pip install aws-durable-execution-sdk-python
Testing SDK (aws-durable-execution-sdk-python-testing)
This is a separate SDK for testing your durable functions. It provides:
- DurableFunctionTestRunner - Run functions locally without AWS
- DurableFunctionCloudTestRunner - Test deployed Lambda functions
- Pytest integration - Fixtures and markers for writing tests
- Result inspection - Examine execution state and operation results
Install it in your development environment only:
pip install aws-durable-execution-sdk-python-testing
Key distinction: The execution SDK runs in production Lambda. The testing SDK runs on your laptop or CI/CD. They're separate concerns.
How durable execution works
Consider a simple workflow that runs a fetch_data step, waits 30 seconds, and then runs a process_data step. Tracing it through two invocations shows the execution model:
First invocation (t=0s):
- Lambda invokes your function
- fetch_data executes and calls an external API
- Result is checkpointed to AWS
- context.wait(Duration.from_seconds(30)) is reached
- Function returns, Lambda can recycle the environment
Second invocation (t=30s):
- Lambda invokes your function again
- Function code runs from the beginning
- fetch_data returns the checkpointed result instantly (no API call)
- context.wait(Duration.from_seconds(30)) is already complete, execution continues
- process_data executes for the first time
- Result is checkpointed
- Function returns the final result
Key insights:
- Your function code runs twice, but fetch_data only calls the API once
- The wait doesn't block Lambda - your environment can be recycled
- You write linear code that looks synchronous
- The SDK handles all the complexity of state management
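The two invocations traced above can be mimicked with a small toy model that does not use the SDK at all. In this sketch, ToyContext, Suspend, and invoke are hypothetical names invented for illustration, checkpoints live in a plain dict, and the clock is simulated. The real SDK persists checkpoints in AWS and its internals may differ.

```python
class Suspend(Exception):
    """Raised by the toy context to end an invocation while a wait is pending."""


class ToyContext:
    """Toy stand-in for a durable context (hypothetical, not the SDK's DurableContext)."""

    def __init__(self, checkpoints, now):
        self.checkpoints = checkpoints  # survives between invocations
        self.now = now                  # simulated clock, in seconds
        self._counter = 0               # operations are identified by position

    def _next_id(self):
        self._counter += 1
        return self._counter

    def step(self, fn, *args):
        op_id = self._next_id()
        if op_id in self.checkpoints:
            return self.checkpoints[op_id]  # replay: reuse the saved result
        result = fn(*args)                  # first run: perform the side effect
        self.checkpoints[op_id] = result
        return result

    def wait(self, seconds):
        op_id = self._next_id()
        # Record the wake-up time once; later invocations read it back.
        wake_at = self.checkpoints.setdefault(op_id, self.now + seconds)
        if self.now < wake_at:
            raise Suspend()


api_calls = []  # records every simulated external API call


def fetch_data(user_id):
    api_calls.append(user_id)  # stands in for the external API call
    return {"user_id": user_id, "score": 41}


def process_data(data):
    return data["score"] + 1


def handler(event, context):
    data = context.step(fetch_data, event["user_id"])
    context.wait(30)
    return context.step(process_data, data)


def invoke(event, checkpoints, now):
    """Run one invocation and return (finished, result)."""
    try:
        return True, handler(event, ToyContext(checkpoints, now))
    except Suspend:
        return False, None


checkpoints = {}
first = invoke({"user_id": "u1"}, checkpoints, now=0)    # suspends at the wait
second = invoke({"user_id": "u1"}, checkpoints, now=30)  # replays, then finishes
print(first, second, api_calls)
```

The handler body runs from the top in both invocations, yet api_calls ends up with a single entry because the second invocation reads the fetch_data result from the checkpoint dict instead of calling the function again.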
Your development workflow
flowchart LR
subgraph dev["Development (Local)"]
direction LR
A["1. Write Function<br/>aws-durable-execution-sdk-python"]
B["2. Write Tests<br/>aws-durable-execution-sdk-python-testing"]
C["3. Run Tests<br/>pytest"]
end
subgraph prod["Production (AWS)"]
direction LR
D["4. Deploy<br/>SAM/CDK/Terraform"]
E["5. Test in Cloud<br/>pytest --runner-mode=cloud"]
end
A --> B --> C --> D --> E
style dev fill:#e3f2fd
style prod fill:#fff3e0
Here's how you build and test durable functions:
1. Write your function (execution SDK)
Install the execution SDK and write your Lambda handler:
from aws_durable_execution_sdk_python import (
    DurableContext,
    durable_execution,
    durable_step,
)


@durable_step
def my_step(step_context, data):
    # Your business logic goes here; this placeholder returns the input unchanged.
    result = data
    return result


@durable_execution
def handler(event, context: DurableContext):
    # Run my_step as a checkpointed step and return its result.
    result = context.step(my_step(event["data"]))
    return result
2. Test locally (testing SDK)
Install the testing SDK and write tests:
import pytest
from aws_durable_execution_sdk_python.execution import InvocationStatus

from my_function import handler


@pytest.mark.durable_execution(handler=handler, lambda_function_name="my_function")
def test_my_function(durable_runner):
    with durable_runner:
        result = durable_runner.run(input={"data": "test"}, timeout=10)
    assert result.status == InvocationStatus.SUCCEEDED
Run the tests locally with pytest test_my_function.py. No AWS credentials are required.
3. Deploy to Lambda
Package your function with the execution SDK (not the testing SDK) and deploy using your preferred tool (SAM, CDK, Terraform, etc.).
4. Test in the cloud (optional)
Run the same tests against your deployed function:
export AWS_REGION=us-west-2
# Single quotes keep the shell from expanding $LATEST as a variable.
export QUALIFIED_FUNCTION_NAME='MyFunction:$LATEST'
export LAMBDA_FUNCTION_TEST_NAME="my_function"
pytest --runner-mode=cloud test_my_function.py
Quick start
Ready to build your first durable function? Here's a minimal example:
from aws_durable_execution_sdk_python import (
    DurableContext,
    durable_execution,
    durable_step,
    StepContext,
)


@durable_step
def greet_user(step_context: StepContext, name: str) -> str:
    """Generate a greeting."""
    return f"Hello {name}!"


@durable_execution
def handler(event: dict, context: DurableContext) -> str:
    """Simple durable function."""
    name = event.get("name", "World")
    greeting = context.step(greet_user(name))
    return greeting
Deploy this to Lambda and you have a durable function. The greet_user step is checkpointed automatically.
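In the examples, greet_user is declared with a step_context parameter but is called as greet_user(name), and the return value is handed to context.step. One plausible way to read this is that the decorated call only captures its arguments, and the context supplies the step context when it runs the step. The sketch below illustrates that pattern with hypothetical names (toy_durable_step, ToyStepContext, ToyContext). It is an assumption about the calling convention, not the SDK's implementation.

```python
import functools


def toy_durable_step(fn):
    """Toy decorator: calling the decorated function captures its arguments
    and returns a callable that still expects a step context."""

    @functools.wraps(fn)
    def bind(*args, **kwargs):
        def run(step_context):
            return fn(step_context, *args, **kwargs)

        run.__name__ = fn.__name__  # keep the step's name for bookkeeping
        return run

    return bind


class ToyStepContext:
    """Hypothetical per-step context carrying only the step name."""

    def __init__(self, name):
        self.name = name


class ToyContext:
    """Hypothetical context that runs bound steps and records their names."""

    def __init__(self):
        self.executed = []

    def step(self, bound_step):
        step_context = ToyStepContext(bound_step.__name__)
        self.executed.append(step_context.name)
        return bound_step(step_context)  # the step body runs here


@toy_durable_step
def greet_user(step_context, name):
    return f"Hello {name}!"


context = ToyContext()
pending = greet_user("Ada")       # nothing has run yet; arguments are captured
greeting = context.step(pending)  # the context decides when the step runs
print(greeting)
```

Deferring the call this way would let the context decide whether to run the step body or return a previously checkpointed result.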
Using a custom boto3 Lambda client
If you need to customize the boto3 Lambda client used for durable execution operations (for example, to configure custom endpoints, retry settings, or credentials), you can pass a boto3_client parameter to the decorator. The client must be a boto3 Lambda client:
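import boto3
from botocore.config import Config

from aws_durable_execution_sdk_python import (
    DurableContext,
    durable_execution,
    durable_step,
    StepContext,
)

# Example client settings; adjust retries and timeouts to your needs.
custom_lambda_client = boto3.client(
    "lambda",
    config=Config(retries={"max_attempts": 5, "mode": "adaptive"}, read_timeout=60),
)


@durable_step
def greet_user(step_context: StepContext, name: str) -> str:
    """Generate a greeting."""
    return f"Hello {name}!"


@durable_execution(boto3_client=custom_lambda_client)
def handler(event: dict, context: DurableContext) -> str:
    """Simple durable function that uses the custom Lambda client."""
    name = event.get("name", "World")
    greeting = context.step(greet_user(name))
    return greeting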
The custom Lambda client is used for all checkpoint and state management operations. If you don't provide a boto3_client, the SDK initializes a default Lambda client from your environment.
Next steps
Now that you've built your first durable function, explore the core features:
Learn the operations:
- Steps - Execute code with retry strategies and checkpointing
- Wait operations - Pause execution for seconds, minutes, or hours
- Callbacks - Wait for external systems to respond
- Child contexts - Organize complex workflows
- Parallel operations - Run multiple operations concurrently
- Map operations - Process collections in parallel
Dive deeper:
- Error handling - Handle failures and implement retry strategies
- Testing patterns - Write effective tests for your workflows
- Best practices - Avoid common pitfalls
See also
- Documentation index - Browse all guides and examples
- Architecture diagrams - Class diagrams and concurrency flows
- Logger integration - Replay-safe structured logging
- Examples directory - More working examples