The GenAI Accelerator L3 Construct provides a comprehensive, secure, and scalable foundation for deploying Generative AI applications on AWS. This construct enables rapid deployment of GenAI backends including chat interfaces, RAG (Retrieval-Augmented Generation) capabilities, LLM integrations, and enterprise-grade security features.

The GenAI Accelerator follows a modern, event-driven serverless architecture designed for scalability, security, and cost optimization. The system is built around three core interaction patterns: real-time chat via WebSocket, RESTful API operations, and asynchronous document processing.
The GenAI Accelerator deploys a comprehensive set of AWS resources organized into logical categories. The following sections detail each component's purpose, functionality, and dependencies.
These are the essential components deployed in every GenAI Accelerator installation:
Client UI S3 Bucket & CloudFront Distribution - S3 bucket and CloudFront distribution provisioned for hosting a client-facing web application. No UI is included with MDAA; this infrastructure is ready for deployment of your own custom UI.
Admin UI S3 Bucket & CloudFront Distribution - S3 bucket and CloudFront distribution provisioned for hosting an administrative dashboard. No UI is included with MDAA; this infrastructure is ready for deployment of your own custom UI.
These components are deployed based on configuration and feature requirements:
The GenAI Accelerator deploys two WAF Web ACLs:
us-east-1)If you deploy to us-east-1, both WAFs are created automatically in the same stack. No additional configuration needed.
When deploying to a region other than us-east-1, MDAA will use a cross-region stack in us-east-1 for the Global WAF. This requires your MDAA configuration to have cross-region stacks pre-configured via additional_stacks in your module configuration:
# In mdaa.yaml, at the module level:
modules:
gaia2:
cdk_app: "@aws-mdaa/gaia-v2"
additional_stacks:
- region: 'us-east-1' # Creates a cross-region stack for Global WAF
app_configs:
- ./gaia.yaml
For more details on additional_stacks configuration, see CONFIGURATION.md.
Note: CDK bootstrap must be established in us-east-1 for your account before deployment.
If cross-region stacks are not configured, deployment will fail with:
Error: CloudFront WAF requires a cross-region stack when your primary region is not us-east-1
(CloudFront WAF resources must be deployed in us-east-1).
Create a WAF Web ACL in us-east-1 manually and reference it in your configuration:
gaia:
waf:
skipGlobalDefaultWaf: true
globalWafArn: "arn:aws:wafv2:us-east-1:123456789012:global/webacl/my-global-waf/a1b2c3d4-..."
To create a Global WAF via AWS CLI:
# Create IP Set for allowed CIDRs (us-east-1, CLOUDFRONT scope)
AWS_PAGER="" aws wafv2 create-ip-set \
--name "gaia-global-ip-allowlist" \
--scope CLOUDFRONT \
--ip-address-version IPV4 \
--addresses "203.0.113.0/24" "198.51.100.0/24" \
--region us-east-1
# Note the IP Set ARN from the output, then create the Web ACL
AWS_PAGER="" aws wafv2 create-web-acl \
--name "gaia-global-waf" \
--scope CLOUDFRONT \
--default-action Block={} \
--visibility-config SampledRequestsEnabled=true,CloudWatchMetricsEnabled=true,MetricName=gaia-global-waf \
--rules '[{
"Name": "IPAllowList",
"Priority": 0,
"Statement": {
"IPSetReferenceStatement": {
"ARN": "<IP_SET_ARN_FROM_PREVIOUS_COMMAND>"
}
},
"Action": { "Allow": {} },
"VisibilityConfig": {
"SampledRequestsEnabled": true,
"CloudWatchMetricsEnabled": true,
"MetricName": "IPAllowList"
}
}]' \
--region us-east-1
If you're using AWS Firewall Manager or don't need CloudFront WAF protection:
gaia:
waf:
skipGlobalDefaultWaf: true
# globalWafArn: omit this to have no WAF on CloudFront
Security note: With this configuration, your CloudFront distributions (client UI and admin UI) have no WAF protection. There is no IP allowlisting, rate limiting, or managed rule coverage at the CDN layer. Anyone on the internet can reach the login page and static assets. Cognito authentication still protects the application, but the attack surface is larger. AWS Shield Standard provides basic DDoS protection on CloudFront, but this is much coarser than WAF rules. For production deployments, prefer Option 1, 2, or 3.
The Regional WAF (for API Gateway) is created in your deployment region by default. To use an existing WAF:
gaia:
waf:
skipRegionalDefaultWaf: true
regionalWafArn: "arn:aws:wafv2:ca-central-1:123456789012:regional/webacl/my-regional-waf/..."
If you use WAF with IP allowlisting (allowedCidrs), you must include your NAT Gateway's public IP address.
Lambda functions send WebSocket responses through AppSync, which is protected by WAF. The Lambda traffic appears to come from your NAT Gateway's public IP. If this IP is not in allowedCidrs, Lambda receives FORBIDDEN errors and chat responses never reach users.
Symptoms of misconfiguration:
FORBIDDEN errors when calling AppSyncSolution: Add your NAT Gateway IP to allowedCidrs:
# Find your NAT Gateway public IP
AWS_PAGER="" aws ec2 describe-nat-gateways \
--query 'NatGateways[*].[NatGatewayId,NatGatewayAddresses[0].PublicIp]' \
--output table
The GenAI Accelerator deploys Lambda functions inside your VPC. These functions need outbound internet access to reach AWS services including AppSync Events (for WebSocket responses).
Key requirement: Subnets must have a route to a NAT Gateway.
Note that connecting a function to a public subnet doesn't automatically give it an internet access. — AWS Lambda Documentation
Lambda functions in a VPC use elastic network interfaces (ENIs) that do not receive public IP addresses. To reach the internet, Lambda traffic must route through a NAT Gateway.
The subnet's route table must have a route like 0.0.0.0/0 → nat-xxxxx for Lambda to have outbound internet access.
Additionally, AppSync Events does not have a VPC endpoint, so Lambda must reach it via the public internet. This makes NAT Gateway mandatory for GAIA v2.
If you're using AWS RAM to share a VPC from a network account to your workload account, you must specify the vpcOwnerAccountId in your configuration:
gaia:
vpc:
vpcId: "vpc-12345678"
appSubnets:
- "subnet-aaaaaaaa"
- "subnet-bbbbbbbb"
vpcOwnerAccountId: "222222222222" # Network account that owns the VPC
Why is this needed?
When Lambda creates ENIs in shared subnets, the IAM policy condition ec2:Vpc must reference the VPC ARN with the network account ID (the account that owns the VPC), not the workload account ID. Without this setting, Lambda functions will fail to create network interfaces with an authorization error.
When to use:
When NOT needed:
┌─────────────────────────────────────────────────────────────┐
│ VPC │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ Subnet (NAT GW) │ │ Subnet (Lambda) │ │
│ │ ┌───────────────┐ │ │ ┌───────────────┐ │ │
│ │ │ NAT Gateway │◄─┼────┼──│ Lambda │ │ │
│ │ └───────┬───────┘ │ │ └───────────────┘ │ │
│ └──────────┼──────────┘ └─────────────────────┘ │
│ │ Route: 0.0.0.0/0 → nat-xxx │
└─────────────┼───────────────────────────────────────────────┘
│
▼
Internet Gateway → AWS Services (AppSync, Bedrock, etc.)
The GenAI Accelerator supports multiple data source types for the WebSocket chat API. Each data source is implemented as a Lambda function that processes incoming messages and sends responses via AppSync Events.
| Data Source | Description | Use Case |
|---|---|---|
bedrockRagDataSource |
RAG with Bedrock Knowledge Base | Document Q&A with citations |
invokeModelDataSource |
Direct Bedrock model invocation | General chat, no KB needed |
customDataSource |
Bring your own Lambda | Custom processing logic |
Uses Bedrock's RetrieveAndGenerate API to query a Knowledge Base and generate responses with source citations.
Features:
Required Configuration:
| Property | Description |
|---|---|
modelId |
Bedrock model ID for generation |
lambdaRole |
IAM role reference for Lambda execution |
Optional Configuration - RAG Settings:
| Property | Description |
|---|---|
guardrailId |
Bedrock Guardrail ID for content safety |
guardrailKmsKeyArn |
KMS key ARN for Guardrail encryption |
guardrailVersion |
Guardrail version to use |
displayInlineCitations |
Show citation markers in responses (default: false) |
kbNumberOfResults |
Number of documents to retrieve from KB (1-100) |
promptTemplate |
Custom prompt for response generation |
Optional Configuration - Model Inference:
| Property | Description |
|---|---|
inferenceMaxTokens |
Maximum tokens in response |
inferenceTemperature |
Randomness: 0=deterministic, 1=creative (0.0-1.0) |
inferenceTopP |
Nucleus sampling threshold (0.0-1.0) |
Optional Configuration - Orchestration (Advanced RAG):
| Property | Description |
|---|---|
orchestrationPromptTemplate |
Custom prompt for query transformation |
orchestrationInferenceMaxTokens |
Max tokens for orchestration inference |
orchestrationInferenceTemperature |
Temperature for orchestration (0.0-1.0) |
orchestrationInferenceTopP |
Top-p for orchestration (0.0-1.0) |
orchestrationInferenceStopSequences |
Stop sequences for orchestration |
orchestrationPerformanceLatency |
Performance vs latency (standard or optimized) |
orchestrationQueryTransformationType |
Query transformation type (e.g., QUERY_DECOMPOSITION) |
Optional Configuration - Lambda Settings:
| Property | Description |
|---|---|
lambdaArchitecture |
ARM_64 or X86_64 (default: X86_64) |
pythonRuntime |
Python runtime version (default: Python 3.13) |
lambdaTimeoutInSeconds |
Lambda timeout (default: 600s) |
lambdaMemorySize |
Lambda memory in MB (default: 1024) |
provisionedConcurrentExecutions |
Pre-warmed Lambda instances |
reservedConcurrentExecutions |
Reserved concurrent executions |
Direct invocation of Bedrock foundation models via invokeModelWithResponseStream. Simpler configuration when RAG is not needed.
Features:
Required Configuration:
| Property | Description |
|---|---|
modelId |
Bedrock model ID (e.g., anthropic.claude-3-sonnet-20240229-v1:0) |
lambdaRole |
IAM role reference for Lambda execution |
Optional Configuration - Lambda Settings:
| Property | Description |
|---|---|
lambdaArchitecture |
ARM_64 or X86_64 (default: X86_64) |
pythonRuntime |
Python runtime version (default: Python 3.13) |
lambdaTimeoutInSeconds |
Lambda timeout (default: 600s) |
lambdaMemorySize |
Lambda memory in MB (default: 1024) |
provisionedConcurrentExecutions |
Pre-warmed Lambda instances |
reservedConcurrentExecutions |
Reserved concurrent executions |
Allows integration of a custom Lambda function for complete control over message processing.
Required Configuration:
| Property | Description |
|---|---|
lambdaArn |
ARN of your custom Lambda function |
All REST API endpoints are served under the /v1 prefix and require Cognito authentication.
By default, the REST API requires requests to come through CloudFront, which adds an X-Origin-Verify header for origin verification. This provides defense-in-depth security.
To allow direct API Gateway access (bypassing CloudFront), set the Lambda environment variable:
REQUIRE_ORIGIN_VERIFY=false
When disabled:
When enabled (default):
X-Origin-Verify header (added by CloudFront)The Sessions API manages chat conversation history. Each session represents a conversation thread with the AI, storing the full message history for context continuity. Sessions are automatically created when users start chatting and can be retrieved to resume conversations or review past interactions.
| Method | Endpoint | Description | Auth |
|---|---|---|---|
GET |
/v1/sessions |
List all sessions for the current user | User |
GET |
/v1/sessions/<session_id> |
Get a specific session with full chat history | User |
DELETE |
/v1/sessions |
Delete all sessions for the current user | User |
DELETE |
/v1/sessions/<session_id> |
Delete a specific session | User |
GET |
/v1/admin/sessions |
List all sessions across all users (paginated) | Admin |
GET |
/v1/users/<user_id>/sessions |
List all sessions for a specific user | Admin |
GET |
/v1/users/<user_id>/sessions/<session_id> |
Get a specific session for any user | Admin |
Session list response:
{
"id": "session-uuid",
"title": "First message from user...",
"dateModified": 1703894400
}
Session detail response (includes full chat history):
{
"id": "session-uuid",
"title": "First message from user...",
"dateModified": 1703894400,
"userId": "user-sub",
"history": [
{
"id": "message-uuid",
"role": "user",
"content": "What is Amazon S3?"
},
{
"id": "message-uuid",
"role": "assistant",
"content": "Amazon S3 is...",
"parts": [
{
"type": "text",
"text": "Amazon S3 is...",
"citationSpans": [{"span": {...}, "refIndex": 0}]
},
{
"type": "source",
"documents": [{"title": "...", "uri": "s3://...", "excerpt": "..."}]
}
],
"metadata": {
"modelId": "anthropic.claude-3-sonnet-20240229-v1:0",
"responseTimeMs": 1523
}
}
]
}
History message fields:
parts (assistant messages only): Rich content array for RAG responses
type: "text": Contains the response text and citationSpans for inline citation positionstype: "source": Contains retrieved documents with title, URI, and excerptmetadata (assistant messages only): Response metadata including modelId and responseTimeMsThe Feedback API enables collecting user ratings on AI responses for quality monitoring and improvement. Users can submit thumbs up/down ratings with configurable reasons (e.g., "accuracy", "unhelpful") and optional free-text feedback. This data can be used for analytics dashboards, model fine-tuning decisions, and compliance auditing.
Configure available feedback reasons in your gaia.yaml:
gaia:
userFeedback:
reasons:
- "accuracy"
- "unhelpful"
- "app_issue"
- "other"
| Method | Endpoint | Description | Auth |
|---|---|---|---|
POST |
/v1/feedback |
Submit feedback for an AI response | User |
GET |
/v1/feedback |
Get current user's feedback history (paginated) | User |
GET |
/v1/feedback/<feedback_id> |
Get a specific feedback entry | User/Admin |
GET |
/v1/admin/feedback |
Get all feedback in a date range (paginated) | Admin |
Feedback payload:
{
"session_id": "uuid",
"message_id": "uuid",
"rating": "thumbs_up | thumbs_down",
"reason": "accuracy",
"text_feedback": "Optional detailed feedback"
}
Admin-only endpoints for managing service interruptions. Requires SERVICE_INTERRUPTION_TABLE_NAME to be configured.
| Method | Endpoint | Description | Auth |
|---|---|---|---|
POST |
/v1/bot-management/interruptions |
Activate a service interruption | Admin |
GET |
/v1/bot-management/interruptions |
Get status of all service interruptions | Admin |
GET |
/v1/bot-management/interruptions/<service_type> |
Get status of a specific service interruption | Admin |
DELETE |
/v1/bot-management/interruptions/<service_type> |
Deactivate a service interruption | Admin |
GET |
/v1/bot-management/service-types |
List available service types | Admin |
GET |
/v1/bot-management/health |
Health check for bot management | Admin |
Valid service types: global, bedrock-rag, bedrock-invoke-model, bedrock-strands-agents
The GenAI Accelerator uses AWS AppSync Events for real-time chat communication. Clients connect via WebSocket to send messages and receive streaming AI responses.
Connect to the AppSync Events endpoint using the WebSocket URL from your deployment outputs. Authentication is via Cognito JWT token in the Authorization header.
Channels follow the pattern /namespace/{userId}/{sessionId}:
| Namespace | Purpose |
|---|---|
/in/{userId}/{sessionId} |
Client publishes messages to this channel (ingress) |
/out/{userId}/{sessionId} |
Client subscribes to this channel to receive responses (egress) |
Users can only subscribe to channels matching their own userId (enforced by the sub claim in the JWT).
/in)Publish a message to start a chat interaction:
{
"payload": {
"text": "What is Amazon S3?",
"sessionId": "550e8400-e29b-41d4-a716-446655440000"
}
}
| Field | Type | Required | Description |
|---|---|---|---|
text |
string | Yes | The user's message/question |
sessionId |
string | Yes | UUID identifying the chat session |
/out)Subscribe to the /out/{userId}/{sessionId} channel to receive streaming responses. Messages arrive as events with the following structure:
{
"id": "content-uuid",
"content": {
"type": "<message_type>",
...
}
}
textDelta - Streaming Text ChunkSent incrementally as the AI generates its response. Collect and concatenate these to build the full response.
{
"id": "abc123",
"content": {
"type": "textDelta",
"sequenceNumber": 0,
"text": "Amazon S3 is "
}
}
| Field | Description |
|---|---|
type |
Always "textDelta" |
sequenceNumber |
Incrementing number for ordering chunks |
text |
The text fragment for this chunk |
When textDelta messages stop arriving, the response is complete. There is no explicit "end" message for text streaming.
citationEvent - Inline Citation Marker (RAG only)Sent when inline citations are enabled (displayInlineCitations: true). Indicates where a citation should be inserted in the response text.
{
"id": "abc123",
"content": {
"type": "citationEvent",
"sequenceNumber": 5,
"position": 142,
"citationNumber": 1,
"citationText": "relevant excerpt from source",
"documentIndex": 0
}
}
| Field | Description |
|---|---|
position |
Character position in the response where citation applies |
citationNumber |
Display number for the citation (1-indexed) |
citationText |
The text span this citation covers |
documentIndex |
Index into the source documents array |
source - Retrieved Documents (RAG only)Sent after text streaming completes. Contains the source documents used for RAG responses.
{
"id": "abc123",
"content": {
"type": "source",
"documents": [
{
"title": "Document Title",
"uri": "s3://bucket/path/to/doc.pdf",
"excerpt": "Relevant excerpt from the document..."
}
]
}
}
error - Error ResponseSent when an error occurs during processing.
{
"type": "text",
"action": "error",
"id": "error-uuid",
"connectionId": "session-id",
"userId": "user-id",
"timestamp": 1703894400,
"data": {
"sessionId": "session-id",
"content": "Error message describing what went wrong",
"type": "text"
}
}
/in/{userId}/{sessionId}textDelta messages (streaming response)citationEvent messages (if RAG with inline citations)source message with documents (if RAG)When a service interruption is active, the response will be a single textDelta containing the interruption message instead of an AI-generated response.