Skip to content

Data Warehouse

Note: This documentation is also available in a rendered format here.

Deploys a secure Redshift-based data warehouse with KMS encryption, VPC networking, SAML federation, scheduled pause/resume actions, event notifications, and automated credential rotation for database users. Common scenarios include centralizing structured data for BI reporting, running complex analytical queries across large datasets, or providing federated SQL access to business analysts.


Deployed Resources

This module deploys and integrates the following resources:

Warehouse KMS Key - Customer-managed KMS key for warehouse resources.

Warehouse Bucket - S3 bucket for warehouse utility and maintenance operations.

Warehouse Logging Bucket (Optional) - S3 bucket for Redshift user activity audit logging. Uses SSE-S3 encryption (Redshift logging requirement).

Execution Roles - Externally managed IAM execution roles associated to the Redshift cluster for cross-service operations.

Warehouse Security Group - Controls network connectivity to the Redshift cluster.

Warehouse Subnet Group - Controls which subnets the cluster is deployed on.

Warehouse Parameter Group - Cluster configuration parameters controlling cluster behavior.

Warehouse Cluster - Redshift cluster conforming to the specified configuration.

Warehouse Cluster Scheduled Actions (Optional) - Scheduled actions to automatically pause and resume the Redshift cluster.

Warehouse Federation Roles (Optional) - IAM roles for SAML-based federated access to the Redshift cluster via IAM Identity Providers.

SNS Event Subscriptions (Optional) - EventBridge subscriptions for cluster and scheduled action event notifications.

Redshift DB Service Users (Optional) - Database users with credentials stored in Secrets Manager for programmatic access.

Warehouse Users (Optional) - Redshift user credentials with configurable automated secret rotation.

datawarehouse


  • Roles — Create IAM roles for Redshift execution roles or SAML federation access
  • Data Lake — Redshift can query data lake S3 buckets via Redshift Spectrum with execution roles
  • QuickSight Account — Connect QuickSight to Redshift as a data source via VPC connection
  • QuickSight Project — Create QuickSight data sources pointing to the Redshift cluster
  • OpenSearch — Deploy OpenSearch as a complementary analytics engine for full-text search and log analytics

Security/Compliance Details

This module is designed in alignment with MDAA security/compliance principles and CDK nag rulesets. Additional review is recommended prior to production deployment, ensuring organization-specific compliance requirements are met.

  • Encryption at Rest:
    • All cluster data encrypted with customer-managed KMS key
    • Warehouse and utility S3 buckets encrypted with KMS
    • Audit logging bucket uses SSE-S3 (Redshift requirement)
  • Encryption in Transit:
    • SSL enforced on all client connections via parameter group
  • Least Privilege:
    • Database user credentials stored in Secrets Manager with configurable automatic rotation
    • Execution roles scoped to specific Redshift operations
  • Separation of Duties:
    • SAML federation roles support SSO access with dynamic credential generation and group membership via SAML claims
    • Federation groups must pre-exist in the cluster
    • Event notifications via SNS for cluster management and security events with configurable severity filtering
  • Network Isolation:
    • Cluster deployed in VPC with configurable subnet group
    • Security group denies all ingress by default; access must be explicitly granted via CIDR or security group rules

AWS Service Endpoints

The following VPC endpoints may be required if public AWS service endpoint connectivity is unavailable (e.g., private subnets without NAT gateway, firewalled environments, or PrivateLink-only architectures):

AWS Service Endpoint Service Name Type
Redshift com.amazonaws.{region}.redshift Interface
Redshift Data API com.amazonaws.{region}.redshift-data Interface
KMS com.amazonaws.{region}.kms Interface
S3 com.amazonaws.{region}.s3 Gateway
Secrets Manager com.amazonaws.{region}.secretsmanager Interface
SNS com.amazonaws.{region}.sns Interface
CloudWatch Logs com.amazonaws.{region}.logs Interface
STS com.amazonaws.{region}.sts Interface

Configuration

MDAA Config

Add the following snippet to your mdaa.yaml under the modules: section of a domain/env in order to use this module:

datawarehouse: # Module Name can be customized
  module_path: '@aws-mdaa/datawarehouse' # Must match module NPM package name
  module_configs:
    - ./datawarehouse.yaml # Filename/path can be customized

Module Config Samples and Variants

Copy the contents of the relevant sample config below into the ./datawarehouse.yaml file referenced in the MDAA config snippet above.

Minimal Configuration

Required properties only — a basic Redshift cluster with VPC networking, security group ingress, and audit logging. Start here for a quick data warehouse deployment before adding federation, scheduled actions, or database users.

sample-config-minimal.yaml

# Contents available via above link
# Minimal Redshift Data Warehouse module configuration.
# Contains only the required properties for a basic Redshift cluster
# deployment with VPC networking, security group, and audit logging.

# Admin username for the Redshift cluster. A secret is
# automatically generated for the password.
adminUsername: admin

# Number of days between admin password rotation
adminPasswordRotationDays: 30

# See CONFIGURATION.md for role reference options (name, arn, id).
# Admin roles granted full access to cluster resources including
# KMS keys and S3 buckets
dataAdminRoles:
  - arn: arn:{{partition}}:iam::{{account}}:role/Admin

# Enable audit logging to a dedicated S3 bucket
enableAuditLoggingToS3: true

# Redshift node type determining compute and storage capacity
nodeType: RA3_4XLARGE
# Number of nodes in the Redshift cluster
numberOfNodes: 2

# Weekly maintenance window in ddd:hh24:mi-ddd:hh24:mi format (UTC)
preferredMaintenanceWindow: Sun:23:45-Mon:00:15

# Security group ingress rules defining allowed inbound
# connections to the cluster port
securityGroupIngress:
  ipv4:
    - 172.31.0.0/16

# Subnet IDs for Redshift cluster node placement
# Often created by your VPC/networking stack.
# Example SSM: ssm:/path/to/subnet/id
subnetIds:
  - subnet-12312312421
  - subnet-12312321412

# VPC ID for Redshift cluster deployment
# Often created by your VPC/networking stack.
# Example SSM: ssm:/path/to/vpc/id
vpcId: vpc-12321421412

Comprehensive Configuration

Deploys a multi-node Redshift cluster with SAML federation, scheduled pause/resume, audit logging, database users with secret rotation, event notifications, workload management, parameter group tuning, and VPC networking. Use this as a reference when you need full control over cluster sizing, access patterns, cost management, and operational automation.

sample-config-comprehensive.yaml

# Contents available via above link
# Sample config for the Redshift Data Warehouse module.
# Deploys a multi-node Redshift cluster with SAML federation,
# scheduled pause/resume, audit logging, database users with
# secret rotation, event notifications, workload management,
# parameter group tuning, and VPC networking.
# This comprehensive config exercises every compatible non-excluded
# property at full depth.

# Admin username for the Redshift cluster. A secret is
# automatically generated for the password.
adminUsername: admin

# Number of days between admin password rotation
adminPasswordRotationDays: 30

# (Optional) Number of days automated snapshots are retained
# (1-35). Set to 0 to disable. (default: 1)
automatedSnapshotRetentionDays: 3

# (Optional) Additional KMS key ARNs allowed to write to the
# cluster bucket. Useful for allowing Glue jobs or other services
# to write encrypted data to the warehouse bucket.
additionalBucketKmsKeyArns:
  - arn:{{partition}}:kms:{{region}}:{{account}}:key/abcd-123123-abcd-12312421

# (Optional) SAML or OIDC federation configurations for federated
# Redshift access. Each federation creates an IAM role with SAML
# trust for dynamic credential generation and group-based cluster
# access.
federations:
  # Name of the federation for reference
  - federationName: 'test'
    # ARN of the IAM Identity Provider
    providerArn: arn:{{partition}}:iam::{{account}}:saml-provider/sample-saml-identity-provider

# See CONFIGURATION.md for role reference options (name, arn, id).
# Admin roles granted full access to cluster resources including
# KMS keys and S3 buckets. Roles can be referenced by name, arn,
# or id.
dataAdminRoles:
  - arn: arn:{{partition}}:iam::{{account}}:role/Admin

# (Optional) Roles granted read/write access to the data warehouse
# S3 bucket for data loading/unloading
warehouseBucketUserRoles:
  - name: User
  - name: team2-ex-role

# (Optional) External execution roles associated with the Redshift
# cluster for cross-service operations. If a role also needs
# warehouse bucket access, add it to warehouseBucketUserRoles.
executionRoles:
  - arn: arn:{{partition}}:iam::{{account}}:role/team1-ex-role
  - arn: arn:{{partition}}:iam::{{account}}:role/RedshiftExecRole

# VPC ID for Redshift cluster deployment
# Often created by your VPC/networking stack.
# Example SSM: ssm:/path/to/vpc/id
vpcId: vpc-12321421412

# Subnet IDs for Redshift cluster node placement. For automatic
# cluster relocation, specify at least one subnet per AZ.
# Often created by your VPC/networking stack.
# Example SSM: ssm:/path/to/subnet/id
subnetIds:
  - subnet-12312312421
  - subnet-12312321412
  - subnet-12312321413

# Weekly maintenance window in ddd:hh24:mi-ddd:hh24:mi format
# (UTC). Example: 'Sun:23:45-Mon:00:15'.
preferredMaintenanceWindow: Sun:23:45-Mon:00:15

# (Optional) TCP port for client connections to the cluster
# (default: 5440)
clusterPort: 5440

# Security group ingress rules defining allowed inbound
# connections to the cluster port. Supports IPv4 CIDR blocks and
# security group references. All other traffic is blocked.
securityGroupIngress:
  # (Optional) IPv4 CIDR blocks to allow traffic from
  ipv4:
    - 172.31.0.0/16
  # (Optional) Security group IDs to allow traffic from
  sg:
    - ssm:/path/to/ssm

# Redshift node type determining compute and storage capacity
nodeType: RA3_4XLARGE
# Number of nodes in the Redshift cluster
numberOfNodes: 2

# Enable audit logging to a dedicated S3 bucket. The audit bucket
# uses SSE-S3 encryption (not KMS) due to Redshift requirements.
enableAuditLoggingToS3: true

# (Optional) Control whether a dedicated S3 bucket is created for
# warehouse data operations (default: true)
createWarehouseBucket: true
# (Optional) Initial database name created in the cluster
# (default: default_db)
dbName: 'test_db'
# (Optional) Multi-node cluster flag. When true, creates a
# multi-node cluster for distributed processing.
multiNode: true
# (Optional) Let Redshift manage the master password
redshiftManageMasterPassword: false

# (Optional) Additional cluster parameter group parameters for
# performance tuning. Security-sensitive values are automatically
# overridden for compliance (e.g., SSL enforcement).
parameterGroupParams:
  max_concurrency_scaling_clusters: "1"
  auto_analyze: "true"

# (Optional) Snapshot identifier for cluster restoration. Only
# provide when restoring from an existing snapshot.
snapshotIdentifier: test-snapshot-id

# (Optional) Snapshot owner account. Applicable if restoring the
# cluster from a snapshot that belongs to another account.
# By default, snapshots are searched within the current account.
snapshotOwnerAccount: '{{context:account-2}}'

# (Optional) Database users created in Redshift with credentials
# stored in Secrets Manager. Supports automated secret rotation.
databaseUsers:
  # Name of the database user
  - userName: 'serviceuserGlue'
    # The DB to which the user will be added
    dbName: 'default_db'
    # Number of days between secret rotation
    secretRotationDays: 90
    # (Optional) Characters to exclude in the password
    excludeCharacters: '"@/\\'
    # (Optional) List of roles that need Redshift secret access
    secretAccessRoles:
      - name: 'test-arn'
  - userName: 'serviceuserQuicksight'
    dbName: 'default_db'
    secretRotationDays: 90

# (Optional) Scheduled actions for automated cluster pause/resume.
# Each action specifies a target action, cron schedule, and active
# time window.
scheduledActions:
  # Pause cluster every Friday at 6pm ET
  - name: pause-cluster
    # Scheduled action is enabled if true
    enable: true
    # Target operation: 'pauseCluster' or 'resumeCluster'
    targetAction: pauseCluster
    # Cron expression: cron(Minutes Hours Day-of-month Month
    # Day-of-week Year)
    schedule: cron(0 22 ? * FRI *)
    # (Optional) UTC start date/time when the schedule becomes
    # active (ISO 8601 format)
    startTime: '2023-12-31T00:00:00Z'
    # (Optional) UTC end date/time after which the schedule is no
    # longer active
    endTime: '2099-12-31T00:00:00Z'

  # Resume cluster every Monday at 7am ET
  - name: resume-cluster
    enable: true
    targetAction: resumeCluster
    schedule: cron(0 12 ? * MON *)
    startTime: '2023-12-31T00:00:00Z'
    endTime: '2099-12-31T00:00:00Z'

# (Optional) Event notification configuration for cluster and
# scheduled action monitoring. Configures SNS notifications with
# email delivery, severity filtering, and event category selection.
eventNotifications:
  # (Optional) Email addresses for SNS notification delivery
  email:
    - example@example.com
  # (Optional) Minimum event severity level (enum: ERROR, INFO)
  severity: INFO
  # (Optional) Event categories to monitor
  # (enum: configuration, management, monitoring, security,
  # pending)
  eventCategories:
    - configuration
    - management
    - monitoring
    - security
    - pending

# (Optional) Workload management (WLM) configuration for query
# queue management and resource allocation.
workloadManagement:
  - query_group: 'test-group'
    memory_percent_to_use: 50
    query_concurrency: 5

# (Optional) Enable multi-AZ deployment for high availability.
# Requires: numberOfNodes >= 2, subnetIds in >= 3 AZs,
# clusterPort in range 5431-5455 or 8191-8215.
multiAz: true

# (Optional) Target region for cross-region snapshot copies.
# When set, enables automatic cross-region snapshot copy to this region.
# Must differ from the cluster's deployment region.
backupRegion: us-west-2

# (Optional) When true, omits the explicit blockPublicAccess setting on S3
# buckets so CDK does not emit a PutBucketPublicAccessBlock API call. Use in
# SCP-restricted environments where public access block is managed externally
# via AWS account-level settings.
publicAccessBlockManagedExternally: false

External Public Access Block Management

Deploys a Redshift cluster with S3 bucket public access block management delegated externally — for example, via account-level S3 settings or SCPs. When publicAccessBlockManagedExternally is set to true, CDK omits the PutBucketPublicAccessBlock API call on provisioned S3 buckets, avoiding conflicts in environments where SCPs restrict that action. Use this variant when your organization enforces public access restrictions at the account or organizational level rather than per-bucket.

sample-config-public-access-block-external.yaml

# Contents available via above link
# Redshift Data Warehouse module configuration exercising
# publicAccessBlockManagedExternally: true.
# This omits the explicit blockPublicAccess setting on S3 buckets
# so CDK does not emit a PutBucketPublicAccessBlock API call.

adminUsername: admin
adminPasswordRotationDays: 30

dataAdminRoles:
  - arn: arn:{{partition}}:iam::{{account}}:role/Admin

enableAuditLoggingToS3: true

nodeType: RA3_4XLARGE
numberOfNodes: 2

preferredMaintenanceWindow: Sun:23:45-Mon:00:15

securityGroupIngress:
  ipv4:
    - 172.31.0.0/16

subnetIds:
  - subnet-12312312421
  - subnet-12312321412

vpcId: vpc-12321421412

# Omit explicit blockPublicAccess on S3 buckets; managed externally via
# AWS account-level settings and/or SCPs.
publicAccessBlockManagedExternally: true

Config Schema Docs