
Modern Data Architecture Accelerator (MDAA)

The Modern Data Architecture Accelerator (MDAA) helps organizations deploy secure, compliant data analytics and AI environments on Amazon Web Services (AWS) through simple YAML configuration files. Whether you need a basic data lake, a full data science platform, an Amazon SageMaker Unified Studio environment, or a generative AI solution, MDAA provides prepackaged starter kits and reusable infrastructure components that handle security compliance out of the box. It supports teams of all sizes, from small organizations looking for code-free deployment to large enterprises building complex lakehouse or data mesh architectures.



Who Is This For?

  • Data Engineers: Build and manage data pipelines, lakes, and warehouses with pre-configured, compliant infrastructure.
  • Platform Engineers: Deploy and operate secure analytics platforms across multiple AWS accounts using configuration-driven automation.
  • Data Scientists: Get a ready-to-use SageMaker Unified Studio environment with governed data access so you can focus on models, not infrastructure.
  • Business Analysts: Access governed data through Athena, QuickSight, and other analytics tools deployed by your platform team.
  • Compliance Officers: Gain confidence that deployed infrastructure aligns with NIST 800-53, HIPAA, and PCI-DSS security control requirements.

Key Features

  • Security compliance built in: Modules are designed for compliance with AWS Solutions, NIST 800-53 Rev5, HIPAA, PCI-DSS, and ITSG-33 CDK Nag rulesets.
  • Configuration-driven deployment: Define your entire modern data and analytics environment in YAML files and deploy with a single CLI command. No custom code required.
  • Multi-language support: Reusable CDK L2 constructs available in TypeScript, Python, Java, and .NET via JSII (JavaScript Interop Interface). L3 constructs are currently TypeScript-only.
  • Starter kits for common use cases: Prepackaged configurations for data lakes, data science platforms, generative AI, governed lakehouses, and healthcare data.
  • Multi-account and multi-region: Deploy across multiple AWS accounts and regions with built-in cross-account trust and governance.
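CDK Nag rulesets inspect the resources in a CDK app during synthesis and flag configurations that break a rule. To illustrate the kind of check involved, here is a minimal TypeScript sketch of a wildcard-permission check on an IAM policy document, similar in intent to the cdk-nag AwsSolutions-IAM5 rule. It is not cdk-nag's implementation or API: the PolicyDocument type and the findWildcardGrants function are hypothetical and model only the fields this check reads.

```typescript
// HYPOTHETICAL, minimal model of an IAM policy document: only the fields this check reads.
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string | string[];
  Resource: string | string[];
}

interface PolicyDocument {
  Version: string;
  Statement: PolicyStatement[];
}

// IAM accepts either a single string or an array of strings for Action and Resource.
function toArray(value: string | string[]): string[] {
  return Array.isArray(value) ? value : [value];
}

// Report every Allow statement that grants a wildcard action or resource.
// Similar in intent to cdk-nag's AwsSolutions-IAM5 rule; this is not cdk-nag's code.
function findWildcardGrants(doc: PolicyDocument): string[] {
  const findings: string[] = [];
  doc.Statement.forEach((stmt, index) => {
    if (stmt.Effect !== "Allow") return; // Deny statements cannot over-grant access
    for (const action of toArray(stmt.Action)) {
      if (action.includes("*")) findings.push(`Statement ${index}: wildcard action "${action}"`);
    }
    for (const resource of toArray(stmt.Resource)) {
      if (resource.includes("*")) findings.push(`Statement ${index}: wildcard resource "${resource}"`);
    }
  });
  return findings;
}

// Usage: the first statement is scoped to one object; the second grants every S3 action.
const examplePolicy: PolicyDocument = {
  Version: "2012-10-17",
  Statement: [
    { Effect: "Allow", Action: "s3:GetObject", Resource: "arn:aws:s3:::example-bucket/raw/data.csv" },
    { Effect: "Allow", Action: ["s3:*"], Resource: "arn:aws:s3:::example-bucket" },
  ],
};
console.log(findWildcardGrants(examplePolicy)); // one finding, for Statement 1's "s3:*"
```

Real rulesets apply many such checks (encryption, logging, network exposure, and so on); the point of the sketch is only that each rule is a mechanical test over resource properties, which is what makes compliance checking repeatable at synthesis time.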

Quick Start

Deploy your first data lake in minutes using the Basic DataLake starter kit. Alternatively, deploy one of the other starter kits listed under Starter Kits.

Prerequisites

Steps

  1. Clone the repo and navigate to the Basic DataLake starter kit:
git clone https://github.com/aws/modern-data-architecture-accelerator.git
cd modern-data-architecture-accelerator/starter_kits/basic_datalake
  2. Edit mdaa.yaml to specify an organization name. This name must be globally unique because it is used in the names of all deployed resources, including globally named resources such as S3 buckets.

  3. If required, edit the context: section of mdaa.yaml to set values specific to your environment.

  4. Ensure you are authenticated to your target AWS account.

  5. Bootstrap your AWS account and region for CDK (if not already done):

npx cdk bootstrap
  6. Deploy using npx (no installation required):
npx @aws-mdaa/cli deploy -c mdaa.yaml

Or install the CLI globally and then deploy:

npm install -g @aws-mdaa/cli
mdaa deploy -c mdaa.yaml

Estimated deployment time: ~15–20 minutes

For full deployment details, see the Basic DataLake starter kit README.
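Because the organization name ends up in S3 bucket names, it has to satisfy the S3 bucket naming rules once it is combined with the other parts of each name. The TypeScript sketch below checks a subset of those rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens only; starts and ends with a letter or digit; no adjacent periods; not formatted as an IPv4 address). The exampleBucketName pattern is hypothetical and only illustrates the idea; MDAA's actual naming scheme may differ, and global uniqueness can only be confirmed by AWS at deployment time.

```typescript
// Checks a subset of the Amazon S3 general purpose bucket naming rules.
function isValidS3BucketName(name: string): boolean {
  if (name.length < 3 || name.length > 63) return false;
  // Lowercase letters, digits, dots, and hyphens; must begin and end with a letter or digit.
  if (!/^[a-z0-9][a-z0-9.-]*[a-z0-9]$/.test(name)) return false;
  // No two adjacent periods.
  if (name.includes("..")) return false;
  // Must not be formatted like an IPv4 address (for example 192.168.5.4).
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(name)) return false;
  return true;
}

// HYPOTHETICAL naming pattern, for illustration only: MDAA's real scheme may differ.
function exampleBucketName(org: string, domain: string, env: string, moduleName: string, suffix: string): string {
  return [org, domain, env, moduleName, suffix].join("-");
}

// Usage: compose the longest bucket name you expect and validate it before deploying.
const candidate = exampleBucketName("acme-analytics", "shared", "dev", "datalake", "raw");
console.log(candidate, isValidS3BucketName(candidate)); // prints the name followed by true
```

A short, lowercase organization name made of letters, digits, and hyphens leaves the most room for the other name parts within the 63-character limit.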

What You Just Deployed

The Basic DataLake starter kit creates a secure, encrypted Amazon S3 data lake with AWS Glue databases and crawlers, AWS Identity and Access Management (IAM) roles with least-privilege policies, and AWS Key Management Service (KMS) encryption keys, all configured for compliance with standard security rulesets.

Looking for a different starting point? See Starter Kits for other prepackaged options including data science platforms, generative AI, and more.

Implementation Guide

MDAA follows a five-phase deployment lifecycle: Architecture (define your target platform design), Configuration (author YAML config files for each module), Customization (optionally extend via code-based escape hatches), Predeployment (bootstrap AWS accounts), and Deployment (deploy via the MDAA CLI). Each phase builds on the previous one, and starter kits can accelerate the first two phases significantly.

| Phase | Description | Time Estimate |
|---|---|---|
| Architecture | Define your target platform design and select modules | 1–2 days |
| Configuration | Author YAML config files for each module | 1–3 days |
| Customization | Optionally extend via code-based escape hatches | 0–2 days |
| Predeployment | Bootstrap AWS accounts with CDK | 2–10 min |
| Deployment | Deploy via the MDAA CLI | 15 min–1 hour |

For the full step-by-step guide, see the MDAA Implementation Guide. In addition to the starter kits, the sample configurations provide ready-made configurations that can shorten the Architecture and Configuration phases.

Workshops and Learning Resources

Self-Paced Workshops

  • MDAA Hands-On Workshop: A guided, hands-on workshop that walks you through deploying and configuring MDAA from scratch.

Sample Configurations and Starter Kits

  • External Sample Configurations: A community-maintained repository of additional MDAA configurations for various use cases and architectures.
  • Starter Kits: Prepackaged, secure MDAA configurations for common use cases, included in this repository.

Documentation

Browse the full documentation, module references, and configuration schemas at aws.github.io/modern-data-architecture-accelerator.

Starter Kits

Starter kits provide secure, prepackaged foundations for common use cases:

| Starter Kit | Description | Est. Deploy Time |
|---|---|---|
| Basic DataLake | A secure S3 data lake with Glue databases and crawlers | ~15–20 min |
| Basic DataScience Platform | A standalone SageMaker AI Studio data science environment | ~20–30 min |
| GenAI Accelerator | Enterprise-ready generative AI platform with Amazon Bedrock | ~10–15 min |
| Governed Lakehouse | DataZone-governed lakehouse with fine-grained access control | ~20–25 min |
| Health Data Accelerator | Healthcare data lake with AWS Database Migration Service (DMS) integration | ~30–45 min |
| SMUS Research Environment | A SageMaker Unified Studio-enabled architecture for team-based research | ~20–25 min |

Sample Configurations

Additional sample configurations are available in a dedicated repository for easier community contribution and faster updates.

Available Modules

MDAA is implemented as a set of compliant modules deployed via a unified orchestration layer. For detailed module documentation, configuration schemas, and API references, see the MDAA Documentation Site.

  • MDAA CDK Modules: Configuration-driven CDK apps that deploy compliant data analytics components as CloudFormation stacks. Each module can be run independently via the CDK CLI or composed with other modules via the MDAA CLI.
  • MDAA CDK L2 and L3 Constructs: Reusable CDK constructs designed for compliance with AWS Solutions, HIPAA, PCI-DSS, and NIST 800-53 R5 rulesets. L2 constructs are available in TypeScript, Python, Java, and .NET via JSII. L3 constructs are currently TypeScript-only.
  • MDAA CLI: A configuration-driven CLI that composes and orchestrates multiple MDAA modules to deploy compliant end-to-end analytics environments.

[Figure: MDAA Code Architecture]

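Modules are grouped into the following categories; see the MDAA Documentation Site for the modules in each category:

  • Governance Modules
  • Data Lake Modules
  • Data Ops Modules
  • Data Analytics Modules
  • AI / Data Science Modules
  • Core / Utility Modules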

Reusable CDK L2 Constructs

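The reusable L2 constructs are compliant with the AWS Solutions, HIPAA, PCI-DSS, and NIST 800-53 R5 CDK Nag rulesets. See the MDAA Documentation Site for the full list of constructs.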

Using and Extending MDAA

MDAA can be used and extended in three ways:

Configuration-Driven Deployment

Deploy compliant, end-to-end analytics environments using YAML config files and the MDAA CLI. No code is required, which makes this approach accessible to all roles and suitable for deployments ranging from simple to complex, with high compliance assurance.

Code-Driven Custom Environments

Build custom analytics environments using MDAA's reusable CDK constructs. L2 constructs are available in TypeScript, Python, Java, and .NET; L3 constructs are currently TypeScript-only.

Workload Integration

Independently developed workloads (CDK or CloudFormation) can use MDAA-deployed resources through the standard set of AWS Systems Manager (SSM) Parameter Store parameters published by all MDAA modules.
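A workload typically looks up the MDAA-published values it needs at deployment time or run time. The TypeScript sketch below shows one way to structure that lookup so that a missing dependency fails fast with a clear error. The parameter names are hypothetical placeholders (each MDAA module defines its real ones), and the ParameterReader interface is an assumption: in a real workload it could be implemented with GetParameterCommand from the AWS SDK for JavaScript v3 (@aws-sdk/client-ssm), but here it is backed by an in-memory map so the example runs offline.

```typescript
// Abstraction over "read one parameter value by name". ASSUMPTION: in a real workload this
// could wrap GetParameterCommand from @aws-sdk/client-ssm; an in-memory map is used here.
interface ParameterReader {
  getParameter(name: string): Promise<string | undefined>;
}

class InMemoryParameterReader implements ParameterReader {
  private readonly values: Map<string, string>;

  constructor(values: Map<string, string>) {
    this.values = values;
  }

  async getParameter(name: string): Promise<string | undefined> {
    return this.values.get(name);
  }
}

// Resolve every parameter the workload needs, failing fast if one has not been published
// (for example, because the MDAA module that publishes it has not been deployed yet).
async function resolveParameters(
  reader: ParameterReader,
  wanted: Record<string, string>, // logical key -> SSM parameter name
): Promise<Record<string, string>> {
  const resolved: Record<string, string> = {};
  for (const [key, parameterName] of Object.entries(wanted)) {
    const value = await reader.getParameter(parameterName);
    if (value === undefined) {
      throw new Error(`Parameter ${parameterName} (needed for "${key}") was not found`);
    }
    resolved[key] = value;
  }
  return resolved;
}

// Usage with a HYPOTHETICAL parameter name; real names are defined by each MDAA module.
const reader = new InMemoryParameterReader(
  new Map([["/example-org/datalake/bucket/raw/name", "example-org-raw-bucket"]]),
);
resolveParameters(reader, { rawBucket: "/example-org/datalake/bucket/raw/name" }).then((params) =>
  console.log(params.rawBucket),
);
```

Keeping the lookup behind a small interface lets the same workload code run against SSM in AWS and against a fixed map in unit tests.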

[Figure: MDAA Usage and Extension]

Logical Architecture

MDAA is designed as a set of logical architectural layers, each made up of a set of functional modules. Each module configures and deploys a set of resources that together make up the data analytics environment. Modules may have logical dependencies on each other and may also use non-MDAA resources deployed within the environment.

While MDAA can be used to implement a comprehensive, end-to-end data analytics platform, it does not result in a closed system. MDAA may be freely integrated with non-MDAA deployed platform elements and analytics capabilities. Any individual layer or module of MDAA can be replaced by a non-MDAA component, and the remaining layers and modules will continue to function (assuming basic functional parity with the replaced MDAA module or layer).

[Figure: MDAA Logical Architecture]
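Because modules can depend on one another, they must be deployed so that each module comes after the modules it relies on. The TypeScript sketch below computes such an order with Kahn's topological sort and rejects cyclic dependencies. The module names and the dependency graph are hypothetical, and this illustrates the ordering problem only; it is not the MDAA CLI's actual orchestration logic. Replacing a module with a non-MDAA component leaves the graph unchanged as long as the replacement provides what its dependents consume.

```typescript
// HYPOTHETICAL module dependency graph: each key lists the modules it depends on.
type ModuleGraph = Record<string, string[]>;

// Kahn's algorithm: returns an order in which every module appears after all of its
// dependencies, or throws if the dependencies form a cycle.
function deploymentOrder(graph: ModuleGraph): string[] {
  const unmetDeps = new Map<string, number>(); // module -> dependencies not yet placed
  const dependents = new Map<string, string[]>(); // module -> modules that depend on it

  for (const [moduleName, deps] of Object.entries(graph)) {
    if (!unmetDeps.has(moduleName)) unmetDeps.set(moduleName, 0);
    for (const dep of deps) {
      if (!unmetDeps.has(dep)) unmetDeps.set(dep, 0);
      unmetDeps.set(moduleName, (unmetDeps.get(moduleName) ?? 0) + 1);
      dependents.set(dep, [...(dependents.get(dep) ?? []), moduleName]);
    }
  }

  // Start with modules that have no dependencies (sorted for a deterministic result).
  const ready = Array.from(unmetDeps.entries())
    .filter(([, count]) => count === 0)
    .map(([name]) => name)
    .sort();
  const order: string[] = [];

  while (ready.length > 0) {
    const next = ready.shift() as string;
    order.push(next);
    for (const dependent of dependents.get(next) ?? []) {
      const remaining = (unmetDeps.get(dependent) ?? 0) - 1;
      unmetDeps.set(dependent, remaining);
      if (remaining === 0) ready.push(dependent);
    }
  }

  if (order.length !== unmetDeps.size) {
    throw new Error("Module dependencies contain a cycle");
  }
  return order;
}

// Usage: a hypothetical environment where the data lake needs IAM roles, and a Glue
// catalog module needs both the roles and the data lake.
console.log(
  deploymentOrder({
    "glue-catalog": ["roles", "datalake"],
    datalake: ["roles"],
    roles: [],
  }),
); // ["roles", "datalake", "glue-catalog"]
```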

Code Architecture

[Figure: MDAA Code Architecture]

Metrics Collection

This solution collects anonymous operational metrics to help AWS improve quality and features. For more information, including how to disable this capability, see the CDK version reporting documentation.

For Developers

MDAA includes comprehensive testing for both TypeScript/CDK code and Python Lambda/Glue functions:

# Run all tests
./scripts/test.sh

# TypeScript tests only
lerna run test --stream

# Python tests only
npm run test:python:all

Full documentation and module references are available at aws.github.io/modern-data-architecture-accelerator. To generate the docs locally, run mkdocs serve from the project root (requires MkDocs).

Contributing

We welcome contributions from the community. See CONTRIBUTING.md for guidelines on how to get started, set up your development environment, and submit pull requests.

Security

See CONTRIBUTING.md for information on reporting security issues.

See SECURITY.md for details on MDAA's security design principles and compliance approach.

License

This project is licensed under the Apache-2.0 License.