The Modern Data Architecture Accelerator (MDAA) helps organizations deploy secure, compliant data analytics and AI environments on Amazon Web Services (AWS) through simple YAML configuration files. Whether you need a basic data lake, a full data science platform, a SageMaker Unified Studio environment, or a generative AI solution, MDAA provides prepackaged starter kits and reusable infrastructure components that handle security compliance out of the box. It supports teams of all sizes, from small organizations looking for code-free deployment to large enterprises building complex Lake House or Data Mesh architectures.
Deploy your first data lake in minutes using the Basic DataLake starter kit. Alternatively, deploy one of the other starter kits.
git clone https://github.com/aws/modern-data-architecture-accelerator.git
cd modern-data-architecture-accelerator/starter_kits/basic_datalake
Edit mdaa.yaml to specify an organization name. This must be globally unique, as it is used in the naming of all deployed resources (including globally named resources such as S3 buckets).
If required, edit mdaa.yaml to specify context: values specific to your environment.
Ensure you are authenticated to your target AWS account.
Bootstrap your AWS account for CDK (if not already done):
npx cdk bootstrap
Deploy the starter kit using npx:
npx @aws-mdaa/cli deploy -c mdaa.yaml
Or install the CLI globally and then deploy:
npm install -g @aws-mdaa/cli
mdaa deploy -c mdaa.yaml
Estimated deployment time: ~15–20 minutes
For full deployment details, see the Basic DataLake starter kit README.
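Because the organization name is embedded in globally named resources such as S3 buckets, it can be worth checking a candidate name before editing mdaa.yaml. The following is a minimal Python sketch and is not part of MDAA: `check_org_name` and the 20-character budget are hypothetical, and the character rules are a conservative subset of the S3 bucket naming rules. It cannot tell whether a name is already taken globally.

```python
import re
from typing import List

# Conservative subset of the S3 bucket naming rules: lowercase letters,
# digits, and hyphens, beginning and ending with a letter or digit.
_SAFE_NAME = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")

# S3 bucket names are limited to 63 characters in total. The organization
# name is only one part of each resource name, so this sketch reserves most
# of that budget. The value 20 is an assumption, not an MDAA limit.
ASSUMED_MAX_ORG_LENGTH = 20


def check_org_name(name: str, max_length: int = ASSUMED_MAX_ORG_LENGTH) -> List[str]:
    """Return a list of problems found with a candidate organization name."""
    if not name:
        return ["name is empty"]
    problems = []
    if len(name) > max_length:
        problems.append(f"longer than {max_length} characters")
    if not _SAFE_NAME.fullmatch(name):
        problems.append("use only lowercase letters, digits, and hyphens, "
                        "starting and ending with a letter or digit")
    return problems


# Try a few candidate names; an empty problem list means the name passed.
for candidate in ["acme-analytics-7f3a", "Acme_Analytics", "-acme"]:
    issues = check_org_name(candidate)
    print(candidate, "->", "ok" if not issues else "; ".join(issues))
```

Adding a short random or company-specific suffix, as in the first candidate, is one common way to reduce the chance of a collision with an existing bucket name.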
The Basic DataLake starter kit creates a secure, encrypted Amazon S3 data lake with AWS Glue databases and crawlers, AWS Identity and Access Management (IAM) roles with least-privilege policies, and AWS Key Management Service (KMS) encryption keys, all configured for compliance with standard security rulesets.
Looking for a different starting point? See Starter Kits for other prepackaged options including data science platforms, generative AI, and more.
MDAA follows a five-phase deployment lifecycle: Architecture (define your target platform design), Configuration (author YAML config files for each module), Customization (optionally extend via code-based escape hatches), Predeployment (bootstrap AWS accounts), and Deployment (deploy via the MDAA CLI). Each phase builds on the previous one, and starter kits can accelerate the first two phases significantly.
| Phase | Description | Time Estimate |
|---|---|---|
| Architecture | Define your target platform design and select modules | 1–2 days |
| Configuration | Author YAML config files for each module | 1–3 days |
| Customization | Optionally extend via code-based escape hatches | 0–2 days |
| Predeployment | Bootstrap AWS accounts with CDK | 2–10 min |
| Deployment | Deploy via the MDAA CLI | 15 min – 1 hour |
For the full step-by-step guide, see the MDAA Implementation Guide. Starter kits and sample configurations provide ready-made configuration files that shorten the Architecture and Configuration phases.
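As a rough worked example, the per-phase estimates in the table can be summed to get an end-to-end range. This Python sketch only restates the table's numbers. Treating each "day" as a 24-hour period is a simplifying assumption, since the table most likely means days of effort.

```python
from datetime import timedelta

# (phase, low estimate, high estimate), copied from the phase table.
PHASES = [
    ("Architecture", timedelta(days=1), timedelta(days=2)),
    ("Configuration", timedelta(days=1), timedelta(days=3)),
    ("Customization", timedelta(days=0), timedelta(days=2)),
    ("Predeployment", timedelta(minutes=2), timedelta(minutes=10)),
    ("Deployment", timedelta(minutes=15), timedelta(hours=1)),
]


def total_range(phases):
    """Sum the low and high estimates across the given phases."""
    low = sum((p[1] for p in phases), timedelta())
    high = sum((p[2] for p in phases), timedelta())
    return low, high


low, high = total_range(PHASES)
print(f"All five phases: {low} to {high}")

# Predeployment and Deployment only, i.e. the phases that actually run
# against the AWS account.
deploy_low, deploy_high = total_range(PHASES[3:])
print(f"Predeployment + Deployment: {deploy_low} to {deploy_high}")
```

The spread is dominated by the first three phases, which is why a starter kit that supplies a ready-made design and configuration changes the overall timeline far more than anything in the deployment step itself.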
Browse the full documentation, module references, and configuration schemas at aws.github.io/modern-data-architecture-accelerator.
Starter kits provide secure, prepackaged foundations for common use cases:
| Starter Kit | Description | Est. Deploy Time |
|---|---|---|
| Basic DataLake | A secure S3 data lake with Glue databases and crawlers | ~15–20 min |
| Basic DataScience Platform | A standalone SageMaker AI Studio data science environment | ~20–30 min |
| GenAI Accelerator | Enterprise-ready generative AI platform with Amazon Bedrock | ~10–15 min |
| Governed Lakehouse | DataZone-governed lakehouse with fine-grained access control | ~20–25 min |
| Health Data Accelerator | Healthcare data lake with DMS (Database Migration Service) integration | ~30–45 min |
| SMUS Research Environment | A SageMaker Unified Studio-enabled architecture for team-based research | ~20–25 min |
Additional sample configurations are available in a dedicated repository for easier community contribution and faster updates.
MDAA is implemented as a set of compliant modules deployed via a unified orchestration layer. For detailed module documentation, configuration schemas, and API references, see the MDAA Documentation Site.

MDAA modules are compliant with the AWS Solutions, HIPAA, PCI-DSS, and NIST 800-53 R5 CDK Nag rulesets.
MDAA can be used and extended in three ways:
Deploy compliant, end-to-end analytics environments using YAML config files and the MDAA CLI. No code is required, which makes this approach accessible to all roles. It covers simple to complex deployments with high compliance assurance.
Build custom analytics environments using MDAA's reusable CDK constructs. L2 constructs are available in multiple languages (TypeScript, Python, Java, .NET); L3 constructs are currently TypeScript-only.
Independently developed workloads (CDK or CloudFormation) can leverage MDAA-deployed resources via the standard set of SSM (Systems Manager) parameters published by all MDAA modules.
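As an illustration of the third approach, an independently developed workload could read a value published by an MDAA module from SSM Parameter Store. The sketch below is in Python and makes several assumptions: the parameter name and value are hypothetical (the real names are defined by each module), `read_parameter` is an illustrative helper, and `StubSsmClient` is a stand-in that lets the example run without AWS credentials.

```python
class StubSsmClient:
    """Stand-in for boto3.client("ssm") so the sketch runs without AWS access."""

    def __init__(self, values):
        self._values = values

    def get_parameter(self, Name):
        # Mirrors the response shape of the real SSM GetParameter call.
        return {"Parameter": {"Name": Name, "Value": self._values[Name]}}


def read_parameter(ssm_client, name: str) -> str:
    """Return the value of one SSM parameter."""
    response = ssm_client.get_parameter(Name=name)
    return response["Parameter"]["Value"]


# Hypothetical parameter name and value, for illustration only.
EXAMPLE_NAME = "/example-org/example-module/bucket/name"
client = StubSsmClient({EXAMPLE_NAME: "example-org-dev-datalake-raw"})

bucket_name = read_parameter(client, EXAMPLE_NAME)
print(bucket_name)
```

With AWS credentials in place, passing `boto3.client("ssm")` instead of the stub should work unchanged, since `get_parameter(Name=...)` and the `["Parameter"]["Value"]` response shape belong to the boto3 SSM client. Looking values up this way keeps the workload decoupled from the MDAA deployment: it depends only on a parameter name, not on the stack that created the resource.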

MDAA is designed as a set of logical architectural layers, each made up of functional modules. Each module configures and deploys a set of resources that form part of the data analytics environment. Modules may have logical dependencies on each other, and may also leverage non-MDAA resources deployed within the environment.
While MDAA can be used to implement a comprehensive, end-to-end data analytics platform, it does not result in a closed system. MDAA may be freely integrated with non-MDAA deployed platform elements and analytics capabilities. Any individual layer or module of MDAA can be replaced by a non-MDAA component, and the remaining layers and modules will continue to function (assuming basic functional parity with the replaced MDAA module or layer).


This solution collects anonymous operational metrics to help AWS improve quality and features. For more information, including how to disable this capability, see the CDK version reporting documentation.
MDAA includes comprehensive testing for both TypeScript/CDK code and Python Lambda/Glue functions:
# Run all tests
./scripts/test.sh
# TypeScript tests only
lerna run test --stream
# Python tests only
npm run test:python:all
For detailed guides, see:
Full documentation and the module reference are available at aws.github.io/modern-data-architecture-accelerator. To generate the docs locally, run mkdocs serve from the project root (requires MkDocs).
We welcome contributions from the community. See CONTRIBUTING.md for guidelines on how to get started, set up your development environment, and submit pull requests.
See CONTRIBUTING.md for information on reporting security issues.
See SECURITY.md for details on MDAA's security design principles and compliance approach.
This project is licensed under the Apache-2.0 License.