SageMaker Model Monitoring
Deploys SageMaker Model Monitor schedules for all four monitoring types — data quality, model quality, model bias, and model explainability — against a deployed real-time endpoint. Each monitor runs as a scheduled processing job that compares live inference traffic against baseline statistics and constraints, publishing violations to S3 and CloudWatch. Use this module when you need continuous monitoring of a production inference endpoint to detect data drift, model degradation, bias drift, or changes in feature attribution.
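When a data quality execution finds problems, the report written to the output location can be inspected programmatically. The following is a minimal sketch, assuming the report follows the `constraint_violations.json` layout used by SageMaker Model Monitor (a top-level `violations` list whose entries carry `feature_name`, `constraint_check_type`, and `description`). The sample payload and the `summarize_violations` helper are hypothetical and not part of this module.

```python
import json

# Hypothetical sample report; a real one would be downloaded from the
# module's S3 output bucket after a monitoring execution.
SAMPLE_REPORT = """
{
  "violations": [
    {"feature_name": "age", "constraint_check_type": "data_type_check",
     "description": "Data type match requirement is not met."},
    {"feature_name": "income", "constraint_check_type": "completeness_check",
     "description": "Completeness is below the baseline threshold."},
    {"feature_name": "zip", "constraint_check_type": "data_type_check",
     "description": "Data type match requirement is not met."}
  ]
}
"""


def summarize_violations(report_text):
    """Group violating feature names by the type of check that failed."""
    report = json.loads(report_text)
    grouped = {}
    for violation in report.get("violations", []):
        check_type = violation["constraint_check_type"]
        grouped.setdefault(check_type, []).append(violation["feature_name"])
    return grouped


summary = summarize_violations(SAMPLE_REPORT)
for check_type, features in summary.items():
    print(f"{check_type}: {', '.join(features)}")
```

Grouping by check type makes it easier to tell a schema change (many `data_type_check` entries) apart from a gradual shift in the data.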
Deployed Resources
This module deploys and integrates the following resources:
SageMaker Data Quality Monitoring Schedule (Optional) - Scheduled processing job that compares incoming request data against baseline data statistics and constraints.
SageMaker Model Quality Monitoring Schedule (Optional) - Scheduled processing job that evaluates model prediction accuracy against ground truth labels.
SageMaker Model Bias Monitoring Schedule (Optional) - Scheduled processing job that detects bias drift in model predictions using SageMaker Clarify.
SageMaker Model Explainability Monitoring Schedule (Optional) - Scheduled processing job that tracks changes in feature attribution using SageMaker Clarify.
Baseline Processing Job (Optional) - One-time processing job that generates baseline statistics and constraints from a representative dataset.
Amazon S3 Output Bucket - Stores monitoring output reports, constraint violations, and baseline artifacts.
AWS KMS Key - Customer-managed encryption key for S3 output bucket and processing job storage volumes.
AWS IAM Monitoring Role - Execution role for monitoring processing jobs with permissions to read endpoint data capture and write results to S3.
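To make the model quality monitor more concrete: it scores captured predictions against ground truth labels. The sketch below illustrates that idea for binary classification, assuming a probability cut-off like the `probabilityThreshold` setting. The function names and sample data are hypothetical, the `>=` comparison is an assumption, and the actual monitor computes a fuller metric set inside the SageMaker container.

```python
def to_labels(probabilities, threshold=0.5):
    """Convert positive-class probabilities to 0/1 labels.

    Assumption: a probability at or above the threshold counts as positive.
    """
    return [1 if p >= threshold else 0 for p in probabilities]


def binary_metrics(predicted, actual):
    """Compute accuracy, precision, and recall from 0/1 label lists."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical captured probabilities and the ground truth labels
# that arrived later for the same requests.
probabilities = [0.9, 0.2, 0.6, 0.4, 0.8]
ground_truth = [1, 0, 0, 1, 1]

predicted = to_labels(probabilities, threshold=0.5)
metrics = binary_metrics(predicted, ground_truth)
print(predicted, metrics)
```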
Related Modules
- SageMaker Endpoint — Deploys the real-time inference endpoint that this module monitors for drift and quality degradation
- SageMaker MLOps — Provides the model artifacts and model package group used to establish monitoring baselines
- SageMaker Studio Domain — Provides SageMaker domain tagging context for resource governance
Security/Compliance Details
This module is designed in alignment with MDAA security/compliance principles and CDK nag rulesets. Additional review is recommended prior to production deployment, ensuring organization-specific compliance requirements are met.
- Encryption at Rest:
  - S3 output bucket encrypted with customer-managed KMS key
  - Processing job storage volumes encrypted with KMS
  - Baseline artifacts encrypted at rest
- Encryption in Transit:
  - All S3 access enforced over HTTPS via bucket policy
  - Processing job containers communicate over TLS
- Least Privilege:
  - Monitoring role scoped to specific endpoint, S3 paths, and KMS key
  - KMS key policy restricts usage to the monitoring role and admin principals
- Network Isolation:
  - Monitoring processing jobs support VPC configuration with security groups and subnets
  - Optional network isolation mode prevents containers from making outbound network calls
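The VPC-related settings are easiest to get right when they are checked together before deployment. The sketch below is a hypothetical pre-flight check, not the module's own validation. It relies on the SageMaker requirement that a VPC configuration lists both subnets and security groups, and it assumes (unconfirmed by this document) that `vpcId` should accompany them in the module config.

```python
def vpc_settings_problems(config):
    """Return a list of human-readable problems with the VPC settings."""
    problems = []
    has_subnets = bool(config.get("subnetIds"))
    has_security_groups = bool(config.get("securityGroupIds"))

    # SageMaker VPC configuration needs subnets and security groups together.
    if has_subnets != has_security_groups:
        problems.append("subnetIds and securityGroupIds must be provided together")

    # Assumption: the module expects vpcId whenever either list is set.
    if (has_subnets or has_security_groups) and not config.get("vpcId"):
        problems.append("vpcId is missing although subnets or security groups are set")

    return problems


complete = {
    "vpcId": "vpc-0123456789abcdef0",
    "subnetIds": ["subnet-0123456789abcdef0"],
    "securityGroupIds": ["sg-0123456789abcdef0"],
}
partial = {"subnetIds": ["subnet-0123456789abcdef0"]}

complete_problems = vpc_settings_problems(complete)
partial_problems = vpc_settings_problems(partial)
print(complete_problems)
print(partial_problems)
```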
Configuration
MDAA Config
Add the following snippet to your mdaa.yaml under the modules: section of a domain/env in order to use this module:
sagemaker-model-monitoring: # Module Name can be customized
  module_path: '@aws-mdaa/sagemaker-model-monitoring' # Must match module NPM package name
  module_configs:
    - ./sagemaker-model-monitoring.yaml # Filename/path can be customized
Module Config Samples and Variants
Copy the contents of the relevant sample config below into the ./sagemaker-model-monitoring.yaml file referenced in the MDAA config snippet above.
Minimal Configuration
Start here for a single data quality monitor on an existing endpoint with default schedule and instance settings.
# Minimal config for the SageMaker Model Monitoring module.
# Contains only the required properties for basic data quality
# monitoring on a SageMaker endpoint.
#
# NOTE: ECR image URIs below are region-specific (us-east-1). Replace the account ID
# and region with values appropriate for your deployment region. See:
# https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html

# Name of the SageMaker endpoint to monitor
# Often created by the SageMaker Endpoint module.
# Example SSM: ssm:/{{org}}/{{domain}}/<endpoint_module_name>/endpoint-name
endpointName: test-endpoint
# Monitor configurations — at least one type must be enabled
monitors:
  dataQuality:
    enabled: true
    schedule: "cron(0 * ? * * *)"
    instanceType: ml.m5.xlarge
    imageUri: "156813124566.dkr.ecr.us-east-1.amazonaws.com/sagemaker-model-monitor-analyzer"
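Because the analyzer image lives in a region-specific registry, the `imageUri` value has to be assembled for each deployment region. The sketch below assumes the standard ECR URI layout `<account>.dkr.ecr.<region>.amazonaws.com/<repository>` for the commercial partition. Only the us-east-1 account ID is taken from the sample above, and the `build_image_uri` helper is hypothetical. Account IDs for other regions should be looked up in the registry paths page linked in the config comments.

```python
from typing import Optional


def build_image_uri(account_id: str, region: str, repository: str,
                    tag: Optional[str] = None) -> str:
    """Compose an ECR image URI for the commercial AWS partition.

    Other partitions use a different domain suffix and are not handled here.
    """
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError(f"expected a 12-digit account ID, got {account_id!r}")
    uri = f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}"
    return f"{uri}:{tag}" if tag else uri


# Values taken from the us-east-1 sample configuration.
analyzer_uri = build_image_uri(
    "156813124566", "us-east-1", "sagemaker-model-monitor-analyzer"
)
print(analyzer_uri)
```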
Comprehensive Configuration
Use this as a reference when you need all four monitor types, custom baseline generation, VPC isolation, and per-monitor schedule and instance configuration.
sample-config-comprehensive.yaml
# Comprehensive config for the SageMaker Model Monitoring module.
# Deploys all four monitor types (data quality, model quality,
# model bias, model explainability) with VPC isolation, KMS
# encryption, and automated baselining.
#
# NOTE: ECR image URIs below are region-specific (us-east-1). Replace the account ID
# and region with values appropriate for your deployment region. See:
# https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html

# Name of the SageMaker endpoint to monitor
# Often created by the SageMaker Endpoint module.
# Example SSM: ssm:/{{org}}/{{domain}}/<endpoint_module_name>/endpoint-name
endpointName: test-endpoint
# Monitor configurations — at least one type must be enabled
monitors:
  dataQuality:
    enabled: true
    schedule: "cron(0 * ? * * *)"
    instanceType: ml.m5.xlarge
    instanceCount: 1
    volumeSizeInGb: 30
    maxRuntimeInSeconds: 3600
    imageUri: "156813124566.dkr.ecr.us-east-1.amazonaws.com/sagemaker-model-monitor-analyzer"
    # (Optional) Pre-computed baseline references
    baselineDatasetUri: s3://test-model-bucket/baselines/data-quality/dataset.csv
    baselineConstraintsUri: s3://test-model-bucket/baselines/data-quality/constraints.json
    baselineStatisticsUri: s3://test-model-bucket/baselines/data-quality/statistics.json
  modelQuality:
    enabled: true
    schedule: "cron(0 */6 ? * * *)"
    instanceType: ml.m5.xlarge
    imageUri: "156813124566.dkr.ecr.us-east-1.amazonaws.com/sagemaker-model-monitor-analyzer"
    problemType: BinaryClassification
    groundTruthS3Uri: s3://test-model-bucket/ground-truth/
    inferenceAttribute: prediction
    probabilityAttribute: probability
    probabilityThreshold: 0.5
  modelBias:
    enabled: true
    schedule: "cron(0 0 ? * MON *)"
    instanceType: ml.m5.xlarge
    imageUri: "246618743249.dkr.ecr.us-east-1.amazonaws.com/sagemaker-clarify-processing:1.0"
    groundTruthS3Uri: s3://test-model-bucket/ground-truth/
    featuresAttribute: features
  modelExplainability:
    enabled: true
    schedule: "cron(0 0 ? * MON *)"
    instanceType: ml.m5.xlarge
    imageUri: "246618743249.dkr.ecr.us-east-1.amazonaws.com/sagemaker-clarify-processing:1.0"
    featuresAttribute: features
# (Optional) VPC ID for monitoring jobs
# Often created by your VPC/networking stack.
# Example SSM: ssm:/path/to/vpc/id
vpcId: vpc-0123456789abcdef0
# (Optional) Subnet IDs for monitoring jobs
# Often created by your VPC/networking stack.
# Example SSM: ssm:/path/to/subnet/id
subnetIds:
  - subnet-0123456789abcdef0
  - subnet-0123456789abcdef1
# (Optional) Security group IDs for monitoring jobs
# Often created by your VPC/networking stack.
# Example SSM: ssm:/path/to/security-group/id
securityGroupIds:
  - sg-0123456789abcdef0
# (Optional) S3 bucket ARN for model artifacts access
modelBucketArn: arn:{{partition}}:s3:::test-model-bucket
# (Optional) Enable network isolation for monitoring jobs
# (default: false)
networkIsolation: false
# (Optional) KMS key ARN for encryption. If omitted, a new
# customer-managed key is created.
kmsKeyArn: arn:{{partition}}:kms:{{region}}:{{account}}:key/test-key-id
# (Optional) Automated baselining configuration
baselineTrainingDataS3Uri: s3://test-model-bucket/baselines/training-data.csv
baselineOutputDataS3Uri: s3://test-model-bucket/baselines/output/
baselineSchedule: "cron(0 0 1 * ? *)"
baselineDatasetFormat: '{"csv": {"header": true}}'
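The schedule values in the samples all share one shape: six whitespace-separated fields wrapped in `cron(...)`. The sketch below is a lightweight pre-deployment check built on that observation and on the rule that at least one monitor type must be enabled. It checks shape only and does not confirm that SageMaker accepts a given expression. The helper names and the sample dictionary are hypothetical, and the code assumes every enabled monitor sets `schedule`, as the samples do.

```python
import re

_CRON_RE = re.compile(r"^cron\((.+)\)$")


def cron_fields(expression):
    """Split a cron(...) expression into its six fields, or raise ValueError."""
    match = _CRON_RE.match(expression.strip())
    if match is None:
        raise ValueError(f"not wrapped in cron(...): {expression!r}")
    fields = match.group(1).split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, found {len(fields)}: {expression!r}")
    return fields


def check_monitors(config):
    """Require at least one enabled monitor and a well-shaped schedule for each."""
    monitors = config.get("monitors", {})
    enabled = {name: m for name, m in monitors.items() if m.get("enabled")}
    if not enabled:
        raise ValueError("at least one monitor type must be enabled")
    return {name: cron_fields(m["schedule"]) for name, m in enabled.items()}


# Hypothetical config, already parsed from YAML into a dictionary.
sample_config = {
    "monitors": {
        "dataQuality": {"enabled": True, "schedule": "cron(0 * ? * * *)"},
        "modelBias": {"enabled": False, "schedule": "cron(0 0 ? * MON *)"},
    }
}

checked = check_monitors(sample_config)
print(checked)
```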