MDAA L3 construct for SageMaker Ground Truth labeling workflows.
Creates a fully automated, continuous labeling pipeline.
A SageMaker Ground Truth labeling job takes a static manifest file as input — it is a one-shot batch operation, not a continuously running service. There is no way to connect a labeling job directly to an S3 bucket and have it automatically pick up new files.
This architecture solves that limitation:
S3 Upload → EventBridge → SQS Queue
↓
EventBridge Scheduler (e.g. every 5 min)
↓
Step Functions State Machine
├── Poll SQS
├── Build manifest
├── CreateLabelingJob → Human workers label data
├── (Optional) CreateVerificationJob → Workers verify labels
├── Write results to Feature Store
└── Delete processed SQS messages
Data can be uploaded at any time. The scheduler periodically checks for pending items and processes them in batches. If the queue is empty, the state machine exits cleanly without creating a labeling job, so no labeling cost is incurred.
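The "Build manifest" step can be sketched as follows. This is a minimal illustration, not the construct's actual implementation: `buildManifest` and `S3ObjectCreatedEvent` are hypothetical names, and the sketch assumes each SQS message body is an EventBridge "Object Created" event for S3 and that the labeling job reads a JSON Lines manifest with one `source-ref` entry per object.

```typescript
// Only the fields this sketch reads from an EventBridge S3 "Object Created" event.
interface S3ObjectCreatedEvent {
  detail: { bucket: { name: string }; object: { key: string } };
}

// Hypothetical helper: converts raw SQS message bodies into a Ground Truth
// input manifest (JSON Lines, one {"source-ref": "s3://..."} object per line).
function buildManifest(messageBodies: string[]): string {
  const seen = new Set<string>();
  const lines: string[] = [];
  for (const body of messageBodies) {
    const event = JSON.parse(body) as S3ObjectCreatedEvent;
    const uri = `s3://${event.detail.bucket.name}/${event.detail.object.key}`;
    if (seen.has(uri)) continue; // the same object may be announced more than once
    seen.add(uri);
    lines.push(JSON.stringify({ "source-ref": uri }));
  }
  return lines.join("\n");
}
```

An empty batch yields an empty string, which corresponds to the case where the state machine exits without creating a labeling job.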
| Outcome | Meaning |
|---|---|
| SUCCEEDED (no labeling job) | Queue was empty — nothing to process. |
| SUCCEEDED (with labeling job) | Labeling job completed, items were labeled, results written to Feature Store. |
| FAILED (TotalLabeled = 0) | Labeling job completed but no worker labeled any items within the task availability window (default: 6 hours). Messages are returned to SQS for retry on the next scheduled run. |
| FAILED (labeling job error) | SageMaker labeling job failed (e.g. permission issue, invalid config). Messages are returned to SQS for retry. |
Note: A `FAILED` execution does not mean data is lost. The error handling path returns all unprocessed messages to the SQS queue so they are picked up on the next scheduled run. Items that repeatedly fail will eventually move to the dead-letter queue.
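The outcome table above can be read as a small decision function. The sketch below is illustrative only: `RunResult`, `Outcome`, and `classifyRun` are hypothetical names, and the state machine itself may structure these checks differently. The `totalLabeled` field stands for the `TotalLabeled` counter reported for the labeling job.

```typescript
// Terminal labeling job states considered by this sketch.
type LabelingJobStatus = "Completed" | "Failed" | "Stopped";

interface RunResult {
  messageCount: number;          // messages received from SQS in this run
  jobStatus?: LabelingJobStatus; // undefined when no labeling job was created
  totalLabeled?: number;         // TotalLabeled counter of the labeling job
}

type Outcome =
  | { status: "SUCCEEDED"; labelingJob: boolean }
  | { status: "FAILED"; reason: string; returnMessagesToQueue: true };

function classifyRun(run: RunResult): Outcome {
  // Empty queue: nothing to process, no labeling job is created.
  if (run.messageCount === 0) return { status: "SUCCEEDED", labelingJob: false };
  // The job itself failed (for example a permission issue or invalid config).
  if (run.jobStatus !== "Completed") {
    return { status: "FAILED", reason: "labeling job error", returnMessagesToQueue: true };
  }
  // The job completed, but no worker labeled anything in the availability window.
  if ((run.totalLabeled ?? 0) === 0) {
    return { status: "FAILED", reason: "TotalLabeled = 0", returnMessagesToQueue: true };
  }
  return { status: "SUCCEEDED", labelingJob: true };
}
```

Both `FAILED` branches carry `returnMessagesToQueue: true`, mirroring the note that failed runs hand their messages back to SQS for the next scheduled run.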
| Task Type | Media |
|---|---|
| `image_bounding_box` | Image |
| `image_semantic_segmentation` | Image |
| `image_single_label_classification` | Image |
| `image_multi_label_classification` | Image |
| `text_single_label_classification` | Text |
| `text_multi_label_classification` | Text |
| `named_entity_recognition` | Text |
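For callers that assemble configuration programmatically, the supported task types can be captured in a typed lookup. This is a sketch under the assumption that the `taskType` values are exactly the strings in the table above; `TASK_MEDIA`, `TaskType`, `isTaskType`, and `mediaFor` are hypothetical names and are not exported by the construct.

```typescript
// Task types from the table above, mapped to their media type.
const TASK_MEDIA = {
  image_bounding_box: "Image",
  image_semantic_segmentation: "Image",
  image_single_label_classification: "Image",
  image_multi_label_classification: "Image",
  text_single_label_classification: "Text",
  text_multi_label_classification: "Text",
  named_entity_recognition: "Text",
} as const;

type TaskType = keyof typeof TASK_MEDIA;

// Type guard for untrusted input such as a value read from a YAML config.
function isTaskType(value: string): value is TaskType {
  return Object.prototype.hasOwnProperty.call(TASK_MEDIA, value);
}

function mediaFor(taskType: TaskType): "Image" | "Text" {
  return TASK_MEDIA[taskType];
}
```

Deriving `TaskType` from the map keeps the type and the lookup in sync, so a task type added to one cannot be forgotten in the other.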
The following resources must be created before deploying this construct.
| Resource | Config Field | How to Create |
|---|---|---|
| SageMaker Workteam | `labelingTaskConfig.workteamArn` | Create via SageMaker Ground Truth console or `aws sagemaker create-workteam`. Private workteams require a Cognito user pool. |
| Label Categories File | `labelingTaskConfig.categoriesS3Uri` | Upload a JSON file to S3 with label categories. See categories file format. |
| Labeling UI Template | `labelingTaskConfig.templateS3Uri` | Upload a Liquid HTML template to S3. Required for `image_bounding_box` and `image_semantic_segmentation` (SageMaker does not support built-in `HumanTaskUiArn` for these task types). Optional for classification and NER tasks, which use AWS-managed templates. See custom templates. |
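As an illustration of the label categories file, the sketch below generates one from a list of label names. It assumes the construct expects the standard Ground Truth label category configuration layout (a `document-version` string plus a `labels` array of `{ "label": ... }` objects); the referenced categories file format remains the authority. `buildCategoriesFile` is a hypothetical helper name.

```typescript
// Assumed layout of a Ground Truth label category configuration file.
interface LabelCategoryConfig {
  "document-version": string;
  labels: { label: string }[];
}

// Hypothetical helper: returns the JSON text to upload to categoriesS3Uri.
function buildCategoriesFile(labels: string[]): string {
  const cleaned = labels.map((l) => l.trim());
  if (cleaned.length === 0) throw new Error("at least one label is required");
  if (cleaned.some((l) => l.length === 0)) throw new Error("labels must not be blank");
  if (new Set(cleaned).size !== cleaned.length) throw new Error("labels must be unique");
  const config: LabelCategoryConfig = {
    "document-version": "2018-11-28",
    labels: cleaned.map((label) => ({ label })),
  };
  return JSON.stringify(config, null, 2);
}

// Example: categories for a bounding box task.
const categoriesJson = buildCategoriesFile(["car", "pedestrian", "bicycle"]);
```

The resulting string can be uploaded with any S3 client to the location referenced by `labelingTaskConfig.categoriesS3Uri`.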
When the `verification` block is configured:

| Resource | Config Field | How to Create |
|---|---|---|
| Verification Workteam | `verification.workteamArn` | Same as labeling workteam. Can be the same or a different team. |
| Verification UI Template | `verification.templateS3Uri` | Optional. Custom template for verification UI. |
| Verification Categories File | `verification.categoriesS3Uri` | Optional. The first label should be the "pass" label; other labels indicate validation failures. |
Note: Only private workteams are supported. Public workteam pricing (`LABELING_TASK_PRICE`/`VERIFICATION_TASK_PRICE`) is not currently exposed.
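The prerequisites above lend themselves to a pre-deployment check. The following is a minimal sketch, not something the construct provides: `validateConfig` and the interfaces are hypothetical, the field names mirror the config fields in the tables, and the ARN pattern assumes private workteam ARNs contain `workteam/private-crowd/`.

```typescript
interface TaskConfig {
  workteamArn: string;
  categoriesS3Uri?: string;
  templateS3Uri?: string;
}

interface GroundTruthConfig {
  taskType: string;
  labelingTaskConfig: TaskConfig;
  verification?: TaskConfig;
}

// Task types without a built-in UI, so a custom template must be supplied.
const TEMPLATE_REQUIRED = new Set(["image_bounding_box", "image_semantic_segmentation"]);
// Assumed shape of a private workteam ARN.
const PRIVATE_WORKTEAM = /^arn:aws[a-z-]*:sagemaker:[a-z0-9-]+:\d{12}:workteam\/private-crowd\/.+$/;
const S3_URI = /^s3:\/\/[^/]+\/.+$/;

function checkTask(name: string, task: TaskConfig, errors: string[]): void {
  if (!PRIVATE_WORKTEAM.test(task.workteamArn)) {
    errors.push(`${name}.workteamArn must be a private workteam ARN`);
  }
  for (const field of ["categoriesS3Uri", "templateS3Uri"] as const) {
    const value = task[field];
    if (value !== undefined && !S3_URI.test(value)) {
      errors.push(`${name}.${field} must be an s3:// URI`);
    }
  }
}

// Returns a list of problems; an empty list means the checks passed.
function validateConfig(config: GroundTruthConfig): string[] {
  const errors: string[] = [];
  checkTask("labelingTaskConfig", config.labelingTaskConfig, errors);
  if (config.labelingTaskConfig.categoriesS3Uri === undefined) {
    errors.push("labelingTaskConfig.categoriesS3Uri is required");
  }
  if (TEMPLATE_REQUIRED.has(config.taskType) && config.labelingTaskConfig.templateS3Uri === undefined) {
    errors.push(`labelingTaskConfig.templateS3Uri is required for ${config.taskType}`);
  }
  if (config.verification) checkTask("verification", config.verification, errors);
  return errors;
}
```

Collecting all problems in a list, rather than throwing on the first one, lets a deployment script report every missing prerequisite in a single pass.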
import { SageMakerGroundTruthL3Construct } from '@aws-mdaa/sagemaker-ground-truth-l3-construct';

new SageMakerGroundTruthL3Construct(stack, 'GroundTruth', {
  naming: props.naming,
  roleHelper: props.roleHelper,
  jobName: 'image-labeling',
  taskType: 'image_bounding_box',
  labelingTaskConfig: {
    taskTitle: 'Label bounding boxes',
    taskDescription: 'Draw bounding boxes around objects',
    taskKeywords: ['image', 'bounding box'],
    workteamArn: 'arn:aws:sagemaker:us-east-1:123456789012:workteam/private-crowd/my-team',
    categoriesS3Uri: 's3://my-bucket/categories.json',
    // Required for image_bounding_box (placeholder path; no built-in UI exists for this task type)
    templateS3Uri: 's3://my-bucket/template.liquid.html',
  },
  // Optional: add verification step
  verification: {
    workteamArn: 'arn:aws:sagemaker:us-east-1:123456789012:workteam/private-crowd/verify-team',
    taskTitle: 'Verify labels',
    taskDescription: 'Verify the bounding box labels are correct',
  },
  // Optional: custom schedule (default: daily at noon UTC)
  workflowSchedule: 'cron(0 12 * * ? *)',
});
The construct publishes the following SSM parameters:

- `upload-bucket-name`: S3 bucket for uploading data objects
- `output-bucket-name`: S3 bucket for labeling job results
- `feature-group-name`: SageMaker Feature Group name
- `state-machine-arn`: Step Functions state machine ARN
- `upload-queue-url`: SQS queue URL for data notifications