
Spark Driver and Executor Logs

The status of Spark jobs can be monitored via the EMR on EKS describe-job-run API.

To monitor job progress and to troubleshoot failures, you must configure your jobs to send log information to Amazon S3, Amazon CloudWatch Logs, or both.
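For example, job status can be polled with the emr-containers client in boto3 until the job reaches a terminal state. This is a minimal sketch; the virtual cluster ID and job run ID are placeholders.

import time

import boto3

emr = boto3.client("emr-containers")

def wait_for_job(virtual_cluster_id, job_run_id, poll_seconds=30):
    # Poll describe-job-run until the job reaches a terminal state.
    while True:
        state = emr.describe_job_run(
            virtualClusterId=virtual_cluster_id, id=job_run_id
        )["jobRun"]["state"]
        if state in ("COMPLETED", "FAILED", "CANCELLED"):
            return state
        time.sleep(poll_seconds)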

Send Spark Logs to S3

Update the IAM role with S3 write access

Configure the IAM role passed in the StartJobRun input executionRoleArn with access to the S3 bucket that will receive the logs.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::my_s3_log_location",
                "arn:aws:s3:::my_s3_log_location/*",
            ]
        }
    ]
}
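The policy can be attached to the execution role in any of the usual ways; as one sketch, the document above could be added as an inline policy with boto3 (the role and policy names below are placeholders):

import boto3

iam = boto3.client("iam")

# "s3-log-policy.json" is assumed to contain the policy document shown above.
with open("s3-log-policy.json") as f:
    policy_document = f.read()

iam.put_role_policy(
    RoleName="my-emr-job-execution-role",   # placeholder execution role name
    PolicyName="emr-eks-s3-log-access",     # hypothetical inline policy name
    PolicyDocument=policy_document,
)

The same approach applies to the CloudWatch policy shown later on this page.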

Configure the StartJobRun API with S3 buckets

Configure the monitoringConfiguration with s3MonitoringConfiguration, and set the S3 location to which the logs should be synced.

{
  "name": "<job_name>", 
  "virtualClusterId": "<vc_id>",  
  "executionRoleArn": "<iam_role_name_for_job_execution>", 
  "releaseLabel": "<emr_release_label>", 
  "jobDriver": {

  }, 
  "configurationOverrides": {
    "monitoringConfiguration": {
      "persistentAppUI": "ENABLED",
      "s3MonitoringConfiguration": {
        "logUri": "s3://my_s3_log_location"
      }
    }
  }
}
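As a sketch, the same request can be made with boto3's start_job_run; the job name, entry point, and release label below are assumed placeholder values:

import boto3

emr = boto3.client("emr-containers")

response = emr.start_job_run(
    name="my-spark-job",                                  # placeholder
    virtualClusterId="<vc_id>",
    executionRoleArn="<iam_role_arn_for_job_execution>",
    releaseLabel="emr-6.15.0-latest",                     # assumed release label
    jobDriver={
        "sparkSubmitJobDriver": {
            "entryPoint": "s3://my-bucket/my-script.py",  # hypothetical entry point
            "sparkSubmitParameters": "--conf spark.executor.instances=2",
        }
    },
    configurationOverrides={
        "monitoringConfiguration": {
            "persistentAppUI": "ENABLED",
            "s3MonitoringConfiguration": {"logUri": "s3://my_s3_log_location"},
        }
    },
)
print(response["id"])  # the job run ID, usable with describe-job-run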

Log locations of JobRunner, Driver, and Executor in S3

The JobRunner (the pod that runs spark-submit), Spark driver, and Spark executor logs can be found at the following S3 locations.

JobRunner/Spark-Submit/Controller Logs - s3://my_s3_log_location/${virtual-cluster-id}/jobs/${job-id}/containers/${job-runner-pod-id}/(stderr.gz/stdout.gz)

Driver Logs - s3://my_s3_log_location/${virtual-cluster-id}/jobs/${job-id}/containers/${spark-application-id}/${spark-job-id-driver-pod-name}/(stderr.gz/stdout.gz)

Executor Logs - s3://my_s3_log_location/${virtual-cluster-id}/jobs/${job-id}/containers/${spark-application-id}/${executor-pod-name}/(stderr.gz/stdout.gz)
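Given that layout, the compressed log objects for a job can be listed and read back with the S3 client. A minimal sketch, with the bucket, virtual cluster ID, and job ID as placeholders:

import gzip

import boto3

s3 = boto3.client("s3")
bucket = "my_s3_log_location"                              # placeholder
prefix = "<virtual-cluster-id>/jobs/<job-id>/containers/"  # placeholder IDs

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith(("stderr.gz", "stdout.gz")):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            # Print the object key and the first few hundred bytes of the log.
            print(obj["Key"], gzip.decompress(body)[:200])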

Send Spark Logs to CloudWatch

Update the IAM role with CloudWatch access

Configure the IAM role passed in the StartJobRun input executionRoleArn with access to CloudWatch Logs.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams"
      ],
      "Resource": [
        "arn:aws:logs:*:*:*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:PutLogEvents"
      ],
      "Resource": [
        "arn:aws:logs:*:*:log-group:my_log_group_name:log-stream:my_log_stream_prefix/*"
      ]
    }
  ]
}

Configure StartJobRun API with CloudWatch

Configure the monitoringConfiguration with cloudWatchMonitoringConfiguration, and set the CloudWatch logGroupName and logStreamNamePrefix to which the logs should be pushed.

{
  "name": "<job_name>", 
  "virtualClusterId": "<vc_id>",  
  "executionRoleArn": "<iam_role_name_for_job_execution>", 
  "releaseLabel": "<emr_release_label>", 
  "jobDriver": {

  }, 
  "configurationOverrides": {
    "monitoringConfiguration": {
      "persistentAppUI": "ENABLED",
      "cloudWatchMonitoringConfiguration": {
        "logGroupName": "my_log_group_name",
        "logStreamNamePrefix": "my_log_stream_prefix"
      }
    }
  }
}
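Note that the two sinks are not mutually exclusive: a job's monitoringConfiguration can carry both s3MonitoringConfiguration and cloudWatchMonitoringConfiguration at once. A sketch of the combined overrides as a boto3 parameter (bucket, log group, and prefix names are placeholders):

configuration_overrides = {
    "monitoringConfiguration": {
        "persistentAppUI": "ENABLED",
        "s3MonitoringConfiguration": {"logUri": "s3://my_s3_log_location"},
        "cloudWatchMonitoringConfiguration": {
            "logGroupName": "my_log_group_name",
            "logStreamNamePrefix": "my_log_stream_prefix",
        },
    }
}
# Pass this dict as configurationOverrides to start_job_run.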

Log locations of JobRunner, Driver, and Executor in CloudWatch

The JobRunner (the pod that runs spark-submit), Spark driver, and Spark executor logs can be found as log streams at the following Amazon CloudWatch locations.

JobRunner/Spark-Submit/Controller Logs - ${my_log_group_name}/${my_log_stream_prefix}/${virtual-cluster-id}/jobs/${job-id}/containers/${job-runner-pod-id}/(stderr/stdout)

Driver Logs - ${my_log_group_name}/${my_log_stream_prefix}/${virtual-cluster-id}/jobs/${job-id}/containers/${spark-application-id}/${spark-job-id-driver-pod-name}/(stderr/stdout)

Executor Logs - ${my_log_group_name}/${my_log_stream_prefix}/${virtual-cluster-id}/jobs/${job-id}/containers/${spark-application-id}/${executor-pod-name}/(stderr/stdout)
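Given that layout, a job's log streams can be discovered by prefix and read back with the CloudWatch Logs client. Note that the caller running this sketch needs read permissions (for example, logs:GetLogEvents), which are separate from the write permissions granted to the job execution role above. IDs below are placeholders:

import boto3

logs = boto3.client("logs")

log_group = "my_log_group_name"                                           # placeholder
stream_prefix = "my_log_stream_prefix/<vc_id>/jobs/<job_id>/containers/"  # placeholder IDs

for stream in logs.describe_log_streams(
    logGroupName=log_group, logStreamNamePrefix=stream_prefix
)["logStreams"]:
    events = logs.get_log_events(
        logGroupName=log_group,
        logStreamName=stream["logStreamName"],
        startFromHead=True,
    )
    for event in events["events"][:10]:  # first few lines of each stream
        print(stream["logStreamName"], event["message"])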