Self-Hosted Spark History Server
In this section, you will learn how to self-host the Spark History Server instead of using the Persistent App UI in the AWS Console.
1. In your `StartJobRun` call for EMR on EKS, set `spark.eventLog.enabled` and `spark.eventLog.dir` to point to an S3 bucket where you would like your event logs to go:

    ```json
    "configurationOverrides": {
      "applicationConfiguration": [
        {
          "classification": "spark-defaults",
          "properties": {
            "spark.eventLog.enabled": "true",
            "spark.eventLog.dir": "s3://your-bucket-here/some-directory"
    ...
    ```
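As a minimal sketch, the overrides block above could be assembled programmatically, for example to pass as `configurationOverrides` in a boto3 `start_job_run` call. The helper name and the bucket/prefix values are illustrative, not part of any API:

```python
def event_log_overrides(bucket: str, prefix: str) -> dict:
    """Build a configurationOverrides block that enables Spark event
    logging to the given S3 location (illustrative helper)."""
    return {
        "applicationConfiguration": [
            {
                "classification": "spark-defaults",
                "properties": {
                    "spark.eventLog.enabled": "true",
                    "spark.eventLog.dir": f"s3://{bucket}/{prefix}",
                },
            }
        ]
    }

# Placeholder bucket and directory, matching the snippet above.
overrides = event_log_overrides("your-bucket-here", "some-directory")
print(overrides["applicationConfiguration"][0]["properties"]["spark.eventLog.dir"])
# s3://your-bucket-here/some-directory
```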
2. Take note of the S3 bucket specified in step 1, and use it in the instructions in step 3 wherever you are asked for `path_to_eventlog`. Make sure it is prefixed with `s3a://`, not `s3://`. For example: `-Dspark.history.fs.logDirectory=s3a://path_to_eventlog`.
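The scheme rewrite in step 2 is easy to get wrong, so here is a small sketch of it in Python; the helper name is illustrative, though `spark.history.fs.logDirectory` itself is a standard Spark History Server property:

```python
def history_server_log_dir_option(event_log_dir: str) -> str:
    """Build the JVM option for the history server's log directory.

    The history server reads S3 through the s3a:// filesystem, so the
    s3:// scheme used for spark.eventLog.dir in step 1 must be rewritten.
    """
    if event_log_dir.startswith("s3://"):
        event_log_dir = "s3a://" + event_log_dir[len("s3://"):]
    return f"-Dspark.history.fs.logDirectory={event_log_dir}"

print(history_server_log_dir_option("s3://your-bucket-here/some-directory"))
# -Dspark.history.fs.logDirectory=s3a://your-bucket-here/some-directory
```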
3. Follow the instructions here to launch the Spark History Server using a Docker image.
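As a rough sketch (not a runnable recipe), the container launch in step 3 typically takes a shape like the following. The image name is a placeholder for whatever image the linked instructions reference, and the exact environment variables depend on that image; `SPARK_HISTORY_OPTS` is the standard variable the Spark history server startup script reads:

```shell
# Image name is a placeholder; credentials for S3 access are omitted here
# and depend on how the image expects them to be supplied.
docker run -it --rm -p 18080:18080 \
  -e SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=s3a://your-bucket-here/some-directory" \
  your-spark-history-server-image
```

Once the container is up, the UI is served on port 18080, and the standard Spark monitoring REST endpoint `http://127.0.0.1:18080/api/v1/applications` can be used to confirm that event logs are being picked up.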
4. After following the above steps, event logs should flow to the specified S3 bucket, and the Docker container should spin up the Spark History Server, available at `127.0.0.1:18080`. This instance of the Spark History Server will pick up and parse event logs from the specified S3 bucket.