Mount EBS volumes to Spark driver and executor pods

Amazon EBS volumes can be mounted on Spark driver and executor pods through static and dynamic provisioning.

Using dynamically created PVCs to mount an EBS volume per pod in a Spark job offers significant benefits in terms of performance, scalability, and ease of management. However, it also introduces complexity and potential cost: EBS create/attach/detach/delete operations can be throttled when Spark pods generate volumes at scale (for example, more than 5,000 EBS volumes), so the approach needs to be carefully managed. Weigh the pros and cons against your specific requirements and constraints to determine whether this technique suits the scale of your Spark workloads.

Pros

  • Scalability: As Spark scales up and down automatically during a job execution, dynamic PVCs allow storage to scale seamlessly with the number of executor pods. This ensures that each new executor gets the necessary storage without manual intervention.
  • Optimized Storage Allocation: Dynamically provisioning PVCs allows you to allocate exactly the amount of storage needed for each Spark's pod. This prevents over-provisioning and ensures efficient use of resources, potentially reducing storage costs.
  • Cost efficient: Only pay for the storage you actually use, which can be more cost-effective than pre-allocating large, static volumes.
  • High IO Performance: By giving each executor its own EBS volume, you avoid I/O contention among executors. This leads to more predictable and higher performance, especially for I/O-intensive tasks.
  • Data Locality: With each executor having its own volume, data is stored locally to the executor’s pod. It can reduce data transfer latency.
  • Resilience to Spot Interruption: With Spark's "PVC Reuse" feature, an EBS volume can persist shuffle data for the lifetime of a job, even if a pod is terminated by a Spot interruption. Instead of creating new volumes, Spark re-attaches the existing ones to new pods, which provides faster recovery from a node failure or interruption event. This improves application resilience while you run Spot Instances to reduce compute cost; see the configuration sketch after this list.
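
As a minimal sketch, PVC reuse is enabled through Spark properties available in open-source Spark 3.2 and later; the snippet below shows them in the same spark-defaults property style used in the job requests on this page:

"spark.kubernetes.driver.ownPersistentVolumeClaim": "true",
"spark.kubernetes.driver.reusePersistentVolumeClaim": "true"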

Cons

  • Storage Costs: EBS volumes can be expensive, especially if more volumes are provisioned than necessary, for example because of a provisioning bug or a scalability bottleneck in the EBS CSI controller.
  • Resource Utilization: Inefficient use of storage resources can occur if each pod is allocated a large EBS volume but only uses a fraction of it.
  • Attachment Latency & Limits: Frequently attaching and detaching EBS volumes introduces latency and can exceed EBS attachment limits. For most instance types, only 26 extra volumes can be attached to a single Amazon EC2 instance.

Prerequisite

The Amazon EBS CSI driver must be installed on the EKS cluster. Use this command to check whether the driver exists:

kubectl get csidriver
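
If the driver is installed, the output should contain an entry similar to the following (column values vary by driver and Kubernetes version):

NAME              ATTACHREQUIRED   PODINFOONMOUNT   MODES        AGE
ebs.csi.aws.com   true             false            Persistent   24h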

Static Provisioning

EKS Admin Tasks

First, create your EBS volumes:

aws ec2 --region <region> create-volume --availability-zone <availability zone> --size 50
{
    "AvailabilityZone": "<availability zone>", 
    "MultiAttachEnabled": false, 
    "Tags": [], 
    "Encrypted": false, 
    "VolumeType": "gp2", 
    "VolumeId": "<vol -id>", 
    "State": "creating", 
    "Iops": 150, 
    "SnapshotId": "", 
    "CreateTime": "2020-11-03T18:36:21.000Z", 
    "Size": 50
}
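
The command above creates a gp2 volume by default. To create a gp3 volume instead (matching the dynamic provisioning example later on this page), pass the volume type explicitly:

aws ec2 --region <region> create-volume --availability-zone <availability zone> --size 50 --volume-type gp3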

Create a PersistentVolume (PV) that is hard-coded to the EBS volume created above:

cat > ebs-static-pv.yaml << EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ebs-static-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2
  csi:
    driver: ebs.csi.aws.com
    fsType: ext4
    volumeHandle: <vol-id>
EOF

kubectl apply -f ebs-static-pv.yaml -n <namespace>

Create a PersistentVolumeClaim (PVC) for the PersistentVolume created above:

cat > ebs-static-pvc.yaml << EOF
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ebs-static-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  volumeName: ebs-static-pv
EOF

kubectl apply -f ebs-static-pvc.yaml -n <namespace>

The PVC ebs-static-pvc can now be used by Spark developers to mount the EBS volume to Spark pods.

NOTE: Pods running on EKS worker nodes can only attach EBS volumes provisioned in the same AZ as the worker node. Use node selectors to schedule pods on EKS worker nodes in that AZ, as in the sketch below.
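
For example, because the EBS volume above was created in <availability zone>, the Spark pods can be pinned to that zone with Spark's generic node-selector property (spark.kubernetes.node.selector.[labelKey]) and the standard Kubernetes topology label:

--conf spark.kubernetes.node.selector.topology.kubernetes.io/zone=<availability zone>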

Spark Developer Tasks

Request

cat >spark-python-in-s3-ebs-static-localdir.json << EOF
{
  "name": "spark-python-in-s3-ebs-static-localdir", 
  "virtualClusterId": "<virtual-cluster-id>", 
  "executionRoleArn": "<execution-role-arn>", 
  "releaseLabel": "emr-6.15.0-latest", 
  "jobDriver": {
    "sparkSubmitJobDriver": {
      "entryPoint": "s3://<s3 prefix>/trip-count-fsx.py", 
       "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.executor.instances=10 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6 "
    }
  }, 
  "configurationOverrides": {
    "applicationConfiguration": [
      {
        "classification": "spark-defaults", 
        "properties": {
          "spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-sparkspill.options.claimName": "ebs-static-pvc",
          "spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-sparkspill.mount.path": "/var/spark/spill/",
          "spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-sparkspill.mount.readOnly": "false",
         }
      }
    ], 
    "monitoringConfiguration": {
      "cloudWatchMonitoringConfiguration": {
        "logGroupName": "/emr-containers/jobs", 
        "logStreamNamePrefix": "demo"
      }, 
      "s3MonitoringConfiguration": {
        "logUri": "s3://joblogs"
      }
    }
  }
}
EOF
aws emr-containers start-job-run --cli-input-json file://spark-python-in-s3-ebs-static-localdir.json

Observed Behavior:
When the job starts, the pre-provisioned EBS volume is mounted to the driver pod. You can exec into the driver container to verify that the EBS volume is mounted, and you can also verify the mount from the driver pod's spec.

kubectl get pod <driver pod name> -n <namespace> -o yaml
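
You can also confirm that the claim is bound to the pre-provisioned volume:

kubectl get pvc ebs-static-pvc -n <namespace>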

Dynamic Provisioning

Dynamic provisioning of PVCs/volumes is supported for both the Spark driver and executors on Amazon EMR releases 6.3.0 and later.

EKS Admin Tasks

Create a new "gp3" EBS Storage Class or use an existing one:

cat >demo-gp3-sc.yaml << EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: demo-gp3-sc
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Retain
allowVolumeExpansion: true
mountOptions:
  - debug
volumeBindingMode: WaitForFirstConsumer # provision the volume in the AZ where the pod is scheduled
EOF

kubectl apply -f demo-gp3-sc.yaml
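
Verify that the StorageClass exists and uses the EBS CSI provisioner:

kubectl get storageclass demo-gp3-sc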

Spark Developer Tasks

Request

cat >spark-python-in-s3-ebs-dynamic-localdir.json << EOF
{
  "name": "spark-python-in-s3-ebs-dynamic-localdir", 
  "virtualClusterId": "<virtual-cluster-id>", 
  "executionRoleArn": "<execution-role-arn>", 
  "releaseLabel": "emr-6.15.0-latest", 
  "jobDriver": {
    "sparkSubmitJobDriver": {
      "entryPoint": "s3://<s3 prefix>/trip-count-fsx.py", 
       "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.executor.instances=10 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6"
    }
  }, 
  "configurationOverrides": {
    "applicationConfiguration": [
      {
        "classification": "spark-defaults", 
        "properties": {
          "spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName": "OnDemand",
          "spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass": "demo-gp3-sc",
          "spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path":"/data",
          "spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly": "false",
          "spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit": "10Gi",

          "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName":"OnDemand",
          "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass": "demo-gp3-sc",
          "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path": "/data",
          "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly": "false",
          "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit": "50Gi",
         }
      }
    ], 
    "monitoringConfiguration": {
      "cloudWatchMonitoringConfiguration": {
        "logGroupName": "/emr-containers/jobs", 
        "logStreamNamePrefix": "demo"
      }, 
      "s3MonitoringConfiguration": {
        "logUri": "s3://joblogs"
      }
    }
  }
}
EOF
aws emr-containers start-job-run --cli-input-json file://spark-python-in-s3-ebs-dynamic-localdir.json

Observed Behavior: When the job starts, an EBS volume is provisioned dynamically by the EBS CSI driver and mounted to the Spark driver and executor pods. You can exec into the driver or executor container to verify that the EBS volume is mounted, and you can also verify the mount from the pod spec.

# verify that the EBS volume is mounted
kubectl exec <driver pod name> -n <namespace> -c spark-kubernetes-driver -- df -h

# export the driver pod spec to a yaml file
kubectl get pod <driver pod name> -n <namespace> -o yaml
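
Because the claims are created on demand by Spark, you can also list them to watch volumes being provisioned and bound as executors come up:

# list the dynamically created PVCs
kubectl get pvc -n <namespace>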