EMR on EKS Glossary & Terms

  • EMR on EKS Job: The Spark job being submitted and executed by the EMR on EKS control plane.
  • EMR on EKS Job types: The type of Spark job being submitted. It can be either a batch job (with a fixed job duration) or a streaming job (a continuously running job).
  • Kubernetes (K8s) control plane: A K8s cluster consists of a control plane and one or more worker nodes. The control plane is responsible for managing the overall state of the cluster and includes components such as the API server, etcd database, scheduler, and controller manager.
  • K8s API request: The K8s API is a resource-based (RESTful) programmatic interface provided via HTTP. It supports retrieving, creating, updating, and deleting resources in a K8s cluster via the standard HTTP verbs (POST, PUT, PATCH, DELETE, GET).
  • K8s pod: Pods are the smallest deployable units of computing that you can create and manage in Kubernetes.
  • K8s event: An Event is a report of an event somewhere in the K8s cluster. It generally denotes some state change in the system.
  • K8s config map: A ConfigMap is an API object used to store non-confidential data in key-value pairs. Pods can consume ConfigMaps as environment variables, command-line arguments, or as configuration files in a volume.
  • K8s API Server: The API server is an internal K8s component responsible for serving and processing K8s API requests. EKS hosts this K8s control plane component on EKS-owned infrastructure that is separate from the customer’s EKS cluster.
  • K8s Etcd database: Etcd is the K8s internal database that stores information about K8s objects such as pods, events, and config maps. EKS hosts this K8s control plane component on EKS-owned infrastructure.
  • K8s Job: A K8s Job object creates one or more pods and monitors them until they complete successfully. It has a retry policy that helps ensure completion. This is different from the EMR on EKS job concept. An EMR on EKS job usually submits one or more K8s jobs in the K8s cluster.
  • K8s Job Controller: A K8s native controller that interacts with the Kubernetes API server to create pods, update job status according to pod status, and create events. The job controller monitors and updates K8s job objects.
  • K8s Job Controller Work Queue (Depth): The backlog of accumulated K8s job object events that still need to be processed by the job controller.
  • EMR Spark Operator: A job submission model for Amazon EMR on EKS with which users can deploy and manage Spark applications with the Amazon EMR release runtime on Amazon EKS clusters.
  • Job types: The type of Spark job being submitted. It can be either a batch job (with a fixed job duration) or a streaming job (a continuously running job).
  • Spark Operator Workqueue: The central component in the Spark Operator's control loop, managing the flow of SparkApplication resources that need to be processed, ensuring efficient, ordered, and reliable handling of these resources.
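
The two work queue terms above (the K8s Job Controller work queue depth and the Spark Operator workqueue) can be illustrated with a toy model. The sketch below is not the Kubernetes or Spark Operator implementation; the WorkQueue class, its method names, and the "namespace/name" keys are hypothetical and exist only for illustration. It assumes a FIFO queue that collapses repeated events for an object that is already waiting, and it reports depth as the number of items not yet handed to the controller.

```python
from collections import deque


class WorkQueue:
    """Toy FIFO work queue (hypothetical, for illustration only)."""

    def __init__(self):
        self._pending = deque()  # keys waiting to be processed, oldest first
        self._queued = set()     # the same keys, kept for fast duplicate checks

    def add(self, key):
        # Assumption of this sketch: a key that is already waiting
        # is not queued a second time.
        if key not in self._queued:
            self._queued.add(key)
            self._pending.append(key)

    def get(self):
        # Hand the oldest pending key to the controller loop.
        key = self._pending.popleft()
        self._queued.discard(key)
        return key

    def depth(self):
        # Depth is the backlog: items added but not yet picked up.
        return len(self._pending)


queue = WorkQueue()
# Four events arrive, two of them for the same object.
for key in ["ns/job-a", "ns/job-b", "ns/job-a", "ns/job-c"]:
    queue.add(key)

depth_before = queue.depth()  # backlog before the controller does any work
first = queue.get()           # the controller picks up the oldest item
depth_after = queue.depth()   # backlog shrinks by one
```

In this model, depth grows when events arrive faster than the controller can process them, which is why a rising work queue depth is usually read as a sign that the controller is falling behind.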