Amazon VPC CNI¶
Amazon EKS implements cluster networking through the Amazon VPC Container Network Interface(VPC CNI) plugin. The CNI plugin allows Kubernetes Pods to have the same IP address as they do on the VPC network. More specifically, all containers inside the Pod share a network namespace, and they can communicate with each-other using local ports.
Amazon VPC CNI has two components:
- CNI Binary, which will setup Pod network to enable Pod-to-Pod communication. The CNI binary runs on a node root file system and is invoked by the kubelet when a new Pod gets added to, or an existing Pod removed from the node.
- ipamd, a long-running node-local IP Address Management (IPAM) daemon and is responsible for:
- managing ENIs on a node, and
- maintaining a warm-pool of available IP addresses or prefix
When an instance is created, EC2 creates and attaches a primary ENI associated with a primary subnet. The primary subnet may be public or private. The Pods that run in hostNetwork mode use the primary IP address assigned to the node primary ENI and share the same network namespace as the host.
The CNI plugin manages Elastic Network Interfaces (ENI) on the node. When a node is provisioned, the CNI plugin automatically allocates a pool of slots (IPs or Prefix’s) from the node’s subnet to the primary ENI. This pool is known as the warm pool, and its size is determined by the node’s instance type. Depending on CNI settings, a slot may be an IP address or a prefix. When a slot on an ENI has been assigned, the CNI may attach additional ENIs with warm pool of slots to the nodes. These additional ENIs are called Secondary ENIs. Each ENI can only support a certain number of slots, based on instance type. The CNI attaches more ENIs to instances based on the number of slots needed, which usually corresponds to the number of Pods. This process continues until the node can no longer support additional ENI. The CNI also pre-allocates “warm” ENIs and slots for faster Pod startup. Note each instance type has a maximum number of ENIs that may be attached. This is one constraint on Pod density (number of Pods per node), in addition to compute resources.
The maximum number of network interfaces, and the maximum number of slots that you can use varies by the type of EC2 Instance. Since each Pod consumes an IP address on a slot, the number of Pods you can run on a particular EC2 Instance depends on how many ENIs can be attached to it and how many slots each ENI supports. We suggest setting the maximum Pods per EKS user guide to avoid exhaustion of the instance’s CPU and memory resources. Pods using
hostNetwork are excluded from this calculation. You may consider using a script called max-pod-calculator.sh to calculate EKS’s recommended maximum Pods for a given instance type.
Secondary IP mode is the default mode for VPC CNI. This guide provides a generic overview of VPC CNI behavior when Secondary IP mode is enabled. The functionality of ipamd (allocation of IP addresses) may vary depending on the configuration settings for VPC CNI, such as Prefix Mode, Security Groups Per Pod, and Custom Networking.
The Amazon VPC CNI is deployed as a Kubernetes Daemonset named aws-node on worker nodes. When a worker node is provisioned, it has a default ENI, called the primary ENI, attached to it. The CNI allocates a warm pool of ENIs and secondary IP addresses from the subnet attached to the node’s primary ENI. By default, ipamd attempts to allocate an additional ENI to the node. The IPAMD allocates additional ENI when a single Pod is scheduled and assigned a secondary IP address from the primary ENI. This "warm" ENI enables faster Pod networking. As the pool of secondary IP addresses runs out, the CNI adds another ENI to assign more.
The number of ENIs and IP addresses in a pool are configured through environment variables called WARM_ENI_TARGET, WARM_IP_TARGET, MINIMUM_IP_TARGET. The
aws-node Daemonset will periodically check that a sufficient number of ENIs are attached. A sufficient number of ENIs are attached when all of the
MINIMUM_IP_TARGET conditions are met. If there are insufficient ENIs attached, the CNI will make an API call to EC2 to attach more until
MAX_ENI limit is reached.
WARM_ENI_TARGET- Integer, Values >0 indicate requirement Enabled
- The number of Warm ENIs to be maintained. An ENI is “warm” when it is attached as a secondary ENI to a node, but it is not in use by any Pod. More specifically, no IP addresses of the ENI have been associated with a Pod.
- Example: Consider an instance with 2 ENIs, each ENI supporting 5 IP addresses. WARM_ENI_TARGET is set to 1. If exactly 5 IP addresses are associated with the instance, the CNI maintains 2 ENIs attached to the instance. The first ENI is in use, and all 5 possible IP addresses of this ENI are used. The second ENI is “warm” with all 5 IP addresses in pool. If another Pod is launched on the instance, a 6th IP address will be needed. The CNI will assign this 6th Pod an IP address from the second ENI and from 5 IPs from the pool. The second ENI is now in use, and no longer in a “warm” status. The CNI will allocate a 3rd ENI to maintain at least 1 warm ENI.
The warm ENIs still consume IP addresses from the CIDR of your VPC. IP addresses are “unused” or “warm” until they are associated with a workload, such as a Pod.
WARM_IP_TARGET, Integer, Values >0 indicate requirement Enabled
- The number of Warm IP addresses to be maintained. A Warm IP is available on an actively attached ENI, but has not been assigned to a Pod. In other words, the number of Warm IPs available is the number of IPs that may be assigned to a Pod without requiring an additional ENI.
- Example: Consider an instance with 1 ENI, each ENI supporting 20 IP addresses. WARM_IP_TARGET is set to 5. WARM_ENI_TARGET is set to 0. Only 1 ENI will be attached until a 16th IP address is needed. Then, the CNI will attach a second ENI, consuming 20 possible addresses from the subnet CIDR.
MINIMUM_IP_TARGET, Integer, Values >0 indicate requirement Enabled
- The minimum number of IP addresses to be allocated at any time. This is commonly used to front-load the assignment of multiple ENIs at instance launch.
- Example: Consider a newly launched instance. It has 1 ENI and each ENI supports 10 IP addresses. MINIMUM_IP_TARGET is set to 100. The ENI immediately attaches 9 more ENIs for a total of 100 addresses. This happens regardless of any WARM_IP_TARGET or WARM_ENI_TARGET values.
This project includes a Subnet Calculator Excel Document. This calculator document simulates the IP address consumption of a specified workload under different ENI configuration options, such as
When Kubelet receives an add Pod request, the CNI binary queries ipamd for an available IP address, which ipamd then provides to the Pod. The CNI binary wires up the host and Pod network.
Pods deployed on a node are, by default, assigned to the same security groups as the primary ENI. Alternatively, Pods may be configured with different security groups.
As the pool of IP addresses is depleted, the plugin automatically attaches another elastic network interface to the instance and allocates another set of secondary IP addresses to that interface. This process continues until the node can no longer support additional elastic network interfaces.
When a Pod is deleted, VPC CNI places the Pod’s IP address in a 30-second cool down cache. The IPs in a cool down cache are not assigned to new Pods. When the cooling-off period is over, VPC CNI moves Pod IP back to the warm pool. The cooling-off period prevents Pod IP addresses from being recycled prematurely and allows kube-proxy on all cluster nodes to finish updating the iptables rules. When the number of IPs or ENIs exceeds the number of warm pool settings, the ipamd plugin returns IPs and ENIs to the VPC.
As described above in Secondary IP mode, each Pod receives one secondary private IP address from one of the ENIs attached to an instance. Since each Pod uses an IP address, the number of Pods you can run on a particular EC2 Instance depends on how many ENIs can be attached to it and how many IP addresses it supports. The VPC CNI checks the limits file to find out how many ENIs and IP addresses are allowed for each type of instance.
You can use the following formula to determine maximum number of Pods you can deploy on a node.
(Number of network interfaces for the instance type × (the number of IP addresses per network interface - 1)) + 2
The +2 indicates Pods that require host networking, such as kube-proxy and VPC CNI. Amazon EKS requires kube-proxy and VPC CNI to be operating on each node, and these requirements are factored into the max-pods value. If you want to run additional host networking pods, consider updating the max-pods value.
The +2 indicates Kubernetes Pods that use host networking, such as kube-proxy and VPC CNI. Amazon EKS requires kube-proxy and VPC CNI to be running on every node and are calculated towards max-pods. Consider updating max-pods if you plan to run more host networking Pods. You can specify
--kubelet-extra-args "—max-pods=110" as user data in the launch template.
As an example, on a cluster with 3 c5.large nodes (3 ENIs and max 10 IPs per ENI), when the cluster starts up and has 2 CoreDNS pods, the CNI will consume 49 IP addresses and keeps them in warm pool. The warm pool enables faster Pod launches when the application is deployed.
Node 1 (with CoreDNS pod): 2 ENIs, 20 IPs assigned
Node 2 (with CoreDNS pod): 2 ENIs, 20 IPs assigned
Node 3 (no Pod): 1 ENI. 10 IPs assigned.
Keep in mind that infrastructure pods, often running as daemon sets, each contribute to the max-pod count. These can include:
- Amazon Elastic LoadBalancer
- Operational pods for metrics-server
We suggest that you plan your infrastructure by combining these Pods' capacities. For a list of the maximum number of Pods supported by each instance type, see eni-max-Pods.txt on GitHub.
Use Secondary IP Mode when¶
Secondary IP mode is an ideal configuration option for ephemeral EKS clusters. Greenfield customers who are either new to EKS or in the process of migrating can take advantage of VPC CNI in secondary mode.
Avoid Secondary IP Mode when¶
If you are experiencing Pod density issues, we suggest enabling prefix mode. If you are facing IPv4 depletion issues, we advise migrating to IPv6 clusters. If IPv6 is not on the horizon, you may choose to use custom networking.
Deploy VPC CNI Managed Add-On¶
When you provision a cluster, Amazon EKS installs VPC CNI automatically. Amazon EKS nevertheless supports managed add-ons that enable the cluster to interact with underlying AWS resources such as computing, storage, and networking. We highly recommend that you deploy clusters with managed add-ons including VPC CNI.
Amazon EKS managed add-on offer VPC CNI installation and management for Amazon EKS clusters. Amazon EKS add-ons include the latest security patches, bug fixes, and are validated by AWS to work with Amazon EKS. The VPC CNI add-on enables you to continuously ensure the security and stability of your Amazon EKS clusters and decrease the amount of effort required to install, configure, and update add-ons. Additionally, a managed add-on can be added, updated, or deleted via the Amazon EKS API, AWS Management Console, AWS CLI, and eksctl.
You can find the managed fields of VPC CNI using
--show-managed-fields flag with the
kubectl get command.
Managed add-ons prevents configuration drift by automatically overwriting configurations every 15 minutes. This means that any changes to managed add-ons, made via the Kubernetes API after add-on creation, will overwrite by the automated drift-prevention process and also set to defaults during add-on update process.
The fields managed by EKS are listed under managedFields with manager as EKS. Fields managed by EKS include service account, image, image url, liveness probe, readiness probe, labels, volumes, and volume mounts.
The most frequently used fields such as WARM_ENI_TARGET, WARM_IP_TARGET, and MINIMUM_IP_TARGET are not managed and will not be reconciled. The changes to these fields will be preserved upon updating of the add-on.
We suggest testing the add-on behavior in your non-production clusters for a specific configuration before updating production clusters. Additionally, follow the steps in the EKS user guide for add-on configurations.
Migrate to Managed Add-On¶
You will manage the version compatibility and update the security patches of self-managed VPC CNI. To update a self-managed add-on, you must use the Kubernetes APIs and instructions outlined in the EKS user guide. We recommend migrating to a managed add-on for existing EKS clusters and highly suggest creating a backup of your current CNI settings prior to migration. To configure managed add-ons, you can utilize the Amazon EKS API, AWS Management Console, or AWS Command Line Interface.
Amazon EKS will replace the CNI configuration settings if the field is listed as managed with default settings. We caution against modifying the managed fields. The add-on does not reconcile configuration fields such as the warm environment variables and CNI modes. The Pods and applications will continue to run while you migrate to a managed CNI.
Backup CNI Settings Before Update¶
VPC CNI runs on customer data plane (nodes), and hence Amazon EKS does not automatically update the add-on (managed and self-managed) when new versions are released or after you update your cluster to a new Kubernetes minor version. To update the add-on for an existing cluster, you must trigger an update via update-addon API or clicking update now link in the EKS console for add-ons. If you have deployed self-managed add-on, follow steps mentioned under updating self-managed VPC CNI add-on.
We strongly recommend that you update one minor version at a time. For example, if your current minor version is
1.9 and you want to update to
1.11, you should update to the latest patch version of
1.10 first, then update to the latest patch version of
Perform an inspection of the aws-node Daemonset before updating Amazon VPC CNI. Take a backup of existing settings. If using a managed add-on, confirm that you have not updated any settings that Amazon EKS might override. We recommend a post update hook in your automation workflow or a manual apply step after an add-on update.
For a self-managed add-on, compare the backup with
releases on GitHub to see the available versions and familiarize yourself with the changes in the version that you want to update to. We recommend using Helm to manage self-managed add-ons and leverage values files to apply settings. Any update operations involving Daemonset delete will result in application downtime and must be avoided.
Understand Security Context¶
We strongly suggest you to understand the security contexts configured for managing VPC CNI efficiently. Amazon VPC CNI has two components CNI binary and ipamd (aws-node) Daemonset. The CNI runs as a binary on a node and has access to node root file system, also has privileged access as it deals with iptables at the node level. The CNI binary is invoked by the kubelet when Pods gets added or removed.
The aws-node Daemonset is a long-running process responsible for IP address management at the node level. The aws-node runs in
hostNetwork mode and allows access to the loopback device, and network activity of other pods on the same node. The aws-node init-container runs in privileged mode and mounts the CRI socket allowing the Daemonset to monitor IP usage by the Pods running on the node. Amazon EKS is working to remove the privileged requirement of aws-node init container. Additionally, the aws-node needs to update NAT entries and to load the iptables modules and hence runs with NET_ADMIN privileges.
Amazon EKS recommends deploying the security policies as defined by the aws-node manifest for IP management for the Pods and networking settings. Please consider updating to the latest version of VPC CNI. Furthermore, please consider opening a GitHub issue if you have a specific security requirement.
Use separate IAM role for CNI¶
The AWS VPC CNI requires AWS Identity and Access Management (IAM) permissions. The CNI policy needs to be set up before the IAM role can be used. You can use
AmazonEKS_CNI_Policy, which is an AWS managed policy for IPv4 clusters. AmazonEKS CNI managed policy only has permissions for IPv4 clusters. You must create a separate IAM policy for IPv6 clusters with the permissions listed here.
By default, VPC CNI inherits the Amazon EKS node IAM role (both managed and self-managed node groups).
Configuring a separate IAM role with the relevant policies for Amazon VPC CNI is strongly recommended. If not, the pods of Amazon VPC CNI gets the permission assigned to the node IAM role and have access to the instance profile assigned to the node.
The VPC CNI plugin creates and configures a service account called aws-node. By default, the service account binds to the Amazon EKS node IAM role with Amazon EKS CNI policy attached. To use the separate IAM role, we recommend that you create a new service account with Amazon EKS CNI policy attached. To use a new service account you must redeploy the CNI pods. Consider specifying a
--service-account-role-arn for VPC CNI managed add-on when creating new clusters. Make sure you remove Amazon EKS CNI policy for both IPv4 and IPv6 from Amazon EKS node role.
It is advised that you block access instance metadata to minimize the blast radius of security breach.
Handle Liveness/Readiness Probe failures¶
We advise increasing the liveness and readiness probe timeout values (default
timeoutSeconds: 10) for EKS 1.20 an later clusters to prevent probe failures from causing your application's Pod to become stuck in a containerCreating state. This problem has been seen in data-intensive and batch-processing clusters. High CPU use causes aws-node probe health failures, leading to unfulfilled Pod CPU requests. In addition to modifying the probe timeout, ensure that the CPU resource requests (default
CPU: 25m) for aws-node are correctly configured. We do not suggest updating the settings unless your node is having issues.
We highly encourage you to run sudo
bash /opt/cni/bin/aws-cni-support.sh on a node while you engage Amazon EKS support. The script will assist in evaluating kubelet logs and memory utilization on the node. Please consider installing SSM Agent on Amazon EKS worker nodes to run the script.
Monitor IP Address Inventory¶
You can monitor the IP addresses inventory of subnets using CNI Metrics Helper.
- maximum number of ENIs the cluster can support
- number of ENIs already allocated
- number of IP addresses currently assigned to Pods
- total and maximum number of IP address available
You can also set CloudWatch alarms to get notified if a subnet is running out of IP addresses. Please visit EKS user guide for install instructions of CNI metrics helper. Make sure DISABLE_METRICS variable for VPC CNI is set to false.
Plan for Growth¶
We strongly recommend to size your subnets for growth. Right sizing the subnets will prevent your subnets from running out of IP addresses as your Pods and nodes scale. You will not be able to create new Pods or nodes if the subnets don’t have enough available IP addresses. If you need to limit the IP addresses the CNI caches, then you can use warm pool CNI environment variables. If subnets in your VPC run out of available IP addresses, we suggest creating a new subnet using the VPC’s original CIDR blocks.
Use the sample EKS Subnet Calculator spreadsheet to plan your network. The spreadsheet calculates IP usage based on workloads and VPC ENI configuration. The IP usage is compared to an IPv4 subnet to determine if the configuration and subnet size is sufficient for your workload.
Configure IP and ENI Target values in address constrained environments¶
Improving your VPC design is the recommended response to IP address exhaustion. Consider solutions like IPv6 and Secondary CIDRs. Adjusting these values to minimize the number of Warm IPs should be a temporary solution after other options are excluded. Misconfiguring these values may interfere with cluster operation.
In the default configuration, VPC CNI keeps an entire ENI (and associated IPs) in the warm pool. This may consume a large number of IPs, especially on larger instance types.
If your cluster subnet has a limited number of free IP addresses, scrutinize these VPC CNI configuration environment variables:
Configure the value of MINIMUM_IP_TARGET to closely match the number of Pods you expect to run on your nodes. Doing so will ensure that as Pods get created, and the CNI can assign IP addresses from the warm pool without calling the EC2 API.
Avoid setting the value of WARM_IP_TARGET too low as it will cause additional calls to the EC2 API, and that might cause throttling of the requests. For large clusters use along with MINIMUM_IP_TARGET to avoid throttling of the requests.
To configure these options, download
aws-k8s-cni.yaml and set the environment variables. At the time of writing, the latest release is located here. Check the version of the configuration value matches the installed VPC CNI version.
The warm settings will be reset to defaults when you update the CNI. Please take a backup of the CNI, before you update the CNI. Review the configuration settings to determine if you need to reapply them after update is successful.
Configure Warm ENI value for Batch Workloads¶
We recommend updating the default
WARM_ENI_TARGET to match the Pod scale needs for batch workloads. Setting
WARM_ENI_TARGET to a high value always maintains the warm IP pool required to run large batch workloads and hence avoid data processing delays.
Configure IPTables Forward Policy on non-EKS Optimized AMI Instances¶
If you are using custom AMI, make sure to set iptables forward policy to ACCEPT under kubelet.service. Many systems set the iptables forward policy to DROP. You can build custom AMI using HashiCorp Packer and a build specification with resources and configuration scripts from the Amazon EKS AMI repository on AWS GitHub. You can update the kubelet.service and follow the instructions specified here to create a custom AMI.
Routinely Upgrade CNI Version¶
The VPC CNI is backward compatible. The latest version works with all Amazon EKS supported Kubernetes versions. Additionally, the VPC CNI is offered as an EKS add-on (see “Deploy VPC CNI Managed Add-On” above). While EKS add-ons orchestrates upgrades of add-ons, it will not automatically upgrade add-ons like the CNI because they run on the data plane. You are responsible for upgrading the VPC CNI add-on following managed and self-managed worker node upgrades.