AppMesh Troubleshooting Guide¶
Common Errors¶
Exceeded pod count per VirtualNode/VirtualGateway limit¶
AppMesh limits pod count per virtualNode and virtualGateway. By default the limit is 50.
Your can adjust this limit by adjust the "Connected Envoy processes per virtual node" service quota.
Namespaces is not labeled correctly¶
Namespaces must be labeled with two kind of labels:
appmesh.k8s.aws/sidecarInjectorWebhook: enabled
is required on namespaces where pod should be injected with envoy sidecars.- customized labels to make
mesh
CustomResource selects the namespace viamesh.spec.namespaceSelector
. (optional if you have a single Mesh selects all namespaces)
Troubleshooting¶
Tail the controller logs:
export APPMESH_SYSTEM_NAMESPACE=appmesh-system
kubectl logs -n "${APPMESH_SYSTEM_NAMESPACE}" -f --since 10s \
$(kubectl get pods -n "${APPMESH_SYSTEM_NAMESPACE}" -o name | grep controller)
Tail envoy logs:
export APPLICATION_NAMESPACE=<your namespace>
export APPLICATION=<your pod or deployment> # i.e. deploy/my-app
kubectl logs -n "${APPLICATION_NAMESPACE} "${APPLICATION_POD}" envoy -f --since 10s
View envoy configuration:
export APPLICATION_NAMESPACE=<your namespace>
export APPLICATION=<your pod>
kubectl port-forward -n "${APPLICATION_NAMESPACE}" \
$(kubectl get pod -n "${APPLICATION_NAMESPACE}" | grep "${APPLICATION}" |awk '{print $1}') \
9901
Then navigate to localhost:9901/
for the index or localhost:9901/config_dump
for the envoy config.
VirtualGateway - Common Issues¶
"failed to find matching virtualGateway for gatewayRoute: gateway-route-headers, expecting 1 but found 0"
mTLS - Common Issues¶
Envoy fails to boot up when SDS based mTLS is enabled¶
When SDS based mTLS is enabled at the controller level via enable-sds
flag, controller expects to find SDS Provider’s UDS at path specified by sds-uds-path
. It is set to a default value of /run/spire/sockets/agent.sock
which is the default SPIRE Agent’s UDS path. Make sure that SDS Provider on the local node is up and running and UDS is active. Currently, SPIRE is the only supported SDS provider. Please check if SPIRE Agent is up and running on the same node as the problematic Envoy.
You can use the below command to figure out the exact reason of the envoy bootup issue. If the error is due to not being able to mount SDS provider's UDS socket then you would need to address that.
kubectl describe pod <pod-name> -n <namespace-name>
Pod is up and running but Envoy doesn’t have any certs in SDS mode.¶
- To begin with, check if
APPMESH_SDS_SOCKET_PATH
env variable is present under Envoy and if it has the correct UDS path value. If it is missing, then the controller didn’t inject the env variable. Check ifenable-sds
is set totrue
for the controller.
For example, when using SPIRE Agent you should see something like below in Envoy.
APPMESH_SDS_SOCKET_PATH: /run/spire/sockets/agent.sock
- If the above env variable is present with the correct value, then check if Envoy is able to communicate with the SDS Provider. Below command will help verify if the Envoy is able to reach out to the local SDS Provider via the UDS path passed in to the controller and also to see if it is healthy.
kubectl exec -it <pod-name> -n <namespace-name> -c envoy -- curl http://localhost:9901/clusters | grep -E '(static_cluster_sds.*cx_active|static_cluster_sds.*healthy)'
static_cluster_sds_unix_socket::/run/spire/sockets/agent.sock::cx_active::1
static_cluster_sds_unix_socket::/run/spire/sockets/agent.sock::health_flags::healthy
- If the SDS cluster is healthy in Envoy, then check if the workload entry tied to this particular Pod/app container is registered with SPIRE Server and if the selectors match. Use the below command to list out all the registered workload entries.
kubectl exec -n spire spire-server-0 -- /opt/spire/bin/spire-server entry show
Pod liveness and readiness probes fail when mTLS is enabled¶
HTTP and TCP health checks from the kubelet will not work as is, if mutual TLS is enabled as the kubelet doesn't have relevant certs.
Workarounds:
- Expose the health check endpoint on a different port and skip mTLS for that port. You can then set
appmesh.k8s.aws/ports
annotation with the application port value on the deployment spec.
Example: If your main application port is 8080 and if health check endpoint is exposed on 8081, then appmesh.k8s.aws/ports:8080
will help bypass mTLS for health check port.
SDS cluster is present in Envoy's config even though corresponding VirtualNode doesn't have mTLS SDS config¶
Set appmesh.k8s.aws/sds:disabled
for the deployments behind VirtualNodes without SDS config.