Ever wondered what actually happens when you run docker run? Under the hood, Linux containers aren’t magic: they’re built on kernel features called namespaces and cgroups. In this hands-on guide, we’ll demystify containers by building one from scratch with the command-line tools (unshare, ip, pivot_root) that wrap the raw Linux namespace APIs. By the end, you’ll understand exactly how Docker, Podman, and other container runtimes isolate processes.
What Are Linux Namespaces?
Namespaces are a Linux kernel feature that partitions system resources so that one set of processes sees one set of resources while another set sees a different set. Modern Linux has eight namespace types, all available since kernel 5.6:
- PID — Process ID isolation
- NET — Network stack isolation
- MNT — Mount point isolation
- UTS — Hostname isolation
- IPC — Inter-process communication isolation
- USER — User/group ID mapping
- CGROUP — Cgroup root isolation
- TIME — Clock isolation (added in kernel 5.6)
Each namespace type creates an independent instance of a particular global resource. When a process is placed inside a namespace, it can only see and interact with resources in that namespace.
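You can see this membership directly: each namespace is identified by an inode number, and /proc/<pid>/ns/ holds one symlink per namespace type. Two processes share a namespace exactly when those links point to the same ID. The sketch below wraps that in two helpers; the names ns_ids and ns_diff are hypothetical, and it assumes procfs is mounted at /proc. Inspecting another user's processes requires root.

```bash
# Hypothetical helpers for exploring namespace membership via /proc.

# Print one "type:[inode]" line per namespace of a process (default: self)
ns_ids() {
    for link in /proc/"${1:-self}"/ns/*; do
        readlink "$link"
    done
}

# Report, per namespace type, whether two processes share that namespace
ns_diff() {
    for link in /proc/"$1"/ns/*; do
        ns_type=${link##*/}
        if [ "$(readlink "$link")" = "$(readlink "/proc/$2/ns/$ns_type")" ]; then
            echo "$ns_type: shared"
        else
            echo "$ns_type: isolated"
        fi
    done
}
```

Running ns_diff 1 <PID> as root, for instance, should list which namespaces a containerized process shares with the host's init process. Expect pid_for_children and time_for_children entries as well; they name the namespaces the process's future children will join.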
Prerequisites
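You’ll need a Linux system (Ubuntu 22.04+ or Fedora 38+ recommended) with root access via sudo and a few tools:
```bash
# Check your kernel version (5.6+ is needed for all eight namespace types)
uname -r
# Install required tools (Debian/Ubuntu; on Fedora, use dnf instead)
sudo apt install -y util-linux debootstrap iproute2 iptables
```
Step 1: Creating a UTS Namespace (Hostname Isolation)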
The simplest namespace to start with is UTS, which isolates the hostname. The unshare command lets us create namespaces from the command line:
```bash
# Create a new UTS namespace and run a shell inside it
sudo unshare --uts /bin/bash
# Inside the new namespace, change the hostname
hostname my-container
hostname
# Output: my-container
# Open another terminal on the host: its hostname is unchanged
hostname
# Output: your-original-hostname
```
That’s isolation in action. The hostname change inside the namespace doesn’t affect the host system.
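The same experiment can be scripted without an interactive shell. This is a minimal sketch assuming util-linux unshare and root privileges; run_in_new_uts and show_nodename are hypothetical helper names. It also reflects that uname -n and /proc/sys/kernel/hostname read the same per-namespace nodename that hostname sets, which is why every tool inside the namespace agrees on the new name.

```bash
# Hypothetical helper: set a hostname and run one command in a fresh UTS
# namespace, without an interactive shell. Needs root (CAP_SYS_ADMIN).
run_in_new_uts() {
    name=$1
    shift
    # Inside sh -c: $1 is the new name, the remaining args are the command
    unshare --uts sh -c 'hostname "$1" && shift && exec "$@"' sh "$name" "$@"
}

# Both sources report the nodename of the caller's UTS namespace
show_nodename() {
    printf 'uname -n: %s\n' "$(uname -n)"
    printf 'procfs: %s\n' "$(cat /proc/sys/kernel/hostname)"
}

# Only attempt the namespace demo when running as root
if [ "$(id -u)" -eq 0 ]; then
    run_in_new_uts my-container uname -n   # reports the new name
    uname -n                               # the host's name is unchanged
fi
```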
Step 2: PID Namespace — Process Isolation
PID namespaces give the contained process its own PID tree, where it thinks it’s PID 1:
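```bash
# Create PID + UTS namespaces with a fresh /proc mount
sudo unshare --pid --uts --mount-proc --fork /bin/bash
# Check processes inside the namespace
ps aux
# You'll only see the bash process and ps itself!
# The bash process thinks it's PID 1
echo $$
# Output: 1
```
The --fork flag matters: a new PID namespace applies only to the children of the process that creates it, so unshare must fork a child that becomes PID 1, and --mount-proc remounts /proc so that ps reads the new namespace’s process list. This is exactly how containers see their own process tree. PID 1 inside the container is just a regular process on the host with a different PID.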
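You can observe both PIDs at once from the host. Since Linux 4.1, the NSpid line in /proc/<pid>/status lists a process's PID in every PID namespace it belongs to, starting from the namespace that /proc belongs to. A small sketch, assuming procfs at /proc; ns_pids is a hypothetical helper name:

```bash
# Hypothetical helper: print a process's PID in each PID namespace it is in,
# from the NSpid line of /proc/<pid>/status. The first number is the PID as
# seen from this /proc's namespace, the last is the PID the process sees itself.
ns_pids() {
    awk '/^NSpid:/ { for (i = 2; i <= NF; i++) printf "%s%s", $i, (i < NF ? " " : "\n") }' \
        "/proc/$1/status"
}
```

For example, with sudo unshare --pid --fork --mount-proc sleep 300 & running on the host, ns_pids "$(pgrep -n -x sleep)" should print two numbers: the sleep process's host PID followed by 1.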
Step 3: Mount Namespace — Filesystem Isolation
Now let’s create an isolated filesystem. We’ll use debootstrap to create a minimal Debian root filesystem:
```bash
# Create a minimal root filesystem (procps adds ps, which minbase lacks)
sudo mkdir -p /tmp/mycontainer
sudo debootstrap --variant=minbase --include=procps bookworm /tmp/mycontainer
# Enter new PID, UTS, and mount namespaces; unshare makes mount
# propagation private by default, so the mounts below don't leak to the host
sudo unshare --pid --uts --mount --fork /bin/bash
# pivot_root requires the new root to be a mount point, so bind-mount it onto itself
mount --bind /tmp/mycontainer /tmp/mycontainer
cd /tmp/mycontainer
mkdir -p old_root
# Swap roots: the rootfs becomes /, the host's root moves to /old_root
pivot_root . old_root
cd /
# Mount essential filesystems
mount -t proc proc /proc
mount -t sysfs sys /sys
mount -t tmpfs tmp /tmp
# Detach and remove the old root
umount -l /old_root
rmdir /old_root
# You're now in your own container!
ls /
cat /etc/os-release
```
Step 4: Network Namespace — Network Isolation
Network namespaces give each container its own network stack. Here’s how to set up a virtual ethernet pair to connect a namespace to the host:
```bash
# Create a named network namespace
sudo ip netns add mycontainer
# Create a virtual ethernet pair
sudo ip link add veth-host type veth peer name veth-container
# Move one end into the namespace
sudo ip link set veth-container netns mycontainer
# Configure the host side
sudo ip addr add 10.0.0.1/24 dev veth-host
sudo ip link set veth-host up
# Configure the container side
sudo ip netns exec mycontainer ip addr add 10.0.0.2/24 dev veth-container
sudo ip netns exec mycontainer ip link set veth-container up
sudo ip netns exec mycontainer ip link set lo up
# Test connectivity
sudo ip netns exec mycontainer ping -c 3 10.0.0.1
# Success! Container can reach the host
# Enable internet access: default route, IP forwarding on the host, and NAT
sudo ip netns exec mycontainer ip route add default via 10.0.0.1
sudo sysctl -w net.ipv4.ip_forward=1
sudo iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -j MASQUERADE
sudo ip netns exec mycontainer ping -c 3 1.1.1.1
# Clean up when done (deleting the namespace also removes the veth pair)
sudo ip netns del mycontainer
sudo iptables -t nat -D POSTROUTING -s 10.0.0.0/24 -j MASQUERADE
```
Step 5: Putting It All Together with a Script
Let’s write a minimal container runtime in Bash that combines everything:
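```bash
#!/bin/bash
# mini-container.sh: a minimal container runtime (run as root)
set -e
ROOTFS="/tmp/mycontainer"
CONTAINER_HOSTNAME="mini-container"   # avoids clobbering bash's own HOSTNAME
CGROUP="/sys/fs/cgroup/mini-container"
if [ "$1" = "child" ]; then
    # We're inside the new namespaces now
    hostname "$CONTAINER_HOSTNAME"
    ip link set lo up   # the new network namespace starts with lo down
    # pivot_root requires the new root to be a mount point
    mount --bind "$ROOTFS" "$ROOTFS"
    cd "$ROOTFS"
    mkdir -p .old_root
    pivot_root . .old_root
    cd /
    # Set up mounts inside the new root
    mount -t proc proc /proc
    mount -t sysfs sys /sys
    mount -t tmpfs tmp /tmp
    # Detach the host's filesystem tree
    umount -l /.old_root
    rmdir /.old_root
    # Replace this script with the requested command (it stays PID 1)
    exec "${@:2}"
else
    # Parent, still on the host: set resource limits with cgroups v2.
    # This must happen here rather than in the child, because the container's
    # /sys has no cgroup2 mount. Every process we spawn inherits the cgroup.
    if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
        echo "+memory +cpu" > /sys/fs/cgroup/cgroup.subtree_control
        mkdir -p "$CGROUP"
        echo "104857600" > "$CGROUP/memory.max"    # 100 MB
        echo "50000 100000" > "$CGROUP/cpu.max"   # 50% of one CPU
        echo $$ > "$CGROUP/cgroup.procs"
    fi
    # Create namespaces and re-exec this script as the child
    exec unshare \
        --pid \
        --uts \
        --mount \
        --net \
        --ipc \
        --fork \
        "$0" child "${@:-/bin/bash}"
fi
```
```bash
# Usage
chmod +x mini-container.sh
sudo ./mini-container.sh /bin/bash
# You're now inside your DIY container!
hostname   # → mini-container
ps aux     # → only your processes (needs procps in the rootfs)
ls /       # → isolated filesystem
# After exiting, remove the cgroup on the host
sudo rmdir /sys/fs/cgroup/mini-container
```
User Namespaces: Rootless Containers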
For security, the most important namespace is USER: it maps user and group IDs so that a process can be root inside the namespace while remaining an unprivileged user on the host. This is how Podman runs rootless containers:
```bash
# Create a user namespace where your UID maps to root inside
# (no sudo needed if your distro allows unprivileged user namespaces)
unshare --user --map-root-user --pid --mount-proc --fork /bin/bash
# Inside: you appear to be root
whoami
# Output: root
id
# Output starts with: uid=0(root) gid=0(root)
# But on the host, you're still your regular user!
# This is rootless containers in action
```
This is a large part of why Podman is often considered more secure by default: it can run containers entirely as an unprivileged user, whereas Docker’s default setup relies on a daemon running as real root (Docker also offers a rootless mode).
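The mapping behind this lives in /proc/<pid>/uid_map, where each line reads inside-ID, outside-ID, length. Inside the shell above, cat /proc/self/uid_map shows a single line such as 0 1000 1, with your host UID in the middle column. The hypothetical map_uid helper below sketches the lookup those lines describe, assuming the map lines arrive on stdin:

```bash
# Hypothetical helper: translate a UID inside a user namespace to the host UID.
# Reads uid_map lines ("inside outside length") on stdin, e.g. from
# /proc/<pid>/uid_map; prints the host UID, or exits 1 if the UID is unmapped.
map_uid() {
    awk -v uid="$1" '
        BEGIN { u = uid + 0 }
        u >= $1 && u < $1 + $3 { print $2 + (u - $1); found = 1; exit }
        END { exit !found }
    '
}
```

A rootless Podman container typically has a second line that maps UIDs 1 and up onto a subordinate range from /etc/subuid, such as 1 100000 65536; with that map, UID 1000 inside the container corresponds to host UID 100999.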
How Docker Uses Namespaces
When you run docker run -it ubuntu bash, Docker (via containerd and runc, its default low-level runtime) does essentially what we did above:
- Creates PID, NET, MNT, UTS, IPC, and optionally USER namespaces
- Sets up a layered filesystem (OverlayFS) as the root
- Configures cgroups for resource limits
- Creates veth pairs for networking
- Applies seccomp and AppArmor profiles
- Executes the specified command as PID 1
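To check the resource-limit side of this on a live container, you can read the limit files of its cgroup from the host. The helpers below are a hypothetical sketch assuming a cgroup v2 (unified) hierarchy mounted at /sys/fs/cgroup; pass them the container's host PID, for example from docker inspect --format '{{.State.Pid}}'.

```bash
# Hypothetical helpers: inspect the cgroup v2 limits applied to a process.

# The v2 line in /proc/<pid>/cgroup looks like "0::/path/to/group"
cgroup_dir() {
    printf '/sys/fs/cgroup%s\n' "$(sed -n 's/^0:://p' "/proc/$1/cgroup")"
}

# cpu.max holds "<quota> <period>" in microseconds, or "max <period>";
# print the quota as a percentage of one CPU
cpu_max_to_percent() {
    set -- $1   # intentionally unquoted: split into quota and period
    if [ "$1" = max ]; then
        echo unlimited
    else
        echo $(( $1 * 100 / $2 ))
    fi
}

# Summarize the CPU and memory limits of a process's cgroup
show_limits() {
    dir=$(cgroup_dir "$1")
    echo "cgroup: $dir"
    echo "cpu:    $(cpu_max_to_percent "$(cat "$dir/cpu.max")")"
    echo "memory: $(cat "$dir/memory.max")"
}
```

For a container started with docker run --cpus=0.5 --memory=100m, show_limits should report a CPU quota of 50 (percent of one CPU) and a memory limit of 104857600 bytes, the same values the mini-container script writes by hand.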
You can inspect Docker’s namespace usage on any running container:
```bash
# Find the container's PID on the host
PID=$(docker inspect --format '{{.State.Pid}}' my-container)
# List its namespaces
sudo ls -la /proc/$PID/ns/
# You'll see a symlink for each namespace type, such as net -> net:[<inode>]
```
Practical Debugging Tips
Understanding namespaces helps you debug container issues:
```bash
# Enter a running container's namespaces directly
# (use /bin/sh if the image has no bash)
sudo nsenter --target <PID> --mount --uts --ipc --net --pid /bin/bash
# List all namespaces on the system (root sees every process)
sudo lsns
# Check which namespaces the current process belongs to
ls -la /proc/self/ns/
```
Key Takeaways
- Containers are not VMs — they’re just processes with namespace isolation
- Namespaces provide the isolation boundary (what you can see)
- Cgroups provide resource limits (how much you can use)
- User namespaces enable rootless containers for better security
- Understanding these primitives makes you better at debugging container issues and designing secure architectures
Next time someone asks “what is a container?”, you can confidently say: it’s a process running in its own set of Linux namespaces with cgroup resource limits. No magic involved.
