eBPF: Reshaping Linux Infrastructure at Scale

The Problem with the Old Way

For decades, if you wanted to change how the Linux kernel handled packets, filtered syscalls, or traced performance bottlenecks, you had two options: submit a kernel patch and wait years for it to land in distributions, or write a kernel module. Kernel modules are powerful but dangerous. A bug in a module crashes the host. They are kernel-version-specific, so a module built for 5.15 breaks on 6.1. Deploying them across a heterogeneous fleet means maintaining dozens of builds. For a company running hundreds of thousands of servers, this is not an engineering problem — it is a liability.

eBPF solves this. It lets you inject custom logic into the running kernel — for networking, security, or observability — safely, portably, and without rebooting anything.

What eBPF Actually Is

BPF (Berkeley Packet Filter) was created in 1992 to make tcpdump fast. Instead of copying every packet to userspace for filtering, BPF ran a tiny virtual machine inside the kernel to filter packets in place. It was narrow by design.

In 2014, Alexei Starovoitov and Daniel Borkmann landed extended BPF in the mainline kernel, with the bpf() system call arriving in Linux 3.18. The instruction set was redesigned for 64-bit architectures, the register count grew from two to ten general-purpose registers, and, critically, the set of hook points exploded beyond packet filtering. Today you can attach eBPF programs to network ingress and egress paths, tracepoints, kprobes (dynamic kernel function probes), uprobes (userspace function probes), and syscall entry/exit points. The kernel became programmable.

The safety model is what makes this usable in production. You write an eBPF program in restricted C, or using Go or Rust libraries that emit eBPF bytecode. Before the program runs, the kernel's verifier statically analyzes every possible execution path: no unbounded loops, no out-of-bounds memory access, no unsafe pointer arithmetic. Programs that fail verification are rejected outright. Programs that pass are JIT-compiled to native machine code on architectures where the JIT is available and enabled, so there is typically no interpreter overhead at runtime. The result is code running inside the kernel at near-native speed, with safety guarantees the kernel enforces automatically.
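To make the verifier's rules concrete, here is a minimal sketch in plain userspace C, not a loadable eBPF program. It mirrors the two habits the verifier forces on you: every packet access is preceded by an explicit bounds check, and every loop has a constant upper bound. The pkt_window struct, MAX_SCAN, and both function names are hypothetical, invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the packet window an eBPF program is handed:
 * only bytes in [data, data_end) may be read. */
struct pkt_window {
    const uint8_t *data;
    const uint8_t *data_end;
};

#define MAX_SCAN 64u /* constant loop bound, chosen for illustration */

/* Return the byte at offset `off`, or -1 if it lies outside the window.
 * Real eBPF C usually spells this check as `data + off + 1 > data_end`. */
static int read_byte(const struct pkt_window *w, size_t off)
{
    assert(w->data <= w->data_end);
    size_t len = (size_t)(w->data_end - w->data);
    if (off >= len)
        return -1;
    return w->data[off];
}

/* Sum up to MAX_SCAN leading bytes. The loop has a fixed upper bound and
 * every access goes through the bounds check above. */
static uint32_t sum_prefix(const struct pkt_window *w)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < MAX_SCAN; i++) {
        int b = read_byte(w, i);
        if (b < 0)
            break; /* ran off the end of the packet */
        sum += (uint32_t)b;
    }
    return sum;
}
```

In a real eBPF program, leaving out the bounds check would not produce a crash at runtime; the verifier would refuse to load the program at all.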

XDP and the Networking Revolution

The most dramatic eBPF capability for infrastructure teams is XDP, the eXpress Data Path. XDP hooks run before the kernel's networking stack processes a packet. Before sk_buff allocation. Before routing lookup. Before anything. An XDP program can drop, redirect, or modify a packet in the NIC driver layer, at rates that published benchmarks put in the tens of millions of packets per second per core on commodity hardware.

For DDoS defense, this changes everything. A volumetric attack that would saturate the kernel's normal network path — filling socket queues, exhausting CPU on interrupt handling — gets dropped at the driver level before any of that overhead occurs. Cloudflare uses XDP-based eBPF programs to absorb terabit-scale DDoS attacks. The packet never reaches the stack. The host stays up.
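The drop decision itself is small. The sketch below is plain userspace C that models what an XDP filter might do: check that the frame is long enough, confirm it carries IPv4, read the source address, and drop if that address is on a blocklist. The verdict enum reuses the numeric values of XDP_DROP and XDP_PASS, but filter_packet, the flat blocklist array, and the offsets' macro names are assumptions made for illustration. A production filter would keep the blocklist in an eBPF map and would also handle VLAN tags, IPv6, and IP options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same numeric values as XDP_DROP and XDP_PASS in the kernel's
 * enum xdp_action; the names here are local to this sketch. */
enum verdict { VERDICT_DROP = 1, VERDICT_PASS = 2 };

#define ETH_HDR_LEN      14u
#define IPV4_MIN_HDR_LEN 20u
#define ETHERTYPE_IPV4   0x0800u
#define IPV4_SRC_OFFSET  12u /* source address offset inside the IPv4 header */

/* Read a big-endian (network order) 32-bit value. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decide the fate of one Ethernet frame occupying [data, data_end). */
static enum verdict filter_packet(const uint8_t *data, const uint8_t *data_end,
                                  const uint32_t *blocked, size_t n_blocked)
{
    assert(data <= data_end);

    /* Bounds check first: too short to hold Ethernet + IPv4 headers. */
    if ((size_t)(data_end - data) < ETH_HDR_LEN + IPV4_MIN_HDR_LEN)
        return VERDICT_PASS;

    /* EtherType lives in bytes 12..13 of the Ethernet header. */
    uint32_t ethertype = ((uint32_t)data[12] << 8) | data[13];
    if (ethertype != ETHERTYPE_IPV4)
        return VERDICT_PASS; /* not IPv4: leave it to the normal stack */

    uint32_t src = load_be32(data + ETH_HDR_LEN + IPV4_SRC_OFFSET);
    for (size_t i = 0; i < n_blocked; i++) {
        if (blocked[i] == src)
            return VERDICT_DROP;
    }
    return VERDICT_PASS;
}
```

Everything the function needs is in the first 34 bytes of the frame, which is why this kind of decision can be made before the stack has allocated anything for the packet.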

Meta went further. They replaced their layer 4 load balancer, previously built on IPVS, with Katran, an open-source eBPF/XDP load balancer. Katran runs as software on commodity Linux servers, handling Facebook-scale traffic without dedicated load balancer appliances. The flexibility of eBPF means they can update load balancing logic by loading a new program, not by rebooting or redeploying hardware.
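The core job of a layer 4 load balancer is to send every packet of a connection to the same backend. A minimal sketch of that idea in plain C follows: hash the connection's 5-tuple and use the hash to pick a backend. This is not Katran's algorithm. The flow struct, the function names, and the choice of FNV-1a with a simple modulo are all assumptions made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 5-tuple identifying one connection. */
struct flow {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t proto;
};

/* Fold the low `nbytes` bytes of `v` into a 32-bit FNV-1a hash. */
static uint32_t fnv1a_mix(uint32_t h, uint32_t v, unsigned nbytes)
{
    for (unsigned i = 0; i < nbytes; i++) {
        h ^= (v >> (8u * i)) & 0xffu;
        h *= 16777619u; /* FNV prime */
    }
    return h;
}

/* Hash each field separately so struct padding never enters the hash. */
static uint32_t flow_hash(const struct flow *f)
{
    uint32_t h = 2166136261u; /* FNV offset basis */
    h = fnv1a_mix(h, f->src_ip, 4);
    h = fnv1a_mix(h, f->dst_ip, 4);
    h = fnv1a_mix(h, f->src_port, 2);
    h = fnv1a_mix(h, f->dst_port, 2);
    h = fnv1a_mix(h, f->proto, 1);
    return h;
}

/* Map a flow to a backend index in [0, n_backends). The same flow always
 * maps to the same index as long as n_backends does not change. */
static size_t pick_backend(const struct flow *f, size_t n_backends)
{
    assert(n_backends > 0);
    return flow_hash(f) % n_backends;
}
```

The weakness of a plain modulo is that adding or removing one backend remaps most flows. Production load balancers use consistent hashing schemes to keep that disruption small.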

Kubernetes Networking via Cilium

The Kubernetes networking problem is hard. Every pod needs an IP. Pods need to communicate across nodes. Network policies need to be enforced. Service load balancing needs to happen. The traditional answer was iptables, a rules engine that does not scale. iptables evaluates rules sequentially, so lookup cost is O(n) in the number of rules; at a few thousand rules, CPU consumption climbs visibly. At tens of thousands of pods, it breaks down.

Cilium replaces iptables entirely with eBPF. Pod-to-pod routing, service load balancing, and network policy enforcement all happen in eBPF programs attached to network interfaces. Lookups are O(1) via eBPF hash maps. Policy enforcement happens in the kernel's fast path. Cilium also understands HTTP, gRPC, and Kafka at Layer 7: eBPF programs steer the relevant traffic to a proxy that parses those protocols and applies policy.
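The difference between a sequential rule list and a hash map is easy to see in code. The sketch below, in plain userspace C, puts a linear scan next to a small open-addressing hash table and counts how many entries each lookup touches. The table is only a stand-in for an eBPF hash map, whose real implementation lives in the kernel. All names, the 32-bit key, and TABLE_SLOTS are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SLOTS 1024u /* must be a power of two */

struct rule { uint32_t key; int verdict; };           /* one list entry */
struct slot { uint32_t key; int verdict; int used; }; /* one table slot */

/* iptables-style: walk the rules in order until one matches.
 * Returns the verdict, or -1 if nothing matches. */
static int linear_lookup(const struct rule *rules, size_t n,
                         uint32_t key, size_t *steps)
{
    assert(steps != NULL);
    for (size_t i = 0; i < n; i++) {
        (*steps)++;
        if (rules[i].key == key)
            return rules[i].verdict;
    }
    return -1;
}

/* Cheap integer mixer so nearby keys spread across the table. */
static uint32_t mix(uint32_t k)
{
    k ^= k >> 16;
    k *= 0x45d9f3bu;
    k ^= k >> 16;
    return k;
}

/* Insert or update a key. Returns 0 on success, -1 if the table is full. */
static int table_insert(struct slot *t, uint32_t key, int verdict)
{
    uint32_t i = mix(key) & (TABLE_SLOTS - 1u);
    for (uint32_t probes = 0; probes < TABLE_SLOTS; probes++) {
        if (!t[i].used || t[i].key == key) {
            t[i].key = key;
            t[i].verdict = verdict;
            t[i].used = 1;
            return 0;
        }
        i = (i + 1u) & (TABLE_SLOTS - 1u);
    }
    return -1;
}

/* Hash-map-style: jump to the hashed slot, then probe forward.
 * Returns the verdict, or -1 if the key is absent. */
static int table_lookup(const struct slot *t, uint32_t key, size_t *steps)
{
    assert(steps != NULL);
    uint32_t i = mix(key) & (TABLE_SLOTS - 1u);
    for (uint32_t probes = 0; probes < TABLE_SLOTS; probes++) {
        (*steps)++;
        if (!t[i].used)
            return -1;
        if (t[i].key == key)
            return t[i].verdict;
        i = (i + 1u) & (TABLE_SLOTS - 1u);
    }
    return -1;
}
```

With 500 rules, looking up the last one costs the linear scan 500 comparisons. The hash table's cost depends on its load factor, not on how many rules exist, which is what the O(1) claim means in practice.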

The adoption is telling. AWS EKS, Google GKE, and Azure AKS all offer Cilium as a default or recommended CNI. Kubernetes clusters running hundreds of nodes with thousands of pods are using eBPF for every packet decision, and the iptables bottleneck is gone.

Observability Without Instrumentation

Traditional APM tools require code changes: add a library, wrap functions, redeploy. eBPF observability requires none of that. You attach a kprobe or uprobe to any kernel or userspace function and collect data — latency, arguments, return values, stack traces — without modifying the application.

Netflix uses bpftrace in production for exactly this. bpftrace is a high-level scripting language for eBPF, comparable to what DTrace was on Solaris. A one-liner can trace every disk I/O over 1ms on a production host, or histogram TCP connection latencies, or find which processes are causing the most context switches — all with overhead measured in single-digit percentages, not the 10–30% cost of traditional profiling.
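Latency histograms like these are cheap because the in-kernel work per event is tiny: compute a bucket index and increment a counter. Here is a minimal sketch of power-of-two bucketing in plain C, in the spirit of what such tools do but not taken from bpftrace's source. N_BUCKETS and both function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define N_BUCKETS 32u

/* Bucket b holds values in [2^b, 2^(b+1)). Values 0 and 1 share bucket 0,
 * and the last bucket absorbs everything too large for the others. */
static unsigned log2_bucket(uint64_t v)
{
    unsigned b = 0;
    while (v > 1 && b < N_BUCKETS - 1u) {
        v >>= 1;
        b++;
    }
    return b;
}

/* Record one latency sample (microseconds here, but any unit works). */
static void hist_record(uint64_t hist[N_BUCKETS], uint64_t latency_us)
{
    unsigned b = log2_bucket(latency_us);
    assert(b < N_BUCKETS);
    hist[b]++;
}
```

A 1000 microsecond sample lands in bucket 9, the range 512 to 1023. Only the 32 counters ever need to cross into userspace, not the individual events, which is a large part of why the overhead stays low.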

Pixie takes this further for Kubernetes, auto-instrumenting every service in a cluster using eBPF to capture request/response data, latency distributions, and error rates without any per-service configuration. No sidecars. No SDK integration. The observability is built into the kernel layer.

Security Enforcement at the Kernel Level

seccomp-BPF has been filtering syscalls in Linux containers for years. It is what Docker, Chrome, and Firefox use to restrict which system calls a sandboxed process can make. Strictly speaking, seccomp filters are written in classic BPF rather than eBPF, but they mark the narrow end of BPF-based security.

The broader end is runtime security enforcement. Falco uses eBPF to watch every syscall in a Kubernetes cluster and alert on suspicious behavior — a container spawning a shell, a process writing to /etc, a network connection to an unexpected IP. Tetragon, from the Cilium project, goes further: it can not only detect policy violations in real time but enforce them by killing the offending process before the syscall completes. The policy logic runs in the kernel via eBPF. There is no race window between detection and response.

Portability: CO-RE and BTF

One remaining complaint about eBPF was kernel version lock-in. An eBPF program compiled against kernel 5.15 headers might not work on 6.1 if struct layouts changed. CO-RE — Compile Once, Run Everywhere — solves this. With BTF (BPF Type Format), the kernel exposes its internal type information at runtime. The eBPF loader uses BTF to relocate field accesses at load time, adapting the compiled program to whatever kernel is actually running. A single eBPF binary can now run across an entire mixed-kernel fleet without recompilation.
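The relocation step can be modeled in a few lines. The sketch below is a toy in plain C, not libbpf's implementation. The compiled program records which field each load instruction refers to by name, and at load time a loader looks that name up in the running kernel's type information and patches in the real offset. The structs, the relocate function, and the field names and offsets used with it are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for BTF: field names and byte offsets of one struct,
 * as laid out by one particular kernel build. */
struct field_info {
    const char *name;
    uint32_t offset;
};

/* Toy instruction: "load from base pointer + offset". */
struct load_insn {
    uint32_t offset;
};

/* Relocation record: instruction insn_idx reads the field named `field`. */
struct reloc {
    size_t insn_idx;
    const char *field;
};

/* Patch every relocated instruction with the offset the running kernel
 * reports. Returns 0 on success, -1 if a field or instruction is missing. */
static int relocate(struct load_insn *insns, size_t n_insns,
                    const struct reloc *relocs, size_t n_relocs,
                    const struct field_info *btf, size_t n_fields)
{
    assert(insns != NULL && relocs != NULL && btf != NULL);
    for (size_t r = 0; r < n_relocs; r++) {
        const struct field_info *hit = NULL;
        for (size_t f = 0; f < n_fields; f++) {
            if (strcmp(btf[f].name, relocs[r].field) == 0) {
                hit = &btf[f];
                break;
            }
        }
        if (hit == NULL || relocs[r].insn_idx >= n_insns)
            return -1; /* this kernel cannot satisfy the program */
        insns[relocs[r].insn_idx].offset = hit->offset;
    }
    return 0;
}
```

Running the same instruction array through relocate against two different field tables yields two differently patched programs from one compiled artifact, which is the essence of the compile-once model.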

What Comes Next

Microsoft is actively porting eBPF to Windows via the ebpf-for-windows project. SmartNIC vendors are adding hardware offload for eBPF programs, so XDP filtering can run on the NIC itself, freeing host CPUs entirely. User-space eBPF runtimes are maturing, enabling eBPF-style sandboxed extensibility outside the kernel.

The underlying pattern is consistent: a programmable extension mechanism with strong safety guarantees beats static kernel code for anything that needs to evolve at production speed. eBPF did not just improve Linux networking and observability — it changed the model for how kernel behavior gets extended. The infrastructure teams who understood that early are running faster, safer, and at higher scale than those still writing iptables rules and kernel modules.
