"The datacenter network is the computer. When you understand that, you understand why VXLAN-EVPN exists: it is the network operating system for the hyperscale infrastructure layer." -- Ivan Pepelnjak, Data Center Networking: Beyond Hype (blog, 2021)
Lecture (50 min)
12.1 Why Traditional VLANs Fall Short in the Datacenter
The traditional Ethernet switching model (VLANs + STP) was designed for campus networks with tens to hundreds of hosts. Modern cloud datacenters run tens of thousands of virtual machines, often migrated live across physical hosts (vMotion / live migration).
Three scaling problems with VLAN + STP in the datacenter:
- 4,096 VLAN limit: the 12-bit VID in 802.1Q allows 4,096 VLAN IDs (4,094 usable, since 0 and 4095 are reserved). A large cloud provider with hundreds of tenants, each requiring several isolated segments, exhausts the VLAN space.
- STP topology constraints: STP prevents loops by blocking links until the active topology is a single tree. Modern datacenters use leaf-spine topologies (every leaf switch connects to every spine; every spine connects to every leaf) for high redundancy and ECMP (Equal-Cost Multi-Path) forwarding. STP blocks the redundant paths, eliminating the benefit of leaf-spine.
- MAC table scale: every physical host may run dozens of VMs. With 10,000 physical hosts, any switch that learns every MAC address (top-of-rack or spine) must hold potentially hundreds of thousands of entries, and hardware CAM tables have fixed capacity.
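The numbers behind these limits can be checked with a few lines of arithmetic. This is a rough sketch only: the figure of 40 VMs per host is an assumption standing in for "dozens", not a value from the lecture.

```python
# Back-of-the-envelope check of the VLAN and MAC-table limits.
VLAN_ID_BITS = 12                        # width of the 802.1Q VID field
vlan_ids = 2 ** VLAN_ID_BITS             # 4096 possible values
usable_vlans = vlan_ids - 2              # IDs 0 and 4095 are reserved

physical_hosts = 10_000                  # figure used in the text
vms_per_host = 40                        # ASSUMPTION: stands in for "dozens"
mac_entries = physical_hosts * vms_per_host

print(usable_vlans)   # 4094
print(mac_entries)    # 400000
```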
12.2 VXLAN: Overlay Tunneling
VXLAN (Virtual Extensible LAN, RFC 7348) encapsulates Layer-2 Ethernet frames inside UDP/IP packets. Two physical hosts can create a logical Ethernet segment between them without the physical switches understanding anything about the overlay topology.
VXLAN encapsulation:
Original Ethernet Frame:
[Eth Hdr][IP Hdr][TCP Hdr][Payload]
VXLAN-encapsulated packet on the wire:
[Outer Eth Hdr][Outer IP Hdr][UDP Hdr port 4789][VXLAN Hdr 8B][Inner Eth Hdr][IP Hdr][TCP Hdr][Payload]
VXLAN Header (8 bytes):
Flags (8 bits; bit I=1 means VNI valid)
Reserved (24 bits)
VNI: VXLAN Network Identifier (24 bits; 16 million possible segments)
Reserved (8 bits)
The VNI (VXLAN Network Identifier) is 24 bits: 16,777,216 possible overlay segments. This solves the 4,096 VLAN limit. Each tenant gets one or more VNIs; overlay segments are independent of physical VLAN assignments.
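To make the header layout concrete, the following Python sketch packs and unpacks the 8-byte VXLAN header described above. The function names are illustrative, not part of any library, and the overhead figure assumes an IPv4 underlay with no 802.1Q tag on the outer frame.

```python
import struct

VXLAN_FLAG_I = 0x08  # "I" bit in the flags byte: the VNI field is valid

def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags (8 bits) followed by 24 reserved bits.
    # Word 1: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)

def unpack_vxlan_header(header: bytes) -> int:
    """Return the VNI from an 8-byte VXLAN header."""
    if len(header) != 8:
        raise ValueError("VXLAN header is exactly 8 bytes")
    word0, word1 = struct.unpack("!II", header)
    if not (word0 >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag not set; VNI is not valid")
    return word1 >> 8

# Bytes added in front of the inner Ethernet frame.
# ASSUMPTION: IPv4 underlay, untagged outer Ethernet header.
OVERHEAD = 14 + 20 + 8 + 8   # outer Ethernet + outer IPv4 + UDP + VXLAN

header = pack_vxlan_header(100)
print(header.hex())                  # 0800000000006400
print(unpack_vxlan_header(header))   # 100
print(OVERHEAD)                      # 50
```

The 50 bytes of overhead are why underlay links in a VXLAN fabric are usually configured with an MTU larger than the overlay MTU.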
VTEP (VXLAN Tunnel Endpoint): the device that encapsulates/decapsulates VXLAN. In a software-defined datacenter, VTEPs run in the hypervisor (Open vSwitch); in a hardware datacenter, VTEPs run in the ToR switches.
12.3 EVPN: Control Plane for VXLAN
VXLAN by itself requires either flooding (replicate all BUM -- Broadcast, Unknown unicast, Multicast -- traffic to all VTEPs) or a centralized controller to manage MAC-to-VTEP mappings. Both approaches scale poorly.
EVPN (Ethernet VPN, RFC 7432, adapted to VXLAN by RFC 8365) uses BGP as the control plane for VXLAN. VTEPs advertise locally learned MAC/IP addresses in BGP EVPN routes. Other VTEPs receive these advertisements and build their MAC-to-VTEP forwarding tables from the control plane rather than by flooding and learning.
EVPN route types (key ones for this course):
| Route Type | Purpose |
|---|---|
| Type 2 (MAC/IP Advertisement) | Advertises a MAC address (+ optionally IP) and its host VTEP |
| Type 3 (Inclusive Multicast Ethernet Tag) | VTEP announces membership in a VNI for BUM traffic replication |
| Type 5 (IP Prefix Route) | Advertises IP prefixes for inter-VNI / inter-subnet routing |
With EVPN, when a VM's MAC address is learned on a VTEP, it is immediately advertised via BGP to the other VTEPs in the fabric. Remote VTEPs install the MAC-to-VTEP mapping and can send unicast VXLAN traffic for that MAC directly to the right VTEP without flooding.
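The forwarding decision this produces can be modeled in a few lines of Python. This is a toy model, not how FRR is implemented: the class and method names are hypothetical, it assumes ingress (head-end) replication for BUM traffic, and it ignores route targets, MAC mobility, and withdrawals.

```python
from dataclasses import dataclass, field

@dataclass
class VniState:
    mac_to_vtep: dict = field(default_factory=dict)  # filled from Type 2 routes
    flood_list: set = field(default_factory=set)     # filled from Type 3 routes

class ToyVtep:
    """Hypothetical model of a VTEP's per-VNI forwarding state."""

    def __init__(self, local_ip: str):
        self.local_ip = local_ip
        self.vnis = {}

    def _state(self, vni: int) -> VniState:
        return self.vnis.setdefault(vni, VniState())

    def receive_type2(self, vni: int, mac: str, remote_vtep: str) -> None:
        # MAC/IP Advertisement: this MAC lives behind remote_vtep.
        self._state(vni).mac_to_vtep[mac.lower()] = remote_vtep

    def receive_type3(self, vni: int, remote_vtep: str) -> None:
        # Inclusive Multicast route: remote_vtep wants BUM traffic for this VNI.
        if remote_vtep != self.local_ip:
            self._state(vni).flood_list.add(remote_vtep)

    def next_hops(self, vni: int, dst_mac: str) -> list:
        # Known MAC: one unicast tunnel. Unknown MAC: replicate to the flood list.
        state = self._state(vni)
        vtep = state.mac_to_vtep.get(dst_mac.lower())
        return [vtep] if vtep is not None else sorted(state.flood_list)

leaf1 = ToyVtep("10.0.0.1")
leaf1.receive_type3(100, "10.0.0.2")
leaf1.receive_type3(100, "10.0.0.3")
leaf1.receive_type2(100, "AA:BB:CC:00:00:02", "10.0.0.2")

print(leaf1.next_hops(100, "aa:bb:cc:00:00:02"))  # ['10.0.0.2']
print(leaf1.next_hops(100, "aa:bb:cc:00:00:99"))  # ['10.0.0.2', '10.0.0.3']
```

The second lookup shows why Type 3 routes still matter: a destination that no VTEP has advertised falls back to replication across the VNI's member VTEPs.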
FRRouting EVPN configuration:
router bgp 65000
 bgp router-id 10.0.0.1
 neighbor SPINES peer-group
 neighbor SPINES remote-as 65100
 neighbor 10.100.1.1 peer-group SPINES
 !
 address-family l2vpn evpn
  neighbor SPINES activate
  advertise-all-vni
 exit-address-family
!
# Verify EVPN routes
show bgp l2vpn evpn
show bgp l2vpn evpn route type 2 # MAC/IP routes
show evpn vni # VNI status and learned MACs
12.4 Leaf-Spine Topology and ECMP
Modern datacenters use a leaf-spine (Clos network) fabric:
- Every leaf switch connects to every spine switch (full bipartite graph between leaf tier and spine tier)
- No switch-to-switch connections within a tier
- Any leaf-to-leaf path has exactly 2 hops (leaf -> spine -> leaf) for a 2-tier fabric
This topology has several properties valuable for datacenters:
- Predictable latency: all paths have equal hop count (2 hops for 2-tier Clos)
- ECMP: multiple equal-cost paths exist between any two leaves; load is spread across all paths simultaneously
- No STP: a leaf-spine fabric with a routed eBGP underlay (each leaf is its own AS) has no Layer-2 loops between switches, so STP is not needed on the fabric links and all links carry traffic
ECMP with BGP:
# Enable ECMP in FRR BGP (allow up to 64 equal-cost paths)
router bgp 65001
 maximum-paths 64
 maximum-paths ibgp 64
# Verify multiple next-hops for a prefix
show ip bgp 10.0.0.0/8
# Should show multiple Next Hops if ECMP is active
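Once BGP installs several next hops, the forwarding plane picks one per flow by hashing packet header fields, so packets of a single flow stay in order on one path. The Python sketch below illustrates the idea. SHA-256 is only a stand-in for whatever hash the hardware uses, and the function names and addresses are illustrative. The source-port helper follows the RFC 7348 recommendation to derive the outer UDP source port from a hash of the inner packet, within the dynamic port range 49152-65535, so that spines hashing on the outer header still see per-flow entropy.

```python
import hashlib

def flow_hash(src_ip, dst_ip, proto, src_port, dst_port) -> int:
    """Deterministic 32-bit hash of a 5-tuple (stand-in for a hardware hash)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")

def pick_next_hop(next_hops, flow):
    """Map a flow onto one of the equal-cost next hops."""
    return next_hops[flow_hash(*flow) % len(next_hops)]

def vxlan_source_port(inner_flow) -> int:
    """Outer UDP source port derived from the inner flow (range 49152-65535)."""
    return 49152 + flow_hash(*inner_flow) % 16384

spines = ["spine1", "spine2"]
# 200 hypothetical TCP flows between two overlay hosts, differing in source port.
flows = [("10.100.0.1", "10.100.0.2", 6, 40000 + i, 443) for i in range(200)]

usage = {spine: 0 for spine in spines}
for flow in flows:
    usage[pick_next_hop(spines, flow)] += 1
print(usage)  # roughly even split across the two spines
```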
12.5 Containerlab for Fabric Simulation
Containerlab can simulate a leaf-spine fabric using FRRouting containers. The topology YAML defines the nodes (leaves + spines) and links between them; Containerlab manages the veth pairs and container networking.
# topo-fabric.clab.yml
name: fabric
topology:
  nodes:
    spine1:
      kind: linux
      image: frrouting/frr:latest
    spine2:
      kind: linux
      image: frrouting/frr:latest
    leaf1:
      kind: linux
      image: frrouting/frr:latest
    leaf2:
      kind: linux
      image: frrouting/frr:latest
    leaf3:
      kind: linux
      image: frrouting/frr:latest
  links:
    - endpoints: ["leaf1:eth1", "spine1:eth1"]
    - endpoints: ["leaf1:eth2", "spine2:eth1"]
    - endpoints: ["leaf2:eth1", "spine1:eth2"]
    - endpoints: ["leaf2:eth2", "spine2:eth2"]
    - endpoints: ["leaf3:eth1", "spine1:eth3"]
    - endpoints: ["leaf3:eth2", "spine2:eth3"]
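The link list in the topology file above follows a regular pattern: leaf i uses interface eth j toward spine j, and spine j uses interface eth i toward leaf i. For larger fabrics it can be convenient to generate the file instead of writing it by hand. The script below is a minimal sketch of that idea using only the standard library. The function names are hypothetical, and it emits only the fields used in this lecture's topology.

```python
def build_links(num_spines: int, num_leaves: int) -> list:
    """Full bipartite wiring: every leaf connects to every spine."""
    links = []
    for leaf in range(1, num_leaves + 1):
        for spine in range(1, num_spines + 1):
            links.append((f"leaf{leaf}:eth{spine}", f"spine{spine}:eth{leaf}"))
    return links

def render_topology(name: str, num_spines: int, num_leaves: int,
                    image: str = "frrouting/frr:latest") -> str:
    """Render a Containerlab topology as YAML text."""
    nodes = [f"spine{i}" for i in range(1, num_spines + 1)]
    nodes += [f"leaf{i}" for i in range(1, num_leaves + 1)]
    lines = [f"name: {name}", "topology:", "  nodes:"]
    for node in nodes:
        lines += [f"    {node}:", "      kind: linux", f"      image: {image}"]
    lines.append("  links:")
    for a, b in build_links(num_spines, num_leaves):
        lines.append(f'    - endpoints: ["{a}", "{b}"]')
    return "\n".join(lines) + "\n"

print(render_topology("fabric", num_spines=2, num_leaves=3))
```

With 2 spines and 3 leaves this reproduces the six links listed in the topology file; the link count is always the number of spines times the number of leaves.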
Each leaf and spine runs an eBGP underlay (following the RFC 7938 pattern); the EVPN overlay is configured under FRR's address-family l2vpn evpn.
Lab 11 builds this 2-spine, 3-leaf fabric; configures eBGP underlay; adds VXLAN + EVPN overlay; verifies MAC learning without flooding.
Lab Preview
Lab 11 deploys a VXLAN-EVPN fabric in Containerlab:
- Deploy the 5-node (2 spine, 3 leaf) Containerlab topology
- Configure eBGP underlay on each leaf-spine link (unique AS per leaf, shared spine AS)
- Configure VXLAN VNI 100 on leaf1 and leaf2; attach host containers to each
- Enable EVPN on all nodes; verify Type 2 (MAC/IP) routes propagate between leaves
- Ping between hosts on different leaves; capture VXLAN-encapsulated traffic on the spine links
- Verify ECMP by confirming that both spines appear as next hops and that traffic between leaves uses both spine paths
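The AS numbering in the underlay step (unique AS per leaf, shared spine AS) can be planned before touching any router. The sketch below is illustrative only: the spine AS 65100 matches the FRR example in this lecture, while the starting leaf AS of 65001 and the helper names are assumptions. It also shows the standard eBGP loop check that makes a shared spine AS useful: a route that has already crossed one spine cannot be accepted by another spine.

```python
SPINE_AS = 65100        # shared by all spines (value from the FRR example)
FIRST_LEAF_AS = 65001   # ASSUMPTION: leaves are numbered upward from here

def leaf_as(leaf_index: int) -> int:
    """Private AS number for leaf N (1-based)."""
    return FIRST_LEAF_AS + leaf_index - 1

def underlay_sessions(num_spines: int, num_leaves: int) -> list:
    """One eBGP session per leaf-spine link."""
    return [
        {"leaf": f"leaf{l}", "spine": f"spine{s}",
         "leaf_as": leaf_as(l), "spine_as": SPINE_AS}
        for l in range(1, num_leaves + 1)
        for s in range(1, num_spines + 1)
    ]

def accepts(local_as: int, as_path: list) -> bool:
    """eBGP loop prevention: reject routes whose AS_PATH contains our own AS."""
    return local_as not in as_path

sessions = underlay_sessions(num_spines=2, num_leaves=3)
print(len(sessions))  # 6

# leaf1's prefix reaches leaf2 via spine1 with AS_PATH [65100, 65001].
# If leaf2 re-advertised it to spine2, the path would start with leaf2's AS
# and already contain 65100, so spine2 rejects it.
print(accepts(SPINE_AS, [leaf_as(2), SPINE_AS, leaf_as(1)]))  # False
print(accepts(leaf_as(2), [SPINE_AS, leaf_as(1)]))            # True
```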
Homework
Reading (45 min): Kurose-Ross 9e Ch 6.6 (Data Center Networking). Focus on the motivation for datacenter-specific architectures, the concept of fat-tree topologies, and load balancing across multiple paths. Then skim RFC 7938 (Use of BGP for Routing in Large-Scale Data Centers) -- read the Abstract + Section 1 (Introduction) for the rationale behind eBGP underlay.
Hands-on (60 min): Explore VXLAN encapsulation using Linux kernel VXLAN interfaces:
# Create a VXLAN interface (VTEP) on Linux
sudo ip link add vxlan100 type vxlan id 100 \
dstport 4789 remote 192.168.1.2 local 192.168.1.1 dev eth0
sudo ip link set vxlan100 up
sudo ip addr add 10.100.0.1/24 dev vxlan100
# Capture VXLAN traffic
sudo tcpdump -i eth0 -n "udp port 4789" -w /tmp/vxlan.pcap
# In Wireshark: VXLAN is auto-decoded; look for the inner Ethernet frame
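On a second VM, create a matching vxlan100 interface with the local and remote addresses swapped (local 192.168.1.2, remote 192.168.1.1) and a different overlay address in the same subnet, such as 10.100.0.2/24. Ping across the VXLAN tunnel and capture the encapsulated packets on eth0.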
Toolchain Diary Entry
Deepen this week: Containerlab fabric topologies; EVPN verification commands
containerlab deploy -t TOPOLOGY.yaml: deploy a Containerlab topology.
containerlab destroy -t TOPOLOGY.yaml: tear down a topology and clean up containers.
containerlab inspect -t TOPOLOGY.yaml --format json: list all nodes with management IPs in JSON.
show bgp l2vpn evpn (FRR vtysh): show all EVPN BGP routes.
show evpn vni detail (FRR vtysh): show VNI status, MAC count, remote VTEPs.
show evpn mac vni 100 (FRR vtysh): show all MACs learned in VNI 100.
ip link show type vxlan: show all VXLAN interfaces on a Linux host.
bridge fdb show dev vxlan100: show forwarding database for a VXLAN interface; includes remote VTEP entries.
tcpdump -i eth0 -n "udp port 4789": capture raw VXLAN traffic on the underlay interface.
Key Terms
- VXLAN (RFC 7348): Virtual Extensible LAN; Layer-2-in-UDP encapsulation; 24-bit VNI (16M segments); runs on UDP port 4789; solves VLAN scale limit in datacenter fabrics
- VNI: VXLAN Network Identifier; 24-bit tenant/segment identifier in the VXLAN header; analogous to VLAN ID but with a 4,096x larger namespace (2^24 vs 2^12)
- VTEP: VXLAN Tunnel Endpoint; device that encapsulates/decapsulates VXLAN; implemented in hypervisors (Open vSwitch) or hardware switches (Arista, Cisco Nexus, etc.)
- EVPN (RFC 7432 + RFC 8365): Ethernet VPN; BGP address family used as the control plane for VXLAN; advertises MAC/IP bindings in BGP routes; replaces flood-and-learn MAC discovery
- BUM traffic: Broadcast, Unknown unicast, Multicast; traffic that must be replicated to all members of a network segment; EVPN Type 3 routes tell each VTEP which remote VTEPs need a copy
- Leaf-spine (Clos fabric): datacenter topology where every leaf connects to every spine; 2-hop predictable latency; supports ECMP across all spine links; no STP required
- ECMP: Equal-Cost Multi-Path routing; forwarding traffic across multiple equal-cost paths simultaneously; fundamental to leaf-spine fabric throughput
- eBGP underlay: using eBGP between every leaf and spine (RFC 7938 pattern); provides loop-free routing + ECMP without STP; each leaf gets its own AS number