
NET-201 Week 12 -- Cloud Networking: VXLAN, EVPN, Overlays, and Containerlab Fabric


"The datacenter network is the computer. When you understand that, you understand why VXLAN-EVPN exists: it is the network operating system for the hyperscale infrastructure layer." -- Ivan Pepelnjak, Data Center Networking: Beyond Hype (blog, 2021)


Lecture (50 min)

12.1 Why Traditional VLANs Fall Short in the Datacenter

The traditional Ethernet switching model (VLANs + STP) was designed for campus networks with tens to hundreds of hosts. Modern cloud datacenters run tens of thousands of virtual machines, often migrated live across physical hosts (vMotion / live migration). A live-migrated VM keeps its MAC and IP addresses, so its Layer-2 segment must stretch across every rack it might move to.

Three scaling problems with VLAN + STP in the datacenter:

  1. 4,096 VLAN limit: the 12-bit VID in 802.1Q encodes 4,096 values, of which 4,094 are usable (VIDs 0 and 4095 are reserved). A large cloud provider with hundreds or thousands of tenants, each requiring several isolated segments, exhausts the VLAN space.

  2. STP topology constraints: STP prevents loops by pruning the physical topology down to a single tree. Modern datacenters use leaf-spine topologies (every leaf switch connects to every spine switch) for high redundancy and ECMP (Equal-Cost Multi-Path) forwarding. STP blocks all but one of those redundant paths, eliminating the benefit of leaf-spine.

  3. MAC table scale: every physical host may run dozens of VMs. With 10,000 physical hosts in one flat Layer-2 domain, the top-of-rack and spine switches may need MAC table entries for hundreds of thousands of addresses. Hardware MAC (CAM) tables have a fixed capacity; once a table is full, frames for unlearned destinations are flooded.
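The numbers behind the three problems above can be checked with a few lines of arithmetic. This is only a sketch: 20 VMs per host is an assumed stand-in for "dozens", and one MAC per VM assumes each VM has a single virtual NIC.

```python
# Back-of-the-envelope numbers for the VLAN + STP scaling limits.
# VMS_PER_HOST is an illustrative assumption, not a measured value.

VLAN_ID_BITS = 12                      # 802.1Q VID field
VNI_BITS = 24                          # VXLAN VNI field (Section 12.2)

vlan_ids = 2 ** VLAN_ID_BITS           # 4,096 raw values
usable_vlans = vlan_ids - 2            # VIDs 0 and 4095 are reserved
vni_space = 2 ** VNI_BITS              # 16,777,216 overlay segments

PHYSICAL_HOSTS = 10_000
VMS_PER_HOST = 20                      # assumption for illustration
mac_entries = PHYSICAL_HOSTS * VMS_PER_HOST   # one MAC per VM (single vNIC)

print(f"usable VLAN IDs:        {usable_vlans:,}")
print(f"VXLAN VNIs:             {vni_space:,}")
print(f"VNI/VLAN space ratio:   {vni_space // vlan_ids:,}x")
print(f"MAC entries (flat L2):  {mac_entries:,}")
```

This prints 4,094 usable VLANs, 16,777,216 VNIs, a 4,096x larger namespace, and 200,000 MAC entries for the assumed VM density.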

12.2 VXLAN: Overlay Tunneling

VXLAN (Virtual Extensible LAN, RFC 7348) encapsulates Layer-2 Ethernet frames inside UDP/IP packets. Two physical hosts can create a logical Ethernet segment between them; the physical (underlay) switches only route the outer IP packets and need no knowledge of the overlay topology.

VXLAN encapsulation:

Original Ethernet Frame:
  [Eth Hdr][IP Hdr][TCP Hdr][Payload]

VXLAN-encapsulated packet on the wire:
  [Outer Eth Hdr][Outer IP Hdr][UDP Hdr port 4789][VXLAN Hdr 8B][Inner Eth Hdr][IP Hdr][TCP Hdr][Payload]

VXLAN Header (8 bytes):
  Flags (8 bits; bit I=1 means VNI valid)
  Reserved (24 bits)
  VNI: VXLAN Network Identifier (24 bits; 16 million possible segments)
  Reserved (8 bits)

The VNI (VXLAN Network Identifier) is 24 bits: 16,777,216 possible overlay segments. This solves the 4,096 VLAN limit. Each tenant gets one or more VNIs; overlay segments are independent of physical VLAN assignments.
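A minimal Python sketch of the 8-byte header layout above, using only the standard struct module. The function names are hypothetical (not from any VXLAN library), and the overhead figure assumes an untagged Ethernet + IPv4 underlay with no IP options.

```python
import struct

VXLAN_PORT = 4789      # IANA-assigned UDP destination port (RFC 7348)
FLAG_I = 0x08          # "I" bit in the flags byte: the VNI field is valid

def pack_vxlan_header(vni: int) -> bytes:
    """Build the header: flags(8) | reserved(24) | VNI(24) | reserved(8)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, 24 reserved zero bits below it.
    # Word 2: VNI in the top 24 bits, low reserved byte left as zero.
    return struct.pack("!II", FLAG_I << 24, vni << 8)

def unpack_vxlan_header(header: bytes) -> int:
    """Return the VNI from an 8-byte VXLAN header after checking the I flag."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & FLAG_I:
        raise ValueError("I flag not set; VNI is not valid")
    return word2 >> 8

# Bytes added per packet (untagged outer Ethernet, IPv4 without options).
OUTER_ETH, OUTER_IPV4, UDP_HDR, VXLAN_HDR = 14, 20, 8, 8
overhead = OUTER_ETH + OUTER_IPV4 + UDP_HDR + VXLAN_HDR

hdr = pack_vxlan_header(100)
print(hdr.hex())                 # 0800000000006400
print(unpack_vxlan_header(hdr))  # 100
print(overhead)                  # 50
```

The 50-byte figure matters for MTU: a 1,500-byte underlay IP MTU must carry the outer IPv4 (20), UDP (8), VXLAN (8) and inner Ethernet (14) headers, leaving 1,450 bytes for the inner IP packet. This is why Linux typically gives a VXLAN interface bound to a 1,500-byte device an MTU of 1450.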

VTEP (VXLAN Tunnel Endpoint): the device that encapsulates and decapsulates VXLAN traffic. In a software-defined (host-based) design, VTEPs run in the hypervisor (for example, Open vSwitch or the Linux kernel's VXLAN driver); in a hardware-based design, VTEPs run on the top-of-rack (ToR) switches.

12.3 EVPN: Control Plane for VXLAN

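Without a control plane, VXLAN falls back to flood-and-learn: BUM (Broadcast, Unknown unicast, Multicast) traffic is replicated to every VTEP in the VNI, through either underlay IP multicast or head-end replication, and each VTEP learns remote MAC-to-VTEP mappings from the frames it decapsulates. The alternative is a centralized controller that programs those mappings. Flooding cost grows with the number of VTEPs and hosts, and a controller becomes a central dependency, so both approaches scale poorly.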

EVPN (Ethernet VPN; defined in RFC 7432 and applied to VXLAN overlays by RFC 8365) uses BGP as a control plane for VXLAN. VTEPs advertise MAC/IP addresses as typed BGP EVPN routes. Other VTEPs receive these advertisements and build their MAC-to-VTEP forwarding tables without flooding.

EVPN route types (key ones for this course):

Route Type                                | Purpose
Type 2 (MAC/IP Advertisement)             | Advertises a MAC address (optionally with an IP address) and the VTEP it sits behind
Type 3 (Inclusive Multicast Ethernet Tag) | A VTEP announces its membership in a VNI, so peers know where to replicate BUM traffic
Type 5 (IP Prefix Route, RFC 9136)        | Advertises IP prefixes for inter-VNI / inter-subnet routing

With EVPN, when a VM's MAC address is learned on a VTEP, it is immediately advertised via BGP to all other VTEPs in the fabric. Remote VTEPs install the MAC-to-VTEP mapping and can send unicast VXLAN directly without flooding.
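To make the Type 2 / Type 3 division of labor concrete, here is a toy Python model. It is not how FRR implements EVPN: BGP sessions, route reflection and route attributes are collapsed into a hypothetical Fabric.advertise() call that hands each route dict to every VTEP, and all class and field names are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Vtep:
    """Toy VTEP that keeps only what it learns from EVPN routes."""
    ip: str
    mac_table: dict = field(default_factory=dict)                       # (vni, mac) -> remote VTEP IP
    flood_list: dict = field(default_factory=lambda: defaultdict(set))  # vni -> remote VTEP IPs

    def receive(self, route):
        if route["origin"] == self.ip:
            return                                   # ignore our own advertisement
        if route["type"] == 2:                       # MAC/IP Advertisement
            self.mac_table[(route["vni"], route["mac"])] = route["origin"]
        elif route["type"] == 3:                     # Inclusive Multicast Ethernet Tag
            self.flood_list[route["vni"]].add(route["origin"])

    def forward(self, vni, dst_mac):
        """Return the set of remote VTEPs a frame is VXLAN-encapsulated to."""
        remote = self.mac_table.get((vni, dst_mac))
        if remote is not None:
            return {remote}                          # known unicast: a single tunnel
        return set(self.flood_list[vni])             # BUM: head-end replication to Type 3 members

class Fabric:
    """Hypothetical stand-in for the BGP sessions that deliver every route to every VTEP."""
    def __init__(self, vteps):
        self.vteps = vteps

    def advertise(self, route):
        for vtep in self.vteps:
            vtep.receive(route)

leaf1, leaf2, leaf3 = Vtep("10.0.0.1"), Vtep("10.0.0.2"), Vtep("10.0.0.3")
fabric = Fabric([leaf1, leaf2, leaf3])

# Each leaf joins VNI 100 (Type 3); then a VM behind leaf2 is learned locally and advertised (Type 2).
for leaf in (leaf1, leaf2, leaf3):
    fabric.advertise({"type": 3, "vni": 100, "origin": leaf.ip})
fabric.advertise({"type": 2, "vni": 100, "mac": "aa:bb:cc:00:00:02", "origin": "10.0.0.2"})

print(leaf1.forward(100, "aa:bb:cc:00:00:02"))          # {'10.0.0.2'}
print(sorted(leaf1.forward(100, "ff:ff:ff:ff:ff:ff")))  # ['10.0.0.2', '10.0.0.3']
```

Known unicast goes to exactly one VTEP, while broadcast and still-unknown destinations fall back to the replication list built from Type 3 routes.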

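FRRouting EVPN configuration (one leaf; 10.100.1.1 stands for a spine's underlay peering address):

router bgp 65000
 bgp router-id 10.0.0.1
 ! With traditional defaults, FRR drops eBGP routes unless a policy is applied (RFC 8212); disable for the lab
 no bgp ebgp-requires-policy
 neighbor SPINES peer-group
 neighbor SPINES remote-as 65100
 neighbor 10.100.1.1 peer-group SPINES
 !
 address-family l2vpn evpn
  neighbor SPINES activate
  ! Advertise every VNI that has a local VXLAN interface in the kernel
  advertise-all-vni
 exit-address-family
!
# Verify EVPN routes (vtysh exec mode, after the configuration is applied)
show bgp l2vpn evpn                       # all EVPN routes
show bgp l2vpn evpn route type 2          # Type 2 (MAC/IP) routes only
show evpn vni                             # per-VNI summary: VXLAN interface, MAC/ARP counts, remote VTEPs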

12.4 Leaf-Spine Topology and ECMP

Modern datacenters use a leaf-spine (Clos network) fabric:

  • Every leaf switch connects to every spine switch (a complete bipartite graph between the leaf tier and the spine tier)
  • No switch-to-switch connections within a tier
  • Any leaf-to-leaf path has exactly 2 hops (leaf -> spine -> leaf) for a 2-tier fabric

This topology has several properties valuable for datacenters:

  • Predictable latency: all paths have equal hop count (2 hops for 2-tier Clos)
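  • ECMP: multiple equal-cost paths (one per spine) exist between any two leaves; each flow is hashed onto one path, so aggregate load spreads across all spines while packets within a flow stay in order
  • No STP: with an eBGP underlay (each leaf in its own AS), the leaf-spine links are routed Layer-3 links, so there are no Layer-2 loops; STP is not needed on the fabric and all links carry traffic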
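The 2-hop and equal-cost properties fall directly out of the graph structure. A small sketch (hypothetical helper names, standard library only) builds the complete bipartite graph and lists the leaf-to-leaf paths:

```python
from itertools import product

def build_clos(num_spines, num_leaves):
    """Adjacency sets for a 2-tier leaf-spine fabric: every leaf links to every spine."""
    spines = [f"spine{i}" for i in range(1, num_spines + 1)]
    leaves = [f"leaf{i}" for i in range(1, num_leaves + 1)]
    adj = {node: set() for node in spines + leaves}
    for leaf, spine in product(leaves, spines):
        adj[leaf].add(spine)
        adj[spine].add(leaf)
    return adj

def leaf_to_leaf_paths(adj, src, dst):
    """With no intra-tier links, every shortest leaf-to-leaf path crosses exactly one spine."""
    return [[src, spine, dst] for spine in sorted(adj[src]) if dst in adj[spine]]

fabric = build_clos(num_spines=2, num_leaves=3)
links = sum(len(neighbors) for neighbors in fabric.values()) // 2
paths = leaf_to_leaf_paths(fabric, "leaf1", "leaf3")
print(links)   # 6 (leaves x spines)
print(paths)   # [['leaf1', 'spine1', 'leaf3'], ['leaf1', 'spine2', 'leaf3']]
```

Every path has the same length, and the number of equal-cost paths equals the number of spines; adding a spine adds one more path between every pair of leaves.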

ECMP with BGP:

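# Enable ECMP in FRR BGP (allow up to 64 equal-cost paths)
router bgp 65001
 address-family ipv4 unicast
  maximum-paths 64
  maximum-paths ibgp 64
 exit-address-family

# Verify multiple next hops for a prefix (vtysh exec mode)
show ip bgp 10.0.0.0/8      # each equal-cost path should be marked "multipath"
show ip route 10.0.0.0/8    # the installed route should list one "via" line per next hop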
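ECMP forwarding picks a next hop per flow, not per packet. The sketch below illustrates the idea with a hash of the 5-tuple; SHA-256 and the field order are assumptions made for a readable example, since switch ASICs and the Linux kernel use their own hash functions and field selections.

```python
import hashlib

def pick_next_hop(flow, next_hops):
    """Map a flow's 5-tuple to one equal-cost next hop (illustrative hash, not a real ASIC's)."""
    key = "|".join(str(part) for part in flow).encode()
    digest = hashlib.sha256(key).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]

spine_hops = ["spine1", "spine2"]

# One TCP connection: (src IP, dst IP, protocol, src port, dst port).
flow = ("10.1.1.10", "10.3.3.10", 6, 51515, 443)
# Deterministic: every packet of this flow takes the same spine, so it is never reordered.
assert pick_next_hop(flow, spine_hops) == pick_next_hop(flow, spine_hops)

# Many flows (different source ports) spread across both spines.
counts = {hop: 0 for hop in spine_hops}
for src_port in range(40000, 41000):
    counts[pick_next_hop(("10.1.1.10", "10.3.3.10", 6, src_port, 443), spine_hops)] += 1
print(counts)   # both spines carry a share; the exact split depends on the hash
```

The trade-off of per-flow hashing is that one very large flow cannot use more than one path; the benefit is that TCP never sees reordering caused by the fabric.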

12.5 Containerlab for Fabric Simulation

Containerlab can simulate a leaf-spine fabric using FRRouting containers. The topology YAML defines the nodes (leaves + spines) and links between them; Containerlab manages the veth pairs and container networking.

# topo-fabric.clab.yml
name: fabric
topology:
  nodes:
    spine1:
      kind: linux
      image: frrouting/frr:latest
    spine2:
      kind: linux
      image: frrouting/frr:latest
    leaf1:
      kind: linux
      image: frrouting/frr:latest
    leaf2:
      kind: linux
      image: frrouting/frr:latest
    leaf3:
      kind: linux
      image: frrouting/frr:latest
  links:
    - endpoints: ["leaf1:eth1", "spine1:eth1"]
    - endpoints: ["leaf1:eth2", "spine2:eth1"]
    - endpoints: ["leaf2:eth1", "spine1:eth2"]
    - endpoints: ["leaf2:eth2", "spine2:eth2"]
    - endpoints: ["leaf3:eth1", "spine1:eth3"]
    - endpoints: ["leaf3:eth2", "spine2:eth3"]
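The links: list above follows a fixed pattern (leafN:ethS connects to spineS:ethN), so it can be generated for larger fabrics. A hypothetical helper, standard library only:

```python
def clos_links(num_spines, num_leaves):
    """Link endpoints for a full leaf-spine mesh, numbered like topo-fabric.clab.yml:
    leaf uplink index = spine number, spine downlink index = leaf number."""
    return [
        [f"leaf{leaf}:eth{spine}", f"spine{spine}:eth{leaf}"]
        for leaf in range(1, num_leaves + 1)
        for spine in range(1, num_spines + 1)
    ]

def links_yaml(num_spines, num_leaves):
    """Render the list in the same flow-style YAML the topology file uses."""
    lines = ["  links:"]
    for a, b in clos_links(num_spines, num_leaves):
        lines.append(f'    - endpoints: ["{a}", "{b}"]')
    return "\n".join(lines)

print(links_yaml(num_spines=2, num_leaves=3))
```

For 2 spines and 3 leaves this prints the same six links as the topology file. Data links start at eth1 because containerlab uses eth0 for each node's management interface; a larger fabric also needs matching entries under nodes:.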

Each leaf and spine runs an eBGP underlay following the RFC 7938 pattern (a unique AS per leaf, a shared AS for the spines); the EVPN overlay is then configured on top of it through FRR's address-family l2vpn evpn.
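Before writing per-node FRR configs, it helps to lay out the underlay numbering. This sketch assigns one /31 point-to-point subnet per leaf-spine link plus the ASNs for each end; the address pool 10.200.0.0/24, the leaf ASN scheme 65000 + N, and the spine ASN 65100 (borrowed from the FRR example in 12.3) are illustrative assumptions, not values the lab prescribes.

```python
import ipaddress

SPINE_ASN = 65100                                   # shared spine AS (assumption)
LEAF_ASN_BASE = 65000                               # leafN gets 65000 + N (assumption)
P2P_POOL = ipaddress.ip_network("10.200.0.0/24")    # hypothetical underlay link pool

def underlay_plan(num_spines, num_leaves):
    """Assign a /31 per leaf-spine link and the BGP ASN for each end of the link."""
    subnets = P2P_POOL.subnets(new_prefix=31)       # generator of /31 networks
    plan = []
    for leaf in range(1, num_leaves + 1):
        for spine in range(1, num_spines + 1):
            net = next(subnets)
            spine_ip, leaf_ip = net[0], net[1]      # a /31 holds exactly two addresses (RFC 3021)
            plan.append({
                "link": (f"leaf{leaf}:eth{spine}", f"spine{spine}:eth{leaf}"),
                "leaf": (str(leaf_ip), LEAF_ASN_BASE + leaf),
                "spine": (str(spine_ip), SPINE_ASN),
            })
    return plan

for entry in underlay_plan(num_spines=2, num_leaves=3):
    print(entry)
```

Sharing one spine ASN is what keeps the underlay paths equal: leaf1 learns leaf3's routes via spine1 and spine2 with the identical AS path 65100 65003, so BGP multipath can install both.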

Lab 11 builds this 2-spine, 3-leaf fabric; configures eBGP underlay; adds VXLAN + EVPN overlay; verifies MAC learning without flooding.


Lab Preview

Lab 11 deploys a VXLAN-EVPN fabric in Containerlab:

  • Deploy the 5-node (2 spine, 3 leaf) Containerlab topology
  • Configure eBGP underlay on each leaf-spine link (unique AS per leaf, shared spine AS)
  • Configure VXLAN VNI 100 on leaf1 and leaf2; attach host containers to each
  • Enable EVPN on all nodes; verify Type 2 (MAC/IP) routes propagate between leaves
  • Ping between hosts on different leaves; capture VXLAN-encapsulated traffic on the spine links
  • Verify ECMP: confirm that each remote leaf's underlay routes have two next hops (one per spine) and that traffic uses both spine paths

Homework

Reading (45 min): Kurose-Ross 9e Ch 6.6 (Data Center Networking). Focus on the motivation for datacenter-specific architectures, the concept of fat-tree topologies, and load balancing across multiple paths. Then skim RFC 7938 (Use of BGP for Routing in Large-Scale Data Centers) -- read the Abstract + Section 1 (Introduction) for the rationale behind eBGP underlay.

Hands-on (60 min): Explore VXLAN encapsulation using Linux kernel VXLAN interfaces:

# Create a VXLAN interface (VTEP) on Linux
sudo ip link add vxlan100 type vxlan id 100 \
  dstport 4789 remote 192.168.1.2 local 192.168.1.1 dev eth0
sudo ip link set vxlan100 up
sudo ip addr add 10.100.0.1/24 dev vxlan100

# Capture VXLAN traffic
sudo tcpdump -i eth0 -n "udp port 4789" -w /tmp/vxlan.pcap

# In Wireshark: VXLAN is auto-decoded; look for the inner Ethernet frame

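On a second VM, create a matching vxlan100 interface with local and remote swapped (local 192.168.1.2, remote 192.168.1.1) and assign it 10.100.0.2/24. Ping 10.100.0.2 from the first VM across the VXLAN tunnel while the capture runs, then open /tmp/vxlan.pcap to inspect the encapsulation.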
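To go one step past Wireshark, the capture can also be decoded by hand. This sketch reads the classic pcap format that tcpdump -w writes (pcapng is not handled) and extracts the VNI and inner MAC addresses, assuming an untagged Ethernet capture on eth0 and an IPv4 underlay; the function and script names are hypothetical.

```python
import struct
import sys

VXLAN_PORT = 4789
LINKTYPE_ETHERNET = 1

def read_pcap(path):
    """Yield raw frames from a classic libpcap file."""
    with open(path, "rb") as f:
        header = f.read(24)
        if header[:4] in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
            endian = "<"                 # little-endian file (usec or nsec timestamps)
        elif header[:4] in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
            endian = ">"                 # big-endian file
        else:
            raise ValueError("not a classic pcap file")
        (linktype,) = struct.unpack(endian + "I", header[20:24])
        if linktype != LINKTYPE_ETHERNET:
            raise ValueError(f"expected an Ethernet capture, got link type {linktype}")
        while True:
            record = f.read(16)          # ts_sec, ts_frac, incl_len, orig_len
            if len(record) < 16:
                return
            incl_len = struct.unpack(endian + "IIII", record)[2]
            yield f.read(incl_len)

def fmt_mac(raw):
    return ":".join(f"{b:02x}" for b in raw)

def parse_vxlan(frame):
    """Return (vni, inner_dst_mac, inner_src_mac) for an IPv4/UDP-4789 frame, else None."""
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":
        return None                      # too short, or not untagged IPv4
    ihl = (frame[14] & 0x0F) * 4         # outer IPv4 header length in bytes
    if frame[23] != 17:                  # IPv4 protocol field: 17 = UDP
        return None
    udp = 14 + ihl
    inner = udp + 8 + 8                  # skip the UDP and VXLAN headers
    if len(frame) < inner + 12:
        return None
    (dst_port,) = struct.unpack("!H", frame[udp + 2:udp + 4])
    if dst_port != VXLAN_PORT:
        return None
    flags, word2 = struct.unpack("!B3xI", frame[udp + 8:udp + 16])
    if not flags & 0x08:                 # I flag must be set for a valid VNI
        return None
    return word2 >> 8, fmt_mac(frame[inner:inner + 6]), fmt_mac(frame[inner + 6:inner + 12])

if __name__ == "__main__" and len(sys.argv) > 1:
    for frame in read_pcap(sys.argv[1]):
        info = parse_vxlan(frame)
        if info:
            vni, dst, src = info
            print(f"VNI {vni}: {src} -> {dst}")
```

Run it as python3 vxlan_pcap.py /tmp/vxlan.pcap; the ARP exchange and ICMP echo frames between the two vxlan100 interfaces should appear with VNI 100.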


Toolchain Diary Entry

Deepen this week: Containerlab fabric topologies; EVPN verification commands

containerlab deploy -t TOPOLOGY.yaml: deploy a Containerlab topology.

containerlab destroy -t TOPOLOGY.yaml: tear down a topology and clean up containers.

containerlab inspect -t TOPOLOGY.yaml --format json: list all nodes with management IPs in JSON.

show bgp l2vpn evpn (FRR vtysh): show all EVPN BGP routes.

show evpn vni detail (FRR vtysh): show VNI status, MAC count, remote VTEPs.

show evpn mac vni 100 (FRR vtysh): show all MACs learned in VNI 100.

ip link show type vxlan: show all VXLAN interfaces on a Linux host.

bridge fdb show dev vxlan100: show forwarding database for a VXLAN interface; includes remote VTEP entries.

tcpdump -i eth0 -n "udp port 4789": capture raw VXLAN traffic on the underlay interface.


Key Terms

  • VXLAN (RFC 7348): Virtual Extensible LAN; Layer-2-in-UDP encapsulation; 24-bit VNI (16M segments); runs on UDP port 4789; solves the VLAN scale limit in datacenter fabrics
  • VNI: VXLAN Network Identifier; 24-bit tenant/segment identifier in the VXLAN header; analogous to a VLAN ID but with a 4,096x larger namespace (2^24 vs 2^12 values)
  • VTEP: VXLAN Tunnel Endpoint; device that encapsulates/decapsulates VXLAN; implemented in hypervisors (Open vSwitch) or hardware switches (Arista, Cisco Nexus, etc.)
  • EVPN (RFC 7432 + RFC 8365): Ethernet VPN; BGP address family used as the control plane for VXLAN; advertises MAC/IP bindings as route types; replaces flood-and-learn for known MACs (BUM traffic is still replicated, using Type 3 routes)
  • BUM traffic: Broadcast, Unknown unicast, Multicast; traffic that must be replicated to all members of a network segment; the scaling challenge that EVPN's Type 3 routes address
  • Leaf-spine (Clos fabric): datacenter topology where every leaf connects to every spine; 2-hop predictable latency; supports ECMP across all spine links; no STP required
  • ECMP: Equal-Cost Multi-Path routing; forwarding traffic across multiple equal-cost paths simultaneously; fundamental to leaf-spine fabric throughput
  • eBGP underlay: using eBGP between every leaf and spine (RFC 7938 pattern); provides loop-free routing + ECMP without STP; each leaf gets its own AS number