"BGP is the routing protocol of the Internet. It is also, without exaggeration, one of the most critical and fragile pieces of infrastructure in the global telecommunications system." -- Kurose & Ross, Computer Networking: A Top-Down Approach, 9th ed., §5.4
Lecture (100 min -- first of two BGP weeks)
3.1 NET-201 BGP and What It Left Out
NET-201's BGP module built a working iBGP/eBGP topology in GNS3, observed path-vector advertisements, and sandboxed a prefix hijack. That module intentionally left three topics for NET-301:
- Route reflectors -- how iBGP scales to hundreds of routers without a full mesh
- BGP communities -- how carriers attach policy metadata to routes
- RPKI -- the cryptographic origin-validation overlay that makes BGP prefix origination auditable
Week 3 covers the first two. Week 4, the second BGP week, has no separate week file: its material on RPKI and prefix-hijack detection appears in §3.5 and §3.6 of this outline. Lab 3 covers RPKI deployment.
3.2 The iBGP Full-Mesh Problem
BGP's iBGP split-horizon rule forbids re-advertising a route learned from one iBGP peer to another iBGP peer. This prevents loops within an AS (the AS_PATH does not change inside the AS, so it cannot detect them), but it also means a router can only learn a route over iBGP directly from the router that injected it. Every router must therefore peer directly with every other router: an AS with N routers requires N*(N-1)/2 iBGP sessions.
At NET-201 scale (5-10 routers): manageable. At carrier scale:
| Routers in AS | Full-mesh sessions |
|---|---|
| 10 | 45 |
| 50 | 1,225 |
| 100 | 4,950 |
| 500 | 124,750 |
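A major carrier with 500 BGP-speaking routers would need 124,750 iBGP sessions, 499 of them terminating on each router, and adding one more router would mean reconfiguring every existing one. This is not operationally feasible.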
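The table above can be reproduced with a few lines of Python. This is only a sketch of the N*(N-1)/2 arithmetic; the function name `full_mesh_sessions` is ours, not part of any BGP tooling.

```python
def full_mesh_sessions(n: int) -> int:
    """Number of iBGP sessions in a full mesh of n routers: n*(n-1)/2."""
    if n < 0:
        raise ValueError("router count must be non-negative")
    return n * (n - 1) // 2


for n in (10, 50, 100, 500):
    # Every router terminates n-1 of those sessions itself.
    print(f"{n:>4} routers: {full_mesh_sessions(n):>7,} sessions, {n - 1} per router")
```

The quadratic term is the point: doubling the router count roughly quadruples the number of sessions.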
3.3 Route Reflectors (RFC 4456)
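Route Reflectors (RRs) solve the iBGP scaling problem by relaxing the split-horizon rule on designated routers. An RR re-advertises (reflects) a route learned from a client to its other clients and to its non-client peers, and a route learned from a non-client peer to its clients only. Routes learned over eBGP are advertised to all iBGP peers as usual.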
RR terminology:
| Term | Definition |
|---|---|
| RR Client | An iBGP peer that has been configured to "point to" the RR; forms a single session to the RR |
| Non-client | An iBGP peer that does NOT peer through the RR (maintains full-mesh with the RR and other non-clients) |
| Cluster | A set of RR + its clients; identified by CLUSTER_ID |
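| ORIGINATOR_ID | BGP attribute carrying the router ID of the router that originated the route in the local AS; set by the first RR that reflects the route. A router that receives a route carrying its own router ID discards it, which prevents re-advertisement loops |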
| CLUSTER_LIST | BGP attribute carrying the list of CLUSTER_IDs a route has traversed; prevents reflection loops |
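The reflection rules and the two loop-prevention attributes can be sketched as a toy model. This is a simplification under stated assumptions, not an implementation of RFC 4456: the `Route` class, the `reflect` and `accept` functions, and the use of router-ID strings as peer names are all hypothetical, and the sketch never sends a route back to the peer it came from.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    originator_id: Optional[str] = None   # ORIGINATOR_ID; unset before the first reflection
    cluster_list: Tuple[str, ...] = ()    # CLUSTER_LIST; most recent cluster first


def reflect(route: Route, learned_from: str, from_client: bool,
            cluster_id: str, clients: Set[str], non_clients: Set[str]):
    """Return (target peers, route as sent) for one route arriving at an RR."""
    sent = Route(
        prefix=route.prefix,
        # The first reflecting RR records who injected the route into the AS.
        originator_id=route.originator_id or learned_from,
        # Every reflecting RR prepends its own CLUSTER_ID.
        cluster_list=(cluster_id,) + route.cluster_list,
    )
    if from_client:
        targets = (clients | non_clients) - {learned_from}
    else:
        targets = set(clients)            # non-client routes go to clients only
    return targets, sent


def accept(route: Route, my_router_id: str, my_cluster_id: Optional[str] = None) -> bool:
    """Receiver-side loop checks; my_cluster_id is set only on RRs."""
    if route.originator_id == my_router_id:
        return False                      # our own route came back to us
    if my_cluster_id is not None and my_cluster_id in route.cluster_list:
        return False                      # route already passed through our cluster
    return True
```

For example, a route from client 10.0.0.1 is reflected to every other client and non-client, and 10.0.0.1 itself would reject any copy that loops back because the ORIGINATOR_ID matches its router ID.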
RR topology patterns:
Two-level hierarchy (most common in large ASes):
- Level 1 RRs: placed at major PoPs; each clients a set of edge routers
- Level 2 RRs: cluster of two or three, peering with all Level 1 RRs in full mesh
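Pair-of-RRs (common in enterprise): two RRs, each with all other routers as clients; the two RRs peer with each other in iBGP. Provides redundancy cheaply: in an AS of N routers, each RR holds N-2 client sessions plus one session to the other RR (N-1 in total), and each client holds only two sessions.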
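To compare the pair-of-RRs design with the full mesh from §3.2, the session counts can be worked out as follows. This is a sketch that assumes N counts every router in the AS, including the two RRs, and that every non-RR router is a client of both; the function name and dictionary keys are ours.

```python
def pair_of_rrs_sessions(n_routers: int) -> dict:
    """iBGP session counts when two of n_routers act as redundant RRs."""
    if n_routers < 3:
        raise ValueError("need two RRs and at least one client")
    clients = n_routers - 2
    return {
        "per_client": 2,            # one session to each RR
        "per_rr": clients + 1,      # all clients plus the RR-to-RR session
        "total": 2 * clients + 1,
    }


print(pair_of_rrs_sessions(500))
```

For 500 routers this gives 997 sessions in total, against 124,750 for the full mesh.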
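Caveat: route reflectors are a control-plane tool, not a traffic-forwarding tool. The RR reflects routing information and normally leaves NEXT_HOP unchanged, so traffic still flows toward the route's next hop and need not pass through the RR. Because an RR reflects only its own best path, chosen from its own position in the IGP, clients can end up using a different exit than they would choose in a full mesh. Hot-potato routing, IGP metric differences, and NEXT_HOP resolution must all be considered when deploying RR hierarchies.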
3.4 BGP Communities
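A BGP community (RFC 1997) is a 32-bit value carried in the COMMUNITIES path attribute of a route and used to convey policy information between BGP peers; one route can carry many communities. By convention it is written as two 16-bit halves, AS_number:community_value (e.g., 65001:100).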
Well-known communities:
| Community | Hex value | Meaning |
|---|---|---|
| NO_EXPORT | 0xFFFFFF01 | Do not advertise outside the AS (or outside the confederation, if one is used) |
| NO_ADVERTISE | 0xFFFFFF02 | Do not advertise to any BGP peer |
| NO_EXPORT_SUBCONFED | 0xFFFFFF03 | Do not advertise to any eBGP peer, including peers in other member ASes of the same confederation |
| BLACKHOLE | 0xFFFF029A (65535:666, RFC 7999) | Ask the receiving peer to discard traffic to the tagged prefix (remote triggered black hole) |
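The AS_number:community_value notation is just a display convention for the 32-bit value, and the first three well-known communities differ only in how far a route may travel. A minimal sketch of both ideas follows; the function names and the three peer-type labels (`ibgp`, `confed_ebgp`, `ebgp`) are our own, not taken from any router implementation.

```python
NO_EXPORT = 0xFFFFFF01
NO_ADVERTISE = 0xFFFFFF02
NO_EXPORT_SUBCONFED = 0xFFFFFF03
BLACKHOLE = 0xFFFF029A  # 65535:666, assigned by RFC 7999


def encode_community(text: str) -> int:
    """Turn 'asn:value' into the 32-bit wire value."""
    asn_s, value_s = text.split(":")
    asn, value = int(asn_s), int(value_s)
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("both halves must fit in 16 bits")
    return (asn << 16) | value


def decode_community(raw: int) -> str:
    """Turn the 32-bit wire value back into 'asn:value'."""
    return f"{raw >> 16}:{raw & 0xFFFF}"


def may_advertise(communities: set, peer_type: str) -> bool:
    """peer_type: 'ibgp', 'confed_ebgp' (other member AS), or 'ebgp'."""
    if NO_ADVERTISE in communities:
        return False
    if NO_EXPORT_SUBCONFED in communities and peer_type in ("confed_ebgp", "ebgp"):
        return False
    if NO_EXPORT in communities and peer_type == "ebgp":
        return False
    return True


print(hex(encode_community("65001:100")), decode_community(BLACKHOLE))
```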
Large communities (RFC 8092): the 32-bit community format leaves only 16 bits for the AS number, so a network with a 4-byte ASN (RFC 6793) cannot put its own ASN in the first field. Large communities use a 96-bit format, {Global_Administrator}:{Local_Data_1}:{Local_Data_2}, in which all three fields are 32 bits wide.
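A large community is three unsigned 32-bit integers in network byte order, 12 bytes in total. The sketch below packs and unpacks a single value; the function names are hypothetical, and the example uses an ASN from the 4-byte private range.

```python
import struct


def pack_large_community(text: str) -> bytes:
    """Turn 'global:local1:local2' into its 12-byte wire form."""
    parts = [int(p) for p in text.split(":")]
    if len(parts) != 3 or any(not 0 <= p <= 0xFFFFFFFF for p in parts):
        raise ValueError("expected three 32-bit unsigned fields")
    return struct.pack("!III", *parts)


def unpack_large_community(raw: bytes) -> str:
    """Turn 12 wire bytes back into 'global:local1:local2'."""
    if len(raw) != 12:
        raise ValueError("a large community is exactly 12 bytes")
    return ":".join(str(field) for field in struct.unpack("!III", raw))


wire = pack_large_community("4200000001:100:200")
print(len(wire), unpack_large_community(wire))
```

The same ASN could not be written as a classic community at all, because 4200000001 does not fit in 16 bits.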
Operator use cases:
| Use case | Community pattern |
|---|---|
| Route-tagging for origin AS | 65001:origin_code -- internal policy classification |
| Prepend control | 65000:prepend1 -- ask peer to prepend AS-path once when advertising to their customers |
| No-export to specific peer | 65000:nopeer_{peer_ASN} |
| RTBH (Remote Triggered Black Hole) | 65535:666 -- advertise a /32 with this community; peer drops all traffic destined to it |
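RTBH is the use case where a filtering mistake hurts most, so providers commonly accept a blackhole request only for a host route inside address space the customer is known to hold. The sketch below illustrates that import check; the function name, the three return labels, and the host-route-only rule are assumptions made for the example, not a statement of any particular carrier's policy.

```python
import ipaddress

BLACKHOLE = "65535:666"


def rtbh_action(prefix: str, communities: set, customer_prefixes: list) -> str:
    """Return 'normal', 'discard', or 'reject' for one customer announcement."""
    if BLACKHOLE not in communities:
        return "normal"
    net = ipaddress.ip_network(prefix)
    # A blackhole request must fall inside space registered to this customer.
    authorized = any(
        net.version == parent.version and net.subnet_of(parent)
        for parent in map(ipaddress.ip_network, customer_prefixes)
    )
    # Host route only: /32 for IPv4, /128 for IPv6.
    is_host_route = net.prefixlen == net.max_prefixlen
    return "discard" if authorized and is_host_route else "reject"


print(rtbh_action("198.51.100.7/32", {BLACKHOLE}, ["198.51.100.0/24"]))
```

Without the authorization check, any customer could ask the provider to drop traffic to someone else's address.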
3.5 RPKI: Resource Public Key Infrastructure
RPKI (RFC 6480) is the cryptographic infrastructure that allows the rightful holder of an IP prefix to publish a signed attestation of which AS is authorized to originate that prefix. This directly addresses the BGP prefix hijack attack class.
The BGP hijack problem:
BGP has no native mechanism to verify that the AS announcing a prefix actually owns it. An AS announcing 8.8.8.0/24 (Google's DNS) will receive traffic destined for Google's addresses, whether it owns the prefix or not. Real-world hijacks include:
- Pakistan Telecom's 2008 announcement of YouTube's prefix (brought down YouTube globally for ~2 hours)
- Rostelecom's 2020 announcement of routes for Amazon, Cloudflare, Akamai, and others (8,800 prefixes hijacked for approximately 1 hour)
- China Telecom's documented pattern of brief, low-volume route announcements consistent with traffic interception
ROA (Route Origin Authorization): an RPKI object signed by the prefix holder using their key material from the Regional Internet Registry (RIR). A ROA specifies:
- The prefix
- The maximum prefix length that can be announced (to prevent more-specific hijacks)
- The authorized origin AS
ROA example:
Prefix: 8.8.8.0/24
Max Length: 24
Origin AS: AS15169 (Google)
Signed by: Google's ARIN-issued certificate chain
RPKI validation states:
| State | Meaning | Action |
|---|---|---|
| Valid | At least one ROA covers the announced prefix, authorizes the origin AS, and has a max length at least as long as the announced prefix length | Accept; mark Valid |
| Invalid | At least one ROA covers the announced prefix, but none matches on both origin AS and max length | Reject (if policy enforces); strong indicator of a hijack or of a misconfigured announcement or ROA |
| NotFound (Unknown) | No ROA covers this prefix | Accept (conservative) or investigate |
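The three states follow from a short procedure (route origin validation, RFC 6811): find the ROAs whose prefix covers the announcement, then look for one that also matches on origin AS and max length. A minimal sketch, assuming the validated ROA payloads are already available as a list; the `VRP` tuple and `validate_origin` function are names chosen for this example.

```python
import ipaddress
from typing import Iterable, NamedTuple


class VRP(NamedTuple):
    """One validated ROA payload: prefix, max length, authorized origin."""
    prefix: str
    max_length: int
    asn: int


def validate_origin(prefix: str, origin_asn: int, vrps: Iterable[VRP]) -> str:
    route = ipaddress.ip_network(prefix)
    covered = False
    for vrp in vrps:
        roa_net = ipaddress.ip_network(vrp.prefix)
        # A ROA is relevant only if its prefix covers the announced prefix.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if route.prefixlen <= vrp.max_length and origin_asn == vrp.asn:
            return "Valid"
    # Covered but never matched is Invalid; not covered at all is NotFound.
    return "Invalid" if covered else "NotFound"


vrps = [VRP("8.8.8.0/24", 24, 15169)]
for announcement in [("8.8.8.0/24", 15169), ("8.8.8.0/24", 64500),
                     ("8.8.8.0/25", 15169), ("192.0.2.0/24", 64500)]:
    print(announcement, validate_origin(*announcement, vrps))
```

Note that the /25 from the correct origin is Invalid, not NotFound: the covering ROA exists, but its max length of 24 forbids anything more specific.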
RPKI-to-Router (RTR) protocol (RFC 8210): routers do not perform RPKI validation themselves, because synchronizing the repositories and validating certificate chains is expensive. Instead, a validator cache (Routinator, FORT, OctoRPKI, or rpki-client, sometimes fronted by an RTR proxy such as RTRTR) fetches and validates RPKI data published under all five RIRs (ARIN, RIPE NCC, APNIC, LACNIC, AFRINIC), then serves the validated ROA table to routers via the lightweight RTR protocol.
RIR repositories → Validator Cache (Routinator) → RTR → FRR/Cisco/Juniper router
→ Policy: reject Invalid
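What travels over RTR is deliberately simple: each validated ROA entry arrives as a small fixed-size PDU. The sketch below decodes the 20-byte IPv4 Prefix PDU (type 4) as laid out in RFC 8210; it is a parsing illustration only, with a hypothetical function name, and it does not implement the RTR session (serial numbers, cache reset, and so on).

```python
import ipaddress
import struct

IPV4_PREFIX_PDU = 4


def parse_ipv4_prefix_pdu(pdu: bytes) -> dict:
    """Decode one RTR IPv4 Prefix PDU into a plain dictionary."""
    if len(pdu) != 20:
        raise ValueError("IPv4 Prefix PDU is exactly 20 bytes")
    # version, type, zero, length, flags, prefix len, max len, zero, prefix, ASN
    (version, pdu_type, _zero, length, flags,
     prefix_len, max_len, _zero2, addr, asn) = struct.unpack("!BBHIBBBB4sI", pdu)
    if pdu_type != IPV4_PREFIX_PDU or length != 20:
        raise ValueError("not an IPv4 Prefix PDU")
    return {
        "version": version,
        "announce": bool(flags & 1),   # low bit set = announce, clear = withdraw
        "prefix": f"{ipaddress.IPv4Address(addr)}/{prefix_len}",
        "max_length": max_len,
        "asn": asn,
    }


sample = struct.pack("!BBHIBBBB4sI", 1, 4, 0, 20, 1, 24, 24, 0, bytes([8, 8, 8, 0]), 15169)
print(parse_ipv4_prefix_pdu(sample))
```

The router never sees a certificate or a signature; it receives only (prefix, max length, origin AS) tuples from a cache it trusts.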
3.6 Detecting Prefix Hijacks in Production Traffic
Even with RPKI deployed, detection requires active monitoring:
Real-time BGP monitoring services:
- RIPE RIS (Routing Information Service): BGP route collector network; API for querying current RIB state
- RouteViews: similar collector network operated by University of Oregon
- Cloudflare Radar and BGPStream: commercial + open tools for BGP event detection
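MOAS (Multiple Origin AS) detection: a prefix announced simultaneously by two different origin ASes may be a hijack or a misconfiguration, but it can also be legitimate (anycast services, a customer multihomed without its own ASN, a DDoS-scrubbing provider originating the prefix on the customer's behalf). Treat a MOAS conflict as a trigger to compare the observed origins against the ones you expect. BGP looking glasses and lookup services (RIPEstat, HE BGP) flag these in near-real-time.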
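MOAS detection reduces to grouping observed announcements by prefix and counting distinct origins. A minimal sketch, assuming the collector data has already been reduced to (prefix, origin AS) pairs; `find_moas` and its `expected` argument for known-legitimate origin sets are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def find_moas(observations: Iterable[Tuple[str, int]],
              expected: Optional[Dict[str, Set[int]]] = None) -> Dict[str, list]:
    """Return prefixes seen with more than one origin AS.

    `expected` maps a prefix to the origins known to be legitimate;
    a conflict fully explained by that set is not reported.
    """
    expected = expected or {}
    origins = defaultdict(set)
    for prefix, asn in observations:
        origins[prefix].add(asn)
    return {
        prefix: sorted(asns)
        for prefix, asns in origins.items()
        if len(asns) > 1 and not asns <= expected.get(prefix, set())
    }


seen = [("203.0.113.0/24", 64500), ("203.0.113.0/24", 64666),
        ("198.51.100.0/24", 64501), ("198.51.100.0/24", 64502)]
print(find_moas(seen, expected={"198.51.100.0/24": {64501, 64502}}))
```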
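Prefix more-specific detection: a hijacker often announces a more-specific prefix (for example a /24 inside the victim's /22, as in the 2008 YouTube incident) to attract traffic via the longest-prefix-match rule. IPv4 announcements longer than /24 are widely filtered, so a /24 is usually the most specific prefix that will propagate. Monitoring for unexpected more-specific announcements of your prefixes is a standard carrier practice.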
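Checking for more-specifics is a containment test between networks, which Python's `ipaddress` module provides directly. The sketch below assumes a feed of (prefix, origin AS) pairs and a list of announcements you make yourself; the function and parameter names are ours, and the sample values are modeled on the 2008 YouTube incident.

```python
import ipaddress


def unexpected_more_specifics(my_prefixes, announced, expected=()):
    """Return (prefix, origin_asn, covering_prefix) for each surprise announcement."""
    mine = [ipaddress.ip_network(p) for p in my_prefixes]
    # Normalize the announcements we make ourselves so they compare cleanly.
    known = {(str(ipaddress.ip_network(p)), asn) for p, asn in expected}
    alerts = []
    for prefix, asn in announced:
        net = ipaddress.ip_network(prefix)
        if (str(net), asn) in known:
            continue
        for parent in mine:
            # Strictly more specific: inside the parent but not equal to it.
            if net.version == parent.version and net != parent and net.subnet_of(parent):
                alerts.append((str(net), asn, str(parent)))
                break
    return alerts


feed = [("208.65.152.0/22", 36561), ("208.65.153.0/24", 17557)]
print(unexpected_more_specifics(["208.65.152.0/22"], feed))
```

A real deployment would list its own deliberate more-specifics (traffic engineering, anycast) in `expected` so that they do not raise alerts.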
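Time-series analysis: legitimate routes are usually stable for days or longer. A prefix or origin that appears in the global DFZ (Default-Free Zone) for only a few minutes and then disappears is worth investigating as a possible hijack, although route leaks and flapping links can look similar. Tools like bgpmon.net and Kentik alert on these patterns.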
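The short-lived-route heuristic can be sketched as pairing each appearance of a (prefix, origin) with its disappearance and flagging short lifetimes. This assumes the collector feed has already been reduced to time-ordered visibility events of the form (timestamp in seconds, "A" or "W", prefix, origin AS); real BGP withdrawals do not carry an origin AS, so that preprocessing step is itself an assumption. The 600-second threshold is an arbitrary illustration.

```python
def short_lived(events, max_lifetime_s=600):
    """Return (prefix, origin_asn, lifetime_s) for routes that vanished quickly."""
    first_seen = {}
    flagged = []
    for ts, kind, prefix, asn in events:
        key = (prefix, asn)
        if kind == "A":
            # Keep the earliest announcement; repeats are just re-advertisements.
            first_seen.setdefault(key, ts)
        elif kind == "W" and key in first_seen:
            lifetime = ts - first_seen.pop(key)
            if lifetime <= max_lifetime_s:
                flagged.append((prefix, asn, lifetime))
    return flagged


events = [
    (0, "A", "203.0.113.0/24", 64666),
    (0, "A", "198.51.100.0/24", 64501),
    (240, "W", "203.0.113.0/24", 64666),      # gone after four minutes
    (86400, "W", "198.51.100.0/24", 64501),   # gone after a day
]
print(short_lived(events))
```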
BGP at Scale: The Kurose-Ross Framing
Kurose-Ross 9e §5.4 covers BGP as "inter-AS routing." At NET-301 depth, the lesson is not just how BGP works mechanically but why its design assumptions make it simultaneously the Internet's most critical infrastructure and its most attackable one. BGP was designed in an era of trusted peers; it has no origin authentication because authentication was assumed to be a social problem (contracts between ISPs), not a cryptographic problem. RPKI retrofits cryptographic origin authorization onto a 1990s trust model. That architectural debt -- the reason RPKI exists -- is worth naming explicitly: a protocol designed for trusted parties, now running the Internet's routing system at a scale its designers never anticipated.
Lab 3 Introduction
Lab 3 deploys Routinator against the live RPKI repositories and integrates it with an FRR router in Containerlab via the RTR protocol. You will query the Routinator REST API to observe ROA validity for several prefixes (including some with Invalid state), configure FRR to drop Invalid routes via a route-map, and simulate a prefix hijack: announce a /24 with a different origin AS and verify the FRR router rejects it as Invalid.
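As a preview of the ROA validity queries, the sketch below builds a validity request and extracts the state from the JSON reply. The endpoint path `/api/v1/validity/AS<asn>/<prefix>`, the port 8323, and the `validated_route.validity.state` field are assumptions based on Routinator's HTTP interface as we understand it; check them against the Routinator version used in the lab before relying on them. The helper names are ours.

```python
import json
import urllib.request


def validity_url(base_url: str, asn: int, prefix: str) -> str:
    """Build the (assumed) Routinator validity endpoint URL."""
    return f"{base_url.rstrip('/')}/api/v1/validity/AS{asn}/{prefix}"


def extract_state(payload: dict) -> str:
    """Pull the validation state out of the (assumed) response layout."""
    return payload["validated_route"]["validity"]["state"]


def query_validity(base_url: str, asn: int, prefix: str, timeout: float = 5.0) -> str:
    """Ask a running validator for the state of one (origin AS, prefix) pair."""
    with urllib.request.urlopen(validity_url(base_url, asn, prefix), timeout=timeout) as resp:
        return extract_state(json.load(resp))
```

With a validator listening locally, `query_validity("http://localhost:8323", 15169, "8.8.8.0/24")` would return the state string for that announcement.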
Independent Practice (6 hr)
- Kurose-Ross 9e §5.4 -- re-read with focus on iBGP, communities, and RPKI sections
- RFC 4456 §§1-4 (Route Reflectors -- architecture sections)
- RFC 8092 §§1-3 (Large BGP Communities)
- RFC 6480 §§1-3 (RPKI -- architecture overview)
- Read the "Pakistan Telecom hijacks YouTube" post-mortem (NANOG archives; 2008; ~20 min)
- Lab 3 -- Part A (Routinator setup + ROA validity queries)