Week 7: TCP II -- Flow Control and Congestion Control

TCP does more than guarantee delivery. It also adapts to the receiver's capacity (flow control) and to the network's capacity (congestion control). This week you trace these mechanisms through packet captures and see what a congested or constrained TCP session looks like on the wire.


Theme

A fast sender can overwhelm a slow receiver. TCP prevents this with the window size field: the receiver advertises how much buffer space it has, and the sender limits how much unacknowledged data it puts in flight to that window size. Separately, the network itself can become congested. TCP addresses this with congestion control: a set of algorithms that reduce the sending rate when the network drops packets, and carefully increase it again. This week you see both mechanisms in the bytes.

Reading (~60 minutes)

  1. Stevens TCP/IP Illustrated Ch 19 ("TCP Interactive Data Flow"): Nagle algorithm; delayed ACKs; small-segment behavior
  2. Stevens TCP/IP Illustrated Ch 20 ("TCP Bulk Data Flow"): sliding windows; window scaling; slow start; congestion avoidance
  3. Optional: Kurose & Ross Ch 3 §3.6 (Principles of Congestion Control): the congestion-control problem in the abstract

Lecture outline (~2 hours)

Section 1: The sliding window

  • Flow control: TCP's mechanism to prevent a fast sender from overwhelming a slow receiver
  • The receiver's window size field (rwnd) in each TCP segment says: "I have this many bytes of buffer space available; send no more than this much unacknowledged data."
  • Sender must not have more than min(rwnd, cwnd) bytes unacknowledged at any moment (cwnd is the congestion window; see below)
  • As the receiver ACKs data, the window slides forward: the sender can send more
  • Window size is 16 bits (max 65535 bytes); with the window-scale option (negotiated in SYN/SYN-ACK), this can be scaled up by a factor of up to 2^14
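
The window-scale arithmetic above can be sketched in a few lines. This is an illustrative helper, not part of any real TCP stack; the field values are made up:

```python
# Sketch: how the effective receive window is computed from the 16-bit
# window field and the window-scale shift negotiated in the SYN/SYN-ACK.
# Per RFC 7323, the shift count may not exceed 14.

def effective_window(window_field: int, scale_shift: int) -> int:
    """Effective window in bytes = 16-bit field value << scale shift."""
    assert 0 <= window_field <= 0xFFFF
    assert 0 <= scale_shift <= 14
    return window_field << scale_shift

# Without scaling, the window tops out at 65,535 bytes:
print(effective_window(0xFFFF, 0))   # 65535

# With a scale shift of 7, the same 16-bit field advertises ~8 MB:
print(effective_window(0xFFFF, 7))   # 8388480
```

Note that Wireshark performs exactly this multiplication for you when it has seen the SYN exchange; otherwise it can only show the raw field value.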

Reading window-size changes in Wireshark:

  • Column: "Window" in the packet list (customizable)
  • Display filter: tcp.window_size_value < 5000 to find packets advertising very small windows (note: tcp.window_size_value is the raw 16-bit field; tcp.window_size is the scaled value)
  • A window that drops to 0: "zero window" -- the sender must pause until the receiver advertises a non-zero window
  • Wireshark expert information: "TCP Zero Window" and "TCP Window Full" appear as info items in the expert analysis view

Section 2: Delayed ACKs and Nagle algorithm

  • Delayed ACKs: instead of ACKing every segment immediately, the receiver waits briefly (commonly up to 200ms; RFC 1122 caps the delay at 500ms) to see if a second segment arrives; if so, it sends one ACK covering both. Reduces ACK traffic by up to 50%.
  • Nagle algorithm: the sender accumulates small data chunks and sends them as one segment rather than many tiny segments. Good for bulk transfers; bad for interactive protocols (SSH, gaming) where each keystroke needs immediate delivery.
  • Disabling Nagle: TCP_NODELAY socket option. Applications where latency matters more than efficiency (SSH, Telnet, real-time games) set this option.
  • In a capture: look for several small segments answered by a single cumulative ACK (delayed ACKs), or several small application writes coalesced into one larger segment (Nagle)
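
Disabling Nagle from an application is a one-line socket option. A minimal sketch using Python's standard socket API:

```python
import socket

# Disabling Nagle: TCP_NODELAY tells the kernel to send small segments
# immediately instead of coalescing them into larger ones.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Verify the option took effect (getsockopt returns nonzero when set):
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
sock.close()
```

SSH clients, Telnet, and most game network libraries set this option for exactly the latency reasons described above.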

Section 3: Slow start

  • When a TCP connection starts, the sender does not know how much bandwidth the network can support
  • Slow start: begin with a small congestion window (cwnd, typically 10 MSS = ~14 KB) and double it each RTT until either (a) the receiver's window limits it or (b) packet loss is detected
  • This is called "slow start" because it starts conservatively, but the growth is exponential: cwnd doubles each RTT, going 10 -> 20 -> 40 -> 80 MSS in just three RTTs
  • Slow start threshold (ssthresh): when cwnd reaches ssthresh, switch from exponential growth to linear growth (congestion avoidance)
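
The handoff from exponential to linear growth is easy to see in a toy simulation. Units are MSS, and the numbers (initial window 10, ssthresh 64) are illustrative only; this ignores losses and ACK pacing:

```python
# Toy simulation of slow start rolling over into congestion avoidance.
# cwnd and ssthresh are in units of MSS; values are illustrative.

def cwnd_growth(initial_cwnd: int, ssthresh: int, rtts: int) -> list[int]:
    """Return cwnd per RTT: double below ssthresh, then +1 MSS per RTT."""
    cwnd = initial_cwnd
    history = [cwnd]
    for _ in range(rtts):
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start (exponential)
        else:
            cwnd += 1                       # congestion avoidance (linear)
        history.append(cwnd)
    return history

print(cwnd_growth(10, 64, 6))   # [10, 20, 40, 64, 65, 66, 67]
```

The exponential phase reaches ssthresh in three RTTs; after that, each RTT adds just one MSS.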

Section 4: Congestion avoidance and loss detection

  • Congestion avoidance: once cwnd >= ssthresh, increase cwnd by 1 MSS per RTT instead of doubling. Linear growth toward the network's capacity.
  • Loss detection: TCP interprets packet loss as a signal of congestion
    • Timeout: if an ACK does not arrive within the retransmission timeout (RTO), retransmit the segment; cut cwnd to 1 MSS; restart slow start
    • Triple duplicate ACK (fast retransmit): if the sender receives three duplicate ACKs (four ACKs in a row carrying the same acknowledgment number), a segment after that point was likely lost; retransmit it immediately without waiting for the timeout; cut cwnd in half (less aggressive than timeout recovery)
  • AIMD (Additive Increase, Multiplicative Decrease): the high-level description of congestion control behavior -- add linearly in good times, cut multiplicatively on loss. This is what keeps millions of TCP flows sharing the Internet fairly.
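
The two loss reactions above can be sketched as a small state update. This is a simplified model in the spirit of classic Reno (real stacks add fast recovery and other refinements); the numbers are illustrative:

```python
# Toy model of TCP's reaction to a loss event, distinguishing the
# harsh timeout path from the milder fast-retransmit path.
# Units are MSS; simplified Reno-style behavior, not a real stack.

def on_loss(cwnd: int, ssthresh: int, kind: str) -> tuple[int, int]:
    """Return (new_cwnd, new_ssthresh) after a loss event."""
    if kind == "timeout":
        # RTO fired: assume serious congestion; restart slow start from 1 MSS.
        return 1, max(cwnd // 2, 2)
    if kind == "triple_dup_ack":
        # Fast retransmit: milder signal; halve the window and keep sending.
        half = max(cwnd // 2, 2)
        return half, half
    raise ValueError(f"unknown loss kind: {kind}")

print(on_loss(40, 64, "triple_dup_ack"))  # (20, 20)
print(on_loss(40, 64, "timeout"))         # (1, 20)
```

The "multiplicative decrease" half of AIMD is the `cwnd // 2`; the "additive increase" is the +1 MSS per RTT from congestion avoidance above.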

Section 5: What congestion looks like in a capture

  • Retransmissions: Wireshark marks retransmitted segments as "[TCP Retransmission]" in the info column
  • Duplicate ACKs: Wireshark marks these as "[TCP Dup ACK]"
  • Out-of-order segments: "[TCP Out-of-Order]"
  • Zero window: "[TCP Zero Window]" -- the receiver is full; the sender must pause
  • Window update: "[TCP Window Update]" -- the receiver is telling the sender it has more space available

Display filters for troubleshooting:

  • tcp.analysis.retransmission -- retransmissions
  • tcp.analysis.duplicate_ack -- duplicate ACKs
  • tcp.analysis.zero_window -- zero-window conditions
  • tcp.analysis.out_of_order -- out-of-order segments
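
Wireshark's retransmission heuristics also weigh ACK state, timing, and keep-alives, but the core idea is simple: a data segment whose sequence number repeats an earlier one in the same direction is a retransmission candidate. A minimal sketch on a made-up trace (timestamps in milliseconds):

```python
# Minimal sketch of retransmission detection: flag a segment whose
# sequence number was already seen in the same direction of a flow.
# The sample trace is invented; Wireshark's real heuristics are richer.

def find_retransmissions(segments):
    """segments: list of (timestamp_ms, seq) tuples for one direction.

    Returns (timestamp_ms, seq, delay_ms_after_original) per candidate.
    """
    first_seen = {}
    retrans = []
    for ts, seq in segments:
        if seq in first_seen:
            retrans.append((ts, seq, ts - first_seen[seq]))
        else:
            first_seen[seq] = ts
    return retrans

trace = [(0, 1), (10, 1461), (20, 2921), (250, 1461)]
print(find_retransmissions(trace))  # [(250, 1461, 240)]
```

Here segment 1461 reappears 240 ms after the original, which is the kind of gap you would measure in exercise 3 below.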

Labs (~90 minutes)

Lab 7-1: TCP Window and Flow Control (labs/lab-7-1-tcp-window.md)

Independent practice (~7 hours)

  1. Read Stevens Ch 19-20 in full; work through the window-size trace examples
  2. Load fundamentals-http-get.pcap in pcap-tools. In each TCP segment: what is the window size advertised by the receiver? Does it change across the connection? Explain what you observe.
  3. Apply tcp.analysis.retransmission to tall-100-frames.pcap. Do any retransmissions appear? If so, how long after the original segment?
  4. Look up "TCP CUBIC" and "TCP BBR" -- two modern congestion-control algorithms. How do they differ from the classic AIMD algorithm? Which does Linux use by default?
  5. Read about bufferbloat (https://www.bufferbloat.net/). What is it, and why did congestion control not fully solve it?

Reflection prompts (~30 minutes)

  1. Slow start is exponential. Why does TCP start conservatively (small initial window) rather than testing the full network bandwidth immediately?
  2. TCP treats packet loss as a signal of congestion. What happens on a lossy wireless link where packet loss is due to radio interference, not congestion? Is TCP's response correct in that case?
  3. AIMD cuts the window in half on each congestion event. If two TCP flows compete for the same bottleneck link, will they converge to an equal share? Why or why not?
  4. A "zero window" condition freezes the sender until the receiver opens its window. What application-level behavior causes a receiver's window to drop to zero?
  5. TCP's congestion control was designed for a world where the Internet's bottleneck was long-haul links. Today, many bottlenecks are in mobile radio links with highly variable capacity. What design assumptions of classical TCP break down in this environment?

What comes next

Week 8 goes deep into DNS: recursive resolvers, authoritative name servers, the full delegation chain, and how dig +trace shows you the complete path from the root servers to the final answer.