HTTP is the application-layer protocol that delivers web pages, APIs, and much of the traffic on the Internet. This week you read a complete HTTP/1.1 conversation at the packet level and take a full tour of the academy pcap-tools analysis workbench.
Theme
HTTP is a text-based request-response protocol. The client sends a request line (method + path + version), a set of headers, and optionally a body. The server sends a status line (version + status code + reason phrase), a set of headers, and optionally a body. That structure has been stable since HTTP/1.1 was standardized in 1997. HTTP/2 changed the framing (from text to binary), and HTTP/3 moved the transport from TCP to QUIC -- but the request-response semantics are unchanged. This week you read the original text format in a capture and learn to interpret every field.
Figure 9.1. Week 9 sits at the amber core of the onion. By now your captures show four nested rings of headers wrapping every payload your browser writes. This week you finally read what is inside the innermost ring: the HTTP text the application layer wrote on top of TCP, IP, and Ethernet.
Reading (~45 minutes)
- Kurose & Ross Ch 2 §2.2 ("The Web and HTTP"): HTTP request and response; persistent connections; HTTP/2 and HTTP/3 briefly
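- Stevens TCP/IP Illustrated Ch 24 ("HTTP") if available in your edition; otherwise: RFC 7230 §2-3 (HTTP/1.1 Message Syntax and Routing): the request-line, header fields, message-body. RFC 7230 has since been obsoleted by RFC 9110 and RFC 9112; the message format described this week is the same in both.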
- Optional: the MDN Web Docs page "HTTP Messages" (https://developer.mozilla.org/en-US/docs/Web/HTTP/Messages) -- a visual complement to the RFCs
Lecture outline (~2 hours)
Section 1: HTTP request anatomy
An HTTP/1.1 GET request looks like this on the wire:
GET /index.html HTTP/1.1\r\n
Host: virtuscyberacademy.org\r\n
User-Agent: Mozilla/5.0 ...\r\n
Accept: text/html,application/xhtml+xml\r\n
Connection: close\r\n
\r\n
- Request line: method, path (URI), version. Separated by spaces; terminated with CRLF.
- Headers: Field-Name: value pairs, each terminated with CRLF. Field names are case-insensitive.
- Blank line: a CRLF with no header before it signals the end of the headers. Anything after it is the body (for POST, PUT, PATCH).
- The blank line is mandatory. HTTP parsers look for the CRLFCRLF sequence to find the boundary between headers and body.
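The anatomy above can be sketched in a few lines of Python. This is a minimal illustration, not a production parser: the function name parse_request is made up for this example, and it assumes the whole request is already in memory and that no header is repeated.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP/1.1 request into (method, path, version, headers, body)."""
    # The first CRLFCRLF marks the boundary between headers and body.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line: the header section is incomplete")

    lines = head.decode("iso-8859-1").split("\r\n")

    # Request line: method, path, version, separated by single spaces.
    method, path, version = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers, body


raw = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: virtuscyberacademy.org\r\n"
    b"Accept: text/html,application/xhtml+xml\r\n"
    b"Connection: close\r\n"
    b"\r\n"
)
method, path, version, headers, body = parse_request(raw)
print(method, path, version)
print(headers["host"])
```

Because a GET carries no body, the bytes after the blank line are empty here. Feeding the function a request without the blank line raises an error, which mirrors why parsers treat the blank line as mandatory.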
Common HTTP methods:
| Method | When used |
|---|---|
| GET | Retrieve a resource; no body |
| POST | Send data to be processed; has a body |
| PUT | Replace a resource; has a body |
| DELETE | Remove a resource; no body |
| HEAD | Retrieve headers only (no body in response) |
| OPTIONS | Discover which methods the server supports |
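To see how the "has a body" column changes the bytes on the wire, here is a rough Python sketch that serializes a request. The helper build_request, the /submit path, and the example.test host are hypothetical; the sketch simply adds Content-Length whenever a body is present so the receiver knows where the body ends.

```python
def build_request(method: str, path: str, host: str,
                  body: bytes = b"", extra_headers=None) -> bytes:
    """Serialize a minimal HTTP/1.1 request (illustrative helper, not a library API)."""
    headers = [("Host", host)]
    headers.extend(extra_headers or [])
    if body:
        # A request with a body must tell the server how long the body is.
        headers.append(("Content-Length", str(len(body))))

    head = f"{method} {path} HTTP/1.1\r\n"
    head += "".join(f"{name}: {value}\r\n" for name, value in headers)
    # The blank line (a bare CRLF) ends the headers; the body follows it.
    return head.encode("ascii") + b"\r\n" + body


get_req = build_request("GET", "/index.html", "virtuscyberacademy.org")
post_req = build_request(
    "POST", "/submit", "example.test",  # hypothetical path and host
    body=b"name=alice",
    extra_headers=[("Content-Type", "application/x-www-form-urlencoded")],
)
print(get_req.decode("ascii"))
print(post_req.decode("ascii"))
```

The GET output ends at the blank line, while the POST output continues with the form data after it.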
Section 2: HTTP response anatomy
HTTP/1.1 200 OK\r\n
Content-Type: text/html; charset=utf-8\r\n
Content-Length: 1234\r\n
Server: nginx\r\n
\r\n
<!DOCTYPE html>
...body...
- Status line: version, status code, reason phrase.
- Headers: same format as the request.
- Blank line: separates headers from body.
- Body: the response content; its length is determined by the Content-Length header (or by Transfer-Encoding: chunked).
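The two ways of delimiting the body can be sketched as follows. This is a simplified Python illustration with made-up function names (split_response, decode_chunked); it assumes the full response is in memory and ignores chunk trailers. In chunked encoding, each chunk is a hexadecimal size line, CRLF, that many bytes of data, and another CRLF; a chunk of size 0 ends the body.

```python
def decode_chunked(data: bytes) -> bytes:
    """Reassemble a chunked body; trailers after the final chunk are ignored."""
    body = b""
    while True:
        size_line, _, data = data.partition(b"\r\n")
        size = int(size_line.split(b";")[0], 16)  # drop any chunk extensions
        if size == 0:
            return body
        body += data[:size]
        data = data[size + 2:]  # skip the chunk data and its trailing CRLF


def split_response(raw: bytes):
    """Return (version, status code, reason, headers, body) for a raw response."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    parts = lines[0].split(" ", 2)  # status line: version, code, reason phrase
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    if headers.get("transfer-encoding", "").lower() == "chunked":
        body = decode_chunked(rest)
    elif "content-length" in headers:
        body = rest[: int(headers["content-length"])]
    else:
        body = rest  # no framing header: the body runs until the connection closes
    return version, code, reason, headers, body


fixed = (b"HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=utf-8\r\n"
         b"Content-Length: 15\r\n\r\n<!DOCTYPE html>")
chunked = (b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n"
           b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n")
print(split_response(fixed)[4])
print(split_response(chunked)[4])
```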
HTTP status code classes:
| Range | Class | Examples |
|---|---|---|
| 2xx | Success | 200 OK, 201 Created, 204 No Content |
| 3xx | Redirection | 301 Moved Permanently, 302 Found, 304 Not Modified |
| 4xx | Client error | 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found |
| 5xx | Server error | 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable |
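Since the class is carried entirely by the first digit, classifying a status code is a single integer division. The short Python sketch below uses a hypothetical helper named status_class; it also includes the 1xx (Informational) class, which HTTP defines but the table above does not list.

```python
STATUS_CLASSES = {
    1: "Informational",  # defined by HTTP, not listed in the table above
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code: int) -> str:
    """Map a three-digit status code to its class using the first digit."""
    return STATUS_CLASSES.get(code // 100, "Unknown")


for code in (200, 304, 404, 503):
    print(code, status_class(code))
```

This is also how a client is expected to treat a code it does not recognize: fall back to the general meaning of its class.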
Section 3: Key HTTP headers
| Header | Direction | Meaning |
|---|---|---|
| Host | Request | Which virtual host is being requested (required in HTTP/1.1) |
| User-Agent | Request | Client identification string |
| Accept | Request | Acceptable response media types |
| Content-Type | Both | Media type of the body |
| Content-Length | Both | Length of the body in bytes |
| Authorization | Request | Credentials (Basic, Bearer, Digest) |
| Cookie | Request | Stored cookies to send to the server |
| Set-Cookie | Response | Cookies the server wants the client to store |
| Location | Response | Target URL for a redirect |
| Cache-Control | Both | Caching directives |
| Connection | Both | close or keep-alive |
In a capture: Wireshark decodes all standard HTTP headers in the protocol tree under "Hypertext Transfer Protocol."
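One detail worth keeping in mind when reading headers out of a capture: some fields, Set-Cookie in particular, can appear more than once in a single message, so a plain dictionary would lose values. A minimal Python sketch, assuming the header lines have already been split on CRLF (the helper names and the cookie values are invented for illustration):

```python
def parse_header_lines(lines):
    """Return headers as (name, value) pairs, preserving order and repeats."""
    pairs = []
    for line in lines:
        name, _, value = line.partition(":")
        pairs.append((name.strip(), value.strip()))
    return pairs


def get_all(pairs, name):
    """Case-insensitive lookup that returns every value for a field name."""
    wanted = name.lower()
    return [value for key, value in pairs if key.lower() == wanted]


lines = [
    "Content-Type: text/html; charset=utf-8",
    "Set-Cookie: session=abc123; HttpOnly",  # hypothetical cookie values
    "set-cookie: theme=dark",
    "Location: /login",
]
pairs = parse_header_lines(lines)
print(get_all(pairs, "Set-Cookie"))
print(get_all(pairs, "location"))
```

The lookup matches both spellings of Set-Cookie because field names are compared case-insensitively, and it keeps both cookies instead of overwriting the first.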
Section 4: Persistent connections and pipelining
- HTTP/1.0: one request per TCP connection; connection closes after each response. High overhead.
- HTTP/1.1: persistent connections by default; no header is needed to keep the connection open (Connection: keep-alive is the HTTP/1.0 opt-in, and Connection: close asks for the connection to be closed after the response). Multiple requests travel over the same TCP connection, and the server sends responses in order.
- Pipelining: a client can send multiple requests without waiting for the first response. Rarely used in practice because of head-of-line blocking: if the first response is slow, all subsequent responses are delayed.
- HTTP/2: binary framing; multiplexes many requests over one TCP connection without HTTP-level head-of-line blocking (a lost TCP segment can still stall every stream); header compression (HPACK). The response to pipelining's limitations.
- HTTP/3: moves transport from TCP to QUIC (UDP-based); eliminates TCP-level head-of-line blocking.
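Head-of-line blocking is easier to see with numbers. The toy Python model below is a simplification under stated assumptions: the server works on all requests at once, each response becomes ready at a given time in seconds, and transmission time is ignored. With pipelining, responses must leave in request order; with multiplexing, each leaves as soon as it is ready. The function names and timings are illustrative only.

```python
from itertools import accumulate


def pipelined_delivery(ready_times):
    """In-order delivery: a response cannot leave before every earlier one has."""
    return list(accumulate(ready_times, max))


def multiplexed_delivery(ready_times):
    """Independent streams: each response leaves as soon as it is ready."""
    return list(ready_times)


# Hypothetical timings: one slow response followed by two fast ones.
ready = [3.0, 0.1, 0.1]
print(pipelined_delivery(ready))    # [3.0, 3.0, 3.0]
print(multiplexed_delivery(ready))  # [3.0, 0.1, 0.1]
```

In the pipelined case the two fast responses wait behind the slow one, which is the application-level blocking that HTTP/2 multiplexing removes. The model says nothing about packet loss, which is where the TCP-level blocking addressed by HTTP/3 comes in.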
Section 5: curl as a diagnostic tool
curl lets you send HTTP requests and inspect responses from the command line:
curl -v https://virtuscyberacademy.org/ # verbose: shows headers + body
curl -I https://virtuscyberacademy.org/ # HEAD request: headers only
curl -s -o /dev/null -w "%{http_code}" https://virtuscyberacademy.org/ # status code only (-s hides the progress meter)
curl -H "Accept: application/json" https://api.example.com/data # custom header
curl --http1.1 -v https://virtuscyberacademy.org/ # force HTTP/1.1
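In a capture, curl traffic follows the same protocol rules as browser traffic. The most visible difference is the User-Agent header; curl also sends fewer request headers by default. Because the URLs above use https://, the HTTP text is encrypted on the wire and is readable in a capture only for plain-HTTP targets or after TLS decryption.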
Labs (~90 minutes)
Lab 9-1: HTTP GET Trace (labs/lab-9-1-http-get.md)
Lab 9-2: pcap-tools Workbench Tour (labs/lab-9-2-pcap-tools-tour.md)
Independent practice (~7 hours)
- Read Kurose & Ross Ch 2 §2.2 in full; pay attention to the persistent-connection discussion
- Open your browser's developer tools (F12), go to the Network tab, and load https://virtuscyberacademy.org/. How many HTTP requests does a single page load generate? What is the most common status code? Are there any 3xx redirects?
- Load fundamentals-http-get.pcap in pcap-tools. Find the HTTP request packet. What method? What path? What Host header? Find the HTTP 200 OK response. What Content-Type? What Content-Length?
- Load http-get.pcap (the upstream-mirrored Wireshark sample). Compare it to the fundamentals version. Are there any differences in the headers? In the body?
- Run curl -v --http1.1 https://virtuscyberacademy.org/. Write down every header in the request and response. Identify any headers that a security analyst would examine when assessing a web server.
Reflection prompts (~30 minutes)
- HTTP/1.1 headers are plain text. The value of a header can be read by anyone who can see the TCP stream. For a site served over plain HTTP (not HTTPS), what information can a passive observer learn from the headers alone?
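- HTTP/2 compresses headers using HPACK. One consequence: a header that repeats across many requests is sent in full only once and is afterwards referenced by a short index. How could header compression create a security problem? (Look up the "CRIME attack," which targeted the DEFLATE-based compression used in TLS and SPDY and influenced HPACK's design.)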
- The User-Agent header identifies the client software. Websites use it for browser compatibility decisions. What are the privacy implications of a detailed User-Agent string?
- A 301 redirect says "this resource has permanently moved to a new URL." A 302 says "temporarily moved." How does a browser handle each? How does a search engine's crawler handle each?
- HTTP is stateless: each request is independent. Cookies are the mechanism that adds state (sessions, logins, preferences). What would web applications look like if cookies did not exist?
What comes next
Week 10 covers TLS: the encryption layer that turns HTTP into HTTPS. You will trace a TLS handshake in a capture, identify the ClientHello and ServerHello, and work through the academy's Wireshark CVE quartet mini-module.