Week 2: OSINT and Passive Reconnaissance · PEN-101

Passive reconnaissance collects publicly available information without touching the target's infrastructure. Done well, it gives you the target's attack surface, employee names, technology stack, and organizational chart before you have sent a single packet. Done carelessly, it leaves traces in the target's logs and announces your interest before you have decided to proceed.

Reading (~1.5 hr)

Required:

Weidman, Penetration Testing, Chapter 5 ("Information Gathering"), pp. 67-86. Covers OSINT sources (Netcraft, WHOIS, DNS, email harvesting, Maltego) and port scanning with Nmap. Read the OSINT sections before lecture; port scanning is Week 3.
HP3, Chapter 2 ("Before the Snap -- Red Team Recon"), pp. [see book]. Read the monitoring, subdomain discovery, certificate transparency, GitHub, and cloud-scanning subsections. HP3 covers OSINT at the red-team level; the techniques are directly applicable to authorized engagements.

Supplementary:

PTES, Intelligence Gathering section (pentest-standard.org/Intelligence_Gathering). Read the OSINT subsections.
OWASP Web Security Testing Guide (WSTG) v4.2, Information Gathering section, test IDs WSTG-INFO (owasp.org/www-project-web-security-testing-guide/). Browser-based reconnaissance against web applications.

Lecture outline (~1 hr)

Part 1: What passive recon is and why the distinction matters (15 min)

Passive reconnaissance collects publicly available information. The defining constraint: no packet you send should land in the target's logs. If you query the target's authoritative DNS server directly, that query is logged. If you run a port scan, the target's firewall may log it. If you browse the target's website, the web server logs your IP.

Passive recon uses sources that are not the target: WHOIS databases, certificate transparency logs, search engine caches, DNS resolvers that are not the target's, third-party OSINT aggregators, GitHub, LinkedIn, job postings, Shodan, Censys.

The professional reason this matters: the passive/active distinction is written into the ROE. Clients often require an OSINT-only phase before they authorize active probing or exploitation, because they want to see what an outsider can learn from public sources alone. More practically, aggressive scanning from your IP identifies you to an alert blue team immediately, and on real engagements stealth is often part of the assessment.

Part 2: Domain and registration intelligence (20 min)

WHOIS: Registered domains have a WHOIS (or RDAP) record listing the registrar, registration and expiration dates, name servers, and registrant contact information. Personal contact details are often hidden behind privacy proxies or redacted by the registry, but corporate registrations often retain real contacts.

whois example.com

Note the authoritative name servers -- these tell you the DNS hosting provider, which is often different from the web hosting provider. The registrant and administrative contacts, when not redacted, give you real names and the organization's email address format. The registration date tells you when the domain was first registered, a rough proxy for how long the company has had a web presence.
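WHOIS output is loosely structured "Key: Value" text, and field labels differ between registries. The sketch below pulls out the fields discussed above, assuming the labels used by many gTLD registries (Name Server, Creation Date, Registrar). The function name parse_whois and the sample record are illustrative, not real registry output.

```python
from collections import defaultdict

# Field labels as they appear in many gTLD WHOIS responses.
# This is an assumption: other registries use different labels.
FIELDS = {"registrar", "creation date", "registry expiry date", "name server"}


def parse_whois(text: str) -> dict[str, list[str]]:
    """Collect selected 'Key: Value' fields from raw WHOIS text."""
    record: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if sep and key in FIELDS and value:
            # Hostnames are case-insensitive; normalize them for comparison.
            record[key].append(value.lower() if key == "name server" else value)
    return dict(record)


# Made-up sample for illustration only.
SAMPLE = """\
Domain Name: EXAMPLE.COM
Registrar: Example Registrar, Inc.
Creation Date: 1995-08-14T04:00:00Z
Registry Expiry Date: 2030-08-13T04:00:00Z
Name Server: NS1.EXAMPLE-DNS.NET
Name Server: NS2.EXAMPLE-DNS.NET
"""

record = parse_whois(SAMPLE)
print(record["name server"])
```

In practice you would feed it the saved output of the whois command and record the raw text alongside the parsed fields as evidence.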

DNS enumeration:

Record types to collect:

A records -- IPv4 addresses for hostnames
AAAA records -- IPv6
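MX records -- mail servers (reveal the email provider; critical for the social-engineering phase later)
TXT records -- SPF, DKIM, and DMARC policies reveal the email authentication configuration (SPF sits on the domain itself, DMARC at _dmarc.example.com, DKIM at <selector>._domainkey.example.com); SPF entries also list third-party mail senders and, when misconfigured, sometimes expose internal domains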
NS records -- authoritative name servers
CNAME records -- aliases; reveal CDN, hosting providers

Tools:

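nslookup example.com
dig example.com MX +short   # ask for one record type at a time; many servers refuse or minimize ANY queries (RFC 8482)
dig example.com TXT +short
host -t ns example.com
dnsrecon -d example.com -t std -n 8.8.8.8  # comprehensive DNS recon; -n sets the resolver (dnsrecon otherwise defaults to the target's SOA server)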
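Once you have the TXT records, the SPF policy is worth breaking apart, because its include, ip4, and a entries name the mail vendors and hosts allowed to send for the domain. The following is a minimal sketch of that step, not a full SPF parser: it ignores modifiers such as redirect= and mechanisms without a value. The function name parse_spf and the sample record are hypothetical.

```python
def parse_spf(txt: str) -> dict[str, list[str]]:
    """Group the valued mechanisms of one SPF TXT record by type."""
    terms = txt.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    found: dict[str, list[str]] = {
        "include": [], "ip4": [], "ip6": [], "a": [], "mx": [],
    }
    for term in terms[1:]:
        # Drop an optional qualifier (+, -, ~, ?) in front of the mechanism.
        term = term.lstrip("+-~?")
        # Split on the first colon only, so IPv6 values stay intact.
        name, _, value = term.partition(":")
        name = name.lower()
        if name in found and value:
            found[name].append(value)
    return found


# Hypothetical record for illustration.
SAMPLE = (
    "v=spf1 ip4:203.0.113.0/24 include:_spf.mailvendor.example "
    "a:mail.corp.example.com ~all"
)

spf = parse_spf(SAMPLE)
print(spf["include"])
```

Each include target is a lead to follow up in the dossier: it usually identifies a third-party mail or marketing provider.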

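DNS zone transfers: A zone transfer (AXFR) asks the DNS server for all its records at once. Misconfigured authoritative DNS servers will comply, handing you a complete inventory of all hostnames in the zone. Well-configured servers refuse. Note that the request goes straight to the target's name server and is logged there, so it is not passive. When the ROE allows contact with the target's name servers, always try: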

dig @ns1.example.com example.com AXFR

Part 3: Certificate transparency and subdomain discovery (15 min)

Certificate transparency (CT) logs: Publicly trusted CAs submit the certificates they issue to public CT logs, which you can search through front ends such as crt.sh and transparencyreport.google.com. Searching for a domain returns the logged certificates for it and its subdomains, including wildcard certificates, revealing subdomains the organization chose not to advertise.

curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value' | sort -u   # -r prints raw strings, so multi-name entries split onto separate lines before sorting

This single command, run against real targets, has revealed staging environments, development servers, internal tools, and backup systems that were never meant to be publicly listed but received a TLS certificate and were therefore logged.
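If you save the crt.sh JSON to a file instead of piping it through jq, the same cleanup can be done in a few lines. This sketch assumes only the name_value field seen in the command above; the function name hostnames_from_crtsh and the sample data are hypothetical.

```python
import json


def hostnames_from_crtsh(raw_json: str, domain: str) -> list[str]:
    """Return the unique hostnames under `domain` found in crt.sh JSON output."""
    domain = domain.lower()
    suffix = "." + domain
    names: set[str] = set()
    for entry in json.loads(raw_json):
        # One name_value may hold several names separated by newlines.
        for name in entry["name_value"].splitlines():
            name = name.strip().lower()
            # Record the base name behind a wildcard entry.
            if name.startswith("*."):
                name = name[2:]
            # Keep only the domain itself and names beneath it.
            if name == domain or name.endswith(suffix):
                names.add(name)
    return sorted(names)


# Hypothetical response body, trimmed to the one field used here.
SAMPLE = json.dumps([
    {"name_value": "www.example.com\nexample.com"},
    {"name_value": "*.staging.example.com"},
    {"name_value": "VPN.example.com"},
    {"name_value": "cdn.example.net"},
    {"name_value": "www.example.com"},
])

hosts = hostnames_from_crtsh(SAMPLE, "example.com")
print(hosts)
```

Stripping the wildcard prefix is a design choice: a certificate for *.staging.example.com proves that staging.example.com is in use, even though it names no specific host beneath it.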

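Passive subdomain enumeration (semi-passive): Tools like sublist3r, amass, and theHarvester aggregate multiple passive sources (CT logs, DNSdumpster, DNSlytics, Shodan, etc.) to enumerate subdomains without directly querying the target's name servers. Their optional brute-force modes (for example, sublist3r -b) resolve guessed names, and those lookups reach the target's name servers, so leave them off in this phase.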

sublist3r -d example.com -o subdomains.txt

Part 4: GitHub, cloud, and organizational intelligence (10 min)

GitHub: Developers commit sensitive material to public repositories with surprising frequency. Search targets:

The organization's GitHub org: github.com/[orgname]
Employee GitHub profiles linked from the company site or LinkedIn
Search GitHub for the domain name: github.com search for "example.com"

What to look for: API keys, passwords, database connection strings, internal hostnames, Terraform/CloudFormation configs that reveal cloud infrastructure, .env files committed by accident.

Tool: trufflehog or gitleaks can scan a repo for secrets automatically:

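trufflehog git https://github.com/orgname/repo --no-verification  # v3 syntax: the repo URL is positional; --no-verification stops trufflehog from testing found credentials against live services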
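To see what these scanners do at their core, here is a deliberately small sketch of pattern-based secret detection. It is not how trufflehog or gitleaks are implemented: real tools ship many more rules and add entropy checks. The two patterns, the rule names, and the sample file contents are illustrative assumptions.

```python
import re

# Two illustrative rules only; this is a teaching sketch, not a scanner.
PATTERNS = {
    # AWS access key IDs for long-term credentials start with "AKIA".
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # .env-style assignment whose variable name suggests a secret.
    "env_secret": re.compile(
        r"^[ \t]*([A-Z0-9_]*(?:PASSWORD|SECRET|TOKEN|API_KEY)[A-Z0-9_]*)[ \t]*=[ \t]*(\S+)",
        re.MULTILINE,
    ),
}


def scan_text(text: str) -> list[tuple[str, int]]:
    """Return (rule name, line number) for every match, ordered by line."""
    hits = []
    for rule, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Line number = newlines before the match start, plus one.
            line_no = text.count("\n", 0, match.start()) + 1
            hits.append((rule, line_no))
    return sorted(hits, key=lambda hit: hit[1])


# Fabricated file contents; the key ID is the example value from AWS docs.
SAMPLE = """\
DEBUG=true
DB_PASSWORD=hunter2
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
"""

for rule, line_no in scan_text(SAMPLE):
    print(f"line {line_no}: {rule}")
```

The function reports rule names and line numbers rather than the matched values, which keeps credentials out of your notes until you decide how the dossier should handle them.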

Shodan and Censys: Shodan (shodan.io) and Censys (search.censys.io) continuously scan the Internet and index open ports, service banners, and TLS certificates. The scanning itself is active, but they do it, not you: searching their indexes for an organization's IP range or domain reveals what services it exposes without you sending any packets to the target.

# Shodan CLI (requires an API key from a free account; run `shodan init <API_KEY>` once; filtered searches may need a paid plan)
shodan search "hostname:example.com"

LinkedIn and job postings: Job postings reveal the technology stack. "Senior Java developer, Spring Boot, AWS, PostgreSQL" tells you the backend stack. "DevOps Engineer, Terraform, GKE, Datadog" tells you the infrastructure. Employee profiles on LinkedIn reveal team structure. This is open-source information the company chose to publish.

Lab 2: OSINT Dossier (~4 hr, graded)

See labs/lab-2-osint-dossier.md for the full lab.

Target: The instructor-designated lab target (a fictional company with a real domain; details provided at lab start).

Authorization note: The lab target is a domain created and operated by the academy for this purpose. All OSINT in this lab is against the designated domain only. Do not query any other organization's systems.

Deliverable: A structured OSINT dossier covering:

Domain and registration: WHOIS, registration date, registrar, name servers
DNS records: all A, MX, TXT, NS, CNAME records; zone transfer result (succeeded or refused)
Subdomains: CT log results + passive enumeration results
IP ranges and hosting: where the servers live; cloud provider if any
Technology stack inference: web server, frameworks, CMS (from HTTP headers, Wappalyzer, or job postings)
GitHub / code exposure: findings from any public repos associated with the domain
Email and personnel: employee names, email addresses, roles found through public sources
Shodan / Censys: open ports and services indexed by passive scanners
Findings summary: what attack vectors does this recon suggest? Where would you focus Week 3 active recon?

What not to do:

Do not run active port scans against the target. That is Week 3.
Do not submit forms on the target's website or trigger any interactive behavior. Reading public pages is fine, but remember that each page load appears in the web server's logs.
Do not query the target's authoritative DNS server directly. Use resolvers like 8.8.8.8. The one exception is the single zone transfer attempt the deliverable asks for.

Evidence: Document every source used. Every finding must include how you found it. A finding without a source does not exist in a professional dossier.
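One way to hold yourself to that rule is to make the source a required part of the record itself. The sketch below is one possible structure, not a required lab format; the class name Finding and its field names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Finding:
    """One dossier entry: what was found, and how."""
    category: str   # e.g. "subdomain", "dns", "github"
    value: str      # the finding itself
    source: str     # where it came from (tool, site, or database)
    method: str     # the exact command or search used
    found_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def __post_init__(self) -> None:
        # Enforce the rule above: a finding without a source is rejected.
        if not self.source.strip() or not self.method.strip():
            raise ValueError("a finding needs both a source and a method")


finding = Finding(
    category="subdomain",
    value="staging.example.com",
    source="crt.sh",
    method='curl -s "https://crt.sh/?q=%25.example.com&output=json"',
)
print(finding.source, finding.found_at.isoformat())
```

Recording the timestamp in UTC matters because public sources change: a reviewer can only reproduce a finding if they know when the source showed it.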

Independent practice (~3 hr)

Tool exploration (1.5 hr): Install and run theHarvester against the lab target domain. Compare its output to what you found manually.

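sudo apt install theharvester -y
theHarvester -d example.com -l 200 -b crtsh,hackertarget,otx,rapiddns  # source names change between releases (recent ones no longer include google, linkedin, or twitter); run theHarvester -h for the current list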

Shodan exploration (0.5 hr): Create a free Shodan account (shodan.io). Search for the lab target's domain or IP. Note what information Shodan has indexed without your help.
Reflection (1 hr): Write your responses to the reflection prompts below before Week 3.

Reflection prompts

You found a staging server (staging.example.com) with an unindexed login form during CT log enumeration. The production server is listed in the ROE. The staging server is on the same IP block. The ROE says "IP range 203.0.113.0/24." Is the staging server in scope? How do you determine this? What do you do if you are uncertain?
The GitHub recon revealed an .env file committed 18 months ago and then deleted. The commit history still contains the file. What information might be in an .env file? Is this finding still valid even though the file was "deleted"? Does the finding's validity depend on whether the credentials in the file are still active?
You found an employee's LinkedIn profile that lists their job title as "AWS Cloud Engineer" at the target company. What does this tell you about the likely attack surface you will see in Week 3 active recon? How does this shape your scanning priorities?
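For the first prompt, the mechanical part of the question (does an address fall inside the range the ROE names?) can be checked with the standard library. This is a sketch with made-up addresses, and the function name in_roe_range is hypothetical. A True result shows only that the address is inside the written range; it does not by itself settle whether the host is in scope.

```python
import ipaddress


def in_roe_range(address: str, roe_cidr: str) -> bool:
    """Return True if `address` lies inside the CIDR block named in the ROE."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(roe_cidr)


# Hypothetical addresses for illustration.
print(in_roe_range("203.0.113.57", "203.0.113.0/24"))
print(in_roe_range("198.51.100.10", "203.0.113.0/24"))
```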

Toolchain Diary: Week 2 additions

theHarvester -- OSINT aggregation tool; queries public sources for email addresses, subdomains, IPs, and URLs
sublist3r -- Subdomain enumeration via passive sources
shodan -- Search engine over Internet-wide port/service scan data; passive for you because Shodan performs the scanning (search.censys.io is a peer; both index the Internet continuously)
crt.sh -- Certificate transparency log search; reveals the publicly logged TLS certs issued for a domain
dnsrecon -- DNS enumeration: zone transfers, record enumeration, brute-force subdomains

What's next

Week 3 is the transition from passive to active reconnaissance. You will use Nmap and Masscan to scan the authorized lab network, enumerate services, grab banners, and produce a host inventory. The OSINT dossier from Lab 2 informs where to look: the subdomains, IP ranges, and tech-stack hints shape which ports and services deserve the deepest enumeration attention.