Identify 10 mystery files using only file, xxd, and strings. No running. Read-before-run discipline from day one.
Overview
You are given 10 files named file-01 through file-10 with no extensions and no other metadata. Your task is to identify each file's format, provide the evidence that supports your identification, and describe the next analysis step you would take if you were actually investigating the file.
Allowed tools: file, xxd, strings only. No other tools. No running the binaries.
Time: ~90 minutes.
Setup
The lab files are provided by the instructor as a tar archive or as a directory. Place them in a working directory (the commands below assume the archive):
mkdir lab1 && cd lab1
# instructor provides: lab1-files.tar.gz
tar xzf lab1-files.tar.gz
ls -la
Verify you have 10 files named file-01 through file-10.
Self-paced fallback: If you are working without an instructor, see labs/_artifacts/README.md ("Self-paced fallback: Lab 1") for a script that builds an equivalent 10-file set from system binaries and cross-compiled ELFs. The file types you will identify are the same as the cohort version.
Part A: Initial triage with file
Run the file command on all 10 files at once:
file file-*
Record the output in your lab notebook. For each file:
- What format does file report?
- Does the report seem complete, partial, or uncertain?
Some files may report "data" or "ASCII text" or something unexpected. Those are the interesting ones.
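One way to keep the Part A output for your notebook and pull out the generic answers is sketched below. It uses only file and shell builtins. The function names triage and flag_generic and the log name triage-file.txt are hypothetical, and the three patterns it flags ("data", "ASCII text", "empty") are an assumption about what counts as a generic answer, so extend them as needed. Note that the file-* glob matches all ten names, including file-10.

```shell
# triage DIR: run file on every file-* in DIR, one output line per file.
# (Hypothetical helper name; DIR defaults to the current directory.)
triage() {
    for f in "${1:-.}"/file-*; do
        [ -e "$f" ] || continue      # skip the literal pattern if nothing matches
        file "$f"
    done
}

# flag_generic: read file output on stdin and print only the lines where
# file fell back to a generic answer. The pattern list is an assumption.
flag_generic() {
    while IFS= read -r line; do
        case "$line" in
            *": data"|*": ASCII text"*|*": empty")
                printf 'REVIEW: %s\n' "$line" ;;
        esac
    done
}

# Save the full output for the notebook, then list the entries to revisit.
triage . > triage-file.txt
flag_generic < triage-file.txt
```

The lines marked REVIEW are the ones to look at first in Parts B and C.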
Part B: Magic number verification with xxd
For each file, look at the first 16 bytes:
xxd -l 16 file-01
Compare the first bytes to the magic number table from Week 2:
- 7F 45 4C 46 = ELF
- 4D 5A = PE (Windows executable)
- 25 50 44 46 = PDF
- 89 50 4E 47 = PNG
- 50 4B 03 04 = ZIP
- 1F 8B = gzip
For each file, record:
- First 4 bytes in hex
- Does the magic number match what file reported? If not, why might they differ?
If a file shows unexpected or missing magic bytes (for example, the magic number is at a nonzero offset, or the file appears to have been partially truncated), note it and try xxd -s 0 -l 64 file-XX to see more of the beginning.
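The table comparison can also be written as a small shell function, which is a minimal sketch rather than a required part of the lab. It uses only xxd and shell builtins. The name magic_guess is hypothetical, and it only checks offset 0 against the six entries listed above, so a "no match in table" result means you should look further, not that the file has no format.

```shell
# magic_guess FILE: print the file name, its first four bytes as hex,
# and a format guess taken from the magic number table.
# (Hypothetical helper; checks offset 0 only.)
magic_guess() {
    hex=$(xxd -p -l 4 "$1")      # plain lowercase hex, e.g. 7f454c46
    case "$hex" in
        7f454c46) fmt="ELF" ;;
        4d5a*)    fmt="PE (Windows executable)" ;;
        25504446) fmt="PDF" ;;
        89504e47) fmt="PNG" ;;
        504b0304) fmt="ZIP" ;;
        1f8b*)    fmt="gzip" ;;
        *)        fmt="no match in table" ;;
    esac
    printf '%s  %s  %s\n' "$1" "$hex" "$fmt"
}

# Apply it to every lab file in the current directory.
for f in file-*; do
    [ -e "$f" ] || continue      # skip the literal pattern if nothing matches
    magic_guess "$f"
done
```

Compare each guess against the Part A output. A disagreement between the two is worth a note in the Evidence field of your report.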
Part C: String content survey with strings
For each file, extract strings of 8+ characters:
strings -n 8 file-01 | head -20
Record interesting strings you find: version strings, paths, error messages, compiler signatures. For binary files (ELF, PE), the string content tells you something about what the binary does. For data files (PNG, PDF, ZIP), the strings may reveal metadata.
For at least three of the files, write a sentence about what the string content tells you that the magic number alone did not.
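When a string looks interesting, it helps to know where in the file it sits. The sketch below adds the -t x option to strings, which prefixes each string with its offset in hexadecimal, so you can jump to the same spot with xxd. The function name strings_survey is hypothetical; the loop reuses head, as in the command above.

```shell
# strings_survey FILE: list strings of 8+ characters, each prefixed with
# its hex offset in the file. (Hypothetical helper name.)
strings_survey() {
    strings -n 8 -t x "$1"
}

# Print the first 20 strings for every lab file, with a header per file.
for f in file-*; do
    [ -e "$f" ] || continue      # skip the literal pattern if nothing matches
    printf '== %s ==\n' "$f"
    strings_survey "$f" | head -20
done
```

For example, if a string is reported at offset 2a40 in file-01, then xxd -s 0x2a40 -l 64 file-01 shows the bytes around it.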
Lab Report
For each of the 10 files, write a structured entry:
File: file-01
Format: [your identification]
Magic bytes: [first 4 bytes in hex]
Evidence: [what file, xxd, or strings showed you]
Next step: [what you would do next if investigating this file]
At the end of the 10 entries, write a brief paragraph (100-150 words) answering:
- Which file was hardest to identify, and why?
- Were there any cases where file was wrong or ambiguous? How did xxd or strings help?
- What does this lab tell you about the reliability of file extensions as a format indicator?
Grading
| Criterion | Points |
|---|---|
| All 10 files correctly identified with format name | 40 |
| Magic bytes recorded for each file | 20 |
| Evidence cited (which tool, what output) for each identification | 20 |
| "Next step" is specific and appropriate to the identified format | 10 |
| Reflection paragraph addresses all three questions | 10 |
| Total | 100 |
A correct identification with no evidence cited earns partial credit. An incorrect identification with well-reasoned evidence earns more credit than a correct guess with no evidence -- the evidence is the point.
Lab 1 of 9. Due: end of Week 1. Weeks 2 and 3 build directly on the file identification skills practiced here.