
Test Framework Architecture

The EmbSec Kit includes a Python-based testing framework that validates both the correctness of lab implementations and the effectiveness of security exploits.

Overview

The test framework (labs/common/test_framework.py) provides:

  • Automated testing of embedded binaries in QEMU
  • Validation of security vulnerabilities
  • Exploit verification
  • Flag extraction and validation
  • CI/CD integration

Core Components

Base Test Classes

LabTestBase

The abstract base class for all lab tests:

class LabTestBase(ABC, unittest.TestCase):
    """Base class for all lab tests"""

    # Override these in derived classes
    LAB_NAME = None
    BUFFER_SIZE = 64
    EXPECTED_MENU_OPTIONS = []
    TIMEOUT = 30

Key features:

  • QEMU process management
  • I/O handling for embedded systems
  • Pattern matching for addresses and flags
  • Test lifecycle management

Specialized Base Classes

BufferOverflowTestBase

  • Specific tests for stack-based vulnerabilities
  • Return address calculation
  • Payload generation helpers
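Return address calculation usually starts with finding how many bytes separate the start of the buffer from the saved return address. One common way to do this, sketched below, is to send a cyclic pattern and look up the value that lands in the program counter. The helper names `cyclic_pattern` and `find_offset` are hypothetical and are not taken from the framework; the pattern layout (`Aa0Aa1...`) is a widely used convention, not necessarily what the base class does.

```python
import itertools
import string
import struct
from typing import Optional


def cyclic_pattern(length: int) -> bytes:
    """Build an Aa0Aa1Aa2... pattern (at most 20280 bytes before it runs out)."""
    out = bytearray()
    for upper, lower, digit in itertools.product(
        string.ascii_uppercase, string.ascii_lowercase, string.digits
    ):
        if len(out) >= length:
            break
        out += (upper + lower + digit).encode("ascii")
    return bytes(out[:length])


def find_offset(pattern: bytes, crashed_pc: int) -> Optional[int]:
    """Return the pattern offset of the 4 bytes that ended up in the PC."""
    needle = struct.pack("<I", crashed_pc)  # little-endian, as on the target
    index = pattern.find(needle)
    return index if index >= 0 else None
```

If the target reports a crash address after receiving the pattern, `find_offset(pattern, address)` gives the padding length to place before the overwritten return address. On Thumb targets the reported address may have its lowest bit altered, so a failed lookup is worth retrying with that bit toggled.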

FormatStringTestBase

  • Format string vulnerability testing
  • Stack offset calculation
  • Memory write verification
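Stack offset calculation for a format string bug typically means finding which `%p` position echoes back the attacker's own input. The sketch below assumes the probe is a marker such as `AAAA` followed by repeated `.%p` specifiers, and that the target prints pointers as `0x...` (some C libraries print `(nil)` for null). The function name `find_format_offset` and the sample leak are hypothetical.

```python
import re
from typing import Optional


def find_format_offset(leak: str, marker: int = 0x41414141) -> Optional[int]:
    """Return the 1-based %p position whose value equals `marker`.

    `leak` is the program's response to a probe such as
    b"AAAA" + b".%p" * 12.
    """
    values = re.findall(r"0x[0-9a-fA-F]+|\(nil\)", leak)
    for position, text in enumerate(values, start=1):
        if text != "(nil)" and int(text, 16) == marker:
            return position
    return None
```

The returned position is the argument index at which the format function reads the start of the input buffer; `None` means the probe did not reach far enough up the stack and more `%p` specifiers are needed.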

Test Lifecycle

  1. Setup Phase

    @classmethod
    def setUpClass(cls):
        # Locate lab binary
        # Configure QEMU command
        # Set up paths
    

  2. Test Execution

    def setUp(self):
        self.proc = None
    
    def tearDown(self):
        if self.proc:
            self.proc.terminate()
    

  3. QEMU Management

    def start_qemu(self) -> subprocess.Popen:
        self.proc = subprocess.Popen(
            self.qemu_cmd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT
        )
        return self.proc
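The contents of `qemu_cmd` are not shown here. As a rough sketch, a command line for a bare-metal ARM image could be assembled as follows. The function name `build_qemu_cmd` and the default machine `lm3s6965evb` are assumptions for illustration; the board model the kit actually uses should be taken from its build configuration.

```python
from pathlib import Path
from typing import List


def build_qemu_cmd(binary: Path, machine: str = "lm3s6965evb") -> List[str]:
    """Assemble a qemu-system-arm command line for a bare-metal image."""
    return [
        "qemu-system-arm",
        "-M", machine,           # board model (assumed; check the kit's config)
        "-kernel", str(binary),  # load the lab image directly
        "-nographic",            # no display; serial I/O goes to stdin/stdout
    ]
```

Passing the command as a list rather than a shell string avoids quoting problems with paths, and `-nographic` is what lets the test drive the guest's serial console through the `stdin` and `stdout` pipes.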
    

Test Categories

1. Binary Validation Tests

def test_01_binary_exists(self):
    """Test that lab binary exists"""
    self.assertTrue(os.path.exists(self.lab_binary))
    self.assertTrue(os.path.exists(self.lab_bin))

2. Normal Execution Tests

def test_02_normal_execution(self):
    """Test normal execution without exploit"""
    self.start_qemu()
    output = self.read_output()
    for option in self.EXPECTED_MENU_OPTIONS:
        self.assertIn(option, output)

3. Security Tests

def test_03_flag_not_accessible_normally(self):
    """Test that flag is not accessible without exploit"""
    # Try all menu options
    for i in range(1, 5):
        output = self.get_menu_choice(str(i))
        self.assertNotIn("embsec{", output)

4. Vulnerability Tests

def test_04_vulnerability_exists(self):
    """Test that the vulnerability exists"""
    # Lab-specific implementation

5. Exploit Tests

def test_05_exploit_gets_flag(self):
    """Test that exploit successfully gets flag"""
    exploit_data = self.prepare_exploit()
    payload = self.get_exploit_payload(**exploit_data)
    self.send_exploit(payload)
    output = self.read_output()  # capture the program's response to the payload
    flag = self.extract_flag(output)
    self.assertIsNotNone(flag)

6. Determinism Tests

def test_06_flag_deterministic(self):
    """Test that flag is deterministic"""
    # Run exploit multiple times
    # Verify same flag each time
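The two steps in the comments above can be sketched as a small helper. This is an illustration only: `flags_are_deterministic` is a hypothetical name, and `run_exploit` stands in for whatever callable starts a fresh QEMU instance, sends the payload, and returns the extracted flag (or `None`).

```python
from typing import Callable, Optional


def flags_are_deterministic(
    run_exploit: Callable[[], Optional[str]], runs: int = 3
) -> bool:
    """Run the exploit `runs` times (runs >= 1).

    True only if every run yields the same flag and that flag is not None.
    """
    flags = [run_exploit() for _ in range(runs)]
    return flags[0] is not None and len(set(flags)) == 1
```

Inside a test method this could be used as `self.assertTrue(flags_are_deterministic(self.run_once))`, where `run_once` is again a hypothetical per-lab method.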

Exploit Development Support

Address Extraction

def extract_address(self, output: str, pattern: str) -> Optional[int]:
    """Extract address from output using regex pattern"""
    match = re.search(pattern, output, re.IGNORECASE)
    if match:
        return int(match.group(1), 16)
    return None
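A short usage sketch, written as a standalone function so it runs outside the test class. The sample output line and the pattern are made up for illustration; the only requirement visible in the method above is that the pattern's first capture group contains the hexadecimal digits.

```python
import re
from typing import Optional


def extract_address(output: str, pattern: str) -> Optional[int]:
    """Standalone copy of the method above, for demonstration."""
    match = re.search(pattern, output, re.IGNORECASE)
    if match:
        return int(match.group(1), 16)  # int() accepts an optional 0x prefix
    return None


# Hypothetical debug line printed by a lab binary
sample = "Debug: Buffer at: 0x20001f00\r\n"
address = extract_address(sample, r"buffer at:\s*(0x[0-9a-f]+)")
```

Because the search is case-insensitive, the same pattern matches `Buffer at:` and `BUFFER AT:`; a pattern with no match returns `None` rather than raising.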

Flag Validation

def extract_flag(self, output: str) -> Optional[str]:
    """Extract flag from output"""
    match = re.search(r'embsec\{([a-f0-9]{64})\}', output)
    if match:
        return match.group(0)
    return None
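The pattern accepts only a body of exactly 64 lowercase hexadecimal characters, and `group(0)` returns the whole flag including the `embsec{...}` wrapper. The following standalone sketch exercises both properties with synthetic flags; the sample console text is invented.

```python
import re
from typing import Optional

FLAG_RE = re.compile(r"embsec\{[a-f0-9]{64}\}")


def extract_flag(output: str) -> Optional[str]:
    """Standalone equivalent of the method above."""
    match = FLAG_RE.search(output)
    return match.group(0) if match else None


good = "embsec{" + "0f" * 32 + "}"       # 64 lowercase hex characters
truncated = "embsec{" + "0f" * 31 + "}"  # only 62 characters
uppercase = "embsec{" + "0F" * 32 + "}"  # wrong case

found = extract_flag("Access granted!\r\n" + good + "\r\n> ")
```

A truncated or upper-case flag is rejected, which helps catch output that was cut off by a short read.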

Binary Packing Helpers

def p32(addr: int) -> bytes:
    """Pack 32-bit address for ARM little-endian"""
    return struct.pack("<I", addr)

def p16(val: int) -> bytes:
    """Pack 16-bit value"""
    return struct.pack("<H", val)
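As a quick check of the byte order, the sketch below packs an example address and unpacks it again. The address `0x08000101`, the padding length of 72, and the inverse helper `u32` are illustrative assumptions, not values from any lab.

```python
import struct


def p32(addr: int) -> bytes:
    """Pack a 32-bit value as little-endian bytes."""
    return struct.pack("<I", addr)


def u32(data: bytes) -> int:
    """Inverse of p32 (hypothetical helper, not part of the framework)."""
    return struct.unpack("<I", data)[0]


packed = p32(0x08000101)          # least significant byte comes first
payload = b"A" * 72 + packed + b"\n"
```

`struct.pack("<I", ...)` raises `struct.error` for negative values or values above `0xFFFFFFFF`, so an address computed with a wrong sign or width fails loudly instead of being silently truncated.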

Lab-Specific Implementation

Each lab extends the base classes:

class TestBufferOverflow(BufferOverflowTestBase):
    LAB_NAME = "buffer-overflow"
    BUFFER_SIZE = 64
    EXPECTED_MENU_OPTIONS = ["Login", "Debug", "Exit"]

    def get_exploit_payload(self, target: int, offset: int) -> bytes:
        payload = b"A" * offset
        payload += p32(target)
        payload += b"\n"
        return payload

Integration with Build System

CMake Integration

The build system automatically discovers and registers tests:

if(EXISTS ${CMAKE_SOURCE_DIR}/labs/${LAB_NAME}/tests/test_lab.py)
    add_test(
        NAME ${LAB_NAME}
        COMMAND python3 ${CMAKE_SOURCE_DIR}/labs/${LAB_NAME}/tests/test_lab.py
        WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
    )
endif()

CI/CD Integration

Tests run automatically in GitLab CI:

test:labs:
  stage: test
  script:
    - ./tools/scripts/test_all_labs.sh -b ${BUILD_DIR} -v
  artifacts:
    reports:
      junit: test_results/junit.xml

Advanced Features

Non-Blocking I/O

def read_output(self, size: int = 4096) -> str:
    # Set non-blocking mode
    import fcntl
    fd = self.proc.stdout.fileno()
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

Timeout Handling

while time.time() - start_time < 2:  # 2-second timeout
    try:
        chunk = self.proc.stdout.read(size)
    except BlockingIOError:
        chunk = None  # no data available yet
    if chunk:
        output += chunk
    else:
        time.sleep(0.1)  # avoid busy-waiting on an empty pipe
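The two fragments above can be combined into one self-contained reader. The sketch below is an assumption about how such a helper might look, not the framework's actual `read_output`: it uses `os.set_blocking` and `os.read` instead of `fcntl` and the buffered file object, and the name `read_available` is hypothetical. It targets POSIX systems.

```python
import os
import subprocess
import sys
import time


def read_available(proc: subprocess.Popen, timeout: float = 2.0,
                   size: int = 4096) -> str:
    """Collect what the child writes to stdout within `timeout` seconds."""
    fd = proc.stdout.fileno()
    os.set_blocking(fd, False)  # same effect as setting O_NONBLOCK via fcntl
    chunks = []
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            chunk = os.read(fd, size)
        except BlockingIOError:
            time.sleep(0.05)  # pipe is empty; wait briefly and retry
            continue
        if not chunk:  # empty bytes: the child closed its stdout
            break
        chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Demo with a short-lived child process standing in for QEMU
child = subprocess.Popen(
    [sys.executable, "-c", "print('1. Login')"], stdout=subprocess.PIPE
)
text = read_available(child, timeout=5.0)
child.wait()
```

Using `time.monotonic()` keeps the deadline unaffected by wall-clock adjustments, and decoding with `errors="replace"` prevents stray non-UTF-8 bytes from an exploit response from aborting the test.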

Pattern-Based Testing

The framework supports pattern-based testing for different vulnerability types:

  • Stack buffer overflows
  • Format string bugs
  • Integer overflows
  • Use-after-free
  • Race conditions

Best Practices

  1. Inherit from the appropriate base class for the vulnerability type
  2. Give tests clear names that follow the numbered convention (test_01_, test_02_, ...)
  3. Implement the abstract methods for lab-specific behavior
  4. Use the helper functions for common operations
  5. Validate both success and failure cases

Debugging Tests

Verbose Output

python3 test_lab.py -v

Individual Test

python3 test_lab.py TestBufferOverflow.test_05_exploit_gets_flag

Debug Mode

DEBUG=1 python3 test_lab.py

Future Enhancements

  • Hardware testing support via serial interface
  • Performance benchmarking
  • Fuzzing integration
  • Coverage analysis
  • Exploit reliability metrics