Skip to content

symbolic_file.py test fails: Incompatibility between SymbolicFile and libc buffered I/O functions #2672

@dguido

Description

@dguido

Issue: symbolic_file.py Test Failure - Incompatibility Between Symbolic Files and libc Functions

Summary

The symbolic_file.py test consistently fails because Manticore cannot properly handle symbolic file content when it's read through libc functions like getline(). The test expects multiple execution paths but only generates one.

Test Details

  • Location: examples/script/symbolic_file.py
  • Purpose: Demonstrate symbolic file analysis by having Manticore explore both branches of a password check
  • Expected: Multiple execution paths (password matches and doesn't match)
  • Actual: Only one path explored, assertion fails

Root Cause Analysis

The Core Problem

Manticore's symbolic file implementation is incompatible with libc's buffered I/O functions:

  1. Type Mismatch: When a symbolic file is opened, Manticore returns a SymbolicFile Python object, but getline() expects a C FILE* structure
  2. No Function Modeling: getline() is not modeled/hooked by Manticore, so it executes natively without understanding symbolic data
  3. Early Termination: The program exits with code 127 (command/library failure) when trying to use the incompatible file handle

Technical Details

How Symbolic Files Work in Manticore

  • Files marked with --file flag or add_symbolic_file() become symbolic
  • '+' characters in the file are treated as symbolic bytes
  • Manticore creates a SymbolicFile object (Python) with a symbolic array
  • Works at the syscall level (read/write/open)

Why getline() Fails

// In fileio.c
FILE *infile = fopen(fname, "r");  // Returns SymbolicFile, not FILE*
getline(&line, &line_size, infile); // Expects FILE*, crashes

Evidence from Investigation

  1. State lifecycle: States are killed rather than terminated normally
  2. No branching: Fork callbacks never trigger
  3. Hook analysis: File operation hooks don't execute, indicating early failure
  4. Exit code: 127 indicates library/command failure
  5. Workspace: Only generates 1 testcase instead of expected 2+

Code Flow

1. Program starts with symbolic file argument
2. fopen() called -> Manticore returns SymbolicFile object
3. getline() called -> Expects FILE* structure
4. Type mismatch causes crash/undefined behavior
5. State is killed with exit code 127
6. No symbolic data reaches strcmp()
7. No branching occurs

Reproduction Steps

cd manticore/examples/script
python symbolic_file.py
# AssertionError: Should have found more than 1 path through the program

Three Options for Fixing

Option 1: Model getline() in Manticore (Recommended Long-term)

Effort: High
Impact: Would fix all similar issues

Implementation would require:

  • Hook getline() at the binary level
  • Detect when file handle is a SymbolicFile
  • Manually implement getline's behavior with symbolic data
  • Return symbolic bytes to the buffer
# Pseudocode for manticore/native/models.py
def getline(state, lineptr, n, stream):
    if isinstance(stream, SymbolicFile):
        # Read symbolic bytes until newline or EOF
        symbolic_data = stream.read_until_newline()
        state.cpu.write_bytes(lineptr, symbolic_data)
        return len(symbolic_data)
    else:
        # Fall back to native execution
        return state.invoke_model(getline_native, lineptr, n, stream)

Option 2: Rewrite Test to Use Compatible Functions (Recommended Short-term)

Effort: Low
Impact: Fixes this specific test

Replace getline() with direct read() syscall:

// Instead of:
getline(&line, &line_size, infile);

// Use:
int fd = open(fname, O_RDONLY);
read(fd, buffer, 11);

This works because Manticore properly handles read() syscalls with symbolic files.

Option 3: Document as Known Limitation (Immediate)

Effort: Minimal
Impact: Prevents CI failures, sets expectations

Mark the test as expected to fail with clear documentation:

  • Add to known limitations in docs
  • Skip test in CI with explanation
  • Document that symbolic files only work with syscalls, not libc buffered I/O

Impact

  • CI/CD: Test causes false failures in continuous integration
  • User Experience: Users may think symbolic file analysis is broken
  • Documentation: Feature appears to work but has undocumented limitations

Related Code

  • manticore/platforms/linux.py:540-663 - SymbolicFile class
  • manticore/platforms/linux.py:3847-3852 - _sys_open_get_file()
  • manticore/platforms/linux.py:4155-4160 - generate_workspace_files()
  • manticore/native/cli.py:33-34 - add_symbolic_file initialization

Workaround

Users can work around this by:

  1. Using programs that read files with syscalls (read/write) instead of libc functions
  2. Writing custom models for specific libc functions they need
  3. Using concrete files instead of symbolic for programs using buffered I/O

Test History

  • Test was already marked as "flaky" in CI
  • Has been a known intermittent issue
  • Original fileio.c had uninitialized variable bug (fixed in ef8677a)

Recommendation

  1. Immediate: Disable test in CI with skip marker and explanation
  2. Short-term: Create alternative test using syscall-based file reading
  3. Long-term: Implement proper modeling for common libc file functions

References

  • Original issue with uninitialized variables: Fixed in commit ef8677a
  • Similar limitations exist for other libc functions that use FILE* structures
  • Symbolic file feature works correctly with syscall-level operations

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions