Stripped binaries in dataset cause bad ground truth comparisons #32

@aeflores

I have noticed that several binaries in the dataset are stripped, e.g.:

mthumb_executables/clients/gcc_O2/scp
mthumb_executables/clients/gcc_O2/sftp
mthumb_executables/clients/gcc_O2/ssh
mthumb_executables/clients/gcc_O2/ssh-add
...

Since the comparison scripts use symbols to exclude functions added by the linker (https://github.com/junxzm1990/x86-sok/blob/master/compare/compareInstsArmMips.py#L504), this can cause problems for these binaries.
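To find which binaries in the dataset are affected, one option is to check whether each ELF file still has a full symbol table. The sketch below is only an illustration, not part of the comparison scripts: `has_symtab` is a hypothetical helper that parses the ELF header with the standard `struct` module and looks for a section of type `SHT_SYMTAB`. It assumes a well-formed ELF image and ignores the extended section-numbering case.

```python
import struct

SHT_SYMTAB = 2  # section type of the full symbol table (.symtab)


def has_symtab(data: bytes) -> bool:
    """Return True if the ELF image contains a SHT_SYMTAB section.

    Hypothetical helper for illustration; a stripped binary normally
    has no SHT_SYMTAB section (it may still have SHT_DYNSYM).
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = data[4] == 2                    # EI_CLASS: 1 = ELF32, 2 = ELF64
    endian = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little endian

    if is_64:
        (e_shoff,) = struct.unpack_from(endian + "Q", data, 0x28)
        e_shentsize, e_shnum = struct.unpack_from(endian + "HH", data, 0x3A)
    else:
        (e_shoff,) = struct.unpack_from(endian + "I", data, 0x20)
        e_shentsize, e_shnum = struct.unpack_from(endian + "HH", data, 0x2E)

    for i in range(e_shnum):
        # sh_type is the second 32-bit field of every section header.
        (sh_type,) = struct.unpack_from(
            endian + "I", data, e_shoff + i * e_shentsize + 4
        )
        if sh_type == SHT_SYMTAB:
            return True
    return False
```

Usage would be along the lines of `has_symtab(open(path, "rb").read())` for each binary; files for which it returns `False` are the ones where symbol-based exclusion cannot work.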

For example, in mthumb_executables/clients/gcc_O2/scp, the address 0x28c4 does not appear in the ground truth, even though it is the entry point (readelf reports the entry point as 0x28c5 because this is a Thumb binary, so the low bit of the address is set to mark Thumb mode).
In other words, the _start function is missing from the ground truth, and it cannot be excluded from the comparison because the binary is stripped.
Do you have any advice on how to deal with these cases?
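One possible workaround on my side, which I have not confirmed is appropriate, would be to take the entry point from the ELF header and treat its code address as a linker-added function even when no symbols are available. A minimal sketch, assuming a 32-bit ELF file; `entry_code_address` is a hypothetical helper, not something in the comparison scripts:

```python
import struct

EM_ARM = 40  # e_machine value for 32-bit ARM


def entry_code_address(data: bytes) -> int:
    """Return the address of the first instruction at the ELF entry point.

    On ARM, bit 0 of e_entry selects Thumb mode and is not part of the
    instruction address, so it is cleared here.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 1:
        raise ValueError("this sketch only handles ELF32")
    endian = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little endian

    (e_machine,) = struct.unpack_from(endian + "H", data, 0x12)
    (e_entry,) = struct.unpack_from(endian + "I", data, 0x18)

    if e_machine == EM_ARM:
        return e_entry & ~1   # drop the Thumb bit
    return e_entry
```

For the scp binary above, an `e_entry` of 0x28c5 would map to 0x28c4, which could then be added to the set of excluded function starts. This only covers _start; other linker-added functions would still go undetected in stripped binaries.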
