WINIC is a platform-independent automated micro-benchmarking tool. It currently supports x86, ARM and RISC-V on Linux. WINIC can automatically determine latency and throughput values for all instructions the given CPU supports.
WINIC currently cannot measure:
- instructions accessing memory (this will be added in the future)
- branches, returns, system calls and privileged instructions
WINIC relies on LLVM and clang to generate and assemble benchmarks. After cloning this repository, run setup.sh to automatically download and build LLVM as well as WINIC. To manage multiple builds, e.g. for multiple platforms in an HPC context, specify --dir <buildName> to build a version of LLVM into ./llvm-build-<buildName> and WINIC into ./build-<buildName>.
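As a sketch of the naming scheme described above, a build named zen4 (a hypothetical name chosen only for illustration) would end up in the directories printed below. The snippet prints the command and the expected paths; it does not run setup.sh itself.

```shell
# Hypothetical build name, e.g. one per target platform.
build_name="zen4"

# Directories that setup.sh --dir is described to use for this name.
llvm_dir="./llvm-build-${build_name}"
winic_dir="./build-${build_name}"

echo "./setup.sh --dir ${build_name}"
echo "LLVM:  ${llvm_dir}"
echo "WINIC: ${winic_dir}"
```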
To calculate throughput and latency, WINIC needs the clock frequency to be fixed, e.g. by using likwid-setFrequencies. Once the frequency is fixed, you can use WINIC as follows:
./winic -f <frequency> MODE [options]
Measure latencies or throughputs.
By default, WINIC measures all available instructions and generates a .yaml file with the results. In addition, a report_mode_timestamp is generated with further information about how the values were obtained and warnings about unusual results. The runtime of a full run strongly depends on the architecture.
| Architecture | Runtime TP | Runtime LAT |
|---|---|---|
| x86 | 23 min | 40 min |
| AArch64 | 23 min | 17 min |
| RISC-V | 8 min | 9 min |
To measure only a range of opcodes, use --minOpcode and --maxOpcode. This is mostly useful for debugging and development.
To measure single instructions, add one or more -i <LLVM_INSTRUCTION_NAME> options.
By default, x87 floating-point instructions are excluded, as they are deprecated and consume a lot of time on architectures that emulate them. Use the --x87FP flag to measure them.
In manual mode, WINIC can execute arbitrary, manually modified benchmark functions.
To run a function called "tp" from file.s and calculate the cycles per instruction, assuming the loop has 12 instructions, run:
winic -f <frequency> MAN --path file.s --funcName tp --nInst 12
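The roles of the fixed frequency and of --nInst can be illustrated with the general relationship between elapsed time and cycles. This is a sketch of the arithmetic only, not WINIC's actual implementation; the frequency unit, elapsed time, and iteration count are made-up values for illustration.

```shell
# All numbers are illustrative assumptions, not WINIC output.
freq_ghz=3.0      # fixed core frequency in GHz (cycles per nanosecond)
elapsed_ns=4000   # wall time spent in the benchmark loop
iterations=1000   # how often the loop body was executed
n_inst=12         # instructions per loop iteration (cf. --nInst 12)

# cycles = time * frequency; divide by the number of executed instructions.
cpi=$(awk -v f="$freq_ghz" -v t="$elapsed_ns" -v i="$iterations" -v n="$n_inst" \
      'BEGIN { printf "%.2f", (t * f) / (i * n) }')

echo "cycles per instruction: ${cpi}"
```

With these numbers the loop takes 12000 cycles for 12000 instructions, so the script prints 1.00. If the frequency changed during the measurement, the same elapsed time would correspond to a different cycle count, which is why the frequency has to be fixed beforehand.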
WINIC may produce incorrect data for some instructions. To do a custom benchmark for an instruction, first run WINIC in TP or LAT mode with -i <LLVM_INSTRUCTION_NAME>. This writes all .s files generated for the benchmark to asm/ and also produces an assembler_out.log. The .s files can then be modified and executed using MAN mode.
By default, TP and LAT mode generate a db_timestamp.yaml file with the results. Use -o/--output <file.yaml> to specify a custom path instead. If the file already exists, the values obtained during the run will overwrite the existing ones; all other values will be left unchanged. This works with single instructions as well as full TP/LAT runs. A standard workflow would therefore be to do a TP run generating a database and then a LAT run updating it.
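The TP-then-LAT workflow could look like the following dry run. The file name results.yaml is a hypothetical choice, and <frequency> is left as a placeholder, as in the usage line; the run helper only prints the commands so the snippet can be tried without a WINIC build.

```shell
# Dry-run helper: prints the command instead of executing it.
# Define it as `run() { "$@"; }` to actually execute the commands.
run() { echo "$@"; }

freq="<frequency>"   # placeholder, fill in the fixed frequency
db="results.yaml"    # hypothetical database path

# First run: creates the database with throughput values.
run ./winic -f "$freq" TP --output "$db"

# Second run: updates the same file with latency values and
# leaves all other existing entries unchanged.
run ./winic -f "$freq" LAT --output "$db"
```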
WINIC automatically uses helper instructions to:
- break dependencies between instructions to measure throughput
- introduce dependencies between instructions to measure latency
All uses of helper instructions are logged in report_timestamp.
If an instruction would need a helper but none can be found, WINIC will fail and report "ERROR_NO_HELPER".
WINIC can only use instructions as helpers if they were measured in the current run, which is a problem when trying to measure single instructions.
The solution is to first do a full run and look up the dependencies of the instruction in the report. The measurement can then be reproduced by supplying all dependencies alongside the instruction, each with its own -i <LLVM_INSTRUCTION_NAME> option.
Note that --output currently does NOT load the values into the internal working databases, so the information read from there can NOT be used as helpers.
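Re-measuring a single instruction together with its helpers might be scripted as sketched below. The instruction names are examples of LLVM x86 opcode names, and the choice of helpers is purely hypothetical; the real list has to be taken from the report of a previous full run. As a dry run, the command is printed rather than executed.

```shell
# Dry-run helper: prints the command instead of executing it.
# Define it as `run() { "$@"; }` to actually execute the command.
run() { echo "$@"; }

freq="<frequency>"          # placeholder, fill in the fixed frequency
target="ADD64rr"            # instruction to re-measure (example name)
helpers="MOV64rr XOR64rr"   # hypothetical helpers taken from the report

# Build one -i option per instruction so the helpers are measured
# in the same run as the target.
args="-i $target"
for h in $helpers; do
    args="$args -i $h"
done

# $args is intentionally unquoted so it splits into separate options.
run ./winic -f "$freq" LAT $args
```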
There are scripts in analysis to compare the measurements on x86 with uops.info as well as to generate useful reference files, which contain comprehensive information about instructions, operands, registers etc. from LLVM. For more details, refer to analysis/README.md.