This repository contains a 2-way superscalar Fetch Unit implementation and a set of branch predictors (TAGE, L-TAGE, perceptron, L-perceptron), developed as part of UIUC’s ECE 511 coursework. The design targets fast and accurate branch prediction to reduce pipeline flushes and the energy wasted on wrong-path work.
## Directory Structure

1. Fetch
   - Bin: Compiled binaries
   - Hdl: RTL implementation (SystemVerilog)
   - Hvl: Testbenches and verification code
   - Lint: Linting reports and checks
   - Pkg: Shared SV packages
   - Sim: Simulation outputs and scripts
   - Synth: Synthesis results and constraints
2. L-perceptron: Long-history perceptron predictor
3. ltage: L-TAGE branch predictor
4. perceptron: Standard perceptron predictor
5. Tage: Tagged Geometric History Length predictor (TAGE)

The Fetch Stage in a superscalar out-of-order CPU aims to maximize throughput with low misprediction penalty:

- Fast Next-Line Predictor (NLP)
  - Small Branch Target Buffer (BTB) + Bi-Modal Table (BIM)
  - Quickly fetches the next sequential block
- Accurate Backup Predictor (TAGE)
  - Refines initial guesses using multi-history patterns
  - Branch Checker triggers an immediate redirect on misprediction

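
The two-level scheme above can be illustrated with a small behavioral model. This is a minimal sketch, not the repository's RTL: the class `NextLinePredictor`, the function `branch_checker`, the table size, the indexing function, and the 8-byte fetch width (two 4-byte RV32I instructions) are all assumptions made for illustration. It also assumes the Branch Checker works by comparing the fast prediction against the backup prediction.

```python
FETCH_BYTES = 8  # assumed: 2 instructions x 4 bytes each


class NextLinePredictor:
    """Toy BTB + bimodal (2-bit counter) predictor. Sizes are illustrative."""

    def __init__(self, entries=64):
        self.entries = entries
        self.btb = {}                # index -> (branch pc, target)
        self.bim = [1] * entries     # 2-bit counters, start weakly not-taken

    def _index(self, pc):
        return (pc >> 2) % self.entries

    def predict(self, pc):
        """Return the predicted next fetch PC for the block at `pc`."""
        i = self._index(pc)
        hit = self.btb.get(i)
        if hit is not None and hit[0] == pc and self.bim[i] >= 2:
            return hit[1]            # BTB hit and counter says taken
        return pc + FETCH_BYTES      # otherwise fall through sequentially

    def update(self, pc, taken, target):
        """Train the counter and, on a taken branch, install the target."""
        i = self._index(pc)
        if taken:
            self.bim[i] = min(3, self.bim[i] + 1)
            self.btb[i] = (pc, target)
        else:
            self.bim[i] = max(0, self.bim[i] - 1)


def branch_checker(nlp_next_pc, backup_next_pc):
    """Return a redirect PC if the backup predictor disagrees, else None."""
    return backup_next_pc if backup_next_pc != nlp_next_pc else None
```

In this sketch the fast predictor answers immediately, and a later, more accurate prediction only costs a redirect when the two disagree.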
Fetched instructions and metadata are held in:
- Fetch Queue: buffers instruction packets
- Fetch Target Queue (FTQ): stores branch addresses and prediction metadata for the ROB
On misprediction or backend redirect, the front-end restarts at the corrected PC.
A backend model covers the full set of pipeline stages (decode, execute, memory, commit), with:
- Register-file updates
- Load/store data-memory responses
Each fetch packet (2 instructions) is executed sequentially to simplify inter-instruction dependencies. At commit, FTQ metadata is checked to detect and recover from any misprediction.
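
The commit-time check can be pictured as follows. This is a hedged sketch rather than the actual testbench code: `FtqEntry`, `FetchTargetQueue`, and the fields they carry are hypothetical names, and a real FTQ entry would hold more predictor state than a single predicted next PC.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class FtqEntry:
    pc: int                 # address of the fetch packet
    predicted_next_pc: int  # what the front-end predicted would follow


class FetchTargetQueue:
    """Toy FTQ: in-order queue of predictions, checked at commit."""

    def __init__(self):
        self.entries = deque()

    def push(self, pc, predicted_next_pc):
        self.entries.append(FtqEntry(pc, predicted_next_pc))

    def commit(self, actual_next_pc):
        """Retire the oldest entry.

        Returns None when the prediction was correct. On a misprediction,
        drops all younger (wrong-path) entries and returns the corrected PC
        at which the front-end should restart.
        """
        entry = self.entries.popleft()
        if entry.predicted_next_pc == actual_next_pc:
            return None
        self.entries.clear()
        return actual_next_pc
```

For example, after pushing predictions for packets at `0x100` and `0x108`, a call to `commit(0x108)` retires the first packet with no redirect, while a following `commit(0x200)` reports a misprediction and empties the queue.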
- SystemVerilog simulator (e.g., Synopsys VCS)
- GNU Make
- SRAM generator (e.g., OpenRAM)
- CBP2016 benchmark traces
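
```bash
# Generate the SRAM macros
cd fetch/sram
make

# Run the top-level testbench in VCS
cd fetch/Sim
make run_vcs_top_tb PROG=../testcode/coremark_rv32i.elf

# Lint the RTL
cd fetch/Lint
make lint

# Synthesize the design
cd fetch/Synth
make synth
```
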
Standalone IPs developed:
- TAGE
- Perceptron
- L-TAGE
- L-Perceptron
Each can be built and benchmarked independently using CBP2016 traces.
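
For readers unfamiliar with perceptron prediction, the following is a minimal behavioral sketch of the general technique (after Jiménez and Lin), not the RTL in this repository. The class name, table size, history length, and PC indexing are assumptions, and the weights are left unbounded here, whereas hardware would saturate them to a fixed bit width.

```python
class PerceptronPredictor:
    """Toy global-history perceptron branch predictor."""

    def __init__(self, num_perceptrons=64, history_len=16):
        self.n = num_perceptrons
        # Training threshold from the original perceptron predictor paper.
        self.theta = int(1.93 * history_len + 14)
        # One weight vector per table entry; index 0 is the bias weight.
        self.weights = [[0] * (history_len + 1) for _ in range(num_perceptrons)]
        # Global history: +1 for taken, -1 for not taken (oldest first).
        self.history = [-1] * history_len

    def _row(self, pc):
        return self.weights[(pc >> 2) % self.n]

    def _output(self, pc):
        w = self._row(pc)
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc):
        """Predict taken when the dot product is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on a wrong or low-confidence prediction, then shift history."""
        y = self._output(pc)
        t = 1 if taken else -1
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self._row(pc)
            w[0] += t
            for i, xi in enumerate(self.history):
                w[i + 1] += t * xi
        self.history = self.history[1:] + [t]
```

A branch that strictly alternates between taken and not taken is a simple case this model learns after a short warm-up, since the outcome is a linear function of the most recent history bit.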
The following benchmark traces were used:
- SHORT_MOBILE-28
- SHORT_MOBILE-56
- SHORT_SERVER-11
- SHORT_SERVER-71
- LONG_MOBILE-4
- LONG_MOBILE-5
- LONG_MOBILE-13
- LONG_SERVER-1

```bash
cd perceptron
make run_vcs_top_tb TRACE=/class/ece411/cbp2016/traces/SHORT_MOBILE-27.bt9.trace
```

- [Avijeet Trivedi](https://github.com/avijeet-trivedi)
- [Jessica Vaz](https://github.com/jessicavaz16)
- [Leon Ku](https://github.com/lku-illinois)