A somewhat simplified pipelined datapath without the control signals can be shown as:
There are 3 types of hazards in pipelined designs:
- Structural Hazards (hardware resource conflicts)
- Data Hazards (an instruction depends on the result of an earlier instruction that is still in the pipeline)
  - Write after write (WAW)
  - Write after read (WAR)
  - Read after write (RAW)
- Control Hazards (due to jump/branch instruction target prediction)
Structural hazards are already mitigated by the RISC-V design used here: it has two separate memories for text and data, a dual-port register file, and a dedicated ALU.
Data Forwarding: An example of a RAW hazard is:
lui x1, 0x0001
addi x1, x1, 0
In `addi`, it should read the `x1` value that is written by `lui`, but due to pipelining it reads the old value:
To resolve the dependency, we can add a forwarding path that forwards the data (generated by the immediate generation unit at the ID stage) from the EX/MEM pipeline register to the input of the EX stage of `addi`.
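The forwarding decision can be sketched as a small behavioral model in Python. This is only an illustration, not the project's HDL: the `PipeReg` type, its field names, and `forward_select` are hypothetical, and the MEM/WB path is the usual textbook companion to the EX/MEM path, assumed here rather than taken from this design.

```python
from typing import NamedTuple


class PipeReg(NamedTuple):
    """Hypothetical snapshot of the fields a forwarding unit inspects."""
    reg_write: bool  # the instruction held in this register writes rd
    rd: int          # destination register index (0..31)


def forward_select(rs: int, ex_mem: PipeReg, mem_wb: PipeReg) -> str:
    """Choose the source of an EX-stage operand that reads register `rs`."""
    # x0 is hard-wired to zero, so it is never forwarded.
    if rs != 0 and ex_mem.reg_write and ex_mem.rd == rs:
        return "EX/MEM"   # the most recent producer takes priority
    if rs != 0 and mem_wb.reg_write and mem_wb.rd == rs:
        return "MEM/WB"   # assumed second path, see the note above
    return "REGFILE"      # no dependency: use the register file value


# lui x1, 0x0001 is one stage ahead of addi x1, x1, 0
lui_in_ex_mem = PipeReg(reg_write=True, rd=1)
idle = PipeReg(reg_write=False, rd=0)
print(forward_select(1, lui_in_ex_mem, idle))  # EX/MEM
```

Checking EX/MEM before MEM/WB matters when two in-flight instructions write the same register: the operand must come from the younger producer.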
The forwarding path can be visualized as:
In-order RISC-V pipelines such as this one are inherently resistant to WAW and WAR hazards:
- WAW: All register writes happen in the same (final) stage and in program order, so an older result is always overwritten by the result of the newer instruction.
- WAR: Since write-back is the last stage of the pipeline, a write always completes after the reads of earlier instructions, so those reads fetch the older data without being affected by the newer write.
A pipeline interlock is needed when there is a load-use hazard, for example a dependency between `lw` and a following `addi`: the EX stage of `addi` needs the loaded data, which is only available in the MEM/WB pipeline register, as shown in the diagram:
Since we can't forward data back in time, we have to stall the pipeline for 1 cycle to align the necessary stages:
Therefore, data hazards can be mitigated by either stalling (LOAD-USE) or forwarding data based on detected dependencies.
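The load-use check can be sketched as follows. This is a behavioral Python model under assumptions: the function and signal names (`load_use_stall`, `id_ex_mem_read`, and so on) are hypothetical, and the example instructions are only illustrative.

```python
from typing import Optional


def load_use_stall(id_ex_mem_read: bool, id_ex_rd: int,
                   if_id_rs1: int, if_id_rs2: Optional[int]) -> bool:
    """Return True when the instruction in ID must wait one cycle.

    id_ex_mem_read: the instruction in EX is a load (e.g. lw)
    id_ex_rd:       destination register of that load
    if_id_rs1/rs2:  source registers of the instruction in ID
                    (rs2 is None for formats without rs2, such as addi)
    """
    if not id_ex_mem_read or id_ex_rd == 0:
        return False  # not a load, or the load targets x0
    return id_ex_rd in (if_id_rs1, if_id_rs2)


# lw x5, 0(x2) followed by addi x6, x5, 1: one stall cycle is needed
print(load_use_stall(True, 5, 5, None))   # True
# lw x5, 0(x2) followed by addi x6, x7, 1: no dependency, no stall
print(load_use_stall(True, 5, 7, None))   # False
```

After the one-cycle stall the loaded value sits in MEM/WB, from where it can be forwarded to the EX stage of the dependent instruction.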
Control hazards can be mitigated by predicting whether the branch will be taken or not, usually by 2 different approaches in the `FETCH` stage:
- Static prediction
- Dynamic prediction
In the next cycle, either `PC+4` or the `target` address will be in the fetch stage. In the `EXECUTE` stage of the branch, if the prediction turns out to be wrong, we have to flush the pipeline and fetch the correct instruction from the correct target.
Here is a diagram view:
As can be seen from the diagram, the two instructions `addi` and `or` have been flushed, essentially adding 2 `NOP` instructions to the pipeline. This is called the branch penalty, and it is a major contributor to the CPI of the processor.
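The effect of the branch penalty on CPI can be estimated with a short calculation. The sketch below assumes an ideal base CPI of 1 and uses made-up workload numbers (20% branches, half of them mispredicted); only the 2-cycle penalty comes from the text above.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: int = 2) -> float:
    """Average CPI once flushed cycles are charged to mispredicted branches."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: 20% of instructions are branches, 50% mispredicted.
cpi = effective_cpi(base_cpi=1.0, branch_fraction=0.2, mispredict_rate=0.5)
print(f"{cpi:.2f}")  # 1.20
```

With these assumed numbers, every fifth instruction is a branch and every second branch costs 2 extra cycles, which raises the average from 1.0 to 1.2 cycles per instruction.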
We can statically predict the branch to be taken for all jump and unconditional branch operations, and branch to be not taken for all conditional branch instructions.
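This static rule only needs the opcode field of the fetched instruction. A minimal Python sketch, assuming the standard RV32I opcodes and a hypothetical helper name `predict_taken`:

```python
# Standard RV32I major opcodes (bits 6:0 of the instruction word)
OPCODE_BRANCH = 0x63  # beq, bne, blt, bge, bltu, bgeu
OPCODE_JALR = 0x67
OPCODE_JAL = 0x6F


def predict_taken(instr: int) -> bool:
    """Static prediction: jumps are taken, conditional branches are not."""
    opcode = instr & 0x7F
    return opcode in (OPCODE_JAL, OPCODE_JALR)


print(predict_taken(0x00C00A6F))  # jal x20, 12      -> True
print(predict_taken(0x00000463))  # beq x0, x0, 8    -> False
print(predict_taken(0x00000013))  # addi x0, x0, 0   -> False
```

The sketch only answers taken or not taken. Computing the predicted target is a separate step; for `jalr` the target depends on a register value, so it may not be available in the fetch stage.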
Control signals can be found in CONTROL-SC.md.
The control unit is implemented as an FSM (TODO: Diagram)
As can be seen from the waveform, the JAL instruction (assembled as `0x00C00A6F`), located at PC `0x5C`, goes through the states `FETCH` -> `DECODE` -> `JAL` -> `FETCH` ...
During the `JAL` state, `PC+4` is written to register `x20`, as can be verified from the test program asm/test_rv32i.s.
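The encoding can also be checked by hand. The following Python sketch decodes the J-type fields of `0x00C00A6F` according to the RV32I instruction format; `decode_jal` is a hypothetical helper written for this check, not part of the project.

```python
def decode_jal(instr: int) -> tuple[int, int]:
    """Return (rd, signed byte offset) of a 32-bit JAL instruction word."""
    if instr & 0x7F != 0x6F:
        raise ValueError("not a JAL instruction")
    rd = (instr >> 7) & 0x1F
    # J-type immediate layout: imm[20|10:1|11|19:12] in bits 31:12
    imm = (((instr >> 31) & 0x1) << 20
           | ((instr >> 12) & 0xFF) << 12
           | ((instr >> 20) & 0x1) << 11
           | ((instr >> 21) & 0x3FF) << 1)
    if imm & (1 << 20):      # sign-extend the 21-bit offset
        imm -= 1 << 21
    return rd, imm


pc = 0x5C
rd, offset = decode_jal(0x00C00A6F)
print(rd, offset)        # 20 12
print(hex(pc + 4))       # 0x60, the link value written to x20
print(hex(pc + offset))  # 0x68, the jump target
```

So the instruction is `jal x20, 12`: the destination register matches `x20` in the waveform, and the link value written in the `JAL` state is `0x60`.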
Note: The `PC <= PC + 4` assignment is done in the `DECODE` state, because branching instructions need the unmodified PC (i.e. the old PC, not yet incremented by 4).
- Add checks for `TEXT_MEM_BEGIN` and `DATA_MEM_BEGIN` memory ranges in `dmem` and `imem`.