The “RISC-V based MYTH (Microprocessor for You in Thirty Hours)” workshop provides a structured introduction to the RISC-V architecture, covering software-to-hardware concepts through hands-on labs. The workshop covers combinational and sequential logic, pipeline implementation, and the microarchitecture of a single-cycle RISC-V CPU. The instructors for this workshop are Kunal Ghosh Sir and Steve Hoover Sir.
Name: Maaz Mahmood Siddique
Email ID: maazms999@gmail.com
GitHub Profile: maazm007
LinkedIn Profile: maazm-ece-vlsi
Module 1: Introduction to RISC-V ISA and GNU Toolchain compiler
The RISC-V GNU Compiler Toolchain is a free and open-source cross-compiler for C and C++. It supports two build modes: Generic ELF/Newlib and Linux-ELF/glibc. The toolchain can be used to generate assembly instructions and instruction sequences for execution on a simulator and on a target FPGA.
- Flow: For any program to run on the hardware, a defined flow must be followed:
HLL (.c file) or Assembly Language Code --> RISC-V Compiler --> Machine Level Language (Binary Form) --> Core performs operation based upon Binary Code. Between the RISC-V ISA and the hardware sits an HDL (hardware description language). The HDL is used to implement the RISC-V specification in hardware.
- RISC-V is an open-source instruction set architecture (ISA) that allows developers to develop processors for specific applications.
- RISC-V is based on reduced instruction set computer (RISC) principles. It is the fifth generation of RISC instruction sets developed at UC Berkeley.
- RISC-V can also be understood as an alternative processor technology that is free and open, meaning that you do not need to purchase a license to use it.
The instruction format of a processor is the way machine language instructions are structured for the processor to execute. An instruction is a sequence of 0s and 1s divided into fields. These fields encode the operation to be performed and the location of the data it operates on.
There are 6 instruction formats in RISC-V:
- R-format
- I-format
- S-format
- B-format (conditional jump)
- U-format
- J-format (unconditional jump)
Follow these steps to compile a .c file on your machine:
- Open a bash terminal and navigate to the directory where you want to create your file. Then run the following command:
gedit sum_1ton.c
- This opens the editor and lets you write into the file you have created. Write C code that prints the sum of the first n numbers. Once you are done, press Ctrl + S to save the file, and then press Ctrl + W to close the editor.
#include <stdio.h>
int main()
{
    int sum = 0;
    int n;
    printf("Enter the number : ");
    scanf("%d", &n);
    // Accumulate 1 + 2 + ... + n
    for (int i = 1; i <= n; i++)
    {
        sum += i;
    }
    printf("The value of sum of %d numbers are %d\n", n, sum);
    return 0;
}
- To execute the C code on your terminal, run the following command:
gcc sum_1ton.c
./a.out
Running cat sum_1ton.c displays the entire C code on the terminal. Now run the following command to compile the code with the RISC-V GCC compiler:
riscv64-unknown-elf-gcc -Os -mabi=lp64 -march=rv64i -o sum_1ton.o sum_1ton.c
Open a new terminal and run the following command to disassemble the compiled object file:
riscv64-unknown-elf-objdump -d sum_1ton.o | less
The following snapshots capture the output of the above commands.
The goal is to run the sum_1ton.c code with both the gcc compiler and the RISC-V compiler, and both must display the same output on the terminal. To compile and run the code with the gcc compiler, use the following commands:
gcc sum_1ton.c
./a.out
To run the RISC-V binary compiled earlier on the Spike ISA simulator with the proxy kernel (pk), use the following command:
spike pk sum_1ton.o
The following snapshots capture the output of the above commands.
- Fundamental Data Types (RISC-V naming)
Byte = 8 bits
Halfword = 2 bytes = 16 bits
Word = 4 bytes = 32 bits
Doubleword = 8 bytes = 64 bits
Quadword = 16 bytes = 128 bits
- A RISC-V doubleword can represent unsigned (non-negative) numbers from 0 to $2^{64} - 1$, i.e. 0 to 18,446,744,073,709,551,615.
- For n bits, signed (two's complement) numbers range from $-2^{n-1}$ to $2^{n-1} - 1$. For example, if n = 4 bits, the representable signed numbers range from -8 to 7.
- A RISC-V doubleword can therefore represent signed numbers from $-2^{63}$ to $2^{63} - 1$, i.e. -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
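#include <stdio.h>
int main()
{
    // Use integer arithmetic: pow(2,63) returns a double, and converting 2^63
    // (2^63 - 1 also rounds to 2^63 as a double) to long long is out of range,
    // which is undefined behaviour in C.
    long long int max = (long long int)((1ULL << 63) - 1);  // 2^63 - 1
    long long int min = -max - 1;                           // -2^63
    printf("Highest Signed Number of 64 bits is %lld\n", max);
    printf("Lowest Signed Number of 64 bits is %lld\n", min);
    return 0;
}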
The following snapshots capture the calculation of the range of signed and unsigned numbers using the RISC-V compiler.
Module 2: Introduction to ABI and Basic Verification Flow
ABI (application binary interface), as the name says, is an interface that helps programs access system hardware and services. Some parts of the architecture are accessible to the operating system and some are accessible to application programs. These are called the User and System ISA and the User ISA, respectively.
A system call is the method through which application programs access parts of the hardware's ISA (mainly registers). The interface that allows an application program to do this is the Application Binary Interface.
- RISC-V uses little-endian memory addressing. A little-endian system stores the least significant byte (LSB) of a multi-byte data type at the lowest memory address, and the most significant byte (MSB) at the highest memory address.
- As discussed in Module 1, there are 6 instruction formats in the RISC-V architecture. The bit-by-bit layout of each instruction format is shown in the table below:
- The ABI performs system calls through the 32 available registers, each of which has its own ABI name. The following table gives the complete list of ABI interface names for each register:

We will take the same program as in Lab 1, i.e. calculating the sum of the numbers from 1 to N. The flowchart to follow when writing the assembly language code for the sum program is as follows:

Now, we will create two different files to implement this flowchart. In the sum1to9_ASS.c file, we initialize the variables and pass their values to the assembly language program:
#include <stdio.h>
extern int sum1to9_ASS(int x, int y);
int main()
{
    int result = 0;
    int count = 9;
    // The two arguments arrive in registers a0 (0x0) and a1 (count + 1 = 10)
    result = sum1to9_ASS(0x0, count + 1);
    printf("Sum of number from 1 to %d is %d\n", count, result);
    return 0;
}
Next, create the assembly file sum1to9_ASS.S, which implements the function called from the .c file, and write the assembly language code:
.section .text
.global sum1to9_ASS
.type sum1to9_ASS, @function
sum1to9_ASS:
    add a4, a0, zero    // Initialize the sum register a4 with a0 (0x0)
    add a2, a0, a1      // a2 = loop bound = 10 (main passes 0x0 in a0 and count+1 = 10 in a1)
    add a3, a0, zero    // Initialize the loop counter a3 with 0x0
loop:
    add a4, a3, a4      // Add the counter to the running sum
    addi a3, a3, 1      // Increment the counter by 1
    blt a3, a2, loop    // If a3 is less than a2, branch back to <loop>
    add a0, a4, zero    // Return the final result in a0, which is read by main
    ret
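Once done, compile both files together with the RISC-V compiler and run the result on Spike to get the desired result, for example:
riscv64-unknown-elf-gcc -Os -mabi=lp64 -march=rv64i -o sum1to9_ASS.o sum1to9_ASS.c sum1to9_ASS.S
spike pk sum1to9_ASS.o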
Module 3: Digital Logic with TL-Verilog and Makerchip
Makerchip is a free, online integrated development environment (IDE) for digital logic design. It focuses primarily on Verilog and Transaction-Level Verilog (TL-Verilog), and lets users code, compile, simulate, and debug designs directly in the browser.
TL-Verilog is a Verilog implementation of TL-X, a language extension defined as a wrapper around any HDL to extend it with transaction-level modeling. This makes it more expressive and yields significantly shorter code than other HDLs. TL-Verilog:
- Introduces a simpler syntax than other HDLs such as SystemVerilog or Verilog, which reduces the number of lines of code and results in fewer bugs.
- Is more flexible, making it easier to optimize your logic without introducing bugs.
- Is “timing abstract” for pipelines, which makes retiming easy and safe.
- Knows when signals are valid, which provides easier debug, cleaner design, better error checking, and automated clock gating.
- Visual Debug (VIZ) is an additional feature of the Makerchip platform that makes debugging much easier.
- It supports “Transactions” and a simpler form of design hierarchy.
Here, we will first design a few basic logic gates, and then design circuits that perform mathematical calculations, such as finding the longest side of a right-angled triangle using the Pythagorean theorem. Finally, we will combine the combinational logic to design a calculator.
- In my view, one of the greatest advantages of using TL-Verilog on Makerchip is that we don't have to write test inputs to simulate the design: random stimulus is applied to the inputs automatically.
The following snapshots capture the design of combinational circuits on the Makerchip platform using TL-Verilog:
- Inverter Circuit

- AND Circuit
- OR Circuit
- XOR Circuit
- Use of Vector (Arrays)

- Multiplexer Circuit
- Simple Calculator
Here, $rand is used to generate a random value within the specified range.

Here, we use the >>N operator, known as the ahead-of operator. >>N$sig gives the value that the signal $sig had N cycles earlier. This can be understood as a feedback path through a memory element that stores the previous value of that signal.
- Free Running Counter

- Fibonacci Series
Here, don't confuse 21 with 15: the waveform shows the value in hexadecimal, and 0x15 = 0001_0101 in binary = 16 + 4 + 1 = 21 in decimal.

- Sequential Calculator: A real calculator that remembers the last result and uses it for the next calculation


- Pythagoras Theorem

- Pipelined Logic: Timing abstraction and staging are among the most powerful features of TL-Verilog. Each pipeline stage is defined as @N (for example @1, @2), and all code written under that scope falls in the same stage. Pipelining also allows the circuit to operate at a higher frequency, because the computation is distributed across several stages.
- Validity: Validity is another of the most powerful features of TL-Verilog. It is defined as
?$valid
and the code under this scope executes only when the signal is valid. Otherwise, its signals are treated as don't care. Validity provides easier debug, cleaner design, better error checking, and automated clock gating.
- Cyclic Calculator with Validity


Module 4: Basic RISC-V CPU Microarchitecture
- Fetch: The Program Counter (PC) stores the address of the instruction to be fetched. The processor fetches the instruction from the memory location pointed to by the PC.

- Decode: Once the instruction has been fetched from memory, the processor must work out what it means. The decoder block identifies the type of instruction and extracts its fields (opcode, source and destination registers, function fields, and immediate value).
//Decode Logic
$is_i_instr = $instr[6:2] ==? 5'b0000x || $instr[6:2] ==? 5'b001x0 || $instr[6:2] ==? 5'b11001;
//$instr[6:2] ==? 5'b11100; //unused
$is_s_instr = $instr[6:2] ==? 5'b0100x;
$is_r_instr = $instr[6:2] ==? 5'b011x0 || $instr[6:2] ==? 5'b01011 || $instr[6:2] ==? 5'b10100;
$is_u_instr = $instr[6:2] ==? 5'b0x101;
$is_b_instr = $instr[6:2] ==? 5'b11000;
$is_j_instr = $instr[6:2] ==? 5'b11011;
//Immediate Decode Logic
$imm[31:0] = $is_i_instr ? {{21{$instr[31]}},$instr[30:20]} :
$is_s_instr ? {{21{$instr[31]}},$instr[30:25],$instr[11:7]} :
$is_u_instr ? {$instr[31:12],12'b0} :
$is_b_instr ? {{20{$instr[31]}},$instr[7],$instr[30:25],$instr[11:8],1'b0} :
$is_j_instr ? {{12{$instr[31]}},$instr[19:12],$instr[20],$instr[30:25],$instr[24:21],1'b0} : 32'b0;
//Instruction Decode
$opcode[6:0] = $instr[6:0];
$rd_valid = $is_r_instr || $is_i_instr || $is_u_instr || $is_j_instr;
$rs1_valid = $is_r_instr || $is_i_instr || $is_s_instr || $is_b_instr;
$rs2_valid = $is_r_instr || $is_s_instr || $is_b_instr;
$func3_valid = $is_r_instr || $is_i_instr || $is_s_instr || $is_b_instr;
$func7_valid = $is_r_instr;
?$rs1_valid
   $rs1[4:0] = $instr[19:15];
?$rs2_valid
   $rs2[4:0] = $instr[24:20];
?$func3_valid
   $func3[2:0] = $instr[14:12];
?$func7_valid
   $func7[6:0] = $instr[31:25];
?$rd_valid
   $rd[4:0] = $instr[11:7];
- Register File (RF) Read/Write: At this stage there are two read operations and one write operation. Two reads are needed because an instruction can have two source registers, i.e. two operands on which the operation is performed. The register file uses the following signals:
1- $rf_rd_en1: Read Enable 1. This signal is high while the first source register is being read.
2- $rf_rd_en2: Read Enable 2. This signal is high while the second source register is being read.
3- $rf_rd_index1[4:0]: Read Address 1. It contains the index of the first source register to be read.
4- $rf_rd_index2[4:0]: Read Address 2. It contains the index of the second source register to be read.
5- $rf_wr_en: Write Enable. This signal is high while the result is being written to the destination register.
6- $rf_wr_index[4:0]: Write Address. It contains the index of the destination register to be written.
7- $rf_wr_data[31:0]: Write Data. It contains the data to be written at the write address.
- Execute: Once the data has been read from Source Registers, now comes the operation of execution based upon the Opcode. The execution operation is performed by Arithmetic Logic Unit (ALU)
//ALU Operation
$sltu_result = $src1_value < $src2_value;
$sltiu_result = $src1_value < $imm;
$result[31:0] = $is_addi ? $src1_value + $imm :
$is_add ? $src1_value + $src2_value :
$is_andi ? $src1_value & $imm :
$is_xori ? $src1_value ^ $imm :
$is_ori ? $src1_value | $imm :
$is_slli ? $src1_value << $imm[5:0] :
$is_srli ? $src1_value >> $imm[5:0] :
$is_and ? $src1_value & $src2_value :
$is_xor ? $src1_value ^ $src2_value :
$is_or ? $src1_value | $src2_value :
$is_sub ? $src1_value - $src2_value :
$is_sll ? $src1_value << $src2_value[4:0] :
$is_srl ? $src1_value >> $src2_value[4:0] :
$is_sltu ? $src1_value < $src2_value :
$is_lui ? {$imm[31:12],12'b0} :
$is_auipc ? $pc + $imm :
$is_jal ? $pc + 32'd4 :    // jal/jalr write the return address (pc + 4) to rd
$is_jalr ? $pc + 32'd4 :
$is_srai ? {{32{$src1_value[31]}},$src1_value} >> $imm[4:0] :
$is_sra ? {{32{$src1_value[31]}},$src1_value} >> $src2_value[4:0] :
$is_slt ? ($src1_value[31] == $src2_value[31]) ? $sltu_result : {31'b0,$src1_value[31]} :
$is_slti ? ($src1_value[31] == $imm[31]) ? $sltiu_result : {31'b0,$src1_value[31]} :
$is_sltiu ? $src1_value < $imm : 32'bx;

- Control Logic: While decoding and executing, the branch target address (pc + imm) is also computed. If a branch instruction is taken, the Program Counter multiplexer selects this target address as the next PC.
//Branching Instructions
$taken_br = $is_beq ? ($src1_value == $src2_value) :
$is_bne ? ($src1_value != $src2_value) :
$is_blt ? (($src1_value < $src2_value) ^ ($src1_value[31] != $src2_value[31])) :
$is_bge ? (($src1_value >= $src2_value) ^ ($src1_value[31] != $src2_value[31])) :
$is_bltu ? ($src1_value < $src2_value) :
$is_bgeu ? ($src1_value >= $src2_value) : 1'b0;
$valid_taken_br = $valid && $taken_br;
The following snapshot shows the completed 3-stage RISC-V core. At this point, the visualization (VIZ) should show the result of the summation, which is finally stored in register r10.


Module 5: Complete Pipelined RISC-V CPU Microarchitecture
- Now we want to increase the performance of our CPU and make it capable of operating at higher frequencies. To achieve this, we pipeline the design into 5 stages. However, pipelining introduces two problems, known as hazards.
There are two types of hazards:
1- Control Flow Hazard: In pipelined processors, control flow hazards, also known as branch hazards, occur when the next instruction to be executed depends on the outcome of a previous instruction (like a conditional branch or jump), causing pipeline stalls and performance degradation.
2- Read After Write Hazard: In RISC-V, a Read-After-Write (RAW) hazard occurs when an instruction needs to read a register before a previous instruction has finished writing to it. This can lead to incorrect data being read, and can be mitigated through techniques like pipeline stalling or forwarding.
- The solution for the read-after-write hazard is a register file bypass. When the previous instruction writes the same register that the current instruction reads, its result has not yet reached the register file. The current instruction therefore takes the previous instruction's result directly, instead of the stale value read from the register file.
- The last step is to complete the ALU instructions and the instruction decode, and to add the load, store, and jump instructions.
- Testcase: A test case was added to check the load/store functionality. The summation of 1 to 9 is stored at address 4 of the data memory, and that value is then loaded from the data memory into r15.
*passed = |cpu/xreg[15]>>5$value == (1+2+3+4+5+6+7+8+9);
The following snapshots show the load operation, store operation, and summation operation after completing the design of the RISC-V CPU core microarchitecture:
SW r0, r10, 100
LW r15, r0, 100



The following snapshot shows the final output waveform:

The following snapshot shows the complete RISC-V CPU core:

- Sandbox URL: RISC-V CPU Core