Cordell Compiler is a compact hobby compiler for Cordell Programming Language
with a simple syntax, inspired by C and assembly. It is designed for studying compilation, code optimization, translation, and low-level microcode generation.
Main goal of this project is learning of compilers architecture and porting one to CordellOS
project.
Main idea of this compiler is simplification of architecture of real compilers like gcc
or CMake
. This project splitted into two parts: frontend
and backend
.
frontend
part, like frontend part in real compilers, takes files with .CPL extention, generates abstract syntax tree (AST), and optimize code with next algorithms:
- funcopt - First optimization algorithm, that takes care about unused functions, iterate through files from input, register all used functions, and delete all unregistered.
- stropt - Second optimization algorithm takes care about strings that read only. If user uses strings with ro flag somewhere, this algorithm will allocate memory for them in the .rodata section.
- assignopt - Third optimization works in group with fourth. This is a constant folding algorithm implementation. If we can use value of the variable, instead the variable itself, we replace it and remove the variable declaration.
- muldivopt - The fourth algorithm always called after the third, and extends the constant folding cycle by convolution of constants. For example, if the expression was looks like:
5 + 5
, it convolve it into10
. - stmtopt - The fifth algorithm always works only with
switch
,if
andwhile
, and looks similar tofuncopt
. The main idea is to remove all unreacheble code. For example, we can always say, that the code inwhile (0)
never called. - varuseopt - The sixth algorithm continues code cleaning. At this point we try to remove all unused variables.
- offsetopt - The seventh algorithm completes all the work described above work by recalculating local and global offsets for variables and arrays.
- deadopt - [WIP algorithm. See description here: https://en.wikipedia.org/wiki/Control_flow]
After compiler optimization, we go into the backend
.
In backend we generate ASM microcode (WIP opcodes and ELF executable), whose architecture depends on the target system architecture, such as x86_64
or x86_32
.
A number of compilers generate an Abstract Syntax Tree (next AST
), and this one of them. The alghorithm is simple. Before generating the tree, we already tokenize the entire file, and now we only need register a bunch of parsers for each token type (We will speak about several types of tokens):
-
LONG_TYPE_TOKEN
- This token indicates, that the following sequence of tokens is an expression that will be placed into the variable of typelong
. If we print this structure, it will looks like:[LONG_TYPE_TOKEN] [NAME] [expression] [...]
-
CALL_TOKEN
- This token tells us, that the next few tokens are the function name and the function's input arguments. Token itself is the function name:[CALL_TOKEN (name)] [SCOPE] [ARG1 expression] [ARG2 expression] ...
-
WHILE_TOKEN
- This token similar toIF_TOKEN
, and tells us about the structure of following tokens:[WHILE_TOKEN] [STMT] [SCOPE] [BODY]
-
PLUS_TOKEN
- This is binary operator token, that has next structure:[PLUS_TOKEN] [LEFT expression] [RIGHT expression]
Every program begins with the start
entrypoint and ends with the exit [return_code];
statement.
start
... // code
exit 0;
Also every program can contain pre-implemented
code blocks and data segments:
function a ; { }
glob int b = 0;
start
exit 0;
The following types are supported:
long
— Integer (64-bit).int
— Integer (32-bit).short
— Integer (16-bit).char
— Integer (8-bit).str
— String (Array of characters).arr
— Array.
int a = 5;
ro int aReadOnly = 5; : Const and global :
glob int aGlobal = 5; : Global :
short b = 1234 + (432 * (2 + 12)) / 87;
char c = 'X';
str name = "Hello, World!";
ptr char strPtr = name; : Pointer to name string :
: Pointers can be used as arrays :
strPtr[0] = 'B';
arr farr 100 char =; // Will allocate array with size 100 and elem size 1 byte
arr sarr 5 int = { 1, 2, 3, 4, 5 }; // Will allocate array for provided elements
Basic arithmetic and logical operations are supported:
Operation | Description |
---|---|
+ |
Addition |
- |
Subtraction |
* |
Multiplication |
/ |
Division (int) |
% |
Module (int) |
== |
Equality |
!= |
Inequality |
> < |
Comparison |
&& || |
Logic operations |
>> << & | ^ |
Bit operations |
switch (a) {
case 1; {
}
case 111; {
}
case 1111; {
}
: ... :
}
Note: Switch statement based on binary search algorithm, thats why, prefer switch in situations with many cases. In other hand, with three or less options, use if for avoiding overhead.
if a > b; {
: ... if code :
}
else {
: ... else code :
}
while (x < 10) && (y > 20); {
: ... loop body :
}
Functions are declared using the function
keyword.
function [name] [type1] arg1; [type2] arg2; ...; {
: function body :
return something;
}
function sumfunc int a; int b; {
return a + b;
}
int result = sumfunc(5, 10);
: Functions without return values can be called directly :
printStr(strptr, size);
syscall(4, 1, ptr, size);
syscall(3, 0, ptr, size);
function printStr ptr char buffer; int size; {
return syscall(4, 1, buffer, size);
}
function getStr ptr char buffer; int size; {
return syscall(3, 0, buffer, size);
}
Comments are written as annotations :
within functions and code blocks:
: Comment in one line :
:
Function description.
Params
- name Description
:
If you want see more examples, please look into the folder examples
.
function itoa ptr char buffer; int dsize; int num; {
int index = dsize - 1;
int tmp = 0;
int isNegative = 0;
if num < 0; {
isNegative = 1;
num = num * -1;
}
while num > 0; {
tmp = num % 10;
buffer[index] = tmp + 48;
index = index - 1;
num = num / 10;
}
if isNegative; {
buffer[index - 1] = '-';
}
return 1;
}
start
int a = 0;
int b = 1;
int c = 0;
int count = 0;
while count < 20; {
c = a + b;
a = b;
b = c;
arr buffer 40 char =;
itoa(buffer, 40, c);
prints(buffer, 40);
count = count + 1;
}
exit 1;
glob arr _mm_head 100000 char = {};
glob arr _blocks_info 100000 int = {};
glob long _head = 0;
function memset ptr char buffer; int val; long size; {
long index = 0;
while index < size; {
buffer[index] = val;
index = index + 1;
}
return 1;
}
function malloc long size; {
if size > 0; {
ptr int curr_mem = _mm_head;
int block_index = 0;
while block_index < 100000; {
if _blocks_info[block_index] == 0; {
_blocks_info[block_index] = 1;
_blocks_info[block_index + 1] = size;
_blocks_info[block_index + 2] = curr_mem;
return curr_mem;
}
curr_mem = curr_mem + _blocks_info[block_index + 1];
block_index = block_index + 3;
}
}
return -1;
}
function free ptr int mem; {
int block_index = 0;
while block_index < 100000; {
if _blocks_info[block_index + 2] == mem; {
_blocks_info[block_index] = 0;
return 1;
}
block_index = block_index + 3;
}
return 1;
}