Skip to content

j1sk1ss/CordellCompiler.PETPRJ

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Cordell Compiler Reference

Navigation

Summary

Cordell Compiler is a compact hobby compiler for Cordell Programming Language with a simple syntax, inspired by C and assembly. It is designed for studying compilation, code optimization, translation, and low-level microcode generation.
Main goal of this project is learning of compilers architecture and porting one to CordellOS project.


Architecture

Frontend, optimization and Backend

Main idea of this compiler is simplification of architecture of real compilers like gcc or CMake. This project splitted into two parts: frontend and backend.
frontend part, like frontend part in real compilers, takes files with .CPL extention, generates abstract syntax tree (AST), and optimize code with next algorithms:

  • funcopt - First optimization algorithm, that takes care about unused functions, iterate through files from input, register all used functions, and delete all unregistered.
  • stropt - Second optimization algorithm takes care about strings that read only. If user uses strings with ro flag somewhere, this algorithm will allocate memory for them in the .rodata section.
  • assignopt - Third optimization works in group with fourth. This is a constant folding algorithm implementation. If we can use value of the variable, instead the variable itself, we replace it and remove the variable declaration.
  • muldivopt - The fourth algorithm always called after the third, and extends the constant folding cycle by convolution of constants. For example, if the expression was looks like: 5 + 5, it convolve it into 10.
  • stmtopt - The fifth algorithm always works only with switch, if and while, and looks similar to funcopt. The main idea is to remove all unreacheble code. For example, we can always say, that the code in while (0) never called.
  • varuseopt - The sixth algorithm continues code cleaning. At this point we try to remove all unused variables.
  • offsetopt - The seventh algorithm completes all the work described above work by recalculating local and global offsets for variables and arrays.
  • deadopt - [WIP algorithm. See description here: https://en.wikipedia.org/wiki/Control_flow]

After compiler optimization, we go into the backend.
In backend we generate ASM microcode (WIP opcodes and ELF executable), whose architecture depends on the target system architecture, such as x86_64 or x86_32.

AST

A number of compilers generate an Abstract Syntax Tree (next AST), and this one of them. The alghorithm is simple. Before generating the tree, we already tokenize the entire file, and now we only need register a bunch of parsers for each token type (We will speak about several types of tokens):

  • LONG_TYPE_TOKEN - This token indicates, that the following sequence of tokens is an expression that will be placed into the variable of type long. If we print this structure, it will looks like:

      [LONG_TYPE_TOKEN]
          [NAME]
          [expression]
              [...]
    
  • CALL_TOKEN - This token tells us, that the next few tokens are the function name and the function's input arguments. Token itself is the function name:

      [CALL_TOKEN (name)]
          [SCOPE]
              [ARG1 expression]
              [ARG2 expression]
              ...
    
  • WHILE_TOKEN - This token similar to IF_TOKEN, and tells us about the structure of following tokens:

      [WHILE_TOKEN]
          [STMT]
          [SCOPE]
              [BODY]
    
  • PLUS_TOKEN - This is binary operator token, that has next structure:

      [PLUS_TOKEN]
          [LEFT expression]
          [RIGHT expression]
    

.CPL Documentation

Program Structure

Every program begins with the start entrypoint and ends with the exit [return_code]; statement.

    start 
        ... // code 
    exit 0;

Also every program can contain pre-implemented code blocks and data segments:

    function a ; { }
    glob int b = 0;

    start
    exit 0;

Variables and Types

The following types are supported:

  • long — Integer (64-bit).
  • int — Integer (32-bit).
  • short — Integer (16-bit).
  • char — Integer (8-bit).
  • str — String (Array of characters).
  • arr — Array.

Declaring Variables

    int a = 5;
    ro int aReadOnly = 5; : Const and global :
    glob int aGlobal = 5; : Global :

    short b = 1234 + (432 * (2 + 12)) / 87;
    char c = 'X';

    str name = "Hello, World!";
    ptr char strPtr = name; : Pointer to name string :
    : Pointers can be used as arrays :
    strPtr[0] = 'B';

    arr farr 100 char =; // Will allocate array with size 100 and elem size 1 byte
    arr sarr 5 int = { 1, 2, 3, 4, 5 }; // Will allocate array for provided elements

Operations

Basic arithmetic and logical operations are supported:

Operation Description
+ Addition
- Subtraction
* Multiplication
/ Division (int)
% Module (int)
== Equality
!= Inequality
> < Comparison
&& || Logic operations
>> << & | ^ Bit operations

Loops and Conditions

Switch expression

    switch (a) {
        case 1; {
        }
        case 111; {
        }
        case 1111; {
        }
        : ... :
    }

Note: Switch statement based on binary search algorithm, thats why, prefer switch in situations with many cases. In other hand, with three or less options, use if for avoiding overhead.

If Condition

    if a > b; {
        : ... if code :
    }
    else {
        : ... else code :
    }

While Loop

    while (x < 10) && (y > 20); {
        : ... loop body : 
    }

Functions

Functions are declared using the function keyword.

Function Signature:

    function [name] [type1] arg1; [type2] arg2; ...; {
        : function body :
        return something;
    }

Example:

    function sumfunc int a; int b; {
        return a + b;
    }

Calling the function

    int result = sumfunc(5, 10);
    : Functions without return values can be called directly :
    printStr(strptr, size);

Input/Output (via system calls)

String Input/Output

    syscall(4, 1, ptr, size);
    syscall(3, 0, ptr, size);

Wrapping in a function:

    function printStr ptr char buffer; int size; {
        return syscall(4, 1, buffer, size); 
    }

    function getStr ptr char buffer; int size; {
        return syscall(3, 0, buffer, size); 
    }

Comments

Comments are written as annotations : within functions and code blocks:

    : Comment in one line :

    :
    Function description.
    Params
        - name Description
    :

Examples

If you want see more examples, please look into the folder examples.

Example of Printing a Number:

    function itoa ptr char buffer; int dsize; int num; {
        int index = dsize - 1;
        int tmp = 0;

        int isNegative = 0;
        if num < 0; {
            isNegative = 1;
            num = num * -1;
        }

        while num > 0; {
            tmp = num % 10;
            buffer[index] = tmp + 48;
            index = index - 1;
            num = num / 10;
        }

        if isNegative; {
            buffer[index - 1] = '-';
        }

        return 1;
    }

Example of Fibonacci N-number print:

    start
        int a = 0;
        int b = 1;
        int c = 0;
        int count = 0;
        while count < 20; {
            c = a + b;
            a = b;
            b = c;
            
            arr buffer 40 char =;
            itoa(buffer, 40, c);
            prints(buffer, 40);

            count = count + 1;
        }
    exit 1;

Example of simple memory manager:

    glob arr _mm_head 100000 char = {};
    glob arr _blocks_info 100000 int = {};
    glob long _head = 0;

    function memset ptr char buffer; int val; long size; {
        long index = 0;
        while index < size; {
            buffer[index] = val;
            index = index + 1;
        }

        return 1;
    }

    function malloc long size; {
        if size > 0; {
            ptr int curr_mem = _mm_head;
            int block_index = 0;
            while block_index < 100000; {
                if _blocks_info[block_index] == 0; {
                    _blocks_info[block_index] = 1;
                    _blocks_info[block_index + 1] = size;
                    _blocks_info[block_index + 2] = curr_mem;
                    return curr_mem;
                }

                curr_mem = curr_mem + _blocks_info[block_index + 1];
                block_index = block_index + 3;
            }
        }

        return -1;
    }

    function free ptr int mem; {
        int block_index = 0;
        while block_index < 100000; {
            if _blocks_info[block_index + 2] == mem; {
                _blocks_info[block_index] = 0;
                return 1;
            }

            block_index = block_index + 3;
        }

        return 1;
    }

Links

About

Little hobby compiler

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages