This is a simple compiler for a [memory-safe] programming language where everything (even a number) is an array. It is very primitive, but it is Turing complete.
Note: I really can't provide an online demo or a screen recording for this. However, it should be fairly easy to clone and compile this with minimal effort. The demo image below is the closest I could get to a real demo (below the image is the explanation):
Figure A out of A: Top left shows input alan code. The rest of the left side is the disassembled machine code of the program. Top right is the intermediate representation of the program, and bottom right is the compiled machine code output of the program.
This is a work-in-progress compiler (with a self-estimation of around 90% completion) that does not use any external dependencies, not even an assembler: I wrote the machine code output functions by hand, all by myself. The C version of the compiler uses no external libraries besides the standard libc. It takes alan code, parses it, optimizes it, error checks it, converts it into semi-assembly code using IR techniques, then converts the IR code into x86 machine code that can be run in C like this:
// ...
mc function_pointer = __x86_64_linux_get_exec(target->array, environment->func_start, target->size);
// ...
function_pointer();
with no errors. That's right, the compiled program creates a function for you to run in your code. This means I can even integrate this compiler into my future projects and use it just like you would use an interpreted language, with the difference of the code being actually compiled.
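For readers unfamiliar with how raw bytes can become a callable function, here is a minimal sketch of the general technique on Linux. It is not the implementation of `__x86_64_linux_get_exec`; the names `make_executable`, `jit_fn` and `jit_demo` are hypothetical, and the six hard-coded bytes (`mov eax, 42; ret`) are an assumption made only for illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*jit_fn)(void); /* hypothetical signature: no arguments, returns int */

/* Copy `size` bytes of machine code into a fresh anonymous mapping and
 * flip it from writable to executable. Returns NULL on failure. */
void *make_executable(const unsigned char *code, size_t size)
{
    assert(code != NULL && size > 0);
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    memcpy(mem, code, size);
    /* Never keep the mapping writable and executable at the same time. */
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);
        return NULL;
    }
    return mem;
}

/* Runs a tiny hand-assembled function; returns -1 if unsupported or on error. */
int jit_demo(void)
{
#if defined(__x86_64__) && defined(__linux__)
    /* mov eax, 42 ; ret */
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
    void *mem = make_executable(code, sizeof code);
    if (mem == NULL)
        return -1;
    jit_fn fn = (jit_fn)mem; /* POSIX permits this object-to-function cast */
    int result = fn();
    munmap(mem, sizeof code);
    return result;
#else
    return -1; /* the byte sequence above is x86-64 only */
#endif
}
```

Calling `jit_demo()` on 64-bit x86 Linux executes the copied bytes as a normal C function call. Mapping the memory writable first and only then switching it to executable is one common choice; other approaches exist.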
- The code that the compiler produces is actually less code than writing the same program in C. Yes, you heard that right!
- The name wasn't originally supposed to be a reference to Alan Turing, but later on I realized the correlation.
- The static analyzer is called the Inspector. Why? I don't know, I didn't want to just call it "Static Analyzer". That would be boring. The parser is called Librarian, the intermediate representation generator is called Tourist, and the executable generators are called Scribes.
- Update 1.p3C: Static analyzer is now improved, and the compiler can (sloppily) target x86-64 direct bytecode output for Linux, but only as JIT.
- Update 1.p2C: Created and improved the C parser + static analyzer. The end goal is compiling programs directly to x86_64 machine code as an executable.
- Update 1.p1: Added literal unrolling and caching, optimized function calls and reduced dereference count. The compiled executable now runs about 100x faster!
Bad news (yes I hid this over here):
- The C version is not fully usable yet (however it does produce direct x86 output!)
- The TypeScript version is outdated, and it requires most commonly used utilities to be written using C interfacing or written directly into the C templating. An update including a proper guide and a stdlib, alongside troubleshooting, is coming soon.
This version is in development and is not fully fledged: it only produces a Just-In-Time compiled function pointer for 64-bit x86 systems running Linux. Even that isn't fully fleshed out.
- Install clang and make
- Clone the repo and cd into it
- Run make. The parser executable will be in the out/ folder, and make will also automatically run it with the test file of the day.
Currently, the TypeScript version, which emits C code and compiles it using clang, is available. The C version is trying to emit machine code directly with no dependency requirements, and it is a work in progress.
- Install bun
- Clone the repo, cd into it, and then cd into the ts_version/ folder
- Run make build
- The compiler executable is now in ts_version/out/alc. Run it like this: alc <input_file> <output_executable>
Note: The compiler requires clang in PATH to work.
puts "Hello World!\n";
list x;
set x [push x 'H'];
set x [push x 'i'];
puts x;
fn void say_hello [
arg list name;
puts "Hello, ";
puts name;
puts "!\n";
]
say_hello "Alan";
num i 0;
while [ sub 256 i ] [ # Print all ASCII characters
puts i;
set i [ add i 1 ];
];
fn num mod [
arg a;
arg b;
c{ $single($value(sym_a) % $value(sym_b), &cur_scope) };
];
log [ mod 5 2 ];
Warning: REALLY not recommended unless you know what you're doing. If you do:
- Your C interface MUST return an object of type A.
- All variables are prefixed with sym_.
- All functions are prefixed with fn_.
- This feature is ONLY supported in the TypeScript version of the compiler.
fn void fizzbuzz [
arg num count;
num i 1;
while [ sub count i ] [
num mod_3 [ mod i 3 ];
num mod_5 [ mod i 5 ];
num and_val [ and mod_3 mod_5 ];
if and_val [ log i; puts "\n" ];
unless and_val [
unless mod_3 [ puts "Fizz" ];
unless mod_5 [ puts "Buzz" ];
puts "\n";
];
set i [ add i 1 ];
];
];
fizzbuzz 100;
fn list cat [
arg list left;
arg list right;
list result;
set left_len [ len left ];
set right_len [ len right ];
set li 0;
set ri 0;
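while [ sub left_len li ] [
push result [ get left li ];
set li [ add li 1 ]; # advance the index so the loop ends at left_len
];
while [ sub right_len ri ] [
push result [ get right ri ];
set ri [ add ri 1 ]; # advance the index so the loop ends at right_len
];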
ret result;
];
print [ cat "Hello, " "World!" ];
This should print Hello, World!
The C version is memory-safe, does all the parsing on its own, and can currently convert the AST into sloppy JIT bytecode. The currently supported IR language is very basic and has only 12 instructions. It is semi-stack-based: arguments are pushed to a dynamic stack when functions are called. The #1 planned architecture to be supported is x86, with #2 being JavaScript, #3 being C, #4 being ARM, #5 being RISC and #6 being my own custom CPU architecture, bit.
- The following code snippet:
num i [ add 2 5 ];
log i;
puts "Hello World!";
- Converts into this IR (the x86 output for complex functions isn't done yet):
main:
CONST 5
PUSH
CONST 2
PUSH
CALL __0x28 ; add
SET 0x0_1 ; i
ADDR [0x0_1] ; [i]
PUSH
CALL __0x26 ; log
ADDR 0x0_1F ; "Hello World!"
PUSH
CALL __0x24 ; puts
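To make the semi-stack-based flow above easier to follow, here is a small sketch of how the first six IR lines could be evaluated. It is only an illustration under assumptions, not the compiler's actual code: it assumes CONST loads a single working register, PUSH copies that register onto the argument stack, CALL consumes the pushed arguments and leaves its result in the register, and SET stores the register into a variable slot. The names `Machine`, `Instr`, `run`, `FN_ADD` and `run_add_example` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { STACK_MAX = 64, VAR_MAX = 8 };
enum { FN_ADD = 0 }; /* hypothetical id standing in for __0x28 (add) */

typedef enum { OP_CONST, OP_PUSH, OP_CALL, OP_SET } OpCode;

typedef struct {
    OpCode op;
    long arg; /* constant, function id or variable slot; unused by PUSH */
} Instr;

typedef struct {
    long acc;              /* assumed single working register */
    long stack[STACK_MAX]; /* argument stack filled by PUSH */
    size_t sp;
    long vars[VAR_MAX];    /* variable slots written by SET */
} Machine;

static void run(Machine *m, const Instr *code, size_t count)
{
    for (size_t pc = 0; pc < count; pc++) {
        const Instr in = code[pc];
        switch (in.op) {
        case OP_CONST:
            m->acc = in.arg;
            break;
        case OP_PUSH:
            assert(m->sp < STACK_MAX);
            m->stack[m->sp++] = m->acc;
            break;
        case OP_CALL:
            /* Only `add` is modelled: pop two arguments, leave the sum in acc. */
            assert(in.arg == FN_ADD && m->sp >= 2);
            m->sp -= 2;
            m->acc = m->stack[m->sp] + m->stack[m->sp + 1];
            break;
        case OP_SET:
            assert(in.arg >= 0 && in.arg < VAR_MAX);
            m->vars[in.arg] = m->acc;
            break;
        }
    }
}

/* Mirrors: CONST 5, PUSH, CONST 2, PUSH, CALL add, SET i */
long run_add_example(void)
{
    static const Instr code[] = {
        { OP_CONST, 5 }, { OP_PUSH, 0 },
        { OP_CONST, 2 }, { OP_PUSH, 0 },
        { OP_CALL, FN_ADD },
        { OP_SET, 1 }, /* slot 1 stands in for 0x0_1 (i) */
    };
    Machine m = { 0 };
    run(&m, code, sizeof code / sizeof code[0]);
    return m.vars[1];
}
```

Under these assumptions, `run_add_example()` leaves the sum of the two pushed constants in the slot for i. The ADDR instruction and the log and puts calls are left out to keep the sketch short.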
Coming soon in future updates! I've started work on the x86 output, so expect this spot to be pretty cluttered.
- Implement an arena stack
- call (complicated...);
- ret (complicated...);
- Better syntax?
- Write a better stdlib (trying)
- Implement carrying for arithmetic functions (eg. addc)
- Bundle code + x86 together to allow runtime code inspection and modification (maybe?)
- Write code elimination
- Add bytecode optimizations
- Write the allocator in alan itself
- pop (mov [cur_stack], %rex);
- push (push_stack %rex);
- jmp0 (cmp %rex; jmp0 [addr]);
- jmpn0 (reverse of jmp0)
- addrI (mov %rex, addr);
- addrD (mov %rex, [addr]);
- set (mov [addr], %rex);
- Implement a call stack
- Implement direct linux x86 output (almost done)
- Better error checking (soon) (kinda done?)
- Eliminate / shorten code (Code structuralized)
- Rewrite parser in C
- Rewrite codegen in C
- Optimize performance
- Finish the IR emitter (done besides bug fixes and testing)
- Eliminate / shorten code pass 1
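Several entries in the list above pair an IR instruction with an x86 move. As a rough illustration of what writing machine code output functions by hand can look like, the sketch below emits the bytes for loading a 64-bit immediate into rax (loosely in the spirit of the addrI entry) followed by a ret. The names `CodeBuf`, `emit_u8`, `emit_mov_rax_imm64`, `emit_ret` and `emit_const_function` are hypothetical and are not taken from the compiler's source; only the instruction encodings themselves are standard x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t bytes[64]; /* fixed-size output buffer, enough for this sketch */
    size_t size;
} CodeBuf;

static void emit_u8(CodeBuf *buf, uint8_t byte)
{
    assert(buf->size < sizeof buf->bytes);
    buf->bytes[buf->size++] = byte;
}

/* mov rax, imm64: REX.W prefix (0x48), opcode 0xB8 + register number
 * (0 for rax), then the immediate in little-endian byte order. */
static void emit_mov_rax_imm64(CodeBuf *buf, uint64_t value)
{
    emit_u8(buf, 0x48);
    emit_u8(buf, 0xB8);
    for (int i = 0; i < 8; i++)
        emit_u8(buf, (uint8_t)(value >> (8 * i)));
}

/* ret: single byte 0xC3 */
static void emit_ret(CodeBuf *buf)
{
    emit_u8(buf, 0xC3);
}

/* Emits a complete function body that returns `value` in rax.
 * Returns the number of bytes written so far. */
size_t emit_const_function(CodeBuf *buf, uint64_t value)
{
    emit_mov_rax_imm64(buf, value);
    emit_ret(buf);
    return buf->size;
}
```

A caller would zero-initialise a `CodeBuf`, call `emit_const_function(&buf, value)`, and then hand `buf.bytes` to whatever makes the memory executable.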