A miniJava to LLVM compiler, implementing a pipeline for parsing, semantic analysis, and code generation.

kondim23/minijava-to-llvm-compiler

MiniJava Front-End Compiler

This repository contains a front-end compiler for MiniJava, a small subset of Java. It implements the full pipeline from source to intermediate code: parsing, semantic analysis, and LLVM IR code generation. The generated LLVM IR files can be compiled to native executables with clang, so optimization and machine-code generation are left to the existing LLVM toolchain. The project is intended both as a learning resource and as a code sample for technical review.

What is MiniJava?

MiniJava is a minimal, object-oriented programming language based on Java, designed for clarity, safety, and ease of analysis. Its features are directly reflected in its grammar and semantics:

  • Every MiniJava program consists of a single main class (with a public static void main(String[] args) method) and zero or more additional class declarations.
  • Classes may extend a single parent class (single inheritance). Each class can declare fields and methods.
  • Fields can be of type int, boolean, int[] (integer array), or any user-defined class type. Methods must declare a return type and can take parameters of any valid type.
  • Local variables are declared at the beginning of methods. Variable and method names must be unique within their scope.
  • Supported statements include variable assignment, array assignment, blocks ({ ... }), conditionals (if-else), loops (while), and print statements (System.out.println).
  • Expressions support integer and boolean operations, array access and length, method calls, and object/array allocation. Only the < operator is allowed for comparisons, and logical operations are limited to && (and) and ! (not).
  • Arrays are zero-indexed and can only be of type int[].
  • The language does not support interfaces, function overloading, static fields or methods (except for main), or user-defined constructors/destructors. The new operator creates objects and arrays with default initialization.
  • The language is a strict subset of Java: every MiniJava program is also a valid Java program, but not all Java programs are valid MiniJava.

This design makes MiniJava an ideal target for demonstrating compiler construction, as it captures the essence of object-oriented programming while remaining concise and manageable.
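
As a small illustration of the features listed above, the following program is a sketch that stays within the grammar used by this project. The names `SumExample`, `Counter`, and `sumTo` are made up for this example and do not come from the repository's test cases.

```java
// A MiniJava program: one main class plus one ordinary class.
class SumExample {
    public static void main(String[] args) {
        // Allocate an object and call a method on it; prints 10.
        System.out.println(new Counter().sumTo(4));
    }
}

class Counter {
    int total; // field, default-initialized by `new`

    public int sumTo(int n) {
        int i; // locals are declared before any statement
        i = 0;
        total = 0;
        while (i < n) { // `<` is the only comparison operator
            i = i + 1;
            total = total + i;
        }
        return total; // every method ends with a single return
    }
}
```

Because MiniJava is a subset of Java, the same file also compiles and runs with a regular Java compiler, which is a convenient way to cross-check the output of the generated executable.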

How the Compiler Works

1. Parsing

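The compiler uses JTB and JavaCC to generate a parser from the MiniJava grammar (minijava.jj). JTB derives the syntax tree classes and visitor skeletons from the grammar, and JavaCC generates the parser itself. The parser reads MiniJava source files and builds an abstract syntax tree (AST) that the later phases traverse.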

2. Semantic Analysis (Static Checking)

Semantic analysis follows parsing. Parsing checks that the program follows the grammar; semantic analysis checks that the program also obeys the typing and scoping rules of the language. This includes:

  • Type Checking: Ensuring variables and expressions are used with compatible types.
  • Scope Checking: Ensuring every variable, method, and class that is used has a declaration that is visible in the current context.
  • Inheritance and Overriding: Ensuring correct use of inheritance, method overriding, and that there are no illegal redefinitions.
  • Other Rules: Enforcing language-specific rules, such as no duplicate variable names in the same scope, correct method signatures, and valid field/method access.

In this project, semantic analysis is performed by traversing the AST with custom visitors. The process includes:

  • Building a symbol table that records all classes, fields, methods, and their types.
  • Checking for undeclared types, duplicate declarations, and illegal inheritance.
  • Verifying that method calls, assignments, and expressions are type-correct.
  • Ensuring that method overriding follows the rules (same signature, return type, etc.).
  • Calculating and reporting field and method offsets for each class, which is important for later code generation.

If any semantic error is found, the compiler reports it and stops processing the file.
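
The offset calculation in the last bullet above can be sketched as follows. This is a minimal illustration, not the project's actual symbol table: the class `ClassLayout` is hypothetical, and the sizes (4 bytes for `int`, 1 byte for `boolean`, 8 bytes for references and method slots) are assumptions that this README does not state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-class layout record; the parent must be fully built first. */
class ClassLayout {
    final String name;
    final ClassLayout parent; // null when the class extends nothing
    final Map<String, Integer> fieldOffsets = new LinkedHashMap<>();
    final Map<String, Integer> methodOffsets = new LinkedHashMap<>();
    private int nextField;  // first free byte after the fields seen so far
    private int nextMethod; // first free byte after the method slots seen so far

    ClassLayout(String name, ClassLayout parent) {
        this.name = name;
        this.parent = parent;
        // A subclass continues where its parent's layout ends.
        this.nextField = parent == null ? 0 : parent.nextField;
        this.nextMethod = parent == null ? 0 : parent.nextMethod;
    }

    /** Assumed sizes in bytes; int[] and class types are treated as pointers. */
    static int sizeOf(String type) {
        switch (type) {
            case "int": return 4;
            case "boolean": return 1;
            default: return 8;
        }
    }

    ClassLayout field(String type, String fieldName) {
        fieldOffsets.put(fieldName, nextField);
        nextField += sizeOf(type);
        return this;
    }

    /** True if some ancestor already owns a slot for this method name. */
    boolean inherits(String methodName) {
        for (ClassLayout c = parent; c != null; c = c.parent) {
            if (c.methodOffsets.containsKey(methodName)) return true;
        }
        return false;
    }

    ClassLayout method(String methodName) {
        // An overriding method reuses the inherited slot, so only new methods get an offset.
        if (!inherits(methodName)) {
            methodOffsets.put(methodName, nextMethod);
            nextMethod += 8;
        }
        return this;
    }
}
```

Under these assumptions, a class `A` with fields `int x` and `boolean flag` and a method `foo` gets `A.x : 0`, `A.flag : 4`, and `A.foo : 0`. A subclass `B extends A` that adds `int[] data`, overrides `foo`, and declares `bar` gets `B.data : 5` and `B.bar : 8`, with no new entry for `foo`.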

3. Code Generation

The CodeGenerationVisitor is responsible for translating the semantically validated abstract syntax tree (AST) into LLVM Intermediate Representation (IR) code. This visitor traverses the AST and, for each construct in the MiniJava language (such as classes, methods, statements, and expressions), emits the corresponding LLVM IR instructions. The code generator handles:

  • Emitting function and class structure in LLVM IR.
  • Translating MiniJava statements (assignments, control flow, method calls, etc.) into their LLVM equivalents.
  • Managing memory allocation for objects and arrays, including calls to runtime functions for allocation and printing.
  • Generating code for arithmetic and logical operations, respecting MiniJava's type system and operator semantics.
  • Handling method dispatch and virtual method calls, using offsets and tables built during semantic analysis.

The output is written to a .ll file, which can be further compiled and optimized using the LLVM toolchain. This phase demonstrates how high-level object-oriented constructs can be mapped to a low-level, platform-independent intermediate representation, bridging the gap between source code and executable machine code.
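
To give a feel for what "emitting the corresponding LLVM IR instructions" means for expressions, here is a minimal sketch. `IrEmitter` and its methods are hypothetical and far simpler than the project's CodeGenerationVisitor, which is not shown in this README; only the emitted `add`, `sub`, `mul`, and `icmp slt` instructions are standard LLVM IR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mini-emitter: each operation appends one IR line and returns its result register. */
class IrEmitter {
    private final List<String> lines = new ArrayList<>();
    private int nextTemp = 0;

    private String emit(String instruction, String lhs, String rhs) {
        String dst = "%_" + nextTemp++; // fresh temporary for every result
        lines.add(dst + " = " + instruction + " i32 " + lhs + ", " + rhs);
        return dst;
    }

    String add(String lhs, String rhs) { return emit("add", lhs, rhs); }
    String sub(String lhs, String rhs) { return emit("sub", lhs, rhs); }
    String mul(String lhs, String rhs) { return emit("mul", lhs, rhs); }

    /** MiniJava's only comparison; the result register holds an i1. */
    String lessThan(String lhs, String rhs) { return emit("icmp slt", lhs, rhs); }

    List<String> lines() { return lines; }

    /** Lowers the MiniJava expression (a + 1) < b, assuming a and b are already in %a and %b. */
    static List<String> example() {
        IrEmitter e = new IrEmitter();
        String sum = e.add("%a", "1");
        e.lessThan(sum, "%b");
        return e.lines();
    }
}
```

`IrEmitter.example()` returns the two lines `%_0 = add i32 %a, 1` and `%_1 = icmp slt i32 %_0, %b`. A visitor-based generator follows the same idea: visiting a subexpression returns the name of the register holding its value, and the parent node uses that name as an operand.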

4. Error Handling

The compiler reports missing input files, parse errors, and semantic errors instead of crashing, and it removes any output files it has already generated when an error occurs.
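
One way to implement this kind of cleanup is sketched below. `OutputCleanup`, `Emitter`, and `compileTo` are hypothetical names chosen for illustration; the project's real driver code may be organized differently.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;

class OutputCleanup {
    /** Stand-in for the compiler phases that write IR to the output file. */
    interface Emitter {
        void emit(Writer out) throws Exception;
    }

    /** Returns true on success; on any error, reports it and deletes the partial .ll file. */
    static boolean compileTo(Path llFile, Emitter emitter) throws IOException {
        try (Writer out = Files.newBufferedWriter(llFile)) {
            emitter.emit(out);
            return true;
        } catch (Exception e) {
            // The writer is already closed here, so the file can be removed safely.
            System.err.println("error: " + e.getMessage());
            Files.deleteIfExists(llFile);
            return false;
        }
    }
}
```

Deleting the file on failure means a later `clang` invocation cannot accidentally pick up a half-written .ll file from a failed run.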

Testing & Validation

The compiler has been validated against a suite of MiniJava test cases, including both correct programs and programs with intentional errors. The test-cases/ directory contains examples that exercise the language features, inheritance, method overriding, and error handling, covering parsing, semantic analysis, and code generation. Automated testing and manual inspection of the generated LLVM IR provide additional checks on the compiler's output.

Theoretical Background: Why is Semantic Analysis Important?

Semantic analysis ensures that a program is meaningful and safe to execute. For example, it prevents:

  • Assigning a boolean value to an integer variable.
  • Calling a method that does not exist or is not visible.
  • Using variables that have not been declared.
  • Inheriting from a class that is not defined.
  • Overriding a method with a different return type or parameter list.

Without semantic analysis, a program might be syntactically correct but still produce unpredictable or incorrect results at runtime.
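
The overriding rule from the list above can be sketched as a simple signature comparison. `MethodSig` is a hypothetical class for illustration, and the exact-match policy (same return type, same parameter types in the same order) is an assumption based on the description in this README.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical record of a method's signature as stored in a symbol table. */
class MethodSig {
    final String name;
    final String returnType;
    final List<String> paramTypes;

    MethodSig(String name, String returnType, String... paramTypes) {
        this.name = name;
        this.returnType = returnType;
        this.paramTypes = Arrays.asList(paramTypes);
    }

    /** A redefinition is accepted only if return type and parameter types match exactly. */
    boolean canOverride(MethodSig inherited) {
        return name.equals(inherited.name)
                && returnType.equals(inherited.returnType)
                && paramTypes.equals(inherited.paramTypes);
    }
}
```

For example, `new MethodSig("get", "int", "int")` can override an inherited `int get(int)`, but not an inherited `boolean get(int)` or `int get(int, int)`. Since MiniJava has no overloading, a mismatch is a semantic error rather than a second method with the same name.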

MiniJava Grammar

Below is the grammar for MiniJava, as used in this project (from minijava.jj):

Goal ::= MainClass ( TypeDeclaration )* <EOF>
MainClass ::= "class" Identifier "{" "public" "static" "void" "main" "(" "String" "[" "]" Identifier ")" "{" ( VarDeclaration )* ( Statement )* "}" "}"
TypeDeclaration ::= ClassDeclaration | ClassExtendsDeclaration
ClassDeclaration ::= "class" Identifier "{" ( VarDeclaration )* ( MethodDeclaration )* "}"
ClassExtendsDeclaration ::= "class" Identifier "extends" Identifier "{" ( VarDeclaration )* ( MethodDeclaration )* "}"
VarDeclaration ::= Type Identifier ";"
MethodDeclaration ::= "public" Type Identifier "(" ( FormalParameterList )? ")" "{" ( VarDeclaration )* ( Statement )* "return" Expression ";" "}"
FormalParameterList ::= FormalParameter FormalParameterTail
FormalParameter ::= Type Identifier
FormalParameterTail ::= ( FormalParameterTerm )*
FormalParameterTerm ::= "," FormalParameter
Type ::= "int" "[" "]" | "boolean" | "int" | Identifier
Statement ::= Block | AssignmentStatement | ArrayAssignmentStatement | IfStatement | WhileStatement | PrintStatement
Block ::= "{" ( Statement )* "}"
AssignmentStatement ::= Identifier "=" Expression ";"
ArrayAssignmentStatement ::= Identifier "[" Expression "]" "=" Expression ";"
IfStatement ::= "if" "(" Expression ")" Statement "else" Statement
WhileStatement ::= "while" "(" Expression ")" Statement
PrintStatement ::= "System.out.println" "(" Expression ")" ";"
Expression ::= Clause "&&" Clause | PrimaryExpression "<" PrimaryExpression | PrimaryExpression "+" PrimaryExpression | PrimaryExpression "-" PrimaryExpression | PrimaryExpression "*" PrimaryExpression | PrimaryExpression "[" PrimaryExpression "]" | PrimaryExpression "." "length" | PrimaryExpression "." Identifier "(" ( ExpressionList )? ")" | Clause
ExpressionList ::= Expression ExpressionTail
ExpressionTail ::= ( ExpressionTerm )*
ExpressionTerm ::= "," Expression
Clause ::= NotExpression | PrimaryExpression
PrimaryExpression ::= IntegerLiteral | TrueLiteral | FalseLiteral | Identifier | ThisExpression | "new" "int" "[" Expression "]" | "new" Identifier "(" ")" | "(" Expression ")"
IntegerLiteral ::= <INTEGER_LITERAL>
TrueLiteral ::= "true"
FalseLiteral ::= "false"
Identifier ::= <IDENTIFIER>
ThisExpression ::= "this"
NotExpression ::= "!" Clause

How to Build and Run

  • Build the Compiler: Use the provided Makefile:

    make

    This runs JTB and JavaCC to generate the parser and then compiles the Java source files.

  • Compile a MiniJava File: Use the run target with the ARGS variable:

    make run ARGS="<MiniJavaFile.java>"

    This will compile the specified MiniJava file and produce an LLVM IR file (.ll).

  • Compile the Generated LLVM IR to an Executable: The generated .ll files are fully compatible with the LLVM toolchain and can be compiled to native executables using clang:

    clang -o output_executable file.ll

    This allows you to run the compiled MiniJava program as a native binary on your system.

  • Show Help:

    make help

    This will print usage instructions for all available Makefile targets.

  • Clean Build Artifacts:

    make clean

    This removes all generated files and build artifacts.

Project Structure

  • miniJava Compiler/: Main source code (compiler logic, visitors, symbol table, etc.)
  • test-cases/: Example MiniJava programs (both valid and error cases).
  • commons-lang3-3.12.0.jar, javacc5.jar, jtb132di.jar: Dependencies for parsing and utility functions.

Summary

This project implements a front-end compiler for MiniJava that covers parsing, semantic analysis, and code generation to LLVM IR. The generated IR can be compiled with clang, so MiniJava programs run as native binaries. The semantic analysis phase does most of the language-specific work: it rejects programs that are syntactically valid but violate MiniJava's typing, scoping, or inheritance rules, and it computes the field and method offsets that code generation relies on. The project can serve as a reference for studying compiler construction and the translation of object-oriented code to a low-level intermediate representation.
