add readme and move the main.cpp to src/main.cpp

orcalinux · orcalinux · commit ee3711d94a65 · 2024-12-25T08:38:57.000+02:00
diff --git a/parser/Makefile b/parser/Makefile
@@ -1,6 +1,6 @@
 # Compiler and Flags
 CXX = g++
-CXXFLAGS = -std=c++17 -Iinclude -Wall -Wextra -g
+CXXFLAGS = -std=c++17 -Iinclude -Wall -Wextra -g -MMD -MP
 
 # Directories
 SRCDIR = src
@@ -11,14 +11,12 @@ BINDIR = bin
 # Source and Object Files
 SRCS = $(wildcard $(SRCDIR)/*.cpp)
 OBJS = $(patsubst $(SRCDIR)/%.cpp,$(OBJDIR)/%.o,$(SRCS))
-MAIN_SRC = main.cpp
-MAIN_OBJ = $(OBJDIR)/main.o
 
 # Target Executable
 TARGET = $(BINDIR)/tiny-parser
 
 # Phony Targets
-.PHONY: all clean directories help run
+.PHONY: all clean directories help run test
 
 # Default Target
 all: directories $(TARGET)
@@ -28,17 +26,16 @@ directories:
 	@mkdir -p $(OBJDIR) $(BINDIR)
 
 # Linking the Target Executable
-$(TARGET): $(MAIN_OBJ) $(OBJS)
+$(TARGET): $(OBJS)
 	$(CXX) $(CXXFLAGS) -o $@ $^
 
-# Compiling main.cpp into Object File
-$(MAIN_OBJ): $(MAIN_SRC) $(INCDIR)/parser.hpp $(INCDIR)/token.hpp
-	$(CXX) $(CXXFLAGS) -c $< -o $@
-
 # Compiling Source Files into Object Files
-$(OBJDIR)/%.o: $(SRCDIR)/%.cpp $(INCDIR)/%.hpp
+$(OBJDIR)/%.o: $(SRCDIR)/%.cpp
 	$(CXX) $(CXXFLAGS) -c $< -o $@
 
+# Include dependency files
+-include $(OBJS:.o=.d)
+
 # Run the Parser
 run: all
 	@echo "Running the parser..."
diff --git a/parser/README.md b/parser/README.md
@@ -0,0 +1,328 @@
+# LL(1) Parser Project
+
+## **Table of Contents**
+
+1. [Project Overview](#1-project-overview)
+2. [What is an LL(1) Parser?](#2-what-is-an-ll1-parser)
+   - [Key Characteristics](#key-characteristics)
+3. [How the LL(1) Parser Works](#3-how-the-ll1-parser-works)
+   - [Tokenization (Lexical Analysis)](#tokenization-lexical-analysis)
+   - [Parsing Table Construction](#parsing-table-construction)
+   - [Parsing Process](#parsing-process)
+   - [Error Handling](#error-handling)
+4. [Usage of the Stack in the LL(1) Parser](#4-usage-of-the-stack-in-the-ll1-parser)
+   - [Role of the Stack](#role-of-the-stack)
+   - [Stack Operations](#stack-operations)
+   - [Parsing Loop Example](#parsing-loop-example)
+   - [Benefits of Using a Stack](#benefits-of-using-a-stack)
+5. [Grammar Specification](#5-grammar-specification)
+   - [Grammar Overview](#grammar-overview)
+   - [Example Grammar for TINY Language](#example-grammar-for-tiny-language)
+   - [Tokens Definition](#tokens-definition)
+   - [Parser Components](#parser-components)
+6. [Conclusion](#6-conclusion)
+7. [Appendix: Makefile Explained](#7-appendix-makefile-explained)
+   - [Key Components](#key-components)
+   - [Usage Tips](#usage-tips)
+
+---
+
+## **1. Project Overview**
+
+The **LL(1) Parser Project** implements an LL(1) parser for a simplified programming language, often referred to as **TINY**. It is intended as an educational tool for compiler construction, with a focus on parsing: it shows how a predictive parser checks that source code conforms to a predefined grammar.
+
+## **2. What is an LL(1) Parser?**
+
+An **LL(1) parser** is a type of **top-down parser** used in compiler design to analyze the syntax of programming languages. The acronym **LL(1)** stands for:
+
+- **L**: **Left-to-right** scanning of the input.
+- **L**: **Leftmost** derivation of the parse tree.
+- **1**: **One** lookahead token used to make parsing decisions.
+
+### **Key Characteristics**
+
+1. **Deterministic Parsing:** LL(1) parsers make parsing decisions based solely on the current non-terminal and the next input token (lookahead), ensuring deterministic behavior without backtracking.
+2. **Predictive Parsing:** A precomputed **parsing table** tells the parser which production rule to apply next, so each decision is a single table lookup.
+3. **Grammar Constraints:** Not all grammars are suitable for LL(1) parsing. The grammar must be unambiguous, **non-left-recursive**, and **left-factored**, so that every (non-terminal, lookahead) pair selects at most one production.
+
+## **3. How the LL(1) Parser Works**
+
+The LL(1) parser operates through a series of well-defined steps to analyze and interpret the structure of the input source code. Here's an overview of its operational flow:
+
+### **Tokenization (Lexical Analysis)**
+
+Before parsing begins, the source code is **tokenized**. The **lexer (tokenizer)** scans the input characters and groups them into meaningful **tokens** such as identifiers, keywords, operators, and delimiters.
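+
+To make the lexer's job concrete, here is a minimal C++ sketch of a tokenizer for the token classes this README lists later (keywords, operators, delimiters, identifiers, integer literals). The `TokenKind`, `Token`, and `tokenize` names are illustrative assumptions and are not taken from the project's `token.hpp`.
+
+```cpp
+#include <cctype>
+#include <cstddef>
+#include <stdexcept>
+#include <string>
+#include <vector>
+
+// Hypothetical token representation; the project's own token.hpp may differ.
+enum class TokenKind { Keyword, Identifier, Number, Operator, Delimiter, EndOfInput };
+
+struct Token {
+    TokenKind kind;
+    std::string text;
+};
+
+// Splits TINY source text into tokens; throws on an unrecognized character.
+std::vector<Token> tokenize(const std::string& src) {
+    std::vector<Token> tokens;
+    std::size_t i = 0;
+    while (i < src.size()) {
+        const unsigned char c = static_cast<unsigned char>(src[i]);
+        if (std::isspace(c)) {
+            ++i;
+        } else if (std::isalpha(c)) {
+            // A letter starts a keyword or an identifier (letters and digits follow).
+            const std::size_t start = i;
+            while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
+            const std::string word = src.substr(start, i - start);
+            const bool isKeyword = (word == "BEGIN" || word == "END" || word == "WRITE");
+            tokens.push_back({isKeyword ? TokenKind::Keyword : TokenKind::Identifier, word});
+        } else if (std::isdigit(c)) {
+            // A run of digits forms an integer literal.
+            const std::size_t start = i;
+            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
+            tokens.push_back({TokenKind::Number, src.substr(start, i - start)});
+        } else if (c == ':' && i + 1 < src.size() && src[i + 1] == '=') {
+            tokens.push_back({TokenKind::Operator, ":="});
+            i += 2;
+        } else if (c == '+' || c == '-' || c == '*' || c == '/') {
+            tokens.push_back({TokenKind::Operator, std::string(1, src[i])});
+            ++i;
+        } else if (c == '(' || c == ')' || c == ';') {
+            tokens.push_back({TokenKind::Delimiter, std::string(1, src[i])});
+            ++i;
+        } else {
+            throw std::runtime_error("unexpected character at offset " + std::to_string(i));
+        }
+    }
+    tokens.push_back({TokenKind::EndOfInput, "$"});  // end-of-input marker
+    return tokens;
+}
+```
+
+For example, `tokenize("x := 10;")` yields five tokens: `x`, `:=`, `10`, `;`, and the end marker `$`.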
+
+### **Parsing Table Construction**
+
+The core of the LL(1) parser is the **parsing table**, a two-dimensional matrix that guides the parsing process. It maps pairs of **non-terminals** and **terminal tokens** to specific **production rules**. The table is constructed from the grammar's FIRST and FOLLOW sets; when the grammar is LL(1), every cell holds at most one production, which makes each parsing decision deterministic.
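+
+As a rough illustration of how such a table can be derived, the sketch below computes FIRST sets (one of the two inputs to table construction, alongside FOLLOW sets) for the TINY grammar listed later in this README, iterating until nothing changes. `Production`, `computeFirst`, and `tinyGrammar` are hypothetical names, and the project itself may build or hard-code its table differently.
+
+```cpp
+#include <cstddef>
+#include <map>
+#include <set>
+#include <string>
+#include <vector>
+
+struct Production {
+    std::string lhs;
+    std::vector<std::string> rhs;  // an empty vector stands for ε
+};
+
+using SymbolSets = std::map<std::string, std::set<std::string>>;
+
+// Computes FIRST for every non-terminal by iterating to a fixed point.
+// "ε" inside a set means the non-terminal can derive the empty string.
+SymbolSets computeFirst(const std::vector<Production>& grammar) {
+    std::set<std::string> nonTerminals;
+    for (const auto& p : grammar) nonTerminals.insert(p.lhs);
+
+    SymbolSets first;
+    bool changed = true;
+    while (changed) {
+        changed = false;
+        for (const auto& p : grammar) {
+            auto& target = first[p.lhs];
+            const std::size_t before = target.size();
+            bool allNullable = true;  // stays true while every RHS symbol so far derives ε
+            for (const auto& sym : p.rhs) {
+                if (nonTerminals.count(sym) == 0) {  // terminal: it begins the RHS
+                    target.insert(sym);
+                    allNullable = false;
+                    break;
+                }
+                const std::set<std::string> symFirst = first[sym];  // copy, sym may equal lhs
+                for (const auto& t : symFirst)
+                    if (t != "ε") target.insert(t);
+                if (symFirst.count("ε") == 0) {
+                    allNullable = false;
+                    break;
+                }
+            }
+            if (allNullable) target.insert("ε");
+            if (target.size() != before) changed = true;
+        }
+    }
+    return first;
+}
+
+// The TINY grammar from this README, one alternative per entry.
+std::vector<Production> tinyGrammar() {
+    return {
+        {"Program", {"BEGIN", "StatementList", "END"}},
+        {"StatementList", {"Statement", "StatementList'"}},
+        {"StatementList'", {";", "Statement", "StatementList'"}},
+        {"StatementList'", {}},
+        {"Statement", {"Assignment"}},
+        {"Statement", {"Write"}},
+        {"Assignment", {"Identifier", ":=", "Expression"}},
+        {"Write", {"WRITE", "(", "Identifier", ")"}},
+        {"Expression", {"Term", "Expression'"}},
+        {"Expression'", {"+", "Term", "Expression'"}},
+        {"Expression'", {"-", "Term", "Expression'"}},
+        {"Expression'", {}},
+        {"Term", {"Factor", "Term'"}},
+        {"Term'", {"*", "Factor", "Term'"}},
+        {"Term'", {"/", "Factor", "Term'"}},
+        {"Term'", {}},
+        {"Factor", {"(", "Expression", ")"}},
+        {"Factor", {"Number"}},
+        {"Factor", {"Identifier"}},
+    };
+}
+```
+
+With this grammar, FIRST(`Factor`) comes out as `{ (, Number, Identifier }` and FIRST(`Expression'`) as `{ +, -, ε }`.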
+
+### **Parsing Process**
+
+The parsing process uses a **stack** that holds the grammar symbols still to be matched against the input. Here's a step-by-step breakdown:
+
+1. **Initialization:**
+
+   - **Input Buffer:** Contains the sequence of tokens generated by the lexer.
+   - **Stack:** Initialized with the start symbol of the grammar (e.g., `Program`) and an end-of-input marker (e.g., `$`).
+
+2. **Parsing Loop:**
+
+   - **Top of Stack (X):** Examine the symbol at the top of the stack.
+   - **Current Token (a):** Look at the current token from the input buffer.
+
+3. **Decision Making:**
+
+   - **If X is a Terminal:**
+
+     - **Match:** If `X` matches `a`, pop `X` from the stack and advance to the next token.
+     - **Error:** If `X` does not match `a`, report a syntax error.
+
+   - **If X is a Non-Terminal:**
+     - **Lookup:** Use the parsing table entry for `(X, a)` to determine the production rule to apply.
+     - **Apply Production:**
+       - **Pop X:** Remove the non-terminal from the stack.
+       - **Push RHS Symbols:** Push the right-hand side symbols of the production rule onto the stack in reverse order.
+     - **Error:** If no valid entry exists in the parsing table for `(X, a)`, report a syntax error.
+
+4. **Termination:**
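+   - The parser terminates successfully when the end-of-input marker `$` is on top of the stack and `$` is also the current token, meaning every grammar symbol has been matched and all input has been consumed.
+   - If the stack still holds other symbols when the input ends, or input remains once the stack is down to `$`, a syntax error is reported.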
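+
+The steps above can be sketched as a short table-driven loop in C++. This is a simplified illustration and not the project's actual code: tokens are reduced to terminal names (`Identifier`, `Number`, and so on), the table is written out by hand from the TINY grammar given later in this README, and `Table`, `buildTinyTable`, and `parse` are assumed names.
+
+```cpp
+#include <cstddef>
+#include <map>
+#include <set>
+#include <string>
+#include <utility>
+#include <vector>
+
+using Symbol = std::string;
+using Rhs = std::vector<Symbol>;
+// Key: (non-terminal on top of the stack, lookahead terminal).
+using Table = std::map<std::pair<Symbol, Symbol>, Rhs>;
+
+// Hand-derived LL(1) table for the TINY grammar (an empty RHS stands for ε).
+Table buildTinyTable() {
+    Table t;
+    t[{"Program", "BEGIN"}] = {"BEGIN", "StatementList", "END"};
+    t[{"StatementList'", ";"}] = {";", "Statement", "StatementList'"};
+    t[{"StatementList'", "END"}] = {};
+    t[{"Statement", "Identifier"}] = {"Assignment"};
+    t[{"Statement", "WRITE"}] = {"Write"};
+    t[{"Assignment", "Identifier"}] = {"Identifier", ":=", "Expression"};
+    t[{"Write", "WRITE"}] = {"WRITE", "(", "Identifier", ")"};
+    t[{"Factor", "("}] = {"(", "Expression", ")"};
+    t[{"Factor", "Number"}] = {"Number"};
+    t[{"Factor", "Identifier"}] = {"Identifier"};
+    for (const char* a : {"Identifier", "WRITE"})
+        t[{"StatementList", a}] = {"Statement", "StatementList'"};
+    for (const char* a : {"(", "Number", "Identifier"}) {
+        t[{"Expression", a}] = {"Term", "Expression'"};
+        t[{"Term", a}] = {"Factor", "Term'"};
+    }
+    for (const char* op : {"+", "-"})
+        t[{"Expression'", op}] = {op, "Term", "Expression'"};
+    for (const char* op : {"*", "/"})
+        t[{"Term'", op}] = {op, "Factor", "Term'"};
+    for (const char* a : {")", ";", "END"})  // FOLLOW(Expression')
+        t[{"Expression'", a}] = {};
+    for (const char* a : {"+", "-", ")", ";", "END"})  // FOLLOW(Term')
+        t[{"Term'", a}] = {};
+    return t;
+}
+
+// Returns true when `tokens` (terminal names ending with "$") is accepted.
+bool parse(const Table& table, const std::vector<Symbol>& tokens) {
+    std::set<Symbol> nonTerminals;
+    for (const auto& entry : table) nonTerminals.insert(entry.first.first);
+
+    std::vector<Symbol> stack = {"$", "Program"};  // back() is the top of the stack
+    std::size_t pos = 0;
+    while (!stack.empty()) {
+        if (pos >= tokens.size()) return false;  // input ended too early
+        const Symbol top = stack.back();
+        const Symbol& lookahead = tokens[pos];
+        if (nonTerminals.count(top) == 0) {
+            // Terminal (or "$"): it must match the current token.
+            if (top != lookahead) return false;
+            stack.pop_back();
+            ++pos;
+        } else {
+            const auto it = table.find({top, lookahead});
+            if (it == table.end()) return false;  // empty table cell: syntax error
+            stack.pop_back();
+            // Push the RHS in reverse so that its first symbol ends up on top.
+            stack.insert(stack.end(), it->second.rbegin(), it->second.rend());
+        }
+    }
+    return pos == tokens.size();
+}
+```
+
+Calling `parse(buildTinyTable(), {"BEGIN", "Identifier", ":=", "Number", "END", "$"})` returns `true`; dropping the `END` token makes it return `false`, because no table entry exists for `Term'` with lookahead `$`.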
+
+### **Error Handling**
+
+The LL(1) parser is equipped to detect and report syntactic errors. When the parser encounters an unexpected token or an invalid sequence, it generates meaningful error messages indicating the nature and location of the error, facilitating easier debugging and code correction.
+
+## **4. Usage of the Stack in the LL(1) Parser**
+
+The **stack** is a pivotal component in the LL(1) parsing process, serving as the backbone for managing the parsing state and guiding the derivation of the parse tree. Here's an in-depth look at its role and functionality:
+
+### **Role of the Stack**
+
+- **State Management:** The stack keeps track of the current parsing state, maintaining a record of non-terminals and terminals that need to be processed.
+- **Derivation Control:** It dictates the order in which production rules are applied, ensuring that the parser adheres to the grammar's hierarchical structure.
+
+### **Stack Operations**
+
+#### **a. Initialization**
+
+The stack is initialized with two primary symbols:
+
+- **Start Symbol:** Represents the entry point of the grammar (e.g., `Program`).
+- **End Marker (`$`):** Signifies the end of the input.
+
+**Example:**
+
+```
+Stack: [ $, Program ]
+```
+
+#### **b. Symbol Processing**
+
+- **Top Symbol Examination:** At each step, the parser examines the symbol at the top of the stack to decide the next action.
+- **Terminal Symbols:**
+  - If the top symbol is a terminal and matches the current input token, it's popped from the stack, and the parser advances to the next token.
+- **Non-Terminal Symbols:**
+  - If the top symbol is a non-terminal, the parser consults the parsing table to determine which production rule to apply. The non-terminal is then replaced by the production's right-hand side symbols, which are pushed onto the stack in reverse order to maintain the correct processing sequence.
+
+**Example:**
+
+**Given Production:**
+
+```
+StatementList → Statement StatementList'
+```
+
+**Stack Update:**
+
+```
+Before: [ $, StatementList ]
+After: [ $, StatementList', Statement ]
+```
+
+_(Note: Symbols are pushed in reverse order.)_
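+
+In C++, the reverse-order push can be written with reverse iterators. The helper below is a small sketch; `expand` is a hypothetical name, and a `std::vector` whose `back()` is the top stands in for the stack.
+
+```cpp
+#include <string>
+#include <vector>
+
+// Replaces the non-terminal on top of `stack` with the symbols of `rhs`.
+// Precondition: `stack` is not empty; `stack.back()` is the top of the stack.
+void expand(std::vector<std::string>& stack, const std::vector<std::string>& rhs) {
+    stack.pop_back();  // remove the non-terminal being expanded
+    // Reverse iteration pushes the last RHS symbol first, leaving rhs[0] on top.
+    for (auto it = rhs.rbegin(); it != rhs.rend(); ++it) stack.push_back(*it);
+}
+```
+
+Applied to the stack `[ $, StatementList ]` with the production above, it leaves `[ $, StatementList', Statement ]`.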
+
+### **Parsing Loop Example**
+
+To illustrate how the stack operates during the parsing process, consider the following example:
+
+**Input Source Code:**
+
+```plaintext
+BEGIN
+    x := 10;
+    y := x + 20;
+    WRITE(y)
+END
+```
+
+**Parsing Steps:**
+
+| Step | Stack                                                  | Input Buffer                                | Action                                           |
+| ---- | ------------------------------------------------------ | ------------------------------------------- | ------------------------------------------------ |
+| 1    | [ $, Program ]                                         | BEGIN x := 10; y := x + 20; WRITE(y) END $  | Initialize parser                                |
+| 2    | [ $, END, StatementList, BEGIN ]                       | BEGIN x := 10; y := x + 20; WRITE(y) END $  | Apply `Program → BEGIN StatementList END`        |
+| 3    | [ $, END, StatementList ]                              | x := 10; y := x + 20; WRITE(y) END $        | Match `BEGIN` token and pop from stack           |
+| 4    | [ $, END, StatementList', Statement ]                  | x := 10; y := x + 20; WRITE(y) END $        | Apply `StatementList → Statement StatementList'` |
+| 5    | [ $, END, StatementList', Assignment ]                 | x := 10; y := x + 20; WRITE(y) END $        | Apply `Statement → Assignment`                   |
+| 6    | [ $, END, StatementList', Expression, :=, Identifier ] | x := 10; y := x + 20; WRITE(y) END $        | Apply `Assignment → Identifier := Expression`    |
+| 7    | [ $, END, StatementList', Expression, := ]             | := 10; y := x + 20; WRITE(y) END $          | Match `Identifier` (`x`) and pop from stack      |
+| 8    | [ $, END, StatementList', Expression ]                 | 10; y := x + 20; WRITE(y) END $             | Match `:=` and pop from stack                    |
+| 9    | [ $, END, StatementList', Expression', Term ]          | 10; y := x + 20; WRITE(y) END $             | Apply `Expression → Term Expression'`            |
+| ...  | ...                                                    | ...                                         | Continue parsing similarly                       |
+| N    | [ $ ]                                                  | $                                           | Successfully parsed all tokens                   |
+
+_(Note: Each row shows the stack and the remaining input after the action has been applied; "..." indicates continuation of similar steps for subsequent tokens. The last statement has no trailing `;` because the grammar uses `;` as a separator between statements, so a `;` directly before `END` would be a syntax error.)_
+
+### **Benefits of Using a Stack**
+
+- **LIFO Structure:** The Last-In-First-Out (LIFO) nature of the stack matches the hierarchical and recursive nature of grammar rules: the most recently expanded non-terminal is always completed first, so nested structures are handled without backtracking.
+- **Efficiency:** Stack operations (push and pop) are computationally efficient, ensuring that the parsing process remains swift even for moderately complex inputs.
+- **Simplified State Tracking:** The stack provides a straightforward mechanism to keep track of which grammar symbols have been processed and which are pending, eliminating the need for more complex state management systems.
+
+## **5. Grammar Specification**
+
+The LL(1) parser operates based on a predefined grammar. Here's an overview of the grammar used in the TINY language for this project.
+
+### **Grammar Overview**
+
+An LL(1) grammar is a context-free grammar that can be parsed by an LL(1) parser, which uses one lookahead token to make parsing decisions. The grammar should be **non-left-recursive** and **left-factored** to fit the LL(1) constraints.
+
+### **Example Grammar for TINY Language**
+
+```plaintext
+1. Program        → BEGIN StatementList END
+2. StatementList  → Statement StatementList'
+3. StatementList' → ; Statement StatementList' | ε
+4. Statement      → Assignment | Write
+5. Assignment     → Identifier := Expression
+6. Write          → WRITE ( Identifier )
+7. Expression     → Term Expression'
+8. Expression'    → + Term Expression' | - Term Expression' | ε
+9. Term           → Factor Term'
+10. Term'         → * Factor Term' | / Factor Term' | ε
+11. Factor        → ( Expression ) | Number | Identifier
+```
+
+### **Tokens Definition**
+
+- **Keywords:** `BEGIN`, `END`, `WRITE`
+- **Operators:** `+`, `-`, `*`, `/`, `:=`
+- **Delimiters:** `(`, `)`, `;`
+- **Identifiers:** Strings starting with a letter, followed by letters or digits.
+- **Numbers:** Integer literals.
+
+### **Parser Components**
+
+- **Lexer (Tokenizer):** Converts the input source code into a stream of tokens.
+- **Parser:** Uses the LL(1) parsing table to validate the token sequence against the grammar.
+- **Parsing Table:** A two-dimensional table used by the parser to decide which production rule to apply based on the current non-terminal and lookahead token.
+- **Stack:** Keeps track of the parsing process by holding the grammar symbols (terminals and non-terminals) that are still to be matched or expanded.
+
+## **6. Conclusion**
+
+The **LL(1) Parser Project** demonstrates predictive parsing: a parsing table, a stack, and a set of grammar rules are enough to validate TINY source code deterministically, without backtracking. Understanding how these three pieces interact gives a practical view of syntax analysis in compiler construction, and the parser can serve either as an educational tool or as a foundation for more advanced compiler features.
+
+## **7. Appendix: Makefile Explained**
+
+For those interested in the intricacies of the Makefile, here's a breakdown of its components:
+
+```makefile
+# Compiler and Flags
+CXX = g++
+CXXFLAGS = -std=c++17 -Iinclude -Wall -Wextra -g -MMD -MP
+
+# Directories
+SRCDIR = src
+INCDIR = include
+OBJDIR = obj
+BINDIR = bin
+
+# Source and Object Files
+SRCS = $(wildcard $(SRCDIR)/*.cpp)
+OBJS = $(patsubst $(SRCDIR)/%.cpp,$(OBJDIR)/%.o,$(SRCS))
+
+# Target Executable
+TARGET = $(BINDIR)/tiny-parser
+
+# Phony Targets
+.PHONY: all clean directories help run test
+
+# Default Target
+all: directories $(TARGET)
+
+# Rule to Create Necessary Directories
+directories:
+	@mkdir -p $(OBJDIR) $(BINDIR)
+
+# Linking the Target Executable
+$(TARGET): $(OBJS)
+	$(CXX) $(CXXFLAGS) -o $@ $^
+
+# Compiling Source Files into Object Files
+$(OBJDIR)/%.o: $(SRCDIR)/%.cpp
+	$(CXX) $(CXXFLAGS) -c $< -o $@
+
+# Include dependency files
+-include $(OBJS:.o=.d)
+
+# Run the Parser
+run: all
+	@echo "Running the parser..."
+	@./$(TARGET)
+
+# Help Target
+help:
+	@echo "========================================"
+	@echo "          Makefile Help Menu            "
+	@echo "========================================"
+	@echo "Available Targets:"
+	@echo "  all       Build the project."
+	@echo "  run       Build and run the parser."
+	@echo "  clean     Remove build artifacts."
+	@echo "  help      Show this help message."
+	@echo ""
+	@echo "Usage Examples:"
+	@echo "  make        # Builds the project."
+	@echo "  make run    # Builds and runs the parser."
+	@echo "  make clean  # Cleans all build artifacts."
+	@echo "  make help   # Displays this help menu."
+	@echo "========================================"
+
+# Clean Target to Remove Build Artifacts
+clean:
+	rm -rf $(OBJDIR) $(BINDIR)
+```
+
+### **Key Components**
+
+- **Variables:**
+
+  - **`CXX`**: Specifies the compiler (`g++`).
+  - **`CXXFLAGS`**: Compiler flags for C++17 standard, include paths, warnings, debugging, and dependency generation.
+  - **`SRCDIR`, `INCDIR`, `OBJDIR`, `BINDIR`**: Define directories for source files, headers, object files, and binaries.
+
+- **Targets:**
+
+  - **`all`**: Default target that builds the project.
+  - **`directories`**: Ensures necessary directories exist.
+  - **`$(TARGET)`**: Links all object files to create the executable.
+  - **`$(OBJDIR)/%.o`**: Pattern rule to compile `.cpp` files into `.o` files.
+  - **`run`**: Builds and runs the parser.
+  - **`help`**: Displays available Makefile targets.
+  - **`clean`**: Removes build artifacts.
+
+- **Dependency Management:**
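+  - **`-MMD -MP`**: Flags for automatic dependency generation. `-MMD` writes a `.d` file next to each object file listing the user headers it includes; `-MP` adds a phony target for each header so that deleting a header does not break the build.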
+  - **`-include $(OBJS:.o=.d)`**: Includes generated dependency files to track header dependencies.
+
+### **Usage Tips**
+
+- **Parallel Builds:** Speed up compilation using:
+
+  ```bash
+  make -j4
+  ```
+
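+  Replace `4` with the number of cores you wish to use. The object rules do not depend on the `directories` target, so on a clean tree a parallel build can, in principle, start compiling before `obj/` exists; running `make directories` first avoids that.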
+
+- **Verbose Output:** Remove `@` symbols in the Makefile commands to see all build commands executed.
+
+- **Adding New Source Files:** Place new `.cpp` files in `src/` and new `.hpp` files in `include/`. The `$(wildcard ...)` pattern picks up new sources automatically, and the generated `.d` files track which headers each object file depends on.
diff --git a/parser/src/main.cpp b/parser/src/main.cpp