Commit ee3711d

add readme and move the main.cpp to src/main.cpp
1 parent 6c5e685 commit ee3711d

3 files changed

+335
-10
lines changed

parser/Makefile

Lines changed: 7 additions & 10 deletions
@@ -1,6 +1,6 @@
 # Compiler and Flags
 CXX = g++
-CXXFLAGS = -std=c++17 -Iinclude -Wall -Wextra -g
+CXXFLAGS = -std=c++17 -Iinclude -Wall -Wextra -g -MMD -MP

 # Directories
 SRCDIR = src
@@ -11,14 +11,12 @@ BINDIR = bin
 # Source and Object Files
 SRCS = $(wildcard $(SRCDIR)/*.cpp)
 OBJS = $(patsubst $(SRCDIR)/%.cpp,$(OBJDIR)/%.o,$(SRCS))
-MAIN_SRC = main.cpp
-MAIN_OBJ = $(OBJDIR)/main.o

 # Target Executable
 TARGET = $(BINDIR)/tiny-parser

 # Phony Targets
-.PHONY: all clean directories help run
+.PHONY: all clean directories help run test

 # Default Target
 all: directories $(TARGET)
@@ -28,17 +26,16 @@ directories:
 	@mkdir -p $(OBJDIR) $(BINDIR)

 # Linking the Target Executable
-$(TARGET): $(MAIN_OBJ) $(OBJS)
+$(TARGET): $(OBJS)
 	$(CXX) $(CXXFLAGS) -o $@ $^

-# Compiling main.cpp into Object File
-$(MAIN_OBJ): $(MAIN_SRC) $(INCDIR)/parser.hpp $(INCDIR)/token.hpp
-	$(CXX) $(CXXFLAGS) -c $< -o $@
-
 # Compiling Source Files into Object Files
-$(OBJDIR)/%.o: $(SRCDIR)/%.cpp $(INCDIR)/%.hpp
+$(OBJDIR)/%.o: $(SRCDIR)/%.cpp
 	$(CXX) $(CXXFLAGS) -c $< -o $@

+# Include dependency files
+-include $(OBJS:.o=.d)
+
 # Run the Parser
 run: all
 	@echo "Running the parser..."

parser/README.md

Lines changed: 328 additions & 0 deletions
@@ -0,0 +1,328 @@
1+
# LL(1) Parser Project
2+
3+
## **Table of Contents**
4+
5+
1. [Project Overview](#project-overview)
6+
2. [What is an LL(1) Parser?](#what-is-an-ll1-parser)
7+
- [Key Characteristics](#key-characteristics)
8+
3. [How the LL(1) Parser Works](#how-the-ll1-parser-works)
9+
- [Tokenization (Lexical Analysis)](#tokenization-lexical-analysis)
10+
- [Parsing Table Construction](#parsing-table-construction)
11+
- [Parsing Process](#parsing-process)
12+
- [Error Handling](#error-handling)
13+
4. [Usage of the Stack in the LL(1) Parser](#usage-of-the-stack-in-the-ll1-parser)
14+
- [Role of the Stack](#role-of-the-stack)
15+
- [Stack Operations](#stack-operations)
16+
- [Parsing Loop Example](#parsing-loop-example)
17+
- [Benefits of Using a Stack](#benefits-of-using-a-stack)
18+
5. [Grammar Specification](#grammar-specification)
19+
- [Grammar Overview](#grammar-overview)
20+
- [Tokens Definition](#tokens-definition)
21+
- [Parser Components](#parser-components)
22+
6. [Conclusion](#conclusion)
23+
7. [Appendix: Makefile Explained](#appendix-makefile-explained)
24+
25+
---
26+
27+
## **1. Project Overview**
28+
29+
The **LL(1) Parser Project** is an implementation of an LL(1) parser tailored for a simplified programming language, often referred to as **TINY**. This project serves as an educational tool to understand the fundamentals of compiler construction, specifically focusing on parsing techniques. By leveraging the LL(1) parsing strategy, the project demonstrates how to analyze and interpret the syntactic structure of source code, ensuring its adherence to predefined grammatical rules.
30+
31+
## **2. What is an LL(1) Parser?**
32+
33+
An **LL(1) parser** is a type of **top-down parser** used in compiler design to analyze the syntax of programming languages. The acronym **LL(1)** stands for:
34+
35+
- **L**: **Left-to-right** scanning of the input.
36+
- **L**: **Leftmost** derivation of the parse tree.
37+
- **1**: **One** lookahead token used to make parsing decisions.
38+
39+
### **Key Characteristics**
40+
41+
1. **Deterministic Parsing:** LL(1) parsers make parsing decisions based solely on the current non-terminal and the next input token (lookahead), ensuring deterministic behavior without backtracking.
42+
2. **Predictive Parsing:** By consulting a precomputed **parsing table**, the parser predicts which production rule to apply next, so each expansion is decided by a single table lookup.
43+
3. **Grammar Constraints:** Not all grammars are suitable for LL(1) parsing. The grammar must be unambiguous, free of **left recursion**, and **left-factored**, so that for every non-terminal a single lookahead token selects at most one production.
44+
45+
## **3. How the LL(1) Parser Works**
46+
47+
The LL(1) parser operates through a series of well-defined steps to analyze and interpret the structure of the input source code. Here's an overview of its operational flow:
48+
49+
### **Tokenization (Lexical Analysis)**
50+
51+
Before parsing begins, the source code is **tokenized**. The **lexer (tokenizer)** scans the input characters and groups them into meaningful **tokens** such as identifiers, keywords, operators, and delimiters.
52+
53+
### **Parsing Table Construction**
54+
55+
The core of the LL(1) parser is the **parsing table**, which is a two-dimensional matrix that guides the parsing process. It maps pairs of **non-terminals** and **terminal tokens** to specific **production rules**. This table is constructed based on the grammar of the language, ensuring that each parsing decision is deterministic.
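
As a rough illustration of this mapping, the sketch below stores a fragment of such a table in a `std::map` keyed by (non-terminal, lookahead) pairs. The type aliases and the functions `buildFragment` and `lookup` are hypothetical names chosen for this example, and the entries cover only two non-terminals of the TINY grammar; the project's actual table representation may differ.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Symbol = std::string;
using Production = std::vector<Symbol>;  // right-hand side; empty means ε
using Table = std::map<std::pair<Symbol, Symbol>, Production>;

// Fragment of an LL(1) table (assumed layout): rows for Statement and
// StatementList' only.
Table buildFragment() {
    return {
        {{"Statement", "Identifier"}, {"Assignment"}},
        {{"Statement", "WRITE"}, {"Write"}},
        {{"StatementList'", ";"}, {";", "Statement", "StatementList'"}},
        {{"StatementList'", "END"}, {}},  // ε: END can follow StatementList'
    };
}

// Returns the production for (nonTerminal, lookahead), or nullopt when the
// cell is empty, which the parser treats as a syntax error.
std::optional<Production> lookup(const Table& table, const Symbol& nonTerminal,
                                 const Symbol& lookahead) {
    auto it = table.find({nonTerminal, lookahead});
    if (it == table.end()) return std::nullopt;
    return it->second;
}
```

Because each cell holds at most one production, a lookup never has to choose between alternatives, which is what makes the parse deterministic.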
56+
57+
### **Parsing Process**
58+
59+
The parsing process utilizes a **stack** to manage the current state of the parse tree. Here's a step-by-step breakdown:
60+
61+
1. **Initialization:**
62+
63+
- **Input Buffer:** Contains the sequence of tokens generated by the lexer.
64+
- **Stack:** Initialized by pushing the end-of-input marker (e.g., `$`) and then the start symbol of the grammar (e.g., `Program`), so that the start symbol is on top.
65+
66+
2. **Parsing Loop:**
67+
68+
- **Top of Stack (X):** Examine the symbol at the top of the stack.
69+
- **Current Token (a):** Look at the current token from the input buffer.
70+
71+
3. **Decision Making:**
72+
73+
- **If X is a Terminal:**
74+
75+
- **Match:** If `X` matches `a`, pop `X` from the stack and advance to the next token.
76+
- **Error:** If `X` does not match `a`, report a syntax error.
77+
78+
- **If X is a Non-Terminal:**
79+
- **Lookup:** Use the parsing table entry for `(X, a)` to determine the production rule to apply.
80+
- **Apply Production:**
81+
- **Pop X:** Remove the non-terminal from the stack.
82+
- **Push RHS Symbols:** Push the right-hand side symbols of the production rule onto the stack in reverse order.
83+
- **Error:** If no valid entry exists in the parsing table for `(X, a)`, report a syntax error.
84+
85+
4. **Termination:**
86+
- The parser terminates successfully when the end-of-input marker `$` is the only symbol left on the stack and the only token left in the input buffer.
87+
- If discrepancies remain, a syntax error is reported.
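
The steps above can be sketched as a compact table-driven loop. This is a minimal sketch under stated assumptions, not the project's implementation: the table is written out by hand for the TINY grammar given in this README, tokens are represented by their terminal names (every identifier is `"Identifier"`, every integer literal is `"Number"`), and the names `add`, `buildTable`, and `parse` are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Symbol = std::string;
using Production = std::vector<Symbol>;  // empty vector stands for ε
using Table = std::map<std::pair<Symbol, Symbol>, Production>;

// Registers one production under each of the given lookahead terminals.
void add(Table& t, const Symbol& nt, const std::vector<Symbol>& lookaheads,
         const Production& rhs) {
    for (const Symbol& la : lookaheads) t[{nt, la}] = rhs;
}

// Hand-built LL(1) table for the grammar listed in this README.
Table buildTable() {
    Table t;
    const std::vector<Symbol> operand = {"(", "Number", "Identifier"};
    add(t, "Program", {"BEGIN"}, {"BEGIN", "StatementList", "END"});
    add(t, "StatementList", {"Identifier", "WRITE"}, {"Statement", "StatementList'"});
    add(t, "StatementList'", {";"}, {";", "Statement", "StatementList'"});
    add(t, "StatementList'", {"END"}, {});
    add(t, "Statement", {"Identifier"}, {"Assignment"});
    add(t, "Statement", {"WRITE"}, {"Write"});
    add(t, "Assignment", {"Identifier"}, {"Identifier", ":=", "Expression"});
    add(t, "Write", {"WRITE"}, {"WRITE", "(", "Identifier", ")"});
    add(t, "Expression", operand, {"Term", "Expression'"});
    add(t, "Expression'", {"+"}, {"+", "Term", "Expression'"});
    add(t, "Expression'", {"-"}, {"-", "Term", "Expression'"});
    add(t, "Expression'", {";", "END", ")"}, {});
    add(t, "Term", operand, {"Factor", "Term'"});
    add(t, "Term'", {"*"}, {"*", "Factor", "Term'"});
    add(t, "Term'", {"/"}, {"/", "Factor", "Term'"});
    add(t, "Term'", {"+", "-", ";", "END", ")"}, {});
    add(t, "Factor", {"("}, {"(", "Expression", ")"});
    add(t, "Factor", {"Number"}, {"Number"});
    add(t, "Factor", {"Identifier"}, {"Identifier"});
    return t;
}

// Runs the parsing loop over terminal names; returns true on acceptance.
bool parse(const Table& table, std::vector<Symbol> tokens) {
    std::set<Symbol> nonTerminals;
    for (const auto& entry : table) nonTerminals.insert(entry.first.first);

    tokens.push_back("$");                         // end-of-input marker
    std::vector<Symbol> stack = {"$", "Program"};  // back() is the stack top
    std::size_t pos = 0;
    while (true) {
        const Symbol top = stack.back();
        const Symbol& lookahead = tokens[pos];
        if (nonTerminals.count(top)) {
            auto it = table.find({top, lookahead});
            if (it == table.end()) return false;   // empty cell: syntax error
            stack.pop_back();
            for (auto s = it->second.rbegin(); s != it->second.rend(); ++s)
                stack.push_back(*s);               // push RHS in reverse order
        } else if (top != lookahead) {
            return false;                          // terminal mismatch
        } else if (top == "$") {
            return true;                           // $ meets $: accept
        } else {
            stack.pop_back();                      // matched a terminal
            ++pos;
        }
    }
}
```

With this table, `parse(buildTable(), {"BEGIN", "Identifier", ":=", "Number", "END"})` accepts, while an empty `BEGIN END` body is rejected because the `(StatementList, END)` cell is empty.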
88+
89+
### **Error Handling**
90+
91+
The LL(1) parser is equipped to detect and report syntactic errors. When the parser encounters an unexpected token or an invalid sequence, it generates meaningful error messages indicating the nature and location of the error, facilitating easier debugging and code correction.
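
One way to produce such a message is to list the lookahead tokens for which the parsing table has an entry in the current non-terminal's row. The sketch below assumes a table keyed by (non-terminal, lookahead) pairs and a line number supplied by the lexer; the function name `describeError` and the message wording are hypothetical, and the project's parser may format its messages differently.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using Symbol = std::string;
using Table = std::map<std::pair<Symbol, Symbol>, std::vector<Symbol>>;

// Builds a message for an empty table cell: names the offending token and
// lists every terminal that has an entry in the non-terminal's row.
std::string describeError(const Table& table, const Symbol& nonTerminal,
                          const Symbol& found, int line) {
    std::string expected;
    for (const auto& entry : table) {
        if (entry.first.first != nonTerminal) continue;  // other rows
        if (!expected.empty()) expected += ", ";
        expected += "'" + entry.first.second + "'";
    }
    return "Syntax error at line " + std::to_string(line) + ": unexpected '" +
           found + "' while parsing " + nonTerminal +
           " (expected one of: " + expected + ")";
}
```

Since `std::map` iterates in key order, the expected terminals come out sorted, which keeps the messages stable across runs.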
92+
93+
## **4. Usage of the Stack in the LL(1) Parser**
94+
95+
The **stack** is a pivotal component in the LL(1) parsing process, serving as the backbone for managing the parsing state and guiding the derivation of the parse tree. Here's an in-depth look at its role and functionality:
96+
97+
### **Role of the Stack**
98+
99+
- **State Management:** The stack keeps track of the current parsing state, maintaining a record of non-terminals and terminals that need to be processed.
100+
- **Derivation Control:** It dictates the order in which production rules are applied, ensuring that the parser adheres to the grammar's hierarchical structure.
101+
102+
### **Stack Operations**
103+
104+
#### **a. Initialization**
105+
106+
The stack is initialized with two primary symbols:
107+
108+
- **Start Symbol:** Represents the entry point of the grammar (e.g., `Program`).
109+
- **End Marker (`$`):** Signifies the end of the input.
110+
111+
**Example:**
112+
113+
```
114+
Stack: [ $, Program ]
115+
```
116+
117+
#### **b. Symbol Processing**
118+
119+
- **Top Symbol Examination:** At each step, the parser examines the symbol at the top of the stack to decide the next action.
120+
- **Terminal Symbols:**
121+
- If the top symbol is a terminal and matches the current input token, it's popped from the stack, and the parser advances to the next token.
122+
- **Non-Terminal Symbols:**
123+
- If the top symbol is a non-terminal, the parser consults the parsing table to determine which production rule to apply. The non-terminal is then replaced by the production's right-hand side symbols, which are pushed onto the stack in reverse order to maintain the correct processing sequence.
124+
125+
**Example:**
126+
127+
**Given Production:**
128+
129+
```
130+
StatementList → Statement StatementList'
131+
```
132+
133+
**Stack Update:**
134+
135+
```
136+
Before: [ $, StatementList ]
137+
After: [ $, StatementList', Statement ]
138+
```
139+
140+
_(Note: Symbols are pushed in reverse order.)_
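
The stack update shown above can be expressed as a small helper. This is only a sketch: the name `applyProduction` is hypothetical, and the stack is modeled as a `std::vector` whose last element is the top.

```cpp
#include <string>
#include <vector>

// Replaces the non-terminal on top of the stack with the right-hand side of
// the chosen production. The back of the vector is the top of the stack.
void applyProduction(std::vector<std::string>& stack,
                     const std::vector<std::string>& rhs) {
    stack.pop_back();  // pop the non-terminal being expanded
    // Push right-to-left so that the first RHS symbol ends up on top.
    for (auto it = rhs.rbegin(); it != rhs.rend(); ++it) {
        stack.push_back(*it);
    }
}
```

Starting from `{"$", "StatementList"}` and applying the right-hand side `{"Statement", "StatementList'"}` leaves `{"$", "StatementList'", "Statement"}`, matching the "After" state above. An ε-production is an empty `rhs`, which simply pops the non-terminal.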
141+
142+
### **Parsing Loop Example**
143+
144+
To illustrate how the stack operates during the parsing process, consider the following example:
145+
146+
**Input Source Code:**

```plaintext
BEGIN
x := 10;
y := x + 20;
WRITE(y)
END
```

The grammar in Section 5 uses `;` as a separator between statements, so the last statement carries no trailing semicolon.

**Parsing Steps:**

Each row shows the stack and the input buffer after the action has been applied. The top of the stack is the rightmost symbol.

| Step | Stack                                                  | Input Buffer                                | Action                                           |
| ---- | ------------------------------------------------------ | ------------------------------------------- | ------------------------------------------------ |
| 1    | [ $, Program ]                                         | BEGIN x := 10; y := x + 20; WRITE(y) END $  | Initialize parser                                |
| 2    | [ $, END, StatementList, BEGIN ]                       | BEGIN x := 10; y := x + 20; WRITE(y) END $  | Apply `Program → BEGIN StatementList END`        |
| 3    | [ $, END, StatementList ]                              | x := 10; y := x + 20; WRITE(y) END $        | Match `BEGIN`, pop it, and advance               |
| 4    | [ $, END, StatementList', Statement ]                  | x := 10; y := x + 20; WRITE(y) END $        | Apply `StatementList → Statement StatementList'` |
| 5    | [ $, END, StatementList', Assignment ]                 | x := 10; y := x + 20; WRITE(y) END $        | Apply `Statement → Assignment`                   |
| 6    | [ $, END, StatementList', Expression, :=, Identifier ] | x := 10; y := x + 20; WRITE(y) END $        | Apply `Assignment → Identifier := Expression`    |
| 7    | [ $, END, StatementList', Expression, := ]             | := 10; y := x + 20; WRITE(y) END $          | Match `Identifier` (`x`), pop it, and advance    |
| 8    | [ $, END, StatementList', Expression ]                 | 10; y := x + 20; WRITE(y) END $             | Match `:=`, pop it, and advance                  |
| 9    | [ $, END, StatementList', Expression', Term ]          | 10; y := x + 20; WRITE(y) END $             | Apply `Expression → Term Expression'`            |
| ...  | ...                                                    | ...                                         | Continue parsing similarly                       |
| N    | [ $ ]                                                  | $                                           | Accept: `$` on the stack meets `$` in the input  |

_(Note: "..." indicates continuation of similar steps for subsequent tokens.)_
173+
174+
### **Benefits of Using a Stack**
175+
176+
- **LIFO Structure:** The Last-In-First-Out (LIFO) nature of the stack mirrors the nested, recursive structure of grammar rules: the most recently expanded non-terminal is always completed first, so nested constructs are handled without backtracking.
177+
- **Efficiency:** Stack operations (push and pop) are computationally efficient, ensuring that the parsing process remains swift even for moderately complex inputs.
178+
- **Simplified State Tracking:** The stack provides a straightforward mechanism to keep track of which grammar symbols have been processed and which are pending, eliminating the need for more complex state management systems.
179+
180+
## **5. Grammar Specification**
181+
182+
The LL(1) parser operates based on a predefined grammar. Here's an overview of the grammar used in the TINY language for this project.
183+
184+
### **Grammar Overview**
185+
186+
An LL(1) grammar is a context-free grammar that an LL(1) parser can handle with a single lookahead token. To meet the LL(1) constraints, the grammar must be free of **left recursion** and **left-factored**; the grammar below expresses repetition through primed non-terminals such as `Expression'` instead of left-recursive rules.
187+
188+
### **Example Grammar for TINY Language**
189+
190+
```plaintext
191+
1. Program → BEGIN StatementList END
192+
2. StatementList → Statement StatementList'
193+
3. StatementList' → ; Statement StatementList' | ε
194+
4. Statement → Assignment | Write
195+
5. Assignment → Identifier := Expression
196+
6. Write → WRITE ( Identifier )
197+
7. Expression → Term Expression'
198+
8. Expression' → + Term Expression' | - Term Expression' | ε
199+
9. Term → Factor Term'
200+
10. Term' → * Factor Term' | / Factor Term' | ε
201+
11. Factor → ( Expression ) | Number | Identifier
202+
```
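
To fill the parsing table for a grammar like this one, the FIRST set of each non-terminal is typically computed first (FOLLOW sets are then needed for the ε-productions). The sketch below computes FIRST sets by fixed-point iteration. It is an illustration rather than the project's code: the `Grammar` representation, the `EPSILON` marker string, and the function names are assumptions made for this example.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Symbol = std::string;
using Production = std::vector<Symbol>;  // empty vector stands for ε
using Grammar = std::vector<std::pair<Symbol, Production>>;

const Symbol EPSILON = "epsilon";  // marker used inside FIRST sets (assumed)

// The grammar from this README, one entry per alternative.
Grammar tinyGrammar() {
    return {
        {"Program", {"BEGIN", "StatementList", "END"}},
        {"StatementList", {"Statement", "StatementList'"}},
        {"StatementList'", {";", "Statement", "StatementList'"}},
        {"StatementList'", {}},
        {"Statement", {"Assignment"}},
        {"Statement", {"Write"}},
        {"Assignment", {"Identifier", ":=", "Expression"}},
        {"Write", {"WRITE", "(", "Identifier", ")"}},
        {"Expression", {"Term", "Expression'"}},
        {"Expression'", {"+", "Term", "Expression'"}},
        {"Expression'", {"-", "Term", "Expression'"}},
        {"Expression'", {}},
        {"Term", {"Factor", "Term'"}},
        {"Term'", {"*", "Factor", "Term'"}},
        {"Term'", {"/", "Factor", "Term'"}},
        {"Term'", {}},
        {"Factor", {"(", "Expression", ")"}},
        {"Factor", {"Number"}},
        {"Factor", {"Identifier"}},
    };
}

// Repeats a pass over all productions until no FIRST set grows any further.
std::map<Symbol, std::set<Symbol>> firstSets(const Grammar& g) {
    std::set<Symbol> nonTerminals;
    for (const auto& rule : g) nonTerminals.insert(rule.first);

    std::map<Symbol, std::set<Symbol>> first;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& [lhs, rhs] : g) {
            std::set<Symbol>& target = first[lhs];
            const std::size_t before = target.size();
            bool allNullable = true;  // stays true for an empty (ε) right-hand side
            for (const Symbol& sym : rhs) {
                if (!nonTerminals.count(sym)) {  // a terminal starts the production
                    target.insert(sym);
                    allNullable = false;
                    break;
                }
                const std::set<Symbol> symFirst = first[sym];  // copy, target may alias
                for (const Symbol& t : symFirst)
                    if (t != EPSILON) target.insert(t);
                if (!symFirst.count(EPSILON)) {  // sym cannot vanish: stop here
                    allNullable = false;
                    break;
                }
            }
            if (allNullable) target.insert(EPSILON);
            if (target.size() != before) changed = true;
        }
    }
    return first;
}
```

For this grammar the result includes FIRST(`Statement`) = { `Identifier`, `WRITE` } and FIRST(`Expression'`) = { `+`, `-`, ε }. The two alternatives of `Statement` start with different terminals, which is why one lookahead token is enough to choose between them.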
203+
204+
### **Tokens Definition**
205+
206+
- **Keywords:** `BEGIN`, `END`, `WRITE`
207+
- **Operators:** `+`, `-`, `*`, `/`, `:=`
208+
- **Delimiters:** `(`, `)`, `;`
209+
- **Identifiers:** Strings starting with a letter, followed by letters or digits.
210+
- **Numbers:** Integer literals.
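
The token classes above could be recognized by a small hand-written lexer along the following lines. This is a minimal sketch, not the project's lexer: the `Token` struct and `tokenize` function are hypothetical names, keywords are assumed to be case-sensitive, and any character outside the listed classes is reported as an error.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct Token {
    std::string kind;  // terminal name used by the grammar, e.g. "Identifier" or ":="
    std::string text;  // the characters that were matched
};

// Splits the source text into tokens according to the classes listed above.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    // <cctype> functions require values representable as unsigned char.
    auto at = [&](std::size_t k) { return static_cast<unsigned char>(src[k]); };
    while (i < src.size()) {
        if (std::isspace(at(i))) {
            ++i;  // skip whitespace
        } else if (std::isalpha(at(i))) {
            // A letter followed by letters or digits: keyword or identifier.
            std::size_t start = i;
            while (i < src.size() && std::isalnum(at(i))) ++i;
            std::string word = src.substr(start, i - start);
            bool keyword = word == "BEGIN" || word == "END" || word == "WRITE";
            tokens.push_back({keyword ? word : std::string("Identifier"), word});
        } else if (std::isdigit(at(i))) {
            std::size_t start = i;
            while (i < src.size() && std::isdigit(at(i))) ++i;
            tokens.push_back({"Number", src.substr(start, i - start)});
        } else if (src.compare(i, 2, ":=") == 0) {
            tokens.push_back({":=", ":="});
            i += 2;
        } else if (std::string("+-*/();").find(src[i]) != std::string::npos) {
            tokens.push_back({std::string(1, src[i]), std::string(1, src[i])});
            ++i;
        } else {
            throw std::runtime_error("unexpected character at offset " +
                                     std::to_string(i));
        }
    }
    return tokens;
}
```

Keeping the terminal name in `kind` separate from the matched `text` lets the parser compare `kind` against grammar symbols while later stages can still read the identifier name or the numeric value.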
211+
212+
### **Parser Components**
213+
214+
- **Lexer (Tokenizer):** Converts the input source code into a stream of tokens.
215+
- **Parser:** Uses the LL(1) parsing table to validate the token sequence against the grammar.
216+
- **Parsing Table:** A two-dimensional table used by the parser to decide which production rule to apply based on the current non-terminal and lookahead token.
217+
- **Stack:** Tracks parsing progress by holding the grammar symbols (terminals and non-terminals) that are still expected.
218+
219+
## **6. Conclusion**
220+
221+
The **LL(1) Parser Project** demonstrates predictive parsing: input is validated against a grammar with a table-driven, stack-based algorithm. Understanding how the parsing table, the stack, and the grammar rules interact gives practical insight into compiler construction and syntax analysis. The project can serve as an educational tool or as a foundation for more advanced compiler features.
222+
223+
## **7. Appendix: Makefile Explained**
224+
225+
For those interested in the intricacies of the Makefile, here's a breakdown of its components:
226+
227+
```makefile
228+
# Compiler and Flags
229+
CXX = g++
230+
CXXFLAGS = -std=c++17 -Iinclude -Wall -Wextra -g -MMD -MP
231+
232+
# Directories
233+
SRCDIR = src
234+
INCDIR = include
235+
OBJDIR = obj
236+
BINDIR = bin
237+
238+
# Source and Object Files
239+
SRCS = $(wildcard $(SRCDIR)/*.cpp)
240+
OBJS = $(patsubst $(SRCDIR)/%.cpp,$(OBJDIR)/%.o,$(SRCS))
241+
242+
# Target Executable
243+
TARGET = $(BINDIR)/tiny-parser
244+
245+
# Phony Targets
246+
.PHONY: all clean directories help run test
247+
248+
# Default Target
249+
all: directories $(TARGET)
250+
251+
# Rule to Create Necessary Directories
252+
directories:
253+
@mkdir -p $(OBJDIR) $(BINDIR)
254+
255+
# Linking the Target Executable
256+
$(TARGET): $(OBJS)
257+
$(CXX) $(CXXFLAGS) -o $@ $^
258+
259+
# Compiling Source Files into Object Files
260+
$(OBJDIR)/%.o: $(SRCDIR)/%.cpp
261+
$(CXX) $(CXXFLAGS) -c $< -o $@
262+
263+
# Include dependency files
264+
-include $(OBJS:.o=.d)
265+
266+
# Run the Parser
267+
run: all
268+
@echo "Running the parser..."
269+
@./$(TARGET)
270+
271+
# Help Target
272+
help:
273+
@echo "========================================"
274+
@echo " Makefile Help Menu "
275+
@echo "========================================"
276+
@echo "Available Targets:"
277+
@echo " all Build the project."
278+
@echo " run Build and run the parser."
279+
@echo " clean Remove build artifacts."
280+
@echo " help Show this help message."
281+
@echo ""
282+
@echo "Usage Examples:"
283+
@echo " make # Builds the project."
284+
@echo " make run # Builds and runs the parser."
285+
@echo " make clean # Cleans all build artifacts."
286+
@echo " make help # Displays this help menu."
287+
@echo "========================================"
288+
289+
# Clean Target to Remove Build Artifacts
290+
clean:
291+
rm -rf $(OBJDIR) $(BINDIR)
292+
```
293+
294+
### **Key Components**
295+
296+
- **Variables:**
297+
298+
- **`CXX`**: Specifies the compiler (`g++`).
299+
- **`CXXFLAGS`**: Compiler flags for C++17 standard, include paths, warnings, debugging, and dependency generation.
300+
- **`SRCDIR`, `INCDIR`, `OBJDIR`, `BINDIR`**: Define directories for source files, headers, object files, and binaries.
301+
302+
- **Targets:**
303+
304+
- **`all`**: Default target that builds the project.
305+
- **`directories`**: Ensures necessary directories exist.
306+
- **`$(TARGET)`**: Links all object files to create the executable.
307+
- **`$(OBJDIR)/%.o`**: Pattern rule to compile `.cpp` files into `.o` files.
308+
- **`run`**: Builds and runs the parser.
309+
- **`help`**: Displays available Makefile targets.
310+
- **`clean`**: Removes build artifacts.
311+
312+
- **Dependency Management:**
313+
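- **`-MMD -MP`**: `-MMD` makes the compiler write a `.d` dependency file next to each object file, listing the non-system headers that the source includes. `-MP` adds an empty phony target for each header so that deleting a header does not break the build.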
314+
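- **`-include $(OBJS:.o=.d)`**: Includes the generated dependency files so that changing a header triggers recompilation of the sources that include it. The leading `-` tells `make` to ignore `.d` files that do not exist yet, as on a first build.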
315+
316+
### **Usage Tips**
317+
318+
- **Parallel Builds:** Speed up compilation using:
319+
320+
```bash
321+
make -j4
322+
```
323+
324+
Replace `4` with the number of cores you wish to utilize.
325+
326+
- **Verbose Output:** Remove `@` symbols in the Makefile commands to see all build commands executed.
327+
328+
- **Adding New Source Files:** Place new `.cpp` and `.hpp` files in the `src/` and `include/` directories respectively. The Makefile automatically detects and compiles them.
File renamed without changes.
