This repository contains my implementation for Project 3, a Regular Expression Engine, from my CMSC330: Organization of Programming Languages at UMD. This project delves into the fundamental algorithms behind regular expressions, Non-deterministic Finite Automata (NFAs), and Deterministic Finite Automata (DFAs).
The core objective was to implement key transformations and simulations that form the basis of a regular expression interpreter, demonstrating proficiency in functional programming and theoretical computer science concepts.
This project is structured into three interconnected parts, each focusing on a critical component of regular expression processing:
In this section, I implemented functions to simulate the behavior of NFAs. This involved understanding and translating formal definitions into OCaml code.
-
move nfa qs s
: Computes the set of states reachable from a given set of statesqs
by making a single transition on a symbols
(or an epsilon transition). -
e_closure nfa qs
: Determines the set of states reachable from a given set of statesqs
by following zero or more epsilon transitions. -
accept nfa s
: Checks if a given strings
is accepted by the NFAnfa
. This function leverages themove
ande_closure
functions to simulate the NFA's behavior on the input string.
This part focuses on the subset construction algorithm, a crucial process for converting an NFA into an equivalent DFA. The challenge here was to manage the state explosion that can occur during this conversion.
-
new_states nfa qs
: Identifies all new DFA states (sets of NFA states) reachable from a current DFA stateqs
by consuming a single input symbol. -
new_trans nfa qs
: Generates the transitions for a new DFA stateqs
based on the NFA's transitions and epsilon closures. -
new_finals nfa qs
: Determines if a new DFA stateqs
should be an accepting state based on whether it contains any of the NFA's final states. -
nfa_to_dfa nfa
: The main function that orchestrates the subset construction, taking an NFA and returning an equivalent DFA.
The final part involves building an NFA directly from a given regular expression. This demonstrates an understanding of how regular expression operators (concatenation, union, Kleene star, empty string, character) translate into NFA structures.
regexp_to_nfa regexp
: Converts aregexp_t
(OCaml datatype representing a regular expression) into an equivalentnfa_t
. This function required careful construction of NFAs for each regular expression operation, ensuring the resulting NFA accepts the same language.
-
Functional Programming in OCaml: All implementations strictly adhere to functional programming principles, avoiding imperative features like loops and mutable references (except for the provided
fresh
function for unique state generation). -
Finite Automata Theory: Deepened understanding of NFAs, DFAs, epsilon transitions, and the subset construction algorithm.
-
Recursive Data Types: Extensive use of OCaml's powerful pattern matching and recursive data types for representing regular expressions and NFAs.
-
Algorithm Efficiency: Particular attention was paid to the efficiency of
nfa_to_dfa
to avoid timeouts on large inputs, requiring careful design to minimize redundant computations. -
Parser Integration: While the parser (
string_to_regexp
) was provided, understanding its output and how to build NFAs from theregexp_t
datatype was essential.
-
src/nfa.ml
: Contains implementations for NFA operations (move
,e_closure
,accept
) and the NFA to DFA conversion (nfa_to_dfa
and its helpers). -
src/regexp.ml
: Contains theregexp_to_nfa
function. (Parser code forstring_to_regexp
was provided). -
src/sets.ml
: (Provided) Module for set operations, crucial for managing state sets in NFAs/DFAs. -
test/
: Contains testing infrastructure and student-written tests.
To run and test this OCaml project, ensure you have OCaml version 4.13.0 or newer installed. The project uses dune
as its build system.
-
Build the Project: Compile your code:
dune build
-
Run All Tests (Public and Student): Execute all available tests:
dune runtest -f
-
Run Specific Test File (e.g., public tests): To run tests from a particular file:
dune runtest -f test/public
(Replace
test/public
with the path to your desired test file.) -
Interactive Testing with Utop: For an interactive OCaml top-level environment with your project functions loaded:
dune utop src
In
utop
, all commands must end with;;
. Exitutop
by typing#quit;;
or pressingCtrl-D
/Cmd-D
.