Toy header-only regex matching library. Uses a backtracking virtual machine internally. Still a WIP. Heavily inspired by Russ Cox's articles on RE2.
This library has no dependencies. You will need only CMake and a C++17-compatible compiler. To compile the example application, go to the repository folder and run:
mkdir build
cd build
cmake ../CMakeLists.txt
make
This will build an executable called regex_matcher. The executable has three functions. It can match from stdin a la grep:
$ printf 'foobar\nfoo\nbar' | ./regex_matcher --match 'fo*'
foobar
foo
It can print the bytecode of a particular regex:
$ ./regex_matcher --bytecode 'foo*'
Split(3, 1)
Bitset(...)
Jump(-2)
Character(f)
Character(o)
Split(1, 3)
Character(o)
Jump(-2)
Match()
Or it can run some tests using the --tests flag.
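To make the bytecode listing above easier to read, here is a minimal sketch of how a backtracking machine could execute it. This is not the library's implementation: the instruction types, the `run`, `jump_target` and `foo_star_program` names, and the treatment of `Bitset(...)` as "any character" are all assumptions made for illustration. The sketch also assumes that offsets are relative to the current instruction and that `Split(a, b)` tries `a` first and falls back to `b`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical instruction types named after the listing; offsets are
// assumed to be relative to the current instruction.
struct Character { char c; };
struct AnyChar {};                        // assumed stand-in for Bitset(...)
struct Split { int first; int second; };  // try `first`, then `second`
struct Jump { int offset; };
struct Match {};

using Instruction = std::variant<Character, AnyChar, Split, Jump, Match>;
using Program = std::vector<Instruction>;

// Applies a signed relative offset to a program counter.
std::size_t jump_target(std::size_t pc, int offset) {
    return static_cast<std::size_t>(static_cast<std::ptrdiff_t>(pc) + offset);
}

// Returns true if some execution path starting at (pc, pos) reaches Match.
// Split recurses into its first branch and, if that fails, continues with
// the second branch from the same input position (the backtracking step).
bool run(const Program& prog, std::size_t pc,
         const std::string& input, std::size_t pos) {
    while (true) {
        assert(pc < prog.size());
        const Instruction& ins = prog[pc];
        if (std::holds_alternative<Match>(ins)) {
            return true;
        } else if (auto* ch = std::get_if<Character>(&ins)) {
            if (pos >= input.size() || input[pos] != ch->c) return false;
            ++pc;
            ++pos;
        } else if (std::holds_alternative<AnyChar>(ins)) {
            if (pos >= input.size()) return false;
            ++pc;
            ++pos;
        } else if (auto* jmp = std::get_if<Jump>(&ins)) {
            pc = jump_target(pc, jmp->offset);
        } else if (auto* split = std::get_if<Split>(&ins)) {
            if (run(prog, jump_target(pc, split->first), input, pos)) return true;
            pc = jump_target(pc, split->second);
        }
    }
}

// The listing for 'foo*', transcribed by hand into the hypothetical types.
Program foo_star_program() {
    return Program{
        Split{3, 1}, AnyChar{}, Jump{-2},  // skip any prefix (partial match)
        Character{'f'}, Character{'o'},
        Split{1, 3}, Character{'o'}, Jump{-2},  // o*
        Match{},
    };
}

// Usage example in the style of the snippets in this README.
void vm_demo() {
    const Program prog = foo_star_program();
    assert(run(prog, 0, "foobar", 0));
    assert(!run(prog, 0, "bar", 0));
}
```

Under these assumptions, the first three instructions let the machine skip an arbitrary prefix before trying the pattern, which is what makes the match partial rather than anchored at the start.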
All the functions are exposed in interface.h.
To match a regex to the entirety of a string, use the full_match function:
bool does_match = full_match("hello( world)?!", "hello!");
bool does_not_match = full_match("hello", "hello world!");
assert(does_match);
assert(!does_not_match);
To match in the middle of a string, use partial_match:
bool does_match = partial_match("hello", "hello world!");
assert(does_match);
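For readers who know the standard library, the difference between the two functions appears to mirror `std::regex_match` (whole string) versus `std::regex_search` (anywhere in the string). The sketch below uses `<regex>` only, not this library; `std_full_match` and `std_partial_match` are hypothetical helper names, and the claim that they behave like full_match and partial_match is an assumption based on the descriptions above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Whole-string match using the standard library (ECMAScript syntax).
bool std_full_match(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}

// Match anywhere in the string using the standard library.
bool std_partial_match(const std::string& pattern, const std::string& text) {
    return std::regex_search(text, std::regex(pattern));
}

// Same inputs as the examples in this README.
void std_regex_demo() {
    assert(std_full_match("hello( world)?!", "hello!"));
    assert(!std_full_match("hello", "hello world!"));
    assert(std_partial_match("hello", "hello world!"));
}
```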
Regular expressions can be compiled and reused using compile_full and compile_partial.
A compiled regex can be matched to a string with match:
auto re = std::string {"hello"};
auto s = std::string {"hello world!"};
auto compiled = compile_partial(re, s);
assert(match(compiled, s) == partial_match(re, s));
The example application provides grep-like functionality:
$ printf 'hello\nworld'
hello
world
$ printf 'hello\nworld' | ./regex_matcher --match 'wa?'
world
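A grep-like loop is a natural use of compile-once, match-many. The following is a rough sketch of that pattern, not the source of regex_matcher: it uses `std::regex` as a stand-in for the library's compiled regex, and `grep_lines` is a hypothetical name.

```cpp
#include <cassert>
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Returns the lines of `in` that contain a match for `pattern`.
// std::regex stands in for this library's compiled regex type.
std::vector<std::string> grep_lines(std::istream& in, const std::string& pattern) {
    const std::regex compiled(pattern);  // compile once, reuse for every line
    std::vector<std::string> hits;
    std::string line;
    while (std::getline(in, line)) {
        if (std::regex_search(line, compiled)) hits.push_back(line);
    }
    return hits;
}

// Same input and pattern as the shell example above.
void grep_demo() {
    std::istringstream input("hello\nworld");
    const auto hits = grep_lines(input, "wa?");
    assert(hits.size() == 1 && hits[0] == "world");
}
```

Taking a `std::istream&` rather than reading `std::cin` directly keeps the function easy to test with an in-memory stream.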
- Support bracketed character classes
- Return match groups
- Thompson algorithm (rather than backtracking)
- Unicode support