grammars: x{min,max} repetition operator #6640

Merged
merged 36 commits into from
Jun 6, 2024
Changes from 1 commit
0160469
grammars: x{min,max} repetition operator + tweak +/*/? to avoid dupli…
Apr 12, 2024
f2030e3
grammars: handle `x{n}` and fix `x{n,n}`
Apr 12, 2024
de0fd3f
grammars: document new repetition operators
Apr 12, 2024
9d9b5a3
grammars: nit
Apr 12, 2024
6b5518c
grammars: uniform use of int for min & max
Apr 12, 2024
0ceb69a
grammars: refactor parser test
Apr 12, 2024
8938a05
grammar: parsing tests w/ natural pretty print of updated expectations
Apr 12, 2024
0d7347f
grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_…
Apr 12, 2024
2e2df72
grammars: improve test pretty print again
Apr 12, 2024
ffe321d
grammars: pretty print rules and chars
Apr 12, 2024
a9351b8
grammars: fix copy rule skipping
Apr 12, 2024
9d8efa5
grammars: disallow `a{,}` (not allowed in regexps)
Apr 12, 2024
2d98ebf
Update common/grammar-parser.cpp
ochafik Apr 12, 2024
ec91342
grammars: fix copy rule skipping (again) & display of expectations
Apr 12, 2024
22faba6
grammars: more test cases
Apr 12, 2024
1fb7787
Merge remote-tracking branch 'origin/master' into grammar-reps
Apr 15, 2024
15585e0
grammars: update reps parsing to bring ? / * / + closer to before
Apr 19, 2024
93b754e
json: use new GBNF repetitions{m,n} syntax
Apr 19, 2024
2ecc2ae
grammars: update performance gotchas w/ repetition advice
Apr 20, 2024
a9a2983
Merge remote-tracking branch 'origin/master' into grammar-reps
Apr 21, 2024
d47f537
Update examples/json_schema_to_grammar.py
ochafik Apr 24, 2024
724f879
Update examples/server/public/json-schema-to-grammar.mjs
ochafik Apr 24, 2024
a61281f
grammars: comment on rule repetitions
Apr 24, 2024
d03c98e
grammars: ensure unambiguous number alternatives
Apr 24, 2024
21bac1e
grammar: nit typo switched error msgs
Apr 24, 2024
0c74ad3
grammar: nit numbering in comment
Apr 24, 2024
218f41f
json: update numeric rule to be unambiguous
Apr 24, 2024
2813835
Apply suggestions from code review
ochafik Apr 24, 2024
46fe648
Update examples/server/public/json-schema-to-grammar.mjs
ochafik Apr 24, 2024
eb7ccd8
json: fix integral-part
Apr 24, 2024
3c02508
Merge branch 'grammar-reps' of https://github.com/ochafik/llama.cpp i…
Apr 24, 2024
476c97d
Merge remote-tracking branch 'origin/master' into grammar-reps
Apr 30, 2024
990bf57
grammar: add repetition tests
Apr 30, 2024
d070aee
Merge remote-tracking branch 'origin/master' into grammar-reps
May 18, 2024
8266b7c
Merge remote-tracking branch 'origin/master' into grammar-reps
May 21, 2024
2b79d47
Merge remote-tracking branch 'origin/master' into grammar-reps
Jun 4, 2024
15 changes: 7 additions & 8 deletions common/grammar-parser.cpp
@@ -187,7 +187,7 @@ namespace grammar_parser {

uint32_t sub_rule_id = generate_symbol_id(state, rule_name);
std::vector<llama_grammar_element> sub_rule;
-for (size_t i = 0; i < min_times; i++) {
+for (int i = 0; i < min_times; i++) {
sub_rule.push_back({LLAMA_GRETYPE_RULE_REF, content_rule_id});
}
if (max_times < 0) {
@@ -294,16 +294,15 @@ namespace grammar_parser {
handle_repetitions(0, 1);
} else if (*pos == '{') {
pos = parse_space(pos + 1, is_nested);
-size_t min_times = 0;
-int max_times = -1;

-if (is_digit_char(*pos)) {
-const char * int_end = parse_int(pos);
-min_times = std::stoul(std::string(pos, int_end - pos));
-pos = parse_space(int_end, is_nested);
-} else if (*pos != ',') {
+if (!is_digit_char(*pos)) {
throw std::runtime_error(std::string("expecting an int or ',' at ") + pos);
}
+const char * int_end = parse_int(pos);
+int min_times = std::stoul(std::string(pos, int_end - pos));
+pos = parse_space(int_end, is_nested);
+
+int max_times = -1;

if (*pos == '}') {
max_times = min_times;
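The hunk above makes the leading integer mandatory, so `{,}` and `{,n}` are now rejected while `{m}`, `{m,}` and `{m,n}` remain valid. A minimal standalone sketch of that bounds parsing follows. It assumes a hypothetical helper `parse_repetition_bounds` that receives only the text between the braces; it is not the parser's actual code, and it keeps the diff's convention that a `max_times` of `-1` means "no upper bound".

```cpp
#include <cctype>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper (not part of llama.cpp): parses the text found between
// '{' and '}' and returns {min_times, max_times}.
// max_times == -1 means "no upper bound", as in the diff above.
static std::pair<int, int> parse_repetition_bounds(const std::string & spec) {
    size_t pos = 0;

    auto skip_spaces = [&]() {
        while (pos < spec.size() && std::isspace(static_cast<unsigned char>(spec[pos]))) {
            pos++;
        }
    };
    auto parse_int = [&]() {
        const size_t start = pos;
        while (pos < spec.size() && std::isdigit(static_cast<unsigned char>(spec[pos]))) {
            pos++;
        }
        if (start == pos) {
            throw std::runtime_error("expecting an int at offset " + std::to_string(start));
        }
        return std::stoi(spec.substr(start, pos - start));
    };

    skip_spaces();
    // The minimum is mandatory: a leading ',' fails here, rejecting "{,}" and "{,n}".
    const int min_times = parse_int();
    skip_spaces();

    int max_times = -1;
    if (pos == spec.size()) {
        max_times = min_times;          // {m}: exactly m times
    } else if (spec[pos] == ',') {
        pos++;
        skip_spaces();
        if (pos < spec.size()) {
            max_times = parse_int();    // {m,n}: between m and n times
            skip_spaces();
        }                               // otherwise {m,}: at least m times
        if (pos != spec.size()) {
            throw std::runtime_error("unexpected trailing characters in repetition");
        }
    } else {
        throw std::runtime_error("expecting ',' or end of repetition");
    }
    return {min_times, max_times};
}
```

With this sketch, `"3"` yields `{3, 3}`, `"2,"` yields `{2, -1}`, and `",10"` throws, which matches the cases exercised by the new `verify_failure` tests.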
4 changes: 2 additions & 2 deletions grammars/README.md
@@ -64,8 +64,8 @@ Parentheses `()` can be used to group sequences, which allows for embedding alte
- `?` makes the preceding symbol or sequence optional (equivalent to `{0,1}`).
- `{m}` repeats the precedent symbol or sequence exactly `m` times
- `{m,}` repeats the precedent symbol or sequence at least `m` times
-- `{m,n}` repeats the precedent symbol or sequence at betwen `m` and `n` times (included)
-- `{,n}` repeats the precedent symbol or sequence at most `n` times (included)
+- `{m,n}` repeats the precedent symbol or sequence at between `m` and `n` times (included)
+- `{0,n}` repeats the precedent symbol or sequence at most `n` times (included)

## Comments and newlines

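To make the semantics of the operators in the README hunk above concrete, here is a small sketch that desugars `sym{min,max}` into an equivalent expression using only sequencing, grouping, `?` and `*`. The function `desugar_repetition` is hypothetical and purely illustrative. It is not the rewrite that `handle_repetitions` performs, whose full body is not shown in this diff.

```cpp
#include <string>

// Hypothetical illustration (not the parser's actual rewrite): expands
// `sym{min_times,max_times}` into an equivalent expression built from
// sequencing, grouping, `?` and `*`.
// max_times < 0 means "no upper bound", mirroring the diff's convention.
static std::string desugar_repetition(const std::string & sym, int min_times, int max_times) {
    std::string out;

    // Mandatory part: `sym` repeated min_times times.
    for (int i = 0; i < min_times; i++) {
        if (!out.empty()) {
            out += " ";
        }
        out += sym;
    }

    // Unbounded case {m,}: any number of extra occurrences.
    if (max_times < 0) {
        if (!out.empty()) {
            out += " ";
        }
        out += sym + "*";
        return out;
    }

    // Bounded case {m,n}: nest (n - m) optional occurrences, e.g. (sym (sym)?)?
    std::string tail;
    for (int i = 0; i < max_times - min_times; i++) {
        tail = tail.empty() ? "(" + sym + ")?" : "(" + sym + " " + tail + ")?";
    }
    if (!tail.empty()) {
        if (!out.empty()) {
            out += " ";
        }
        out += tail;
    }
    return out;
}
```

For example, `desugar_repetition("x", 2, 4)` returns `x x (x (x)?)?`, and `desugar_repetition("x", 0, 1)` returns `(x)?`, consistent with the README's statement that `?` is equivalent to `{0,1}`. Nesting the optional tail, rather than writing `(x)? (x)?`, keeps a single way to match each repetition count.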
20 changes: 20 additions & 0 deletions tests/test-grammar-parser.cpp
@@ -69,6 +69,12 @@ static void verify_parsing(const char *grammar_bytes, const std::vector<std::pai

fprintf(stderr, "Testing grammar:%s\n", grammar_bytes);

+if (parsed_grammar.symbol_ids.size() != expected.size()) {
+fprintf(stderr, "Code to update expectation (set TEST_GRAMMAR_PARSER_PRINT_ALL=1 to print all):\n");
+print_all();
+assert(parsed_grammar.symbol_ids.size() == expected.size());
+}
+
for (auto it = parsed_grammar.symbol_ids.begin(); it != parsed_grammar.symbol_ids.end(); ++it)
{
std::string key = it->first;
@@ -118,6 +124,12 @@ static void verify_parsing(const char *grammar_bytes, const std::vector<std::pai
}
}

+static void verify_failure(const char *grammar_bytes) {
+fprintf(stderr, "Testing expected failure:%s\n", grammar_bytes);
+auto result = grammar_parser::parse(grammar_bytes);
+assert(result.rules.empty() && "should have failed");
+}
+
int main()
{
verify_parsing(R"""(
@@ -289,6 +301,14 @@ int main()
{LLAMA_GRETYPE_END, 0},
});

+verify_failure(R"""(
+root ::= "a"{,}"
+)""");
+
+verify_failure(R"""(
+root ::= "a"{,10}"
+)""");
+
verify_parsing(R"""(
root ::= (expr "=" term "\n")+
expr ::= term ([-+*/] term)*