Commit 9a7946f

Merge pull request '2.2.0' (#234) from develop into master
Parents: e56614d + 20cf9bb

61 files changed (+2940, -1230 lines)

.github/workflows/tests.yml

Lines changed: 7 additions & 2 deletions
@@ -1,10 +1,15 @@
 name: Tests

-on: [push]
+on:
+  push:
+    branches: [master, develop]
+  pull_request:
+    branches: [master, develop]
+    types: [ opened, synchronize, reopened ]

 jobs:
   build:
-    runs-on: ubuntu-22.04
+    runs-on: ubuntu-latest
     strategy:
       max-parallel: 4
       matrix:

.pre-commit-config.yaml

Lines changed: 0 additions & 9 deletions
@@ -28,12 +28,3 @@ repos:
     hooks:
       - id: sphinx-lint
         stages: [pre-commit]
-  - repo: local
-    hooks:
-      - id: pytest-check
-        name: pytest-check
-        entry: coverage run -m pytest
-        stages: [pre-push]
-        language: system
-        pass_filenames: false
-        always_run: true

CHANGELOG.md

Lines changed: 20 additions & 0 deletions
@@ -1,5 +1,25 @@
 # Changelog

+## 2.2.0
+
+### Foundation foods:
+
+* Bias foundation food matching to prefer "raw" FDC ingredients, but only if the ingredient name does not include any verbs that indicate the ingredient is not raw (e.g. "cooked").
+* Normalise spelling of tokens in ingredient names to align with spelling used in FDC ingredient descriptions.
+* Fix a bug where foundation foods were never calculated if `separate_names=False`.
+
+### General
+
+* Add logging to library, under the `ingredient-parser` namespace.
+
+### Model
+
+* Improve parser model performance with new features related to sentence structure, such as whether a token is part of an example phrase, a multi-ingredient phrase, or after the split in a compound sentence. See the [Feature Generation](https://ingredient-parser.readthedocs.io/en/latest/explanation/features.html) of the docs for more details.
+
+### Processing
+
+* Improve post processing of names to avoid returning multiple names if the name is split by a non-name token. For example, in the sentence "*8 fresh large basil leaves*", the name should be returned as "*fresh basil leaves*" and not as two separate names: "*fresh*", "*basil leaves*".
+
 ## 2.1.1

 * Pin Pint version to 0.24.4, as future versions intend to drop support for Python 3.10.
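
Note on the logging entry in the changelog above: the following is a minimal sketch of how a caller might surface the library's log output using only the standard `logging` module. The logger name `ingredient-parser` is taken from the changelog. The level, handler, format string and the child logger name `ingredient-parser.example` are illustrative assumptions, not something this commit confirms.

```python
import logging

# Logger name from the 2.2.0 changelog entry. Everything configured on it
# below is an illustrative choice by the caller, not a library default.
logger = logging.getLogger("ingredient-parser")
logger.setLevel(logging.DEBUG)

# Send records to stderr with a simple format.
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s: %(message)s"))
logger.addHandler(handler)

# Hypothetical child logger: any logger whose dotted name starts with
# "ingredient-parser." propagates its records to the handler above.
child = logging.getLogger("ingredient-parser.example")
child.debug("example message")
```

Configuring the namespace logger rather than the root logger keeps output from other libraries unchanged.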

MANIFEST.in

Lines changed: 4 additions & 4 deletions
@@ -1,7 +1,7 @@
 include ingredient_parser/density_context.txt
-include ingredient_parser/en/model.en.crfsuite
-include ingredient_parser/en/ModelCard.en.md
-include ingredient_parser/en/ingredient_embeddings.25d.glove.txt.gz
-include ingredient_parser/en/fdc_ingredients.csv.gz
+include ingredient_parser/en/data/model.en.crfsuite
+include ingredient_parser/en/data/ModelCard.en.md
+include ingredient_parser/en/data/ingredient_embeddings.25d.glove.txt.gz
+include ingredient_parser/en/data/fdc_ingredients.csv.gz
 global-exclude test*
 prune */__pycache__

README.md

Lines changed: 7 additions & 5 deletions
@@ -48,13 +48,13 @@ The model has the following accuracy on a test data set of 20% of the total data

 ```
 Sentence-level results:
-Accuracy: 94.66%
+Accuracy: 94.94%

 Word-level results:
-Accuracy 97.82%
-Precision (micro) 97.81%
-Recall (micro) 97.82%
-F1 score (micro) 97.81%
+Accuracy 97.90%
+Precision (micro) 97.88%
+Recall (micro) 97.90%
+F1 score (micro) 97.88%
 ```

 ## Development
@@ -68,6 +68,8 @@ pre-commit install

 to install the pre-commit hooks.

+Please target the **develop** branch for pull requests. The main branch is used for stable releases and hotfixes only.
+
 There is a simple web app for testing the parser with ingredient sentences and showing the parsed output. To run the web app, run the command

 ```bash

benchmark.py

Lines changed: 15 additions & 5 deletions
@@ -1,10 +1,9 @@
 #!/usr/bin/env python3
+import argparse
 import time

 from ingredient_parser import parse_ingredient

-ITERATIONS = 500
-
 if __name__ == "__main__":
     sentences = [
         ("½ cup warm water (105°F)", "0.5 cup warm water (105°F)"),
@@ -44,12 +43,23 @@
         ),
     ]

+    parser = argparse.ArgumentParser(description="Ingredient Parser benchmark")
+    parser.add_argument(
+        "--iterations", "-i", type=int, help="Number of iterations to run.", default=500
+    )
+    parser.add_argument(
+        "--foundationfoods", "-ff", action="store_true", help="Enable foundation foods."
+    )
+    args = parser.parse_args()
+
     start = time.time()
-    for i in range(ITERATIONS):
+    for i in range(args.iterations):
         for sent, _ in sentences:
-            parse_ingredient(sent, expect_name_in_output=True)
+            parse_ingredient(
+                sent, expect_name_in_output=True, foundation_foods=args.foundationfoods
+            )

-    total_sentences = ITERATIONS * len(sentences)
+    total_sentences = args.iterations * len(sentences)
     duration = time.time() - start
     print(f"Elapsed time: {duration:.2f} s")
     print(f"{1e6 * duration / total_sentences:.2f} us/sentence")
