Commit ac92cf4

Merge branch 'develop' into noun-phrase-features
2 parents 5426012 + 589cc98 commit ac92cf4

25 files changed: 458 additions and 66 deletions

CHANGELOG.md

Lines changed: 11 additions & 1 deletion
@@ -1,6 +1,16 @@
 # Changelog

-## 2.10
+## develop
+
+* Foundation food improvements:
+  * Bias foundation food matching to prefer "raw" FDC ingredients, but only if the ingredient name does not include any verbs that indicate the ingredient is not raw (e.g. "cooked").
+  * Normalise spelling of tokens in ingredient names to align with spelling used in FDC ingredient descriptions.
+
+## 2.1.1
+
+* Pin Pint version to 0.24.4, as future versions intend to drop support for Python 3.10.
+
+## 2.1.0

 > [!WARNING]
 >

README.md

Lines changed: 3 additions & 3 deletions
@@ -48,13 +48,13 @@ The model has the following accuracy on a test data set of 20% of the total data

 ```
 Sentence-level results:
-Accuracy: 94.66%
+Accuracy: 94.65%

 Word-level results:
 Accuracy 97.82%
-Precision (micro) 97.81%
+Precision (micro) 97.80%
 Recall (micro) 97.82%
-F1 score (micro) 97.81%
+F1 score (micro) 97.80%
 ```

 ## Development

benchmark.py

Lines changed: 15 additions & 5 deletions
@@ -1,10 +1,9 @@
 #!/usr/bin/env python3
+import argparse
 import time

 from ingredient_parser import parse_ingredient

-ITERATIONS = 500
-
 if __name__ == "__main__":
     sentences = [
         ("½ cup warm water (105°F)", "0.5 cup warm water (105°F)"),
@@ -44,12 +43,23 @@
         ),
     ]

+    parser = argparse.ArgumentParser(description="Ingredient Parser benchmark")
+    parser.add_argument(
+        "--iterations", "-i", type=int, help="Number of iterations to run.", default=500
+    )
+    parser.add_argument(
+        "--foundationfoods", "-ff", action="store_true", help="Enable foundation foods."
+    )
+    args = parser.parse_args()
+
     start = time.time()
-    for i in range(ITERATIONS):
+    for i in range(args.iterations):
         for sent, _ in sentences:
-            parse_ingredient(sent, expect_name_in_output=True)
+            parse_ingredient(
+                sent, expect_name_in_output=True, foundation_foods=args.foundationfoods
+            )

-    total_sentences = ITERATIONS * len(sentences)
+    total_sentences = args.iterations * len(sentences)
     duration = time.time() - start
     print(f"Elapsed time: {duration:.2f} s")
     print(f"{1e6 * duration / total_sentences:.2f} us/sentence")
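The new command-line options in benchmark.py can be exercised without running the benchmark itself. The following is a minimal sketch that rebuilds the same two argparse options from the diff and parses explicit argument lists; the `build_parser` helper is a hypothetical name introduced here for illustration and is not part of the commit.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper: mirrors the two options added to benchmark.py.
    parser = argparse.ArgumentParser(description="Ingredient Parser benchmark")
    parser.add_argument(
        "--iterations", "-i", type=int, help="Number of iterations to run.", default=500
    )
    parser.add_argument(
        "--foundationfoods", "-ff", action="store_true", help="Enable foundation foods."
    )
    return parser


# With no arguments, the defaults match the old hard-coded behaviour.
defaults = build_parser().parse_args([])

# Short options: 100 iterations with foundation foods enabled.
custom = build_parser().parse_args(["-i", "100", "-ff"])

print(defaults.iterations, defaults.foundationfoods)
print(custom.iterations, custom.foundationfoods)
```

Assuming the script is run from the repository root, the equivalent shell invocation would be `python benchmark.py -i 100 -ff`.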

docs/source/how-to/index.rst

Lines changed: 1 addition & 0 deletions
@@ -5,4 +5,5 @@ How to guides
    :maxdepth: 1

    Convert units <convert-units>
+   Logging <logging>
    Extend to other languages <extending>

docs/source/how-to/logging.rst

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+Logging
+=======
+
+Python’s standard `logging <https://docs.python.org/3/library/logging.html>`_ module is used to implement debug log output for ingredient-parser.
+This allows ingredient-parser's logging to integrate in a standard way with other applications and libraries.
+
+All logging for ingredient-parser is within the ``ingredient-parser`` namespace.
+
+* The ``ingredient-parser`` namespace contains general logging for parsing of ingredient sentences.
+* The ``ingredient-parser.foundation-foods`` namespace contains logging related to the :doc:`Foundation Foods </explanation/foundation>` functionality.
+
+For example, to output debug logs to stdout:
+
+.. code:: python
+
+    >>> import logging, sys
+    >>> from ingredient_parser import parse_ingredient
+    >>>
+    >>> logging.basicConfig(stream=sys.stdout)
+    >>> logging.getLogger("ingredient-parser").setLevel(logging.DEBUG)
+    >>>
+    >>> parsed = parse_ingredient("24 fresh basil leaves or dried basil")
+    DEBUG:ingredient-parser:Parsing sentence "24 fresh basil leaves or dried basil" using "en" parser.
+    DEBUG:ingredient-parser:Normalised sentence: "24 fresh basil leaves or dried basil".
+    DEBUG:ingredient-parser:Tokenized sentence: ['24', 'fresh', 'basil', 'leaf', 'or', 'dried', 'basil'].
+    DEBUG:ingredient-parser:Singularised tokens at indices: [3].
+    DEBUG:ingredient-parser:Generating features for tokens.
+    DEBUG:ingredient-parser:Sentence token labels: ['QTY', 'B_NAME_TOK', 'I_NAME_TOK', 'I_NAME_TOK', 'NAME_SEP', 'B_NAME_TOK', 'I_NAME_TOK'].
+
+To enable logging only for foundation foods:
+
+.. code:: python
+
+    >>> import logging, sys
+    >>> from ingredient_parser import parse_ingredient
+    >>>
+    >>> logging.basicConfig(stream=sys.stdout)
+    >>> logging.getLogger("ingredient-parser.foundation-foods").setLevel(logging.DEBUG)
+    >>>
+    >>> parsed = parse_ingredient("24 fresh basil leaves or dried basil", foundation_foods=True)
+    DEBUG:ingredient-parser.foundation-foods:Matching FDC ingredient for ingredient name tokens: ['fresh', 'basil', 'leaves']
+    DEBUG:ingredient-parser.foundation-foods:Prepared tokens: ['fresh', 'basil', 'leav'].
+    DEBUG:ingredient-parser.foundation-foods:Loaded 13318 FDC ingredients.
+    DEBUG:ingredient-parser.foundation-foods:Selecting best match from 1 candidates based on preferred FDC datatype.
+    DEBUG:ingredient-parser.foundation-foods:Matching FDC ingredient for ingredient name tokens: ['dried', 'basil']
+    DEBUG:ingredient-parser.foundation-foods:Prepared tokens: ['dri', 'basil'].
+    DEBUG:ingredient-parser.foundation-foods:Selecting best match from 1 candidates based on preferred FDC datatype.
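The two namespaces documented in logging.rst follow the standard library's dotted-name hierarchy, so records logged to ``ingredient-parser.foundation-foods`` propagate to handlers attached to ``ingredient-parser``. The sketch below illustrates this with the standard library only: it does not import ingredient_parser, and the debug message is a made-up stand-in for a real library log call.

```python
import io
import logging

# Capture output in memory instead of stdout so the result can be inspected.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

# Configure only the parent namespace.
parent = logging.getLogger("ingredient-parser")
parent.addHandler(handler)
parent.setLevel(logging.DEBUG)

# The child inherits the parent's effective level and propagates records upward.
child = logging.getLogger("ingredient-parser.foundation-foods")
child.debug("hypothetical message")  # stand-in for a library log call

captured = buffer.getvalue()
print(captured, end="")
```

Setting the level on the child logger alone, as in the second documented example, limits debug output to the foundation foods messages.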

ingredient_parser/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -9,4 +9,4 @@
     "show_model_card",
 ]

-__version__ = "2.1.0"
+__version__ = "2.1.1"

ingredient_parser/_common.py

Lines changed: 5 additions & 0 deletions
@@ -1,6 +1,7 @@
 #!/usr/bin/env python3

 import collections
+import logging
 import os
 import platform
 import re
@@ -20,6 +21,10 @@

 SUPPORTED_LANGUAGES = ["en"]

+# Logging
+logger = logging.getLogger("ingredient-parser")
+logger.addHandler(logging.NullHandler())
+
 # Regex pattern for matching a numeric range e.g. 1-2, 2-3, #1$2-1#3$4.
 RANGE_PATTERN = re.compile(r"^[\d\#\$]+\s*[\-][\d\#\$]+$")

ingredient_parser/en/ModelCard.en.md

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@

 ### Model Date and Version

-Date: April 2025
+Date: May 2025

 Version: The model version is the same as the `ingredient_parser_nlp` package version.

@@ -124,7 +124,7 @@ The model has the following performance metrics:

 | Word level accuracy | Sentence level accuracy |
 | ------------------- | ----------------------- |
-| 97.82 ± 0.18% | 94.62 ± 0.44% |
+| 97.82 ± 0.18% | 94.65 ± 0.44% |

 These metrics were determined by executing 20 training/evaluation cycles and calculating the mean and standard deviation for the two metrics across all cycles. The uncertainty values provided represent the 99.7% confidence bounds (i.e. 3x standard deviation). The uncertainty is due to the randomisation of the selection of training and evaluation data whenever the model is trained.
