Commit ec619bb

Merge pull request '2.1.0' (#211) from develop into master
2 parents 2f4fc96 + f9af29a commit ec619bb

64 files changed, +1880 -1173 lines changed

.pre-commit-config.yaml

Lines changed: 2 additions & 1 deletion
@@ -5,6 +5,7 @@ repos:
     rev: v5.0.0
     hooks:
       - id: check-added-large-files
+        args: ['--maxkb=5000']
         stages: [pre-commit]
       - id: check-ast
         stages: [pre-commit]
@@ -15,7 +16,7 @@ repos:
       - id: debug-statements
         stages: [pre-commit]
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.9.6
+    rev: v0.11.6
     hooks:
       - id: ruff
         args: ["--fix"]

CHANGELOG.md

Lines changed: 22 additions & 0 deletions
@@ -1,5 +1,27 @@
 # Changelog

+## 2.1.0
+
+> [!WARNING]
+>
+> This version replaces the floret dependency with numpy.
+>
+> Numpy was already a dependency of floret, so if you are upgrading from v2.0.0 there should be little impact.
+
+* This release overhauls the foundation foods functionality so that ingredient names are matched to entries in the [FoodData Central](https://fdc.nal.usda.gov/) (FDC) database.
+
+* This update does not change the API. It adds additional fields to `FoundationFood` objects for FDC ID, category and data type. The `text` field now returns the description for the matching FDC entry.
+
+* Beware that enabling this functionality causes the `parse_ingredient` function to be much slower than when disabled (default).
+
+| | foundation_foods=False (default) | foundation_foods=True |
+| -------------------- | -------------------------------- | --------------------- |
+| Sentences per second | ~1500 | ~20 |
+
+* This functionality works entirely offline.
+
+* See the [foundation foods](https://ingredient-parser.readthedocs.io/en/latest/explanation/foundation.html) page of the docs for specifics.
+
 ## 2.0.0

 > [!Caution]
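
Review note: the approximate throughput figures in the changelog table can be converted into per-sentence latency and a slowdown factor. The following is a minimal sketch that uses only the rounded values quoted in the table (~1500 and ~20 sentences per second). The helper name `per_sentence_ms` is hypothetical, and the results are only as precise as those estimates.

```python
def per_sentence_ms(sentences_per_second: float) -> float:
    """Convert a throughput (sentences/second) into latency (ms/sentence)."""
    return 1000.0 / sentences_per_second


# Approximate figures taken from the changelog table.
default_rate = 1500.0     # foundation_foods=False (default)
foundation_rate = 20.0    # foundation_foods=True

slowdown = default_rate / foundation_rate

print(f"default: {per_sentence_ms(default_rate):.2f} ms/sentence")
print(f"foundation foods: {per_sentence_ms(foundation_rate):.2f} ms/sentence")
print(f"slowdown: ~{slowdown:.0f}x")
```

On these rounded numbers, enabling foundation foods costs roughly 50 ms per sentence, which may matter for batch workloads.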

MANIFEST.in

Lines changed: 2 additions & 3 deletions
@@ -1,8 +1,7 @@
 include ingredient_parser/density_context.txt
 include ingredient_parser/en/model.en.crfsuite
 include ingredient_parser/en/ModelCard.en.md
-include ingredient_parser/en/ff_model.en.crfsuite
-include ingredient_parser/en/FF_ModelCard.en.md
-include ingredient_parser/en/embeddings.floret.bin
+include ingredient_parser/en/ingredient_embeddings.25d.glove.txt.gz
+include ingredient_parser/en/fdc_ingredients.csv.gz
 global-exclude test*
 prune */__pycache__
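
Review note: the new `ingredient_embeddings.25d.glove.txt.gz` entry suggests gzipped GloVe-style text vectors. The following is a minimal sketch of reading such a file with the standard library. It assumes the common GloVe text layout (a token followed by space-separated floats on each line); whether the packaged file follows exactly this layout, and how the library itself loads it, is not shown in this diff. `load_glove_gz` and the demo file are hypothetical.

```python
import gzip
import os
import tempfile


def load_glove_gz(path: str) -> dict[str, list[float]]:
    """Read a gzipped GloVe-style text file into a token -> vector mapping."""
    vectors: dict[str, list[float]] = {}
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Demo with a tiny made-up file (3 dimensions instead of 25).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.glove.txt.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as f:
    f.write("tomato 0.1 0.2 0.3\nonion -0.5 0.0 0.25\n")

demo_vectors = load_glove_gz(demo_path)
print(demo_vectors["onion"])
```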

README.md

Lines changed: 4 additions & 4 deletions
@@ -42,19 +42,19 @@ Refer to the documentation [here](https://ingredient-parser.readthedocs.io/en/la

 ## Model

-The core of the library is a sequence labelling model that is used to label each token in the sentence with the part of the sentence it belongs to. A data set of 75,000 example sentences is used to train and evaluate the model. See the [Model Guide](https://ingredient-parser.readthedocs.io/en/latest/guide/index.html) in the documentation for more details.
+The core of the library is a sequence labelling model that is used to label each token in the sentence with the part of the sentence it belongs to. A data set of 81,000 example sentences is used to train and evaluate the model. See the [Model Guide](https://ingredient-parser.readthedocs.io/en/latest/guide/index.html) in the documentation for more details.

 The model has the following accuracy on a test data set of 20% of the total data used:

 ```
 Sentence-level results:
-Accuracy: 94.72%
+Accuracy: 94.66%

 Word-level results:
 Accuracy 97.82%
-Precision (micro) 97.80%
+Precision (micro) 97.81%
 Recall (micro) 97.82%
-F1 score (micro) 97.80%
+F1 score (micro) 97.81%
 ```

 ## Development
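
Review note: the README reports both sentence-level and word-level accuracy. The following is a minimal sketch of how the two can differ, assuming the usual definition in which a sentence counts as correct only when every token label matches. The diff does not confirm that definition, and the labels and data below are toy values, not the library's.

```python
def word_accuracy(true_labels, pred_labels):
    """Fraction of individual tokens whose predicted label is correct."""
    pairs = [
        (t, p)
        for true_sent, pred_sent in zip(true_labels, pred_labels)
        for t, p in zip(true_sent, pred_sent)
    ]
    return sum(t == p for t, p in pairs) / len(pairs)


def sentence_accuracy(true_labels, pred_labels):
    """Fraction of sentences in which every token label is correct."""
    correct = sum(t == p for t, p in zip(true_labels, pred_labels))
    return correct / len(true_labels)


# Toy data: two labelled sentences, one mislabelled token in the second.
truth = [["QTY", "UNIT", "NAME"], ["QTY", "NAME", "PREP"]]
preds = [["QTY", "UNIT", "NAME"], ["QTY", "NAME", "NAME"]]

print(f"word-level: {word_accuracy(truth, preds):.2%}")
print(f"sentence-level: {sentence_accuracy(truth, preds):.2%}")
```

Under that definition a single wrong token fails the whole sentence, which is consistent with the sentence-level figure being lower than the word-level one.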

benchmark.py

Lines changed: 1 addition & 1 deletion
@@ -53,4 +53,4 @@
 duration = time.time() - start
 print(f"Elapsed time: {duration:.2f} s")
 print(f"{1e6 * duration / total_sentences:.2f} us/sentence")
-print(f"{int(total_sentences / duration)} sentences/second")
+print(f"{total_sentences / duration:.2f} sentences/second")
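
Review note: this change swaps `int()` truncation for two-decimal formatting. The following is a small sketch of why that matters at low throughput. The helper names and the run figures are made up for illustration.

```python
def old_format(total_sentences: int, duration: float) -> str:
    # Previous behaviour: truncate towards zero before printing.
    return f"{int(total_sentences / duration)} sentences/second"


def new_format(total_sentences: int, duration: float) -> str:
    # New behaviour: keep two decimal places.
    return f"{total_sentences / duration:.2f} sentences/second"


# Hypothetical run: 59 sentences in 3.0 seconds.
print(old_format(59, 3.0))
print(new_format(59, 3.0))
```

At rates near 20 sentences per second, truncation can hide several percent of the measured value, so the fractional output is the more useful one for comparing runs.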
