Commit 557cd5e

Docs: Add limitations section to foundation food docs
1 parent 64f111a commit 557cd5e

1 file changed

+13
-8
lines changed

docs/source/explanation/foundation.rst

Lines changed: 13 additions & 8 deletions
@@ -71,7 +71,7 @@ These vectors are used to compute a fuzzy distance score between the ingredient

 The full process is as follows:

-#. Load the :abbr:`FDC (Food Data Central)` data. Tokenize the description for each entry and remove tokens that don't provide useful semantic information*.
+#. Load the :abbr:`FDC (Food Data Central)` data. Tokenize the description for each entry and remove tokens that don't provide useful semantic information\*.

 #. Prepare the ingredient name tokens in the same way.

@@ -82,26 +82,31 @@ The full process is as follows:

 #. Compute the fuzzy distance score between each :abbr:`FDC (Food Data Central)` entry and the ingredient name tokens.

-#. Sort the :abbr:`FDC (Food Data Central)` by their score.
+#. Sort the :abbr:`FDC (Food Data Central)` entries by the fuzzy distance score.

-#. If there is a match with a score below the threshold, return the best match.
+#. If the lowest (best) score is below the threshold, return the :class:`FoundationFood <ingredient_parser.dataclasses.FoundationFood>` object for the corresponding :abbr:`FDC (Food Data Central)` entry.

-#. If there are not any matches with a good enough score, store the best match for fallback matching.
+#. If the best score is not below the threshold, store the best entry and its score for fallback matching.

 #. If none of the :abbr:`FDC (Food Data Central)` datasets contained a good enough match, attempt fallback matching.

-#. Sort the best match from each :abbr:`FDC (Food Data Central)` data set.
+#. Sort the best matches from each :abbr:`FDC (Food Data Central)` data set.

 #. If the score for the best of these matches is below a threshold, return this match.

 #. If no match is good enough, return ``None``.

 .. note::

-Tokens that do not provide useful semantic information are as follows: numbers, white space, punctuation, stop words, single character words.
+\*Tokens that do not provide useful semantic information are as follows: numbers, white space, punctuation, stop words, single character words.

+Limitations
+^^^^^^^^^^^

+The current implementation has some limitations.

+#. The fuzzy distance scoring will sometimes result in returning an :abbr:`FDC (Food Data Central)` entry that has a good score but is not a good match.
+Work is ongoing to improve this, and suggestions and contributions are welcome.

-Limitations
-^^^^^^^^^^^
+#. This functionality can be very slow.
+The more datasets that need to be checked to find a good match, the slower it will be.
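
The matching process described in the changed file can be sketched in Python roughly as follows. This is a minimal illustration and not the library's implementation. The documented approach scores entries with a vector-based fuzzy distance; the sketch swaps in a plain Jaccard distance over token sets as a stand-in. The names ``prepare_tokens``, ``distance``, ``Match`` and ``match_ingredient``, the stop-word list, and both threshold values are hypothetical and chosen only for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "of", "or", "the", "with"}


def prepare_tokens(text: str) -> set[str]:
    """Tokenize and drop tokens that carry little semantic information."""
    # Splitting on anything that is not a letter removes numbers,
    # white space and punctuation in one step.
    words = re.split(r"[^a-z]+", text.lower())
    # Drop empty strings, single character words and stop words.
    return {w for w in words if len(w) > 1 and w not in STOP_WORDS}


def distance(a: set[str], b: set[str]) -> float:
    """Jaccard distance (lower is better): a stand-in for the fuzzy score."""
    if not a or not b:
        return 1.0
    return 1.0 - len(a & b) / len(a | b)


@dataclass
class Match:
    description: str
    score: float


def match_ingredient(
    name: str,
    datasets: list[list[str]],
    threshold: float = 0.4,           # assumed value
    fallback_threshold: float = 0.7,  # assumed value
) -> Optional[Match]:
    """Check each dataset in order, then fall back to the best near miss."""
    name_tokens = prepare_tokens(name)
    near_misses = []
    for entries in datasets:
        if not entries:
            continue
        # Score every entry in this dataset and sort, best (lowest) first.
        scored = sorted(
            (Match(d, distance(name_tokens, prepare_tokens(d))) for d in entries),
            key=lambda m: m.score,
        )
        best = scored[0]
        if best.score < threshold:
            return best
        # Not good enough: keep the best entry and its score for fallback.
        near_misses.append(best)
    if near_misses:
        best = min(near_misses, key=lambda m: m.score)
        if best.score < fallback_threshold:
            return best
    return None
```

Because the loop returns as soon as one dataset yields a score below the threshold, an ingredient matched in the first dataset is resolved quickly, while one that needs every dataset plus the fallback step pays for scoring all entries. This is consistent with the second limitation listed in the diff.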
