Spanish ingredient-parser #14
Hi, I have pulled the project to try to train a Spanish ingredient parser and contribute to your amazing work. However, I'm having some difficulty training the model because I don't understand how to do it.
Replies: 4 comments 7 replies
Hi @theripnono

The command to train a new model is

```sh
python train.py train --database train/data/training.sqlite3
```

For this to work well you will need to do a few things:

```python
def _postprocess_amounts(self) -> list[IngredientAmount]:
    funcs = [
        # self._sizable_unit_pattern,  # Comment this out
        # self._composite_amounts_pattern,  # Comment this out
        self._fallback_pattern,
    ]

    amounts = []
    for func in funcs:
        # Each pattern only sees the tokens not already consumed by an earlier pattern
        idx = self._unconsumed(list(range(len(self.tokens))))
        tokens = self._unconsumed(self.tokens)
        labels = self._unconsumed(self.labels)
        scores = self._unconsumed(self.scores)

        parsed_amounts = func(idx, tokens, labels, scores)
        amounts.extend(parsed_amounts)

    # Return the amounts in the order they appear in the sentence
    return sorted(amounts, key=lambda x: x._starting_index)
```

Then see how well that performs. I hope this helps you get started. Feel free to ask more questions, and I hope you're able to make this work successfully.
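To make the loop in `_postprocess_amounts` easier to follow, here is a self-contained toy version of the same idea (not the library's code): each pattern function receives only the tokens that earlier patterns have not consumed, and the collected amounts are sorted by where they start in the sentence. With the first two patterns commented out, only the fallback runs. `Amount`, `fallback_pattern` and `postprocess_amounts` are hypothetical simplifications. The real pattern functions also receive per-token confidence scores and build `IngredientAmount` objects, and the real fallback is likely more involved than pairing a `QTY` token with the `UNIT` tokens after it. The labels on the Spanish example are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Amount:
    """Hypothetical, simplified stand-in for the library's IngredientAmount."""
    quantity: str
    unit: str
    start: int  # index of the QTY token in the full sentence


def fallback_pattern(idx, tokens, labels):
    """Toy fallback: pair each QTY token with the UNIT tokens immediately after it.

    Returns the amounts found and the sentence indices they consumed.
    """
    amounts, used = [], []
    i = 0
    while i < len(tokens):
        if labels[i] != "QTY":
            i += 1
            continue
        j = i + 1
        while j < len(tokens) and labels[j] == "UNIT":
            j += 1
        amounts.append(Amount(tokens[i], " ".join(tokens[i + 1 : j]), idx[i]))
        used.extend(idx[i:j])
        i = j
    return amounts, used


def postprocess_amounts(tokens, labels, patterns):
    """Run each pattern, in order, on the tokens not consumed by earlier patterns."""
    consumed = set()
    amounts = []
    for pattern in patterns:
        idx = [i for i in range(len(tokens)) if i not in consumed]
        found, used = pattern(idx, [tokens[i] for i in idx], [labels[i] for i in idx])
        consumed.update(used)
        amounts.extend(found)
    return sorted(amounts, key=lambda a: a.start)


if __name__ == "__main__":
    tokens = ["1", "taza", "y", "2", "cucharadas", "de", "azúcar"]
    labels = ["QTY", "UNIT", "COMMENT", "QTY", "UNIT", "OTHER", "NAME"]  # illustrative labels
    for amount in postprocess_amounts(tokens, labels, [fallback_pattern]):
        print(amount)
    # prints Amount(quantity='1', unit='taza', start=0)
    # then   Amount(quantity='2', unit='cucharadas', start=3)
```

Putting a more specific pattern before the fallback in the `patterns` list means the fallback never sees the tokens that pattern claimed, which is why commenting patterns out changes the result.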
Hi again, I'm working hard on the Spanish dataset (around 20k rows) for training the model. I'm labelling the tokenized sentences and two questions come to mind:
Thank you! For now I'm labelling as shown in the picture below:
Hi again! I have good news! I've built a little utility using GPT and LangChain because I didn't want to spend too much time creating the dataset. I hope to give you an update ASAP :)
Hi again @strangetom, sorry to keep asking so many questions... I'm trying to train the model, but I don't know the steps I need to follow. I have some doubts when I look at your CSV formats, e.g.: Do I need to create another "training.sqlite3" file with my data, or should I insert my data into the same database? What steps should I take?

Hi @theripnono
The command to train a new model is `python train.py train --database train/data/training.sqlite3`
For this to work well you will need to do a few things:
training, which has the following fields:
id: a unique ID for each sentence (this isn't used for training)
source: where the sentence came from (this isn't used for training)
sentence: the ingredient sentence
tokens: the list of tokens from the sentence
labels: the list of labels for each token in the sentence
The hard part is making sure all the labels are correct and consistent.
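A minimal sketch of building such a table with Python's built-in `sqlite3` module, checking each labelled sentence before it is inserted so that mismatched or misspelled labels are caught early. Several details are assumptions rather than the project's actual schema: tokens and labels are stored as JSON-encoded text, the column types are guesses, the `ALLOWED_LABELS` set is assumed, and `create_training_db` / `add_sentence` are hypothetical helper names. Check them against the version of the training scripts you are using.

```python
import json
import sqlite3

# Assumed label set; replace it with whatever labels your version of the
# training code expects.
ALLOWED_LABELS = {"QTY", "UNIT", "NAME", "PREP", "COMMENT", "PURPOSE", "SIZE", "PUNC", "OTHER"}


def create_training_db(path: str) -> sqlite3.Connection:
    """Open (or create) a database with a `training` table holding the fields above."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS training (
            id INTEGER PRIMARY KEY,  -- assigned automatically by SQLite
            source TEXT,
            sentence TEXT,
            tokens TEXT,  -- JSON-encoded list of tokens (assumed encoding)
            labels TEXT   -- JSON-encoded list of labels (assumed encoding)
        )
        """
    )
    return conn


def add_sentence(conn, source, sentence, tokens, labels):
    """Validate one labelled sentence, then insert it into the training table."""
    # Every token needs exactly one label.
    if len(tokens) != len(labels):
        raise ValueError(f"{sentence!r}: {len(tokens)} tokens but {len(labels)} labels")
    # Catch typos such as "UNT" before they end up in the training data.
    unknown = set(labels) - ALLOWED_LABELS
    if unknown:
        raise ValueError(f"{sentence!r}: unknown labels {sorted(unknown)}")
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT INTO training (source, sentence, tokens, labels) VALUES (?, ?, ?, ?)",
            (source, sentence, json.dumps(tokens, ensure_ascii=False), json.dumps(labels)),
        )


if __name__ == "__main__":
    conn = create_training_db(":memory:")
    add_sentence(
        conn,
        "es",  # hypothetical source name for the Spanish data
        "2 tazas de harina",
        ["2", "tazas", "de", "harina"],
        ["QTY", "UNIT", "OTHER", "NAME"],  # illustrative labels
    )
    print(conn.execute("SELECT COUNT(*) FROM training").fetchone()[0])  # prints 1
```

Rejecting a bad row at insert time is cheaper than discovering it after a training run, since a single misaligned sentence silently shifts every label after the mismatch.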
Consider any modifications t…