[NLP] No text_vectorization or embedding on the test/validation datasets? #267
Unanswered
david-ben-gurion asked this question in Q&A
Replies: 0 comments
For NLP, we use adapt() to vectorize the training sentences and then fit the Naive Bayes baseline model. For prediction and evaluation, however, we simply pass in the validation sentences as plain strings. How does this work? Shouldn't the train and test datasets be in the same format for the ML and DL models to predict and evaluate on?
An analogy is image classification, where an image passed in for prediction has to be in the same format (for example, the same shape) as the images in the training dataset.
How can the models evaluate and predict on raw string sentences?
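One possible explanation, offered as a sketch rather than a confirmed answer: if the vectorizer is part of the model itself (for example a Keras TextVectorization layer used as the model's first layer, or a scikit-learn Pipeline that starts with a vectorizer step), then every string passed to predict goes through the same fitted transform before it reaches the classifier. The toy code below uses only the Python standard library to show that idea. `ToyVectorizer` and `ToyTextClassifier` are hypothetical names made up for illustration, not the Keras or scikit-learn implementations, and the sentences are invented.

```python
import math


class ToyVectorizer:
    """Toy stand-in for a text vectorizer: raw string -> bag-of-words counts."""

    def __init__(self):
        self.vocab = {}

    def adapt(self, sentences):
        # Build the vocabulary from the training sentences only.
        words = sorted({w for s in sentences for w in s.lower().split()})
        self.vocab = {w: i for i, w in enumerate(words)}

    def __call__(self, sentence):
        # Convert one raw string to a count vector; unseen words are ignored.
        vec = [0] * len(self.vocab)
        for w in sentence.lower().split():
            if w in self.vocab:
                vec[self.vocab[w]] += 1
        return vec


class ToyTextClassifier:
    """Multinomial Naive Bayes that owns its vectorizer, so it accepts raw strings."""

    def __init__(self):
        self.vectorizer = ToyVectorizer()

    def fit(self, sentences, labels):
        self.vectorizer.adapt(sentences)  # vocabulary learned at training time
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            rows = [self.vectorizer(s) for s, y in zip(sentences, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(sentences))
            # Per-word counts for this class, with add-one (Laplace) smoothing.
            counts = [sum(col) + 1 for col in zip(*rows)]
            total = sum(counts)
            self.log_likelihood[c] = [math.log(k / total) for k in counts]
        return self

    def predict(self, sentences):
        preds = []
        for s in sentences:
            x = self.vectorizer(s)  # same transform that was used in fit()
            scores = {
                c: self.log_prior[c]
                + sum(xi * ll for xi, ll in zip(x, self.log_likelihood[c]))
                for c in self.classes
            }
            preds.append(max(scores, key=scores.get))
        return preds


# Invented example data: 1 = disaster-related, 0 = not.
train_sentences = [
    "flood destroys homes in the city",
    "earthquake hits the coast",
    "i love this sunny day",
    "great coffee this morning",
]
train_labels = [1, 1, 0, 0]

model = ToyTextClassifier().fit(train_sentences, train_labels)

# Validation data is passed as plain strings; the model vectorizes it internally.
preds = model.predict(["earthquake destroys homes", "love this coffee"])
print(preds)  # prints [1, 0]
```

In this sketch the train and validation data do end up in the same numeric format; the conversion just happens inside the model instead of in a separate preprocessing step. Whether that matches the models in question depends on how they were built, which the post does not show.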