[NLP] No text_vectorization or embedding on the test/validation datasets? #267

david-ben-gurion · 2021-11-18T10:41:24Z

For NLP, we use adapt() to vectorize the training sentences and then fit the Naive Bayes baseline model. For prediction, however, we simply pass the raw validation sentences (plain strings) to the model for prediction and evaluation. How does this work? Shouldn't the training and test data be in the same format for the ML and DL models to predict and evaluate on?

An analogy is image classification: an image passed in for prediction has to be in the same format as the training images (for example, the same shape).

How can the models evaluate and predict on raw string sentences?
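
One common way this works, assuming the text preprocessing is bundled with the model: the vectorizer learns its vocabulary once from the training sentences (the role adapt() plays for a Keras TextVectorization layer), and that same fitted vectorizer is applied to any raw strings later passed to predict. A scikit-learn Pipeline of TfidfVectorizer and MultinomialNB behaves this way, as does a Keras model whose first layer is a TextVectorization layer. The minimal sketch below is pure Python so it runs without either library. `CountVectorizerSketch`, `MultinomialNBSketch` and `TextPipelineSketch` are hypothetical names, not a real API, and simple word counts stand in for TF-IDF.

```python
import math


class CountVectorizerSketch:
    """Hypothetical vectorizer: fit() learns a vocabulary, like adapt()."""

    def __init__(self):
        self.vocab = {}

    def fit(self, texts):
        for text in texts:
            for tok in text.lower().split():
                if tok not in self.vocab:
                    self.vocab[tok] = len(self.vocab)
        return self

    def transform(self, texts):
        # Map each raw string to a fixed-length count vector using the
        # vocabulary learned in fit(); unseen words are simply ignored.
        rows = []
        for text in texts:
            row = [0] * len(self.vocab)
            for tok in text.lower().split():
                idx = self.vocab.get(tok)
                if idx is not None:
                    row[idx] += 1
            rows.append(row)
        return rows


class MultinomialNBSketch:
    """Hypothetical multinomial Naive Bayes with Laplace smoothing (alpha=1)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.log_prior, self.log_likelihood = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            counts = [sum(col) for col in zip(*rows)]
            total = sum(counts) + n_features
            self.log_likelihood[c] = [math.log((n + 1) / total) for n in counts]
        return self

    def predict(self, X):
        preds = []
        for x in X:
            scores = {
                c: self.log_prior[c]
                + sum(n * ll for n, ll in zip(x, self.log_likelihood[c]))
                for c in self.classes
            }
            preds.append(max(scores, key=scores.get))
        return preds


class TextPipelineSketch:
    """Hypothetical pipeline: vectorizer + classifier behind one interface."""

    def __init__(self, vectorizer, model):
        self.vectorizer = vectorizer
        self.model = model

    def fit(self, texts, labels):
        X = self.vectorizer.fit(texts).transform(texts)
        self.model.fit(X, labels)
        return self

    def predict(self, texts):
        # Raw strings go in; they are vectorized with the training vocabulary
        # before reaching the classifier, so both sides see the same format.
        return self.model.predict(self.vectorizer.transform(texts))


train_texts = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible movie hated it",
    "awful plot boring acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

pipe = TextPipelineSketch(CountVectorizerSketch(), MultinomialNBSketch())
pipe.fit(train_texts, train_labels)
print(pipe.predict(["loved the great acting", "boring and awful"]))  # ['pos', 'neg']
```

The point of bundling the two steps is that the caller never vectorizes test sentences by hand: the transformation is stored inside the fitted object, so training and test inputs are guaranteed to be converted identically.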

Replies: 0 comments
