Unable to import DPRContextEncoderTokenizer #2711
-
Hello guys! First of all, I would like to thank you for all the work you have put into this library. I am having a problem importing the tokenizer for the context encoder. I am using the following model: https://huggingface.co/IIC/dpr-spanish-question_encoder-allqa-base. Every time I try to create the DensePassageRetriever, I get this tokenizer class error: "Tokenizer class DPRContextEncoderTokenizer does not exist or is not currently imported." I would like to know whether the problem is due to the model itself or to the library. I have tried downloading all the model files and loading the model locally instead of from the model hub, but I can't get it to work at all. The DPRQuestionEncoderTokenizer seems to work correctly. I am quite desperate because I would like it to work properly. Any idea what's happening? Thanks in advance for the help.
Replies: 2 comments 1 reply
-
Hello! I've been reading the library code for some time now and I have a question about the parameter "infer_tokenizer_classes" used when initializing the DPR. When I set it to True, it looks like the model can be loaded without problems, but I've seen that this parameter defaults to False and I do not really understand what it does in general. In the code I see that when the parameter is False (the default), tokenizer_default_classes uses the AutoTokenizer class for both the question and context tokenizers. On the other hand, when it is True, the question and context tokenizers in this variable are None, and I would like to know where they are imported from if they are None. Thanks for your help!
-
Hi @Myko-10, thanks for bringing our attention to this issue. This was a bug and should be fixed once #2755 is merged.
The following was happening:
When initializing a DensePassageRetriever, we were using AutoTokenizers to load the tokenizers for the query embedding model and the passage embedding model. In the config of the model you tried to load, the tokenizer_class is set explicitly to DPRContextEncoderTokenizer. However, it seems that AutoTokenizer doesn't support loading this tokenizer, which is why you got the error message. It only worked when setting infer_tokenizer_classes to True because with this setting, we didn't use AutoTokenizer but tried to infer the correct tokenize…
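For anyone hitting the same error before the fix is available, the workaround mentioned in this thread could look roughly like the sketch below. It assumes Haystack 1.x, where DensePassageRetriever still accepts infer_tokenizer_classes. The query model name comes from the original question; the passage model name is only an assumption about its counterpart, and dpr_kwargs and build_retriever are hypothetical helper names used for illustration.

```python
# Sketch of the workaround discussed in this thread; assumes Haystack 1.x.
# Query model taken from the original question.
QUERY_MODEL = "IIC/dpr-spanish-question_encoder-allqa-base"
# Assumption: the matching passage encoder. It is not named in the thread.
PASSAGE_MODEL = "IIC/dpr-spanish-passage_encoder-allqa-base"


def dpr_kwargs(infer_tokenizer_classes=True):
    """Collect the keyword arguments passed to DensePassageRetriever."""
    return {
        "query_embedding_model": QUERY_MODEL,
        "passage_embedding_model": PASSAGE_MODEL,
        # True skips the AutoTokenizer defaults, so Haystack infers
        # the tokenizer classes itself.
        "infer_tokenizer_classes": infer_tokenizer_classes,
    }


def build_retriever():
    """Create the retriever; downloads both models on first use."""
    # Imported here so this file can be loaded without Haystack installed.
    from haystack.document_stores import InMemoryDocumentStore
    from haystack.nodes import DensePassageRetriever

    return DensePassageRetriever(
        document_store=InMemoryDocumentStore(), **dpr_kwargs()
    )
```

Calling build_retriever() should then load both encoders without going through AutoTokenizer. Once the fix is merged, the flag should no longer be needed for this model.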