What are the files that Haystack downloads to the cache directory, and how can I create a local package? #2066
Hi, when I am using a new model, I notice there are files downloaded to the cache directory. I tried downloading the model locally from Hugging Face, but I still need what is in the cache directory.
Hi @asharm0662, if you would like to pre-fill the cache, I recommend that you have a look at how we create our Docker files with the models already cached: https://github.com/deepset-ai/haystack/pull/1978/files

However, when you run your cloud-deployed application for the first time (with an internet connection), the cache will be filled automatically. Is there any particular reason why you want to make sure that the model is already in the cache even before using it for the first time?
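As a minimal sketch of that idea (not the exact approach in the linked PR), you can pre-fill the cache at Docker build time by loading the model once with the transformers auto classes; the model name here is just the one discussed in this thread:

```python
def prefill_cache(model_name: str) -> None:
    """Download a model and its tokenizer into the local transformers cache.

    Sketch only: assumes the `transformers` package is installed (haystack
    pulls it in as a dependency). The import is done inside the function so
    the sketch can be defined even where transformers is not installed.
    """
    from transformers import AutoModelForQuestionAnswering, AutoTokenizer

    AutoModelForQuestionAnswering.from_pretrained(model_name)
    AutoTokenizer.from_pretrained(model_name)


if __name__ == "__main__":
    # Invoking this once in a RUN step of your Dockerfile bakes the model
    # files into the image, so the container never downloads at query time.
    prefill_cache("deepset/roberta-base-squad2")
```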
Hi @asharm0662,

The files downloaded to `~/.cache/huggingface/transformers/` are language models that interpret queries and find an answer in a text document (for example), and they include the corresponding tokenizers that split arbitrary input strings, e.g. queries, into sequences of tokens. When you run `reader = TransformersReader(tokenizer="deepset/roberta-base-squad2", use_gpu=-1)`, the tokenizer is loaded from https://huggingface.co/deepset/roberta-base-squad2. These files are about 2 GB in size, sometimes even larger. We cache these models inside the transformers library because it would take a long time to download them on-the-fly every time you want to run a query. The models won't change…
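If you want to see exactly which files ended up in the cache and how large they are, a small stdlib-only helper can walk the cache directory (a hypothetical helper for illustration, not part of haystack or transformers):

```python
import os


def cache_report(cache_dir: str):
    """Return a sorted list of (relative_path, size_in_bytes) for every
    file under the given cache directory."""
    report = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            report.append((os.path.relpath(path, cache_dir),
                           os.path.getsize(path)))
    return sorted(report)


# Example: inspect the transformers cache
# for path, size in cache_report(
#         os.path.expanduser("~/.cache/huggingface/transformers")):
#     print(f"{size / 1e9:6.2f} GB  {path}")
```

This makes it easy to spot which model weights and tokenizer files account for the ~2 GB mentioned above.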