-
Have you checked whether your file is available at https://github.com/dkbs12/External_test/raw/main/test01.zip? I can fetch it with wget from that location.
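One quick way to confirm what a URL actually returns is to test the downloaded bytes before extracting them. The snippet below is only a sketch: `looks_like_zip` is a hypothetical helper name, and the two payloads are built in memory to stand in for a real download, so nothing is fetched from the network.

```python
import io
import zipfile


def looks_like_zip(payload: bytes) -> bool:
    """Return True if the bytes form a readable zip archive (hypothetical helper)."""
    return zipfile.is_zipfile(io.BytesIO(payload))


# Build a tiny in-memory archive to stand in for a real zip download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("bert.txt", "placeholder text")
zip_payload = buffer.getvalue()

# Stand-in for an HTML page returned in place of the file itself.
html_payload = b"<!DOCTYPE html><html><body>file viewer</body></html>"

print(looks_like_zip(zip_payload))   # True
print(looks_like_zip(html_payload))  # False
```

If the check fails on the bytes you downloaded, the server most likely returned a web page rather than the archive.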
-
Hey, @dkbs12! @vblagoje was only suggesting that you change the URL. The following code works for me:
from haystack.utils import fetch_archive_from_http
doc_dir = "data/test01"
url = "https://github.com/dkbs12/External_test/raw/main/test01.zip"
fetch_archive_from_http(url=url, output_dir=doc_dir)
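The only difference from the failing call is `/raw/` in place of `/blob/`: a github.com `/blob/` URL serves the HTML file viewer page, not the file's bytes, which is why `zipfile` rejects it. If you'd rather not edit URLs by hand, here is a rough sketch of the rewrite. `github_blob_to_raw` is a hypothetical helper (it is not part of Haystack), and it assumes the usual `/<owner>/<repo>/blob/<branch>/<path>` layout.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url: str) -> str:
    """Rewrite a github.com '/blob/' file URL to its '/raw/' form (hypothetical helper)."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Assumed path layout: /<owner>/<repo>/blob/<branch>/<file path>
    if parts.netloc == "github.com" and len(segments) > 3 and segments[3] == "blob":
        segments[3] = "raw"
        return urlunsplit(parts._replace(path="/".join(segments)))
    # Leave anything that does not match the assumed layout untouched.
    return url


blob_url = "https://github.com/dkbs12/External_test/blob/main/test01.zip"
print(github_blob_to_raw(blob_url))
```

The returned string can then be passed as `url` to `fetch_archive_from_http` exactly as in the working call above.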
-
Hi, thank you everyone!
-
Hi,
I'm currently working through the tutorial "Preprocessing Your Documents".
I downloaded the three files ('bert', 'classics', 'heavy_metal') from the tutorial and
uploaded a zip file containing those three files to my GitHub repository.
When I ran the code below, I got a BadZipFile error:
%%bash
pip install --upgrade pip
pip install farm-haystack[colab,elasticsearch,inference,ocr,preprocessing,file-conversion,pdf]
from haystack.utils import fetch_archive_from_http
doc_dir = "data/test01"
url = "https://github.com/dkbs12/External_test/blob/main/test01.zip"
fetch_archive_from_http(url=url, output_dir=doc_dir)
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <cell line: 4>:4 │
│ │
│ /usr/local/lib/python3.10/dist-packages/haystack/utils/import_utils.py:82 in │
│ fetch_archive_from_http │
│ │
│ 79 │ │ request_data = requests.get(url, proxies=proxies, timeout=timeout) │
│ 80 │ │ │
│ 81 │ │ if archive_extension == "zip": │
│ ❱ 82 │ │ │ zip_archive = zipfile.ZipFile(io.BytesIO(request_data.content)) │
│ 83 │ │ │ zip_archive.extractall(output_dir) │
│ 84 │ │ elif archive_extension == "gz" and not "tar.gz" in url: │
│ 85 │ │ │ gzip_archive = gzip.GzipFile(fileobj=io.BytesIO(request_data.content)) │
│ │
│ /usr/lib/python3.10/zipfile.py:1269 in __init__ │
│ │
│ 1266 │ │ │
│ 1267 │ │ try: │
│ 1268 │ │ │ if mode == 'r': │
│ ❱ 1269 │ │ │ │ self._RealGetContents() │
│ 1270 │ │ │ elif mode in ('w', 'x'): │
│ 1271 │ │ │ │ # set the modified flag so central directory gets written │
│ 1272 │ │ │ │ # even if no files are added to the archive │
│ │
│ /usr/lib/python3.10/zipfile.py:1336 in _RealGetContents │
│ │
│ 1333 │ │ except OSError: │
│ 1334 │ │ │ raise BadZipFile("File is not a zip file") │
│ 1335 │ │ if not endrec: │
│ ❱ 1336 │ │ │ raise BadZipFile("File is not a zip file") │
│ 1337 │ │ if self.debug > 1: │
│ 1338 │ │ │ print(endrec) │
│ 1339 │ │ size_cd = endrec[_ECD_SIZE] # bytes in central directory │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
BadZipFile: File is not a zip file
Can you help me with this problem?
Thanks.