
Some downloaded files are gzip stream #259

@FoxAhead

I used this great tool to download the site http://web.archive.org/web/20230713110210/http://users.tpg.com.au/jpwbeest/. At first glance everything went well, but then I found that some downloaded files, regardless of extension, had been saved as gzip streams, while others were fine. The result was the same on every repeated download: about 30 "corrupted" files out of 245 in total.

Examples of gzipped files (the first two bytes, 1F 8B, are the gzip magic number, and the third, 08, indicates deflate compression):

[two images attached]
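For anyone who wants to check a file without a hex viewer, here is a minimal sketch of the same header test in Python. The function name `describe_gzip_header` and the returned strings are made up for illustration; only the byte values (1F 8B and 08) come from the description above.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of every gzip stream
DEFLATE_METHOD = 8         # third byte: compression method, 08 = deflate


def describe_gzip_header(data: bytes) -> str:
    """Classify a byte string by its first three bytes (hypothetical helper)."""
    if len(data) < 3 or data[:2] != GZIP_MAGIC:
        return "not gzip"
    if data[2] == DEFLATE_METHOD:
        return "gzip (deflate)"
    return "gzip (unknown method)"


if __name__ == "__main__":
    # A freshly compressed payload starts with 1F 8B 08; plain HTML does not.
    print(describe_gzip_header(gzip.compress(b"<html></html>")))
    print(describe_gzip_header(b"<html></html>"))
```

Note that this only inspects the header, so a file that happens to start with these bytes but is not a valid gzip stream would still be reported as gzip.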

I would like to know what causes this. Is it a bug, or a peculiarity of this site or of the Wayback Machine as a whole? Can it be fixed?

So far I've worked around this problem with a simple Python script that scans the files in a directory and, if a file shows signs of a gzip stream, decompresses it; otherwise it just copies the file to the output folder.
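A workaround along those lines could look roughly like the sketch below. This is not the exact script I used; the names `is_gzip` and `fix_tree` are hypothetical, and it assumes the output folder lies outside the source folder.

```python
import gzip
import shutil
import zlib
from pathlib import Path

GZIP_HEADER = b"\x1f\x8b\x08"  # gzip magic number followed by the deflate method byte


def is_gzip(path: Path) -> bool:
    """Return True if the file starts with the gzip/deflate header bytes."""
    with path.open("rb") as f:
        return f.read(3) == GZIP_HEADER


def fix_tree(src: Path, dst: Path) -> int:
    """Copy every file from src to dst, decompressing those that look gzipped.

    Returns the number of files that were decompressed.
    Assumption: dst is not located inside src.
    """
    fixed = 0
    for path in src.rglob("*"):
        if not path.is_file():
            continue
        target = dst / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        if is_gzip(path):
            try:
                with gzip.open(path, "rb") as fin, target.open("wb") as fout:
                    shutil.copyfileobj(fin, fout)
                fixed += 1
                continue
            except (OSError, EOFError, zlib.error):
                # Header matched but the stream is invalid: fall back to a plain copy.
                pass
        shutil.copy2(path, target)
    return fixed
```

It could be called as `fix_tree(Path("websites/example"), Path("fixed"))`, where both paths are placeholders. One caveat: files that are meant to be gzip archives (for example a real `.gz` download) would be decompressed as well, so the check may need an extension filter.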
Thanks!
