Downloading an entire website results in missing pages and content. #2

@Bryson14

I am trying to recover a WordPress website that was created for a non-profit organization. It was hosted on a third-party site, but the organization lost its admin access and somehow broke the site. I want to restore the site as it was in January 2024, when it was still working. I ran the CLI utility to recover the website from archive.org, but it didn't download all the pages I was expecting.

I ran:

wayback_machine_downloader http://sorensonlegacyfoundation.org --to 20240101

It downloaded 250 files, but many HTML pages are still missing. For example, the entry file index.html is there, but /what-we-fund, /how-to-apply, and about 10 other pages are not.

Looking over the raw text files in VS Code, I confirmed that these pages are missing, and not just nested away somewhere, by searching for text unique to each page.
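
A complementary check is to ask the Wayback Machine which URLs it holds captures for up to that date, and compare that list against the pages I expect. The following is only a rough sketch under assumptions: it queries the Internet Archive CDX server API directly (I have not confirmed that this matches the lookup wayback_machine_downloader performs), the function names are made up for illustration, and the expected paths are just the two named above.

```python
import json
from urllib.parse import urlencode, urlparse
from urllib.request import urlopen

# Internet Archive CDX server endpoint (capture index for the Wayback Machine).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(domain, to_timestamp):
    """Build a CDX query listing one capture per URL under the domain, up to a date."""
    params = {
        "url": f"{domain}/*",            # wildcard: every URL under the domain
        "to": to_timestamp,              # only captures up to this timestamp
        "output": "json",                # JSON rows; the first row is the header
        "fl": "original,timestamp,statuscode",
        "collapse": "urlkey",            # one row per distinct URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(payload):
    """Turn the CDX JSON payload into a list of dicts keyed by the header row."""
    rows = json.loads(payload) if payload.strip() else []
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def missing_paths(captures, expected_paths):
    """Return the expected paths that have no capture in the CDX listing."""
    seen = {urlparse(c["original"]).path.strip("/") for c in captures}
    return [p for p in expected_paths if p.strip("/") not in seen]


if __name__ == "__main__":
    # Hypothetical usage: domain and cutoff date are taken from the command above.
    url = build_cdx_url("sorensonlegacyfoundation.org", "20240101")
    with urlopen(url) as response:
        captures = parse_cdx_rows(response.read().decode("utf-8"))
    print(len(captures), "distinct URLs captured")
    print("not captured:", missing_paths(captures, ["/what-we-fund", "/how-to-apply"]))
```

If a path shows up in this listing but not in the download, the gap is on the downloader side; if it is absent from the listing too, archive.org may not have a capture of that page before the cutoff date.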

Is there something I'm missing or should I just download each page individually from archive.org?
