GitHub - scienceapps/Psion-the-lost-archive: A Python-based toolset for extracting URLs from legacy .txt and .zip files, and downloading their archived versions from the Wayback Machine, for recovering Psion Epoc / Sibo software from old CD-ROMs

A Python-based toolset for extracting URLs from legacy .txt and .zip files and downloading their archived versions from the Wayback Machine, in order to recover Psion Epoc / Sibo software from old CD-ROMs. This project was initiated to expand the Psion Software Index: https://github.com/scienceapps/Psion-the-lost-archive

Features:

Scans directories for .txt files and .zip archives containing text.
Extracts all valid URLs using regular expressions.
Downloads archived versions of URLs from the Wayback Machine (1997–2005).
Cleans and formats URLs for compatibility with wayback_machine_downloader.
Organizes downloads by domain and timestamp.
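The cleaning step is not spelled out here. The following is a minimal sketch of what it could look like. It assumes that only http/https URLs are kept, that bare "www." addresses get an http:// prefix, and that fragments (#...) are dropped. clean_url and clean_all are hypothetical names, not functions taken from the scripts.

```python
from urllib.parse import urlsplit, urlunsplit

# Punctuation that often sticks to the end of a URL in plain-text files (assumed list).
TRAILING_JUNK = ".,;:!?)]}>'\""


def clean_url(raw):
    """Return a normalized http(s) URL, or None if the string is unusable."""
    url = raw.strip().rstrip(TRAILING_JUNK)
    if url.lower().startswith("www."):
        # Legacy text often omits the scheme; assume plain http.
        url = "http://" + url
    elif "://" not in url:
        return None  # e.g. mailto: links or bare words
    parts = urlsplit(url)  # urlsplit already lowercases the scheme
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    # Host names are case-insensitive, paths are not: lowercase only the host.
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, parts.query, ""))


def clean_all(raw_urls):
    """Clean a list of raw URLs, dropping invalid ones and duplicates (order kept)."""
    seen = set()
    cleaned = []
    for raw in raw_urls:
        url = clean_url(raw)
        if url is not None and url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned
```

For example, "www.psion.com." and "http://www.psion.com#top" both reduce to "http://www.psion.com", so the downloader is called only once for that site.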

Extract URLs from legacy files

python 01-scanurl.py
This scans the input directory for .txt files and for .txt files inside .zip archives, then writes the list of extracted URLs to a .txt file.
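A minimal sketch of this scan, using only the standard library. The regex, the latin-1 decoding of legacy text, and the names scan_directory and write_urls are assumptions for illustration. This is not the code of 01-scanurl.py.

```python
import os
import re
import zipfile

# Assumed pattern: explicit http(s) URLs, plus bare "www." addresses common in old text files.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)


def extract_from_text(text):
    """Return every URL-like substring found in a block of text."""
    return URL_PATTERN.findall(text)


def scan_directory(input_directory):
    """Walk input_directory and collect URLs from .txt files and .txt files inside .zip archives."""
    urls = []
    for root, _dirs, files in os.walk(input_directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            lower = name.lower()
            if lower.endswith(".txt"):
                # latin-1 maps every byte to a character, so decoding never fails;
                # the URLs themselves are plain ASCII.
                with open(path, encoding="latin-1") as fh:
                    urls.extend(extract_from_text(fh.read()))
            elif lower.endswith(".zip"):
                try:
                    with zipfile.ZipFile(path) as zf:
                        for member in zf.namelist():
                            if member.lower().endswith(".txt"):
                                text = zf.read(member).decode("latin-1")
                                urls.extend(extract_from_text(text))
                except (zipfile.BadZipFile, RuntimeError, NotImplementedError):
                    # Damaged, encrypted or unsupported archives are skipped.
                    continue
    return urls


def write_urls(urls, output_urls_file):
    """Write the unique URLs, one per line, and return how many were written."""
    unique = sorted(set(urls))
    with open(output_urls_file, "w", encoding="utf-8") as fh:
        for url in unique:
            fh.write(url + "\n")
    return len(unique)
```

Typical use would be write_urls(scan_directory(input_directory), output_urls_file), with the two names matching the configuration parameters listed below.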

The sample file palmtops_url.txt was generated by placing the unpacked ISO files of all Team Palmtops magazine CD-ROMs, issues 01 to 36, in the input directory. The CD-ROM images are on the Internet Archive: https://archive.org/search?query=team+palmtops

Download archived content

python 02-dlfromwayback.py
This reads the list of URLs and downloads their archived versions from the Wayback Machine.
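A rough sketch of how such a loop could drive the gem. It assumes one URL per line in the list file and one output folder per domain. build_command and download_all are hypothetical names. --from, --to, --only and --directory are wayback_machine_downloader options, but the exact arguments used by 02-dlfromwayback.py may differ.

```python
import os
import shutil
import subprocess
from urllib.parse import urlsplit

# Snapshot window from the feature list (1997-2005), as full Wayback timestamps.
FROM_TS = "19970101000000"
TO_TS = "20051231235959"


def build_command(url, output_root="websites", only=None):
    """Build one wayback_machine_downloader call that saves into output_root/<domain>."""
    domain = urlsplit(url if "://" in url else "http://" + url).hostname or "unknown"
    cmd = [
        "wayback_machine_downloader", url,
        "--from", FROM_TS,
        "--to", TO_TS,
        "--directory", os.path.join(output_root, domain),
    ]
    if only:
        cmd += ["--only", only]  # e.g. r"/\.zip$/i" to keep only .zip files
    return cmd


def download_all(url_file_list, output_root="websites", only=None, dry_run=True):
    """Read one URL per line and run (or, with dry_run, only return) the download commands."""
    with open(url_file_list, encoding="utf-8") as fh:
        urls = [line.strip() for line in fh if line.strip()]
    commands = [build_command(u, output_root, only) for u in urls]
    if dry_run:
        return commands
    if shutil.which("wayback_machine_downloader") is None:
        raise RuntimeError("wayback_machine_downloader not found; run: gem install wayback_machine_downloader")
    for cmd in commands:
        # check=False: a URL with no snapshot should not stop the whole batch.
        subprocess.run(cmd, check=False)
    return commands
```

With dry_run=True the commands are only returned, which makes it easy to inspect a long URL list before starting the actual downloads.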

A sample output, produced from the Team Palmtops magazine URL list: https://archive.org/details/backups_202508

Configuration

You can modify the following paths and parameters directly in the scripts:

input_directory: Folder to scan for .txt and .zip files.
output_urls_file: Destination file for extracted URLs.
url_file_list: Input file for archived downloads.
--from / --to: Time range for Wayback Machine snapshots.
--only: Regex filter for specific file types (e.g., .zip).
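wayback_machine_downloader treats an --only value written in /.../ notation as a regex, for example "/\.zip$/i". To preview which URLs from the list a filter would keep before downloading, a small helper along these lines could be used. parse_only_filter and preview are hypothetical names, and the plain-substring fallback is an assumption about how non-regex filters behave. Python and Ruby regex syntax agree for simple patterns like this one, but may differ for advanced constructs.

```python
import re


def parse_only_filter(only):
    """Compile an --only value: '/pattern/flags' as a regex, anything else as a literal substring."""
    end = only.rfind("/")
    if only.startswith("/") and end > 0:
        pattern, flags = only[1:end], only[end + 1:]
        # Only the "i" (case-insensitive) flag is handled in this sketch.
        return re.compile(pattern, re.IGNORECASE if "i" in flags else 0)
    return re.compile(re.escape(only))


def preview(urls, only):
    """Return the URLs that the filter would keep."""
    rx = parse_only_filter(only)
    return [u for u in urls if rx.search(u)]
```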

Requirements

Python 3.7+
wayback_machine_downloader Ruby gem (install with: gem install wayback_machine_downloader)
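A quick pre-flight check, sketched with the standard library only. check_requirements is a hypothetical helper, not part of the repository.

```python
import shutil
import sys


def check_requirements():
    """Return a list of human-readable problems; an empty list means ready to run."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python 3.7+ is required")
    # The download step relies on the Ruby gem, so its executable must be on PATH.
    if shutil.which("wayback_machine_downloader") is None:
        problems.append("wayback_machine_downloader not found on PATH "
                        "(install it with: gem install wayback_machine_downloader)")
    return problems
```

Printing each message returned by check_requirements() before running 02-dlfromwayback.py points to a missing gem up front, instead of failing on the first URL.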

Repository files:

01-scanurl.py
02-dlfromwayback.py
README.md
palmtops_url.txt
