Commit b9b964d

first commit
bitdruid committed
0 parents, commit b9b964d

11 files changed: +485 -0 lines changed

.gitignore

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+.venv/
+.test/
+pywaybackup/__pycache__/
+waybackup_snapshots/
+dist/
+pywaybackup.egg-info/
+build/
+```

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2023 bitdruid
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+# archive wayback downloader
+
+[![PyPI](https://img.shields.io/pypi/v/pywaybackup)](https://pypi.org/project/pywaybackup/)
+[![PyPI - Downloads](https://img.shields.io/pypi/dm/pywaybackup)](https://pypi.org/project/pywaybackup/)
+![Release](https://img.shields.io/badge/Release-alpha-red)
+![Python Version](https://img.shields.io/badge/Python-3.6-blue)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+Download archived web pages from the [Wayback Machine](https://archive.org/web/).
+
+The Internet Archive is a useful source of OSINT information. This script is a work in progress for querying and fetching archived web pages.
+
+## Installation
+
+### Pip
+
+1. Install the package <br>
+```pip install pywaybackup```
+2. Run the script <br>
+```waybackup -h```
+
+### Manual
+
+1. Clone the repository <br>
+```git clone https://github.com/bitdruid/waybackup.git```
+2. Install <br>
+```pip install .```
+- in a virtual env or use `--break-system-packages`
+
+## Usage
+
+This script downloads content from the Wayback Machine (archive.org). It can fetch either the latest version of each file or every snapshot within a specified range.
+
+### Arguments
+
+- `-h`, `--help`: Show the help message and exit.
+- `-v`, `--version`: Show the script's version.
+
+#### Required Arguments
+
+- `-u URL`, `--url URL`: The URL of the web page to download. This argument is required.
+
+#### Mode Selection (Choose One)
+
+- `-c`, `--current`: Download the latest version of each file snapshot. This option is mutually exclusive with `-f/--full`.
+- `-f`, `--full`: Download snapshots of all timestamps. This option is mutually exclusive with `-c/--current`.
+
+#### Optional Arguments
+
+- `-l`, `--list`: Only print the snapshots available within the specified range. Does not download the snapshots.
+- `-r RANGE`, `--range RANGE`: Specify the range in years for which to search and download snapshots.
+- `-o OUTPUT`, `--output OUTPUT`: The folder where downloaded files will be saved.
+
+#### Additional
+
+- `--retry [RETRY_FAILED]`: Retry failed downloads. Optionally pass the number of retry attempts as an integer. If no number is given, the script keeps retrying indefinitely.
+- `--worker [AMOUNT]`: The number of workers to use for downloading (simultaneous downloads). Default is 1. Beware: using too many workers will cause the Wayback Machine to refuse connections for about 1.5 minutes.
+
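To make the argument rules above concrete, here is a minimal `argparse` sketch of a parser with the same options. It is an illustration only, not the code shipped in this commit: the function name `build_parser`, the help strings, and the choice of `None` to mean "retry indefinitely" are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options listed in the README."""
    parser = argparse.ArgumentParser(prog="waybackup")
    parser.add_argument("-v", "--version", action="version", version="%(prog)s 0.4.2")
    parser.add_argument("-u", "--url", required=True, help="URL of the web page to download")

    # -c and -f are mutually exclusive, and one of them has to be given.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-c", "--current", action="store_true", help="latest version of each file")
    mode.add_argument("-f", "--full", action="store_true", help="snapshots of all timestamps")

    parser.add_argument("-l", "--list", action="store_true", help="only list snapshots")
    parser.add_argument("-r", "--range", type=int, metavar="RANGE", help="range in years")
    parser.add_argument("-o", "--output", metavar="OUTPUT", help="output folder")

    # nargs="?" makes the value optional:
    #   flag absent  -> default (0, no retries)
    #   "--retry"    -> const (None, read here as "retry indefinitely")
    #   "--retry 3"  -> 3
    parser.add_argument("--retry", nargs="?", type=int, default=0, const=None,
                        metavar="RETRY_FAILED")
    parser.add_argument("--worker", type=int, default=1, metavar="AMOUNT")
    return parser


args = build_parser().parse_args(["-u", "http://example.com", "-c", "--retry", "3"])
print(args.current, args.full, args.retry, args.worker)
```

Passing both `-c` and `-f` to this parser makes `argparse` exit with a usage error, which matches the "Choose One" wording above.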
+### Examples
+
+Download the latest snapshot of all files:<br>
+`waybackup -u http://example.com -c`
+
+Download the latest snapshot of all files with retries:<br>
+`waybackup -u http://example.com -c --retry 3`
+
+Download all snapshots, sorted by timestamp, within a specified range:<br>
+`waybackup -u http://example.com -f -r 5`
+
+Download all snapshots, sorted by timestamp, within a specified range and save them to a specified folder using 3 workers:<br>
+`waybackup -u http://example.com -f -r 5 -o /home/user/Downloads/snapshots --worker 3`
+
+List available snapshots per timestamp without downloading:<br>
+`waybackup -u http://example.com -f -l`
+
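The `-r 5` examples raise the question of how a range in years could be turned into a snapshot query. The sketch below builds a request URL for the Wayback Machine CDX server using its `url`, `matchType`, `output` and `from` parameters. How this package actually queries the archive is not visible in this diff, so the function name `build_cdx_query` and the reading of "range" as "the last N years" are assumptions.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url: str, range_years: Optional[int] = None,
                    today: Optional[date] = None) -> str:
    """Return a CDX query URL listing snapshots of `url` and the paths below it."""
    params = {"url": url, "matchType": "prefix", "output": "json"}
    if range_years is not None:
        today = today or date.today()
        # Assumption: a range of N years means "from N years ago until now".
        params["from"] = str(today.year - range_years)
    # urlencode percent-encodes the values, so the target URL is safe to embed.
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


print(build_cdx_query("http://example.com", range_years=5, today=date(2023, 12, 1)))
```

The function only builds the URL and performs no network request, so it can be tried without contacting archive.org.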
+## Contributing
+
+Feature requests that improve the usability of this script are always welcome.
+Feel free to make suggestions and report issues. The project is still far from perfect.
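The README's `--retry` and `--worker` options describe a bounded or unbounded retry loop running on a pool of parallel downloads. A minimal sketch of that behaviour follows, assuming a thread pool and a caller-supplied `fetch` function. The names `download_with_retry`, `download_all` and `flaky_fetch` are hypothetical and do not come from this commit.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def download_with_retry(fetch: Callable[[str], bool], url: str,
                        retries: Optional[int] = 0, delay: float = 0.0) -> bool:
    """Call fetch(url) until it succeeds or the retry budget is used up.

    retries=None means "keep retrying indefinitely".
    """
    attempt = 0
    while True:
        if fetch(url):
            return True
        if retries is not None and attempt >= retries:
            return False
        attempt += 1
        time.sleep(delay)  # a real tool would likely back off between attempts


def download_all(fetch: Callable[[str], bool], urls: Iterable[str],
                 workers: int = 1, retries: Optional[int] = 0) -> List[bool]:
    """Download all URLs with at most `workers` simultaneous downloads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: download_with_retry(fetch, u, retries), urls))


# Stand-in for a real HTTP download: fails on the first call for each URL.
seen = set()


def flaky_fetch(url: str) -> bool:
    if url in seen:
        return True
    seen.add(url)
    return False


results = download_all(flaky_fetch, ["a", "b", "c"], workers=3, retries=1)
print(results)
```

Keeping `workers` small follows the README's warning that too many simultaneous connections get refused by the Wayback Machine.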

dev/pip_build.sh

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+#!/bin/bash
+
+# path of the script
+SCRIPT_PATH="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+TARGET_PATH="$SCRIPT_PATH/.."
+
+# check if venv is activated
+if [ -z "$VIRTUAL_ENV" ]; then
+    echo "Please activate your virtual environment"
+    exit 1
+fi
+
+# build and upload (dist/ is relative to the current directory)
+python "$TARGET_PATH/setup.py" sdist bdist_wheel --verbose
+python -m twine upload dist/*
+#pip install -e $TARGET_PATH
+
+# clean up
+rm -rf "$TARGET_PATH/build" "$TARGET_PATH/dist" # $TARGET_PATH/*.egg-info

dev/venv_create.sh

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+#!/bin/bash
+
+# path of the script
+SCRIPT_PATH="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+TARGET_PATH="$SCRIPT_PATH/.."
+echo "Preparing virtual environment in $TARGET_PATH"
+# Create a virtual environment if it does not exist yet
+if [ ! -d "$TARGET_PATH/.venv" ]; then
+    python3 -m venv "$TARGET_PATH/.venv"
+fi
+
+# update pip and install the build tools
+"$TARGET_PATH/.venv/bin/python" -m pip install --upgrade pip
+"$TARGET_PATH/.venv/bin/python" -m pip install twine wheel
+
+# install requirements
+"$TARGET_PATH/.venv/bin/python" -m pip install -r "$TARGET_PATH/requirements.txt"

pywaybackup/__init__.py

Whitespace-only changes.

pywaybackup/__version__.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+__version__ = "0.4.2"
