This script allows you to download content from the Wayback Machine (archive.org). You can use it to download either the latest version or all versions of web page snapshots within a specified range.
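A snapshot on archive.org is addressed by a timestamp plus the original URL. The minimal sketch below assumes the public Wayback Machine URL scheme `https://web.archive.org/web/<timestamp>id_/<original-url>`, where the `id_` modifier requests the original, unrewritten file. It shows how such addresses can be built and split again, for example to notice when archive.org answered with a different timestamp than the one requested. The helper names are hypothetical and not part of this script's interface.

```python
import re

# Assumed Wayback Machine URL scheme:
#   https://web.archive.org/web/<timestamp>[<modifier>]/<original-url>
# "id_" asks archive.org for the original, unrewritten file.
WAYBACK_PREFIX = "https://web.archive.org/web/"
SNAPSHOT_RE = re.compile(
    r"^https?://web\.archive\.org/web/(\d{1,14})(?:[a-z]{2}_)?/(.+)$"
)


def snapshot_url(timestamp: str, original_url: str, raw: bool = True) -> str:
    """Build the archive.org address of one snapshot (hypothetical helper)."""
    modifier = "id_" if raw else ""
    return f"{WAYBACK_PREFIX}{timestamp}{modifier}/{original_url}"


def parse_snapshot_url(url: str) -> tuple[str, str]:
    """Split a snapshot address back into (timestamp, original_url)."""
    match = SNAPSHOT_RE.match(url)
    if match is None:
        raise ValueError(f"not a Wayback snapshot URL: {url}")
    return match.group(1), match.group(2)


def was_redirected(requested: str, final: str) -> bool:
    """True if archive.org served a different timestamp than requested."""
    return parse_snapshot_url(requested)[0] != parse_snapshot_url(final)[0]


requested = snapshot_url("20190615120000", "http://example.com/")
print(requested)
# https://web.archive.org/web/20190615120000id_/http://example.com/
print(was_redirected(requested, "https://web.archive.org/web/20190616080000id_/http://example.com/"))
# True
```

Comparing only the timestamp part keeps the check independent of which modifier (if any) the final address carries.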
### Arguments

- `-h`, `--help`: Show the help message and exit.
- `-a`, `--about`: Show information about the script and exit.
#### Required Arguments

- `-u`, `--url`: The URL of the web page to download. This argument is required.
#### Mode Selection (Choose One)

- `-c`, `--current`: Download the latest version of each file snapshot. You will get a rebuild of the current website with all available files, but not an original state of the site at any single point in time, because newer and older file versions are mixed.
- `-f`, `--full`: Download snapshots of all timestamps. You will get a folder per timestamp with the files available at that time.
- `-s`, `--save`: Save a page to the Wayback Machine (beta).
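To illustrate the difference between the two download modes, here is a minimal sketch over hypothetical `(timestamp, url)` records such as a snapshot listing could provide. `--current` keeps the newest timestamp per file, while `--full` groups files by timestamp, one folder each. The function names and record shape are illustrative assumptions, not the script's internals.

```python
from collections import defaultdict

# Assumed record shape: (timestamp, original_url). Timestamps are assumed to be
# full 14-digit YYYYMMDDhhmmss strings, so plain string comparison orders them
# chronologically.
Snapshot = tuple[str, str]


def select_current(snapshots: list[Snapshot]) -> dict[str, str]:
    """--current style: keep only the newest timestamp for each file URL."""
    latest: dict[str, str] = {}
    for timestamp, url in snapshots:
        if url not in latest or timestamp > latest[url]:
            latest[url] = timestamp
    return latest


def select_full(snapshots: list[Snapshot]) -> dict[str, list[str]]:
    """--full style: one group (folder) per timestamp with the files archived then."""
    folders: dict[str, list[str]] = defaultdict(list)
    for timestamp, url in snapshots:
        folders[timestamp].append(url)
    return dict(folders)


snapshots = [
    ("20190101000000", "http://example.com/"),
    ("20200101000000", "http://example.com/"),
    ("20190101000000", "http://example.com/style.css"),
]
print(select_current(snapshots))
# {'http://example.com/': '20200101000000', 'http://example.com/style.css': '20190101000000'}
print(select_full(snapshots))
# {'20190101000000': ['http://example.com/', 'http://example.com/style.css'], '20200101000000': ['http://example.com/']}
```

In this made-up listing, the `--current` result pairs the 2020 homepage with the 2019 stylesheet, which is the mixing of old and new versions that mode implies.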
#### Optional Arguments

- `-l`, `--list`: Only print the snapshots available within the specified range. Does not download the snapshots.
- `-e`, `--explicit`: Only download the explicitly given URL. No wildcard subdomains or paths.
- `-o`, `--output`: The folder where downloaded files will be saved.

- **Range Selection:**<br>
Specify either a range in years or specific timestamps for the start, the end, or both. If you specify `--range`, `--start` and `--end` are ignored. Timestamps use the format YYYYMMDDhhmmss: you can give just the year, or increase specificity by extending the timestamp from left to right (e.g. `2019`, `201906`, `20190615`).<br>
  - `-r`, `--range`: Specify the range in years for which to search and download snapshots.
  - `--start`: Timestamp to start searching.
  - `--end`: Timestamp to end searching.
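The range options can be sketched as a small resolver that turns `--range`, `--start` and `--end` into the `from`/`to` bounds of a query against archive.org's CDX server (`https://web.archive.org/cdx/search/cdx`), which accepts timestamps of 1 to 14 digits for those bounds. Several points are assumptions, not taken from the text above: `--range N` is read as "the last N years", `--explicit` is mapped to the CDX `matchType=exact` and the default to `matchType=domain`, and timestamps are only accepted at component boundaries (year, month, day, hour, minute, second). All function names are hypothetical.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
# Left-to-right prefixes of YYYYMMDDhhmmss: year, month, day, hour, minute, second.
VALID_LENGTHS = {4, 6, 8, 10, 12, 14}
Window = tuple[Optional[str], Optional[str]]


def check_timestamp(ts: str) -> str:
    """Accept only a year or a longer left-anchored prefix of YYYYMMDDhhmmss."""
    if not (ts.isascii() and ts.isdigit()) or len(ts) not in VALID_LENGTHS:
        raise ValueError(f"expected YYYY[MM[DD[hh[mm[ss]]]]], got {ts!r}")
    return ts


def resolve_window(range_years: Optional[int], start: Optional[str],
                   end: Optional[str], today: Optional[date] = None) -> Window:
    """Turn range/start/end into CDX (from, to); a range overrides start/end."""
    if range_years is not None:
        # Assumption: "--range N" means the last N years up to today.
        year = (today or date.today()).year
        return str(year - range_years), None
    return (check_timestamp(start) if start else None,
            check_timestamp(end) if end else None)


def cdx_query(url: str, explicit: bool, window: Window) -> str:
    """Build a CDX listing URL; the matchType mapping is an assumption."""
    params = {
        "url": url,
        "output": "json",
        "matchType": "exact" if explicit else "domain",
    }
    start, end = window
    if start:
        params["from"] = start
    if end:
        params["to"] = end
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


window = resolve_window(None, "2019", "202006")
print(cdx_query("example.com", explicit=True, window=window))
# https://web.archive.org/cdx/search/cdx?url=example.com&output=json&matchType=exact&from=2019&to=202006
```

Validating timestamps before building the query turns a malformed `--start` or `--end` into an immediate error instead of an unexpectedly empty listing.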
#### Additional

- `--csv`: Save a CSV file inside the output folder or a specified folder. With `--list`, the CSV contains the CDX list of snapshots; with `--current` or `--full`, it contains the downloaded files.
- `--no-redirect`: Do not follow redirects of snapshots. Archive.org sometimes redirects to a different snapshot for various reasons. Following redirects may lead to timestamp folders that contain some files with a different timestamp. This does not matter if you only want to download the latest version (`-c`).
- `--verbosity`: Set the verbosity: `json` (print the JSON response) or `progress` (show a progress bar).
- `--retry`: Retry failed downloads. You can specify the number of retry attempts as an integer.
- `--workers`: The number of workers to use for downloading (simultaneous downloads). Default is 1. About 10 workers is a safe value. Beware: using too many workers will lead to refused connections from the Wayback Machine for about 1.5 minutes.
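`--workers` and `--retry` combine into a common pattern: a bounded thread pool in which each download is attempted once plus up to N retries. The sketch below uses Python's `concurrent.futures.ThreadPoolExecutor`. `fetch` is a hypothetical stand-in for the actual HTTP request, and the linear backoff between attempts is this sketch's own choice, not documented behavior of the script.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Union

Fetch = Callable[[str], bytes]  # hypothetical stand-in for the HTTP request


def download_with_retry(fetch: Fetch, url: str, retries: int,
                        backoff: float = 1.0) -> bytes:
    """Try fetch(url) once plus up to `retries` more times."""
    attempts = max(retries, 0) + 1
    last_error: Exception = RuntimeError("no attempt made")
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as exc:  # e.g. a refused connection
            last_error = exc
            if attempt + 1 < attempts:
                time.sleep(backoff * (attempt + 1))  # linear backoff
    raise last_error


def download_all(fetch: Fetch, urls: Iterable[str], workers: int = 1,
                 retries: int = 0, backoff: float = 1.0) -> dict[str, Union[bytes, Exception]]:
    """Download on at most `workers` threads; failures are returned, not raised."""
    results: dict[str, Union[bytes, Exception]] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            url: pool.submit(download_with_retry, fetch, url, retries, backoff)
            for url in urls
        }
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = exc
    return results


calls = {"n": 0}


def flaky(url: str) -> bytes:
    """Fails twice, then succeeds, to mimic temporarily refused connections."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("refused")
    return url.encode()


print(download_with_retry(flaky, "http://example.com/", retries=3, backoff=0))
# b'http://example.com/'
```

Collecting failures instead of raising lets one unreachable file leave the rest of a large download intact; keeping `workers` moderate matters because every thread adds concurrent load on the Wayback Machine.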
### Examples
Download the latest snapshot of all files with retries:<br>
`waybackup -u http://example.com -c --retry 3`

Download all snapshots sorted per timestamp with a specified range and follow redirects:<br>