Commit 0ba2984

Merge pull request #588 from machawk1/update-heritrix-3.7
Update WAIL to use Heritrix 3.7
2 parents e3be5d8 + f80a071 commit 0ba2984

680 files changed: +19150 -7378 lines changed

.codeclimate.yml

Lines changed: 1 addition & 1 deletion
@@ -7,6 +7,6 @@ exclude_paths:
 - "build/*"
 - "support/*"
 - "WAIL.spec"
-- "bundledApps/heritrix-3.2.0/*"
+- "bundledApps/heritrix-3.4.0-20240909/*"
 - "bundledApps/html/*"
 - "bundledApps/tomcat/*"

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ WAIL.spec
 .idea
 bundledApps/tomcat/logs
 config/path-index.txt
+archiveIndexes/*
 bundledApps/tomcat/work
 jobs/
 !bundledApps/memgator*

README.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@

 Web Archiving Integration Layer (WAIL) is a graphical user interface (GUI) atop multiple web archiving tools intended to be used as an easy way for anyone to preserve and replay web pages.

-Tools included and accessible through the GUI are <a href="https://github.com/internetarchive/heritrix3">Heritrix 3.2.0</a> and <a href="https://github.com/iipc/openwayback">OpenWayback 2.4.0</a>. Support packages include Apache Tomcat, <a href="https://github.com/pyinstaller/pyinstaller/">pyinstaller</a>, and <a href="https://github.com/oduwsdl/memgator">MemGator</a>.
+Tools included and accessible through the GUI are <a href="https://github.com/internetarchive/heritrix3">Heritrix 3.4.0-20240909</a> and <a href="https://github.com/iipc/openwayback">OpenWayback 2.4.0</a>. Support packages include Apache Tomcat, <a href="https://github.com/pyinstaller/pyinstaller/">pyinstaller</a>, and <a href="https://github.com/oduwsdl/memgator">MemGator</a>.

 WAIL is written in Python and compiled to a native executable using <a href="http://www.pyinstaller.org/">PyInstaller</a>.

build/Info.plist

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 <plist version="1.0">
 <dict>
 <key>CFBundleShortVersionString</key>
-<string>2020.03.20</string>
+<string>2024.10.03</string>
 <key>NSHumanReadableCopyright</key>
 <string>Copyright © Mat Kelly - Web Archiving Integration Layer (WAIL)</string>
 <key>CFBundleExecutable</key>

bundledApps/HeritrixJob.py

Lines changed: 180 additions & 143 deletions
Large diffs are not rendered by default.
Binary file not shown.
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+legal/java.base/LICENSE
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+Thank you for using the Oracle JDK.
+The license for this software can be found in the LICENSE file.
+
+Information on installing, configuring, and running this program is available on https://java.com/readme
+
+Documentation on the Java SE Platform can be found on https://docs.oracle.com/java
Binary files not shown (28 files).
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+############################################################
+# Default Logging Configuration File
+#
+# You can use a different file by specifying a filename
+# with the java.util.logging.config.file system property.
+# For example, java -Djava.util.logging.config.file=myfile
+############################################################
+
+############################################################
+# Global properties
+############################################################
+
+# "handlers" specifies a comma-separated list of log Handler
+# classes. These handlers will be installed during VM startup.
+# Note that these classes must be on the system classpath.
+# By default we only configure a ConsoleHandler, which will only
+# show messages at the INFO and above levels.
+handlers= java.util.logging.ConsoleHandler
+
+# To also add the FileHandler, use the following line instead.
+#handlers= java.util.logging.FileHandler, java.util.logging.ConsoleHandler
+
+# Default global logging level.
+# This specifies which kinds of events are logged across
+# all loggers. For any given facility this global level
+# can be overridden by a facility-specific level
+# Note that the ConsoleHandler also has a separate level
+# setting to limit messages printed to the console.
+.level= INFO
+
+############################################################
+# Handler specific properties.
+# Describes specific configuration info for Handlers.
+############################################################
+
+# default file output is in user's home directory.
+java.util.logging.FileHandler.pattern = %h/java%u.log
+java.util.logging.FileHandler.limit = 50000
+java.util.logging.FileHandler.count = 1
+# Default number of locks FileHandler can obtain synchronously.
+# This specifies maximum number of attempts to obtain lock file by FileHandler
+# implemented by incrementing the unique field %u as per FileHandler API documentation.
+java.util.logging.FileHandler.maxLocks = 100
+java.util.logging.FileHandler.formatter = java.util.logging.XMLFormatter
+
+# Limit the messages that are printed on the console to INFO and above.
+java.util.logging.ConsoleHandler.level = INFO
+java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter
+
+# Example to customize the SimpleFormatter output format
+# to print one-line log message like this:
+# <level>: <log message> [<date/time>]
+#
+# java.util.logging.SimpleFormatter.format=%4$s: %5$s [%1$tc]%n
+
+############################################################
+# Facility-specific properties.
+# Provides extra control for each logger.
+############################################################
+
+# For example, set the com.xyz.foo logger to only log SEVERE
+# messages:
+# com.xyz.foo.level = SEVERE
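
To check which handlers and levels a logging configuration like the one above enables, a small reader is often enough. The following is a minimal Python sketch, not part of this commit. The names `parse_logging_properties`, `active_handlers`, and `SAMPLE` are hypothetical, and the parser deliberately handles only plain `key = value` lines rather than the full java.util.Properties syntax (no escapes, no continuation lines).

```python
# Minimal sketch: read simple "key = value" lines from a
# java.util.logging configuration. Assumption: only the subset of the
# Properties syntax used in the file above is needed.

SAMPLE = """\
handlers= java.util.logging.ConsoleHandler
#handlers= java.util.logging.FileHandler, java.util.logging.ConsoleHandler
.level= INFO
java.util.logging.FileHandler.pattern = %h/java%u.log
java.util.logging.FileHandler.limit = 50000
java.util.logging.FileHandler.count = 1
java.util.logging.ConsoleHandler.level = INFO
"""


def parse_logging_properties(text):
    """Return a dict of settings, skipping blank and comment lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line
        settings[key.strip()] = value.strip()
    return settings


def active_handlers(settings):
    """Split the comma-separated "handlers" value into class names."""
    return [h.strip() for h in settings.get("handlers", "").split(",") if h.strip()]


settings = parse_logging_properties(SAMPLE)
print(active_handlers(settings))
print(settings[".level"])
```

Because the commented-out `#handlers=` line is skipped, only the ConsoleHandler is reported as active, which matches the default described in the file's comments.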
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+######################################################################
+# Default Access Control File for Remote JMX(TM) Monitoring
+######################################################################
+#
+# Access control file for Remote JMX API access to monitoring.
+# This file defines the allowed access for different roles. The
+# password file (jmxremote.password by default) defines the roles and their
+# passwords. To be functional, a role must have an entry in
+# both the password and the access files.
+#
+# The default location of this file is $JRE/conf/management/jmxremote.access
+# You can specify an alternate location by specifying a property in
+# the management config file $JRE/conf/management/management.properties
+# (See that file for details)
+#
+# The file format for password and access files is syntactically the same
+# as the Properties file format. The syntax is described in the Javadoc
+# for java.util.Properties.load.
+# A typical access file has multiple lines, where each line is blank,
+# a comment (like this one), or an access control entry.
+#
+# An access control entry consists of a role name, and an
+# associated access level. The role name is any string that does not
+# itself contain spaces or tabs. It corresponds to an entry in the
+# password file (jmxremote.password). The access level is one of the
+# following:
+# "readonly" grants access to read attributes of MBeans.
+# For monitoring, this means that a remote client in this
+# role can read measurements but cannot perform any action
+# that changes the environment of the running program.
+# "readwrite" grants access to read and write attributes of MBeans,
+# to invoke operations on them, and optionally
+# to create or remove them. This access should be granted
+# only to trusted clients, since they can potentially
+# interfere with the smooth operation of a running program.
+#
+# The "readwrite" access level can optionally be followed by the "create" and/or
+# "unregister" keywords. The "unregister" keyword grants access to unregister
+# (delete) MBeans. The "create" keyword grants access to create MBeans of a
+# particular class or of any class matching a particular pattern. Access
+# should only be granted to create MBeans of known and trusted classes.
+#
+# For example, the following entry would grant readwrite access
+# to "controlRole", as well as access to create MBeans of the class
+# javax.management.monitor.CounterMonitor and to unregister any MBean:
+# controlRole readwrite \
+# create javax.management.monitor.CounterMonitorMBean \
+# unregister
+# or equivalently:
+# controlRole readwrite unregister create javax.management.monitor.CounterMBean
+#
+# The following entry would grant readwrite access as well as access to create
+# MBeans of any class in the packages javax.management.monitor and
+# javax.management.timer:
+# controlRole readwrite \
+# create javax.management.monitor.*,javax.management.timer.* \
+# unregister
+#
+# The \ character is defined in the Properties file syntax to allow continuation
+# lines as shown here. A * in a class pattern matches a sequence of characters
+# other than dot (.), so javax.management.monitor.* matches
+# javax.management.monitor.CounterMonitor but not
+# javax.management.monitor.foo.Bar.
+#
+# A given role should have at most one entry in this file. If a role
+# has no entry, it has no access.
+# If multiple entries are found for the same role name, then the last
+# access entry is used.
+#
+#
+# Default access control entries:
+# o The "monitorRole" role has readonly access.
+# o The "controlRole" role has readwrite access and can create the standard
+# Timer and Monitor MBeans defined by the JMX API.
+
+monitorRole readonly
+controlRole readwrite \
+create javax.management.monitor.*,javax.management.timer.* \
+unregister
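
The entry syntax and class-pattern rule described in the comments above can be illustrated with a short Python sketch. It is not the JDK's parser: `join_continuations`, `parse_access`, and `class_matches` are hypothetical helper names, and the sketch assumes only the features the comments describe (backslash continuation, `create` followed by a comma-separated pattern list, the `unregister` keyword, and last-entry-wins for duplicate roles).

```python
import re

# Sample entries taken from the defaults at the end of the access file.
SAMPLE = r"""
monitorRole readonly
controlRole readwrite \
    create javax.management.monitor.*,javax.management.timer.* \
    unregister
"""


def join_continuations(text):
    """Merge lines ending in a backslash with the line that follows."""
    logical, pending = [], ""
    for raw in text.splitlines():
        line = raw.strip()
        # Blank and comment lines are skipped unless we are mid-entry.
        if not pending and (not line or line.startswith("#")):
            continue
        if line.endswith("\\"):
            pending += line[:-1].rstrip() + " "
            continue
        logical.append(pending + line)
        pending = ""
    if pending:
        logical.append(pending)
    return logical


def parse_access(text):
    """Map each role to its level, create patterns, and unregister flag."""
    roles = {}
    for entry in join_continuations(text):
        tokens = entry.split()
        if len(tokens) < 2:
            continue
        role, level, rest = tokens[0], tokens[1], tokens[2:]
        create, unregister = [], False
        i = 0
        while i < len(rest):
            if rest[i] == "unregister":
                unregister = True
            elif rest[i] == "create" and i + 1 < len(rest):
                i += 1
                create = rest[i].split(",")
            i += 1
        # A later entry for the same role replaces an earlier one.
        roles[role] = {"level": level, "create": create, "unregister": unregister}
    return roles


def class_matches(pattern, class_name):
    """A '*' matches any run of characters other than a dot."""
    regex = "".join("[^.]*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.fullmatch(regex, class_name) is not None


roles = parse_access(SAMPLE)
print(roles["controlRole"])
print(class_matches("javax.management.monitor.*", "javax.management.monitor.CounterMonitor"))
print(class_matches("javax.management.monitor.*", "javax.management.monitor.foo.Bar"))
```

Translating `*` to `[^.]*` is what keeps a package pattern from matching classes in sub-packages, which is the distinction the file's comments draw between `CounterMonitor` and `foo.Bar`.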
@@ -0,0 +1,115 @@
+# ----------------------------------------------------------------------
+# Template for jmxremote.password
+#
+# o Copy this template to jmxremote.password
+# o Set the user/password entries in jmxremote.password
+# o Change the permission of jmxremote.password to be accessible
+# only by the owner.
+# o The jmxremote.passwords file will be re-written by the server
+# to replace all plain text passwords with hashed passwords when
+# the file is read by the server.
+#
+
+##############################################################
+# Password File for Remote JMX Monitoring
+##############################################################
+#
+# Password file for Remote JMX API access to monitoring. This
+# file defines the different roles and their passwords. The access
+# control file (jmxremote.access by default) defines the allowed
+# access for each role. To be functional, a role must have an entry
+# in both the password and the access files.
+#
+# Default location of this file is $JRE/conf/management/jmxremote.password
+# You can specify an alternate location by specifying a property in
+# the management config file $JRE/conf/management/management.properties
+# or by specifying a system property (See that file for details).
+
+##############################################################
+# File format of the jmxremote.password file
+##############################################################
+#
+# The file contains multiple lines where each line is blank,
+# a comment (like this one), or a password entry.
+#
+# password entry follows the below syntax
+# role_name W [clearPassword|hashedPassword]
+#
+# role_name is any string that does not itself contain spaces or tabs.
+# W = spaces or tabs
+#
+# Passwords can be specified via clear text or via a hash. Clear text password
+# is any string that does not contain spaces or tabs. Hashed passwords must
+# follow the below format.
+# hashedPassword = base64_encoded_64_byte_salt W base64_encoded_hash W hash_algorithm
+# where,
+# base64_encoded_64_byte_salt = 64 byte random salt
+# base64_encoded_hash = Hash_algorithm(password + salt)
+# W = spaces or tabs
+# hash_algorithm = Algorithm string specified using the format below
+# https://docs.oracle.com/javase/9/docs/specs/security/standard-names.html#messagedigest-algorithms
+# This is an optional field. If not specified, SHA3-512 will be assumed.
+#
+# If passwords are in clear, they will be overwritten by their hash if all of
+# the below criteria are met.
+# * com.sun.management.jmxremote.password.toHashes property is set to true in
+# management.properties file
+# * the password file is writable
+# * the system security policy allows writing into the password file, if a
+# security manager is configured
+#
+# In order to change the password for a role, replace the hashed password entry
+# with a new clear text password or a new hashed password. If the new password
+# is in clear, it will be replaced with its hash when a new login attempt is made.
+#
+# A given role should have at most one entry in this file. If a role
+# has no entry, it has no access.
+# If multiple entries are found for the same role name, then the last one
+# is used.
+#
+# A user generated hashed password file can also be used instead of clear-text
+# password file. If generated by the user, hashed passwords must follow the
+# format specified above.
+#
+# Caution: It is recommended not to edit the password file while the
+# agent is running, as edits could be lost if a client connection triggers the
+# hashing of the password file at the same time that the file is externally modified.
+# The integrity of the file is guaranteed, but any external edits made to the
+# file during the short period between the time that the agent reads the file
+# and the time that it writes it back might get lost
+
+##############################################################
+# File permissions of the jmxremote.password file
+##############################################################
+# This file must be made accessible by ONLY the owner,
+# otherwise the program will exit with an error.
+#
+# In a typical installation, this file can be accessed by anybody on the
+# local machine, and possibly by people on other machines.
+# For security, you should either restrict the access to this file except for owner,
+# or specify another, less accessible file in the management config file
+# as described above.
+#
+# In order to prevent inadverent edits to the password file in the
+# production environment, it is recommended to deploy a read-only
+# hashed password file. The hashed entries for clear passwords can be generated
+# in advance by running the JMX agent.
+#
+
+##############################################################
+# Sample of the jmxremote.password file
+##############################################################
+# Following are two commented-out entries. The "monitorRole" role has
+# password "QED". The "controlRole" role has password "R&D". This is an example
+# of specifying passwords in the clear
+#
+# monitorRole QED
+# controlRole R&D
+#
+# Once a login attempt is made, passwords will be hashed and the file will have
+# below entries with clear passwords overwritten by their respective
+# SHA3-512 hash
+#
+# monitorRole trilby APzBTt34rV2l+OMbuvbnOQ4si8UZmfRCVbIY1+fAofV5CkQzXS/FDMGteQQk/R3q1wtt104qImzJEA7gCwl6dw== 4EeTdSJ7X6Imu0Mb+dWqIns7a7QPIBoM3NB/XlpMQSPSicE7PnlALVWn2pBY3Q3pGDHyAb32Hd8GUToQbUhAjA== SHA3-512
+# controlRole roHEJSbRqSSTII4Z4+NOCV2OJaZVQ/dw153Fy2u4ILDP9XiZ426GwzCzc3RtpoqNMwqYIcfdd74xWXSMrWtGaA== w9qDsekgKn0WOVJycDyU0kLBa081zbStcCjUAVEqlfon5Sgx7XHtaodbmzpLegA1jT7Ag36T0zHaEWRHJe2fdA== SHA3-512
+#
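
The hashed-entry layout described in the template (`role_name W salt W hash W hash_algorithm`) can be sketched in Python as follows. This is an illustration only. `hash_entry` and `parse_entry` are hypothetical names, UTF-8 encoding is an assumption, and the bytes are concatenated as `password + salt` because that is how the template's comment writes it. Whether the JDK uses the same byte order and encoding is not confirmed by this diff, so entries produced this way should not be assumed to authenticate against a real JMX agent.

```python
import base64
import hashlib
import os


def _b64(data):
    return base64.b64encode(data).decode("ascii")


def hash_entry(role, password, salt=None):
    """Build a 'role salt hash algorithm' line with a 64-byte salt."""
    salt = os.urandom(64) if salt is None else salt
    # Assumption: UTF-8 password bytes followed by the salt, mirroring
    # the "Hash_algorithm(password + salt)" wording in the template.
    digest = hashlib.sha3_512(password.encode("utf-8") + salt).digest()
    return " ".join([role, _b64(salt), _b64(digest), "SHA3-512"])


def parse_entry(line):
    """Split an entry into its fields; W is any run of spaces or tabs."""
    tokens = line.split()
    role, rest = tokens[0], tokens[1:]
    if len(rest) == 1:
        return {"role": role, "clear": rest[0]}
    # The algorithm field is optional and defaults to SHA3-512.
    algorithm = rest[2] if len(rest) > 2 else "SHA3-512"
    return {"role": role, "salt": rest[0], "hash": rest[1], "algorithm": algorithm}


entry = hash_entry("monitorRole", "QED")
parsed = parse_entry(entry)
print(parsed["role"], parsed["algorithm"])
print(len(base64.b64decode(parsed["salt"])), len(base64.b64decode(parsed["hash"])))
```

A SHA3-512 digest is 64 bytes, so both Base64 fields decode to 64 bytes, consistent with the lengths of the sample hashed entries in the template.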
