
docs: add docs for path / virtual addressing #2669

Merged: 2 commits, merged on Jun 12, 2025
chart/values.yaml: 8 changes (8 additions, 0 deletions)
@@ -425,6 +425,14 @@ storages:
endpoint_url: "http://local-minio:9000/"
access_endpoint_url: *minio_access_path

+#access_addressing_style: 'path' or 'virtual'
+# determine if bucket should be accessed as:
+# - virtual - https://<bucket>.<host>/<key>
+# - path - https://<host>/<bucket>/<key>
+#
+# if not specified, defaults to 'path' for local minio or
+# 'virtual' for all other storages
+

# optional: duration in minutes for WACZ download links to be valid
# used by webhooks and replay
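The new `values.yaml` comment describes two rules: where the bucket name goes in each addressing style, and which style applies when `access_addressing_style` is omitted. The Python sketch below illustrates both rules. `resolve_style`, `object_url`, and the `is_local_minio` flag are hypothetical names chosen for illustration, not Browsertrix's actual code, and the sketch assumes an `https` scheme throughout.

```python
from typing import Optional


def resolve_style(access_addressing_style: Optional[str], is_local_minio: bool) -> str:
    """Pick the addressing style, applying the documented default when unset.

    Hypothetical helper: is_local_minio stands in for however the storage is
    known to be the bundled Minio.
    """
    if access_addressing_style is None:
        # Documented default: 'path' for local Minio, 'virtual' for all other storages.
        return "path" if is_local_minio else "virtual"
    if access_addressing_style not in ("path", "virtual"):
        raise ValueError(f"unknown addressing style: {access_addressing_style!r}")
    return access_addressing_style


def object_url(host: str, bucket: str, key: str, style: str) -> str:
    """Build an https URL for an object in the given addressing style (sketch)."""
    if style == "virtual":
        # virtual - https://<bucket>.<host>/<key>
        return f"https://{bucket}.{host}/{key}"
    if style == "path":
        # path - https://<host>/<bucket>/<key>
        return f"https://{host}/{bucket}/{key}"
    raise ValueError(f"unknown addressing style: {style!r}")


style = resolve_style(None, is_local_minio=False)
print(style, object_url("s3provider.example.com", "bucket", "path/to/files/crawl.wacz", style))
# virtual https://bucket.s3provider.example.com/path/to/files/crawl.wacz
```

With the option unset on a non-Minio storage, the sketch falls back to `virtual` and places the bucket in the host name, matching the comment's `https://<bucket>.<host>/<key>` form.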
frontend/docs/docs/deploy/customization.md: 35 changes (30 additions, 5 deletions)
@@ -92,12 +92,33 @@ Since the local Minio service is not used, `minio_local: false` can be set to sa

### Custom Access Endpoint URL

-It may be useful to provide a custom access endpoint for accessing WACZ files and other data. if the `access_endpoint_url` is provided,
-it should be in 'virtual host' form (the bucket is not added to the path, but is assumed to be the in the host).
+It may be useful to provide a custom access endpoint for accessing WACZ files and other data. If the `access_endpoint_url` is provided, it can be in either the 'virtual host' or 'path' form, while the `endpoint_url` should always be in path-prefix form.

-The host portion of the URL is then replaced with the `access_endpoint_url`. For example, given `endpoint_url: https://s3provider.example.com/bucket/path/` and `access_endpoint_url: https://my-custom-domain.example.com/path/`, a URL to a WACZ files in 'virtual host' form may be `https://bucket.s3provider.example.com/path/to/files/crawl.wacz?signature...`.
+Here are two examples of the addressing modes:

-The `https://bucket.s3provider.example.com/path/` is then replaced with the `https://my-custom-domain.example.com/path/`, and the final URL becomes `https://my-custom-domain.example.com/path/to/files/crawl.wacz?signature...`.
+#### Virtual Host vs Path Addressing for Access Endpoints
+
+Virtual host addressing:
+```
+endpoint_url: https://s3provider.example.com/bucket/path/
+access_endpoint_url: https://my-custom-domain.example.com/path/
+access_addressing_style: virtual
+
+# Files loaded from: https://my-custom-domain.example.com/path/to/files/crawl.wacz?signature...
+```
+
+Path addressing:
+```
+...
+endpoint_url: https://s3provider.example.com/bucket/path/
+access_endpoint_url: https://my-custom-domain.example.com/bucket/path/
+access_addressing_style: path
+
+# Files loaded from: https://my-custom-domain.example.com/bucket/path/to/files/crawl.wacz?signature...
+```
+
+Note that when using the local Minio for storage, path-style addressing is used automatically as the
+data is accessed via `/data/path/to/files`. Otherwise, virtual-style addressing is assumed as the default.


### Storage Replicas
@@ -117,6 +138,8 @@ storages:

endpoint_url: "http://local-minio.default:9000/"
is_default_primary: true
+# default for local minio is path
+access_addressing_style: path

- name: "replica-0"
type: "s3"
@@ -126,6 +149,7 @@ storages:

endpoint_url: "http://local-minio.default:9000/"
is_default_replica: true
+access_addressing_style: path

- name: "replica-1"
type: "s3"
@@ -134,7 +158,8 @@ storages:
bucket_name: "replica-1"

endpoint_url: "https://s3provider.example.com/bucket/path/"
-access_endpoint_url: "https://my-custom-domain.example.com/path/"
+access_endpoint_url: "https://bucket.my-custom-domain.example.com/path/"
+access_addressing_style: virtual
```

When replica locations are set, the default behavior when a crawl, upload, or browser profile is deleted is that the replica files are deleted at the same time as the file in primary storage. To delay deletion of replicas, set `replica_deletion_delay_days` in the Helm chart to the number of days by which to delay replica file deletion. This feature gives Browsertrix administrators time in the event of files being deleted accidentally or maliciously to recover copies from configured replica locations.
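The custom access endpoint docs in this change describe swapping the storage-side prefix of a signed WACZ URL for `access_endpoint_url`. Below is a minimal Python sketch of that substitution. It assumes the prefix the S3 client produces for the configured style is already known. `rewrite_access_url` is a hypothetical helper, not the backend's implementation, and `signature=abc` is a placeholder for the real signed query string. The endpoint values come from the two documented examples.

```python
def rewrite_access_url(presigned_url: str, storage_base: str, access_endpoint_url: str) -> str:
    """Replace the storage-side prefix of a signed URL with the access endpoint.

    Hypothetical helper. storage_base is assumed to be the prefix the S3 client
    puts in front of the object key for the configured addressing style:
      virtual: https://<bucket>.<host>/<prefix>/
      path:    https://<host>/<bucket>/<prefix>/
    """
    if not presigned_url.startswith(storage_base):
        raise ValueError(f"{presigned_url!r} does not start with {storage_base!r}")
    # Keep the rest of the key and the signature query string unchanged.
    return access_endpoint_url + presigned_url[len(storage_base):]


# Virtual host addressing: the bucket is in the storage host name.
print(rewrite_access_url(
    "https://bucket.s3provider.example.com/path/to/files/crawl.wacz?signature=abc",
    "https://bucket.s3provider.example.com/path/",
    "https://my-custom-domain.example.com/path/",
))
# https://my-custom-domain.example.com/path/to/files/crawl.wacz?signature=abc

# Path addressing: the bucket is the first path segment, so the access endpoint includes it.
print(rewrite_access_url(
    "https://s3provider.example.com/bucket/path/to/files/crawl.wacz?signature=abc",
    "https://s3provider.example.com/bucket/path/",
    "https://my-custom-domain.example.com/bucket/path/",
))
# https://my-custom-domain.example.com/bucket/path/to/files/crawl.wacz?signature=abc
```

A plain prefix check is used instead of re-parsing the URL, so the signed query string passes through byte for byte.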