
helm: add crawler_network_policy_additional_egress #2641

Merged (3 commits) on Jun 10, 2025

3 changes: 3 additions & 0 deletions chart/templates/networkpolicies.yaml
```diff
@@ -11,6 +11,9 @@ spec:
   policyTypes:
     - Egress
   egress:
+    {{- if .Values.crawler_network_policy_additional_egress | default false -}}
+    {{- .Values.crawler_network_policy_additional_egress | toYaml | nindent 4 -}}
+    {{- end -}}
     {{- if .Values.crawler_network_policy_egress | default false -}}
     {{- .Values.crawler_network_policy_egress | toYaml | nindent 4 -}}
     {{- else }}
```
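
As a worked illustration of this template change: `toYaml` serializes the list from the values file, and `nindent 4` prepends a newline and indents every line by four spaces, so each rule becomes an item under `egress:`. Given a hypothetical single rule in the values file, the rendered section would look roughly like the sketch below. Helm's `toYaml` may reorder keys and reflow list indentation, so the output can differ cosmetically from the input while remaining equivalent YAML; exact formatting may vary by Helm version.

```yaml
# Input in values.yaml (hypothetical rule, for illustration only)
crawler_network_policy_additional_egress:
  - to:
      - ipBlock:
          cidr: 10.0.0.1/32
    ports:
      - port: 80
        protocol: TCP
---
# Approximate rendered excerpt of the NetworkPolicy spec
# (podSelector and policyTypes omitted)
spec:
  egress:
    - ports:
      - port: 80
        protocol: TCP
      to:
      - ipBlock:
          cidr: 10.0.0.1/32
    # ...the chart's default egress rules follow here, because
    # crawler_network_policy_egress is unset and the else branch renders
```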
9 changes: 6 additions & 3 deletions chart/values.yaml
```diff
@@ -373,12 +373,15 @@ btrix-proxies:
 # crawler_fsgroup: 201407
 
 
-# optional: enable/disable crawler network policy
+# optional: enable/disable crawler network policy, prevents crawler pods from accessing internal services
 crawler_enable_network_policy: true
 
-# optional: replace the default crawler egress policy with your own
+# optional: add additional egress rules to the default crawler network policy (See chart/templates/networkpolicies.yaml for an example)
+# crawler_network_policy_additional_egress: []
+
+# optional: replace the default crawler egress policy with your own egress rules (See chart/templates/networkpolicies.yaml for an example)
 # see chart/templates/networkpolicies.yaml for an example
-# crawler_network_policy_egress: {}
+# crawler_network_policy_egress: []
 
 # time to wait for graceful stop
 grace_period: 1000
```
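
The two values behave differently: in the template, `crawler_network_policy_additional_egress` is rendered in addition to the default rules, while setting `crawler_network_policy_egress` skips the `else` branch, so the default rules are not rendered at all. A replacement list therefore has to cover everything crawlers need, including DNS resolution. The following is a minimal sketch, not the chart's own default rule set; it assumes you want public internet access with the RFC 1918 private ranges excluded, and that cluster DNS pods run in the `kube-system` namespace with the label `k8s-app: kube-dns`, which is common but not universal. The value is a list of rules, matching the new `[]` placeholder.

```yaml
crawler_network_policy_egress:
  # Public internet: any destination except the RFC 1918 private ranges
  - to:
      - ipBlock:
          cidr: 0.0.0.0/0
          except:
            - 10.0.0.0/8
            - 172.16.0.0/12
            - 192.168.0.0/16
  # Cluster DNS (assumption: DNS pods in kube-system labeled k8s-app: kube-dns)
  - to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: kube-system
        podSelector:
          matchLabels:
            k8s-app: kube-dns
    ports:
      - port: 53
        protocol: UDP
      - port: 53
        protocol: TCP
```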
35 changes: 35 additions & 0 deletions frontend/docs/docs/deploy/customization.md
@@ -139,6 +139,8 @@ storages:

When replica locations are set, deleting a crawl, upload, or browser profile by default also deletes its replica files at the same time as the file in primary storage. To delay deletion of replicas, set `replica_deletion_delay_days` in the Helm chart to the number of days by which to delay replica file deletion. This gives Browsertrix administrators time to recover copies from the configured replica locations if files are deleted accidentally or maliciously.

??? info "If you are specifying a custom Minio deployment running in the same Kubernetes cluster, be sure to update the [network policy to allow access to your custom resource](#local-network-access-policy-and-custom-services)"

## Horizontal Autoscaling

Browsertrix also includes support for horizontal auto-scaling for both the backend and frontend pods.
@@ -250,3 +252,36 @@ type btrixEvent = (
```

Tracking is optional and will never expose personally identifiable information.

## Local Network Access Policy and Custom Services

By default, Browsertrix applies a network policy to crawler pods that restricts access to internal Kubernetes resources, preventing crawlers from probing the cluster's internal network. With the default configuration, this is sufficient for crawling public websites.

However, you may want to give crawlers access to an internal IP address (for example, when crawling a site deployed on a local server) or to another Kubernetes service (such as a custom Minio deployment).

To grant access, extend the default network policy's `egress` rules with the `crawler_network_policy_additional_egress` setting.

For example, to allow crawlers to reach the `10.0.0.1/32` IP block on port 80, and pods labeled `app: my-custom-minio` on port 9000 only, add the following to your Helm values:

```yaml
crawler_network_policy_additional_egress:
  - to:
      - ipBlock:
          cidr: 10.0.0.1/32
    ports:
      - port: 80
        protocol: TCP

  - to:
      - podSelector:
          matchLabels:
            app: my-custom-minio
    ports:
      - port: 9000
        protocol: TCP
```
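
Note that a `podSelector` on its own only matches pods in the same namespace as the NetworkPolicy, which here is the namespace the crawler pods run in. If the custom service lives in a different namespace, one option is to combine a `namespaceSelector` and a `podSelector` in the same `to` entry, in which case both must match. The sketch below assumes a hypothetical `minio` namespace and relies on the `kubernetes.io/metadata.name` label, which current Kubernetes versions set automatically on every namespace.

```yaml
crawler_network_policy_additional_egress:
  - to:
      # Both selectors in ONE peer entry are ANDed: only pods labeled
      # app=my-custom-minio inside the (hypothetical) "minio" namespace match
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: minio
        podSelector:
          matchLabels:
            app: my-custom-minio
    ports:
      - port: 9000
        protocol: TCP
```

Writing `namespaceSelector` and `podSelector` as two separate list items (each with its own `-`) would instead OR them, allowing every pod in the `minio` namespace plus matching pods in the crawler namespace.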

Refer to the [default networkpolicies.yaml](https://github.com/webrecorder/browsertrix/blob/main/chart/templates/networkpolicies.yaml) for additional examples, and to the [official Kubernetes documentation for Network Policies](https://kubernetes.io/docs/concepts/services-networking/network-policies/) for the full rule syntax.