Shadcn UI Theme Variables Implementation from Design System #1217

Merged: 18 commits, May 12, 2025

1 change: 1 addition & 0 deletions .github/styles/Snowplow/Acronyms.yml
@@ -10,6 +10,7 @@ second: '(?:\b[A-Z][a-z]+ )+\(([A-Z]{3,5})\)'
exceptions:
- API
- ASP
- CDN
- CLI
- CPU
- CSS
1 change: 1 addition & 0 deletions .github/styles/Snowplow/Headings.yml
@@ -9,6 +9,7 @@ exceptions:
- CLI
- Cosmos
- Docker
- DOM
- Emmet
- gRPC
- I
3 changes: 3 additions & 0 deletions .gitignore
@@ -28,3 +28,6 @@ yarn-error.log*
# Python
__pycache__
manifests

# Local Netlify folder
.netlify
2 changes: 1 addition & 1 deletion README.md
@@ -67,7 +67,7 @@ Vale only checks normal prose. Text that's marked as code—code blocks or in-li

To install the extension, find "[Vale VSCode](https://marketplace.visualstudio.com/items?itemName=ChrisChinchilla.vale-vscode)" in the Extensions Marketplace within VS Code, then click **Install**.

The Vale extension will automatically check files when they're opened or saved. It underlines the flagged sections in different colours, based on the severity of the alert - red for errors, orange for warnings, and blue for suggestions. Mouse-over the underlined section to see the alert message, or check the VS Code **Problems** tab.
The Vale extension will automatically check files when they're opened or saved. It underlines the flagged sections in different colors, based on the severity of the alert - red for errors, orange for warnings, and blue for suggestions. Mouse-over the underlined section to see the alert message, or check the VS Code **Problems** tab.

### Vale command-line interface

@@ -19,4 +19,4 @@ To run the loader, mount your config file into the docker image, and then provid
--iglu-config /myconfig/iglu.hocon
`}</CodeBlock>

Where `loader.hocon` is loader's [configuration file](/docs/api-reference/loaders-storage-targets/bigquery-loader/#configuring-the-loader) and `iglu.hocon` is [iglu resolver](/docs/api-reference/iglu/iglu-resolver/index.md) configuration.
Where `loader.hocon` is the loader's [configuration file](/docs/api-reference/loaders-storage-targets/bigquery-loader/index.md#configuring-the-loader) and `iglu.hocon` is the [Iglu resolver](/docs/api-reference/iglu/iglu-resolver/index.md) configuration.
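
The resolver configuration is typically a self-describing JSON (HOCON is a superset of JSON). A minimal sketch, assuming schemas only need to resolve from Iglu Central — treat the repository list as a placeholder for your own setup:

```json
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": ["com.snowplowanalytics"],
        "connection": { "http": { "uri": "http://iglucentral.com" } }
      }
    ]
  }
}
```
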
@@ -35,7 +35,7 @@ We've filtered on these two types of errors to reduce noise like bot traffic tha

At the top there is a data quality score that compares the volume of failed events to the volume of events successfully loaded into the data warehouse.

The bar chart shows how the total number of failed events varies over time, colour-coding validation and enrichment errors. The time window is 7 days be default but can be extended to 14 or 30 days.
The bar chart shows how the total number of failed events varies over time, color-coding validation and enrichment errors. The time window is 7 days by default but can be extended to 14 or 30 days.

In the table, failed events are aggregated by the unique type of failure (validation, enrichment) and the offending schema.

@@ -49,7 +49,7 @@ The detailed view shows the error message as well as other useful metadata (when

The data quality dashboard allows you to see failed events directly from your warehouse. Your browser connects directly to an API running within your infrastructure, such that no failed event information flows through Snowplow. The discussion of architecture is important as it highlights the trade-offs between the two ways of monitoring failed events. The aforementioned API is a simple proxy that connects to your warehouse and serves the failed events to your browser. The connection is fully secure, using an encrypted channel (HTTPS) and authenticating/authorizing via the same mechanism used by Console.

In order for you to be able to deploy and use the data quality dashboard, you need to sink failed events to the warehouse via a [Failed Events Loader](/docs/data-product-studio/data-quality/failed-events/exploring-failed-events/warehouse-lake/#setup). The data quality dashboard only supports Snowflake.
To deploy and use the data quality dashboard, you need to sink failed events to the warehouse via a [Failed Events Loader](/docs/data-product-studio/data-quality/failed-events/exploring-failed-events/index.md#configure). The data quality dashboard currently supports Snowflake and BigQuery connections.

![](images/dqd-architecture.png)

@@ -67,7 +67,7 @@ Clicking on a particular error type will take you to a detailed view:

![](images/dqd-details.png)

The detailed view also shows a description of the root cause and the application version ([web](/docs/sources/trackers/snowplow-tracker-protocol/ootb-data/app-information/#application-context-entity-on-web-apps), [mobile](/docs/sources/trackers/mobile-trackers/tracking-events/platform-and-application-context/)). It provides a sample of the failed events in their entirety, as found in your warehouse.
The detailed view also shows a description of the root cause and the application version ([web](/docs/sources/trackers/snowplow-tracker-protocol/ootb-data/app-information/index.md#application-context-entity-on-web-apps), [mobile](/docs/sources/trackers/mobile-trackers/tracking-events/platform-and-application-context/index.md)). It provides a sample of the failed events in their entirety, as found in your warehouse.

Some columns are too wide to fit in the table: click on them to see the full pretty-printed, syntax-highlighted content. The most useful column to explore is probably `CONTEXTS_COM_SNOWPLOWANALYTICS_SNOWPLOW_FAILURE_1`, which contains the actual error information encoded as a JSON object:
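
For illustration, a validation failure in that column might look roughly like the sketch below. This is a hypothetical payload, not a reference: the field names and values here are assumptions that vary by failure type and component version, so check the failure schema in Iglu Central for the authoritative shape.

```json
{
  "failureType": "ValidationError",
  "errors": [
    {
      "message": "$.jobRole: is missing but it is required",
      "path": "$",
      "keyword": "required",
      "source": "unstruct_event"
    }
  ],
  "schema": "iglu:com.acme/button_press/jsonschema/1-0-0",
  "data": { "buttonName": "checkout" },
  "timestamp": "2025-05-12T09:30:52.000Z",
  "componentName": "enrich",
  "componentVersion": "5.3.0"
}
```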

2 changes: 1 addition & 1 deletion docs/data-product-studio/data-structures/index.md
@@ -8,4 +8,4 @@ This section explains how to create, manage, and update [data structures](/docs/
* Snowplow BDP Console UI
* Data structures API
* [Snowplow CLI](/docs/data-product-studio/snowplow-cli/index.md)
* For Community users: [Iglu](/docs/api-reference/iglu/iglu-repositories/iglu-server/)
* For Community users: [Iglu](/docs/api-reference/iglu/iglu-repositories/iglu-server/index.md)
2 changes: 1 addition & 1 deletion docs/fundamentals/schemas/index.md
@@ -117,7 +117,7 @@ After the self section, the remainder of the schema is where you will begin desc
**“properties”** - Here is where you describe the fields you intend to collect. Each field is given a name and a number of arguments; these are important, as they feed directly into the validation process.

- **“description”** - Similar to the description field for the schema, this argument is where you should put detailed information on what this field represents to avoid any misunderstanding or misinterpretation during analysis.
- **"type"** - This denotes the type of data that is collected through this field. The most common types of data collected are `string`, `number`, `integer`, `object`, `array`, `boolean` and `null`. A single field can allow multiple types as shown in the field `job title` in the example schema which allows both `string` and `null`
- **"type"** - This denotes the type of data that is collected through this field. The most common types of data collected are `string`, `number`, `integer`, `object`, `array`, `boolean` and `null`. A single field can allow multiple types as shown in the field `job role` in the example schema which allows both `string` and `null`
- Validation arguments can then be passed into the field such as `minLength`, `maxLength` and `enum` for strings and `minimum` and `maximum` for integers. A full set of valid arguments can be found on the [JSON schema specification](https://datatracker.ietf.org/doc/html/draft-fge-json-schema-validation-00#section-5).

**“$supersedes”** / **“$supersededBy”** - _Optional, not shown_. See [marking schemas as superseded](/docs/data-product-studio/data-structures/version-amend/amending/index.md#marking-the-schema-as-superseded).
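
As an illustrative sketch only — the field and argument names here are invented, not taken from the docs' example schema — a `properties` block combining multiple types with validation arguments might look like:

```json
{
  "properties": {
    "job_role": {
      "description": "The user's current job role, or null if not provided",
      "type": ["string", "null"],
      "maxLength": 128
    },
    "years_experience": {
      "description": "Whole years of professional experience",
      "type": "integer",
      "minimum": 0,
      "maximum": 100
    }
  },
  "additionalProperties": false
}
```
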
2 changes: 1 addition & 1 deletion docs/get-started/feature-comparison/index.md
@@ -12,7 +12,7 @@ Here is a detailed list of product features, including which are available as pa
| [35+ trackers and webhooks](/docs/sources/index.md) | ✅ | ✅ |
| First party tracking | ✅ | ✅ |
| [Anonymous data collection](/docs/resources/recipes-tutorials/recipe-anonymous-tracking/index.md) | ✅ | ✅ |
| [ID service](/docs/sources/trackers/javascript-trackers/web-tracker/browsers/index.md#what-is-an-id-service) | ✅ | ✅ |
| [Cookie Extension service](/docs/sources/trackers/javascript-trackers/web-tracker/browsers/index.md#what-is-a-cookie-extension-service) | ✅ | ✅ |
| High availability and auto-scaling | ✅ | ❌ |
| [Enrichments](/docs/pipeline/enrichments/available-enrichments/index.md) | ✅ | ✅ |
| [Failed events](/docs/fundamentals/failed-events/index.md) | ✅ | ✅ |
6 changes: 3 additions & 3 deletions docs/get-started/tracking/cookies-and-ad-blockers/index.md
@@ -21,14 +21,14 @@ Although first-party cookies don't allow sharing user and session identifiers ac
Using Snowplow, you can set up your collector to be on the same domain as the website, which is the requirement for the use of first-party cookies.
To learn how to set up the collector domain, [visit this page](/docs/sources/first-party-tracking/index.md).

**Strategy 2: Snowplow ID service for Safari ITP**
**Strategy 2: Snowplow Cookie Extension service for Safari ITP**

As of Safari 16.4 released in April 2023, Safari sets the [lifetime of server-set cookies](https://webkit.org/tracking-prevention/#cname-and-third-party-ip-address-cloaking-defense) to a maximum of 7 days under some circumstances even if they are first-party cookies.
This greatly limits the effectiveness of tracking a customer journey where users are not regularly returning to your website.
In particular, it affects the `network_userid` identifier in Snowplow events.

Snowplow provides the ID service solution that fully mitigates the impact of this change.
Visit the [documentation for the ID service](/docs/sources/trackers/javascript-trackers/web-tracker/browsers/index.md#itp-mitigation) to learn more.
Snowplow provides the Cookie Extension service solution that fully mitigates the impact of this change.
Visit the [documentation for the Cookie Extension service](/docs/sources/trackers/javascript-trackers/web-tracker/browsers/index.md#itp-mitigation) to learn more.

## 2. Mitigating the impact of ad-blockers

@@ -213,7 +213,7 @@ You can do either for campaigns, too, with the `snowplow__channels_to_exclude` a

To reduce unnecessarily long paths, you can apply a number of path transformations, which the package automatically creates in your warehouse as user-defined functions.

In order to apply these transformations, all you have to do is to define them in the `snowplow__path_transforms` variable as a list of dictionaries, with the transformation name as key and optionally the parameter as value (for `remove_if_last_and_not_all` and `remove_if_not_all`). If the transformation requires no parameter you can just use `null` as values for the dictionary. For more details on how to do this, check out the [configuration page](/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-configuration/attribution/index.mdx) E.g.: `{'exposure_path': null, 'remove_if_last_and_not_all': 'direct'}`
To apply these transformations, define them in the `snowplow__path_transforms` variable as a dictionary. For the `remove_if_last_and_not_all` and `remove_if_not_all` transformations, the transformation name is the key and a non-empty array is the value. The other transformations (`exposure_path`, `first_path`, `unique_path`) take no additional parameter, so use `null` as their value. For more details, check out the [configuration page](/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-configuration/attribution/index.mdx). E.g.: `{'exposure_path': null, 'remove_if_last_and_not_all': ['channel_to_remove_1', 'campaign_to_remove_1', 'campaign_to_remove_2']}`. Note that the transformations are applied to both campaign and channel paths equally.

<details>
<summary>Path transform options</summary>
@@ -32,7 +32,7 @@ The most common scenario that may happen is there are certain session_identifier

We process the data based on sessions: for each run we look at the new events (with a default 6-hour lookback window), then consult the sessions manifest table to identify how far to look back in time to cover, as a whole, the sessions those new events belong to. A commonly overlooked scenario: if, due to pipeline issues, some data is processed by the model but the rest only reaches the warehouse outside of that default 6-hour lookback window, the package will leave those events behind unless the same session receives new events later and is reprocessed as a whole again (or unless the data modeling jobs were paused immediately after the pipeline went partially down). In such cases you need partial backfilling (see the [late loaded events](/docs/modeling-your-data/modeling-your-data-with-dbt/package-mechanics/late-arriving-data/index.md#late-loaded-events) section).

The events can also be [late sent](/docs/modeling-your-data/modeling-your-data-with-dbt/package-mechanics/late-arriving-data/#late-sent-events), in which case if they fall out of the limit, they will not get processed.
The events can also be [late sent](/docs/modeling-your-data/modeling-your-data-with-dbt/package-mechanics/late-arriving-data/index.md#late-sent-events), in which case if they fall out of the limit, they will not get processed.

We also "quarantine" the sessions at times, as however hard we may try to filter out bots, there could be odd outliers that keep sessions alive for days, for which we have the `snowplow__max_session_days` variable. If bots keep sending events for longer than defined in that variable, the package will only process the first of those in a subsequent run and archives the session_identifier in the [quarantine table](/docs/modeling-your-data/modeling-your-data-with-dbt/package-mechanics/manifest-tables/index.md#quarantine-table) after it runs. In such a case it is always best to look for the missing event's session_identifier (domain_userid as defaulted for web events) in the table to explain why the event is missing.

@@ -0,0 +1,13 @@
---
title: "Attribution"
sidebar_position: 20
---

### Upgrading to 0.5.0
- The `snowplow__path_transforms` variable now only accepts non-empty arrays (instead of strings) as parameters for the `remove_if_last_and_not_all` and `remove_if_not_all` transformations; please update your variable overrides in your `dbt_project.yml` accordingly. Previously you could only remove one specific channel or campaign; now you can remove multiple, if needed.

```yml title="dbt_project.yml"
vars:
  snowplow_attribution:
    snowplow__path_transforms: {'exposure_path': null, 'remove_if_last_and_not_all': ['channel_to_remove_1', 'campaign_to_remove_1', 'campaign_to_remove_2']}
```
@@ -243,25 +243,12 @@ You might be tempted to update derived entities in a similar way by using `event

## Discarding the event

Sometimes you don’t want the event to appear in your data warehouse or lake, e.g. because you suspect it comes from a bot and not a real user. In this case, you can `throw` an exception in your JavaScript code, which will send the event to [failed events](/docs/fundamentals/failed-events/index.md):

```js
const botPattern = /.*Googlebot.*/;

function process(event) {
  const useragent = event.getUseragent();

  if (useragent !== null && botPattern.test(useragent)) {
    throw "Filtered event produced by Googlebot";
  }
}
```

:::caution

This will create an “enrichment failure” failed event, which may be tricky to distinguish from genuine failures in your enrichment code, e.g. due to a mistake. In the future, we might provide a better mechanism for discarding events.

:::

```mdx-code-block
import DiscardingEvents from "@site/docs/reusable/discarding-events/_index.md"

<DiscardingEvents/>
```

## Accessing Java methods

@@ -12,7 +12,7 @@ In Europe the obligations regarding Personal Data handling have been outlined on

## Configuration

- [Schema](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow.enrichments/pii_enrichment_config/jsonschema/2-0-0)
- [Schema](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow.enrichments/pii_enrichment_config/jsonschema/2-0-1)
- [Example](https://github.com/snowplow/enrich/blob/master/config/enrichments/pii_enrichment_config.json)

```mdx-code-block
@@ -23,7 +23,7 @@ import TestingWithMicro from "@site/docs/reusable/test-enrichment-with-micro/_in

Two types of fields can be configured to be hashed:

- `pojo`: field that is effectively a scalar field in the enriched event (full list of fields that can be pseudonymized [here](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow.enrichments/pii_enrichment_config/jsonschema/2-0-0#L43-L60))
- `pojo`: field that is effectively a scalar field in the enriched event (full list of fields that can be pseudonymized [here](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow.enrichments/pii_enrichment_config/jsonschema/2-0-1#L43-L60))
- `json`: field contained inside a self-describing JSON (e.g. in `unstruct_event`)

With the configuration example, the fields `user_id` and `user_ipaddress` of the enriched event would be hashed, as well as the fields `email` and `ip_opt` of the unstructured event in case its schema matches _iglu:com.mailchimp/subscribe/jsonschema/1-\*-\*_.
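
As a sketch matching that description — adapted from the linked example config, with the JSONPaths and hash settings here being assumptions to verify against the 2-0-1 schema:

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow.enrichments/pii_enrichment_config/jsonschema/2-0-1",
  "data": {
    "vendor": "com.snowplowanalytics.snowplow.enrichments",
    "name": "pii_enrichment_config",
    "enabled": true,
    "emitEvent": true,
    "parameters": {
      "pii": [
        { "pojo": { "field": "user_id" } },
        { "pojo": { "field": "user_ipaddress" } },
        {
          "json": {
            "field": "unstruct_event",
            "schemaCriterion": "iglu:com.mailchimp/subscribe/jsonschema/1-*-*",
            "jsonPath": "$.data.email"
          }
        },
        {
          "json": {
            "field": "unstruct_event",
            "schemaCriterion": "iglu:com.mailchimp/subscribe/jsonschema/1-*-*",
            "jsonPath": "$.data.ip_opt"
          }
        }
      ],
      "strategy": {
        "pseudonymize": { "hashFunction": "SHA-256", "salt": "pepper123" }
      }
    }
  }
}
```
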
@@ -42,9 +42,16 @@ It's **important** to keep these things in mind when using this enrichment:
- Hashing a field can change its format (e.g. email) and its length, thus making a whole valid original event invalid if its schema is not compatible with the hashing.
- When updating the `salt` after it has already been used, the same original values hashed with the previous and the new salt will produce different hashes, making joins impossible and/or creating duplicate values.

### `anonymousOnly` mode
Enrich 5.3.0 introduced the `anonymousOnly` mode. When [anonymousOnly](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow.enrichments/pii_enrichment_config/jsonschema/2-0-1#L155) is set to true, PII fields are masked only in events tracked in anonymous mode (i.e. the `SP-Anonymous` header is present).

This is useful for compliance with regulations such as GDPR, where you would start with [anonymous tracking](/docs/sources/trackers/javascript-trackers/web-tracker/anonymous-tracking/index.md) by default (all identifiers are masked) and switch to non-anonymous tracking once the user consents to data collection (all identifiers are kept).

By default, `anonymousOnly` is `false`, i.e. PII fields are always masked.
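
A minimal sketch of how the flag might be set, assuming (per the 2-0-1 schema linked above) that it sits inside `parameters` alongside `pii` and `strategy`:

```json
{
  "parameters": {
    "anonymousOnly": true,
    "pii": [{ "pojo": { "field": "user_id" } }],
    "strategy": {
      "pseudonymize": { "hashFunction": "SHA-256", "salt": "pepper123" }
    }
  }
}
```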

## Input

[These fields](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow.enrichments/pii_enrichment_config/jsonschema/2-0-0#L43-L60) of the enriched event and any field of an unstructured event or context can be hashed.
[These fields](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow.enrichments/pii_enrichment_config/jsonschema/2-0-1#L43-L60) of the enriched event and any field of an unstructured event or context can be hashed.

## Output

10 changes: 10 additions & 0 deletions docs/pipeline/enrichments/filtering-bot-events/index.md
@@ -0,0 +1,10 @@
---
title: "Filtering bot events"
sidebar_position: 100
---

```mdx-code-block
import DiscardingEvents from "@site/docs/reusable/discarding-events/_index.md"

<DiscardingEvents/>
```