Support for flexible filtering / attribution functionality

In the latest f2f meeting in Paris, we had an action item to outline what a more flexible solution for ad-tech defined matching / attribution functionality could look like. Such a solution keeps business logic with the ad-tech, and has the benefit of allowing innovation without requiring changes to our specification. On the other hand, full flexibility comes with downsides in terms of risk of side-channel attacks (https://github.com/w3c/ppa/issues/153), system health issues, etc., which come from executing arbitrary code from untrusted parties.

A natural approach here would be to spin up a JavaScript environment to do this, which comes with many benefits (standardized, implemented in all browser engines already, etc.), but has a few major drawbacks (it is Turing-complete, potentially too heavyweight, large attack surface, etc.). In this issue I want to pitch an intermediate option between a fully declarative approach and a fully Turing-complete approach: using [CEL](https://cel.dev/), a minimal, non-Turing-complete expression language which was designed to operate on untrusted expressions.

Kudos to @apasel422 for the original suggestion and overall help exploring this option. cc @bmayd , @eriktaubeneck , and @ohpauleez who were interested in this approach.

## Straw API overview

We can allow callers to specify two CEL expressions, `match` and `assignCredit`, in `measureConversion`:


```javascript
measureConversion({
  // Generalization of https://w3c.github.io/ppa/#logic-matching
  // Returns a subset of impressions.
  match: "<some CEL expr>",

  // Generalization of https://w3c.github.io/ppa/#s-logic 4.4.1.6
  // Assigns credit (in [0,1]) to the list of matching impressions, so the
  // final histogram value will be proportional to credit.
  // Returns a map of {<impression id>: <double>}.
  assignCredit: "<some CEL expr>",
  …
});
```


Both expressions are invoked with the following CEL environment variables:



*   `impressions`: A list of records with the following fields:
    *   `id: string`: An opaque implementation-defined identifier for this impression
    *   `age: duration`: The duration since the impression was registered
    *   `userData: dyn`: An [arbitrary value](https://github.com/google/cel-spec/blob/master/doc/langdef.md#dynamic-values) provided by the impression registrar at the time it was registered, effectively one of the JSON types. In practice, this will likely be a `map<string, dyn>` so that multiple pieces of metadata can be associated with the impression.

For ergonomics, `impressions` will be pre-sorted by `age`, ascending.
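
As a rough illustration of the environment described above, here is a minimal JavaScript sketch of how an implementation might derive the `impressions` list from stored records. The stored shape (`registeredAtMs`), the `ageMs` field standing in for the CEL `duration`, and the helper name `buildImpressionsEnv` are all assumptions made for this example, not part of the proposal.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical helper: turns stored records into the `impressions` value.
// `registeredAtMs` is an assumed storage field; `ageMs` stands in for the
// CEL `age: duration` field.
function buildImpressionsEnv(stored, nowMs) {
  return stored
    .map((s) => ({
      id: s.id,
      ageMs: nowMs - s.registeredAtMs,
      userData: s.userData,
    }))
    // Ascending age, so the most recent impression comes first.
    .sort((a, b) => a.ageMs - b.ageMs);
}

const now = 1000 * HOUR_MS;
const env = buildImpressionsEnv(
  [
    { id: "a", registeredAtMs: now - 30 * HOUR_MS, userData: { click: false } },
    { id: "b", registeredAtMs: now - 2 * HOUR_MS, userData: { click: true } },
  ],
  now
);
console.log(env.map((i) => i.id)); // "b" (2h old) is listed before "a" (30h old)
```

With ascending order, `impressions[0]` is always the most recent impression, which is what the examples below rely on.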

## Examples

The CEL extensions used in these examples are:
- [Bindings](https://pkg.go.dev/github.com/google/cel-go/ext#readme-bindings) (not strictly required, but greatly reduces code size)
- [Lists](https://pkg.go.dev/github.com/google/cel-go/ext#readme-lists)
- [Math](https://pkg.go.dev/github.com/google/cel-go/ext#readme-math)
- [Two-Var Comprehensions](https://pkg.go.dev/github.com/google/cel-go/ext#readme-twovarcomprehensions)

We also expose new built-in functions `ppa.sum` and `ppa.pow` for list summation and exponentiation, though they are good candidates to add to the math extension.

### Last-touch with clicks and views

This implements https://github.com/w3c/ppa/issues/42#issuecomment-2962353680. Essentially, we:



*   Filter clicks and views using possibly different lookback windows (which could also be dynamic, based on conversion time)
*   Give full credit to the last click if there are any clicks, otherwise give full credit to the last view

The following assumes that clicks and views are saved via something like `saveImpression({userData: {click: <bool>}, ...});`.

`match`:


```
cel.bind(click,
  impressions.filter(i, i.userData.click && i.age <= duration('120h')),
  size(click) > 0
    ? click
    : impressions.filter(i, !i.userData.click && i.age <= duration('24h'))
).map(i, i.id)
```


`assignCredit`:


```
{impressions[0].id: 1.0}
```
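
To make the intended behavior easier to check, here is a plain JavaScript sketch of what the two expressions above compute. It is not a CEL evaluator. It assumes that `assignCredit` sees only the impressions selected by `match` (still sorted by age), and it uses a hypothetical `ageMs` field in place of the CEL `duration`.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Mirrors `match`: clicks within 120h if there are any,
// otherwise views within 24h. Returns impression ids.
function match(impressions) {
  const clicks = impressions.filter(
    (i) => i.userData.click && i.ageMs <= 120 * HOUR_MS
  );
  const chosen =
    clicks.length > 0
      ? clicks
      : impressions.filter((i) => !i.userData.click && i.ageMs <= 24 * HOUR_MS);
  return chosen.map((i) => i.id);
}

// Mirrors `assignCredit`: full credit to the most recent matched impression.
// Like the CEL version, this fails if the matched list is empty.
function assignCredit(matched) {
  return { [matched[0].id]: 1.0 };
}

const impressions = [
  { id: "view-1", ageMs: 3 * HOUR_MS, userData: { click: false } },
  { id: "click-1", ageMs: 50 * HOUR_MS, userData: { click: true } },
  { id: "click-2", ageMs: 200 * HOUR_MS, userData: { click: true } },
];
const ids = match(impressions);
// Assumption: the matched subset is what gets passed to `assignCredit`.
const matched = impressions.filter((i) => ids.includes(i.id));
const credit = assignCredit(matched);
console.log(ids, credit); // only "click-1" matches, and it receives credit 1
```

Here `click-2` falls outside the 120h window and `view-1` is ignored because an eligible click exists.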



### MTA with exponential decay

Requirements:



*   Perform arbitrary filtering (not pictured)
*   Select the n-last events
*   Score the events with an exponential decay model (e.g. with a half-life of 7 days, i.e. 168 hours)
*   Allocate credit proportional to the score

`assignCredit`:


```
cel.bind(lastN,
  impressions.slice(0, math.least(3, size(impressions))),
cel.bind(cp,
  lastN.transformMapEntry(_, i,
    {i.id: ppa.pow(2.0, double(i.age.getHours()) / -168.0)}),
cel.bind(total,
  ppa.sum(cp.transformList(_, v, v)),
  cp.transformMap(_, f, f / total))))
```
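
For comparison, the same computation can be sketched in plain JavaScript: each of the n most recent impressions gets a score of 2^(-age / half-life), and scores are then normalized to sum to 1. The function name `assignCreditDecay` and the `ageHours` field (standing in for `i.age.getHours()`, truncated to whole hours) are assumptions for this example.

```javascript
// Sketch of the exponential-decay `assignCredit` above.
// `impressions` is assumed to be sorted by age, ascending.
function assignCreditDecay(impressions, n = 3, halfLifeHours = 168) {
  const lastN = impressions.slice(0, Math.min(n, impressions.length));
  // Score each impression: halves every `halfLifeHours` of age.
  const scores = lastN.map((i) => [
    i.id,
    Math.pow(2, -Math.trunc(i.ageHours) / halfLifeHours),
  ]);
  const total = scores.reduce((sum, [, s]) => sum + s, 0);
  // Normalize so that the credits sum to 1.
  return Object.fromEntries(scores.map(([id, s]) => [id, s / total]));
}

const decayCredit = assignCreditDecay([
  { id: "a", ageHours: 0 },
  { id: "b", ageHours: 168 },
  { id: "c", ageHours: 336 },
  { id: "d", ageHours: 500 }, // dropped: only the 3 most recent are kept
]);
console.log(decayCredit); // scores 1, 0.5, 0.25 normalize to 4/7, 2/7, 1/7
```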

### MTA with array-based credit

Requirements:

*   Perform arbitrary filtering (not pictured)
*   Select the n-last events
*   Assign the events credit based on their position in the input list
*   Normalize the credit

`assignCredit`:


```
cel.bind(credit, [8.0, 6.0, 2.0],
cel.bind(lastN,
  impressions.slice(0,
    math.least(size(credit), size(impressions))),
cel.bind(total,
  ppa.sum(credit.slice(0, size(lastN))),
  lastN.transformMapEntry(i, v, {v.id: credit[i] / total}))))
```
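
A plain JavaScript sketch of the same logic, to show how the normalization behaves when fewer impressions than weights are available. The function name `assignCreditByPosition` is hypothetical, and `impressions` is assumed to be sorted by age, ascending.

```javascript
// Sketch of the array-based `assignCredit` above.
function assignCreditByPosition(impressions, credit = [8.0, 6.0, 2.0]) {
  const lastN = impressions.slice(
    0,
    Math.min(credit.length, impressions.length)
  );
  // Only the weights that are actually used contribute to the total.
  const total = credit.slice(0, lastN.length).reduce((a, b) => a + b, 0);
  return Object.fromEntries(lastN.map((v, i) => [v.id, credit[i] / total]));
}

const three = assignCreditByPosition([{ id: "a" }, { id: "b" }, { id: "c" }]);
console.log(three); // a: 0.5, b: 0.375, c: 0.125 (weights divided by 16)

const two = assignCreditByPosition([{ id: "a" }, { id: "b" }]);
console.log(two); // a: 8/14, b: 6/14 (the unused weight 2.0 is excluded)
```

Because `total` is computed over the used prefix of `credit`, the result sums to 1 regardless of how many impressions are present.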


