In the latest f2f meeting in Paris, we had an action item to outline what a more flexible solution for ad-tech-defined matching / attribution functionality could look like. This ensures business logic sits with the ad-tech, and has the benefit of allowing innovation without requiring changes to our specification. On the other hand, full flexibility comes with downsides in terms of risk of side-channel attacks (#153), system health issues, etc., which come from executing arbitrary code from untrusted parties.
A natural approach here would be to spin up a JavaScript environment to do this, which comes with many benefits (standardized, already implemented in all browser engines, etc.), but has a few major drawbacks (it is Turing-complete, potentially too heavyweight, has a large attack surface, etc.). In this issue I want to pitch an intermediate option between a fully declarative approach and a fully Turing-complete approach: CEL, a non-Turing-complete, minimal expression language that was designed to operate on untrusted expressions.
Kudos to @apasel422 for the original suggestion and overall help exploring this option. cc @bmayd, @eriktaubeneck, and @ohpauleez, who were interested in this approach.
Straw API overview
We can allow callers to specify two CEL expressions, match and assignCredit, in measureConversion:
measureConversion({
  // Generalization of https://w3c.github.io/ppa/#logic-matching
  // Returns a subset of impressions.
  match: "<some CEL expr>",
  // Generalization of https://w3c.github.io/ppa/#s-logic 4.4.1.6
  // Assigns credit (in [0,1]) to the list of matching impressions, so the
  // final histogram value will be proportional to credit.
  // Returns a map of {<impression id>: <double>}.
  assignCredit: "<some CEL expr>",
  …
});

Both expressions are invoked with the following CEL environment variables:
- impressions: A list of records with the following fields:
  - id: string: An opaque implementation-defined identifier for this impression
  - age: duration: The duration since the impression was registered
  - userData: dyn: An arbitrary value provided by the impression registrar at the time it was registered, effectively one of the JSON types. In practice, this will likely be a map<string, dyn> so that multiple pieces of metadata can be associated with the impression.
For ergonomics, impressions will be pre-sorted by age, ascending.
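To make the shape of that environment concrete, here is a rough JavaScript sketch of how an implementation might assemble `impressions` before evaluating either expression. The stored-record shape and the names `registeredAtMs`, `ageMs`, and `buildImpressions` are illustrative assumptions, not part of the proposal; `ageMs` simply stands in for the CEL `duration` value.

```javascript
// Hypothetical helper: turns stored impressions into the records exposed to CEL.
// `registeredAtMs` and `ageMs` are assumed names used only for this sketch.
function buildImpressions(stored, nowMs) {
  return stored
    .map((s) => ({
      id: s.id,
      ageMs: nowMs - s.registeredAtMs, // stands in for `age: duration`
      userData: s.userData,
    }))
    .sort((a, b) => a.ageMs - b.ageMs); // ascending age: most recent first
}

const HOUR_MS = 60 * 60 * 1000;
const now = Date.UTC(2025, 0, 10);
const impressions = buildImpressions(
  [
    { id: "a", registeredAtMs: now - 72 * HOUR_MS, userData: { click: true } },
    { id: "b", registeredAtMs: now - 2 * HOUR_MS, userData: { click: false } },
    { id: "c", registeredAtMs: now - 30 * HOUR_MS, userData: { click: true } },
  ],
  now
);
console.log(impressions.map((i) => i.id)); // [ 'b', 'c', 'a' ]
```

Because the list is sorted by ascending age, `impressions[0]` is always the most recent impression, which is what the examples in this issue rely on.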
Examples
The CEL extensions used in these examples are:
- Bindings (not strictly required, but greatly reduces code size)
- Lists
- Math
- Two-Var Comprehensions
We also expose new built-in functions ppa.sum and ppa.pow for list summation and exponentiation, though they are good candidates to add to the math extension.
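As a rough illustration of what these two built-ins would compute, the following JavaScript sketch models them with standard library calls. The exact semantics (for example, that summing an empty list yields 0) are assumptions made for this sketch; the issue does not define edge cases.

```javascript
// Assumed semantics for the proposed built-ins, modeled in plain JavaScript.
const ppa = {
  // Sum of a list of numbers; an empty list is assumed to sum to 0.
  sum: (xs) => xs.reduce((acc, x) => acc + x, 0),
  // base raised to the power exp, on doubles.
  pow: (base, exp) => Math.pow(base, exp),
};

console.log(ppa.sum([8.0, 6.0, 2.0])); // 16
console.log(ppa.pow(2.0, -1.0)); // 0.5
```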
Last-touch with clicks and views
This implements #42 (comment). Essentially, we:
- Filter clicks and views using possibly different lookback windows (which could also vary dynamically based on conversion time)
- Give full credit to the last click if there are any clicks, otherwise give full credit to the last view
The following expressions assume that clicks and views are saved via something like saveImpression({userData: {click: <bool>}, ...});.
match:
cel.bind(click,
  impressions.filter(i, i.userData.click && i.age <= duration('120h')),
  size(click) > 0
    ? click
    : impressions.filter(i, !i.userData.click && i.age <= duration('24h'))
).map(i, i.id)
assignCredit:
{impressions[0].id: 1.0}
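For readers less familiar with CEL, here is a plain JavaScript model of the same last-touch logic. It is only a sketch: `ageMs` stands in for the CEL `duration`, and it assumes that `assignCredit` is evaluated over the impressions selected by `match` and is only called when at least one impression matched. Neither assumption is spelled out in the proposal.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Mirrors `match`: clicks within 120h if any exist, otherwise views within 24h.
function match(impressions) {
  const clicks = impressions.filter(
    (i) => i.userData.click && i.ageMs <= 120 * HOUR_MS
  );
  const chosen =
    clicks.length > 0
      ? clicks
      : impressions.filter((i) => !i.userData.click && i.ageMs <= 24 * HOUR_MS);
  return chosen.map((i) => i.id);
}

// Mirrors `assignCredit`: full credit to the first (most recent) impression.
// Assumes `matched` is non-empty and already sorted by ascending age.
function assignCredit(matched) {
  return { [matched[0].id]: 1.0 };
}

const impressions = [
  { id: "v1", ageMs: 1 * HOUR_MS, userData: { click: false } },
  { id: "c1", ageMs: 50 * HOUR_MS, userData: { click: true } },
  { id: "c2", ageMs: 200 * HOUR_MS, userData: { click: true } },
];
const ids = match(impressions);
const matched = impressions.filter((i) => ids.includes(i.id));
console.log(ids); // [ 'c1' ]
console.log(assignCredit(matched)); // { c1: 1 }
```

Here the more recent view `v1` loses to the click `c1` because any in-window click takes priority, and `c2` is excluded because it falls outside the 120-hour click window.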
MTA with exponential decay
Requirements:
- Perform arbitrary filtering (not pictured)
- Select the n-last events
- Score the events with an exponential decay model (e.g. w/ half-life 7 days [168 hours])
- Allocate credit proportional to the score
assignCredit:
cel.bind(lastN,
  impressions.slice(0, math.least(3, size(impressions))),
  cel.bind(cp,
    lastN.transformMapEntry(_, i,
      {i.id: ppa.pow(2.0, double(i.age.getHours()) / -168.0)}),
    cel.bind(total,
      ppa.sum(cp.transformList(_, v, v)),
      cp.transformMap(_, f, f / total))))
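The same computation can be sketched in plain JavaScript to check the arithmetic. This is a model under assumptions, not an implementation of the proposal: `ageHours` is a hypothetical field standing in for `i.age.getHours()` (assumed to be whole hours), and the `n` and `halfLifeHours` parameters are introduced here only to name the constants 3 and 168.

```javascript
// Plain-JS model of the exponential-decay assignCredit expression.
// `ageHours`, `n`, and `halfLifeHours` are names assumed for this sketch.
function assignCredit(impressions, n = 3, halfLifeHours = 168) {
  // Keep the n most recent impressions (input is sorted by ascending age).
  const lastN = impressions.slice(0, Math.min(n, impressions.length));
  // Score each impression as 2^(-age / halfLife).
  const scores = lastN.map((i) => [i.id, Math.pow(2, -i.ageHours / halfLifeHours)]);
  // Normalize so the credits sum to 1.
  const total = scores.reduce((acc, [, s]) => acc + s, 0);
  return Object.fromEntries(scores.map(([id, s]) => [id, s / total]));
}

const credit = assignCredit([
  { id: "a", ageHours: 0 },
  { id: "b", ageHours: 168 },
  { id: "c", ageHours: 336 },
  { id: "d", ageHours: 504 },
]);
console.log(credit);
```

With these inputs the raw scores are 1, 0.5, and 0.25, which sum to 1.75, so `a`, `b`, and `c` receive 4/7, 2/7, and 1/7 of the credit, and `d` is dropped because only the three most recent impressions are kept.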
MTA with array-based credit
Requirements:
- Perform arbitrary filtering (not pictured)
- Select the n-last events
- Assign the events credit based on their position in the input list
- Normalize the credit
assignCredit:
cel.bind(credit, [8.0, 6.0, 2.0],
  cel.bind(lastN,
    impressions.slice(0,
      math.least(size(credit), size(impressions))),
    cel.bind(total,
      ppa.sum(credit.slice(0, size(lastN))),
      lastN.transformMapEntry(i, v, {v.id: credit[i] / total}))))
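As a sanity check on the normalization, here is an equivalent plain JavaScript sketch. It models only the logic of the expression above; the function signature and the default `credit` parameter are conveniences assumed for this example.

```javascript
// Plain-JS model of the array-based assignCredit expression.
function assignCredit(impressions, credit = [8.0, 6.0, 2.0]) {
  // Keep at most as many impressions as there are credit weights.
  const lastN = impressions.slice(0, Math.min(credit.length, impressions.length));
  // Normalize over only the weights that are actually used.
  const total = credit.slice(0, lastN.length).reduce((acc, c) => acc + c, 0);
  return Object.fromEntries(lastN.map((v, i) => [v.id, credit[i] / total]));
}

console.log(assignCredit([{ id: "a" }, { id: "b" }, { id: "c" }, { id: "d" }]));
// { a: 0.5, b: 0.375, c: 0.125 }
```

When fewer impressions are available than weights, the total shrinks accordingly: with only `a` and `b`, the credits would be 8/14 and 6/14, so they still sum to 1.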