
d3.join? #52

@mbostock

Imagine you’re joining a TSV file to a GeoJSON feature collection. A typical way of doing that might be to create a Map and then use array.forEach:

var map = new Map(rates.map(d => [d.id, +d.rate]));
collection.features.forEach(f => f.properties.rate = map.get(f.id));
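For reference, here is the same Map-based approach as a self-contained script. The sample `rates` and `collection` values are made up for illustration. In practice `rates` would come from parsing the TSV, so `rate` is a string (hence the `+`), and `collection` would come from the GeoJSON file.

```javascript
// Hypothetical sample data standing in for a parsed TSV and a GeoJSON file.
const rates = [
  {id: "01", rate: "5.1"},
  {id: "02", rate: "6.3"}
];
const collection = {
  type: "FeatureCollection",
  features: [
    {type: "Feature", id: "01", properties: {}, geometry: null},
    {type: "Feature", id: "02", properties: {}, geometry: null},
    {type: "Feature", id: "03", properties: {}, geometry: null}
  ]
};

// Index the right-hand rows by key, coercing the TSV string to a number.
const map = new Map(rates.map(d => [d.id, +d.rate]));

// Look up each feature's rate; features with no matching row get undefined.
collection.features.forEach(f => f.properties.rate = map.get(f.id));

console.log(collection.features.map(f => f.properties.rate)); // → [5.1, 6.3, undefined]
```

Note that this behaves like a left join: every feature is visited, and a feature with no matching row gets `undefined` rather than being skipped.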

It’d be neat if there were a simple way to join two arrays of objects and invoke a function for each joined row.

Option 1:

d3.join(collection.features, rates, (a, b) => a.properties.rate = +b.rate);

This doesn’t really work because it assumes that d => d.id is always the key function, whereas in practice you’d want to be able to specify key functions for both the left and the right arrays. I suppose you could require calling array.map on your arrays before passing them to d3.join, but then it’s hardly more useful than just using a Map as above.
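A minimal sketch of Option 1 makes the limitation concrete. The `join` function below is hypothetical, not an existing D3 API: it hard-codes `d => d.id` on both sides and, as an assumption, calls the reducer once per matching pair (an inner join). When the right-hand rows use a different key field (here a made-up `fips` column), they have to be remapped with array.map first.

```javascript
// Hypothetical Option 1: both sides are always keyed by d.id.
function join(left, right, reducer) {
  // Group right-hand rows by id so that duplicate keys are not lost.
  const index = new Map();
  for (const b of right) {
    if (index.has(b.id)) index.get(b.id).push(b);
    else index.set(b.id, [b]);
  }
  // Invoke the reducer for every left/right pair with equal ids.
  for (const a of left) {
    for (const b of index.get(a.id) || []) reducer(a, b);
  }
}

// Hypothetical TSV rows keyed by a "fips" column instead of "id".
const rates = [{fips: "01", rate: "5.1"}, {fips: "02", rate: "6.3"}];
const features = [{id: "01", properties: {}}, {id: "02", properties: {}}];

// The workaround: remap the rows so that they expose an id property.
join(features, rates.map(d => ({id: d.fips, rate: d.rate})), (a, b) => a.properties.rate = +b.rate);
```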

I think we should avoid passing too many unnamed arguments to a single function, especially when some of them are optional, so the following Option 2 probably isn’t a good idea:

d3.join(collection.features, a => a.id, rates, b => b.id, (a, b) => a.properties.rate = +b.rate);
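For comparison, a sketch of the Option 2 signature (again hypothetical, assuming the same inner-join behavior). The implementation is an ordinary hash join; the cost is entirely on the caller, who has to remember the order of five positional arguments.

```javascript
// Hypothetical Option 2: join(left, leftKey, right, rightKey, reducer).
function join(left, leftKey, right, rightKey, reducer) {
  // Index the right-hand rows by their key.
  const index = new Map();
  for (const b of right) {
    const k = rightKey(b);
    if (index.has(k)) index.get(k).push(b);
    else index.set(k, [b]);
  }
  // Probe the index with each left-hand row's key.
  for (const a of left) {
    for (const b of index.get(leftKey(a)) || []) reducer(a, b);
  }
}

const rates = [{id: "01", rate: "5.1"}];
const collection = {features: [{id: "01", properties: {}}]};

join(collection.features, a => a.id, rates, b => b.id, (a, b) => a.properties.rate = +b.rate);
```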

A more verbose Option 3, a bit like d3.nest:

d3.join()
    .leftKey(a => a.id)
    .rightKey(b => b.id)
    .reduce((a, b) => a.properties.rate = +b.rate)
    (collection.features, rates);
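Option 3 can be sketched with the getter/setter convention used throughout D3 (for example d3.line’s line.x): each method sets a property and returns the operator when called with an argument, and returns the current value when called without one. Everything here (`join`, `leftKey`, `rightKey`, `reduce`) is hypothetical, and the defaults (`d => d.id` keys, and an `(a, b) => [a, b]` reducer whose results are collected into an array) are assumptions.

```javascript
// Hypothetical Option 3: a mutable, configurable join operator.
function join() {
  let leftKey = d => d.id;
  let rightKey = d => d.id;
  let reducer = (a, b) => [a, b];

  // Calling the operator performs an inner hash join of left and right.
  function operator(left, right) {
    const index = new Map();
    for (const b of right) {
      const k = rightKey(b);
      if (index.has(k)) index.get(k).push(b);
      else index.set(k, [b]);
    }
    const results = [];
    for (const a of left) {
      for (const b of index.get(leftKey(a)) || []) results.push(reducer(a, b));
    }
    return results;
  }

  // D3-style accessors: set and return the operator, or get the current value.
  operator.leftKey = function(_) {
    return arguments.length ? (leftKey = _, operator) : leftKey;
  };
  operator.rightKey = function(_) {
    return arguments.length ? (rightKey = _, operator) : rightKey;
  };
  operator.reduce = function(_) {
    return arguments.length ? (reducer = _, operator) : reducer;
  };

  return operator;
}

const rates = [{id: "01", rate: "5.1"}];
const collection = {features: [{id: "01", properties: {}}]};

// The features go on the left because the reducer writes to a.properties.
join()
    .leftKey(a => a.id)
    .rightKey(b => b.id)
    .reduce((a, b) => a.properties.rate = +b.rate)
    (collection.features, rates);
```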

An enhancement of Option 3 with a convenience method for setting the left and right keys to the same function:

d3.join()
    .key(d => d.id)
    .reduce((a, b) => a.properties.rate = +b.rate)
    (collection.features, rates);

But what would join.key return when called with no arguments, given that the left and right keys can still be set separately?

A further or alternative enhancement of Option 3 is to pass a single key, used for both the left and right arrays, to the constructor:

d3.join(d => d.id)
    .reduce((a, b) => a.properties.rate = +b.rate)
    (collection.features, rates);

A slightly icky problem here is the default case. Unlike d3.nest, there’s a reasonable default join, but using it requires an extra pair of parentheses:

d3.join()(collection.features, rates);
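Passing the key to the constructor is a small change to the same kind of factory. In this hypothetical sketch, `join(key)` uses `key` for both sides, omitting it falls back to an assumed default of `d => d.id`, and only the reducer remains configurable. The last lines show the default case with its extra pair of parentheses.

```javascript
// Hypothetical: join(key) configures both sides with one key function.
function join(key = d => d.id) {
  let reducer = (a, b) => [a, b];

  // Inner hash join using the same key on both sides.
  function operator(left, right) {
    const index = new Map();
    for (const b of right) {
      const k = key(b);
      if (index.has(k)) index.get(k).push(b);
      else index.set(k, [b]);
    }
    const results = [];
    for (const a of left) {
      for (const b of index.get(key(a)) || []) results.push(reducer(a, b));
    }
    return results;
  }

  operator.reduce = function(_) {
    return arguments.length ? (reducer = _, operator) : reducer;
  };

  return operator;
}

const rates = [{id: "01", rate: "5.1"}];
const features = [{id: "01", properties: {}}];

// Custom key and reducer.
join(d => d.id).reduce((a, b) => a.properties.rate = +b.rate)(features, rates);

// Default case: join() must be called before the join itself can be applied.
const pairs = join()(features, rates); // one [feature, row] pair per match
```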

Option 4 uses immutable closures, like d3-interpolate’s interpolate.gamma. These are nice because you don’t need the extra parentheses in the default case:

d3.join(collection.features, rates);

With a custom reducer:

d3.join.reduce((a, b) => a.properties.rate = +b.rate)(collection.features, rates);

With a custom key and reducer (everything is named!):

d3.join
    .key(d => d.id)
    .reduce((a, b) => a.properties.rate = +b.rate)
    (collection.features, rates);

With this approach, join.key can easily take two functions if you want separate keys for the left and right arrays. (You could have separate join.leftKey and join.rightKey methods, but I don’t think they’re necessary.) And because you can’t call join.key as an accessor the way you can in Option 3, there’s no question of what it should return: it always constructs a new join operator.
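The immutable-closure pattern in Option 4 can be sketched as follows. Each of `join.key` and `join.reduce` returns a brand-new join function and never modifies the one it was called on, so there are no getters and no shared state. As described above, `key` accepts either one function for both sides or two functions for left and right. The names, the `d => d.id` default key, and returning an array of reducer results with an `(a, b) => [a, b]` default are all assumptions for illustration, not an existing D3 API.

```javascript
// Hypothetical Option 4: each configuration step returns a new, immutable join.
function joiner(leftKey, rightKey, reducer) {
  function join(left, right) {
    // Inner hash join: index the right side, then probe it with each left row.
    const index = new Map();
    for (const b of right) {
      const k = rightKey(b);
      if (index.has(k)) index.get(k).push(b);
      else index.set(k, [b]);
    }
    const results = [];
    for (const a of left) {
      for (const b of index.get(leftKey(a)) || []) results.push(reducer(a, b));
    }
    return results;
  }

  // join.key(key) or join.key(leftKey, rightKey): same reducer, new keys.
  join.key = (newLeftKey, newRightKey = newLeftKey) => joiner(newLeftKey, newRightKey, reducer);

  // join.reduce(reducer): same keys, new reducer.
  join.reduce = newReducer => joiner(leftKey, rightKey, newReducer);

  return join;
}

const join = joiner(d => d.id, d => d.id, (a, b) => [a, b]);

const rates = [{id: "01", rate: "5.1"}];
const collection = {features: [{id: "01", properties: {}}]};

// Default case: no extra parentheses needed.
const pairs = join(collection.features, rates);

// Custom reducer; join itself is left unchanged.
const setRate = join.reduce((a, b) => a.properties.rate = +b.rate);
setRate(collection.features, rates);

// Separate left and right keys, e.g. for rows keyed by a hypothetical "fips" column.
const rows = [{fips: "01", rate: "7"}];
const results = setRate.key(a => a.id, b => b.fips)(collection.features, rows);
```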

Also there’s the question of what join(A, B) should return. Nothing? Maybe an array of results returned by the reducer, similar to d3.cross? With the same default reducer of (a, b) => [a, b]?
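One way to pin down what join(A, B) should return is to compare it with d3.cross: d3.cross(A, B, reducer) calls the reducer for every pair in the Cartesian product and returns the results, and a join can be viewed as the subset of those pairs whose keys are equal. The naive O(n × m) function below is a hypothetical reference for those semantics, not a proposed implementation. It assumes a `d => d.id` key on both sides and the same `(a, b) => [a, b]` default reducer as d3.cross.

```javascript
// Hypothetical reference join: a cross product filtered by key equality.
// Results come out in the same order d3.cross would produce them.
function naiveJoin(left, right, reducer = (a, b) => [a, b], key = d => d.id) {
  const results = [];
  for (const a of left) {
    for (const b of right) {
      if (key(a) === key(b)) results.push(reducer(a, b));
    }
  }
  return results;
}

const rates = [{id: "01", rate: "5.1"}, {id: "02", rate: "6.3"}];
const features = [{id: "02", properties: {}}, {id: "03", properties: {}}];

// Only the "02" pair matches, so one result is returned.
const pairs = naiveJoin(features, rates);

// With a reducer that assigns, the returned array holds the assigned values.
const values = naiveJoin(features, rates, (a, b) => a.properties.rate = +b.rate);
```

Unlike the Map-based approach at the top, the unmatched feature "03" is skipped entirely here, which is one more behavior a d3.join would need to settle.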
