Implement sorted merge join #197

If both data sets are stored sorted on the join key, then it's possible to perform the join on the map side. The general idea is to:
- Build up an index of keys to file location/offset of one of the data sets.
- Use the other data set as normal input to a map job.
- For each key, look up the corresponding file/offset from the index.
- Directly read the file, seeking to the offset.
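The steps above can be sketched in a few lines. This is only a single-process illustration in Python, not Scoobi, Pig, or Hive code: the tab-separated `key<TAB>value` line format and the names `build_offset_index`, `map_side_join`, and `demo` are assumptions made for the example, and a real implementation would run the lookup inside each mapper and would probably index only a sample of keys rather than every key.

```python
import os
import tempfile


def build_offset_index(path):
    """Map each join key to the byte offset of its first line in a key-sorted file.

    Assumes (hypothetically) one record per line in the form b"key\\tvalue\\n".
    """
    index = {}
    with open(path, "rb") as f:
        while True:
            offset = f.tell()          # offset of the line about to be read
            line = f.readline()
            if not line:
                break
            key = line.split(b"\t", 1)[0]
            index.setdefault(key, offset)  # keep only the first offset per key
    return index


def map_side_join(left_records, right_path, index):
    """Inner-join (key, value) pairs against the indexed file by seeking to each key."""
    joined = []
    with open(right_path, "rb") as f:
        for key, left_value in left_records:
            key_bytes = key.encode()
            offset = index.get(key_bytes)
            if offset is None:
                continue               # key absent from the indexed data set
            f.seek(offset)
            # The file is sorted on the key, so all matches are contiguous.
            while True:
                line = f.readline()
                if not line:
                    break
                right_key, right_value = line.rstrip(b"\n").split(b"\t", 1)
                if right_key != key_bytes:
                    break
                joined.append((key, left_value, right_value.decode()))
    return joined


def demo():
    """Join a small in-memory input against a temporary sorted file."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"a\t1\nb\t2\nb\t3\nd\t4\n")
    try:
        index = build_offset_index(path)
        left = [("a", "x"), ("b", "y"), ("c", "z")]
        return map_side_join(left, path, index)
    finally:
        os.remove(path)
```

Calling `demo()` returns `[("a", "x", "1"), ("b", "y", "2"), ("b", "y", "3")]`. Because both inputs are sorted on the key, the seeks only ever move forward through the indexed file, which is what makes the approach cheap enough to run on the map side.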

Both Pig and Hive already have implementations, and this would be a nice addition to Scoobi.

Pig's implementation - http://wiki.apache.org/pig/PigMergeJoin
Hive's implementation - https://issues.apache.org/jira/browse/HIVE-1194

