
functionality chaining #15

@derrickoswald

Description


There is currently a long list of 'post-processing' operations (about, normalize, deduplicate, join, topology, edges) performed by the CIMReader after reading in a set of CIM files.

These operations are currently hard-coded and driven by options, so that the 'using ch.ninecode.cim' clause for SQL import of CIM files can be used from Python and R (i.e. via the non-compiled API). It would be better if these operations were broken out into separate modules/packages and a generic mechanism to chain them were implemented.
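For reference, the current option-driven flow looks roughly like this from the compiled (Scala) API; the option keys are the ones mentioned in this issue, but the values shown are only illustrative:

```scala
import org.apache.spark.sql.SparkSession

// minimal sketch of today's option-driven reader invocation;
// option values and the input path are illustrative, not authoritative
val spark = SparkSession.builder.appName ("CIMReaderExample").getOrCreate ()
val elements = spark.read
    .format ("ch.ninecode.cim")
    .option ("ch.ninecode.cim.do_topo_islands", "true")
    .option ("ch.ninecode.cim.force_retain_fuses", "ForceTrue")
    .load ("hdfs://server/data/export.rdf")
```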

This has implications such as:

  • how to provision these extra modules (currently CIMReader via Maven Central has all the functionality baked in) so that --jars or --packages on the spark-shell or spark-submit command line can include the necessary code
  • how to provide parameters to the post-processing code (e.g. ch.ninecode.cim.do_topo_islands and ch.ninecode.cim.force_retain_fuses for the CIMNetworkTopologyProcessor)
  • how to specify whether these modules operate on the raw Elements RDD or on the SparkSQL/named RDDs after subsetting
  • how to inform the CIMRelation code which post-processing tasks need to be performed and in what order
  • how to allow for user-generated post-processors that are not part of the CIMSpark codebase
  • what the interface specification between the CIMReader and the post-processing modules should be (see the sketch after this list)
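
One way the interface question could be answered is a small trait that every post-processor implements, with the reader folding over an ordered list of them. This is only a sketch under assumed names (CIMPostProcessor, optionKeys, process, chain), not a proposed final design:

```scala
import org.apache.spark.sql.SparkSession

// hypothetical chaining contract (names are placeholders, not part of CIMSpark)
trait CIMPostProcessor
{
    // option keys this processor understands,
    // e.g. "ch.ninecode.cim.do_topo_islands"
    def optionKeys: Seq[String]

    // perform the post-processing step; implementations would look up their
    // named RDDs (raw Elements or the subset classes) via the SparkSession
    def process (session: SparkSession, options: Map[String, String]): Unit
}

// chaining then reduces to applying the processors in their declared order
def chain (session: SparkSession, options: Map[String, String], processors: Seq[CIMPostProcessor]): Unit =
    processors.foreach (_.process (session, options))
```

Registering implementations by class name (for example through a reader option) would be one way to admit user-generated post-processors that live outside the CIMSpark codebase and are provisioned via --jars or --packages.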
