
functionality chaining #15

@derrickoswald

Description


There is currently a long list of 'post-processing' operations (about, normalize, deduplicate, join, topology, edges) performed by the CIMReader after reading in a set of CIM files.

These operations are currently hard-coded and driven by options, so that the 'using ch.ninecode.cim' clause for SQL import of CIM files can be used from Python and R (i.e. via the non-compiled API). It would be better if these operations were broken out into separate modules/packages and a generic mechanism to chain them were implemented.
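For reference, the current option-driven flow looks roughly like this from the compiled (Scala) API; the option keys are the ones mentioned in this issue, but the values shown are only illustrative:

```scala
import org.apache.spark.sql.SparkSession

// minimal sketch of today's option-driven reader invocation;
// option values and the input path are illustrative, not authoritative
val spark = SparkSession.builder.appName ("CIMReaderExample").getOrCreate ()
val elements = spark.read
    .format ("ch.ninecode.cim")
    .option ("ch.ninecode.cim.do_topo_islands", "true")
    .option ("ch.ninecode.cim.force_retain_fuses", "ForceTrue")
    .load ("hdfs://server/data/export.rdf")
```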

This has implications such as:

  • how to provision these extra modules (currently CIMReader via Maven Central has all the functionality baked in) so that --jars or --packages on the spark-shell or spark-submit command line can include the necessary code
  • how to provide parameters to the post-processing code (e.g. ch.ninecode.cim.do_topo_islands and ch.ninecode.cim.force_retain_fuses for the CIMNetworkTopologyProcessor)
  • how to specify whether these modules operate on the raw Elements RDD or on the SparkSQL/named RDDs after subsetting
  • how to inform the CIMRelation code which post-processing tasks need to be performed and in what order
  • how to allow for user-generated post-processors that are not part of the CIMSpark codebase
  • what the interface specification between the CIMReader and the post-processing modules should be (see the sketch after this list)
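
One way the interface question could be answered is a small trait that every post-processor implements, with the reader folding over an ordered list of them. This is only a sketch under assumed names (CIMPostProcessor, optionKeys, process, chain), not a proposed final design:

```scala
import org.apache.spark.sql.SparkSession

// hypothetical chaining contract (names are placeholders, not part of CIMSpark)
trait CIMPostProcessor
{
    // option keys this processor understands,
    // e.g. "ch.ninecode.cim.do_topo_islands"
    def optionKeys: Seq[String]

    // perform the post-processing step; implementations would look up their
    // named RDDs (raw Elements or the subset classes) via the SparkSession
    def process (session: SparkSession, options: Map[String, String]): Unit
}

// chaining then reduces to applying the processors in their declared order
def chain (session: SparkSession, options: Map[String, String], processors: Seq[CIMPostProcessor]): Unit =
    processors.foreach (_.process (session, options))
```

Registering implementations by class name (for example through a reader option) would be one way to admit user-generated post-processors that live outside the CIMSpark codebase and are provisioned via --jars or --packages.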
