There is currently a long list of 'post-processing' operations (about, normalize, deduplicate, join, topology, edges) performed by the CIMReader after reading in a set of CIM files.
These are currently hard-coded via options to support the `using ch.ninecode.cim` argument for SQL import of CIM files in Python and R (i.e. using the non-compiled API). It would be better if these operations were broken out into separate modules/packages and a generic mechanism to chain the operations were implemented.
This has implications such as:
- how to provision these extra modules (currently CIMReader via Maven Central has all the functionality baked in) so that --jars or --packages on the spark-shell or spark-submit command line can include the necessary code
- how to provide parameters to the post-processing code (e.g. ch.ninecode.cim.do_topo_islands and ch.ninecode.cim.force_retain_fuses for the CIMNetworkTopologyProcessor)
- how to specify that these modules operate either on the raw Elements RDD or the SparkSQL/named RDD after subsetting
- how to inform the CIMRelation code of which post-processing tasks need to be performed, and in what order
- how to allow for user-generated post-processors that are not part of the CIMSpark codebase
- what the interface specifications between the CIMReader and the post-processing modules should be
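
To make the questions above more concrete, here is a minimal, language-agnostic sketch of one possible chaining mechanism. It is written in plain Python with lists standing in for RDDs, so it is not CIMSpark code and not a proposal for the final interface. Everything except the option name `ch.ninecode.cim.do_topo_islands` is an assumption made for illustration: the `post_processors` option key, the `PostProcessor` and `Registry` types, the two stage names, and the toy `deduplicate` and `topology` functions (which do not reproduce what the real operations do).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Options = Dict[str, str]   # reader options, all string-valued
Elements = List[dict]      # stand-in for an RDD of CIM elements

# Hypothetical stage names: where in the read a processor runs.
RAW_ELEMENTS = "raw"       # on the raw Elements RDD
NAMED_RDDS = "named"       # on the named RDDs after subsetting

# Hypothetical option key holding the ordered, comma-separated chain.
CHAIN_KEY = "post_processors"


@dataclass(frozen=True)
class PostProcessor:
    """Hypothetical interface: a name, a stage, and a function of (data, options)."""
    name: str
    stage: str
    run: Callable[[Elements, Options], Elements]


class Registry:
    """Maps names to processors so user-written ones can be added at runtime."""

    def __init__(self) -> None:
        self._known: Dict[str, PostProcessor] = {}

    def register(self, processor: PostProcessor) -> None:
        self._known[processor.name] = processor

    def plan(self, options: Options) -> List[PostProcessor]:
        # The order of names in the option value is the execution order.
        names = [n.strip() for n in options.get(CHAIN_KEY, "").split(",") if n.strip()]
        unknown = [n for n in names if n not in self._known]
        if unknown:
            raise KeyError(f"unregistered post-processors: {unknown}")
        return [self._known[n] for n in names]

    def apply(self, stage: str, data: Elements, options: Options) -> Elements:
        # Run only the processors declared for this stage, in planned order.
        for processor in self.plan(options):
            if processor.stage == stage:
                data = processor.run(data, options)
        return data


def deduplicate(elements: Elements, options: Options) -> Elements:
    """Toy processor: keep the first element seen for each id."""
    seen = set()
    unique = []
    for element in elements:
        if element["id"] not in seen:
            seen.add(element["id"])
            unique.append(element)
    return unique


def topology(elements: Elements, options: Options) -> Elements:
    """Toy processor: reads its own parameter from the shared options."""
    islands = options.get("ch.ninecode.cim.do_topo_islands", "false") == "true"
    return [dict(element, islands=islands) for element in elements]


registry = Registry()
registry.register(PostProcessor("deduplicate", RAW_ELEMENTS, deduplicate))
registry.register(PostProcessor("topology", RAW_ELEMENTS, topology))

options = {
    CHAIN_KEY: "deduplicate,topology",
    "ch.ninecode.cim.do_topo_islands": "true",
}
elements = [{"id": "a"}, {"id": "a"}, {"id": "b"}]
result = registry.apply(RAW_ELEMENTS, elements, options)
```

In this sketch each processor receives the full options map and picks out the keys it understands, which keeps the reader unaware of per-module parameters. Ordering comes only from the option value, so the reader needs no built-in knowledge of which operations exist. Provisioning (for example via `--jars` or `--packages`) is outside what the sketch covers.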