Binary incompatibility after deleting LineageEvent classes

The change in https://github.com/MarquezProject/marquez/pull/1593 made the `marquez-api` jar binary-incompatible with code that depended on the `LineageEvent` class and its related classes. Any such code must now be rewritten against the `OpenLineage.*` models, which are constructed very differently, so the rewrite is a major effort.

Moreover, the current OpenLineage API has introduced new fields in the `InputDataset` and `OutputDataset` models that were never present in the Marquez implementation of the OpenLineage models. The `LineageEvent` model is annotated with `@JsonIgnoreProperties`, so any new fields in the JSON are silently dropped during deserialization. Simply reverting the `LineageEvent` models would therefore leave the Marquez backend incompatible with the new OpenLineage models, because new facets would be dropped from the model before it is stored.

I think we should revert #1593 and alter the models to support unknown fields. Some options for this are:
1. Add a `Map<String, Object>` field annotated with `@JsonAnySetter` so that any unknown fields are added to the map, rather than dropped.
    * This is little work up front and offers backward and forward compatibility, as any unknown fields are automatically supported. There is some maintainability concern, as we need to update the Marquez model alongside the OL one. 
2. Extend or wrap (using `@JsonUnwrapped`) Jackson's `ObjectNode` so that objects are automatically deserialized into `JsonNode`s, and write setters/getters for the expected properties so that the API stays compatible.
    * This is the most up-front work, but offers the most compatibility and the least maintenance. Each model is backward and forward compatible with any event POSTed and will always be serialized back into an exact replica of the original event. Accessor methods must be hand-written to replace the Lombok-generated ones in order to maintain API compatibility.
3. Wrap the new `OpenLineage` model classes with the existing Marquez models.
    * This provides the binary compatibility we need, while avoiding the maintenance issue of synchronizing the Marquez models with the OpenLineage ones. The payload would always be deserialized into `OpenLineage` models (so we can receive and store the data even if the Marquez model is never updated). However, we still need to maintain the compatibility layer (the accessor methods) and we are still limited to the fields defined in the version of the OL library deployed with Marquez. Moreover, the OL API for constructing events is a bit cumbersome to use in a case like this. Each model class must be instantiated by an instance of the `OpenLineage` class, which is instantiated with the appropriate `producer` field. Thus, we can't simply instantiate a new `Job` or `JobFacet` and expect the accompanying `OpenLineage.Job` or `OpenLineage.JobFacets` class to be instantiated, as there needs to be a shared `OpenLineage` instance to actually create the instances. This is easy enough to accomplish for model instances that are created purely from Marquez (e.g., a static utility instance), but makes it very difficult to build a processing workflow, such as one that clones a model and adds a new facet (and maintains the original models' `producer` fields) before handing off to another processor.
4. Write a custom deserializer that automatically attaches the raw JSON string to the `LineageEvent` object.
    * This is the least work and solves the most immediate problem: the data serialized and stored in the `lineage_events` table is incomplete. However, it makes it impossible to process objects that have unknown fields. For example, a workflow that copies a `LineageEvent` and adds another facet to the `Run` before passing it on to storage or another processor would immediately lose information. It also does nothing for maintainability, as the Marquez models must still be updated to stay in sync with the OL models.

Of the four options, the first offers the best combination of compatibility and flexibility: it maintains forward and backward compatibility at a relatively low maintenance cost.
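
To make option 1 concrete, here is a minimal, dependency-free sketch of the catch-all idea. It is not the Marquez code: `DatasetSketch`, `fromMap`, and `toMap` are hypothetical names, and the hand-written `fromMap`/`toMap` only simulate what Jackson would do during (de)serialization. In a real model, the catch-all setter would carry `@JsonAnySetter` and the map getter `@JsonAnyGetter`. The `inputFacets` key used in the usage note simply stands for any property the model does not declare.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a Marquez model class; not the real implementation.
final class DatasetSketch {
  private String namespace;
  private String name;

  // Catch-all for properties this version of the model does not declare.
  private final Map<String, Object> additionalProperties = new LinkedHashMap<>();

  // In a Jackson-based model this method would be annotated with @JsonAnySetter.
  void setAdditionalProperty(String key, Object value) {
    additionalProperties.put(key, value);
  }

  // In a Jackson-based model this getter would be annotated with @JsonAnyGetter.
  Map<String, Object> getAdditionalProperties() {
    return additionalProperties;
  }

  String getNamespace() {
    return namespace;
  }

  String getName() {
    return name;
  }

  // Simulated deserialization: known keys populate fields,
  // everything else is kept in the catch-all map instead of being dropped.
  static DatasetSketch fromMap(Map<String, Object> json) {
    DatasetSketch dataset = new DatasetSketch();
    for (Map.Entry<String, Object> entry : json.entrySet()) {
      switch (entry.getKey()) {
        case "namespace":
          dataset.namespace = (String) entry.getValue();
          break;
        case "name":
          dataset.name = (String) entry.getValue();
          break;
        default:
          dataset.setAdditionalProperty(entry.getKey(), entry.getValue());
      }
    }
    return dataset;
  }

  // Simulated serialization: known fields first, then the unknown
  // properties written back at the same level they arrived on.
  Map<String, Object> toMap() {
    Map<String, Object> out = new LinkedHashMap<>();
    out.put("namespace", namespace);
    out.put("name", name);
    out.putAll(additionalProperties);
    return out;
  }
}
```

Feeding this sketch a map with `namespace`, `name`, and an undeclared `inputFacets` entry leaves `inputFacets` in `getAdditionalProperties()`, and `toMap()` returns a map equal to the input. That is the property option 1 is after: a processor can copy or extend the model without silently losing fields it does not understand.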

Binary incompatibility after deleting LineageEvent classes #1650
