[server][common][controller][vpj] Materialized view projection and filter support
Add projection and filtering support for materialized views (MV) so that views do not carry data their consumers do not
need. Projection is enabled by setting projection fields in the materialized view parameters. Similarly, filtering is
enabled by setting filter-by fields. The two features can be enabled separately or together. If both are enabled, the
filter-by fields are automatically included in the projection fields. Here is an example MV configuration to
illustrate the idea:
Record containing fields: {a, b, c, d, e}
Projecting fields: {b, c}
Filtering fields: {a}
The only filtering option for now is to skip the write if none of the filter-by fields changed. Filtering is applied
only during hybrid ingestion, since a change filter does not make sense for a batch push. With the above setup, all
batch data is projected and written to the MV as {a, b, c}. RT updates (full PUT or UPDATE) project and write the
resulting record to the MV as {a, b, c} only if the value of field a differs from the old value. All DELETE events
are written to the MV (no filtering).
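The semantics above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code in this PR: records are modeled as plain maps rather than Avro records, the class and method names (`ViewProjectionSketch`, `onBatchPut`, `onRealTimeWrite`) are hypothetical, and treating a missing old value as "changed" is an assumption the description does not spell out.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;

/** Hypothetical illustration of MV projection and change filtering; not the Venice implementation. */
final class ViewProjectionSketch {
  private final Set<String> effectiveProjection; // projection fields plus filter-by fields
  private final Set<String> filterByFields;

  ViewProjectionSketch(Set<String> projectionFields, Set<String> filterByFields) {
    this.filterByFields = new LinkedHashSet<>(filterByFields);
    // Filter-by fields are automatically included in the projection.
    this.effectiveProjection = new LinkedHashSet<>(filterByFields);
    this.effectiveProjection.addAll(projectionFields);
  }

  /** Keeps only the projected fields of a record. */
  Map<String, Object> project(Map<String, Object> record) {
    Map<String, Object> projected = new LinkedHashMap<>();
    for (String field : effectiveProjection) {
      projected.put(field, record.get(field));
    }
    return projected;
  }

  /** Batch push: no change filter, every record is projected and written. */
  Map<String, Object> onBatchPut(Map<String, Object> record) {
    return project(record);
  }

  /**
   * RT PUT or UPDATE: write the projected result only if a filter-by field changed.
   * Assumption: a key with no old value counts as changed.
   */
  Optional<Map<String, Object>> onRealTimeWrite(Map<String, Object> oldRecord, Map<String, Object> newRecord) {
    if (!filterByFields.isEmpty() && oldRecord != null) {
      boolean changed = false;
      for (String field : filterByFields) {
        if (!Objects.equals(oldRecord.get(field), newRecord.get(field))) {
          changed = true;
          break;
        }
      }
      if (!changed) {
        return Optional.empty(); // skip: none of the filter-by fields changed
      }
    }
    return Optional.of(project(newRecord));
  }
}
```

With projection fields {b, c} and filter-by fields {a}, an RT update that only changes b is skipped, while one that changes a is written as {a, b, c}. DELETE events are not modeled here because they bypass the filter entirely.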
In order to achieve the above behavior there are several changes:
1. Previously, chunks were handled during NR pass-through in remote regions by forwarding them based on pub-sub
message headers. This strategy does not work with projection: to project batch data in remote regions, the remote
partition leaders must assemble the chunks during NR pass-through. We are replacing the forwarding strategy with
InMemoryChunkAssembler. To ensure leaders don't resume in between chunks, we also buffer the chunks and delay writing
them to the drainer until we have a fully assembled record and have produced it to the view topic(s).
2. Added enforcement in the controller to ensure view configs are immutable. The projection schema is generated when
adding a new materialized view and is stored with the view config. Since there can only be one schema version per
view, the znode size should be manageable with compression. If this becomes a concern, we can store the schema
separately or generate it on the fly. We also verify that the filter-by fields and projection fields exist in the
latest superset or value schema and have default values.
3. Projection is performed in ComplexVeniceWriter as part of complexPut so both VPJ and leaders can use the same code
for projection. Filtering is performed in MaterializedViewWriter since the current change filter is applicable only to
hybrid writes.
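The buffering described in change 1 can be sketched as follows. This is a minimal, hypothetical illustration and does not reflect the actual InMemoryChunkAssembler API: the class name `BufferedChunkAssemblySketch`, the `isLast` flag (standing in for whatever signals the end of a chunked record), and the two `Consumer` callbacks for the view topic and the drainer are all assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch: hold chunks back until the full record is assembled and produced to the view. */
final class BufferedChunkAssemblySketch {
  private final Map<String, List<byte[]>> pendingChunksByKey = new HashMap<>();
  private final Consumer<byte[]> viewProducer; // receives the fully assembled record
  private final Consumer<byte[]> drainer;      // receives the original chunks, but only afterwards

  BufferedChunkAssemblySketch(Consumer<byte[]> viewProducer, Consumer<byte[]> drainer) {
    this.viewProducer = viewProducer;
    this.drainer = drainer;
  }

  void onChunk(String key, byte[] chunk, boolean isLast) {
    List<byte[]> chunks = pendingChunksByKey.computeIfAbsent(key, k -> new ArrayList<>());
    chunks.add(chunk);
    if (!isLast) {
      return; // nothing reaches the drainer while the record is incomplete
    }
    pendingChunksByKey.remove(key);

    // Assemble the full record in memory.
    ByteArrayOutputStream assembled = new ByteArrayOutputStream();
    for (byte[] c : chunks) {
      assembled.write(c, 0, c.length);
    }

    // Produce to the view first, then release the buffered chunks to the drainer.
    viewProducer.accept(assembled.toByteArray());
    for (byte[] c : chunks) {
      drainer.accept(c);
    }
  }
}
```

Because no chunk reaches the drainer before the assembled record has been produced to the view, a leader restarting from persisted progress would not land in the middle of a chunked record in this model.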
clients/da-vinci-client/src/main/java/com/linkedin/davinci/kafka/consumer/ActiveActiveStoreIngestionTask.java