Skip to content

[cc][dvc][test] Add support for SpecificRecord deserialization in DVRT, and add isCaughtUp API for DVRT CDC #1790

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
May 20, 2025

Conversation

kvargha
Copy link
Contributor

@kvargha kvargha commented May 13, 2025

Problem Statement

  1. Currently, DVRT only supports deserializing keys and values into Avro GenericRecords. This is a problem as for complex schemas this can lead to poor deserialization performance. Additionally, regular CDC client supports deserializing values to SpecificRecords, but DVRT CDC doesn't.
  2. In DVRT CDC, there is no way for the user to tell whether the client has caught up.
  3. During the start phase of DVRT CDC, subscription to the DVC can die silently.
  4. We allow users to call start multiple times in DVRT CDC.
  5. The javadoc for BootstrappingVeniceChangelogConsumer hasn't been updated since the introduction of DVRT CDC.
  6. DaVinciClientRecordTransformerTest::testRecordTransformer has been failing consistently in the CI.

Solution

  1. Add support for SpecificRecord deserialization for keys and values in DVRT to improve performance, and so DVRT CDC can benefit from it. Additionally, I've moved deserialization and serialization to FastAvro as it can perform 90% better when deserializing complex schemas. Please note that the regular CDC client doesn't support SpecificRecord for keys, but we are adding it for DVRT CDC since a user is requesting it.
  2. To provide context to the user on whether they're caught up in DVRT CDC, the isCaughtUp API is added to the BootstrappingVeniceChangelogConsumer interface. Since the original BootstrappingVeniceChangelogConsumer implementation is extended from VeniceAfterImageConsumerImpl, it already supports isCaughtUp.
  3. To prevent subscription to DVC in DVRT CDC from dying silently, I re-organized the futures and if subscription fails to DVC we will complete the future returned to the user exceptionally.
  4. If a user calls start multiple times on DVRT CDC, we now throw an exception. Additionally, if a user passes in an empty set to start we will subscribe to all partitions. I've also added sychronized to start.
  5. Updated the javadoc for BootstrappingVeniceChangelogConsumer, explaining how it behaves differently compared to the regular CDC client.
  6. To make DaVinciClientRecordTransformerTest::testRecordTransformer pass consistently, we need to ensure that this test runs first.

Code changes

  • Added new code behind a config. If so list the config names and their default values in the PR description.
  • Introduced new log lines.
    • Confirmed if logs need to be rate limited to avoid excessive logging.

Concurrency-Specific Checks

Both reviewer and PR author to verify

  • Code has no race conditions or thread safety issues.
  • Proper synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
  • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
  • Verified thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList).
  • Validated proper exception handling in multi-threaded code to avoid silent thread termination.

How was this PR tested?

  • New unit tests added.
  • New integration tests added.
  • Modified or extended existing tests.
  • Verified backward compatibility (if applicable).

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.
  • Yes. Clearly explain the behavior change and its impact.
    Any DVRT users will now be able to deserialize keys and values into SpecificRecords, leading to better performance and user experience.

Copy link
Contributor

@xunyin8 xunyin8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks good, just one minor comment.

xunyin8
xunyin8 previously approved these changes May 19, 2025
@kvargha kvargha enabled auto-merge (squash) May 19, 2025 23:08
@kvargha kvargha merged commit 9c86b25 into linkedin:main May 20, 2025
58 checks passed
@kvargha kvargha deleted the dvrt-cdc-fixes branch May 20, 2025 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants