[CRD-v1] Redesign for KafkaMirrorMaker2 CRD #11842

@katheris

Description

Related problem

With the current version of the KafkaMirrorMaker2 CRD, a typical CR looks like this:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: my-mirror-maker-2
spec:
  version: 4.0.0
  replicas: 1
  connectCluster: "cluster-b"
  clusters:
  - alias: "cluster-a"
    bootstrapServers: cluster-a-kafka-bootstrap:9092
  - alias: "cluster-b"
    bootstrapServers: cluster-b-kafka-bootstrap:9092
    config:
      config.storage.replication.factor: -1
      offset.storage.replication.factor: -1
      status.storage.replication.factor: -1
  mirrors:
  - sourceCluster: "cluster-a"
    targetCluster: "cluster-b"
    sourceConnector:
      tasksMax: 1
      config:
        replication.factor: -1
        offset-syncs.topic.replication.factor: -1
        sync.topic.acls.enabled: "false"
        refresh.topics.interval.seconds: 600
    checkpointConnector:
      tasksMax: 1
      config:
        checkpoints.topic.replication.factor: -1
        sync.group.offsets.enabled: "false"
        refresh.groups.interval.seconds: 600
    topicsPattern: ".*"
    groupsPattern: ".*"

This deploys a Kafka Connect cluster that stores its data in cluster-b, together with a MirrorSourceConnector and a MirrorCheckpointConnector that mirror from cluster-a to cluster-b.

The names of the config, offset and status topics are currently optional, but a user who wants to set them must do so under the cluster that is listed as the targetCluster in each mirror. Strimzi also enforces that the connectCluster property matches the targetCluster of every mirror. Kafka does not strictly require this, but it is generally recommended, at least for the MirrorSourceConnector and MirrorCheckpointConnector.
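
As an illustration, a fragment of the current API with explicit Connect topic names might look like the sketch below. The topic names are placeholders chosen for this example, and the rest of the CR is omitted.

```yaml
# Fragment only (current API). The topic names are illustrative placeholders.
spec:
  connectCluster: "cluster-b"
  clusters:
  - alias: "cluster-a"
    bootstrapServers: cluster-a-kafka-bootstrap:9092
  - alias: "cluster-b"              # also the targetCluster of every mirror
    bootstrapServers: cluster-b-kafka-bootstrap:9092
    config:
      # Connect internal topics are set here, under the target cluster
      config.storage.topic: my-mirror-maker-config
      offset.storage.topic: my-mirror-maker-offset
      status.storage.topic: my-mirror-maker-status
```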

In theory the API lets the user specify many clusters and many mirroring routes in the same file. In practice, because every targetCluster must match connectCluster, all routes must replicate to the same Kafka cluster.
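
To make the restriction concrete, here is a trimmed sketch of a CR that the schema appears to allow but that cannot work as written. The cluster-c entry is added purely for this example, and connector settings are omitted.

```yaml
# Fragment only (current API). cluster-c is a hypothetical third cluster.
spec:
  connectCluster: "cluster-b"
  clusters:
  - alias: "cluster-a"
    bootstrapServers: cluster-a-kafka-bootstrap:9092
  - alias: "cluster-b"
    bootstrapServers: cluster-b-kafka-bootstrap:9092
  - alias: "cluster-c"
    bootstrapServers: cluster-c-kafka-bootstrap:9092
  mirrors:
  - sourceCluster: "cluster-a"
    targetCluster: "cluster-b"      # matches connectCluster
    topicsPattern: ".*"
  - sourceCluster: "cluster-a"
    targetCluster: "cluster-c"      # differs from connectCluster, so this route is not allowed
    topicsPattern: ".*"
```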

From conversations with Strimzi users, the KafkaMirrorMaker2 CR does not seem easy to use. The introduction of the v1 API gives us the opportunity to make the KafkaMirrorMaker2 CRD more intuitive.

Suggested solution

I propose the following CR:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: my-mirror-maker-2
spec:
  version: 4.0.0
  replicas: 1
  targetCluster:
    alias: "cluster-c"
    bootstrapServers: cluster-c-kafka-bootstrap:9092
    config:
      config.storage.topic: my-mirror-maker-config
      config.storage.replication.factor: -1
      offset.storage.topic: my-mirror-maker-offset
      offset.storage.replication.factor: -1
      status.storage.topic: my-mirror-maker-status
      status.storage.replication.factor: -1
  sourceClusters:
  - alias: "cluster-a"
    bootstrapServers: cluster-a-kafka-bootstrap:9092
  - alias: "cluster-b"
    bootstrapServers: cluster-b-kafka-bootstrap:9092
  mirrors:
  - sourceCluster: "cluster-a"
    sourceConnector:
      tasksMax: 1
      config:
        replication.factor: -1
        offset-syncs.topic.replication.factor: -1
        sync.topic.acls.enabled: "false"
        refresh.topics.interval.seconds: 600
    checkpointConnector:
      tasksMax: 1
      config:
        checkpoints.topic.replication.factor: -1
        sync.group.offsets.enabled: "false"
        refresh.groups.interval.seconds: 600
    topicsPattern: ".*"
    groupsPattern: ".*"
  - sourceCluster: "cluster-b"
    sourceConnector:
      tasksMax: 1
      config:
        replication.factor: -1
        offset-syncs.topic.replication.factor: -1
        sync.topic.acls.enabled: "false"
        refresh.topics.interval.seconds: 600
    checkpointConnector:
      tasksMax: 1
      config:
        checkpoints.topic.replication.factor: -1
        sync.group.offsets.enabled: "false"
        refresh.groups.interval.seconds: 600
    topicsPattern: ".*"
    groupsPattern: ".*"

The key changes are:

  • The user specifies a single targetCluster in their CR, and this cluster is used to configure the storage of the underlying Connect cluster
  • The user is required to specify the names of the Connect config, offset and status topics under the targetCluster
  • The user specifies a set of sourceClusters rather than generic clusters
  • For each mirror the user specifies only the sourceCluster, since the targetCluster is set at the CR level
  • As before, the connector names are generated as <SOURCE_CLUSTER_ALIAS>-><TARGET_CLUSTER_ALIAS>, e.g. cluster-a->cluster-c
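
The naming rule can be read off a trimmed fragment of the proposed CR. In this sketch the comments show the name that the pattern above would give for each mirror, and connector settings are omitted for brevity.

```yaml
# Fragment only (proposed API).
spec:
  targetCluster:
    alias: "cluster-c"
    bootstrapServers: cluster-c-kafka-bootstrap:9092
    config:
      config.storage.topic: my-mirror-maker-config
      offset.storage.topic: my-mirror-maker-offset
      status.storage.topic: my-mirror-maker-status
  sourceClusters:
  - alias: "cluster-a"
    bootstrapServers: cluster-a-kafka-bootstrap:9092
  - alias: "cluster-b"
    bootstrapServers: cluster-b-kafka-bootstrap:9092
  mirrors:
  - sourceCluster: "cluster-a"      # connector name: cluster-a->cluster-c
    topicsPattern: ".*"
  - sourceCluster: "cluster-b"      # connector name: cluster-b->cluster-c
    topicsPattern: ".*"
```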

This API better guides users to deploy MirrorMaker2 in the recommended way. If a user wants to deploy a more complex or non-recommended topology they can always use the KafkaConnect and KafkaConnector CRs directly.

MirrorHeartbeatConnector

The MirrorHeartbeatConnector is confusing to configure using the existing CR. Consider a MirrorHeartbeatConnector that accompanies a MirrorSourceConnector replicating from cluster-a to cluster-c. Its underlying Connect cluster should be associated with the source cluster (cluster-a), not the target (cluster-c), because the source cluster is where it produces messages. However, source.alias and target.alias must still be set to cluster-a and cluster-c respectively, otherwise the messages in the heartbeat topic, which include these aliases, don't make sense. With the existing CR this means that a user of the MirrorHeartbeatConnector must specify the connection details for the source cluster under the target cluster alias (see this comment for an example: #11695 (comment)).

Since the MirrorHeartbeatConnector is not commonly used and is confusing to configure, we have a few options to improve the situation:

  1. Remove it entirely from the KafkaMirrorMaker2 CRD, and instead provide an example file showing how to configure it using the standard KafkaConnect and KafkaConnector CRs.
  2. Provide a new KafkaMirrorMaker2Heartbeat CR that is tailored for the MirrorHeartbeatConnector.
  3. Update the heartbeatConnector section of the KafkaMirrorMaker2 CR so that the user can specify the Connect topics and other Connect cluster configs directly there, and have Strimzi create a second Connect cluster connected to the source Kafka cluster if the heartbeatConnector is configured.
  4. Require the user to specify the Connect topics and other Connect cluster configs for the sourceCluster under the sourceClusters section when the heartbeatConnector is configured.

Given that the MirrorHeartbeatConnector is not commonly used, I propose option 1.
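
A rough, untested sketch of what an example file for option 1 could contain is shown below. The resource names and topic names are hypothetical, and the connection settings in the connector config are an assumption based on the workaround in the comment linked above (#11695), so they should be verified before being published as an example.

```yaml
# Hypothetical sketch for option 1; all names are placeholders.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: heartbeat-connect
  annotations:
    # Let Strimzi manage connectors through KafkaConnector resources
    strimzi.io/use-connector-resources: "true"
spec:
  version: 4.0.0
  replicas: 1
  # The Connect cluster is attached to the source cluster (cluster-a),
  # because that is where the heartbeat messages are produced.
  bootstrapServers: cluster-a-kafka-bootstrap:9092
  config:
    group.id: heartbeat-connect
    config.storage.topic: heartbeat-connect-config
    config.storage.replication.factor: -1
    offset.storage.topic: heartbeat-connect-offset
    offset.storage.replication.factor: -1
    status.storage.topic: heartbeat-connect-status
    status.storage.replication.factor: -1
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: cluster-a-heartbeat
  labels:
    # Must match the name of the KafkaConnect resource above
    strimzi.io/cluster: heartbeat-connect
spec:
  class: org.apache.kafka.connect.mirror.MirrorHeartbeatConnector
  tasksMax: 1
  config:
    # The aliases describe the mirrored route and end up in the heartbeat messages
    source.cluster.alias: cluster-a
    target.cluster.alias: cluster-c
    # Assumption to verify: the connection details given on the "target" side
    # point at the source Kafka cluster, mirroring the existing workaround.
    target.cluster.bootstrap.servers: cluster-a-kafka-bootstrap:9092
    heartbeats.topic.replication.factor: -1
```

Keeping the heartbeat setup in its own pair of resources means the KafkaMirrorMaker2 CR never has to describe a second Connect cluster, which is the main simplification option 1 is after.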
