The broker cannot join the cluster after it is started #2833

@yangzw1024

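I deployed a cluster on virtual machines using the open-source edition 1.5.5, but a node cannot join the cluster the first time its automq process is started. After I kill the process and start it again with the same command, it joins the cluster normally.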

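As shown below, I started a new node, 10.20.0.127, as a broker. On the first start it keeps reporting errors and never joins the cluster. When I kill the process and start it again, the node joins the cluster normally. Nothing in the external environment or the start command changed between the two starts, so what is the cause? Both broker and controller nodes have the same problem: on first start a controller node also fails to form the cluster and exits automatically, and it works normally after being started again. The broker node's logs follow; the complete logs of the broker's two starts are attached.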

Broker start command:

KAFKA_S3_ACCESS_KEY='XXXXXXXXXXXXXXXXXXXX' KAFKA_S3_SECRET_KEY='XXXXXXXXXXXXXXXXXXXXXXXXXXX' \
  ./bin/kafka-server-start.sh -daemon config/kraft/broker.properties \
  --override cluster.id=sis-automq-test \
  --override node.id=1005 \
  --override controller.quorum.voters=0@10.20.9.210:9093,1@10.20.11.91:9093,2@10.20.1.32:9093 \
  --override controller.quorum.bootstrap.servers=10.20.9.210:9093,10.20.11.91:9093,10.20.1.32:9093 \
  --override advertised.listeners=PLAINTEXT://10.20.0.127:9092 \
  --override s3.data.buckets='0@s3://shareit-tmp-sg?region=ap-southeast-3&endpoint=https://obs.ap-southeast-3.myhuaweicloud.com' \
  --override s3.wal.path='0@s3://shareit-tmp-sg?region=ap-southeast-3&endpoint=https://obs.ap-southeast-3.myhuaweicloud.com' \
  --override s3.ops.buckets='1@s3://shareit-tmp-sg?region=ap-southeast-3&endpoint=https://obs.ap-southeast-3.myhuaweicloud.com' \
  --override log.dirs='/root/kraft-logs' \
  --override s3.telemetry.metrics.exporter.type=prometheus \
  --override s3.metrics.exporter.prom.host=0.0.0.0 \
  --override s3.metrics.exporter.prom.port=10106
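
Since the report relies on both starts using an identical command, one way to double-check that is to compare the `--override` pairs of the two invocations (for example, as recovered from shell history). The following is only a rough sketch in Python; `parse_overrides` and `diff_overrides` are hypothetical helper names, not part of AutoMQ or Kafka.

```python
import shlex


def parse_overrides(command):
    """Collect `--override key=value` pairs from a kafka-server-start.sh command line."""
    tokens = shlex.split(command)  # honours the single quotes around the S3 URLs
    overrides = {}
    for i, tok in enumerate(tokens[:-1]):
        if tok == "--override":
            # Split on the first '=' only, because values may contain '=' themselves.
            key, _, value = tokens[i + 1].partition("=")
            overrides[key] = value
    return overrides


def diff_overrides(first_command, second_command):
    """Return {key: (first_value, second_value)} for every override that differs."""
    a = parse_overrides(first_command)
    b = parse_overrides(second_command)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

An empty result from `diff_overrides(first, second)` would support the statement that the command did not change between the two starts.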

server.log

First start:

[2025-09-12 16:19:17,065] INFO [Producer clientId=__automq_client_auto_balancer_metrics_reporter_producer] Node -1 disconnected. (org.apache.kafka.clients.NetworkClient)
[2025-09-12 16:19:17,066] WARN [Producer clientId=__automq_client_auto_balancer_metrics_reporter_producer] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Node may not be available. (org.apache.kafka.clients.NetworkClient)
[2025-09-12 16:19:17,066] WARN [Producer clientId=__automq_client_auto_balancer_metrics_reporter_producer] Bootstrap broker localhost:9092 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
[2025-09-12 16:19:17,144] INFO [MetadataLoader id=1005] initializeNewPublishers: the loader is still catching up because we still don't know the high water mark yet. (org.apache.kafka.image.loader.MetadataLoader)
[2025-09-12 16:19:17,172] INFO [RaftManager id=1005] Fetching snapshot OffsetAndEpoch(offset=357460, epoch=54) from Fetch response from leader 0 (org.apache.kafka.raft.KafkaRaftClient)

Second start:

[2025-09-12 16:25:38,779] INFO [Producer clientId=__automq_client_auto_balancer_metrics_reporter_producer] Node -1 disconnected. (org.apache.kafka.clients.NetworkClient)
[2025-09-12 16:25:38,779] WARN [Producer clientId=__automq_client_auto_balancer_metrics_reporter_producer] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Node may not be available. (org.apache.kafka.clients.NetworkClient)
[2025-09-12 16:25:38,780] WARN [Producer clientId=__automq_client_auto_balancer_metrics_reporter_producer] Bootstrap broker localhost:9092 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
[2025-09-12 16:25:38,822] INFO [ObjectStorage-0-5] List objects finished, count: 0, cost: 39ms (com.automq.stream.s3.operator.AbstractObjectStorage)
[2025-09-12 16:25:38,839] INFO Reset S3 WAL (com.automq.stream.s3.wal.impl.object.ObjectWALService)
[2025-09-12 16:25:38,839] INFO Shutdown S3 WAL. (com.automq.stream.s3.wal.impl.object.ObjectWALService)
[2025-09-12 16:25:38,841] INFO S3 WAL record accumulator is closed. (com.automq.stream.s3.wal.impl.object.RecordAccumulator)
[2025-09-12 16:25:38,852] INFO Acquire permission for node: 1005, epoch: 1757665537321, failover: false (com.automq.stream.s3.wal.impl.object.ObjectReservationService)
[2025-09-12 16:25:38,918] INFO Start S3 WAL. (com.automq.stream.s3.wal.impl.object.ObjectWALService)
[2025-09-12 16:25:39,036] INFO [BrokerLifecycleManager id=1005] The broker is in RECOVERY. (kafka.server.BrokerLifecycleManager)
[2025-09-12 16:25:39,186] INFO [ObjectStorage-0-7] List objects finished, count: 0, cost: 13ms (com.automq.stream.s3.operator.AbstractObjectStorage)
[2025-09-12 16:25:39,186] INFO Register nodeId=1005 nodeEpoch=1757665537321 with new WAL configs: 0@s3://shareit-tmp-sg?region=ap-southeast-3&endpoint=https://obs.ap-southeast-3.myhuaweicloud.com (kafka.log.stream.s3.wal.BootstrapWalV1)
[2025-09-12 16:25:39,196] INFO Reset S3 WAL (com.automq.stream.s3.wal.impl.object.ObjectWALService)
[2025-09-12 16:25:39,197] INFO S3Storage start completed (com.automq.stream.s3.S3Storage)
[2025-09-12 16:25:39,197] INFO [CompactionManager id=1005] Next Compaction started in 600000 ms (com.automq.stream.s3.compact.CompactionManager)
[2025-09-12 16:25:39,198] INFO S3Client started (kafka.log.stream.s3.DefaultS3Client)
[2025-09-12 16:25:39,206] INFO [BrokerServer id=1005] Waiting for the broker to be unfenced (kafka.server.BrokerServer)
[2025-09-12 16:25:39,214] INFO [BrokerLifecycleManager id=1005] The broker has been unfenced. Transitioning from RECOVERY to RUNNING. (kafka.server.BrokerLifecycleManager)
[2025-09-12 16:25:39,214] INFO [BrokerServer id=1005] Finished waiting for the broker to be unfenced (kafka.server.BrokerServer)
[2025-09-12 16:25:39,219] INFO authorizerStart completed for endpoint PLAINTEXT. Endpoint is now READY. (org.apache.kafka.server.network.EndpointReadyFutures)
[2025-09-12 16:25:39,219] INFO [SocketServer listenerType=BROKER, nodeId=1005] Enabling request processing. (kafka.network.SocketServer)
[2025-09-12 16:25:39,222] INFO Awaiting socket connections on 0.0.0.0:9092. (kafka.network.DataPlaneAcceptor)
[2025-09-12 16:25:39,225] INFO [BrokerServer id=1005] Waiting for all of the authorizer futures to be completed (kafka.server.BrokerServer)
[2025-09-12 16:25:39,225] INFO [BrokerServer id=1005] Finished waiting for all of the authorizer futures to be completed (kafka.server.BrokerServer)
[2025-09-12 16:25:39,226] INFO [BrokerServer id=1005] Waiting for all of the SocketServer Acceptors to be started (kafka.server.BrokerServer)
[2025-09-12 16:25:39,226] INFO [BrokerServer id=1005] Finished waiting for all of the SocketServer Acceptors to be started (kafka.server.BrokerServer)
[2025-09-12 16:25:39,226] INFO [BrokerServer id=1005] Transition from STARTING to STARTED (kafka.server.BrokerServer)
[2025-09-12 16:25:39,226] INFO Kafka version: 3.9.0 (org.apache.kafka.common.utils.AppInfoParser)
[2025-09-12 16:25:39,226] INFO Kafka commitId: 6636f8791a0d7a66 (org.apache.kafka.common.utils.AppInfoParser)
[2025-09-12 16:25:39,226] INFO Kafka startTimeMs: 1757665539226 (org.apache.kafka.common.utils.AppInfoParser)
[2025-09-12 16:25:39,227] INFO [KafkaRaftServer nodeId=1005] Kafka Server started (kafka.server.KafkaRaftServer)
[2025-09-12 16:25:39,241] INFO [MetadataLoader id=1005] InitializeNewPublishers: initializing failover-listener with a snapshot at offset 362422 (org.apache.kafka.image.loader.MetadataLoader)
[2025-09-12 16:25:39,814] INFO [Producer clientId=__automq_client_auto_balancer_metrics_reporter_producer] Cluster ID: sis-automq-test (org.apache.kafka.clients.Metadata)
[2025-09-12 16:25:39,925] INFO [Producer clientId=__automq_client_auto_balancer_metrics_reporter_producer] ProducerId set to 4001 with epoch 0 (org.apache.kafka.clients.producer.internals.TransactionManager)
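
To narrow down where the first start stalls, it may help to check which startup milestones each `server.log` reaches. The sketch below is an assumption-laden illustration, not an AutoMQ tool: the milestone names are made up here, the marker strings are copied from the second-start log above, and their order is simply the order observed in that one log.

```python
import re

# Marker text taken from the successful (second) start log; names are hypothetical labels.
MILESTONES = [
    ("wal_started", "Start S3 WAL."),
    ("recovery", "The broker is in RECOVERY."),
    ("storage_ready", "S3Storage start completed"),
    ("unfenced", "The broker has been unfenced."),
    ("started", "Kafka Server started"),
]

# Matches lines such as "[2025-09-12 16:25:39,036] INFO <message>".
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]+) (?P<msg>.*)$")


def reached_milestones(log_text):
    """Return milestone names whose marker appears in the log, in order of first appearance."""
    reached = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip wrapped continuation lines and stack traces
        for name, marker in MILESTONES:
            if marker in match.group("msg") and name not in reached:
                reached.append(name)
    return reached


def first_missing(log_text):
    """Return the first milestone the log never reaches, or None if all are present."""
    reached = set(reached_milestones(log_text))
    for name, _ in MILESTONES:
        if name not in reached:
            return name
    return None
```

Running `first_missing(open("server.log").read())` on each attached log would show the earliest milestone the failed start never logs, which gives a starting point for comparing the two runs.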
