Skip to content

[Bug] Auto topic delete of partitioned topics has a MetadataCacheImpl race condition #24879

@darinspivey

Description

@darinspivey

Search before reporting

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

User environment

We are using partitioned topics of 6 partitions, and we have auto-topic-deletion turned on. When the GC fires to delete the partitions, as expected, they mostly fire at the same time. This appears to sometimes cause a race condition when they all attempt to update the topic's metadata. This has a cascading effect of:

  • The metadata update never completes, even with jitter and retries
  • The topic deletion eventually fails as a whole because the race conditions causes 1 or more partitions to remain
  • The partition(s) becomes orphaned, and the parent topic never can be deleted because GC does not appear to try again after a period of time.

System info

/pulsar $ pulsar version
Current version of pulsar is: 4.0.6
Git Revision 4538ef7645c45a3c8686092128fde6c5d61c762b
Git Branch branch-4.0
Built by Lari Hotari <lhotari@apache.org> on Laris-MBP.lan at 2025-07-30T13:37:25+0300

/pulsar $ uname -a
Linux pulsar-broker-0 6.12.46-66.121.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Sep 22 16:35:59 UTC 2025 x86_64 GNU/Linux

Relevant helm values (Chart pulsar-4.2.0)

    brokerDeleteInactiveTopicsEnabled: 'true'
    brokerDeleteInactiveTopicsFrequencySeconds: '60'
    brokerDeleteInactiveTopicsMaxInactiveDurationSeconds: '60'
    brokerDeleteInactiveTopicsMode: delete_when_no_subscriptions
    brokerDeleteInactivePartitionedTopicMetadataEnabled: 'true'
    allowAutoTopicCreation: 'true'
    defaultNumPartitions: '6'
    allowAutoTopicCreationType: 'partitioned'
  • Note: other settings for auto-deleting subscriptions works fine. The problem is happening when the topic attempts auto-deletion (TLDR delete_when_no_subscriptions is working fine).

Relevant Logs

Oct 18 06:31:02 pulsar-broker-1 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2 Moving to FencedForDeletion state
Oct 18 06:31:02 pulsar-broker-1 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.mledger.impl.MetaStoreImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2] Remove ManagedLedger
Oct 18 06:31:02 pulsar-broker-1 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleting path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2 (v. Optional.empty)
Oct 18 06:31:02 pulsar-broker-1 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleted path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2 (v. Optional.empty)
Oct 18 06:31:02 pulsar-broker-1 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2] Successfully deleted managed ledger
Oct 18 06:31:02 pulsar-broker-1 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2] Topic deleted
Oct 18 06:31:03 pulsar-broker-1 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75] Delete topic metadata failed because another partition exist.
Oct 18 06:31:03 pulsar-broker-1 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2] Skip to delete partitioned topic: Another partition exists for [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75].
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [pulsar-inactivity-monitor-OrderedScheduler-0-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0] Global topic inactive for 60 seconds, closed repl producers
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [pulsar-inactivity-monitor-OrderedScheduler-0-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3] Global topic inactive for 60 seconds, closed repl producers
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [pulsar-inactivity-monitor-OrderedScheduler-0-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4] Global topic inactive for 60 seconds, closed repl producers
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.service.BrokerService - Successfully delete authentication policies for topic persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0 Moving to FencedForDeletion state
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.service.BrokerService - Successfully delete authentication policies for topic persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3 Moving to FencedForDeletion state
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.service.BrokerService - Successfully delete authentication policies for topic persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4 Moving to FencedForDeletion state
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.mledger.impl.MetaStoreImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0] Remove ManagedLedger
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleting path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0 (v. Optional.empty)
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.mledger.impl.MetaStoreImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3] Remove ManagedLedger
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleting path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3 (v. Optional.empty)
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.mledger.impl.MetaStoreImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4] Remove ManagedLedger
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleting path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4 (v. Optional.empty)
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleted path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0 (v. Optional.empty)
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleted path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3 (v. Optional.empty)
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleted path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4 (v. Optional.empty)
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0] Successfully deleted managed ledger
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0] Topic deleted
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3] Successfully deleted managed ledger
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3] Topic deleted
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4] Successfully deleted managed ledger
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4] Topic deleted
Oct 18 06:31:06 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 5 ms. Mandatory stop: false. Elapsed time: 1760783466079 ms
Oct 18 06:31:06 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 5 ms. Mandatory stop: false. Elapsed time: 1760783466180 ms
Oct 18 06:31:06 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75] Delete topic metadata failed because another partition exist.
Oct 18 06:31:06 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 10 ms. Mandatory stop: false. Elapsed time: 703 ms
Oct 18 06:31:06 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75] Delete topic metadata failed because another partition exist.
Oct 18 06:31:06 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 5 ms. Mandatory stop: false. Elapsed time: 1760783466983 ms
Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 20 ms. Mandatory stop: false. Elapsed time: 902 ms
Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3] Skip to delete partitioned topic: Another partition exists for [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75].
Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 10 ms. Mandatory stop: false. Elapsed time: 184 ms
Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 19 ms. Mandatory stop: false. Elapsed time: 213 ms
Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75] Delete topic metadata failed because another partition exist.
Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4] Skip to delete partitioned topic: Another partition exists for [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75].
Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 5 ms. Mandatory stop: false. Elapsed time: 1760783467281 ms
Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0] Skip to delete partitioned topic: Another partition exists for [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75].
Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [pulsar-inactivity-monitor-OrderedScheduler-0-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1] Global topic inactive for 60 seconds, closed repl producers
Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [pulsar-inactivity-monitor-OrderedScheduler-0-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5] Global topic inactive for 60 seconds, closed repl producers
Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.service.BrokerService - Successfully delete authentication policies for topic persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1
Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1 Moving to FencedForDeletion state
Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.service.BrokerService - Successfully delete authentication policies for topic persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5
Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5 Moving to FencedForDeletion state
Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.mledger.impl.MetaStoreImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1] Remove ManagedLedger
Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleting path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1 (v. Optional.empty)
Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.mledger.impl.MetaStoreImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5] Remove ManagedLedger
Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleting path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5 (v. Optional.empty)
Oct 18 06:31:14 pulsar-broker-0 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleted path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1 (v. Optional.empty)
Oct 18 06:31:14 pulsar-broker-0 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleted path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5 (v. Optional.empty)
Oct 18 06:31:14 pulsar-broker-0 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1] Successfully deleted managed ledger
Oct 18 06:31:14 pulsar-broker-0 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1] Topic deleted
Oct 18 06:31:14 pulsar-broker-0 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5] Successfully deleted managed ledger
Oct 18 06:31:14 pulsar-broker-0 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5] Topic deleted
Oct 18 06:31:14 pulsar-broker-0 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 5 ms. Mandatory stop: false. Elapsed time: 1760783474475 ms
Oct 18 06:36:14 pulsar-broker-0 pulsar-broker WARN [delayer-47-1] WARN  org.apache.pulsar.client.admin.internal.BaseResource - [http://pulsar-broker-0.pulsar-broker.pulsar.svc.cluster.local:8080/admin/v2/persistent/ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75/partitions?force=false&deleteSchema=true] Failed to perform http delete request: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Request timeout
Oct 18 06:36:14 pulsar-broker-0 pulsar-broker WARN [metadata-store-10-1] WARN  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5] Inactive topic deletion failed
java.util.concurrent.CompletionException: org.apache.pulsar.client.admin.PulsarAdminException: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Request timeout
	at java.base/java.util.concurrent.CompletableFuture.encodeRelay(Unknown Source) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.completeRelay(Unknown Source) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(Unknown Source) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source) ~[?:?]
	at org.apache.pulsar.client.admin.internal.BaseResource$4.failed(BaseResource.java:237) ~[org.apache.pulsar-pulsar-client-admin-original-4.0.6.jar:4.0.6]
	at org.glassfish.jersey.client.JerseyInvocation$1.failed(JerseyInvocation.java:898) ~[org.glassfish.jersey.core-jersey-client-2.42.jar:?]
Oct 18 06:36:14 pulsar-broker-0 pulsar-broker WARN [delayer-47-1] WARN  org.apache.pulsar.client.admin.internal.BaseResource - [http://pulsar-broker-0.pulsar-broker.pulsar.svc.cluster.local:8080/admin/v2/persistent/ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75/partitions?force=false&deleteSchema=true] Failed to perform http delete request: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Request timeout
Oct 18 06:36:14 pulsar-broker-0 pulsar-broker WARN [metadata-store-10-1] WARN  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1] Inactive topic deletion failed
java.util.concurrent.CompletionException: org.apache.pulsar.client.admin.PulsarAdminException: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Request timeout
	at java.base/java.util.concurrent.CompletableFuture.encodeRelay(Unknown Source) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.completeRelay(Unknown Source) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(Unknown Source) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source) ~[?:?]
	at org.apache.pulsar.client.admin.internal.BaseResource$4.failed(BaseResource.java:237) ~[org.apache.pulsar-pulsar-client-admin-original-4.0.6.jar:4.0.6]

The logs clearly show that partitions enter the deletion GC around the same time, and most partitions clean up fine (some logs of other partition's successes may have been omitted here). The errors to delete the parent metadata fail as normal until the last partition is gone. However, the metadata update failure appears to never allow that to happen.

Issue Description

What happened

Using auto topic deletion with partitioned topics fails sporadically due to a race condition. The failures cause orphaned topic partitions and the parent topic is never deleted.

What did you expect to happen

All partitions can successfully be deleted, along with the parent metadata, even if they all are GC'd at the same time.

Why is this a bug

The logs clearly show a race condition that orphans 1 or more topic partitions. This causes the topic to never be deleted, and it will never attempt to be GC'd again. This appears to be a distributed bug, as partitions can be held across many brokers all with their own GC loops. Therefore, setting execution threads to 1 isn't a solution either. The value used for jitter is also not exposed, but even if it was, I'm not sure that would be helpful. It seems reasonable that the system's retries that already exist should be able to complete the metadata update at some point, given that it tries for many minutes after the initial failure.

Reproducing the Error

Because this is a race condition, it's not possible to reproduce on demand. However, I see this problem in our production servers (which have hundreds of topics) on a daily basis with up to dozens of orphaned partitions.

Error messages

Oct 18 06:31:14 pulsar-broker-0 pulsar-broker INFO [metadata-store-10-1] INFO  org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 5 ms. Mandatory stop: false. Elapsed time: 1760783474475 ms

Oct 18 06:36:14 pulsar-broker-0 pulsar-broker WARN [delayer-47-1] WARN  org.apache.pulsar.client.admin.internal.BaseResource - [http://pulsar-broker-0.pulsar-broker.pulsar.svc.cluster.local:8080/admin/v2/persistent/ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75/partitions?force=false&deleteSchema=true] Failed to perform http delete request: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Request timeout

Oct 18 06:36:14 pulsar-broker-0 pulsar-broker WARN [metadata-store-10-1] WARN  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5] Inactive topic deletion failed
java.util.concurrent.CompletionException: org.apache.pulsar.client.admin.PulsarAdminException: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Request timeout

Reproducing the issue

Topic auto deletion is a great feature, but without the ability to rely 100% on all partitions of a topic being deleted, we will not be able to use it because it makes the system unclean.

Additional information

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Metadata

Metadata

Assignees

No one assigned

    Labels

    type/bugThe PR fixed a bug or issue reported a bug

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions