-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Description
Search before reporting
- I searched in the issues and found nothing similar.
Read release policy
- I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.
User environment
We are using partitioned topics of 6 partitions, and we have auto-topic-deletion turned on. When the GC fires to delete the partitions, as expected, they mostly fire at the same time. This appears to sometimes cause a race condition when they all attempt to update the topic's metadata. This has a cascading effect of:
- The metadata update never completes, even with jitter and retries
- The topic deletion eventually fails as a whole because the race conditions causes 1 or more partitions to remain
- The partition(s) becomes orphaned, and the parent topic never can be deleted because GC does not appear to try again after a period of time.
System info
/pulsar $ pulsar version
Current version of pulsar is: 4.0.6
Git Revision 4538ef7645c45a3c8686092128fde6c5d61c762b
Git Branch branch-4.0
Built by Lari Hotari <lhotari@apache.org> on Laris-MBP.lan at 2025-07-30T13:37:25+0300
/pulsar $ uname -a
Linux pulsar-broker-0 6.12.46-66.121.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Sep 22 16:35:59 UTC 2025 x86_64 GNU/Linux
Relevant helm values (Chart pulsar-4.2.0)
brokerDeleteInactiveTopicsEnabled: 'true'
brokerDeleteInactiveTopicsFrequencySeconds: '60'
brokerDeleteInactiveTopicsMaxInactiveDurationSeconds: '60'
brokerDeleteInactiveTopicsMode: delete_when_no_subscriptions
brokerDeleteInactivePartitionedTopicMetadataEnabled: 'true'
allowAutoTopicCreation: 'true'
defaultNumPartitions: '6'
allowAutoTopicCreationType: 'partitioned'
- Note: other settings for auto-deleting subscriptions works fine. The problem is happening when the topic attempts auto-deletion (TLDR
delete_when_no_subscriptionsis working fine).
Relevant Logs
Oct 18 06:31:02 pulsar-broker-1 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2 Moving to FencedForDeletion state
Oct 18 06:31:02 pulsar-broker-1 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.MetaStoreImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2] Remove ManagedLedger
Oct 18 06:31:02 pulsar-broker-1 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleting path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2 (v. Optional.empty)
Oct 18 06:31:02 pulsar-broker-1 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleted path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2 (v. Optional.empty)
Oct 18 06:31:02 pulsar-broker-1 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2] Successfully deleted managed ledger
Oct 18 06:31:02 pulsar-broker-1 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2] Topic deleted
Oct 18 06:31:03 pulsar-broker-1 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75] Delete topic metadata failed because another partition exist.
Oct 18 06:31:03 pulsar-broker-1 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-2] Skip to delete partitioned topic: Another partition exists for [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75].
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [pulsar-inactivity-monitor-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0] Global topic inactive for 60 seconds, closed repl producers
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [pulsar-inactivity-monitor-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3] Global topic inactive for 60 seconds, closed repl producers
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [pulsar-inactivity-monitor-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4] Global topic inactive for 60 seconds, closed repl producers
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.service.BrokerService - Successfully delete authentication policies for topic persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0 Moving to FencedForDeletion state
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.service.BrokerService - Successfully delete authentication policies for topic persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3 Moving to FencedForDeletion state
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.service.BrokerService - Successfully delete authentication policies for topic persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4 Moving to FencedForDeletion state
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.MetaStoreImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0] Remove ManagedLedger
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleting path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0 (v. Optional.empty)
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.MetaStoreImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3] Remove ManagedLedger
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleting path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3 (v. Optional.empty)
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.MetaStoreImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4] Remove ManagedLedger
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleting path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4 (v. Optional.empty)
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleted path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0 (v. Optional.empty)
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleted path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3 (v. Optional.empty)
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleted path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4 (v. Optional.empty)
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0] Successfully deleted managed ledger
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0] Topic deleted
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3] Successfully deleted managed ledger
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3] Topic deleted
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4] Successfully deleted managed ledger
Oct 18 06:31:05 pulsar-broker-2 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4] Topic deleted
Oct 18 06:31:06 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 5 ms. Mandatory stop: false. Elapsed time: 1760783466079 ms
Oct 18 06:31:06 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 5 ms. Mandatory stop: false. Elapsed time: 1760783466180 ms
Oct 18 06:31:06 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75] Delete topic metadata failed because another partition exist.
Oct 18 06:31:06 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 10 ms. Mandatory stop: false. Elapsed time: 703 ms
Oct 18 06:31:06 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75] Delete topic metadata failed because another partition exist.
Oct 18 06:31:06 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 5 ms. Mandatory stop: false. Elapsed time: 1760783466983 ms
Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 20 ms. Mandatory stop: false. Elapsed time: 902 ms
Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-3] Skip to delete partitioned topic: Another partition exists for [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75].
Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 10 ms. Mandatory stop: false. Elapsed time: 184 ms
Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 19 ms. Mandatory stop: false. Elapsed time: 213 ms
Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75] Delete topic metadata failed because another partition exist.
Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-4] Skip to delete partitioned topic: Another partition exists for [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75].
Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 5 ms. Mandatory stop: false. Elapsed time: 1760783467281 ms
Oct 18 06:31:07 pulsar-broker-2 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-0] Skip to delete partitioned topic: Another partition exists for [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75].
Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [pulsar-inactivity-monitor-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1] Global topic inactive for 60 seconds, closed repl producers
Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [pulsar-inactivity-monitor-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5] Global topic inactive for 60 seconds, closed repl producers
Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.service.BrokerService - Successfully delete authentication policies for topic persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1
Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1 Moving to FencedForDeletion state
Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.service.BrokerService - Successfully delete authentication policies for topic persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5
Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [broker-topic-workers-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5 Moving to FencedForDeletion state
Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.MetaStoreImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1] Remove ManagedLedger
Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleting path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1 (v. Optional.empty)
Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.MetaStoreImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5] Remove ManagedLedger
Oct 18 06:31:13 pulsar-broker-0 pulsar-broker [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleting path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5 (v. Optional.empty)
Oct 18 06:31:14 pulsar-broker-0 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleted path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1 (v. Optional.empty)
Oct 18 06:31:14 pulsar-broker-0 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleted path: /managed-ledgers/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5 (v. Optional.empty)
Oct 18 06:31:14 pulsar-broker-0 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1] Successfully deleted managed ledger
Oct 18 06:31:14 pulsar-broker-0 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1] Topic deleted
Oct 18 06:31:14 pulsar-broker-0 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5] Successfully deleted managed ledger
Oct 18 06:31:14 pulsar-broker-0 pulsar-broker [bookkeeper-ml-scheduler-OrderedScheduler-0-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5] Topic deleted
Oct 18 06:31:14 pulsar-broker-0 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 5 ms. Mandatory stop: false. Elapsed time: 1760783474475 ms
Oct 18 06:36:14 pulsar-broker-0 pulsar-broker WARN [delayer-47-1] WARN org.apache.pulsar.client.admin.internal.BaseResource - [http://pulsar-broker-0.pulsar-broker.pulsar.svc.cluster.local:8080/admin/v2/persistent/ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75/partitions?force=false&deleteSchema=true] Failed to perform http delete request: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Request timeout
Oct 18 06:36:14 pulsar-broker-0 pulsar-broker WARN [metadata-store-10-1] WARN org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5] Inactive topic deletion failed
java.util.concurrent.CompletionException: org.apache.pulsar.client.admin.PulsarAdminException: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Request timeout
at java.base/java.util.concurrent.CompletableFuture.encodeRelay(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.CompletableFuture.completeRelay(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source) ~[?:?]
at org.apache.pulsar.client.admin.internal.BaseResource$4.failed(BaseResource.java:237) ~[org.apache.pulsar-pulsar-client-admin-original-4.0.6.jar:4.0.6]
at org.glassfish.jersey.client.JerseyInvocation$1.failed(JerseyInvocation.java:898) ~[org.glassfish.jersey.core-jersey-client-2.42.jar:?]
Oct 18 06:36:14 pulsar-broker-0 pulsar-broker WARN [delayer-47-1] WARN org.apache.pulsar.client.admin.internal.BaseResource - [http://pulsar-broker-0.pulsar-broker.pulsar.svc.cluster.local:8080/admin/v2/persistent/ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75/partitions?force=false&deleteSchema=true] Failed to perform http delete request: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Request timeout
Oct 18 06:36:14 pulsar-broker-0 pulsar-broker WARN [metadata-store-10-1] WARN org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-1] Inactive topic deletion failed
java.util.concurrent.CompletionException: org.apache.pulsar.client.admin.PulsarAdminException: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Request timeout
at java.base/java.util.concurrent.CompletableFuture.encodeRelay(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.CompletableFuture.completeRelay(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source) ~[?:?]
at org.apache.pulsar.client.admin.internal.BaseResource$4.failed(BaseResource.java:237) ~[org.apache.pulsar-pulsar-client-admin-original-4.0.6.jar:4.0.6]
The logs clearly show that partitions enter the deletion GC around the same time, and most partitions clean up fine (some logs of other partition's successes may have been omitted here). The errors to delete the parent metadata fail as normal until the last partition is gone. However, the metadata update failure appears to never allow that to happen.
Issue Description
What happened
Using auto topic deletion with partitioned topics fails sporadically due to a race condition. The failures cause orphaned topic partitions and the parent topic is never deleted.
What did you expect to happen
All partitions can successfully be deleted, along with the parent metadata, even if they all are GC'd at the same time.
Why is this a bug
The logs clearly show a race condition that orphans 1 or more topic partitions. This causes the topic to never be deleted, and it will never attempt to be GC'd again. This appears to be a distributed bug, as partitions can be held across many brokers all with their own GC loops. Therefore, setting execution threads to 1 isn't a solution either. The value used for jitter is also not exposed, but even if it was, I'm not sure that would be helpful. It seems reasonable that the system's retries that already exist should be able to complete the metadata update at some point, given that it tries for many minutes after the initial failure.
Reproducing the Error
Because this is a race condition, it's not possible to reproduce on demand. However, I see this problem in our production servers (which have hundreds of topics) on a daily basis with up to dozens of orphaned partitions.
Error messages
Oct 18 06:31:14 pulsar-broker-0 pulsar-broker INFO [metadata-store-10-1] INFO org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl - Update key /admin/partitioned-topics/ourtenant/ourapp/persistent/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75 conflicts. Retrying in 5 ms. Mandatory stop: false. Elapsed time: 1760783474475 ms
Oct 18 06:36:14 pulsar-broker-0 pulsar-broker WARN [delayer-47-1] WARN org.apache.pulsar.client.admin.internal.BaseResource - [http://pulsar-broker-0.pulsar-broker.pulsar.svc.cluster.local:8080/admin/v2/persistent/ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75/partitions?force=false&deleteSchema=true] Failed to perform http delete request: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Request timeout
Oct 18 06:36:14 pulsar-broker-0 pulsar-broker WARN [metadata-store-10-1] WARN org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://ourtenant/ourapp/ourapp.v1.725feb06-ab85-11f0-9600-c297ef652e75-partition-5] Inactive topic deletion failed
java.util.concurrent.CompletionException: org.apache.pulsar.client.admin.PulsarAdminException: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Request timeout
Reproducing the issue
Topic auto deletion is a great feature, but without the ability to rely 100% on all partitions of a topic being deleted, we will not be able to use it because it makes the system unclean.
Additional information
No response
Are you willing to submit a PR?
- I'm willing to submit a PR!