Make empty CQ init faster in case of clean shutdown #13856

gomoripeti · 2025-05-05T21:20:31Z

Proposed Changes

At classic queue (CQ) startup, variable_queue went through every seqid from 0 to next_seq_id looking for the first message, even if the queue contained no messages (no segment files).

After a clean shutdown, the value of next_seq_id is stored in the recovery terms. The queue index can use this value to provide tighter seqid bounds when there are no segment files.

Before this patch, starting an empty classic queue with next_seq_id = 100_000_000 took about 26 seconds. With this patch, it takes less than 1 ms.
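A minimal Erlang sketch of the idea follows. It is a hypothetical module (empty_cq_bounds_sketch), not the actual rabbit_classic_queue_index_v2 code: the index state is reduced to a list of segment numbers, SEGMENT_ENTRY_COUNT = 4096 is an assumed segment size, and scan_for_first/3 is a stand-in for the startup loop in variable_queue that steps one seqid at a time, as described above. Without a hint, an empty index can only report a low bound of 0, so the scan is linear in next_seq_id; with next_seq_id from the recovery terms as a hint, the bounds collapse to {Hint, Hint} and the scan has nothing to do.

```erlang
-module(empty_cq_bounds_sketch).
-export([bounds/2, scan_for_first/3]).

%% Assumed number of entries per segment file; illustrative only.
-define(SEGMENT_ENTRY_COUNT, 4096).

%% Segments is a hypothetical simplification of the index state: the
%% segment numbers that have files on disk.
-spec bounds([non_neg_integer()], non_neg_integer() | undefined) ->
          {non_neg_integer(), non_neg_integer()}.
bounds([], NextSeqIdHint) when is_integer(NextSeqIdHint) ->
    %% Empty queue after a clean shutdown: no message can sit below the hint.
    {NextSeqIdHint, NextSeqIdHint};
bounds([], undefined) ->
    %% Empty queue without a hint: 0 is the only safe low bound.
    {0, 0};
bounds(Segments, _NextSeqIdHint) ->
    %% Non-empty queue: derive the bounds from the lowest and highest segment.
    Sorted = lists:sort(Segments),
    {hd(Sorted) * ?SEGMENT_ENTRY_COUNT,
     (lists:last(Sorted) + 1) * ?SEGMENT_ENTRY_COUNT}.

%% Walk seqids from Low up to (but excluding) NextSeqId until IsPresent
%% reports a message. Returns {FirstSeqId | none, StepsTaken}.
-spec scan_for_first(non_neg_integer(), non_neg_integer(),
                     fun((non_neg_integer()) -> boolean())) ->
          {non_neg_integer() | none, non_neg_integer()}.
scan_for_first(Low, NextSeqId, IsPresent) ->
    scan(Low, NextSeqId, IsPresent, 0).

scan(SeqId, NextSeqId, _IsPresent, Steps) when SeqId >= NextSeqId ->
    {none, Steps};
scan(SeqId, NextSeqId, IsPresent, Steps) ->
    case IsPresent(SeqId) of
        true  -> {SeqId, Steps};
        false -> scan(SeqId + 1, NextSeqId, IsPresent, Steps + 1)
    end.
```

For example, with an always-false presence check, scan_for_first(0, 1000000, F) takes 1000000 steps, while starting from the hinted low bound of 1000000 takes 0.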

Fixes the empty classic queue part of #12848

Types of Changes

Bug fix (non-breaking change which fixes issue [Questions] Classic queue conversion takes very long time after upgrade to 4.0 #12848)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause an observable behavior change in existing systems)
Documentation improvements (corrections, new content, etc)
Cosmetic change (whitespace, formatting, etc)
Build system and/or CI

Checklist

I have read the CONTRIBUTING.md document
I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
I have added tests that prove my fix is effective or that my feature works
All tests pass locally with my changes
If relevant, I have added necessary documentation to https://github.com/rabbitmq/rabbitmq-website
If relevant, I have added this change to the first version(s) in release-notes that I expect to introduce it

Further Comments

gomoripeti · 2025-05-05T21:26:17Z

deps/rabbit/src/rabbit_classic_queue_index_v2.erl

+         bounds/2, next_segment_boundary/1]).
+
+%% Only used by tests
+-export([bounds/1]).


Only the backing_queue_SUITE:bq_queue_index test case uses bounds/1. If I understand correctly, this test case tests the index module itself. I kept bounds/1 because the v1 index also has a function with the same signature (although backing_queue_SUITE no longer tests it, and it will go away eventually). Maybe bq_queue_index should be modified to test bounds/2 instead, sometimes with NextSeqIdHint undefined and sometimes with an integer?

@gomoripeti that sounds reasonable to me. Let's do that in a follow-up PR?

michaelklishin

The Dialyzer failure is reproducible:

done (warnings were emitted)
  Checking whether the PLT /home/runner/work/rabbitmq-server/rabbitmq-server/deps/rabbit/.rabbit.plt is up-to-date... yes
  Proceeding with analysis...
rabbit_classic_queue_index_v2.erl:439:1: Function recover_index_v1_clean/6 has no local return
rabbit_classic_queue_index_v2.erl:455:1: Function recover_index_v1_dirty/7 has no local return
rabbit_classic_queue_index_v2.erl:476:1: Function recover_index_v1_common/3 has no local return
rabbit_classic_queue_index_v2.erl:486:40: The call rabbit_classic_queue_index_v2:bounds
         (State0 ::
              #qi{queue_name :: #resource{},
                  dir :: nonempty_binary(),
                  write_buffer ::
                      #{non_neg_integer() =>
                            'ack' |
                            {binary(),
                             non_neg_integer(),
                             'memory' | 'rabbit_msg_store' |
                             'rabbit_queue_index' |
                             {'rabbit_classic_queue_store_v2',
                              non_neg_integer(),
                              non_neg_integer()},
                             #message_properties{expiry ::
                                                     'undefined' |
                                                     pos_integer(),
                                                 needs_confirming ::
                                                     boolean()},
                             boolean()}},
                  write_buffer_updates :: 0,
                  cache ::
                      #{non_neg_integer() =>
                            'ack' |
                            {binary(),
                             non_neg_integer(),
                             'memory' | 'rabbit_msg_store' |
                             'rabbit_queue_index' |
                             {'rabbit_classic_queue_store_v2',
                              non_neg_integer(),
                              non_neg_integer()},
                             #message_properties{expiry ::
                                                     'undefined' |
                                                     pos_integer(),
                                                 needs_confirming ::
                                                     boolean()},
                             boolean()}},
                  confirms :: sets:set(_),
                  segments :: #{non_neg_integer() => pos_integer()},
                  fds ::
                      #{non_neg_integer() =>
                            {'file_descriptor', atom(), _}},
                  on_sync :: fun((sets:set(_)) -> 'ok'),
                  on_sync_msg :: fun()},
          'undefined') breaks the contract 
          (State, non_neg_integer() | 'undefiend') ->
             {non_neg_integer(), non_neg_integer(), State}
             when State :: state()
rabbit_classic_queue_index_v2.erl:1198:1: Function bounds/1 has no local return
rabbit_classic_queue_index_v2.erl:1199:19: The call rabbit_classic_queue_index_v2:bounds
         (State :: any(),
          'undefined') breaks the contract 
          (State, non_neg_integer() | 'undefiend') ->
             {non_neg_integer(), non_neg_integer(), State}
             when State :: state()
 done in 0m13.12s
done (warnings were emitted)

gomoripeti · 2025-05-07T12:59:18Z

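Thanks for the heads-up. Indeed, the type spec has a typo, undefiend instead of undefined, which is why Dialyzer reports that every call passing undefined breaks the contract.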

lhoguin · 2025-05-09T08:41:29Z

deps/rabbit/test/backing_queue_SUITE.erl

+
+    %% set a very high next_seq_id as if 100M messages have been
+    %% published and consumed
+    Terms2 = lists:keyreplace(next_seq_id, 1, Terms, {next_seq_id, 100_000_000}),


We should probably test that the bounds returned by the index are correct.

I will work on a follow-up PR to update backing_queue_SUITE:bq_queue_index to test bounds/2 when there are messages in the queue. But what is a correct index range estimate for an empty queue? In that case any bounds are correct overestimates. Maybe one property that can be checked is that both LowSeqId and HighSeqId are =< NextSeqId?

We only care about v2 so Low = High = Next?
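To make the properties discussed above concrete, here is a hedged sketch of an empty-index check. It is a hypothetical helper (bounds_property_sketch:check_empty_index/2), not part of backing_queue_SUITE: BoundsFun stands in for bounds/2 on an empty v2 index with the state argument elided (the real bounds/2 also returns the state as a third element, per the spec in the Dialyzer output). It covers both hint values suggested earlier: with undefined it only requires Low =< High, and with an integer hint it requires Low = High = NextSeqId, which implies the weaker "both =< NextSeqId" property.

```erlang
-module(bounds_property_sketch).
-export([check_empty_index/2]).

%% BoundsFun is a hypothetical stand-in for bounds/2 on an empty index;
%% it takes only the hint because the index state is elided here.
-spec check_empty_index(fun((non_neg_integer() | undefined) ->
                               {non_neg_integer(), non_neg_integer()}),
                        non_neg_integer()) -> ok | {error, term()}.
check_empty_index(BoundsFun, NextSeqId) ->
    case {BoundsFun(undefined), BoundsFun(NextSeqId)} of
        %% NextSeqId is already bound, so the second element only matches
        %% when Low = High = NextSeqId.
        {{Low0, High0}, {NextSeqId, NextSeqId}} when Low0 =< High0 ->
            ok;
        Other ->
            {error, Other}
    end.
```

A bounds function that ignores the hint, such as fun(_) -> {0, 0} end, fails the check with {error, {{0, 0}, {0, 0}}}.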

gomoripeti commented May 5, 2025

michaelklishin requested changes May 6, 2025

gomoripeti force-pushed the faster_empty_cq_init branch from ea2e5d2 to 150172f May 7, 2025 13:00

michaelklishin added the backport-v4.1.x label May 8, 2025

michaelklishin added this to the 4.2.0 milestone May 8, 2025

michaelklishin merged commit d27d5c4 into rabbitmq:main May 8, 2025
269 of 271 checks passed

mergify bot mentioned this pull request May 8, 2025

Make empty CQ init faster in case of clean shutdown (backport #13856) #13870

Merged

michaelklishin added a commit that referenced this pull request May 8, 2025

Merge pull request #13870 from rabbitmq/mergify/bp/v4.1.x/pr-13856

b804543

Make empty CQ init faster in case of clean shutdown (backport #13856)

lhoguin reviewed May 9, 2025

gomoripeti mentioned this pull request May 22, 2025

Add tests for rabbit_classic_queue_index_v2:bounds/2 #13932

Merged

mergify bot mentioned this pull request May 23, 2025

Add tests for rabbit_classic_queue_index_v2:bounds/2 (backport #13932) #13937

Merged
