@cezarmoise (Contributor) commented Jun 11, 2025

With more nodes in the cluster, the manager may take longer to start.
The existing special case for clusters with 6+ nodes was not enough: some tests with 5 nodes can fail because the server does not start within 3 minutes.
Extending the timeout to 5 minutes should cover most cases of slow startup and leave only genuine failures to raise an exception here.
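A minimal sketch of the kind of polling wait this timeout bounds, assuming a Python helper in ccm. The names `wait_for_manager`, `is_up`, `OLD_TIMEOUT_S` and `NEW_TIMEOUT_S` are hypothetical and are not ccm's actual API; only the 3-minute and 5-minute limits come from the description above.

```python
import time
from typing import Callable

# Hypothetical constants: the PR only states that the old limit was 3 minutes
# and the new one is 5 minutes; these are not ccm's real names.
OLD_TIMEOUT_S = 3 * 60
NEW_TIMEOUT_S = 5 * 60


def wait_for_manager(
    is_up: Callable[[], bool],
    timeout_s: float = NEW_TIMEOUT_S,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll is_up() until it returns True; raise TimeoutError after timeout_s.

    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while not is_up():
        if clock() >= deadline:
            # Only a server that never comes up should reach this point.
            raise TimeoutError(f"manager did not start within {timeout_s:.0f} s")
        sleep(poll_interval_s)
```

With a fake clock, an illustrative manager that needs 200 s to come up fails under the old 180 s limit but succeeds under the 300 s one, which is the situation the 5-node failures describe.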

Example failure with 5 nodes: https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/cezar/job/dtest-byo/70/testReport/manager_restore_tests/TestScyllaMgmtRestore/FullDtest___full_split001___test_restore_different_size_cluster_rclone_5_3_/
Example failure with 4 nodes: https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/cezar/job/dtest-byo/75/testReport/manager_backup_tests/TestScyllaMgmtBackup/FullDtest___full_split001___test_shutting_down_node_during_backup_native_/

@mykaul (Contributor) commented Jun 12, 2025

I hope you've also filed a bug on manager, being slow to start!

@fruch (Contributor) commented Jun 12, 2025

> I hope you've also filed a bug on manager, being slow to start!

It has already been analyzed and explained in #622.

Manager dtests use the Scylla cluster under test for the manager itself as well,
so making the cluster bigger slows down the initial schema changes the manager performs.

The actual solution would be to refactor the tests to use a separate single-node Scylla cluster for the manager, but I don't see anyone taking that on anytime soon...

cezarmoise self-assigned this Jun 12, 2025
@cezarmoise (Contributor Author) commented Jun 16, 2025

fruch merged commit bd1fe23 into scylladb:master on Jun 16, 2025 (4 checks passed)
@fruch (Contributor) commented Jun 16, 2025

@cezarmoise you'll now need to pull the latest ccm into dtest.

cezarmoise deleted the increase-manager-start-timeout branch June 17, 2025 09:34