-
Notifications
You must be signed in to change notification settings - Fork 473
Use eatmydata in CI #33108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Use eatmydata in CI #33108
Conversation
3912933
to
6a6f301
Compare
7d08525
to
a40247e
Compare
ab0f0e9
to
e584425
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be amazing if it works reliably!
But I'm wondering about those tests that kill containers. I guess not all of them use S3, right? Where do these write their Persist data?
I have not seen any misbehavior in tests that kill containers. I could disable eatmydata for them if we are worried. |
Well, it might be that it works most of the time, but sometimes would produce mysterious CI fails. So, I'd say we need to make those tests that kill containers not use eatmydata. |
I'm not sure. Maybe it's an even better test to not have fsync in those tests and make sure Materialize can still recover? fsync is not always implemented properly in the OS/filesystem/disk. Otherwise we are baking the behavior of our specific CI agents into our assumptions (ext4 with ordered writes, reliable SSDs, etc) (Apparently the POSIX definition says fsync may just do nothing, which is kind of the motivation for Eat My Disk. Fun video, but unfortunately bad quality: https://www.youtube.com/watch?v=LMe7hf2G1po) |
Based on comparing against Nightly on main this seems to speed up by ~1 minute on average in tests. The main thing it does is use https://github.com/stewartsmith/libeatmydata to make fsync a no-op because we don't persist anything of importance in tests anyway.
The feature benchmark run does a good job showing the difference of using eatmydata:
Google Sheet: https://docs.google.com/spreadsheets/d/1ZOOZJBM0OmsyjsoEIdn1lT_crjDQUfjmMdqs04RJlbc/edit?gid=2146535294#gid=2146535294
In https://buildkite.com/materialize/nightly/builds/12652#01983260-bac3-452a-ae1e-db06ddb71b08
This actually makes me wonder: Are we maybe fsyncing too much? If we are only writing temporary state that a restarted Materialize won't use, maybe we should use eatmydata in production? :D
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.