@varmint708
Contributor

Possible race condition when a file is being updated by a different process while chkbit is simultaneously checksumming it. (Discovered and tested on macOS, where it gave me quite a scare; it might need some testing on other platforms.)

@laktak
Owner

laktak commented Oct 12, 2025

Thanks for looking into this but AFAIK this is not a problem you can actually fix, at least not in this way.

Backups expect you not to write to the files you are backing up. If you do, you will get inconsistent data. Imagine you back up part 1 of a database, then someone writes to part 1 before you back up the rest. This is not much different for chkbit when we calculate the checksum.
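
To make the "part 1 / the rest" scenario concrete, here is a minimal Python sketch of a torn read. It is an illustration only, not chkbit code: the names `torn_read_digest` and `demo` are hypothetical, SHA-256 stands in for whatever hash is actually configured, and the concurrent writer is simulated by overwriting the file between two reads.

```python
import hashlib
import os
import tempfile


def sha256_bytes(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def torn_read_digest(path: str, new_content: bytes) -> str:
    """Hash `path` in two halves, overwriting the file between the two reads."""
    h = hashlib.sha256()
    half = os.path.getsize(path) // 2
    # buffering=0 so the first read does not prefetch the rest of the file.
    with open(path, "rb", buffering=0) as f:
        h.update(f.read(half))            # reader hashes part 1 (old data)
        with open(path, "r+b") as w:      # simulated writer from another process
            w.write(new_content)
        h.update(f.read())                # reader hashes the rest (new data)
    return h.hexdigest()


def demo() -> tuple[str, str, str]:
    """Return (digest of old content, digest of new content, torn digest)."""
    old = b"A" * 1024
    new = b"B" * 1024
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        with open(path, "wb") as f:
            f.write(old)
        torn = torn_read_digest(path, new)
    return sha256_bytes(old), sha256_bytes(new), torn
```

The torn digest matches neither the old nor the new content, because it covers half of each. A checksum tool that hits this window records a hash for a file state that never existed on disk.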

For backups on modern filesystems this can be solved with snapshots. I haven't looked into doing this with chkbit.

How did you test your fix?

One workaround might be to get the mtime before and after calculating the checksum and then rerun the checksum if the mtime has changed. Though even then it depends a lot on the filesystem and how often the mtime is updated.
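
The mtime workaround could be sketched roughly as follows. This is a Python illustration under stated assumptions, not the chkbit implementation: `stable_digest`, `max_attempts`, and `chunk_size` are hypothetical names, SHA-256 is a stand-in hash, and comparing the file size in addition to the mtime is an extra guard that was not part of the suggestion.

```python
import hashlib
import os


def stable_digest(path: str, max_attempts: int = 3, chunk_size: int = 1 << 20):
    """Return the SHA-256 hex digest of `path`, or None if it kept changing.

    The file is hashed again whenever its mtime (or size) differs between
    the stat taken before hashing and the stat taken after hashing.
    """
    for _ in range(max_attempts):
        before = os.stat(path)
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        after = os.stat(path)
        unchanged = (before.st_mtime_ns, before.st_size) == (
            after.st_mtime_ns,
            after.st_size,
        )
        if unchanged:
            return h.hexdigest()
    # Caller decides what to do: skip the file, warn, or retry later.
    return None
```

As noted above, this only narrows the window. A write that lands within the filesystem's timestamp granularity can leave the mtime unchanged, so a matching mtime does not prove the file was stable during the read.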

@varmint708
Contributor Author

> Thanks for looking into this but AFAIK this is not a problem you can actually fix, at least not in this way.

You are right; looking at it again, it doesn't fix it. I will close this PR.

> Backups expect you not to write to the files you are backing up. If you do, you will get inconsistent data. Imagine you back up part 1 of a database, then someone writes to part 1 before you back up the rest. This is not much different for chkbit when we calculate the checksum.

Agreed. I have a few local processes writing simultaneously, so I should probably rethink my design there.

> For backups on modern filesystems this can be solved with snapshots. I haven't looked into doing this with chkbit.

OK, I quickly tested this theory: I took a tmutil snapshot, mounted it under /tmp/, and ran chkbit on it, and that solved it for now. I checked backup utilities and they also prefer taking a snapshot first, so it could be a good idea to add this :). Thanks.

> How did you test your fix?

I have a local process that continuously writes to disk in a specific folder. If I run chkbit on this folder, it sometimes reports DMG, but if I rsync the folder to another location first and run chkbit there, all is good; running it on the backup disk also shows no damage. So the test was simple: rerun chkbit in this folder 10 to 12 times and see whether it reports DMG.
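
A standalone version of this "rerun it several times" test might look like the Python sketch below. It does not call chkbit and does not reproduce how chkbit decides on DMG; it simply hashes every file under a folder repeatedly and lists the files whose digest changed between runs. The names `file_digest` and `unstable_files` are hypothetical, and SHA-256 is a stand-in hash.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def unstable_files(root: Path, runs: int = 10) -> list[Path]:
    """Hash every file under `root` `runs` times; return files whose digest changed."""
    first: dict[Path, str] = {}
    changed: set[Path] = set()
    for _ in range(runs):
        for path in sorted(p for p in root.rglob("*") if p.is_file()):
            try:
                digest = file_digest(path)
            except OSError:
                # The file vanished or became unreadable between listing and reading.
                continue
            # Remember the first digest seen; flag the file if a later run differs.
            if first.setdefault(path, digest) != digest:
                changed.add(path)
    return sorted(changed)
```

Run against the folder that is being written to, this should list the actively updated files; run against an rsync copy or a mounted snapshot, it should return an empty list, which matches the behaviour described above.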

> One workaround might be to get the mtime before and after calculating the checksum and then rerun the checksum if the mtime has changed. Though even then it depends a lot on the filesystem and how often the mtime is updated.

Agreed, this may alleviate the problem a bit, so we can try this before snapshotting.

@varmint708 varmint708 closed this Oct 12, 2025
varmint708 added a commit to varmint708/chkbit that referenced this pull request Oct 14, 2025
Check that the mtime before and after hash calculation matches. Not sure how bad this is going to be from a performance point of view, but this is a temporary fix as discussed in PR 35 here: laktak#35
@varmint708 varmint708 deleted the fix-race-condition branch October 14, 2025 22:08