
Check integrity early for smaller files #164


Open
JoeyBF wants to merge 2 commits into master

Conversation

JoeyBF (Collaborator) commented Aug 11, 2024

This makes the save mechanism sturdier by catching malformed files early and overwriting them. Note that quasi-inverses are the only files that we stream in without reading completely, because they are so massive; other files fit in memory easily, so it's not a problem to load them early.

Compare with #107
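
A minimal sketch of the early-check idea, assuming an illustrative magic-byte header (the constant and the helper name below are not the actual save format used by this PR):

use std::io::{Error, ErrorKind, Result};
use std::path::Path;

// Illustrative magic bytes; the real save files use a different header.
const MAGIC: &[u8; 4] = b"SAVE";

// Read a small save file fully and validate it up front. If it is malformed,
// remove it and report an error so that the caller regenerates the data.
fn read_checked(path: &Path) -> Result<Vec<u8>> {
    let contents = std::fs::read(path)?;
    if !contents.starts_with(MAGIC) {
        std::fs::remove_file(path)?;
        return Err(Error::new(ErrorKind::InvalidData, "malformed save file"));
    }
    Ok(contents)
}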

JoeyBF (Collaborator, Author) commented Oct 23, 2024

Any feedback for this PR?

Comment on lines +311 to +312
std::fs::remove_file(&path)
.unwrap_or_else(|e| panic!("Error when deleting {path:?}: {e}"));
Contributor
Perhaps you should instead move these to a corrupted-files directory? It might be useful to be able to look at them to find out what went wrong; as it stands, this just destroys the evidence.

Collaborator Author
Sure, that sounds like a good idea. Or maybe rename them with a ".old" suffix, and silently delete the previous .old version if there is one.
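
A minimal sketch of the rename idea, assuming the ".old" convention suggested above (illustrative only, not code from this PR):

use std::io::Result;
use std::path::{Path, PathBuf};

// Set a malformed save file aside as `<name>.old` instead of deleting it,
// silently dropping any previous `.old` copy first.
fn quarantine(path: &Path) -> Result<()> {
    let mut backup = path.as_os_str().to_owned();
    backup.push(".old");
    let backup = PathBuf::from(backup);
    if backup.exists() {
        std::fs::remove_file(&backup)?;
    }
    std::fs::rename(path, &backup)
}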

Comment on lines +440 to +445
fn should_check_early(&self) -> bool {
!matches!(
self.kind,
SaveKind::AugmentationQi | SaveKind::NassauQi | SaveKind::ResQi
)
}
Contributor

This makes sense as an improvement over the status quo, but there may be a better design of the file format that would allow checking only the section of the file that is actually used. For instance, you could place a checksum every 4 kB or imitate the zip file format. Another possibility is that there is an existing container format that is appropriate and that could make the checksums transparent to our code.
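
To illustrate the per-block idea (purely a sketch; DefaultHasher stands in for a real checksum such as CRC32 or xxHash, and nothing here is from this repo):

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::io::{self, Write};

// Write `data` in 4 KiB blocks, each followed by an 8-byte hash of the block,
// so a reader can verify only the blocks it actually touches.
fn write_with_block_checksums<W: Write>(mut writer: W, data: &[u8]) -> io::Result<()> {
    for block in data.chunks(4096) {
        let mut hasher = DefaultHasher::new();
        block.hash(&mut hasher);
        writer.write_all(block)?;
        writer.write_all(&hasher.finish().to_le_bytes())?;
    }
    Ok(())
}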

Collaborator Author

I looked around and didn't see a generic container that would do that for us, but one interesting option would be to always output zstd-compressed data. Zstd has options to do frame-level and block-level checksumming, and we already support reading compressed files.

Also, I just checked with our stem 200 data and, apart from the quasi-inverses, the biggest files only take a few MB, with the vast majority taking less than 1 kB. In fact, the first thing that the code does after opening those files is reading the entire contents, so we're really just prefetching.
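
A minimal sketch of the zstd option, assuming the zstd crate's include_checksum setting (the compression level and function name are illustrative):

use std::io::{self, Write};

// Write `data` as a zstd frame with an embedded content checksum. Decoding
// then fails if the file was truncated or corrupted, so integrity checking
// comes for free when we read the file back.
fn write_compressed<W: Write>(writer: W, data: &[u8]) -> io::Result<W> {
    let mut encoder = zstd::stream::write::Encoder::new(writer, 3)?;
    encoder.include_checksum(true)?;
    encoder.write_all(data)?;
    encoder.finish()
}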

hoodmane (Contributor)
Thanks!
