-
Notifications
You must be signed in to change notification settings - Fork 84
fix: xmp jpeg write #1156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: xmp jpeg write #1156
Conversation
Adding the manifest url to large XMP segments could cause the XMP data to exceed the limit of 65503 bytes. There were 3 causes of this issue: 1. Extra whitespace from formatting. The quick_xml writer shiftwidth could be much larger than that of the original. Potentially adding several KB of whitespace. 2. Manifest url added multiple times. The manifest url was added to multiple keys within the XMP data. 3. Lack of padding awareness. According to the spec, XMP segments should have 2KB to 4KB of padding. The padding should be adjusted to keep the XMP data length the same with the addition of the remote manifest url. This is important for data hashes as it keeps the offsets of the JPEG metadata the same.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1156 +/- ##
==========================================
+ Coverage 79.29% 79.31% +0.02%
==========================================
Files 147 147
Lines 40701 40743 +42
==========================================
+ Hits 32274 32316 +42
Misses 8427 8427 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
@@ -620,7 +620,7 @@ impl RemoteRefEmbed for JpegIO { | |||
.unwrap_or((None, MIN_XMP)); | |||
|
|||
// add provenance and JPEG XMP prefix | |||
let xmp = format!("{XMP_SIGNATURE}\0 {}", add_provenance(xmp, &manifest_uri)?); | |||
let xmp = format!("{XMP_SIGNATURE}\0{}", add_provenance(xmp, &manifest_uri)?); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This extra space caused there to be a leading space in the XMP data
sdk/src/utils/xmp_inmemory_utils.rs
Outdated
// Minimal padding should be 2 KB to 4 KB. This is used if no XMP packet is found. | ||
let mut target_length = 4096; | ||
// Remove trailing xpacket if present for easier manipulation | ||
let xpacket_end_single = "<?xpacket end='w'?>"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Single quotes are what the spec has, but we specifically test for double quotes. The test images from cameras also use double quotes.
<?xpacket end="w"?>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be more flexible. You may also encounter end='r'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we could be a good citizen and respect the 'r' which means read only.
@@ -130,13 +149,14 @@ fn add_xmp_key(xmp: &str, key: &str, value: &str) -> Result<String> { | |||
if !added { | |||
// didn't exist, so add it | |||
elem.push_attribute((key, value)); | |||
added = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lack of added = true
is why the remote manifests url was being added multiple times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for tracking this down!
sdk/src/utils/xmp_inmemory_utils.rs
Outdated
)] | ||
#[cfg_attr(target_os = "wasi", wstd::test)] | ||
async fn test_broken_xmp_read_write_stream() { | ||
let source_bytes = include_bytes!("../../tests/fixtures/broken.jpg"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we rename the file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not our file so I don't think we can include it in a unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's too bad. It would be cool to have a picture from summit in the repo. :-).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can ask Jen if we can use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We try to keep our repo images small, this one is over 6mb.
sdk/src/utils/xmp_inmemory_utils.rs
Outdated
let xpacket_end_double = "<?xpacket end=\"w\"?>"; | ||
let xmp_body = if let Some(pos) = xmp.rfind(xpacket_end_single) { | ||
target_length = orig_length - xpacket_end_single.len(); | ||
xmp[..pos].trim_end() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I commented above, I would just search for "<?xpacket" since the rest of the tag may vary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely need to be flexible reading here. I've also found cases where the end just wasn't there and other tools think it is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the end is completely missing then I can't just look for <?xpacket
since that could be the beginning. The test add_xmp
does this.
xmp[..pos].trim_end() | ||
} else if let Some(pos) = xmp.rfind(xpacket_end_double) { | ||
target_length = orig_length - xpacket_end_double.len(); | ||
xmp[..pos].trim_end() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change I suggested about removes the else if case.
@@ -177,6 +198,8 @@ fn add_xmp_key(xmp: &str, key: &str, value: &str) -> Result<String> { | |||
} | |||
} | |||
} | |||
let padding_length = target_length.saturating_sub(writer.get_ref().get_ref().len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this take into account the existing padding to keep the length the same if possible? This seems like it would not take into account the existing padding. Is there somewhere you account to the existing padding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Line 107
target_length = orig_length - xpacket_end_length;
3fa75eb
to
c3710b7
Compare
Changes in this pull request
Adding the manifest url to large XMP segments could cause the XMP data
to exceed the limit of 65503 bytes.
There were 3 causes of this issue:
could be much larger than that of the original. Potentially adding
several KB of whitespace.
multiple keys within the XMP data.
have 2KB to 4KB of padding. The padding should be adjusted to keep
the XMP data length the same with the addition of the remote manifest
url. This is important for data hashes as it keeps the offsets of the
JPEG metadata the same.
Checklist
TO DO
items (or similar) have been entered as GitHub issues and the link to that issue has been included in a comment.