Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An example of XMP metadata that triggers the error "Error writing PNG: XMP metadata contains an invalid null character" in avifpng.c #1333

Open
wantehchang opened this issue Apr 11, 2023 · 23 comments
Assignees

Comments

@wantehchang
Copy link
Collaborator

Yannis: Please take a look at the XMP metadata in this image and see if avifpng.c is rejecting it correctly.

Steps to reproduce:

  1. Download moon.jpg from https://crbug.com/1405727.
  2. avifenc moon.jpg moon.avif
  3. avifdec moon.avif moon.png

The avifdec command fails with the error message:

Error writing PNG: XMP metadata contains an invalid null character

@wantehchang
Copy link
Collaborator Author

Running avifdec in the debugger, I found that the null character is at the end of the avif->xmp.data buffer.

@wantehchang
Copy link
Collaborator Author

The XMP specification says "When XMP is encoded as UTF-8, there are no zero bytes in the XMP Packet." It seems that the XMP metadata in moon.jpg has an XMP packet that is incorrectly null-terminated.

@y-guyon
Copy link
Collaborator

y-guyon commented Apr 12, 2023

Error writing PNG: XMP metadata contains an invalid null character

comes from

// The iTXt XMP payload may not contain a zero byte according to section 4.2.3.3 of
// the PNG specification, version 1.2.
if (memchr(avif->xmp.data, '\0', avif->xmp.size)) {
fprintf(stderr, "Error writing PNG: XMP metadata contains an invalid null character\n");
goto cleanup;

I assume the issue is that avifenc can generate a file that avifdec refuses. Solutions I can think of:

  • Truncate the XMP chunk during PNG import when it ends by a null character (suggested solution)
  • Truncate the XMP chunk during PNG export when it ends by a null character
  • Truncate the XMP chunk during AVIF export when it ends by a null character
  • Remove the XMP chunk during PNG import when it contains a null character
  • Remove the XMP chunk during PNG export when it contains a null character
  • Remove the XMP chunk during AVIF export when it contains a null character
  • Replace null characters in XMP chunk by spaces or newlines or some other delimiter during AVIF export
  • Remove the check during PNG export
  • Add the check during PNG import

Opinions?

@kmilos
Copy link
Contributor

kmilos commented Apr 12, 2023

IMHO, it makes sense to try and sanitize the XMP chunk at the point of import, from any format (this is not tied to PNG spec only, this is in the XMP spec), so you don't even produce an "invalid" AVIF in the first place. I think this is the approach we took in exiv2 as well as libexpat was throwing a similar error and refused to parse it from a sample JPEG...

@y-guyon
Copy link
Collaborator

y-guyon commented Apr 12, 2023

@kmilos Thanks for the info. Do you have any pointer to exiv2 or libexpat code implementing what you described?

@kmilos
Copy link
Contributor

kmilos commented Apr 12, 2023

Fond it - take a look at Exiv2/exiv2#2133 and the related issue report.

libexpat just assumes its input is not terminated and simply barfs if it is, it has no workarounds for it. I remember even looking into to the libexpat code to convince myself that the nul character was explicitly in the list of invalid XML characters they check for...

@wantehchang
Copy link
Collaborator Author

We need to first make sure the terminating null byte comes from moon.jpg rather than our code in avifJPEGRead(). That is, we should verify that libjpeg does not null-terminate the data in marker->data and include the appended null byte in marker->data_length. I believe this is the case, but it would be good for another person to doublecheck it.

Once we know avifJPEGRead() is correct, then we change it to handle this kind of XMP metadata error. We can either report this error or fix up this error. If we fix up this error, I suggest we only strip the null byte at the end because it is very likely a programming error. If a null byte occurs in the middle of an XMP packet, that is more likely corrupted data and we should not try to fix that up.

We can leave the check during PNG export unchanged.

@kmilos
Copy link
Contributor

kmilos commented Apr 12, 2023

This has nothing to do w/ libjpeg per se, who knows how that original file was created....

@jzern
Copy link
Collaborator

jzern commented Apr 12, 2023

As noted, exiv2 (0.27.5) fails to parse the file:

$ exiv2 -p x print moon.jpg 
Error: XMP Toolkit error 201: Error in XMLValidator
Warning: Failed to decode XMP metadata.

exempi (version 2.6.3) will dump the packet, however.

$ exempi -x moon.jpg 
dump_xmp for file moon.jpg
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Exempi + XMP Core 6.0.0">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
...

@kmilos
Copy link
Contributor

kmilos commented Apr 12, 2023

As noted, exiv2 (0.27.5) fails to parse the file:

Yep, the fix is in exiv2 0.27.6

Btw, exiftool -v5 can also provide insight into all of the chunks for most formats, and for moon.jpg I do see the culprit in the file:

JPEG APP1 (3629 bytes):
    011c: 68 74 74 70 3a 2f 2f 6e 73 2e 61 64 6f 62 65 2e [http://ns.adobe.]
    012c: 63 6f 6d 2f 78 61 70 2f 31 2e 30 2f 00 3c 3f 78 [com/xap/1.0/.<?x]
    013c: 70 61 63 6b 65 74 20 62 65 67 69 6e 3d 22 ef bb [packet begin="..]
    [snip]
    0f2c: 20 20 20 20 20 20 20 20 20 3c 3f 78 70 61 63 6b [         <?xpack]
    0f3c: 65 74 20 65 6e 64 3d 22 77 22 3f 3e 00          [et end="w"?>.]

@y-guyon
Copy link
Collaborator

y-guyon commented Apr 13, 2023

We need to first make sure the terminating null byte comes from moon.jpg rather than our code in avifJPEGRead(). That is, we should verify that libjpeg does not null-terminate the data in marker->data and include the appended null byte in marker->data_length. I believe this is the case, but it would be good for another person to doublecheck it.

I confirm that at the following point:

avifImageSetMetadataXMP(avif, standardXMPData, standardXMPSize);

standardXMPData[standardXMPSize-1] is 0 when reading moon.jpg and 62 when reading paris_exif_xmp_icc.jpg.

@wantehchang
Copy link
Collaborator Author

Miloš, Yannis: Thank you very much for confirming our code is correct and the error is in the moon.jpg file.

Should we report an error if an XMP packet is not valid UTF-8? Or just check if the XMP packet contains a zero byte (similar to the check in avifpng.c)?

Our error message can point to tools that can be used to inspect and correct the errors in the XMP packet.

@y-guyon
Copy link
Collaborator

y-guyon commented Apr 14, 2023

The XMP specification says "When XMP is encoded as UTF-8, there are no zero bytes in the XMP Packet."

I found that sentence only in XMP spec Part 3 section 1.1.2 GIF (Graphic Interchange Format). Does it apply to any XMP chunk in any format (including JPEG)? The PNG spec forbids it too, I did not check other formats.

Should we report an error if an XMP packet is not valid UTF-8? Or just check if the XMP packet contains a zero byte (similar to the check in avifpng.c)?

I started writing a change involving a new avifFixImageXMP() in avifutil.h, that is called from avifJPEGRead() and avifPNGRead(). It just trims trailing \0s. It could also be called during AVIF decoding.
If \0 is forbidden in XMP chunks in JPEG and in AVIF, we can add a check at AVIF encoding. Otherwise, I suggest we just leave it like that: carry any non-trailing \0 over to AVIF and JPEG exports.

@kmilos
Copy link
Contributor

kmilos commented Apr 14, 2023

I started writing a change involving a new avifFixImageXMP() in avifutil.h, that is called from avifJPEGRead() and avifPNGRead().

I feel you should do it at a single/common entry point (e.g. avifImageSetMetadataXMP() or some such), not for every possible image format (easy to forget when new ones are added)...

If \0 is forbidden in XMP chunks in JPEG and in AVIF,

It is forbidden in UTF-8 XML period, nothing to do w/ images. As I said, libexpat just flatly refuses to parse null-terminated XML, no matter where it comes from. From https://en.wikipedia.org/wiki/Valid_characters_in_XML:

Note that the code point U+0000, assigned to the null control character, is the only character encoded in Unicode and ISO/IEC 10646 that is always invalid in any XML 1.0 and 1.1 document.

@y-guyon
Copy link
Collaborator

y-guyon commented Apr 14, 2023

I feel you should do it at a single/common entry point (e.g. avifImageSetMetadataXMP() or some such), not for every possible image format (easy to forget when new ones are added)...

I considered that, but wondered whether XMP chunks provided by users through the API should be altered. For example, someone setting 100 bytes of XMP, then accessing avifImage.xmp.data[99]. If the behavior is described in avif.h, I guess that is fine.
avifImageSetMetadataXMP() could also check for non-trailing \0 and return an avifBool.
The main issue I had with this approach is that there is currently no single entry point. Extended XMP in avifJPEGRead() does not use avifImageSetMetadataXMP().

not for every possible image format (easy to forget when new ones are added)...

I agree. Although it is just a few formats in libavif (for now...).

It is forbidden in UTF-8 XML period, nothing to do w/ images.

Great, this simplifies the fix. Thanks for the reference.

@wantehchang
Copy link
Collaborator Author

wantehchang commented Apr 15, 2023

Yannis: Since I filed this issue, I wanted to clarify that after confirming our code is correct and the error is in the moon.jpg file, I think our current policy is fine (importing metadata faithfully, and only checking for errors that break the output format, such as zero bytes for a PNG iTXt chunk). As the bug reporter, I think it is okay to close this issue.

If you'd like to remove trailing null characters on import, I suggest we remove only one (to handle the off-by-one bug of including the terminating null character of a null-terminated string). Fixing up errors will hide them and make it harder to track down the source of the errors, so if an extra terminating null character is not a common error, we don't really need to fix it up.

@kmilos
Copy link
Contributor

kmilos commented Apr 15, 2023

Well, then the bug is that you only check/warn for PNG, while it should be for all output formats.

@y-guyon
Copy link
Collaborator

y-guyon commented Apr 17, 2023

I think an avifenc success followed by an avifdec error is a concern and should be fixed. Either both should succeed or both should fail. avifenc should not generate files that avifdec cannot convert to a widespread format such as PNG. I suggest avifenc to choke on \0, because we already check Exif. The alternative would be to make avifdec succeed with a warning such as The XMP chunk was discarded because it contains an invalid character.

In parallel, removing a single trailing null character on any import seems acceptable.

@kmilos
Copy link
Contributor

kmilos commented Apr 17, 2023

As an alternative, you could just not fail at all on either avifdec or avifenc, and just copy the XMP chunk around as is, and defer the problem to other 3rd party apps, with preferably just a "YMMV" style warning given to the user if this condition was detected.

Or add an avifenc option to chomp invalid XML if keeping the original is imperative (or the other way around - default to valid output, but give an option to keep original)...

@StefanOltmann
Copy link

I found this topic very interesting!

I encountered the same issue when I discovered that the pre-installed files IMG_0001.jpg and IMG_0002.jpg in the iOS simulator also have the illegal NUL terminator.

Initially, I considered whether the issue should be addressed in https://github.com/Ashampoo/kim or not. However, I realized that even the command "exiftool -xmp -b IMG_0001.JPG > out.xmp" exports the NUL character at the end. Therefore, it seems correct that the XMP string is exported as is - with all errors.

In the end, I made the decision that the proper approach would be to handle this issue within the XMP parser. I fixed it in my case in https://github.com/Ashampoo/xmpcore/pull/4/files.

I want to express my gratitude for all the insights shared during this discussion. Thank you!

@kmilos
Copy link
Contributor

kmilos commented Jul 10, 2023

However, I realized that even the command "exiftool -xmp -b IMG_0001.JPG > out.xmp" exports the NUL character at the end. Therefore, it seems correct that the XMP string is exported as is - with all errors.

Not quite. Note that the -b option just dumps the raw byte stream without any parsing, it doesn't care/know if it's XML or not. If exiftool were to parse & copy instead, it might drop it...

@StefanOltmann
Copy link

Not quite. Note that the -b option just dumps the raw byte stream without any parsing, it doesn't care/know if it's XML or not. If exiftool were to parse & copy instead, it might drop it...

Yes, that's correct. If ExifTool parses the actual XMP and writes new it looks very different.

In my case I have two libraries: Kim can read EXIF and extract the raw XMP string and XMP Core is for actual parsing of XMP.

At first I was unsure where I should fix the problem and wanted Kim to output a trimmed XMP string. But after seeing that ExifTool extracts the untouched original I decided to do the same. The XMP parser must be the one resilient to this problems.

ExifTool ignores the invalid char if parsing the data, but it exports the RAW data exactly as it is. I think this is the correct behavior.

@StefanOltmann
Copy link

I just noticed that also the XMP of the embedded preview JPG of my RAF files produced by my X-T4 have NUL characters at the end. In that case two.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants