Provide a SHA256 hash of file content #4195
SHA256 can be embedded in PDF.js only if it becomes part of digital signature verification (see #1076). Implementing it as you describe above sounds like a custom solution, and it does not match the PDF32000 specification.
Using a GPL-licensed library would require releasing the overall solution under the GPL, which we cannot do (the combination would be usable only by GPL projects).
It would be best if the author contributed the code to the project under the Apache 2 license (though I am not sure we need all the algorithms). PDF32000 lists only SHA1, SHA256, SHA384, SHA512 and RIPEMD160 (and MD5).
Yes, this is a custom solution for knowing whether a file opened in PDF.js is exactly the same (byte for byte) as when it was opened at some earlier time. Since you already provide a custom checksum function, I do not see why this could not be implemented as well. Perhaps the custom checksum function could be extended to accept an optional parameter that selects the algorithm: SHA256, your current one (the default), or something else.
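A minimal sketch of the optional-algorithm idea, assuming the browser's Web Crypto API (`crypto.subtle.digest`) rather than any PDF.js internals. `computeChecksum` and `ChecksumAlgorithm` are hypothetical names. Web Crypto offers SHA-1, SHA-256, SHA-384 and SHA-512 but not MD5 or RIPEMD-160, so only those four entries from the PDF32000 list appear. Unlike the proposal, where the existing identifier stays the default, this sketch simply defaults to SHA-256.

```typescript
// Digest algorithms supported by the browser's Web Crypto API (crypto.subtle).
// MD5 and RIPEMD-160, also listed in PDF32000, are not available there.
type ChecksumAlgorithm = "SHA-1" | "SHA-256" | "SHA-384" | "SHA-512";

// Hypothetical checksum entry point with an optional algorithm parameter.
// It defaults to SHA-256 here purely for simplicity.
async function computeChecksum(
  data: BufferSource,
  algorithm: ChecksumAlgorithm = "SHA-256",
): Promise<string> {
  const digest = new Uint8Array(await crypto.subtle.digest(algorithm, data));
  // Lower-case hex, two characters per byte.
  let hex = "";
  for (let i = 0; i < digest.length; i++) {
    hex += ("0" + digest[i].toString(16)).slice(-2);
  }
  return hex;
}
```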
What custom checksum function? If you mean https://github.com/mozilla/pdf.js/blob/master/src/core/core.js#L488 , then it's a pseudo-unique PDF identifier (for corrupted PDF documents). One requirement for this function was that it must not download the entire document. What you are asking for means downloading all of the data, which is unacceptable for our use case.
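The distinction drawn here can be made concrete with a toy model. It assumes an identifier that hashes only a fixed-size prefix of the file; `PREFIX_BYTES = 1024` and the use of SHA-256 (via Web Crypto) for both functions are illustrative choices, not a description of what core.js actually does. Two files that differ only after the prefix get the same identifier but different full-content hashes, which is why a cheap identifier cannot double as a byte-for-byte integrity check.

```typescript
// Illustrative prefix size: the identifier never looks past this many bytes.
const PREFIX_BYTES = 1024;

// SHA-256 via the browser's Web Crypto API, returned as lower-case hex.
async function sha256Hex(data: Uint8Array): Promise<string> {
  const digest = new Uint8Array(await crypto.subtle.digest("SHA-256", data));
  let hex = "";
  for (let i = 0; i < digest.length; i++) {
    hex += ("0" + digest[i].toString(16)).slice(-2);
  }
  return hex;
}

// Pseudo-unique identifier: only the first PREFIX_BYTES bytes are needed,
// so a viewer could compute it without downloading the whole file.
function prefixIdentifier(file: Uint8Array): Promise<string> {
  return sha256Hex(file.subarray(0, PREFIX_BYTES));
}

// Integrity check: needs every byte of the file.
function fullContentHash(file: Uint8Array): Promise<string> {
  return sha256Hex(file);
}
```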
I am saying that by default
Or, what about a way to plug in custom fingerprint functions from outside? This could be a very simple API where you assign a function to some API object, like
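The plug-in idea could look roughly like a small registry of named fingerprint functions. Everything here (`FingerprintRegistry`, `register`, `compute`, and the `byteLength` example in the usage) is hypothetical and not part of PDF.js.

```typescript
// Signature assumed for a pluggable fingerprint function.
type FingerprintFn = (data: Uint8Array) => string;

// Hypothetical registry that outside code could populate.
class FingerprintRegistry {
  private readonly fns = new Map<string, FingerprintFn>();

  // Add or replace a named fingerprint function.
  register(name: string, fn: FingerprintFn): void {
    this.fns.set(name, fn);
  }

  // Run the named function; fail loudly if nothing was registered.
  compute(name: string, data: Uint8Array): string {
    const fn = this.fns.get(name);
    if (fn === undefined) {
      throw new Error(`No fingerprint function registered as "${name}"`);
    }
    return fn(data);
  }
}
```

For example, `registry.register("byteLength", (d) => String(d.length))` followed by `registry.compute("byteLength", bytes)`. Two design caveats: Web Crypto digests are promise-based, so a realistic hook would probably return `Promise<string>`; and functions cannot be sent to a Web Worker with `postMessage`, so a hook registered on the main thread could not run inside the PDF.js worker. It would either have to run on the main thread on bytes obtained from the worker, or be loaded into the worker script itself.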
That's out of scope for this project. You can easily incorporate the solution above into your build process, e.g. by applying a simple patch to core.js. As mentioned in #1076, we will be glad to accept a pull request that implements (at least partially) digital signature verification. Closing as won't fix for now.
We are creating an open source cloud service (http://peerlibrary.org) where users can import PDFs and have them displayed with PDF.js. Because users can request to load external PDFs, we need to know whether a PDF loaded from an external URL is the same as the one originally imported, i.e. to verify that the file has not changed. Currently we load the file and compute its SHA256 hash first to verify it, and then open it with PDF.js if it passes. It would be much better if we could use PDF.js directly for this, so that we can reuse all of its worker and PDF transmission capabilities. From what I see, this would be easy to add: simply add another API call (a message to the worker), which then uses GetData to get the data and compute the hash. Would core PDF.js be interested in this if I make a pull request?

We tested many implementations of the SHA256 hash function, and digest.js seems fastest because it uses typed arrays. Would using that library be OK for PDF.js? Or would it restrict too much the set of browsers PDF.js wants to support?
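The typed-array approach can be illustrated with a compact SHA-256 built on `Uint8Array`, `Uint32Array` and `DataView`, written directly from the FIPS 180-4 description. It is not derived from digest.js, has not been benchmarked against it, and hashes the whole input in a single call; `sha256Bytes` and `sha256HexTyped` are illustrative names. In current browsers, `crypto.subtle.digest("SHA-256", data)` yields the same digest natively, but asynchronously.

```typescript
// SHA-256 round constants (FIPS 180-4, section 4.2.2).
const SHA256_K = new Uint32Array([
  0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
  0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
  0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
  0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
  0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
  0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
  0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
  0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
]);

// Initial hash value (FIPS 180-4, section 5.3.3).
const SHA256_H0 = new Uint32Array([
  0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
]);

// 32-bit right rotation. The result may be a negative int32; every sum below
// is reduced with >>> 0 or stored in a Uint32Array, both of which wrap mod 2^32.
function rotr32(x: number, n: number): number {
  return (x >>> n) | (x << (32 - n));
}

function sha256Bytes(message: Uint8Array): Uint8Array {
  // Padding: 0x80, zero bytes, then the 64-bit big-endian bit length,
  // making the total length a multiple of 64 bytes.
  const paddedLength = Math.ceil((message.length + 9) / 64) * 64;
  const block = new Uint8Array(paddedLength);
  block.set(message);
  block[message.length] = 0x80;
  const view = new DataView(block.buffer);
  const bitLength = message.length * 8;
  view.setUint32(paddedLength - 8, Math.floor(bitLength / 0x100000000));
  view.setUint32(paddedLength - 4, bitLength >>> 0);

  const h = new Uint32Array(SHA256_H0);
  const w = new Uint32Array(64);
  for (let offset = 0; offset < paddedLength; offset += 64) {
    // Message schedule: 16 big-endian words, expanded to 64.
    for (let t = 0; t < 16; t++) w[t] = view.getUint32(offset + t * 4);
    for (let t = 16; t < 64; t++) {
      const s0 = rotr32(w[t - 15], 7) ^ rotr32(w[t - 15], 18) ^ (w[t - 15] >>> 3);
      const s1 = rotr32(w[t - 2], 17) ^ rotr32(w[t - 2], 19) ^ (w[t - 2] >>> 10);
      w[t] = w[t - 16] + s0 + w[t - 7] + s1;
    }
    // Compression: 64 rounds over the eight working variables.
    let a = h[0], b = h[1], c = h[2], d = h[3];
    let e = h[4], f = h[5], g = h[6], hh = h[7];
    for (let t = 0; t < 64; t++) {
      const S1 = rotr32(e, 6) ^ rotr32(e, 11) ^ rotr32(e, 25);
      const ch = (e & f) ^ (~e & g);
      const t1 = (hh + S1 + ch + SHA256_K[t] + w[t]) >>> 0;
      const S0 = rotr32(a, 2) ^ rotr32(a, 13) ^ rotr32(a, 22);
      const maj = (a & b) ^ (a & c) ^ (b & c);
      const t2 = (S0 + maj) >>> 0;
      hh = g; g = f; f = e; e = (d + t1) >>> 0;
      d = c; c = b; b = a; a = (t1 + t2) >>> 0;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d;
    h[4] += e; h[5] += f; h[6] += g; h[7] += hh;
  }

  // Serialize the eight state words big-endian.
  const digest = new Uint8Array(32);
  const out = new DataView(digest.buffer);
  for (let i = 0; i < 8; i++) out.setUint32(i * 4, h[i]);
  return digest;
}

// Lower-case hex form of the digest.
function sha256HexTyped(message: Uint8Array): string {
  const digest = sha256Bytes(message);
  let hex = "";
  for (let i = 0; i < digest.length; i++) hex += ("0" + digest[i].toString(16)).slice(-2);
  return hex;
}
```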
Additionally, the library uses GPLv3, which is incompatible with the Apache 2 license of PDF.js. I can try to obtain permission from the author to include it under the Apache 2 license (a single author has contributed all of the code).
If not, is there currently an easy way to extend PDF.js with an additional API method from outside, without modifying the code (perhaps just by extending prototypes)? Is there any plugin architecture in place?
We would like to use a secure hash function to be assured that the file has not been maliciously changed after the initial import, so we would rather not use an MD5 hash. Our service might be used for sensitive content someday.
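Without patching PDF.js, a similar check can be layered on top of its public API. This sketch relies only on a `getData()` method that resolves to the file's bytes (a loaded `PDFDocumentProxy` provides one) and on Web Crypto's SHA-256. `BytesSource`, `digestToHex` and `matchesStoredSha256` are hypothetical names, and comparing against a stored hex digest is an assumed storage format.

```typescript
// Anything whose getData() resolves to the raw file bytes. A PDF.js
// PDFDocumentProxy has such a method; nothing else from PDF.js is used.
interface BytesSource {
  getData(): Promise<BufferSource>;
}

// Lower-case hex encoding of a digest.
function digestToHex(digest: ArrayBuffer): string {
  const bytes = new Uint8Array(digest);
  let hex = "";
  for (let i = 0; i < bytes.length; i++) {
    hex += ("0" + bytes[i].toString(16)).slice(-2);
  }
  return hex;
}

// Hypothetical helper: hashes the complete file with SHA-256 and compares it
// with the digest recorded at import time (case-insensitive hex).
async function matchesStoredSha256(doc: BytesSource, expectedHex: string): Promise<boolean> {
  const data = await doc.getData();
  const digest = await crypto.subtle.digest("SHA-256", data);
  return digestToHex(digest) === expectedHex.toLowerCase();
}
```

In current PDF.js releases, `pdfjsLib.getDocument(url).promise` resolves to such a document, so the check can run after loading and before rendering. It still transfers the complete file to the main thread, which is the cost the maintainers object to earlier in the thread, and `crypto.subtle` is only available in secure (HTTPS) contexts.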