Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PDFDocument.SaveToMetaData() will crash spuriously #96

Closed
GerHobbelt opened this issue Oct 6, 2019 · 2 comments
Closed

PDFDocument.SaveToMetaData() will crash spuriously #96

GerHobbelt opened this issue Oct 6, 2019 · 2 comments
Labels
🐛bug Something isn't working 🕵investigate Needs further analysis to find the root cause.
Milestone

Comments

@GerHobbelt
Copy link
Collaborator

GerHobbelt commented Oct 6, 2019

Running the app at a machine load of sustained 100% (thanks to Qiqqa OCR background processes).

The crash occurs when you edit the BibTeX of a document in the Sniffer, but CPU load is so high that the responsiveness is reduced, so by the time Qiqqa has queued the document for storage in a background task (passing an implicit object reference of pdf_document) your next keypress == user edit comes in and it can happen that the Write action coincides such that .NET internals will cause the JSON serializer to barf because "the list has changed" -- which it did!

[EDIT] in my case it was the annotations list, but PDFDocument has several List<string> amd the error is very time-dependent hence hard to reproduce. Anyway, the key point here is that in a multithreaded environment, one thread Add or Remove-ing to a List while another thread does a List walk (via the iterator, obviously), then .NET internals MAY detect this modifying-while-sequencing anomaly and barf a hairball.

Either we throw critical sections all over the place or we change the application such that it passes a COPY of the pdf_document across thread boundaries.

Re that lingering user edit due to slow responsiveness: that SHOULD be dealt with already by yet another Save instruction being queued in that same thread-crossing queue as before.

[EDIT] what I meant here is: will that parallel (and possibly shortly afterwards completed) edit be recognized as yet another change and trigger a store-to-disk action again? AFAICT Qiqqa SHOULD cope with this correctly, except there's a timestamp involved that doesn't guarantee a difference detect when set follows reset within the same time unit (1/100th of a second? 1ms? OS dependent? The time-marker should be critical sectioned and produce an identifiable increment for every invocation, so that even edge cases such as this scenario will get detected properly. #97)

Hence at least SaveToMetaData() has to be made thread-safe by copy-instancing the pdf document and the metadata therein, i.e. a (semi)DEEP COPY is called for AFAICT.

@GerHobbelt GerHobbelt added 🐛bug Something isn't working 🕵investigate Needs further analysis to find the root cause. labels Oct 6, 2019
@GerHobbelt GerHobbelt added this to the v82 milestone Oct 6, 2019
GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Oct 6, 2019
…a reference to the save logic. (**Incidentally, there are other thread crossings for pdf_document so we'll have to investigate this further as it's not just SAVE activity that's endangered by spurious crashes in annotation, tags or inks lists.**) Everybody should go through the QueueToStorage() API, by the way.
GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Oct 9, 2019
- added WPFDoEvents API for waiting for the UI thread to run its course (UI responsiveness), which is a strengthened version of DoEvents()
- added WPFDoEvents APIs for mouse cursor Hourglass/Busybee override and reset: during application startup, this is used to give visual feedback to the user that the work done is taking a while (relatively long startup time, particularly for large libraries which are auto-opened due to saved tab panel sets including their main library tabs)
- fixed a couple of crashes, particularly one in the RemarkOnException handler which crashed due to an exception being reported during application shutdown in one particular test run. (hard to reproduce issue, while we were hunting for causes of jimmejardine#98 / jimmejardine#96)
GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Oct 9, 2019
@GerHobbelt
Copy link
Collaborator Author

The crashes in Qiqqa (as reported in this issue and others, e.g. #98) are at least in part due to thread-unsafe coding of the PDFDocument class, while references to PDFDocument instances are poassed to multiple threads and async event handlers.

After some contemplation, it was decided to make PDFDocument threadsafe by introducing mandatory lock()ing inside every public interface.

To ensure serialization and regular within-thread larger code thunks working on a PDFDocument instance is still performant and backwards compatible, the original PDFDocument class was moved into a "ThreadUnsafe" namespace and renamed to PDFDocument_ThreadUnsafe so every bit that uses this unsafe codebase directly will be utterly aware that it is doing so. The new PDFDocument implementation is a wrapper (facade pattern) around this old unsafe code, while the new PDFDocument API is made threadsafe by properly applying lock-ing everywhere.

Remaining suspect areas

Inks, Annotations and Highlights still pass their lists by reference to callers instead of (deep) copying, hence these bits are still thread-unsafe while the code in PDFDocument might look safe there as there's a basic lock() provided, but as anyone savvy in multiprocess coding knows, that cute critical section is not enough.

This is a TODO that should be addressed in a separate issue.

GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Oct 15, 2019
…e (apart from the inks, annotations and highlights attribute lists). Fixes jimmejardine#96.

- improve the internal code of `CloneExistingDocumentFromOtherLibrary_SYNCHRONOUS()` to help ensure we will be able to copy/transfer all metadata from the sourcing document to the clone. Code simplification.
- code simplification/performance: remove useless LINQ overhead
- `SignalThatDocumentsHaveChanged()`: improve code by making sure that a reference PDF is only provided when there really is only a single PDF signaled! Also see the spots where this API is being used.
- belated commit of AlphaFS upgrade/refactoring for jimmejardine#106 ; a multi-pass code review has been applied to uncover all places where (file/directory) paths are manipulated/processed in the Qiqqa code. TODO: Streams.
- redesign the ADD/REMOVE/GET-password APIs for PDFDocument as the indirection in the Qiqqa made it bloody hard to make thread-safe otherwise. (Fixes jimmejardine#96 / jimmejardine#98)
- ditto for the GC-memleaking bindables of PDFDocument: all the binding stuff has been moved to the PDFDocument thread-safe wrapper class. (TODO: investigate if we can use WeakReferences more/better to prevent memleaks due to infinitely marked PDFDocument instances...)
- tweak: `AddLegacyWebLibrariesThatCanBeFoundOnDisk()`: at startup, try to load all .known_web_libraries files you can find as those should provide better (= original) names of the various Commercial Qiqqa Web Libraries. (This is running ahead to the tune of jimmejardine#109 / jimmejardine#103)
GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Oct 15, 2019
…ion code in CitationManager and shallow-copy the Attributes of a DictionaryBasedObject.
@GerHobbelt
Copy link
Collaborator Author

Closing and decluttering the issue list so it stays workable for me: fixed in https://github.com/GerHobbelt/qiqqa-open-source mainline=master branch, i.e. v82pre3 experimental release and beyond.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛bug Something isn't working 🕵investigate Needs further analysis to find the root cause.
Projects
None yet
Development

No branches or pull requests

1 participant