BibTeX Sniffer: Sometimes clicking on a PDF does show the PDF but DOES NOT import it in your library #54

Closed
GerHobbelt opened this issue Aug 17, 2019 · 6 comments
Labels: 🐛bug (Something isn't working)
Milestone: v82

@GerHobbelt
Collaborator

Happens on rare occasions, e.g. on a bad connection or after some other "weird failure".

The recurring theme here is that going back and forth in the browser pane is of no use: the PDF will load and show, but will NOT be imported. 😭

@GerHobbelt
Collaborator Author

GerHobbelt commented Aug 17, 2019

Fixed in commit SHA-1: c28eb11. I hope.

The side effect is that for proper PDF fetches (via the PDFInterceptor class), the PDF is fetched from the website twice. This is not a problem, however: Qiqqa includes dedup logic, so it does download the second copy but then discards it, because its URL+fingerprint matches the PDF imported just before.
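
To illustrate why the double fetch is harmless, here is a minimal sketch of URL+fingerprint dedup. It is not Qiqqa's actual code: the class name `ImportDeduplicator`, the method names, and the choice of SHA-256 as the fingerprint are all assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Hypothetical sketch, NOT the Qiqqa implementation.
public sealed class ImportDeduplicator
{
    // Keys of everything imported so far, stored as "<url>|<fingerprint>".
    private readonly HashSet<string> seen = new HashSet<string>(StringComparer.Ordinal);

    // Assumption: the fingerprint is a hash over the file content (SHA-256 here),
    // rendered as an uppercase hex string.
    public static string Fingerprint(byte[] content)
    {
        using (SHA256 sha = SHA256.Create())
        {
            return BitConverter.ToString(sha.ComputeHash(content)).Replace("-", "");
        }
    }

    // Returns true when this URL+content pair is new (and registers it),
    // false when the same pair was imported before and the copy can be discarded.
    public bool TryRegister(string url, byte[] content)
    {
        return seen.Add(url + "|" + Fingerprint(content));
    }
}
```

With this shape, the first `TryRegister(url, bytes)` call for a freshly fetched PDF returns `true`, and the second call with the same URL and bytes returns `false`, so the duplicate download costs some bandwidth but never ends up in the library twice.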


Snippet from the GoogleBibTexSnifferControl code:

// When PDFs are viewed in Gecko/Firefox and somehow things went wrong the first time around,
// but **not enough wrong** so to speak, then the PDF is **cached** by Gecko/Firefox and it WILL NOT
// show up as one of the URIs being fetched for a page reload! The PDF will only show up **here**,
// as a completely loaded document.
//
// Meanwhile the Acrobat Reader in there will cause the `ObjWebBrowser.CurrentPageHTML` to render
// something like this:
//
// <html><head><meta content="width=device-width; height=device-height;" name="viewport"></head>
// <body marginheight="0" marginwidth="0"><embed type="application/pdf" 
//    src ="https://escholarship.org/content/qt0cs6v2w7/qt0cs6v2w7.pdf" 
//    name ="plugin" height="100%" width="100%"></body></html>
//
// !Yay!          /sarcasm!/
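
For readers wondering how such a wrapper page could be recognized, the following is a rough sketch that pulls the PDF URL out of HTML shaped like the sample in the comment above. It is not the code from the commit: `EmbeddedPdfDetector` and `TryGetEmbeddedPdfUrl` are hypothetical names, and the regular expressions only cover double-quoted attributes as seen in that sample.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, NOT taken from the Qiqqa code base.
public static class EmbeddedPdfDetector
{
    // A complete <embed ...> tag; [^>] also matches line breaks inside the tag.
    private static readonly Regex EmbedTag =
        new Regex(@"<embed\b[^>]*>", RegexOptions.IgnoreCase);

    // type="application/pdf", tolerating whitespace around '='.
    private static readonly Regex PdfTypeAttr =
        new Regex(@"\btype\s*=\s*""application/pdf""", RegexOptions.IgnoreCase);

    // src="...", tolerating whitespace around '=' (the sample has `src ="..."`).
    private static readonly Regex SrcAttr =
        new Regex(@"\bsrc\s*=\s*""([^""]+)""", RegexOptions.IgnoreCase);

    // Returns the embedded PDF URL, or null when the page is not a PDF plugin wrapper.
    public static string TryGetEmbeddedPdfUrl(string pageHtml)
    {
        if (string.IsNullOrEmpty(pageHtml)) return null;

        foreach (Match tag in EmbedTag.Matches(pageHtml))
        {
            if (!PdfTypeAttr.IsMatch(tag.Value)) continue;

            Match src = SrcAttr.Match(tag.Value);
            if (src.Success) return src.Groups[1].Value;
        }
        return null;
    }
}
```

Fed the HTML from the comment above (e.g. the value of `ObjWebBrowser.CurrentPageHTML`), this would return the escholarship.org URL, which could then be handed to the regular download path. A real HTML parser would be more robust than regular expressions; they are used here only to keep the sketch short.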

GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Aug 17, 2019
- Gecko these days crashes on ContentDispositionXXXX member accesses: Exception thrown: 'System.Runtime.InteropServices.COMException' in Geckofx-Core.dll

  I'm not sure why; the only change I know of is an update of MSVS2019.  :-S

- implement the logic for the BibTeXSniffer 'Has OCR' checkbox filter criterion. It's useful but the zillion file-accesses slow the response down too much for my taste.   :-S
@GerHobbelt
Collaborator Author

GerHobbelt commented Aug 22, 2019

The fix impacts #52. Double-check that one before marking this one fixed.

@GerHobbelt
Collaborator Author

GerHobbelt commented Aug 26, 2019

Related: #56 -- another case of not fetching the PDF

@GerHobbelt
Collaborator Author

I bet this got fixed as part of the #56 fix activity in commit SHA-1: b3f1f2d

@GerHobbelt GerHobbelt changed the title BibTeX Sniffer: Sometimes clicking on a PDF does show the PDF but DOES NOT import it in your library ✅BibTeX Sniffer: Sometimes clicking on a PDF does show the PDF but DOES NOT import it in your library Aug 27, 2019
@GerHobbelt
Collaborator Author

Marking this one FIXED given the commit mentioned above: web imports shouldn't be silent any more. And when they are, I'd better file a fresh PR.

GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Aug 28, 2019
…sty PDF URIs which weren't recognized as such before. Right now we're pretty aggressive as we fetch almost everything that crosses our path; once fetched we check if it's actually a valid PDF file after all. CiteSeerX and other sites now deliver once again...
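
The "fetch first, validate afterwards" approach described in this commit message can be sketched roughly as below. This is an illustration only: `PdfSniffer.LooksLikePdf` is a hypothetical name, and the rule used (the `%PDF-` marker must appear within the first 1024 bytes) is an assumption about what a cheap validity check might look like, not what the commit actually does.

```csharp
using System;

// Hypothetical sketch, NOT the check used in the Qiqqa commit.
public static class PdfSniffer
{
    // ASCII bytes of the "%PDF-" file header marker.
    private static readonly byte[] Magic = { 0x25, 0x50, 0x44, 0x46, 0x2D };

    // Assumption: tolerate some leading junk and accept the download
    // when the marker lies completely within the first 1024 bytes.
    public static bool LooksLikePdf(byte[] data)
    {
        if (data == null) return false;

        // Last start offset at which the whole marker still fits in the window.
        int lastStart = Math.Min(data.Length, 1024) - Magic.Length;
        for (int i = 0; i <= lastStart; i++)
        {
            int j = 0;
            while (j < Magic.Length && data[i + j] == Magic[j]) j++;
            if (j == Magic.Length) return true;
        }
        return false;
    }
}
```

A downloader could call `LooksLikePdf` on the fetched bytes and silently drop anything that fails, such as an HTML landing page served under a `.pdf` URL. The check is deliberately shallow: it does not prove the file parses as a PDF, only that it is worth passing on to the importer.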
@GerHobbelt
Collaborator Author

Closing and decluttering the issue list so it stays workable for me: fixed in https://github.com/GerHobbelt/qiqqa-open-source mainline=master branch, pending #15 / any maintainer rights/actions.

GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Oct 2, 2019
- Gecko these days crashes on ContentDispositionXXXX member accesses: Exception thrown: 'System.Runtime.InteropServices.COMException' in Geckofx-Core.dll

  I'm not sure why; the only change I know of is an update of MSVS2019.  :-S

- implement the logic for the BibTeXSniffer 'Has OCR' checkbox filter criterion. It's useful but the zillion file-accesses slow the response down too much for my taste.   :-S
GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Oct 2, 2019
…sty PDF URIs which weren't recognized as such before. Right now we're pretty aggressive as we fetch almost everything that crosses our path; once fetched we check if it's actually a valid PDF file after all. CiteSeerX and other sites now deliver once again...
GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Oct 3, 2019
…sty PDF URIs which weren't recognized as such before. Right now we're pretty aggressive as we fetch almost everything that crosses our path; once fetched we check if it's actually a valid PDF file after all. CiteSeerX and other sites now deliver once again...
@GerHobbelt GerHobbelt added the 🐛bug Something isn't working label Oct 4, 2019
@GerHobbelt GerHobbelt changed the title ✅BibTeX Sniffer: Sometimes clicking on a PDF does show the PDF but DOES NOT import it in your library BibTeX Sniffer: Sometimes clicking on a PDF does show the PDF but DOES NOT import it in your library Oct 4, 2019
@GerHobbelt GerHobbelt added this to the v82 milestone Oct 4, 2019
GerHobbelt added a commit that referenced this issue Nov 5, 2019
- Gecko these days crashes on ContentDispositionXXXX member accesses: Exception thrown: 'System.Runtime.InteropServices.COMException' in Geckofx-Core.dll

  I'm not sure why; the only change I know of is an update of MSVS2019.  :-S

- implement the logic for the BibTeXSniffer 'Has OCR' checkbox filter criterion. It's useful but the zillion file-accesses slow the response down too much for my taste.   :-S
GerHobbelt added a commit that referenced this issue Nov 5, 2019
…'t recognized as such before. Right now we're pretty aggressive as we fetch almost everything that crosses our path; once fetched we check if it's actually a valid PDF file after all. CiteSeerX and other sites now deliver once again...