No images displayed (using DocxConverter & JatsParser) #10

Closed
LoicE5 opened this issue May 13, 2021 · 16 comments

Comments

@LoicE5

LoicE5 commented May 13, 2021

  • OJS version : 3.2.1-4
  • DocxConverter version : 1.1.0.0
  • JatsParser version : 2.1.9.2

Hi @Vitaliy-1,

I am currently implementing the following workflow in OJS with two of your plugins, DocxConverter and JatsParser:

  1. The author submits the article in DOCX
  2. The DOCX article is converted to JATS with DocxConverter
  3. (optional) The editor may edit the article with the Texture Editor plugin
  4. At the publication stage, JatsParser generates the full-text HTML and a PDF galley

This entire process works fine, with one exception: figures.

When I upload the document, no images are displayed in the HTML or in the PDF.

Here are the approaches I tried:

  1. Uploading the DOCX file and converting it to JATS without adding any supplementary file.
  2. Uploading the DOCX file and converting it to JATS, then attaching the images as JPEG files to the XML.
  3. Uploading the DOCX file and the images together at the submission stage (at the same level), then converting the DOCX to JATS and publishing.
  4. Uploading the DOCX file and converting it to JATS, then adding the images to every directory of the article via FTP (for testing purposes).

In every case, the DOCX document comes from MS Word or Google Docs.

I don't get any fatal errors in the Apache log.

Do you have any advice regarding image display in DocxConverter and JatsParser?

Thanks a lot in advance for your help!

Loïc

@Vitaliy-1
Owner

Hi @LoicE5,

I see the error related to JATS XML parsing in the log. Have you noticed fig tags in the JATS XML that is produced with DOCX Converter plugin (like https://jats.nlm.nih.gov/archiving/tag-library/1.1d1/n-ib40.html)?

@LoicE5
Author

LoicE5 commented May 17, 2021

There are <fig> tags in the converted JATS XML. Here's a screenshot:

image

I made a GDrive folder with all the files for this example : https://drive.google.com/drive/folders/1HnFWpQysnU_V9mi-KhUWwU2Yk0qHoTfR?usp=sharing

(The docx document comes from GDocs in this folder, I also tried with MS Word.)

The issue might also come from the JatsParser plugin. Maybe a routing issue?

Here's the test journal with the element inspector:

image
(a logical URL for the image would be http://192.168.1.10/ojs2_n4/index.php/monjournal/article/view/129/image1.jpg, but it returns 404).

In this process, we submit the images as supplementary files attached to the XML document during the production stage.
They are stored in /var/www/ojs_n4-files/journals/1/articles/129/submission/proof.

I also noticed that files in the directory tree are renamed, including images. I tried placing image1.jpg, image2.jpg and image3.jpg directly in the directory tree, without success.

image

To sum up:

  • How is DocxConverter supposed to handle the images included in the DOCX? Should we upload the images independently of the conversion?
  • When we add an image through Texture and upload the images again in the dependency grid (as mentioned in the Texture readme), what else do we have to do so that the images are displayed in the HTML and PDF with JatsParser?

Thanks a lot for your help! I will be happy to give you more details if you need them :)

Loïc

@Vitaliy-1
Owner

Thanks! Yes, it looks like the JATS Parser plugin is unable to pick up the correct path to the images in the system.

> How is DocxConverter supposed to handle the images included in the DOCX? Should we upload the images independently of the conversion?
The plugin copies images from the DOCX archive and attaches them to the resulting file automatically during conversion.

> When we add an image through Texture and upload the images again in the dependency grid (as mentioned in the Texture readme), what else do we have to do so that the images are displayed in the HTML and PDF with JatsParser?
You shouldn't need to do anything else.

Can you also share a DOCX file with images that aren't parsed correctly, so that I can run the conversion chain and try to reproduce the error?

@LoicE5
Author

LoicE5 commented May 17, 2021

Here are some files containing images that I created specifically for testing:


DOCX created with Google Docs
image1.jpg
image2.jpg
image3.jpg


DOCX created with MS Word
image1.jpeg
image2.jpeg
image3.jpeg

@LoicE5
Author

LoicE5 commented May 21, 2021

Hi @Vitaliy-1 ,

After some research, I may have found a workaround, using Texture, that would allow the images to be displayed in the DOM.

When you attach images to the XML file and then edit it with Texture, you can inspect the images using the browser's developer tools.

image

We then get a path with 4 GET parameters:

http://<domain>/index.php/<journal_name>/texture/media?submissionId=139&fileId=392&stageId=5&fileName=image1.jpeg

  • submissionId is the number of the submission. It is visible in the <meta> tags of the page and in the URL.
  • fileId is the ID of the XML file that we're working with. This number is used multiple times across JatsParserPlugin and the entire OJS structure, as a $fileId variable.
  • stageId is the current submission stage (copyediting, production...). In our case, it will always be Production, which equals 5.
  • fileName is the actual name of the file that we've attached (renamed "image1" on my end to match the values of the generated JATS). It can easily be retrieved from the DOM.

Using the custom header plugin, it would then be possible to use JavaScript to dynamically replace the src of each existing img with the path above and the relevant GET parameters (using a forEach loop).
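A minimal sketch of that idea, assuming the four values are already known on the page. The helper names (buildTextureMediaUrl, rewriteImageSources), the base URL and the selector in the usage comment are hypothetical, and fileId still has to come from somewhere, which is the open question in this comment:

```javascript
// Hypothetical helper: rebuilds the Texture media URL seen in the inspector.
// baseUrl stands for "http://<domain>/index.php/<journal_name>" (placeholder).
function buildTextureMediaUrl(baseUrl, { submissionId, fileId, stageId, fileName }) {
  const query = new URLSearchParams({
    submissionId: String(submissionId),
    fileId: String(fileId),
    stageId: String(stageId),
    fileName,
  });
  return `${baseUrl}/texture/media?${query.toString()}`;
}

// Rewrites every image whose src is a bare file name (no slash, no scheme).
// `images` can be a NodeList or an array of elements; returns the rewrite count.
function rewriteImageSources(images, baseUrl, ids) {
  let rewritten = 0;
  images.forEach((img) => {
    const src = img.getAttribute("src") || "";
    if (src === "" || src.includes("/") || src.includes(":")) return;
    img.setAttribute("src", buildTextureMediaUrl(baseUrl, { ...ids, fileName: src }));
    rewritten += 1;
  });
  return rewritten;
}

// Browser usage (the selector is an assumption about the theme's markup):
// rewriteImageSources(document.querySelectorAll("article img"),
//   "http://<domain>/index.php/<journal_name>",
//   { submissionId: 139, fileId: 392, stageId: 5 });
```

Images whose src already contains a slash or a scheme are left alone, so running the script twice should not rewrite a URL that was already rewritten.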

However, I cannot see any way to echo the $fileId in the document (it would be my only way to get this value in JS).

image

I've located the place where $fullText is produced, in the displayFullText function. However, it seems difficult to obtain $fileId from this function since it is not declared there, and the args input array doesn't seem to include the value I'm looking for.

image

Do you have any idea how I could echo the $fileId variable in the HTML DOM (in a display:none tag)?

Thanks a lot for your help (and your involvement in the PKP community)!

Loïc

@Vitaliy-1
Owner

Hi @LoicE5,

I hope to take a look at the issue and examples tomorrow.

JATS Parser's controller (handler) extends OJS's ArticleHandler and adds a method for operations with dependent files: https://github.com/Vitaliy-1/JATSParserPlugin/blob/90f9eb4813de35e275de78cb616fde7516c48554/FullTextArticleHandler.inc.php#L22. I'd start debugging from this method to see if and where it fails. If I recall correctly, the downloadFullTextAssoc operation accepts the submission ID, the JATS XML file ID and the image file ID as its first 3 arguments. Let's say the original XML file has the ID 1000, the file is linked to the submission with ID 100, and the dependent image attached to the file has the ID 1001; then the image should be available at:

.../journal/article/downloadFullTextAssoc/100/1000/1001
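To check this quickly from a browser console or a script, the URL can be assembled from the three IDs. This is only a sketch: buildFullTextAssocUrl is a hypothetical helper, the article base URL is a placeholder you have to supply, and the argument order follows the recollection above rather than a verified reading of the handler:

```javascript
// Hypothetical helper: joins the three IDs onto the article handler URL.
// articleBaseUrl is a placeholder such as "<site>/index.php/<journal>/article".
function buildFullTextAssocUrl(articleBaseUrl, submissionId, xmlFileId, imageFileId) {
  const ids = [submissionId, xmlFileId, imageFileId];
  ids.forEach((id) => {
    if (!Number.isInteger(id) || id <= 0) {
      throw new TypeError(`Expected a positive integer ID, got: ${id}`);
    }
  });
  // Drop any trailing slash so the result never contains "//".
  const base = articleBaseUrl.replace(/\/+$/, "");
  return `${base}/downloadFullTextAssoc/${ids.join("/")}`;
}

// Example with the IDs used in this comment:
// buildFullTextAssocUrl("<site>/index.php/<journal>/article", 100, 1000, 1001)
```

Opening the resulting URL directly shows whether the handler serves the image, independently of how the path is written into the HTML.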

Another crucial place is where the image path is built for the actual HTML and PDF: https://github.com/Vitaliy-1/JATSParserPlugin/blob/90f9eb4813de35e275de78cb616fde7516c48554/JatsParserPlugin.inc.php#L727. This method is called just before fullText is assigned to the template. If the real path to the image is missing in the constructed HTML, probably the problem is somewhere here.

> Do you have any idea how I could echo the $fileId variable in the HTML DOM (in a display:none tag)?

The path is constructed here: https://github.com/Vitaliy-1/JATSParserPlugin/blob/90f9eb4813de35e275de78cb616fde7516c48554/JatsParserPlugin.inc.php#L739, the file id is the last there ($dependentFile->getFileId()). I don't assign fileId to the template, thus it cannot be called from there directly.

Texture plugin has its own handler and the logic may be slightly different.

Let me know if you need more details.

@Vitaliy-1
Owner

@LoicE5, I was able to reproduce the problem only for JATS Parser Plugin v. 2.1.9-3, which is intended to work with OJS 3.3, and I added a fix to the stable-3.3.0 branch of the plugin. See issue: Vitaliy-1/JATSParserPlugin#59 and the referenced commit.

However, the test of stable-3_2_1 branch, which corresponds to JATS Parser v. 2.1.9-2, doesn't show any problems. I did the test with the files attached above: uploaded to the production stage, converted with DOCX Converter Plugin, added as a full text and published with activated JATS Parser Plugin. Are you sure about the versions? If yes, can you do some debugging according to the hints I've posted above?

@LoicE5
Author

LoicE5 commented May 25, 2021

Hi @Vitaliy-1,

Thanks a lot for your reply and your precious help.

After re-checking, the versions are:

  • OJS → 3.2.1-4
  • DocxConverter → 1.1.0.0
  • JatsParser → 2.1.9.2 (branch stable-3_2_1)

I've tested the path you mentioned above (.../journal/article/downloadFullTextAssoc/100/1000/1001) and at first glance it works.

I've been looking at the code, and I plan to dig deeper into it with @letailli in the coming days, following your hints.

I'll probably come back to you with more observations and, I hope, some possible solutions, but if you find one on your side in the meantime, do not hesitate to share :)

Thanks a lot for your work !

Loïc

@Vitaliy-1
Owner

Vitaliy-1 commented May 26, 2021

> I've tested the path you mentioned above (.../journal/article/downloadFullTextAssoc/100/1000/1001) and at first glance it works.

This narrows the problem down to the part of the code that replaces the path to the image: https://github.com/Vitaliy-1/JATSParserPlugin/blob/stable-3_2_1/JatsParserPlugin.inc.php#L755-L763
For example, I'm escaping the filename with rawurlencode (https://www.php.net/manual/en/function.rawurlencode.php), which may cause trouble if the filename contains non-alphanumeric characters.
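To see which file names would be affected, here is a rough JavaScript approximation of PHP's rawurlencode. It is not the plugin's code: rawUrlEncode and needsEncoding are hypothetical names, and the sketch only shows that an encoded name no longer equals the raw name that may appear in the generated HTML:

```javascript
// Approximates PHP rawurlencode (RFC 3986): encodeURIComponent leaves
// ! ' ( ) * unescaped, so those are percent-encoded in a second pass.
function rawUrlEncode(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

// True when the encoded form differs from the raw file name, i.e. when a
// comparison between the two spellings would fail.
function needsEncoding(fileName) {
  return rawUrlEncode(fileName) !== fileName;
}

// Plain names such as "image1.jpg" pass through unchanged, while names
// with spaces, parentheses or accented letters do not.
for (const name of ["image1.jpg", "image 1.jpg", "figure(1).jpg", "schéma.jpg"]) {
  console.log(name, "->", rawUrlEncode(name), needsEncoding(name));
}
```

The test files in this thread (image1.jpg, image1.jpeg) contain only letters, digits and a dot, so under this approximation encoding alone would not change them.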

@Vitaliy-1
Owner

Hi @LoicE5,

Did you manage to figure out where the problem is?

@LoicE5
Author

LoicE5 commented Jun 24, 2021

Hi @Vitaliy-1,

I inspected your code but have not been able to solve the issue.
The problem, I believe, is that the images are output in the document with a relative URL ("image1.jpg", "image2.jpeg"...).
A fix could be to replace these relative paths with full paths, as you mentioned above (.../journal/article/downloadFullTextAssoc/100/1000/1001).
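One way to confirm this diagnosis is to list the img sources in the rendered full text that are still relative. The sketch below is a rough diagnostic only: findRelativeImageSources is a hypothetical helper, and the regular expression is a simplification that assumes quoted src attributes rather than a real HTML parser:

```javascript
// Returns the src values of <img> tags that are neither absolute URLs
// (scheme:...) nor root-relative paths (/...). Assumes quoted attributes.
function findRelativeImageSources(html) {
  const pattern = /<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/gi;
  const relative = [];
  for (const match of html.matchAll(pattern)) {
    const src = match[2];
    const isAbsolute = /^(?:[a-z][a-z0-9+.-]*:|\/)/i.test(src);
    if (!isAbsolute) relative.push(src);
  }
  return relative;
}

// Browser usage (assumes the article body is in the current document):
// console.log(findRelativeImageSources(document.body.innerHTML));
```

Any name returned here is one that was not replaced before the HTML reached the front end.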

Thanks for your help anyway, and sorry for my late reply :)

Have a nice day!

LoicE5 closed this as completed Nov 23, 2021
Vitaliy-1 reopened this Nov 23, 2021
@Vitaliy-1
Owner

Vitaliy-1 commented Nov 23, 2021

Hi @LoicE5,

Images are handled the same way as in the HTML galley: the relative path is replaced with the absolute one before it is shown on the front end.
I was able to reproduce the error recently in OJS 3.3 and will check soon.

@Vitaliy-1
Owner

I finally had some time today to explore and found that the problem lies in how the localized file name of the image is handled. It does not seem to arise from the JATSParser or DOCXConverter plugin. I hope to have more information tomorrow.

@Vitaliy-1
Owner

The problem was in the Texture plugin: pkp/texture#103

@letailli

Thanks a lot @Vitaliy-1. We made a (very quick) fix with JS for our urgent problem. We'll come back to it for a future release.
