Convert images to jpg/png to make them printable #2177

Open
legoktm opened this issue Aug 23, 2024 · 9 comments
Comments

@legoktm
Member

legoktm commented Aug 23, 2024

Description

Just as we use LibreOffice to convert documents to PDFs so they can be printed, we should convert images to a printable format (jpg/png).

How will this impact SecureDrop users?

  • Broaden the set of printable file types to include most images

How would this affect the SecureDrop Workstation threat model?

  • Likely introduce a new dependency on an image conversion tool, probably imagemagick. This tool runs inside a dispVM, so I don't think it would introduce any more attack surface than LibreOffice would.
    • @rocodes also mentioned that another option might be to use Dangerzone as the conversion mechanism.

User Stories

  • As a journalist, I want to print webp/gif/raw/etc. files, so that I can share them offline.
  • As a journalist, I want to be able to print images and not need to care about how the source sent them to me.
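
A minimal sketch of what the conversion step could look like, assuming ImageMagick 7's `magick` binary is available inside the dispVM. The helper name and the output directory are hypothetical and only serve as illustration; nothing here is existing SecureDrop code.

```python
from pathlib import Path

# Assumption: ImageMagick 7 is installed and exposes the `magick` binary.
MAGICK_BIN = "magick"


def build_convert_command(src: Path, dst_dir: Path) -> list[str]:
    """Return the argv that would convert `src` to a PNG inside `dst_dir`."""
    dst = dst_dir / (src.stem + ".png")
    # "[0]" selects only the first frame, so an animated GIF yields a single
    # PNG rather than one output file per frame.
    return [MAGICK_BIN, f"{src}[0]", str(dst)]


if __name__ == "__main__":
    print(build_convert_command(Path("photo.webp"), Path("/tmp/printable")))
```

The returned list could be handed to `subprocess.run(cmd, check=True)`; building the argv separately keeps the shell out of the picture and makes the command easy to test.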
@rocodes
Contributor

rocodes commented Aug 23, 2024

Good things about including imagemagick in the large template, not just for printing:

  • print more filetypes
  • view more filetypes
  • other potentially-useful built in functionality (image blur?)

I think ideally we'd be able to stop supporting multiple separate export paths, so we wouldn't install imagemagick just for printing purposes, but as part of a single consolidated print/export path (that might/should involve dangerzone) :)

@deeplow
Contributor

deeplow commented Aug 28, 2024

This also has overlap with Dangerzone. It currently uses ImageMagick / LibreOffice with mimetypes hardcoded. Whatever solution we converge on, both projects could benefit from sharing the approach. It is essentially the same problem: how do I know, in general, what is printable and which tool to print it with?
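
To make the "what is printable and which tool handles it" question concrete, here is a rough sketch that sniffs a file's leading bytes and looks the result up in a routing table. The table and the tool labels are hypothetical; this is not how Dangerzone or SecureDrop currently decide, and a real implementation would need to cover far more formats.

```python
from typing import Optional

# Leading bytes ("magic numbers") for a few common formats.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]

# Hypothetical routing table: which step handles which detected type.
_TOOL_FOR_MIME = {
    "image/png": "imagemagick",
    "image/jpeg": "imagemagick",
    "image/gif": "imagemagick",
    "image/webp": "imagemagick",
    "application/pdf": "print-directly",
}


def sniff_mime(header: bytes) -> Optional[str]:
    """Guess a MIME type from the first bytes of a file, or return None."""
    for magic, mime in _SIGNATURES:
        if header.startswith(magic):
            return mime
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "image/webp"
    return None


def pick_tool(header: bytes) -> Optional[str]:
    """Return the (hypothetical) tool label for a file, or None if unknown."""
    mime = sniff_mime(header)
    return _TOOL_FOR_MIME.get(mime) if mime else None
```

Sniffing content rather than trusting the file extension matters here because sources control the filenames they submit.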

@legoktm
Member Author

legoktm commented Aug 29, 2024

Yeah, I feel that this is the kind of problem set that we might better delegate to Dangerzone instead of having two similar projects/implementations. I don't know how close DZ is to being ready to be plugged in, or whether we're ready for that.


I was poking at imagemagick's identify command, which can be used to, well, identify a file:

user@dev ~/g/f/s/e/t/f/samples_supported (main)> identify -verbose Sample_jpeg.jpg | grep -i mime
  Mime type: image/jpeg

But then if you try it on a docx:

user@dev ~/g/f/s/e/t/f/samples_supported (main)> identify -verbose Sample_docx.docx | grep -i mime
convert /tmp/magick-OvuQ7lXsiWvGrVV0SnLLYw6UMoAYvo50 as a Writer document -> /tmp/magick-OvuQ7lXsiWvGrVV0SnLLYw6UMoAYvo50.pdf using filter : writer_pdf_Export
  Mime type: application/pdf
  Mime type: application/pdf

Oops, it is "smart" and converts the docx to pdf and then says oh, it's a pdf. But how is it converting docx to pdf?

$ magick -list delegate

Path: /etc/ImageMagick-7/delegates.xml

Delegate                Command
-------------------------------------------------------------------------------
        doc =>          "libreoffice' --convert-to pdf -outdir `dirname '%i'` '%i' 2> '%u'; /usr/bin/mv '%i.pdf' '%o"
       docx =>          "libreoffice' --convert-to pdf -outdir `dirname '%i'` '%i' 2> '%u'; /usr/bin/mv '%i.pdf' '%o"

(trimmed because there's a bunch more formats)

It's just using libreoffice under the hood 🙃

I couldn't find any way to turn off the conversion with identify, maybe if we just overwrite the delegates.xml file to strip out the libreoffice ones? Or...we just use imagemagick's wrapper over libreoffice and then only need to worry about imagemagick instead of also libreoffice?
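
A rough sketch of the "strip out the libreoffice ones" idea, assuming delegates.xml is a `<delegatemap>` root containing `<delegate ... command="..."/>` elements. The sample XML below is a made-up, abbreviated stand-in and not the real file. Note that re-serializing with ElementTree drops the DOCTYPE, so this is illustrative rather than a drop-in patch.

```python
import xml.etree.ElementTree as ET

# Abbreviated, made-up stand-in for /etc/ImageMagick-7/delegates.xml.
SAMPLE = """<delegatemap>
  <delegate decode="doc" command="libreoffice --convert-to pdf"/>
  <delegate decode="docx" command="libreoffice --convert-to pdf"/>
  <delegate decode="other" command="some-other-tool"/>
</delegatemap>"""


def strip_libreoffice_delegates(xml_text: str) -> str:
    """Return the delegate map with every libreoffice-backed entry removed."""
    root = ET.fromstring(xml_text)
    # Copy the list first so we are not mutating the tree while iterating it.
    for delegate in list(root.findall("delegate")):
        if "libreoffice" in delegate.get("command", ""):
            root.remove(delegate)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(strip_libreoffice_delegates(SAMPLE))
```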

@deeplow
Contributor

deeplow commented Aug 30, 2024

That is incredibly useful! Thanks for the pointer. Flagging @freedomofpress/dangerzone-maintainers as a potential collaboration / integration opportunity.

@deeplow
Contributor

deeplow commented Aug 30, 2024

If integrated with Dangerzone it would be nice if the conversion were to happen in the viewer itself instead of spinning up an intermediate VM. This is different than how Dangerzone typically operates in Qubes.

@almet

almet commented Aug 30, 2024

Heya,

So if I understand correctly, the goal would be to use dangerzone to convert images to a printable format.

This is currently doable with the command line utility (it only produces PDFs, no jpg/png as mentioned in the issue):

dangerzone-cli --output-filename output.pdf input.format
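
For a caller on the Python side, the same invocation could be wrapped roughly like this. The helper name is hypothetical; only the `dangerzone-cli --output-filename` form shown above is taken as given.

```python
from pathlib import Path


def build_dangerzone_command(src: Path, dst: Path) -> list[str]:
    """Return the argv for converting `src` into the PDF at `dst`."""
    return ["dangerzone-cli", "--output-filename", str(dst), str(src)]


if __name__ == "__main__":
    # Pass the result to subprocess.run(cmd, check=True) to actually convert.
    print(build_dangerzone_command(Path("input.gif"), Path("output.pdf")))
```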

Some of the formats listed in the issue aren't supported (webp, for instance, is not). Here is the list of currently supported extensions (from the docs):

  • PDF (.pdf)
  • Microsoft Word (.docx, .doc)
  • Microsoft Excel (.xlsx, .xls)
  • Microsoft PowerPoint (.pptx, .ppt)
  • ODF Text (.odt)
  • ODF Spreadsheet (.ods)
  • ODF Presentation (.odp)
  • ODF Graphics (.odg)
  • Hancom HWP (Hangul Word Processor) (.hwp, .hwpx)
    • Not supported on Qubes OS
      
  • EPUB (.epub)
  • Jpeg (.jpg, .jpeg)
  • GIF (.gif)
  • PNG (.png)
  • SVG (.svg)
  • other image formats (.bmp, .pnm, .pbm, .ppm)

That might be a good opportunity to look at adding support for more formats.


if the conversion were to happen in the viewer itself instead of spinning up an intermediate VM

I'm not sure I understand what you mean by this. (What do you call "the viewer"?) Do you need the security guarantees provided by dangerzone?

@deeplow
Contributor

deeplow commented Aug 30, 2024

Thanks for the comments. The main thing that sparked the conversation is that we're kind of already needing to convert documents and that's something Dangerzone already does. Integration with Dangerzone is something we're aiming for at some point, and we just found another place where it is relevant and would de-duplicate our effort.

if the conversion were to happen in the viewer itself instead of spinning up an intermediate VM

I'm not sure I understand what you mean by this. (What do you call "the viewer"?) Do you need the security guarantees provided by dangerzone?

In Qubes, one typically doesn't need Dangerzone because you can "Open document in Disposable VM", which opens a viewer directly. This is "the viewer" I am referring to. It is not fully relevant here yet, since we haven't nailed down the print architecture. But my idea is that if we ever want to print a file from "the viewer" VM, then Dangerzone would ideally do the "doc to pixels" phase there and then send the result to another VM to reassemble the doc ("pixels to PDF") and later print / export. This would not reduce the risk since the file is already open in said VM.

This is not how Dangerzone currently works. Instead, it starts a VM / container for the first conversion part. So it would require some plumbing on Dangerzone's side to have the "server" component run locally and have the client on a secondary more trusted VM. Hopefully this makes more sense. But we can schedule a call at some point when trying to tackle this.
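
To illustrate the split, here is a sketch of what could cross the VM boundary: only page dimensions and raw RGB pixels, which the more trusted side validates before reassembling a PDF. The framing below is a hypothetical illustration and should not be read as Dangerzone's actual wire format.

```python
import struct
from dataclasses import dataclass

# Hypothetical header: width and height as big-endian unsigned 16-bit ints.
_HEADER = struct.Struct(">HH")


@dataclass(frozen=True)
class Page:
    width: int
    height: int
    rgb: bytes  # width * height * 3 bytes, row-major


def encode_page(page: Page) -> bytes:
    """Serialize one rendered page (the "doc to pixels" side)."""
    if len(page.rgb) != page.width * page.height * 3:
        raise ValueError("pixel buffer does not match dimensions")
    return _HEADER.pack(page.width, page.height) + page.rgb


def decode_page(blob: bytes) -> Page:
    """Parse and validate one page (the "pixels to PDF" side)."""
    width, height = _HEADER.unpack_from(blob)
    body = blob[_HEADER.size:]
    # Reject anything that is not exactly the announced number of pixels.
    if len(body) != width * height * 3:
        raise ValueError("truncated or oversized pixel data")
    return Page(width, height, body)
```

The receiving side only ever parses a fixed-size header and a byte count, which keeps the trusted VM's parsing surface small.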

@almet

almet commented Aug 30, 2024

Thanks for the explanation, it makes more sense indeed 👍

I have one question left: you're talking about Qubes here, and my understanding is that it's not the only "environment" on which securedrop-client can run. On these other environments, would you also need to use dangerzone (maybe in a more "traditional" way)?

Now, on Qubes, the part that's currently running in the DZ container (that you named "the server") can probably be reduced to the doc_to_pixels.py file.

I'm not sure this use case should be handled by the dangerzone project directly, as it seems very specific to how Qubes works, but we should definitely make it possible by offering APIs that make this as simple as possible.

@deeplow
Contributor

deeplow commented Sep 3, 2024

Thanks for the comments. The main thing that sparked the conversation is that we're kind of already needing to convert documents and that's something Dangerzone already does. Integration with Dangerzone is something we're aiming for at some point, and we just found another place where it is relevant and would de-duplicate our effort.

if the conversion were to happen in the viewer itself instead of spinning up an intermediate VM

I'm not sure I understand what you mean by this. (What do you call "the viewer"?) Do you need the security guarantees provided by dangerzone?

In Qubes, one typically doesn't need Dangerzone because you can "Open document in Disposable VM", which opens a viewer directly. This is "the viewer" I am referring to. It is not fully confirmed that this is the path we want, since we haven't yet nailed down the print architecture. Should it happen, the idea is that if we ever want to print a file from "the viewer" VM, then Dangerzone would ideally do the "doc to pixels" phase there and then send the result to another VM to reassemble the doc ("pixels to PDF") and later print / export. This would not reduce the risk since the file is already open in said VM.

This is not how Dangerzone currently works. Instead, it starts a VM / container for the first conversion part. So it would require some plumbing on Dangerzone's side to have the "server" component run locally and have the client on a secondary more trusted VM. Hopefully this makes more sense. But we can schedule a call at some point when trying to tackle this.
