Investigate PDF support options #3

Open
michaelrsweet opened this issue Mar 8, 2020 · 32 comments
Labels: enhancement (New feature or request), priority-medium

@michaelrsweet
Owner

It would be nice to have a solution for providing direct PDF printing support, both for raster printers as well as native PDF printers/print solutions and legacy PostScript devices. Currently the available options are not ideal:

  • CoreGraphics is only available on Apple operating systems
  • MuPDF has an unstable API (with no apparent timeline for having a stable API) and its usage of AGPL can be problematic
  • Ghostscript's code base is difficult to embed and its usage of AGPL can be problematic
  • Poppler (as used by cups-filters) is C++-based and only solves the PDF-to-PostScript and PDF-to-PDF part of the problem (which might be sufficient) and its usage of GPL can be problematic
  • Xpdf (which is what Poppler is based on) is C++-based and its usage of GPL can be problematic

Most of these have commercial licensing options if you need to do a closed-source implementation.

Feature list:

  • (Preferred) C API
  • (Preferred) Apache/MIT/BSD license and/or FRAND commercial license pricing
  • Support for current PDF 2.0 (ISO 32000-2), including color management and document security features
  • Support for rasterizing to the basic Apple/PWG Raster color spaces and bit depths: 1-bit black, 8-bit grayscale w/2.2 gamma ("sGray"), and 24-bit sRGB. (Even better if it supports AdobeRGB, CMYK, etc. with ICC-based color profiles)
  • Support for rendering as PDF, accounting for page-ranges and print-color-mode/color-supported (for monochrome output).
  • Support for rendering as Level 2 PostScript, accounting for page-ranges and print-color-mode/color-supported (for monochrome output) - this is just "nice to have" as most PostScript laser printers also support PCL raster.
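
As an illustration of the "accounting for page-ranges" requirement above, here is a minimal sketch of the page selection test a PDF renderer would need. IPP page-ranges values are 1-based, inclusive ranges; the `page_range_t` type and `page_in_ranges()` function are hypothetical names for this example and are not part of PAPPL or any PDF library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical stand-in for one IPP rangeOfInteger value (1-based, inclusive)
typedef struct
{
  int lower;                            // First page in range
  int upper;                            // Last page in range
} page_range_t;

// Return true if `page` (1-based) should be rendered.
// An empty list is treated as "print every page".
bool
page_in_ranges(int page, const page_range_t *ranges, size_t num_ranges)
{
  assert(page >= 1);
  assert(ranges != NULL || num_ranges == 0);

  if (num_ranges == 0)
    return (true);

  for (size_t i = 0; i < num_ranges; i ++)
  {
    if (page >= ranges[i].lower && page <= ranges[i].upper)
      return (true);
  }

  return (false);
}
```

A renderer would call this once per source page and skip the pages for which it returns false, so the same check applies whether the output is raster, PDF, or PostScript.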
@michaelrsweet michaelrsweet added the enhancement New feature or request label Mar 8, 2020
@michaelrsweet michaelrsweet self-assigned this Mar 8, 2020
@michaelrsweet michaelrsweet added this to the Future milestone Mar 17, 2020
@michaelrsweet
Owner Author

Current status:

  • Xpdf/poppler have the pdftoppm utility which can be used to pipe page images to the raster driver, pdfinfo to get page (impression) count. Standard Linux utility, GPL so no problem using from PAPPL (separate programs). Also pdftops utility to produce PostScript.

  • MuPDF has the mutool utility which can draw to PWG Raster or get the impression count. But MuPDF is AGPL so not good for Apache-licensed code.

I'm not sure that poppler/Xpdf support PDF 2.0 yet.
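
To make the "pipe page images to the raster driver" idea concrete, here is a rough sketch of reading the header of a binary PGM/PPM page image from a stream, which is the kind of data pdftoppm produces for grayscale and RGB output. How the stream is obtained (for example a pipe from the utility) is left out, since the exact command line depends on the installed version. The `pnm_header_t` type and `pnm_read_header()` function are made up for this example, and header comment lines ("# ...") and 1-bit PBM output are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

// Illustrative header description, not a PAPPL type
typedef struct
{
  int channels;                         // 1 for "P5" (gray), 3 for "P6" (RGB)
  int width;                            // Width in pixels
  int height;                           // Height in pixels
  int maxval;                           // Maximum sample value
} pnm_header_t;

// Read a binary PGM ("P5") or PPM ("P6") header.  On success the stream is
// positioned at the first byte of pixel data.
bool
pnm_read_header(FILE *fp, pnm_header_t *hdr)
{
  char magic[3];                        // "P5" or "P6" plus nul

  assert(fp != NULL && hdr != NULL);

  if (fscanf(fp, "%2s", magic) != 1)
    return (false);

  if (magic[0] != 'P' || (magic[1] != '5' && magic[1] != '6'))
    return (false);

  hdr->channels = magic[1] == '5' ? 1 : 3;

  if (fscanf(fp, "%d %d %d", &hdr->width, &hdr->height, &hdr->maxval) != 3)
    return (false);

  if (hdr->width <= 0 || hdr->height <= 0 || hdr->maxval <= 0 || hdr->maxval > 65535)
    return (false);

  // Exactly one whitespace byte separates the header from the pixel data
  return (fgetc(fp) != EOF);
}
```

After the header, each line of an 8-bit image is `width * channels` bytes, which maps directly onto a "write line" style raster callback.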

@michaelrsweet michaelrsweet modified the milestones: Future, v1.0 Mar 30, 2020
@tillkamppeter
Contributor

For Poppler there is a pdftoraster filter in cups-filters that uses only Poppler's stable APIs. I will soon change the license of cups-filters to Apache 2.0 + (L)GPL2 exception; I already have the contributors' permission. I am also thinking about moving the core functionality of each filter into libcupsfilters, so that the filters can be easily used from a Printer Application.
cups-filters also contains the pdftops filter which calls (in Poppler mode) the /usr/bin/pdftops utility of Poppler.

@tillkamppeter
Contributor

There is also the PDF library of the Chrome browser PDFium, but it is also C++.

@andreas-gruenbacher

andreas-gruenbacher commented May 4, 2020

How about support for ordered and error diffusion dithering in those libraries?

@michaelrsweet
Owner Author

@tillkamppeter libpoppler is GPL2, so linking anything to it makes it GPL... I haven’t looked closely at PDFium.

@andreas-gruenbacher There isn’t much point in supporting ED in PAPPL for a modern inkjet printer - there is a lot more than just dithering involved. Plus projects like Gutenprint wouldn’t use it anyways. The current PAPPL code supports threshold, clustered, and dispersed dot dithering for 1-bit B&W raster output - that covers most monochrome laser and thermal label printers where colour management and dithering algorithms are not as critical.
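
For readers unfamiliar with the dithering modes mentioned above, here is a small sketch of dispersed dot (Bayer) dithering from 8-bit grayscale to packed 1-bit output. It is not PAPPL's implementation; the 4x4 matrix, the `dither_line()` name, and the bit conventions (0 = black in the input, 1 = black in the output, most significant bit first) are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// 4x4 Bayer (dispersed dot) matrix, values 0 to 15
static const unsigned char bayer4[4][4] =
{
  {  0,  8,  2, 10 },
  { 12,  4, 14,  6 },
  {  3, 11,  1,  9 },
  { 15,  7, 13,  5 }
};

// Dither line `y` of 8-bit gray pixels (0 = black, 255 = white) to packed
// 1-bit pixels (1 = black, MSB first).  `out` must hold (width + 7) / 8 bytes.
void
dither_line(const unsigned char *gray, size_t width, size_t y, unsigned char *out)
{
  assert(gray != NULL && out != NULL);

  memset(out, 0, (width + 7) / 8);

  for (size_t x = 0; x < width; x ++)
  {
    // Per-pixel thresholds are 8, 24, ..., 248
    unsigned threshold = bayer4[y & 3][x & 3] * 16u + 8u;

    if (gray[x] < threshold)
      out[x / 8] |= (unsigned char)(0x80 >> (x & 7));
  }
}
```

Simple threshold dithering is the same loop with a constant threshold, and a clustered dot screen swaps in a matrix whose low values sit next to each other instead of being spread out.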

@michaelrsweet
Owner Author

@tillkamppeter After looking at PDFium more closely, I’d have to say that’s a hard no - the license is fine but the API, code, and documentation leave much to be desired.

@andreas-gruenbacher

andreas-gruenbacher commented May 5, 2020

The current PAPPL code supports [...] dithering

I did have the Brother PT / QL label printers in mind which seem to be 1-bit monochrome, except for the more unusual red / black printing mode supported by the QL-800 series. Dithering after rasterization should be fine, thanks.

@tillkamppeter
Contributor

cups-filters calls Ghostscript and MuPDF (mutool) via command line, also Poppler's pdftops, meaning that we have a license decoupling at least for these cases, so for these the code of cups-filters can be used.

pdftoraster uses libpoppler. Here one could have this one, small executable under GPL and let the Printer Application call this executable.

Generally for output quality and upstream maintenance quality Ghostscript is probably the most recommended.
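
The "call this executable" approach described above comes down to starting a separate process and checking its exit status. Below is a minimal POSIX sketch of that mechanism only; `run_converter()` is a hypothetical helper, not code from cups-filters or PAPPL, and it says nothing about which licenses such a separation satisfies.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run the program named by argv[0] (looked up on PATH) and wait for it.
// Returns the exit status (127 if the program could not be executed),
// or -1 if the child could not be created or was killed by a signal.
int
run_converter(char *const argv[])
{
  pid_t pid;                            // Child process ID
  int   status;                         // Wait status

  assert(argv != NULL && argv[0] != NULL);

  if ((pid = fork()) < 0)
    return (-1);

  if (pid == 0)
  {
    // Child: replace this process with the converter program
    execvp(argv[0], argv);
    _exit(127);
  }

  // Parent: wait for the child, retrying if interrupted by a signal
  while (waitpid(pid, &status, 0) < 0)
  {
    if (errno != EINTR)
      return (-1);
  }

  return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
```

A real filter would also connect the child's standard input and output to the job file and the printer device, which is omitted here to keep the sketch short.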

@tillkamppeter
Contributor

This needs to get solved to allow retro-fitting of classic drivers and PostScript PPD files into Printer Applications and so also using the CUPS Snap as standard printing stack for OS distributions.

@michaelrsweet
Owner Author

@tillkamppeter Both Ghostscript and MuPDF are AGPL which extends the license to include anything you run with it (linked or otherwise) to support a network service, which means that even running it as a separate program from PAPPL is problematic!

(for that matter ippsample's ipptransform command has a similar issue, but as that is just a sample implementation I am less concerned about it)

Running poppler/Xpdf's pdftoppm or pdftoraster programs (GPL2) is an option, but that is not necessarily something I want to expose in PAPPL by default.

@tillkamppeter
Contributor

Would this mean that we have to re-invent the PDF interpreter wheel starting a new one from scratch?

@michaelrsweet
Owner Author

@tillkamppeter Not necessarily, just that I wouldn't want a printer application to, by default, provide PDF support using a system-installed GPL program unless it opted into the behavior. It's hard enough these days to make sure you have all of the licensing details correct without having some joker's library you are using bring in something you don't expect! :)

@tillkamppeter
Contributor

If the Printer Application is snapped and not classically installed, it could ship its own PDF interpreter, usually one chosen by the driver developer. It would then use neither a system-installed program nor one provided by the Ubuntu Core under which Snaps are built.
For retro-fit of Ghostscript's drivers I will have to use Ghostscript ...

@tillkamppeter
Contributor

One thing that might make PAPPL easier for driver developers to use: in addition to raw printing and the current raster driver, with its forced split into start job -> start page -> write line -> stop page -> ... -> stop job, PAPPL could allow feeding the incoming data, or the output of a filter, into a print_with_driver() function. There the driver developer can insert a function call, optionally to an external utility he supplies, and this function or utility converts the whole job's data into a data stream that the printer understands.
When retro-fitting classic drivers, the driver developer would call his existing CUPS filters here; a printer manufacturer writing a proprietary driver would insert his own proprietary PDF-to-printer's-PDL filter, ...

@tillkamppeter
Contributor

I suggest adding the following item to the pappl_pdriver_data_s data structure:

pappl_printfunc_t     post_filter;

If the driver developer who uses PAPPL points this at a function rather than NULL, all jobs, after being filtered to driver_data.format, will be passed through this function to be converted into the final, printer-specific format. Naturally, job-process.c would also need to be adapted.

The driver developer can set this function and set all the raster printing functions rendjob, ..., rwrite to NULL if, for example, his filter turns PDF into the printer's format. He would set driver_data.format = "application/pdf"; and driver_data.post_filter = my_pdf_to_acme; for example. His code then provides my_pdf_to_acme(), and he can decide how to get PDF into what his Acme LaserStar needs. So he could call Ghostscript in that function and license his code under whatever terms are most suitable. This should then no longer affect PAPPL.

@michaelrsweet
Owner Author

@tillkamppeter The printfile function is intended for direct printing/filtering by the driver in PAPPL. Moreover, ALL of the methods are intended to produce printer-ready output, so I will not be adding a post_filter callback.

The point is not to reproduce the CUPS filter chain but to simplify the development of Printer Applications for legacy printers, the vast majority of which are either some form of simple raster format (BJC, ESC/p, PCL, etc.) or PostScript.

@tillkamppeter
Contributor

I do not find a printfile() nor a print_file() function in PAPPL ...

@michaelrsweet
Owner Author

Sorry, the member is called "print". From "pappl/printer.h":

struct pappl_pdriver_data_s             // Print driver data
{
  pappl_identfunc_t     identify;               // Identify-Printer function
  pappl_printfunc_t     print;                  // Print (file) function
...
} pappl_pdriver_data_t;

@tillkamppeter
Contributor

Yes, I have seen it. Looking at the rest of the PAPPL code and at Jai's sample PCL driver, it looked to me as if it is only for raw (unfiltered) printing.
So assume I have a native PostScript printer and its PPD. I also know that a driverless IPP printer (and so a Printer Application) has to accept at least one of PDF, Apple Raster, PWG Raster, and PCLm. For the PostScript printer it is best that I accept PDF.
So I set

driver_data.format = "application/pdf";
driver_data.print = my_ps_print;
driver_data.rendjob = NULL;
...
driver_data.rwrite = NULL;

In my code I create a function named my_ps_print() then which takes PDF as input and converts it to PostScript. It uses the PPD file (with the help of the new libppd of cups-filters) to convert the options to PostScript code to embed in the output. The code is merely the pdftops filter of cups-filters (I could move the code of this filter into libcupsfilters for that).
The Printer Application Snap would have to contain my Printer Application executable, the PPD file, libpappl, libcupsfilters, libppd, any further libraries needed and auxiliary programs like Ghostscript or Poppler's pdftops utility.
Would this be the way to retro-fit a PostScript printer?

@michaelrsweet
Owner Author

@tillkamppeter Actually, you set driver_data.format to "application/postscript" and then add a filter callback for PDF to PostScript with papplSystemAddMIMEFilter. The filter callback can then write to the device provided to the callback in whatever format is convenient for the printer.

The only difference is that PAPPL doesn't string multiple filters together like CUPS does.
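
The "no chaining" point can be illustrated with a toy filter table that only ever matches a direct source-to-destination pair. This is a stand-in written for this explanation: the types and the `filter_add()`/`filter_find()` functions are hypothetical, and the real registration call is papplSystemAddMIMEFilter, whose exact signature and callback type should be taken from PAPPL's headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical stand-ins; these are not PAPPL types
typedef bool (*filter_cb_t)(const char *filename, void *data);

typedef struct
{
  const char  *srctype;                 // Source MIME media type
  const char  *dsttype;                 // Destination MIME media type
  filter_cb_t cb;                       // Conversion callback
  void        *data;                    // Callback data
} filter_t;

#define MAX_FILTERS 16

typedef struct
{
  filter_t filters[MAX_FILTERS];        // Registered filters
  size_t   num_filters;                 // Number of registered filters
} filter_table_t;

// Placeholder callback that converts nothing and reports success
bool
filter_noop(const char *filename, void *data)
{
  (void)filename;
  (void)data;
  return (true);
}

// Register a direct srctype -> dsttype filter.  The strings are not copied.
bool
filter_add(filter_table_t *table, const char *srctype, const char *dsttype, filter_cb_t cb, void *data)
{
  filter_t *f;                          // New filter entry

  assert(table != NULL && srctype != NULL && dsttype != NULL && cb != NULL);

  if (table->num_filters >= MAX_FILTERS)
    return (false);

  f          = table->filters + table->num_filters;
  f->srctype = srctype;
  f->dsttype = dsttype;
  f->cb      = cb;
  f->data    = data;

  table->num_filters ++;

  return (true);
}

// Find a filter that converts srctype directly to dsttype.  Two filters are
// never combined, so PDF -> PostScript plus PostScript -> X does not yield
// PDF -> X.
const filter_t *
filter_find(const filter_table_t *table, const char *srctype, const char *dsttype)
{
  assert(table != NULL && srctype != NULL && dsttype != NULL);

  for (size_t i = 0; i < table->num_filters; i ++)
  {
    const filter_t *f = table->filters + i;

    if (!strcmp(f->srctype, srctype) && !strcmp(f->dsttype, dsttype))
      return (f);
  }

  return (NULL);
}
```

With this model, a PostScript Printer Application registers one "application/pdf" to "application/postscript" filter, and that single callback is responsible for everything between the incoming PDF and the bytes written to the device.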

@tillkamppeter
Contributor

tillkamppeter commented Jul 15, 2020

OK, thanks. With this I think I will be able to do a PostScript Printer Application. If the user sends a PDF job, my function will turn PDF into PostScript and apply the PPD's code snippets, everything that the pdftopdf and pdftops filters have done in cups-filters. Can I make this Printer Application accept ONLY PDF as input format? In particular, it should not accept PostScript, as such a job would be passed through unfiltered and any options or attributes would be ignored.

I would proceed similarly for arbitrary printer driver retro-fits, for example Foomatic. Here the added filter would take the role of pdftopdf and foomatic-rip, but the output is some non-standard, proprietary binary format that never exists as an input format and differs from printer model to printer model (it is usually generated by Ghostscript's built-in drivers). So here again I only want to accept PDF as input, but what would I have to set driver_data.format to?

@JaiLuthra1
Contributor

JaiLuthra1 commented Jul 15, 2020 via email

@tillkamppeter
Contributor

@JaiLuthra1, if driver_data.format is set to "application/postscript" and the user sends a job in PostScript, my filter function will not be called and the PostScript will be passed through unfiltered, making some printers (usually non-PostScript but PCL-capable printers) print tons of text pages. To force my function to be executed, driver_data.format must be something that does not exist as an input format and that a user cannot supply as input.

@JaiLuthra1
Contributor

@tillkamppeter What I proposed would solve your issue of accepting ONLY PDF. If you want to allow the user to send other inputs (PostScript, for example), you would not want the print function to simply pass the input file through AS IS. I am not too sure, but you might want to do filtering in the print function too if you have non-raster output.

@tillkamppeter
Contributor

@JaiLuthra1, I am fine with a Printer Application accepting only PDF. If setting driver_data.format to "application/postscript" forces the Printer Application to accept PostScript (which I do not want), so that such jobs skip my filter function, then making the print function block these jobs would perhaps be the only solution.

@michaelrsweet
Owner Author

@tillkamppeter I think what @JaiLuthra1 is trying to say is that the "print" function is responsible for handling the native format, and that function can add any printer-specific commands needed for the job.
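
The routing question discussed in the last few comments can be summarized as a small decision function. This is only a simplified model of the behaviour as described in this thread, not PAPPL's job processing code; the `job_path_t` enum and `choose_job_path()` are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

// Hypothetical outcome of routing one job
typedef enum
{
  JOB_PATH_PRINT,                       // Native format, handled by the "print" function
  JOB_PATH_FILTER,                      // Converted by a registered MIME filter
  JOB_PATH_REJECT                       // No way to handle this format
} job_path_t;

// Decide how a job in `job_format` is handled by a driver whose native
// format is `native_format`.  `have_filter` says whether a filter from
// job_format to native_format has been registered.
job_path_t
choose_job_path(const char *job_format, const char *native_format, bool have_filter)
{
  assert(job_format != NULL && native_format != NULL);

  if (!strcmp(job_format, native_format))
    return (JOB_PATH_PRINT);

  if (have_filter)
    return (JOB_PATH_FILTER);

  return (JOB_PATH_REJECT);
}
```

In this model a PostScript job sent to a driver whose native format is "application/postscript" never reaches the PDF filter, which is why the "print" function is the place to add printer-specific commands for native-format jobs.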

@michaelrsweet michaelrsweet modified the milestones: v1.0, v1.1 Oct 20, 2020
@michaelrsweet michaelrsweet modified the milestones: v1.1, Future Jan 15, 2021
@sicklittlemonkey

@michaelrsweet

@tillkamppeter libpoppler is GPL2, so linking anything to it makes it GPL... I haven’t looked closely at PDFium.

Just FYI, Google used libpoppler in their GCP Connector, released under a BSD-style license. When I logged an issue back in 2016 it was closed with the comment "linking to poppler doesn't change the license of this source code", so their employees would seem to have a company handbook written by lawyers with a different view on this. ; - )

@michaelrsweet
Owner Author

@sicklittlemonkey Yeah, Google's response was clearly incorrect. BSD and GPL2 aren't incompatible, but when you combine them the combined software is covered by the GPL2.

@milkman0007

How is PAPPL handling PDFs now? Also, is there a way to crop and resize a PDF before it prints? The issue I am looking to solve is this: I will be printing on a 4-inch thermal printer. The file is a preformatted label intended to be printed on 8.5 x 11 paper, like return labels from eBay or Amazon. There is cropping software you can upload those to, which will crop them to print on 4x6, but is there a way to do that at the driver level?
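
Whatever software ends up doing the cropping, the arithmetic behind "crop a region of a letter page and fit it on a 4x6 label" is a uniform scale plus a translation. The sketch below shows that calculation only; the `crop_box_t` and `fit_t` types and `fit_crop_to_media()` are hypothetical names for this example, and all values are assumed to be in points (1/72 inch).

```c
#include <assert.h>

// Illustrative types; all values are in points (1/72 inch)
typedef struct
{
  double x, y;                          // Lower-left corner of the crop box on the source page
  double width, height;                 // Size of the crop box
} crop_box_t;

typedef struct
{
  double scale;                         // Uniform scale factor
  double x_offset;                      // Translation applied after scaling
  double y_offset;
} fit_t;

// Compute the transform x' = x * scale + x_offset (and the same for y) that
// scales `crop` to fit inside the media and centers it there.
fit_t
fit_crop_to_media(crop_box_t crop, double media_width, double media_height)
{
  fit_t  fit;                           // Resulting transform
  double sx, sy;                        // Scale needed in each direction

  assert(crop.width > 0.0 && crop.height > 0.0);
  assert(media_width > 0.0 && media_height > 0.0);

  sx = media_width / crop.width;
  sy = media_height / crop.height;

  // Use the smaller factor so the whole crop box stays on the media
  fit.scale    = sx < sy ? sx : sy;
  fit.x_offset = (media_width - crop.width * fit.scale) / 2.0 - crop.x * fit.scale;
  fit.y_offset = (media_height - crop.height * fit.scale) / 2.0 - crop.y * fit.scale;

  return (fit);
}
```

For a 4 x 6 inch label the media is 288 x 432 points, so a crop box of 576 x 864 points is scaled by 0.5 and shifted so that its lower-left corner lands at the label's origin.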

@milkman0007

Here is the site I was referring to, which does this:
https://www.labelresizer.com/howto

and here is one example of how PDFs are being cropped:
https://prnt.sc/ifP098e1ICE0

@michaelrsweet
Owner Author

PAPPL doesn’t specifically handle PDFs, you need to supply extra software for that. And I won’t be adding built-in support for cropping, etc.

@tillkamppeter
Contributor

If you need a library for printing-related file conversions, including cropping or scaling PDF pages to given page sizes, putting several PDF pages onto one sheet, converting many formats to PDF, converting PDF to output formats, ... have a look at libcupsfilters. It has many filter functions for the different conversions, and also special filter functions, for example to chain individual filter functions or to call an external filter executable. The filter functions can be controlled not only by command line options but also by IPP attributes, and they also get the printer capabilities via printer IPP attributes.

I use it in OpenPrinting's Printer Applications, actually in pappl-retrofit, a library for retro-fitting classic CUPS drivers into Printer Applications. It is also used for the current version of CUPS' filters, the cups-filters package.


6 participants