Multiline text line height not being calculated correctly #1646

Open
troygrosfield opened this issue Jan 5, 2016 · 33 comments
Labels: Bug, Conversion

Comments

@troygrosfield

The line height isn't being calculated correctly by ImageDraw's multiline_text() function [1]. It assumes that a capital "A" is as tall as the font can be. However, if I use the word "Apple", you'll notice that the "p" in the word produces a different text size.

The issue

Current code

line_spacing = self.textsize('A', font=font)[1] + spacing
>>> self.textsize('A', font=font)[1]
170
>>> self.textsize('APPLE', font=font)[1]
170
>>> self.textsize('Apple', font=font)[1]
244  # <-- problem
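Going by the quoted line, each line is advanced by the height of "A" plus `spacing` (default 4). A small sketch of that arithmetic using the numbers from the session above; the helper names are hypothetical, and the heights are plugged in by hand rather than measured:

```python
def line_pitch(height_of_a: int, spacing: int = 4) -> int:
    """Step from one line to the next when it is derived from the height of 'A'."""
    return height_of_a + spacing


def box_overlap(line_height: int, pitch: int) -> int:
    """Pixels by which a line's bounding box extends into the next line's box."""
    return max(0, line_height - pitch)


pitch = line_pitch(170)            # textsize('A')[1] + default spacing
overlap = box_overlap(244, pitch)  # textsize('Apple')[1] is 244
print(pitch, overlap)              # 174 70
```

With these numbers, a line containing descenders extends 70 px into the box of the line below it, which is consistent with the overlap reported in this issue.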

How the text correctly looks on the web

[Screenshot: expected rendering in a web browser]

How the image is rendered via pillow (incorrect line height)

[Screenshot: Pillow rendering with incorrect line height]

Proposed Change

import string
...
line_spacing = self.textsize(string.ascii_letters, font=font)[1] + spacing

This measures characters that extend both above and below the baseline. Every upper- and lowercase letter is used because, as you'll notice in the font above, the lowercase "l" is the tallest character.

When that change is made, you'll see the correct text height in the image:

[Screenshot: Pillow rendering with the proposed change]

[1] https://github.com/python-pillow/Pillow/blob/master/PIL/ImageDraw.py#L269

@troygrosfield troygrosfield changed the title from "Multiline text height not being calculated correctly" to "Multiline text line height not being calculated correctly" on Jan 5, 2016
@wiredfool
Member

The Pillow spacing is baseline to baseline, which is the appropriate measure for line spacing.

For larger images, the default 4px of extra spacing is nowhere near appropriate, which is really the issue you're seeing.

It's possible that some A glyphs will be completely wrong for this metric, so we might be better off with something like '.', or even just the em height.

@troygrosfield
Author

> The Pillow spacing is baseline to baseline, which is the appropriate measure for line spacing.

What are you referring to as "baseline to baseline"? It seems like "extra" spacing would be exactly that: extra. The current multiline_text(...) method produces an image like the following, in which letters on adjacent lines overlap:

[Screenshot: current multiline_text() output with overlapping lines]

I'm not sure I understand why users would want text to overlap like that. Seems like unexpected behavior, right?

@v-python

I'm looking to use Pillow for multiline text generation, and I see several issues in the archives about this problem, as well as this one.

It seems Pillow assumes 1pt = 1px (that is, 72dpi), regardless of the actual dpi of the image. It would be nice if that were declared somewhere obvious, like at the top of the ImageFont documentation. Alternatively, Pillow could stop using points as a measure anywhere and just use pixels, which would make it obvious that point-to-pixel calculations are left to the user.

The font size typically specifies the vertical span of the characters, some of which is allotted to ascenders (cap height), some to accent marks (above cap height), and some to descenders, but font designers have wide latitude in actual character heights. Still, characters shouldn't exceed the font size by much, if ever. Some fonts do violate that rule.

The "get size" operation is insufficient to use in positioning multiline text, because it doesn't tell where the baseline is in the w×h result. For multiline text, you have to know where the baseline is. The baselines must be evenly spaced.

There are only two reasonable algorithms for calculating the even spacing: (1) simply use the font metrics, or (2) allow the user to specify the baseline-to-baseline spacing.

The minimum multiline text height, then, should be (numlines - 1) * baseline spacing + the height of descenders (if any) in the bottom line + the height of ascenders in the top line. This is only useful when placing or fitting text inside a box separate from any other text (consider label making).
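The formula above as a minimal function; the name and parameters are illustrative, and the example numbers (ascent 38, descent 10, pitch 48) are borrowed from the DejaVu Sans 40 px metrics from #1540 quoted further down this thread, not measured here:

```python
def min_block_height(num_lines: int, baseline_pitch: int,
                     top_ascent: int, bottom_descent: int) -> int:
    """Minimum height of a block of lines whose baselines are baseline_pitch apart.

    top_ascent: ink height above the first line's baseline.
    bottom_descent: ink depth below the last line's baseline (0 if no descenders).
    """
    if num_lines < 1:
        return 0
    return (num_lines - 1) * baseline_pitch + top_ascent + bottom_descent


print(min_block_height(3, 48, 38, 10))  # 144
```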

Otherwise, it is much more important that the baselines of all the text on the page have an appropriate rhythm: if font sizes change during text flow, then the baseline spacing for any line that includes larger text should be adjusted by multiples of the specified spacing, not simply increased just enough to contain the larger text. Multi-size text is not something that Pillow supports in its layout, of course; it hardly supports single-size text, as these various bug reports indicate. But there's hope: people are supporting Pillow :)

Single line getsize really should return 3 values: X width, Y distance above the baseline, and Y distance below the baseline. For compatibility reasons, that probably has to be a different API, rather than suddenly returning 3 values where it previously returned 2.

Multiline getsize really should return a similar result, probably considering the baseline of the bottom line as the "main" baseline, and the Y distance above baseline would include all the other lines of text, and baseline spacing.

@v-python

v-python commented Jan 10, 2016

Over on #1540 it says:

>>> from PIL import ImageFont
>>> t = ImageFont.truetype('DejaVuSans.ttf', 40)
>>> t.getmetrics()
(38, 10)
>>> t.font.ascent
38
>>> t.font.descent
10
>>> t.font.height
47
>>> t.font.x_ppem
40
>>> t.font.y_ppem
40
>>> t.getsize('A')
(27, 38)
>>> t.getsize('AB')
(54, 38)
>>> t.getsize('M')
(35, 38)
>>> t.getsize('y')
(24, 46)
>>> t.getsize('a')
(25, 38)

Not only is it confusing that Pillow seems to assume that 1 point = 1 pixel, but the "point size" of the font above, 40, is exceeded by ascent + descent (38 + 10 = 48), and ascent + descent is also greater than height (48 > 47). These are strange characteristics of fonts.

See http://designwithfontforge.com/en-US/The_EM_Square.html for typical font size specifications with nice pictures. Note that the em-square, which is the "point size" of the font, is (generally) bigger than any of the characters in width, ascent+descent, etc. So the relationships shown above make it extremely difficult for me to understand the relationship between the "size" that I pass to select a font and the size of the resulting font. I can't find any documentation that explains what the numbers in Pillow mean with respect to what is typical in the industry.

@v-python

Maybe my previous comment should have been a new bug, but I'm trying to understand these numbers in the context of figuring out how to deal with multiline text in my application. I would expect, for example, to be able to fit 7 lines of 10 point type in a one inch height. With the metrics shown, it seems like multiline_getsize might say it can, but that there would be significant overlap, as the resulting font seems to far exceed the em square / point size specified.

@v-python

So when I look at DejaVu Sans in another application, which assumes a 96dpi screen resolution and 1pt = 4/3 pixel, I tell it to put 40 point text on the screen, and I get a cap height (ascent?) of 39 pixels and a top-of-cap to bottom-of-descender distance (height?) of 50, giving a descent of 11. Of course, since it has a factor of 4/3, these scale to 29.25, 37.5, and 8.25 in points. Note how all these sizes are smaller than the specified 40 points for the font size. When I add accent marks, I get 59 pixels, or 44.25 points, for the overall height from the top of an accented capital to the bottom of an adjacent descender. So it seems that DejaVu Sans exceeds its em square for accents on capital letters. It is not unique in that regard, but note that unaccented letters all fit in the em square. So it seems that Pillow (or the font package it uses) is somehow confusing the metrics it obtains from the .ttf file, or intentionally scaling (some of) them; but if so, it is not clear which ones, by how much, or why it deviates from the standard font metrics stored in the .ttf file.

@v-python

I've come to the following conclusions:

Pillow docs may actually be correct with regard to point sizes, but leave out one huge assumption, that in dealing with fonts, the point size is interpreted with regards to a 96dpi canvas, regardless of what the resolution of the canvas actually is. Early reading of docs and code convinced me that Pillow was interpreting 1 point == 1 pixel, like early Macintoshes. And it was very clear that the actual canvas/image resolution was being ignored.

But when I change to the assumption that a 96dpi canvas is assumed, then the font metric numbers start to make sense: what is reported by Pillow is very close to what can be measured in a familiar graphics program.

So the problem this bug is about is that multiline_text[size] uses the cap height as the line spacing, when it should use the font size. None of the listed metric getters returns the point size or pixel size of the font, given the font object. It would be extremely useful for multiline_text[size] to have a way to obtain the pixel size of the font, to use as the default line spacing. And rather than specifying an extra "spacing" in pixels, it would be simpler to allow the user to specify the baseline-to-baseline distance, typically called "line height", in points (translated to pixels at the assumed 96dpi canvas resolution), so that common typographic expressions like "10 point font on 12 point leading" (both scaled to the actual resolution of the canvas) can be implemented simply and consistently, rather than "10 point font on 16 pixel leading".
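Whatever resolution Pillow assumes internally, a caller who thinks in "10 on 12" terms has to turn both numbers into pixels for a canvas of known DPI (1 point = 1/72 inch). A minimal sketch with hypothetical helper names; the pitch it returns would still have to be applied by the caller's own line layout, because multiline_text only takes an extra `spacing`:

```python
def points_to_pixels(points: float, dpi: float) -> float:
    """Convert typographic points (1/72 inch) to device pixels at the given DPI."""
    return points * dpi / 72.0


def size_and_pitch(font_pt: float, leading_pt: float, dpi: float) -> tuple:
    """Pixel font size and baseline-to-baseline pitch for 'font_pt on leading_pt'."""
    return (round(points_to_pixels(font_pt, dpi)),
            round(points_to_pixels(leading_pt, dpi)))


print(size_and_pitch(10, 12, 96))   # (13, 16)
print(size_and_pitch(10, 12, 300))  # (42, 50)
```

At 96 dpi, 12 point leading comes out to exactly 16 px, matching the 16 pixel figure above.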

The other confusing point [yep, point] is that the x_ppem and y_ppem are actually returning Points per em, rather than Pixels per em, even though all the other metrics and sizes in the list above are returning Pixels!

Sorry for my "stream of consciousness" inputs above, while attempting to analyze and debug and understand. To summarize, there are:

Deficiencies in the documentation regarding units of parameters and returned metric values. It should be clarified which are in points and which are in pixels.

Deficiencies in textsize and multiline_textsize return values, because there is no indication of where the baseline is. This may only be curable by defining a better API with useful return values.

Deficiencies in the parameters to multiline_textsize, because the line spacing should be able to be specified (a new keyword parameter could be added to override "spacing" when it exists, so this is curable, but a new API could omit the "spacing" parameter of the current API)

Bug in the multiline_textsize code that is using the cap height in pixels in lieu of the font size in points (converted to pixels for the calculation) as the baseline-to-baseline spacing.

Complete absence in the documentation about the 96dpi canvas assumption used by fonts.

Missing a way to rediscover (a metric API return value) the "font size" (in pixels or points or both, but please document which) of a particular font object, for use in multiline_textsize.

N.B. The "height" appears to be approximately the sum of the "ascent" and "descent" but is not the same as the "font size converted to pixels".

@wiredfool
Member

> These are strange characteristics of fonts.

Yes, but that's what we're getting from the underlying code, with the note that some of the items are converted from 26.6 fractional pixels. ( http://www.freetype.org/freetype2/docs/reference/ft2-base_interface.html#FT_Size_Metrics)
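In FreeType's 26.6 fixed-point format, a value is an integer whose low 6 bits are the fraction, so one unit is 1/64 pixel. A sketch of the conversion in both directions (the helper names are hypothetical, not part of Pillow or FreeType):

```python
def f26dot6_to_pixels(value: int) -> float:
    """26.6 fixed point: the low 6 bits are the fraction, so divide by 64."""
    return value / 64.0


def pixels_to_f26dot6(pixels: float) -> int:
    """Nearest 26.6 value for a pixel measurement."""
    return round(pixels * 64)


print(f26dot6_to_pixels(2432))    # 38.0
print(f26dot6_to_pixels(2376))    # 37.125
print(pixels_to_f26dot6(37.125))  # 2376
```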

The TTF code WRT metrics is really a very thin shim on what freetype is doing, and the docs and conventions there are different than one would expect, including noting that height and base + ascender are potentially different values, and that one is more useful than the other.

From a quick check, it appears that there is no concept of points in this implementation, it's all pixels. (size is passed directly into core.getfont, and from there, it calls FT_Set_Pixel_Sizes)

The api to the text functions is pretty limited, and one of the major limitations is the assumption that the input location is the upper left corner of the rendered text. It would be better if it was the baseline of the (first) line of the text. This api works pretty well to blat a bit of text onto an image, but less well as the text renderer for high quality blocks of text.

Spacing = 4 is a strange default, likely chosen so that small sized blocks of text would look reasonable. It is, in effect, a leading that's really big for small fonts and ridiculously small for larger ones. (it doesn't really help that there are 2 definitions of leading that are used in the wild)

The core baseline-to-baseline size could be based on the y-em size, but that's likely to be a minor change.

@v-python

"Yes, but..." :)

I did stumble onto some references to the FreeType documentation, but didn't find the link you pointed at; thanks for that.

I stand by my conclusions above. While I can't argue with the fact that the freetype documentation calls x_ppem and y_ppem "pixels", it is very clear from examination of the same font in different software, that whatever it is, it isn't the same pixels as the pixels returned for ascender and descender. Because the pixel values returned for ascender and descender are consistent with what Pillow puts in the image, we can conclude that they are, in fact, pixel values. But just because the freetype documentation claims x_ppem and y_ppem are in pixels, that doesn't make it true.... And likewise, the font size passed in to Pillow seems to be documented as points, but that is only true if the assumption is made somewhere that points = 1/72" and pixel = 1/96".

There is a link from your link that points off to the FT_FaceRec structure: http://www.freetype.org/freetype2/docs/reference/ft2-base_interface.html#FT_FaceRec

It appears that that structure is pretty well copied from the scalable font file, for scalable fonts... the units are in "font units", and mention is made of 2048 or 1000 of those font units for different formats of scalable fonts. This is consistent with other documentation for those font formats.

When my other graphics software (but it is an application, not a library, and not programmable, so it is fine for manually doing stuff, but to automate things, that is why I'm looking at Pillow) looks at the DejaVu Sans font the numbers it derives from the font make sense with common understanding of the scalable fonts, formats, and terminology. When Pillow/Freetype reports metrics, the metrics don't all make sense.

The DejaVu Sans font has an EM-square that easily contains the ascent - descent for the characters. The metrics reported by Pillow do not, and that is inconsistent with the common understanding of the scalable fonts, formats, and terminology. Therefore, there is a bug either in Pillow, Freetype, or the documentation of one or the other, or both.

It would be very interesting to see the values in the structure at the link I posted, and to correlate them with the values posted in #1540 (which I quoted in an earlier message).

The units_per_EM is the very basic measure of the font, and defines the font unit in which all the other dimensions are given in that chart. For scalable fonts, this is the starting point for scaling. From the descriptions of all the glyphs in the font, all given in units_per_EM, the font is scaled to any specified size (typically measured in points, with points being 1/72"). Of course, for generating actual bitmaps, the resolution of the bitmap display is another consideration. Typical desktop LCD displays today have 96dpi; earlier CRTs generally had lower resolution; apparently the first Mac had 72dpi, allowing the idea that 1pixel = 1point (which is true, sometimes, but certainly not always).

So if one is dealing with unit-free bitmaps, one should make/document the necessary assumption about the conversion factor between points and pixels, whatever it is. When reading an image, Pillow often learns its DPI; when writing an image, Pillow often allows specification of its DPI, but in-between times, Pillow seems to treat the images as unit-free bitmaps. And for font sizing, it seems to assume (but not document) 96dpi.

Somewhere, either in Pillow or Freetype, there are some conversion factors being applied, between font units, points, and pixels. Some of those conversion factors are either calculated or documented in error, because the EM square for DejaVu Sans is larger than ascent-descent. It also seems that descent, a negative value in font metrics, is getting converted to a positive number, so that ascent+descent is the span of character sizes. Height, in the font, is expected to be >= ascent-descent (from that font metric structure linked to above; when expressed in the same units). So when the basic characteristics of fonts are reported inconsistently, we can conclude there are bugs in the code and/or documentation. The font works well in many applications, so the bug is not in the font.

I'd be happy to look at the code in both Pillow and Freetype, if someone points me at it, but I am not familiar with either code base, and it can take a long time to find things in unfamiliar code.

Right off, though, if you look at the relationships between the numbers in the font definition, in font units, then those relationships should be preserved if an equal scale factor is applied to convert them all to points, pixels, mm, inches, or whatever unit is of interest. But the reported metrics quoted above are confused, so an equal scale factor has not been applied.

@v-python

I found some font tools from Microsoft at https://www.microsoft.com/typography/tools/tools.aspx
Using ttfdump to dump the DejaVuSans.ttf file, I find the following information:

From 'head' table:
unitsPerEm: 2048
xMin: -2090
yMin: -850
xMax: 3442
yMax: 2389

This x/y Min/Max is the box that would bound all the characters in the font. So we see that some characters do exceed, even far exceed, the size of the EM square. Not clear which ones, though, from this data, or how commonly used those characters are.

From the 'hhea' table:

yAscender:            1901
yDescender:           -483
yLineGap:             0
advanceWidthMax:      3554
minLeftSideBearing:   -2090
minRightSideBearing:  -1455
xMaxExtent:           3442

These top two should correlate to the ascent and descent from Pillow. The x_ppem and y_ppem should correlate to the unitsPerEm of 2048 from the 'head' table by the same scale factor.
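If that correlation holds, every value scales by the same factor, pixel_size / unitsPerEm. A sketch using the DejaVuSans.ttf numbers dumped above, under the assumption (consistent with the later measurements in this thread) that the size passed to ImageFont.truetype is the em size in pixels; the helper names are hypothetical:

```python
UNITS_PER_EM = 2048   # 'head' table of DejaVuSans.ttf
ASCENDER = 1901       # 'hhea' yAscender
DESCENDER = -483      # 'hhea' yDescender (negative: below the baseline)


def units_to_pixels(units: int, pixel_size: float,
                    units_per_em: int = UNITS_PER_EM) -> float:
    """Scale a font-unit value to pixels for a font rendered at pixel_size ppem."""
    return units * pixel_size / units_per_em


def scaled_metrics(pixel_size: float):
    """Return (em, ascent, descent) in pixels, with descent as a positive depth."""
    return (units_to_pixels(UNITS_PER_EM, pixel_size),
            units_to_pixels(ASCENDER, pixel_size),
            -units_to_pixels(DESCENDER, pixel_size))


print(scaled_metrics(40))  # (40.0, 37.12890625, 9.43359375)
```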

@v-python

v-python commented Jan 11, 2016

This page helps: http://chanae.walon.org/pub/ttf/ttf_glyphs.htm particularly the following quoted text:

the internal leading:

this concept comes directly from the world of traditional typography. It represents the amount of space within the "leading" which is reserved for glyph features that lay outside of the EM square (like accentuation). It usually can be computed as:

internal leading = ascent - descent - EM_size

the external leading:

this is another name for the line gap.
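With the signed descent from the font tables, the internal leading formula can be checked against the DejaVuSans.ttf values above: 1901 - (-483) - 2048 = 336 font units, the same internal leading that the fontTools dump later in this thread reports. A tiny sketch (function name hypothetical):

```python
def internal_leading(ascent: int, descent: int, units_per_em: int) -> int:
    """ascent - descent - em, where descent is negative (below the baseline)."""
    return ascent - descent - units_per_em


leading_units = internal_leading(1901, -483, 2048)
print(leading_units)              # 336
print(leading_units * 40 / 2048)  # 6.5625 pixels at a 40 px em
```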

So maybe I had some misperceptions about where accent marks would go with respect to the EM square.

So I wrote some code:

from PIL import ImageFont

f1 = ImageFont.truetype("DejaVuSans.ttf", 40 )
f2 = ImageFont.truetype("DejaVuSans.ttf", 400 )
f3 = ImageFont.truetype("DejaVuSans.ttf", 4000 )
f4 = ImageFont.truetype("DejaVuSans.ttf", 40000 )
print( f1.getmetrics())
print( f2.getmetrics())
print( f3.getmetrics())
print( f4.getmetrics())

which produces:

(37, 9)
(371, 94)
(3713, 943)
(37129, 9434)

I wrote some other code using https://github.com/behdad/fonttools because with it, one can actually get the numbers straight out of the font tables. For DejaVuSans.ttf, it produces the unitsPerEm as 2048, the ascent as 1901, and the descent as -483.

For a 40 pixel font, the ratio of EM to 40 pixels is 2048 / 40 = 51.2.
Applying this ratio to all the numbers we get:

EM: 2048 / 51.2 = 40.0
ascent: 1901 / 51.2 = 37.12890625
descent: 483 / 51.2 = 9.43359375

So we can see that the larger fonts get sizes successively closer to the ratios here.

So I don't know why my Pillow 3.0.0 on Windows produces (37, 9) for my metrics, when #1540 got (38,10). I don't have the other metrics available in my released version of Pillow, I guess.
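For what it's worth, both reported results are consistent with the exact scaled values above rounded in two different ways: rounding to the nearest pixel reproduces all four tuples printed above, while rounding outward (ascent up, descent depth up) gives the (38, 10) from #1540 at 40 px. Which rule a given Pillow/FreeType build actually applies is not established in this thread; the sketch only checks the arithmetic:

```python
import math

UNITS_PER_EM = 2048   # DejaVuSans.ttf 'head' table
ASCENDER = 1901       # 'hhea' yAscender
DESCENT_DEPTH = 483   # |yDescender|, depth below the baseline


def exact_metrics(size: int):
    """Unrounded (ascent, descent) in pixels for a font rendered at `size` ppem."""
    scale = size / UNITS_PER_EM
    return ASCENDER * scale, DESCENT_DEPTH * scale


for size in (40, 400, 4000, 40000):
    ascent, descent = exact_metrics(size)
    nearest = (round(ascent), round(descent))
    outward = (math.ceil(ascent), math.ceil(descent))
    print(size, nearest, outward)

# 40 (37, 9) (38, 10)
# 400 (371, 94) (372, 95)
# 4000 (3713, 943) (3713, 944)
# 40000 (37129, 9434) (37129, 9434)
```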

Now the other interesting numbers to come out of fonttools are the actual character bounding boxes.
Note that the bounding box is given as 4 entries, all as font unit dimensions from the origin position of the character, yMax, yMin, xMax, xMin. For all of capital A B Y, the yMax was 1493 and the yMin was 0. For lowercase y, yMax was 1120, and yMin was -426.

Note that these numbers do not reach the overall "ascent" or "descent" for the font as a whole.

Scaling them by our ratio, we get:

1493 / 51.2 = 29.16015625
1120 / 51.2 = 21.875
426 / 51.2 = 8.3203125

So we would expect a cap A and lowercase y generated with Pillow to have these dimensions, in pixels, for a 40pt font. So I wrote some more code:

from PIL import Image, ImageDraw, ImageFont

f1 = ImageFont.truetype("DejaVuSans.ttf", 40 )
f2 = ImageFont.truetype("DejaVuSans.ttf", 400 )
f3 = ImageFont.truetype("DejaVuSans.ttf", 4000 )
f4 = ImageFont.truetype("DejaVuSans.ttf", 40000 )
for fN in ( f1, f2, f3, f4 ):
  print( fN.getmetrics())

samp = 'Yy'
for N, fN in enumerate(( f1, f2, f3, f4 )):
  # Square canvas of 50, 500, 5000, 50000 px, growing with the font size
  side = 50 * (10 ** N)
  im = Image.new('1', (side, side), 'white')
  tx, ty = fN.getsize( samp )
  print('size:', tx, ty )
  draw = ImageDraw.Draw( im )
  draw.text(( 0, 0 ), samp, font=fN, fill=0 )
  im.save('out%d.tif' % N, dpi=(72.0, 72.0), compression='group4')

The first Y is 28 pixels tall. The first y is 21 above the baseline, and 8 below.
The second Y is 291 pixels tall. The second y is 218 above the baseline, and 83 below.
The third Y is 2916 pixels tall. The third y is 2188 above the baseline and 832 below.
The fourth Y is 29160 pixels tall. The fourth y is 21875 above the baseline and 8320 below.

So I alter my conclusions. The Pillow "font size", specified by "points" in the documentation, assumes 1 point = 1 pixel. FreeType also assumes 1 point = 1 pixel. The characters generated exactly fit the font characteristics, with this assumption, that 1 point = 1 pixel.

What doesn't fit:
The assumption of 1 point = 1 pixel should be clearly stated in the font API documentation.

ascent and descent are not specific to the characters generated. It would be nice if this were also stated in the documentation. All textsize is really measuring is the width. This makes it fast, but less accurate. I have no idea whether either FreeType or newer versions of Pillow expose the bounding boxes for the characters, but they are available in the font file. [Hmm. I hadn't tested this exactly; it was based more on what other people said, and on the gross assumption in multiline_textsize. See next comment.]

The use of "ascent + 4" in multiline_textsize for the line height is pure foolishness. The line height used internally should be the height metric exposed in #1540 (though I'm not sure why those numbers differ from what I'm seeing in Pillow 3.0.0). The height metric is, at least approximately, the sum of the values currently returned by getmetrics(), or the second value returned by textsize("Yy") [rather than textsize("A")].

The "spacing" parameter should default to 0, and be documented as extra "pixels" between lines should the user specify it.

Of course, changing the internal line height in multiline_textsize, and the default value for spacing would be incompatible changes.

The workaround:

Horrendous documentation for multiline_textsize that documents the current foolishness, so that the user can work around it by calculating and supplying a value for spacing as follows:

textsize("Yy")[1] - textsize("A")[1] + userSpecifiedExtraPixels

This is practical now that experimentation and measurement have determined what is really going on.

@v-python
Copy link

v-python commented Jan 11, 2016

There was already too much in the previous comment. I wrote some more code regarding the Y dimension (the height):

from PIL import ImageFont

f1 = ImageFont.truetype("DejaVuSans.ttf", 40 )
f2 = ImageFont.truetype("DejaVuSans.ttf", 400 )
f3 = ImageFont.truetype("DejaVuSans.ttf", 4000 )
f4 = ImageFont.truetype("DejaVuSans.ttf", 40000 )

for N, fN in enumerate(( f1, f2, f3, f4 )):
  print( fN.getmetrics())
  for sampN in ('Yy', 't', 'a', 'i', 'y'):
    tx, ty = fN.getsize( sampN )
    print('size:', sampN, tx, ty )

which produces:

(37, 9)
size: Yy 48 45
size: t 16 37
size: a 25 37
size: i 11 37
size: y 24 45
(371, 94)
size: Yy 481 454
size: t 157 371
size: a 245 377
size: i 111 371
size: y 237 454
(3713, 943)
size: Yy 4810 4545
size: t 1568 3713
size: a 2451 3770
size: i 1111 3713
size: y 2367 4545
(37129, 9434)
size: Yy 48106 45449
size: t 15684 37129
size: a 24512 37695
size: i 11113 37129
size: y 23672 45449

Two surprises:

  1. Not all lowercase letters report the same height; I sort of expected them all to report the same "ascent" value.
  2. y produces the same height as Yy.

This indicates that FreeType probably tries to return actual heights, and that there are bugs. The actual bounding-box numbers for some characters, obtained using fontTools, are as follows:

DejaVuSans.ttf
  ascent: 1901  descent: -483  lineGap: 0  unitsPerEm: 2048
  global bounding box: yMax: 2389  yMin: -850  xMax: 3442  xMin -2090
  height: 2384  internal leading: 336
A: yMax: 1493  yMin: 0  xMax: 1384  xMin 16
B: yMax: 1493  yMin: 0  xMax: 1260  xMin 201
Y: yMax: 1493  yMin: 0  xMax: 1255  xMin -4
a: yMax: 1147  yMin: -29  xMax: 1069  xMin 123
t: yMax: 1438  yMin: 0  xMax: 754  xMin 55
i: yMax: 1556  yMin: 0  xMax: 377  xMin 193
y: yMax: 1120  yMin: -426  xMax: 1151  xMin 61
Aring: yMax: 1901  yMin: 0  xMax: 1384  xMin 16
aring: yMax: 1798  yMin: -29  xMax: 1069  xMin 123

So t and i should eventually diverge in height. "y" shouldn't produce the same height as "Yy".
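A sketch of a per-string measurement that uses only the glyphs actually present, built from the fontTools bounds listed above (so it only knows those seven characters; the table and function are hypothetical, not a Pillow API). Scaled to a 40000 px em, it gives the "y" values measured earlier (21875 above, about 8320 below), puts "i" above "t", and gives "Yy" a larger ascent than "y":

```python
UNITS_PER_EM = 2048

# (yMax, yMin) in font units for DejaVuSans.ttf, copied from the dump above.
GLYPH_Y_BOUNDS = {
    "A": (1493, 0), "B": (1493, 0), "Y": (1493, 0),
    "a": (1147, -29), "t": (1438, 0), "i": (1556, 0), "y": (1120, -426),
}


def ink_extent(text: str, pixel_size: float):
    """Return (above, below) the baseline in pixels for the glyphs in text."""
    if not text:
        return 0.0, 0.0
    scale = pixel_size / UNITS_PER_EM
    y_max = max(GLYPH_Y_BOUNDS[ch][0] for ch in text)
    y_min = min(GLYPH_Y_BOUNDS[ch][1] for ch in text)
    return y_max * scale, -y_min * scale


for sample in ("Yy", "y", "t", "i"):
    print(sample, ink_extent(sample, 40000))
```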

@v-python

And if single line textsize becomes accurate with respect to the particular characters both above and below the baseline, then the baseline still needs to be exposed, and only the top and bottom lines would need Y calculations. The top line should calculate its size above its baseline, and the bottom line should calculate its size below its baseline, and those two numbers added to lineheight * (numberlines-1).

@v-python

The workaround posted above is limited in usefulness:

> Horrendous documentation for multiline_textsize that documents the current foolishness, so that the user can work around it by calculating and supplying a value for spacing as follows:
>
> textsize("Yy")[1] - textsize("A")[1] + userSpecifiedExtraPixels
>
> This is practical now that experimentation and measurement have determined what is really going on.

The above works well enough for English text, but for text with accent marks the following would be better:

sum( getmetrics()) - textsize("A")[1] + userSpecifiedExtraPixels

Note that both of them depend on textsize("A")[1] because that is what is erroneously used inside the code in lieu of "line height". sum( getmetrics()) works, but the returned line height from the font would be better in versions that expose it.
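Both workarounds as small functions, with a check of the resulting line pitch. The helper names are hypothetical and the inputs are plain numbers rather than Pillow calls; the example uses the 40 px DejaVu Sans values from #1540 quoted earlier (getmetrics() of (38, 10), an "A" height of 38), with the "y" height of 46 standing in for "Yy", since the measurements above show "y" and "Yy" report the same height:

```python
def spacing_for_descenders(height_yy: int, height_a: int, extra: int = 0) -> int:
    """textsize("Yy")[1] - textsize("A")[1] + extra"""
    return height_yy - height_a + extra


def spacing_for_accents(metrics, height_a: int, extra: int = 0) -> int:
    """sum(getmetrics()) - textsize("A")[1] + extra"""
    ascent, descent = metrics
    return ascent + descent - height_a + extra


def effective_pitch(height_a: int, spacing: int) -> int:
    """Line step when each line is advanced by textsize("A")[1] + spacing."""
    return height_a + spacing


s1 = spacing_for_descenders(46, 38)
s2 = spacing_for_accents((38, 10), 38)
print(s1, effective_pitch(38, s1))  # 8 46
print(s2, effective_pitch(38, s2))  # 10 48
```

The second pitch, 48, equals ascent + descent: the "A" height cancels out, which is what makes the workaround independent of that erroneous internal choice.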

@v-python

Well, neither of those workarounds quite solve the whole problem, because the baseline isn't visible to multiline_text[size], and so the height is variable, depending on the text on each line, and it is hard to know how to fit it to evenly spaced lines, and multiline_text[size] just don't achieve that goal. The code in the repository is better than the released Pillow 3.0.0, and the workarounds work with the repository code to some extent, but not really good enough to help the user position text of various sizes on the same baseline.

Looking deeper into the PIL code, I see that ImageDraw.textsize in ImageDraw.py calls ImageFont.getsize in ImageFont.py, which calls font_getsize in _imagingft.c, which in turn calls the FreeType library, renders glyphs, and determines their metrics. And since it gets those metrics from the .ttf file, they are correct... The question is, where is the information lost or garbled, such that multiline_text[size] can't figure out how to evenly space the lines? Note that multiline_textsize returns a bounding width/height, but it has no sense of origin or baseline, and the actual widths/heights are only somewhat related to the characters within; there is usually a gap at the top, but it didn't seem all that predictable in size either.

From the bottom up:
font_getsize actually calculates the vertical half of a bounding box: yMax, yMin. It effectively does that in the horizontal direction too: some characters are designed to overlap into margins, so as well as calculating the span of (simply) kerned characters, it also notices any overlap into the margins and adds that in. For embedding text into an image, or a box inside an image, that seems appropriate.

In the Y direction, it also calculates the distance from the baseline to the top of the tallest character, calling it yoffset. It is not exactly clear how this differs from yMax, but the calculation is done separately from different metrics. yMax is calculated from the bounding box of each glyph, whereas yoffset uses a glyph metric called horiBearingY.

But does it return a standard bounding box based on the baseline, even though it calculates all the right numbers and could? No! Instead, it calculates the Y height as yMax - yMin, losing the concept of the baseline. On the other hand, it subtracts yoffset from the ascent and returns that as part of a second tuple of info, the first being width×height. The other number in the second tuple, called offset, is the negative offset of the first character in the string, if it is negative. Some characters seem to poke a pixel to the left of the origin at some font sizes and not at others. If the first one does, that negative number is returned as the first item in the offset tuple; otherwise it is zero.
So, from this pair of tuples returned from font_getsize, we could reverse calculate a baseline position.

Now the code that gets this pair of tuples, ImageFont.getsize, simply adds them together and returns the result as the width and height. For the X direction, that actually makes the width narrower, because the X offset may be negative. In one example I found where the X offset was -1, there were several white pixel columns at the other end, so I'm not sure whether this narrowing matters or is compensatory. But when the text was drawn with ImageDraw.text, one pixel was lost off the left edge.

For the Y direction, the sum results in there being white space above the text when the Y offset is positive. This means that the "width×height" doesn't really reflect the height of the text, but it does mean that for the top line, the baseline would match the baseline of the top line of other text blocks written at the same Y position. While that sounds good, and probably is for some use cases, the existence or non-existence of descenders in the top line of text will affect the position of the next line of text, and so forth. So the baselines don't stay aligned across multiple blocks of text, the baseline-to-baseline height varies, and, calling only ImageFont.getsize, ImageDraw.text and especially ImageDraw.multiline_text have insufficient information to prevent that, or even to know where the baseline is, as presently coded. Since ImageDraw.text only deals with a single line of text, some use cases might want to fit one line into a box using the largest possible font size. That means the font size should be selected based on the actual data, and a line of text with all lowercase letters and no descenders could use a bigger font size than a line of text with descenders, or with accents, or with capitals, or with accented capitals.

Other cases, where alignment of lines of text is important, should draw the text on evenly spaced baselines... but neither ImageDraw.text[size] nor ImageDraw.multiline_text[size] gives the user enough information to determine the baseline. If the user calls ImageDraw.text consistently and calculates the top of the text area from the ascent returned by getmetrics(), then text can be aligned appropriately. But to achieve that, the user can use neither multiline_textsize to estimate the size nor multiline_text to draw it; they have to implement their own code, and dangle their text from the invisible, conceptual baseline+ascent instead of the invisible, conceptual baseline! Since most users learned to write on ruled pads of paper, putting their characters on baselines, the baseline is the more natural perspective.
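A minimal sketch of that user-side workaround, assuming only that font.getmetrics() returns (ascent, descent) and that draw.text((x, y), ...) places the top of the ascent at y, as described above. The function name and default spacing are hypothetical. On Pillow 8.0 and later, passing anchor="ls" to draw.text lets you give the baseline position directly instead of subtracting the ascent.

```python
def draw_lines_on_baselines(draw, origin, lines, font, spacing=4, fill=None):
    """Draw each line with ImageDraw.text so baselines sit on a fixed pitch.

    Hypothetical helper. `origin` is the top-left of the text area; the
    first baseline is placed one ascent below it. Returns the baseline y
    of every line so callers can align other text blocks to them.
    """
    ascent, descent = font.getmetrics()
    x, first_baseline = origin[0], origin[1] + ascent
    pitch = ascent + descent + spacing  # same for every line, whatever its content
    baselines = []
    for i, line in enumerate(lines):
        baseline = first_baseline + i * pitch
        # draw.text positions by the top of the ascent, so hang the line from there.
        draw.text((x, baseline - ascent), line, fill=fill, font=font)
        baselines.append(baseline)
    return baselines
```

Usage with a real Pillow draw context would look like draw_lines_on_baselines(ImageDraw.Draw(img), (10, 10), ["Apple", "ace"], font); because the pitch comes from the font's metrics rather than the text, two blocks started at the same Y keep their baselines aligned line for line.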

Happily, the ImageFont.getsize API could be called from ImageDraw.multiline_text, and allow it to learn enough to position text appropriately. Sadly, the current code and API probably cannot be salvaged to be useful. Hopefully, my next comment will include replacement code for ImageDraw.text[size] and ImageDraw.multiline_text[size], but it will probably have to be called something else if it gets included in Pillow, as I see no way to match the APIs and produce useful results.

Abandoning the existing APIs actually means that the replacement code could work with many versions of PIL, as an add-in, or monkey-patch, so there is a benefit to that, too.

@v-python

See #1660 for my PillowPatch.py code, which provides reasonable text drawing capabilities for Pillow, by monkey patching them in to Pillow.

@QasimK QasimK mentioned this issue Jan 14, 2016
@PanderMusubi

In which release will this be fixed?

@aclark4life
Member

Wow, lots of info here but no formal resolution?

@aclark4life aclark4life added this to the 4.1.0 milestone Jan 5, 2017
@aclark4life aclark4life added the Bug Any unexpected behavior, until confirmed feature. label Jan 5, 2017
@aclark4life
Member

@troygrosfield Anything you can do with this one based on the info from @v-python ?

@troygrosfield
Author

@aclark4life, I just hacked a workaround that worked for me and moved on. Haven't touched much of this since.

@v-python

v-python commented Jan 6, 2017

#1660 has a "monkey" patch that still works on the latest Pillow; I tried to integrate it, but having never used GitHub before, and the build system for Pillow not being Windows friendly, I got stalled. #1915 was someone else trying to integrate the code using Linux, it got some review comments, I think they got it to work, but not sure it got past the comments and the need for tests, and then someone tweaked one of the files, and it rotted a little... I doubt it is major, since the "monkey" patch still works... If the process for building and testing on Windows were documented enough that I could follow it, I'd try to follow up. I'd really like to see it integrated before the "monkey" patch breaks...

@radarhere radarhere removed this from the 4.2.0 milestone Oct 4, 2017
@392781

392781 commented Aug 14, 2020

I guess in the same vein as the original issue, I'm running into monospace width problems: ImageDraw.Draw.text draws monospace text with differing lengths between lines, like so:

image

As you can see, the character alignment varies, in some cases not by a whole monospace character width but by more like 0.3 or 0.5 of one.

Is this an issue that's been documented before? I couldn't find it in the issues listed on the Pillow GitHub.

@nulano
Contributor

nulano commented Aug 14, 2020

@392781 What font are you using?

@392781

392781 commented Aug 15, 2020

@nulano Fixedsys Excelsior 3.01. I have a suspicion it's a problem with the font. However, after testing it in a regular editor it seems to be monospaced properly so I'm not sure what's going on.

@392781

392781 commented Aug 15, 2020

I may have found a partial answer here: https://github.com/kika/fixedsys

In the README it mentions the original font working only for a specific font size with antialiasing turned off. It still doesn't explain why the font seems to work for different sizes when used in an editor but is broken when loaded into Pillow.

@nulano
Contributor

nulano commented Aug 15, 2020

Can you share your code? I have only found one reproduction so far:

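from PIL import Image, ImageDraw, ImageFont
fnt = ImageFont.truetype("FSEX302.ttf", 43, layout_engine=ImageFont.LAYOUT_BASIC)
image = Image.new("L", (400, 250))  # canvas large enough for the four test lines
draw = ImageDraw.Draw(image)
draw.fontmode = "1"  # or image = Image.new("1", ...); draw = ImageDraw.Draw(image)
draw.text((0, 0), "abc[\n###[\n@@@[\nMMM[", font=fnt)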

I can't reproduce using either Raqm or draw.fontmode = "L" (the default for any image except mode "1"). I'll take a closer look at the example I found for now.

@nulano
Contributor

nulano commented Aug 15, 2020

Checking font.getsize with a debugger I have found that this is in fact an issue with the font.

For example, the width of # is 70 in font units with a 10 font unit bearing (gap) according to FT_Load_Glyph with FT_LOAD_NO_SCALE. This corresponds to 7+1px at font size 16px. Scaling the font to 43px gives width 7px/16*43=18.8125 and bearing 1px/16*43=2.6875, which are rounded to 19px and 3px respectively by FreeType's hinter (otherwise the glyph would be blurry).

The @ symbol has a width of 80 font units with no bearing, which corresponds to 21.5px and is rounded down to 21px. This differs from the 19+3 = 22px above and breaks the spacing of the font.
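The arithmetic can be checked directly. This sketch assumes 160 font units per em, which is only inferred from "70 font units = 7px at 16px" above. The hinted pixel values are the ones FreeType produced in this case, copied rather than recomputed, because a plain Python round() would turn 21.5 into 22 and hide the problem.

```python
UNITS_PER_EM = 160  # assumption: inferred from "70 font units = 7px at 16px"
SIZE_PX = 43


def to_px(font_units):
    """Unhinted scaling from font units to pixels at SIZE_PX."""
    return font_units * SIZE_PX / UNITS_PER_EM


# Unhinted, '#' (70 units + 10 units bearing) and '@' (80 units) advance equally.
print(to_px(70) + to_px(10), to_px(80))  # 21.5 21.5

# Hinted advances as reported above: '#' -> 19 + 3 px, '@' -> 21 px.
HINTED_ADVANCE = {"#": 19 + 3, "@": 21}


def line_width(text, advances=HINTED_ADVANCE):
    """Sum the per-glyph hinted advances of a line."""
    return sum(advances[ch] for ch in text)


print(line_width("###"), line_width("@@@"))  # 66 63: columns drift 1px per glyph
```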

I think the font could avoid this by setting the size of all glyphs to 80 font units (i.e. include the gaps in the glyph bounds).

This problem disappears if you use Raqm layout which doesn't use hinting during text layout; other programs likely use a different hinting algorithm.

@392781

392781 commented Aug 15, 2020

@nulano Thanks for the info (and going above and beyond tbh, I know next to nothing about font processing). I kinda figured it might be the font. I was using it for an ascii art generator I've been working on. Kinda unfortunate because the problem cropped up after I added improvements to it. I was avoiding the issue by using a really large font size and then just rescaling the image.

@nulano
Contributor

nulano commented Aug 15, 2020

The examples in your repo look really good! I would suggest you try installing Raqm, that might sidestep the spacing issue as it ignores hinting for text layout, only rendering (after the glyphs are laid out).

@392781

392781 commented Aug 15, 2020

I'll look into it :^) Also thanks for your kind words... It's a small project that I've always wanted to finish (or at least polish off and maybe add to every once in a while).

@makew0rld

makew0rld commented Apr 4, 2021

> The Pillow spacing is baseline to baseline, which is the appropriate measure for line spacing.


@wiredfool sorry to come back to this so late, but is it still true that this is how Pillow does line spacing? Is it different if raqm is used?

@FeeeeK

FeeeeK commented Oct 5, 2021

Will this ever be fixed? This strange solution creates a lot of problems when you need to resize images to fit text.

from PIL import Image, ImageDraw, ImageFont

font = ImageFont.truetype("tnr.ttf", size=80, encoding="unic")
text = "{{{{\n}}}}"

size = font.getsize_multiline(text)
image = Image.new("RGB", size, color="#FFF")
draw = ImageDraw.Draw(image)
draw.multiline_text((0, 0), text, "#000", font)
image.save("a.png")

a

@arnavmehta7

arnavmehta7 commented Sep 23, 2022

Hey, when I specify the size in truetype, the rendered result is not really correct: it is not the same as the font size. Is there any way of correcting this?
