[datasets] UnicodeEncode error for € when drawing with default PIL font. #416

Closed
charlesmindee opened this issue Aug 18, 2021 · 2 comments
Labels: module: datasets (related to doctr.datasets), type: bug (something isn't working)
charlesmindee commented Aug 18, 2021

🐛 Bug

Drawing the € character with the default PIL font raises a UnicodeEncodeError.

To Reproduce

Steps to reproduce the behavior:

  1. Set font=None in the train loader of references/classification/train_tensorflow.py
  2. Run python references/classification/train_tensorflow.py mobilenet_v3_small --show-samples

This fails with the following traceback:

Traceback (most recent call last):
  File "references/classification/train_tensorflow.py", line 245, in <module>
    main(args)
  File "references/classification/train_tensorflow.py", line 124, in main
    train_set = CharacterGenerator(
  File "/home/laptopmindee/doctr/doctr/datasets/classification/tensorflow.py", line 29, in __init__
    super().__init__(*args, **kwargs)
  File "/home/laptopmindee/doctr/doctr/datasets/classification/base.py", line 58, in __init__
    self._data = [synthesize_char_img(char, font_family=self.font_family) for char in self.vocab]
  File "/home/laptopmindee/doctr/doctr/datasets/classification/base.py", line 58, in <listcomp>
    self._data = [synthesize_char_img(char, font_family=self.font_family) for char in self.vocab]
  File "/home/laptopmindee/doctr/doctr/datasets/classification/base.py", line 36, in synthesize_char_img
    d.text((4, 0), char, font=font, fill=(255, 255, 255))
  File "/home/laptopmindee/venv3.8/lib/python3.8/site-packages/PIL/ImageDraw.py", line 469, in text
    draw_text(ink)
  File "/home/laptopmindee/venv3.8/lib/python3.8/site-packages/PIL/ImageDraw.py", line 429, in draw_text
    mask = font.getmask(
  File "/home/laptopmindee/venv3.8/lib/python3.8/site-packages/PIL/ImageFont.py", line 149, in getmask
    return self.font.getmask(text, mode)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 0: ordinal not in range(256)
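
The last frame suggests that the default bitmap font path encodes its input as latin-1, which has no code point for €. Below is a minimal sketch of the same failure without PIL, under that assumption. The helper name `latin1_encodable` and the sample vocab are hypothetical and not part of doctr or PIL.

```python
def latin1_encodable(char: str) -> bool:
    """Return True if `char` can be encoded as latin-1 (the codec named in the traceback)."""
    try:
        char.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical vocab sample: 'a' and 'é' fit in latin-1, '€' (U+20AC) does not.
vocab_sample = "aé€"
unsupported = [c for c in vocab_sample if not latin1_encodable(c)]
print(unsupported)
```

A check like this could be used to spot which vocab characters would trip the default font before any image is drawn.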

Expected behavior

The generated samples are displayed without error.

Environment

DocTR version: 0.3.1a0
TensorFlow version: 2.5.0
PyTorch version: 1.9.0+cu111 (torchvision 0.10.0+cu111)
OpenCV version: 4.5.1
OS: Ubuntu 18.04.5 LTS
Python version: 3.8
Is CUDA available (TensorFlow): Yes
Is CUDA available (PyTorch): Yes
CUDA runtime version: 11.4.100
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 2060
Nvidia driver version: 470.57.02
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.

fg-mindee (Contributor) commented

Correct. I think we should add automatic OS detection and fall back to a basic font that is compatible with the platform (for instance, Arial on Windows and macOS, FreeMono on Linux).
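
A rough sketch of what such OS detection might look like, using only the standard library. The function name `get_default_font_family`, the mapping, and the font file names are assumptions for illustration; this is not necessarily how the fix was implemented, and the fonts may not be installed on a given machine.

```python
import platform
from typing import Optional

# Hypothetical mapping from platform.system() values to a font file name.
# The file names are assumptions; availability depends on the machine.
_DEFAULT_FONTS = {
    "Linux": "FreeMono.ttf",
    "Windows": "arial.ttf",
    "Darwin": "Arial.ttf",  # macOS
}


def get_default_font_family(system: Optional[str] = None) -> str:
    """Pick a font file name for the given (or detected) operating system."""
    system = platform.system() if system is None else system
    # Fall back to FreeMono on unrecognized systems (an arbitrary choice for this sketch).
    return _DEFAULT_FONTS.get(system, "FreeMono.ttf")


print(get_default_font_family())
```

The returned name could then be passed to PIL's `ImageFont.truetype` instead of relying on the default bitmap font when `font=None`.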

charlesmindee (Collaborator, Author) commented

Closed by #418