Error: 'charmap' codec can't encode characters #999

Closed
T0T4R4 opened this issue May 14, 2023 · 21 comments
Labels: 🐛 Bug (Something isn't working), ❔ Need more info

@T0T4R4
Contributor

T0T4R4 commented May 14, 2023

Hi !

I'm following your tutorial for fine-tuning, but I received the following error during training on my GPU:

  File "C:\Users\clert\Documents\_Dev\yolo-nas\venv\lib\site-packages\super_gradients\training\sg_trainer\sg_trainer.py", line 1273, in train
    validation_results_tuple = self._validate_epoch(epoch=epoch, silent_mode=silent_mode)
  File "C:\Users\clert\Documents\_Dev\yolo-nas\venv\lib\site-packages\super_gradients\training\sg_trainer\sg_trainer.py", line 1759, in _validate_epoch
    return self.evaluate(
  File "C:\Users\clert\Documents\_Dev\yolo-nas\venv\lib\site-packages\super_gradients\training\sg_trainer\sg_trainer.py", line 1870, in evaluate
    sg_trainer_utils.display_epoch_summary(
  File "C:\Users\clert\Documents\_Dev\yolo-nas\venv\lib\site-packages\super_gradients\training\utils\sg_trainer_utils.py", line 257, in display_epoch_summary
    summary_tree.show()
  File "C:\Users\clert\Documents\_Dev\yolo-nas\venv\lib\site-packages\treelib\tree.py", line 854, in show
    print(self._reader)
  File "C:\Python310\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 20-22: character maps to <undefined>

I tried simply commenting out the show() call, but at the end of the whole training I got this error:

Exception ignored in atexit callback: <function reset_all at 0x000001DE4A6357E0>
Traceback (most recent call last):
  File "C:\Users\clert\Documents\_Dev\yolo-nas\venv\lib\site-packages\colorama\initialise.py", line 34, in reset_all
    AnsiToWin32(orig_stdout).reset_all()
  File "C:\Users\clert\Documents\_Dev\yolo-nas\venv\lib\site-packages\colorama\ansitowin32.py", line 189, in reset_all
    self.wrapped.write(Style.RESET_ALL)
  File "C:\Python310\lib\codecs.py", line 378, in write
    self.stream.write(data)
TypeError: write() argument must be str, not bytes

Environment:

  • OS : Windows 11
  • GPU : RTX 2060
  • Super Gradients version 3.1.1
  • Python 3.10, environment :
 absl-py==1.4.0
alabaster==0.7.13
antlr4-python3-runtime==4.9.3
attrs==23.1.0
Babel==2.12.1
boto3==1.26.126
botocore==1.29.126
build==0.10.0
cachetools==5.3.0
certifi==2022.12.7
chardet==4.0.0
charset-normalizer==3.1.0
click==8.1.3
colorama==0.4.6
coloredlogs==15.0.1
contourpy==1.0.7
coverage==5.3.1
cycler==0.10.0
Deprecated==1.2.13
docutils==0.17.1
einops==0.3.2
flatbuffers==23.3.3
fonttools==4.39.3
future==0.18.3
google-auth==2.17.3
google-auth-oauthlib==1.0.0
grpcio==1.54.0
humanfriendly==10.0
hydra-core==1.3.2
idna==2.10
imagesize==1.4.1
Jinja2==3.1.2
jmespath==1.0.1
json-tricks==3.16.1
jsonschema==4.17.3
kiwisolver==1.4.4
Markdown==3.4.3
markdown-it-py==2.2.0
MarkupSafe==2.1.2
matplotlib==3.7.1
mdurl==0.1.2
mpmath==1.3.0
numpy==1.23.0
oauthlib==3.2.2
omegaconf==2.3.0
onnx==1.13.0
onnx-simplifier==0.4.28
onnxruntime==1.13.1
opencv-python==4.7.0.72
packaging==23.1
pandas==2.0.1
Pillow==9.5.0
pip-tools==6.13.0
protobuf==3.20.3
psutil==5.9.5
pyasn1==0.5.0
pyasn1-modules==0.3.0
pycocotools==2.0.4
pyDeprecate==0.3.2
Pygments==2.15.1
pyparsing==2.4.5
pyproject_hooks==1.0.0
pyreadline3==3.4.1
pyrsistent==0.19.3
python-dateutil==2.8.2
python-dotenv==1.0.0
pytz==2023.3
PyYAML==6.0
rapidfuzz==3.0.0
requests==2.30.0
requests-oauthlib==1.3.1
requests-toolbelt==1.0.0
rich==13.3.5
roboflow==1.0.8
rsa==4.9
s3transfer==0.6.0
scipy==1.10.1
seaborn==0.12.2
sentry-sdk==1.22.2
six==1.16.0
snowballstemmer==2.2.0
Sphinx==4.0.3
sphinx-rtd-theme==1.2.0
sphinxcontrib-applehelp==1.0.4
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.1
sphinxcontrib-jquery==4.1
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
stringcase==1.2.0
super-gradients @ https://files.pythonhosted.org/packages/16/5b/a3e31ec12a6ce662ed8275f56cc9435a12d0d868109e24703a823ea88581/super_gradients-3.1.1-py3-none-any.whl#sha256=287476390285c31b69dbbbe1d45fe9f1bb654106f1a7403dd4500a3fd53a0294
sympy==1.11.1
tensorboard==2.12.3
tensorboard-data-server==0.7.0
termcolor==1.1.0
thop==0.1.1.post2209072238
tomli==2.0.1
torch @ file:///C:/Users/clert/Downloads/torch-1.13.1%2Bcu117-cp310-cp310-win_amd64.whl#sha256=978239684c6ec455ad2157ff33d44fdb9dd8d3a93b9d2f4ac7aa57691e990136
torch-tb-profiler==0.4.1
torchaudio @ file:///C:/Users/clert/Documents/_Dev/yolo-nas/torchaudio-0.13.1%2Bcu117-cp310-cp310-win_amd64.whl#sha256=2d821e7da413b193ed9acf59c9a4d2ae8704df4c6ff722da0fee77f569b11703
torchmetrics==0.8.0
torchvision @ file:///C:/Users/clert/Documents/_Dev/yolo-nas/torchvision-0.14.1%2Bcu117-cp310-cp310-win_amd64.whl#sha256=b39fc67e7131053d435804d7901e88528611c0832fd9f1cc26476b5a27cc5d81
tqdm==4.65.0
treelib==1.6.1
typing_extensions==4.5.0
tzdata==2023.3
ultralytics==8.0.99
urllib3==1.26.15
Werkzeug==2.3.3
wget==3.2
wrapt==1.15.0
@dagshub

dagshub bot commented May 14, 2023

@Louis-Dupont
Contributor

Louis-Dupont commented May 14, 2023

Hi @T0T4R4
Can you please add the snippet of code ?

Meanwhile, you can try:

training_hyperparams['silent_mode'] = True
Trainer.train(..., training_hyperparams)

@T0T4R4
Contributor Author

T0T4R4 commented May 14, 2023

Should I continue the discussion on DagsHub or here?

Thanks @Louis-Dupont, it looks like setting silent_mode back to True made the code finish gracefully (I had explicitly set it to False to follow what happens during training and estimate how long it takes). So thanks for the tip! 👍

Still, it would be nice to have it working while disabled :)

@Louis-Dupont Louis-Dupont added the 🐛 Bug Something isn't working label May 15, 2023
@Louis-Dupont
Contributor

Let's continue here, it's more convenient for people to see the discussion :)

Definitely, I am just not fully sure what causes it. If you have a few seconds to run the following code it would help me understand what happens:

# Test 1: plain tree
from treelib import Tree

train_tree = Tree()
train_tree.create_node("Training", "Training")

summary_tree = Tree()
summary_tree.create_node("MAIN", "Summary")
summary_tree.paste("Summary", train_tree)
summary_tree.show()

# Test 2: colored text
from termcolor import colored

print(colored("Training", color="green"))

# Test 3: arrow character
print("↗")

And everything together

from treelib import Tree
from termcolor import colored

train_tree = Tree()
train_tree.create_node(colored("Training ↗", color="green"), "Training")

summary_tree = Tree()
summary_tree.create_node("MAIN", "Summary")
summary_tree.paste("Summary", train_tree)
summary_tree.show()

In theory, the last one should fail, and hopefully we can isolate which step leads to it with the first 3 tests.
If it's the color or the arrow, we can simply add an option to deactivate it. If it's the tree library, then it's a bit more work because we would need to find an alternative way to display the results.

@Louis-Dupont Louis-Dupont self-assigned this May 15, 2023
@Louis-Dupont
Contributor

Another thing you can try is to set the environment variable PYTHONIOENCODING=utf8
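
A minimal sketch of how the effect of this variable could be checked. It assumes the variable is set before the interpreter starts: Python reads PYTHONIOENCODING at startup, so changing it inside an already running process or notebook kernel does not alter the existing sys.stdout. The helper name `child_stdout_encoding` is hypothetical.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Start a fresh interpreter with PYTHONIOENCODING set and report its stdout encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # Normalise aliases such as "utf8" or "UTF-8" to the canonical codec name.
    return codecs.lookup(result.stdout.strip()).name


if __name__ == "__main__":
    print(child_stdout_encoding("utf8"))
```

Note that this only shows what a fresh interpreter starts with; code that later replaces sys.stdout with another stream can still end up with a different encoding.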

@T0T4R4
Contributor Author

T0T4R4 commented May 16, 2023

Thanks @Louis-Dupont there is no problem with the display of trees, I ran your piece of code successfully.
Will try the env var during my next training...

@Louis-Dupont
Contributor

Thanks! 🙏
If you can also run this and share the result: import sys; print(sys.getdefaultencoding())

@Satyajit1993

Screenshot (13)

UnicodeEncodeError: 'charmap' codec can't encode characters in position 21-23: character maps to <undefined>

Python = 3.10
cuda=11.7
pytorch = 1.13.1

@T0T4R4
Contributor Author

T0T4R4 commented May 19, 2023

Thanks! 🙏 If you can also run this and share the result: import sys; print(sys.getdefaultencoding())

it's already UTF-8

@sewty

sewty commented May 19, 2023

Hey all! I ran into a similar issue to @T0T4R4 earlier today. I'm running within a similar environment (torch = 1.13.1+cu117, super-gradients=3.1.1, python=3.10.10). I did a bit of tinkering and wanted to share some results, as well as possible workaround for now.

First off, I tried setting the environment variables PYTHONIOENCODING = 'utf8' and PYTHONUTF8 = 1 independently, but neither of these seemed to work. I then tried the treelib snippets from @Louis-Dupont. I was able to pass each of these test cases; however, I found that this only worked if I ran them completely separately from the original trainer.train(). Essentially, if trainer.train() had been called in a previous notebook cell and failed, cases 1 and 3 of the treelib snippets would also fail with a similar UnicodeEncodeError (complaining about positions 6-8 instead of 20-22).

Then I went to try modifying the source code on my own copy of the super-gradients package. Commenting out summary_tree.show() did allow the training and validation to finish successfully, but obviously, I was unable to view the output. (NOTE: I did not run into the secondary error about a TypeError as experienced by @T0T4R4)

To get the function to work without error and with silent_mode = False (so I could view the output), I added the statement sys.stdout = sys.__stdout__ before summary_tree.show() instead of commenting it out (inspired by the fix here: #1021 - a different issue I was having, but possibly related). This allowed the training and validation to finish successfully and show the treelib output.

FWIW it appears to me that the encoding is being switched from utf8 to cp1252 somewhere along the way, but it is not successfully converted back to utf8 before summary_tree.show(). However, I don't know much at all about charsets, so take this with a grain of salt. Hope something in here helps!
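
A minimal sketch of why a cp1252 stream raises this error, assuming the summary contains tree connectors such as "├──" and the "↗" arrow (the sample line below is illustrative, not actual trainer output). The helper name `unencodable_chars` is hypothetical.

```python
def unencodable_chars(text, encoding):
    """Return the characters of `text` that `encoding` cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


if __name__ == "__main__":
    line = "├── Training ↗"  # illustrative sample, assumed to resemble the summary tree
    # ascii() keeps this demo printable even on a cp1252 console.
    print(ascii(unencodable_chars(line, "cp1252")))
    print(ascii(unencodable_chars(line, "utf-8")))
```

Any character the first call reports would trigger the 'charmap' error when printed through a cp1252 stdout.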

@T0T4R4
Contributor Author

T0T4R4 commented May 19, 2023

Hi @sewty, you described exactly what we both did ;) and what I think is happening as well!
At this stage, I'm pretty much just commenting out the call to show() too 😅

Note that I also tried to change all calls to open a file by specifying the UTF8 encoding, and that didn't make any difference...
On my end, the default encoding is UTF-8 anyway...

@sewty

sewty commented May 19, 2023

Yeah @T0T4R4, it's strange that neither setting environment variables nor changing all calls that open a file, as you said, keeps the default encoding. To be clear, my final solution did not have summary_tree.show() commented out. It looks like this:

At the very end of function display_epoch_summary in sg_trainer_utils.py:

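import sys  # only needed if sg_trainer_utils.py does not already import sys

summary_tree = Tree()
summary_tree.create_node(f"SUMMARY OF EPOCH {epoch}", "Summary")
summary_tree.paste("Summary", train_tree)
summary_tree.paste("Summary", valid_tree)
sys.stdout = sys.__stdout__  # restore the interpreter's original stdout stream
summary_tree.show()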

I would give this a try, along with silent_mode=False if you haven't already. The idea is to restore the original stdout stream (which uses 'utf-8') with sys.stdout = sys.__stdout__ before the call to summary_tree.show(), which cannot print through the cp1252 stream you're seeing (if I understand correctly).
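
A minimal sketch of the same idea as a context manager, so that whatever stream was installed before is put back afterwards. The name `original_stdout` is hypothetical and not part of super-gradients or treelib.

```python
import contextlib
import sys


@contextlib.contextmanager
def original_stdout():
    """Temporarily point sys.stdout back at the interpreter's original stream."""
    replaced = sys.stdout
    # sys.__stdout__ can be None in some environments (for example GUI launchers),
    # in which case the current stream is left untouched.
    if sys.__stdout__ is not None:
        sys.stdout = sys.__stdout__
    try:
        yield sys.stdout
    finally:
        sys.stdout = replaced  # put back whatever was installed before
```

Under these assumptions, the call site would read `with original_stdout(): summary_tree.show()`.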

@T0T4R4
Contributor Author

T0T4R4 commented May 19, 2023

ok so my default encoding is actually 1252....

import locale
print( locale.getpreferredencoding())

returns
cp1252

For Py 3.10.... I can read "Python opens source files as UTF-8 by default, but any interaction with the filesystem will depend on the environment. It's strongly recommended to use open(filename, encoding='utf-8') to read a file." so my initial approach trying to update all calls to open() to add UTF-8 might be the way...

For now I have edited treelib's tree.py on line 930 to force the UTF-8 encoding, and it passed.

if stdout:
    import sys
    sys.stdout.reconfigure(encoding='utf-8')
    print(self._reader)
else:
    return self._reader

Edit: investigating up the 🪜 ladder, moving those 2 lines to sg_trainer_utils.py in display_epoch_summary works as well. And at least it does not clutter treelib.
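
A minimal sketch of what reconfigure(encoding='utf-8') changes, using an in-memory stream in place of the real console so it behaves the same on any OS. The function name `demo_reconfigure` is hypothetical.

```python
import io


def demo_reconfigure(text="↗"):
    """Write `text` to a cp1252 text stream, then again after switching it to UTF-8."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="cp1252")
    try:
        stream.write(text)
        failed_before = False
    except UnicodeEncodeError:
        failed_before = True  # cp1252 has no mapping for the arrow
    # TextIOWrapper.reconfigure is available from Python 3.7 onwards.
    stream.reconfigure(encoding="utf-8")
    stream.write(text)
    stream.flush()
    return failed_before, raw.getvalue()


if __name__ == "__main__":
    failed, data = demo_reconfigure()
    print(failed, data)
```

reconfigure only exists on text streams that are io.TextIOWrapper instances, so it may not be available if sys.stdout has been replaced by some other object.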

@T0T4R4
Contributor Author

T0T4R4 commented May 19, 2023

OK, I've been on this for 2 hours now 😅

As soon as the module initializes, it changes the charset to cp1252....

import sys
print(sys.stdout.encoding)

returns utf-8

from super_gradients.training import Trainer
print(sys.stdout.encoding)

returns cp1252

...🤔
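
A minimal sketch that generalises this check to any module. The helper name `stdout_encoding_around_import` is hypothetical; it has to run in a fresh interpreter, because importing a module that is already loaded does not re-run its import-time code.

```python
import importlib
import sys


def stdout_encoding_around_import(module_name):
    """Return sys.stdout's encoding before and after importing `module_name`."""
    before = getattr(sys.stdout, "encoding", None)
    importlib.import_module(module_name)
    after = getattr(sys.stdout, "encoding", None)
    return before, after


if __name__ == "__main__":
    # "json" is only a harmless stand-in; pass "super_gradients.training"
    # to repeat the check described above.
    print(stdout_encoding_around_import("json"))
```

If the two values differ, something executed at import time has replaced or reconfigured sys.stdout.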

@T0T4R4
Contributor Author

T0T4R4 commented May 19, 2023

After having spent so much time in the logging block.... it wasn't there 😅 !!

Just found the culprit:

common/abstractions/mute_processes.py

line 30 , mute_current_process()

must add encoding when opening a file 😁

sys.stdout = open(os.devnull, "w")
to
sys.stdout = open(os.devnull, "w", encoding="utf-8")
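
A minimal sketch of why the extra argument matters: in text mode, open() without an encoding falls back to the locale's preferred encoding (cp1252 on the Windows setups reported in this thread), and the resulting stream keeps that encoding once it is assigned to sys.stdout. The helper name `devnull_encoding` is hypothetical.

```python
import os


def devnull_encoding(encoding=None):
    """Open os.devnull for writing and report which encoding the stream uses."""
    with open(os.devnull, "w", encoding=encoding) as stream:
        return stream.encoding


if __name__ == "__main__":
    print("default :", devnull_encoding())         # depends on the platform and locale
    print("explicit:", devnull_encoding("utf-8"))  # pinned regardless of locale
```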

T0T4R4 added a commit to T0T4R4/super-gradients that referenced this issue May 19, 2023
@T0T4R4
Contributor Author

T0T4R4 commented May 19, 2023

PR pending @Louis-Dupont 🙂

@Louis-Dupont
Contributor

@T0T4R4 @sewty I merged the fix to master, does that completely fix the treelib issue?

@Satyajit1993

@Louis-Dupont 'charmap' codec can't encode characters error is solved.

Thanks for the fix.

geoffrey-g-delhomme pushed a commit to geoffrey-g-delhomme/super-gradients that referenced this issue May 26, 2023
@mazatov

mazatov commented Jun 2, 2023

@Louis-Dupont , I just ran into the same error. However I do get an error when I run the snippet you provided.

import sys; 
sys.getdefaultencoding()
'utf-8'

image

@skyprince999

@mazatov - please check


import locale
locale.getpreferredencoding()

This will give cp1252
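
A minimal sketch that collects the encodings discussed in this thread in one place. On Python 3, sys.getdefaultencoding() always returns 'utf-8' regardless of console or locale, so the other two values are the informative ones. The function name `encoding_report` is hypothetical.

```python
import locale
import sys


def encoding_report():
    """Gather the encodings that matter when printing non-ASCII text."""
    return {
        # Encoding used for implicit str/bytes conversion; always 'utf-8' on Python 3.
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        # Locale encoding, used by open() when no encoding argument is given.
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
        # Encoding of whatever stream is currently installed as stdout.
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(f"{name}: {value}")
```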

@BloodAxe
Collaborator

Fixed in 3.1.2
