How to set UTF-8 in Windows Terminal #11956

Closed
Jasper-zh opened this issue Dec 16, 2021 · 7 comments
Labels
Issue-Question For questions or discussion Needs-Attention The core contributors need to come back around and look at this ASAP. Needs-Tag-Fix Doesn't match tag requirements Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Resolution-Answered Related to questions that have been answered

Comments

@Jasper-zh

chcp 65001 has no effect. How do I set the terminal to UTF-8?

@ghost ghost added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Tag-Fix Doesn't match tag requirements labels Dec 16, 2021
@237dmitry

I set UTF-8 system-wide in intl.cpl.

[Screenshot]

@zadjii-msft
Member

You gotta give us more details: what shell you're using, what OS version, what Terminal version, etc.

@zadjii-msft zadjii-msft added the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Dec 16, 2021
@eryksun

eryksun commented Dec 16, 2021

With code page 65001, even Terminal Preview 1.12.2931.0 reads non-ASCII characters as null bytes. Who knows when Terminal and conhost will support UTF-8 properly. For now, implement a layer that translates between UTF-8 and UTF-16, and call ReadConsoleW() and WriteConsoleW(). We implemented that for Python a long time ago. Problem solved.
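A minimal sketch of the kind of translation layer described above, in Python with ctypes. It is not CPython's actual implementation (that lives in C, as _io._WindowsConsoleIO from PEP 528), and the function names here are hypothetical. The two pure conversion helpers run anywhere; the ReadConsoleW/WriteConsoleW wrappers only work on Windows with a console attached.

```python
import ctypes
import sys

# GetStdHandle takes a DWORD; these are (DWORD)-10 and (DWORD)-11.
STD_INPUT_HANDLE = 0xFFFFFFF6
STD_OUTPUT_HANDLE = 0xFFFFFFF5


def utf8_to_utf16(data: bytes) -> tuple[bytes, int]:
    """Translate UTF-8 bytes to a UTF-16-LE buffer plus its length in
    16-bit code units, which is the unit WriteConsoleW counts in."""
    buf = data.decode("utf-8").encode("utf-16-le")
    return buf, len(buf) // 2


def utf16_to_utf8(buf: bytes) -> bytes:
    """Translate a UTF-16-LE buffer (as filled by ReadConsoleW) to UTF-8."""
    return buf.decode("utf-16-le").encode("utf-8")


def _kernel32():
    """Load kernel32 and declare the console prototypes (Windows only)."""
    from ctypes import wintypes
    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.GetStdHandle.argtypes = [wintypes.DWORD]
    k32.GetStdHandle.restype = wintypes.HANDLE
    proto = [wintypes.HANDLE, ctypes.c_void_p, wintypes.DWORD,
             ctypes.POINTER(wintypes.DWORD), wintypes.LPVOID]
    for fn in (k32.WriteConsoleW, k32.ReadConsoleW):
        fn.argtypes = proto
        fn.restype = wintypes.BOOL
    return k32, wintypes


def write_console_utf8(data: bytes) -> int:
    """Write UTF-8 bytes via WriteConsoleW; returns UTF-16 units written."""
    k32, wintypes = _kernel32()
    buf, units = utf8_to_utf16(data)
    written = wintypes.DWORD(0)
    ok = k32.WriteConsoleW(k32.GetStdHandle(STD_OUTPUT_HANDLE),
                           ctypes.create_string_buffer(buf), units,
                           ctypes.byref(written), None)
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    return written.value


def read_console_utf8(max_units: int = 1024) -> bytes:
    """Read up to max_units UTF-16 units via ReadConsoleW (one line in the
    default cooked mode) and return them as UTF-8 bytes. A complete layer
    would also carry a trailing high surrogate over to the next read;
    this sketch does not."""
    k32, wintypes = _kernel32()
    raw = ctypes.create_string_buffer(max_units * 2)  # 2 bytes per unit
    nread = wintypes.DWORD(0)
    ok = k32.ReadConsoleW(k32.GetStdHandle(STD_INPUT_HANDLE), raw,
                          max_units, ctypes.byref(nread), None)
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    return utf16_to_utf8(raw.raw[: nread.value * 2])


if __name__ == "__main__" and sys.platform == "win32":
    write_console_utf8("こんにちわ\r\n".encode("utf-8"))
```

The unit count matters because WriteConsoleW and ReadConsoleW count 16-bit code units, not characters: a character outside the Basic Multilingual Plane (such as an emoji) takes two units.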

@ghost ghost added the No-Recent-Activity This issue/PR is going stale and may be auto-closed without further activity. label Dec 20, 2021
@ghost

ghost commented Dec 20, 2021

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment.

@Jasper-zh
Author

Jasper-zh commented Dec 22, 2021

[Screenshot: system version]

This is my current system version.
Windows Terminal actually runs cmd or PowerShell.
I usually use PowerShell.

[Screenshots]
In cmd, chcp 65001 takes effect.
In PowerShell, however, chcp 65001 has no effect: even though chcp reports that the active code page is already 65001, UTF-8 Chinese characters are still garbled.
My current workaround is to enable UTF-8 in the system language settings. This does work, but some older applications may not support UTF-8, which causes other problems.
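To check which of these settings is actually in effect, a small diagnostic like the following could help. It is a sketch using Python's ctypes and the kernel32 functions GetACP, GetOEMCP, GetConsoleCP and GetConsoleOutputCP; the function name query_code_pages is hypothetical. chcp changes the console's input and output code pages, while the system-wide UTF-8 option in the language settings (assuming it is the "Beta: Use Unicode UTF-8 for worldwide language support" checkbox) changes the ANSI and OEM code pages.

```python
import ctypes
import sys

CP_UTF8 = 65001


def query_code_pages() -> dict:
    """Return the process's ANSI/OEM code pages and the console's
    input/output code pages. Returns an empty dict off Windows."""
    if sys.platform != "win32":
        return {}
    k32 = ctypes.WinDLL("kernel32")
    return {
        "ansi": k32.GetACP(),      # affected by the system-wide UTF-8 option
        "oem": k32.GetOEMCP(),     # likewise
        # The console code pages are what chcp sets; 0 means no console.
        "console_input": k32.GetConsoleCP(),
        "console_output": k32.GetConsoleOutputCP(),
    }


if __name__ == "__main__":
    for name, cp in query_code_pages().items():
        suffix = " (UTF-8)" if cp == CP_UTF8 else ""
        print(f"{name:15} {cp}{suffix}")
```

Comparing the output in cmd and in PowerShell after running chcp 65001 shows whether the console code pages really changed, separately from how the shell itself decodes text.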

@ghost ghost added Needs-Attention The core contributors need to come back around and look at this ASAP. and removed Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something No-Recent-Activity This issue/PR is going stale and may be auto-closed without further activity. labels Dec 22, 2021
@eryksun

eryksun commented Dec 22, 2021

The console API doesn't support reading input as UTF-8, so it's a half-baked implementation. It only supports UTF-8 output, and that's only implemented properly for buffered writes in recent versions of conhost.

Reading input as UTF-8 is limited to ASCII, which is still the case even with Windows 11 and Windows Terminal Preview 1.12.3472.0. For example, here's a low-level os.read() in Python with the console input code page set to CP_UTF8 (65001):

>>> s = os.read(0, 10)
こんにちわ
>>> s
b'\x00\x00\x00\x00\x00\r\n'

Non-ASCII characters are replaced by null characters because the console's WideCharToMultiByte() call fails for each non-ASCII character. The call fails because the console doesn't allocate enough space for the 2-4 byte UTF-8 sequence.
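The failure mode described above can be modeled in a few lines. This is a hypothetical simulation, not conhost's code: it converts each character into a buffer with room for only one byte, which works for ASCII but fails for each hiragana character (3 bytes in UTF-8), so a null byte is stored in its place.

```python
def utf8_lengths(text: str) -> list[int]:
    """Number of bytes each character needs in UTF-8."""
    return [len(ch.encode("utf-8")) for ch in text]


def simulate_one_byte_buffer(text: str) -> bytes:
    """Hypothetical model of the bug: every character is converted into a
    buffer sized for one byte. Characters whose UTF-8 form is longer make
    the conversion fail, and a null byte is stored instead."""
    out = bytearray()
    for ch in text:
        encoded = ch.encode("utf-8")
        out += encoded if len(encoded) == 1 else b"\x00"
    return bytes(out)


if __name__ == "__main__":
    line = "こんにちわ\r\n"
    print(utf8_lengths(line))              # [3, 3, 3, 3, 3, 1, 1]
    print(simulate_one_byte_buffer(line))  # b'\x00\x00\x00\x00\x00\r\n'
```

The simulated result matches the os.read() output shown above: five nulls for the five Japanese characters, followed by the ASCII \r\n, which fit in one byte each.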

@zadjii-msft
Copy link
Member

But in PowerShell, chcp 65001 has no effect, even though chcp already reports code page 65001. UTF-8 Chinese characters are still garbled.

This sounds like an issue with either PowerShell itself or PSReadLine, which should be filed at https://github.com/powershell/powershell or https://github.com/powershell/psreadline. I'd suspect that updating both to their latest versions might just fix this for you automatically; I don't think PowerShell 5 and PSReadLine 2 (which ship with Windows) supported UTF-8 input.

4 participants