Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specialize the conversion between wchar_t* and Go string on Windows only #7

Merged
merged 1 commit into from
Jul 14, 2023

Conversation

koron
Copy link
Contributor

@koron koron commented Jul 12, 2023

This will fix #6

@koron koron marked this pull request as ready for review July 12, 2023 16:30
@sstallion
Copy link
Owner

sstallion commented Jul 13, 2023

@koron Thank you for your contribution! Could you provide a little more detail on the issue you're seeing? Changes look pretty good, though I do worry a little about some of the casting being done (namely interchanging wchar_t * and *uint16, which can have different alignment). This might need some deeper changes to support native UTF-16 on Windows.

@sstallion sstallion self-assigned this Jul 13, 2023
@sstallion sstallion added the bug Something isn't working label Jul 13, 2023
@sstallion sstallion mentioned this pull request Jul 13, 2023
@sstallion
Copy link
Owner

Thinking about this a little more, what would you think of just changing references to *C.wchar_t to unsafe.Pointer? We can keep the original names but that may be the safest way to avoid making assumptions about alignment.

@koron
Copy link
Contributor Author

koron commented Jul 13, 2023

Could you provide a little more detail on the issue you're seeing?

The root cause of the problem is C.wcstombs() and C.mbstowcs can't treat Chinese/Japanese/Korean in Windows.
It is hard to describe why those functions doesn't work,
it's a complex factors (code page, encoding, c-runtime, compiler used by CGO...) involved to dealing CGO in Windows.

namely interchanging wchar_t * and *uint16, which can have different alignment

And sizeof(wchar_t) is 2 in Windows. So those alignments are same between wchar_t and uint16.

See also: https://learn.microsoft.com/en-us/cpp/cpp/char-wchar-t-char16-t-char32-t?view=msvc-170

The wchar_t type is an implementation-defined wide character type. In the Microsoft compiler, it represents a 16-bit wide character used to store Unicode encoded as UTF-16LE, the native character type on Windows operating systems.

Anyway, in Windows, what is needed is to convert between UTF-16 (wchar_t[]) and UTF-8 (Go string).
It is defined in syscall package for Windows or x/sys/windows package.
I thought it was a bit confusing that the syscall package had different contents for each OS, so I used the x/sys/windows package.

what would you think of just changing references to *C.wchar_t to unsafe.Pointer?

I guess the cast to unsafe.Pointer wasn't needed, I've removed it from the PR.

@sstallion
Copy link
Owner

The root cause of the problem is C.wcstombs() and C.mbstowcs can't treat Chinese/Japanese/Korean in Windows.
It is hard to describe why those functions doesn't work,
it's a complex factors (code page, encoding, c-runtime, compiler used by CGO...) involved to dealing CGO in Windows.

Gotcha. If you could add a comment to this effect and include a copyright header at the top, we should be good to go. Thanks again!

@koron
Copy link
Contributor Author

koron commented Jul 14, 2023

I added comments and update the PR.
Please check it.

Thanks.

@sstallion
Copy link
Owner

Thanks @koron! I'll get this merged ASAP.

@sstallion sstallion merged commit dcc5435 into sstallion:master Jul 14, 2023
8 checks passed
@koron koron deleted the fix-winapi-error-42 branch July 15, 2023 03:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

winapi error #42
2 participants