Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let's use the Win32 API more precisely #147

Closed
wants to merge 2 commits into from

Conversation

dscho
Copy link
Member

@dscho dscho commented Feb 25, 2019

For many Win32 functions, there actually exist two variants: one that takes const char * ("ANSI", meaning the current code page) and wchar_t * ("Unicode", i.e. UTF-16, at least for all practical matters).

These functions have "A" and "W" suffixes, respectively, e.g. GetFileAttributesW(). The symbols without this suffix are #defined to the *W() versions if the constant UNICODE is defined before including the Windows headers, and to *A() otherwise.

Let's not rely on this constant, but explicitly say what we want: we want the Unicode versions, as they seem to be used by the ANSI flavor anyway.

@dscho dscho added the ready to submit Has commits that have not been submitted yet label Feb 25, 2019
Previously, we would have obtained the user name encoded in whatever the
current code page is.

Note: the "user name" here does not denote the full name but instead the
short logon name.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Many Win32 API functions actually exist in two variants: one with
the `A` suffix that takes ANSI parameters (`char *` or `const char *`)
and one with the `W` suffix that takes Unicode parameters (`wchar_t *`
or `const wchar_t *`).

The ANSI variant assumes that the strings are encoded according to
whatever is the current locale. This is not what Git wants to use on
Windows: we assume that `char *` variables point to strings encoded in
UTF-8.

There is a pseudo UTF-8 locale on Windows, but it does not work
as one might expect. In addition, if we overrode the user's locale, that
would modify the behavior of programs spawned by Git (such as editors,
difftools, etc), therefore we cannot use that pseudo locale.

Further, it is actually highly encouraged to use the Unicode versions
instead of the ANSI versions, so let's do precisely that.

Note: when calling the Win32 API functions _without_ any suffix, it
depends whether the `UNICODE` constant is defined before the relevant
headers are #include'd. Without that constant, the ANSI variants are
used. Let's be explicit and avoid that ambiguity.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho
Copy link
Member Author

dscho commented Jun 27, 2019

/submit

@dscho dscho removed the ready to submit Has commits that have not been submitted yet label Jun 27, 2019
@gitgitgadget
Copy link

gitgitgadget bot commented Jun 27, 2019

Submitted as pull.147.git.gitgitgadget@gmail.com

@@ -1407,7 +1407,7 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **deltaen
do_unset_environment_variables();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Thu, Jun 27, 2019 at 5:37 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> Many Win32 API functions actually exist in two variants: one with
> the `A` suffix that takes ANSI parameters (`char *` or `const char *`)
> and one with the `W` suffix that takes Unicode parameters (`wchar_t *`
> or `const wchar_t *`).
>
> The ANSI variant assumes that the strings are encoded according to
> whatever is the current locale. This is not what Git wants to use on
> Windows: we assume that `char *` variables point to strings encoded in
> UTF-8.
>
> There is a pseudo UTF-8 locale on Windows, but it does not work
> as one might expect.

What does "does not work as one might expect" mean? The reader is left
hanging, not knowing why or how the UTF-8 locale on Windows is
undesirable.

> In addition, if we overrode the user's locale, that
> would modify the behavior of programs spawned by Git (such as editors,
> difftools, etc), therefore we cannot use that pseudo locale.
>
> Further, it is actually highly encouraged to use the Unicode versions
> instead of the ANSI versions, so let's do precisely that.
>
> Note: when calling the Win32 API functions _without_ any suffix, it
> depends whether the `UNICODE` constant is defined before the relevant
> headers are #include'd. Without that constant, the ANSI variants are
> used. Let's be explicit and avoid that ambiguity.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Eric,

On Thu, 27 Jun 2019, Eric Sunshine wrote:

> On Thu, Jun 27, 2019 at 5:37 AM Johannes Schindelin via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > Many Win32 API functions actually exist in two variants: one with
> > the `A` suffix that takes ANSI parameters (`char *` or `const char *`)
> > and one with the `W` suffix that takes Unicode parameters (`wchar_t *`
> > or `const wchar_t *`).
> >
> > The ANSI variant assumes that the strings are encoded according to
> > whatever is the current locale. This is not what Git wants to use on
> > Windows: we assume that `char *` variables point to strings encoded in
> > UTF-8.
> >
> > There is a pseudo UTF-8 locale on Windows, but it does not work
> > as one might expect.
>
> What does "does not work as one might expect" mean? The reader is left
> hanging, not knowing why or how the UTF-8 locale on Windows is
> undesirable.

Should I really bore the reader with half-details? At least one of those
behaviors was reported as an unbootable Windows 7, but some others could
not reproduce.

I really did not want to write a novel here that is 1) largely irrelevant
and 2) would take way too long to write because I forgot most of the
details and would have to look them up again.

Hopefully we can do without the full story here, as even one report that
it does not work as expected is enough to make this an unfeasible option
for us, and that's that as far as I am concerned.

Ciao,
Dscho

>
> > In addition, if we overrode the user's locale, that
> > would modify the behavior of programs spawned by Git (such as editors,
> > difftools, etc), therefore we cannot use that pseudo locale.
> >
> > Further, it is actually highly encouraged to use the Unicode versions
> > instead of the ANSI versions, so let's do precisely that.
> >
> > Note: when calling the Win32 API functions _without_ any suffix, it
> > depends whether the `UNICODE` constant is defined before the relevant
> > headers are #include'd. Without that constant, the ANSI variants are
> > used. Let's be explicit and avoid that ambiguity.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 27, 2019

This branch is now known as js/mingw-use-utf8.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 27, 2019

This patch series was integrated into pu via git@3f0b1d7.

@gitgitgadget gitgitgadget bot added the pu label Jun 27, 2019
@gitgitgadget
Copy link

gitgitgadget bot commented Jun 28, 2019

This patch series was integrated into pu via git@10d7f1f.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 28, 2019

This patch series was integrated into pu via git@4893d96.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 28, 2019

This patch series was integrated into pu via git@1524073.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 1, 2019

This patch series was integrated into pu via git@c412472.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 2, 2019

This patch series was integrated into pu via git@355ba94.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 3, 2019

This patch series was integrated into pu via git@a51bff8.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 3, 2019

This patch series was integrated into next via git@f528daf.

@gitgitgadget gitgitgadget bot added the next label Jul 3, 2019
@gitgitgadget
Copy link

gitgitgadget bot commented Jul 3, 2019

This patch series was integrated into pu via git@d8ba2de.

compat/mingw.c Show resolved Hide resolved
@gitgitgadget
Copy link

gitgitgadget bot commented Jul 9, 2019

This patch series was integrated into pu via git@463742d.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 10, 2019

This patch series was integrated into pu via git@dc94250.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 12, 2019

This patch series was integrated into pu via git@0a2ff7c.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 12, 2019

This patch series was integrated into master via git@0a2ff7c.

@gitgitgadget gitgitgadget bot added the master label Jul 12, 2019
@gitgitgadget gitgitgadget bot closed this Jul 12, 2019
@gitgitgadget
Copy link

gitgitgadget bot commented Jul 12, 2019

Closed via 0a2ff7c.

@dscho dscho deleted the ansi-unicode branch July 15, 2019 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant