Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Utf-8/unicode support in legacy tools and lib. #800

Merged
merged 19 commits into from
Nov 22, 2023
Merged

Conversation

MarkCallow
Copy link
Collaborator

@MarkCallow MarkCallow commented Nov 13, 2023

Fixes #764.

@MarkCallow
Copy link
Collaborator Author

MarkCallow commented Nov 13, 2023

@aqnuep I want to remove the _TCHARs and && !defined(_UNICODE) from InitUTF8CLI. I cannot see any use for them beyond allowing someone to compile the tools with _mainw (or whatever it is called) passing the wchar_t** argv to InitUTF8CLI and avoiding the call to GetComandLineW. I don't see the point in the added complexity and I can't see us changing the way we compile the tools. In any case the code as is now does not support the usage I just described.

Thank you so much @aqnuep for the clean method you made to handle unicode file names on Windows. It is so much nicer than having generic text macros everywhere. I have completely removed those macros from everywhere. I'll await your thoughts before pushing a commit or not.

@MarkCallow MarkCallow changed the title Utf-8/unicode support in legacy tools. Utf-8/unicode support in legacy tools and lib. Nov 13, 2023
@aqnuep
Copy link
Collaborator

aqnuep commented Nov 14, 2023

@aqnuep I want to remove the _TCHARs and && !defined(_UNICODE) from InitUTF8CLI. I cannot see any use for them beyond allowing someone to compile the tools with _mainw (or whatever it is called) passing the wchar_t** argv to InitUTF8CLI and avoiding the call to GetComandLineW. I don't see the point in the added complexity and I can't see us changing the way we compile the tools. In any case the code as is now does not support the usage I just described.

Thank you so much @aqnuep for the clean method you made to handle unicode file names on Windows. It is so much nicer than having generic text macros everywhere. I have completely removed those macros from everywhere. I'll await your thoughts before pushing a commit or not.

The reason I left those in is to avoid any existing users of any subset of the repository components with _UNICODE. If we explicitly want to disallow using any component of the repository with _UNICODE then that has to be a conscious decision that would also include generating a configure-time error if _UNICODE is defined. But I'd rather not risk breaking any user code.

@MarkCallow
Copy link
Collaborator Author

Thank you for your review, @aqnuep. Since building with _UNICODE leads to build errors, and always has, we will not be breaking any user code by removing it. With all generic text stuff removed defining _UNICODE will not affect the compiled code so I don't think an error is necessary. Perhaps a warning that is has no effect.

@aqnuep
Copy link
Collaborator

aqnuep commented Nov 15, 2023

Thank you for your review, @aqnuep. Since building with _UNICODE leads to build errors, and always has, we will not be breaking any user code by removing it. With all generic text stuff removed defining _UNICODE will not affect the compiled code so I don't think an error is necessary. Perhaps a warning that is has no effect.

I'm just unsure whether that's the case with all combinations of the configuration parameters, but if you're confident about this and do not see any risk with removing support for it, we could, and then we should add an explicit configure-time error when _UNICODE is enabled, just in case.

@MarkCallow
Copy link
Collaborator Author

MarkCallow commented Nov 20, 2023

I'm just unsure whether that's the case with all combinations of the configuration parameters, but if you're confident about this and do not see any risk with removing support for it, we could, and then we should add an explicit configure-time error when _UNICODE is enabled, just in case.

I don't understand exactly what your concern is.

There are 3 ways I see that _UNICODE could be enabled:

  1. Somebody edits one of more of the CMake files to add the define during compilation.
  2. Somebody is building with their own build system and defines _UNICODE
  3. The compiler or OS automatically defines _UNICODE.

I don't think we should be concerned about 1 & 2. For 3, my only concern would be an app being compiled with _UNICODE defined that calls one of the *CreateFromNamedFile or *WriteToNamedFile libktx functions but if the app attempts to pass a wchar_t string to one of them a compile error will result. An additional warning or error won't help.

Currently if _UNICODE is defined compilation of the tools and loadtest apps will fail. Gtest-based tests and library compilation will succeed since they contain no generic text macros. This is independent of configuration parameters.

The one thing I think we should fix here, with or without removal of generic text macro use, is to fix main() and InitUTF8CLI() to support Windows UWP. Yes, UWP does support console apps. In main this means

#if defined WINDOWS_UWP
int __cdecl main() {
    initUTF8CLI(__argc, __argv);
#else
int main(int argc, char** argv) {
    initUTF8CLI(argc, argv);
#endif
    ...
}

and in initUTF8CLI

#if defined(_WIN32) || defined(WINDOWS_UWP)
#if defined(WINDOWS_UWP)
inline void InitUTF8CLI(int& argc, wchar_t* argv[]) {
    LPWSTR* wideArgv = argv;
#else
inline void InitUTF8CLI(int& argc, char* argv[]) {
    // Windows does not support UTF-8 argv so we have to manually acquire it
    static std::vector<std::unique_ptr<char[]>> utf8Argv(argc);
    LPWSTR commandLine = GetCommandLineW();
    LPWSTR* wideArgv = CommandLineToArgvW(commandLine, &argc);
#endif
    for (int i = 0; i < argc; ++i) {
        int byteSize = WideCharToMultiByte(CP_UTF8, 0, wideArgv[i], -1, nullptr, 0, nullptr, nullptr);
        utf8Argv[i] = std::make_unique<_TCHAR[]>(byteSize);
        WideCharToMultiByte(CP_UTF8, 0, wideArgv[i], -1, utf8Argv[i].get(), byteSize, nullptr, nullptr);
        argv[i] = utf8Argv[i].get();
    }
#else
    // Nothing to do for other platforms
    (void)argc;
    (void)argv;
#endif
}

Note that currently compile of platform_utils.h will fail if _UNICODE is defined as ktxtools apps will expect argv[] to be a char* but it will be a wchat_t*.

@aqnuep
Copy link
Collaborator

aqnuep commented Nov 20, 2023

My only concern is (2) in your list, e.g. they add_subdirectory the KTX-Software repo folder with _UNICODE defined with a subset of the build flags that may currently happen to build with it.

Then again, it's just a general backward compatibility concern, and considering that such users can always just use an older version of the code considering that this was never officially supported and/or functional in practice.

So if you're fine with removing support for _UNICODE=TRUE altogether, that works for me.

@aqnuep
Copy link
Collaborator

aqnuep commented Nov 20, 2023

Regarding Windows UWP support, that should be a separate effort IMO. We should first confirm whether it really is a must to support _UNICODE for UWP, and if yes, then that kind of reopens this can of worms and would have to look into possible solutions.

@MarkCallow
Copy link
Collaborator Author

So if you're fine with removing support for _UNICODE=TRUE altogether, that works for me.

Okay. I'll add the commit that removes all generic text usage.

Regarding Windows UWP support, that should be a separate effort IMO.

I agree. I got carried away. I researched the UWP situation to see if lack of _UNICODE would be an issue. It isn't. It is only relevant to software using the generic text macros to support unicode. Since we aren't, it doesn't apply. The main point is that the command line arguments are only available to UWP apps as wchar_t*. With that knowledge I thought about the changes needed to support UWP in our code and wrote it up. I realized after the fact that there is an error in my sample code.

@aqnuep
Copy link
Collaborator

aqnuep commented Nov 21, 2023

I agree. I got carried away. I researched the UWP situation to see if lack of _UNICODE would be an issue. It isn't. It is only relevant to software using the generic text macros to support unicode. Since we aren't, it doesn't apply. The main point is that the command line arguments are only available to UWP apps as wchar_t*. With that knowledge I thought about the changes needed to support UWP in our code and wrote it up. I realized after the fact that there is an error in my sample code.

I think we should get a UWP build on CI, if we'd like to tackle this, but in the end the main entry point's signature wouldn't even matter if we'd use the same wchar APIs directly to fetch the UTF-16 parameters manually and convert them to UTF-8.

@MarkCallow MarkCallow merged commit 1c5dc9c into main Nov 22, 2023
15 checks passed
@MarkCallow MarkCallow deleted the utf8_support branch November 22, 2023 13:04
KaperD pushed a commit to KaperD/KTX-Software that referenced this pull request Feb 21, 2024
Remove all generic text macros.

Fixes KhronosGroup#764.

The *NamedFile* functions now accept full utf8 filenames on any platform. Windows applications must use WideCharToMultiByte to convert unicode filenames before calling these functions.
KaperD pushed a commit to KaperD/KTX-Software that referenced this pull request Feb 22, 2024
Remove all generic text macros.

Fixes KhronosGroup#764.

The *NamedFile* functions now accept full utf8 filenames on any platform. Windows applications must use WideCharToMultiByte to convert unicode filenames before calling these functions.
KaperD pushed a commit to KaperD/KTX-Software that referenced this pull request Feb 22, 2024
Remove all generic text macros.

Fixes KhronosGroup#764.

The *NamedFile* functions now accept full utf8 filenames on any platform. Windows applications must use WideCharToMultiByte to convert unicode filenames before calling these functions.
KaperD pushed a commit to KaperD/KTX-Software that referenced this pull request Feb 22, 2024
Remove all generic text macros.

Fixes KhronosGroup#764.

The *NamedFile* functions now accept full utf8 filenames on any platform. Windows applications must use WideCharToMultiByte to convert unicode filenames before calling these functions.
KaperD pushed a commit to KaperD/KTX-Software that referenced this pull request Feb 22, 2024
Remove all generic text macros.

Fixes KhronosGroup#764.

The *NamedFile* functions now accept full utf8 filenames on any platform. Windows applications must use WideCharToMultiByte to convert unicode filenames before calling these functions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UWP x64 build fails due to char/wchar comparison
2 participants