Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support of UTF-8 for Windows #728

Merged
merged 6 commits into from
Apr 4, 2017
Merged

Add support of UTF-8 for Windows #728

merged 6 commits into from
Apr 4, 2017

Conversation

SirLynix
Copy link
Contributor

@SirLynix SirLynix commented Apr 2, 2017

Hi there,

We're using Premake at my job and we had some troubles when trying to copy files from/to locations containing unicode characters, the reason for this is that Premake uses the ASCII version of the Windows API.

I made a fix by adding two helper functions (utf8_fromwide and utf8_towide) and using them with the Unicode Windows API everywhere I could (I just didn't do it for the os.getpass function, making the Windows console working with unicode is not an easy thing to do).

Anyway, tell me if I need to fix some stuff, as I'm not sure I did everything in the "right" (premake-team) way 😄

@@ -42,6 +44,10 @@ int os_isdir(lua_State* L)
{
lua_pushboolean(L, 0);
}

#ifdef _WIN32
lua_pop(L, -2); /* pop wide string */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, should be lua_remove, can't fix it right now

lua_pushstring(L, unicode_str);
free(unicode_str);

return unicode_str;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you just free'd that pointer.... should really not return that, as that will be a "use after free" access violation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, in my original design I wasn't returning anything. Will change that.

#include "stdlib.h"

#ifdef PLATFORM_WINDOWS
const char* utf8_fromwide(lua_State* L, const wchar_t* wstr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a size_t utf8_fromwide(char* destination, size_t destSize, const wchar_t* wstr);

which would not need to allocate memory, would allow for the call to WideCharToMultiByte to simply be called once. And wouldn't need the LuaState... pushing the string to the state would be up to the caller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though using Lua GC to manage memory was a good idea, but I will make the changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmh, actually a wrapper function is useless if all it does is calling WideCharToMultiByte, since we can do that in places where we are calling this function (since we're already checking for the Windows part).

What if I make something like that:

const char* utf8_fromwide(const wchar_t* wstr)
 {
 	int size_required = WideCharToMultiByte(CP_UTF8, 0, wstr, -1, NULL, 0, NULL, NULL);
 
 	char* unicode_str = (char*) malloc(size_required * sizeof(char));
 	WideCharToMultiByte(CP_UTF8, 0, wstr, -1, unicode_str, size_required, NULL, NULL);
 
 	return unicode_str;
 }

be a better solution? Of course the caller has to free the buffer then (which I dislike).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that that is not preferable... In general I was just hoping that a malloc for this isn't something we need... mostly because these match functions are used ALOT, and so have quite a bit of performance impact... on our project we're already hitting 2GB of memory use.

if that removes the need for this method entirely, then I'm OK with just calling it in place to be entirely honest, and simply use stack for this...

@tvandijck
Copy link
Contributor

Other then the use after free, I have no real objections... I think API wise relying less on lua and more on just C would be nice, but it's no breaking point for me.. we can always refactor that further if we wish for that..

@SirLynix
Copy link
Contributor Author

SirLynix commented Apr 3, 2017 via email

@fun4jimmy
Copy link

Some of this stuff was discussed as part of #629. I never got round to doing anything that was proposed but I think this might fix that issue as well.

@SirLynix
Copy link
Contributor Author

SirLynix commented Apr 4, 2017

I remade this using only stack memory (and I'm wondering why I didn't do that in the first place..).
In case any conversion fails, it will trigger a Lua error, as discussed in #629.

I'm wondering, should I apply the same changes to the integrated Lua library to support unicode operations? (Talking about io library)

@tvandijck
Copy link
Contributor

Thank you very much, that looks pretty clean to me... ready to merge as far as I'm concerned.

I'm wondering, should I apply the same changes to the integrated Lua library to support unicode operations? (Talking about io library)

We could, but lets make that a separate pull request.. I'm not too fond of actually changing lua itself to be honest, since that makes moving between versions of Lua really cumbersome.. So I'd rather see extensions to the io library to support this... like an io.open_utf8(...) or something of that kind, I don't know how that would look really.. so I don't really have a very well thought out opinion on this, but it should be fairly straightforward to 'override' the standard io library with methods that support utf8 instead.

I would imagine that really all you need to override is the io.open method? all the other methods work on stream handles (FILE*, etc), so those should not be affected.

@SirLynix
Copy link
Contributor Author

SirLynix commented Apr 4, 2017

No problem with making another pull request.
I can override the files functions of io without touching the Lua library, but I think I'll have to override more than one function (since Windows Unicode stuff doesn't work with FILE* handles AFAIK, I'll check that).

I don't think making a separate Unicode open function will work in this case because of the io metatable (for calls like handle:read()), so I'll probably override, except if I find a way to make Unicode works with FILE* handles.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants