-
Notifications
You must be signed in to change notification settings - Fork 195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make sure path is in native encoding #477
Conversation
Use characters representable in Windows-1252 codepage
FYI: using IMHO, converting |
There is no |
Thanks for pointing out, I was wrong at this point. I meant, it might be also good if we can use the encoding attribute in C++ codes. But, as it seems difficult, just ignore me. Sorry for the noise... |
@hadley Do you have any memory of why xls paths are sent through @jimhester Which approach do you think I should take? I now introduce a 3rd option: We should normalize all paths and do Why? |
I think doing |
I have no recollection of why there is a difference. |
I went with a different option:
|
Fixes #498 Prior to b10a1a8,all xls paths were normalized, but we didn't know exactly why. In b10a1a8, I stopped normalizing xls paths, while solving an unrelated path encoding problem presented by R 3.5. In discussion around #477, @jimhester said: "I think doing enc2native(normalizePath()) on the R side for both xls and xlsx seems the best option." I am now taking this wise advice.
Fixes #498 Prior to b10a1a8,all xls paths were normalized, but we didn't know exactly why. In b10a1a8, I stopped normalizing xls paths, while solving an unrelated path encoding problem presented by R 3.5. In discussion around #477, @jimhester said: "I think doing enc2native(normalizePath()) on the R side for both xls and xlsx seems the best option." I am now taking this wise advice.
Fixes #476 CP1251 char set in file name
Fixes #370 read_excel on Windows fails to open a file from a UTF-8 encoded path containing special characters
This is a new problem seen with R 3.5, we believe as a result of measures taken in response to Bug 17120 :
In a non-UTF-8 locale (basically Windows?),
path.expand()
returns a UTF-8 encoded path, even for non UTF-8 input. readxl pre-processes xls paths vianormalizePath()
, which callspath.expand()
. And then libxls usesfopen()
, which expects the path in the native encoding.We have two choices:
normalizePath()
on xls files. It's not clear to me why we do this. This code goes back to Day 3 of readxl development, in 2015 fc076c4. The history doesn't indicate if there is a reason to usenormalizePath()
. I explored "stop callingnormalizePath()
" in 945178d, which passes tests. If we choose this, I will obviously remove more even code vs. comment it out.normalizePath()
on xls files, but process immediately withenc2native()
. Explored in 8a8c061, which also passes tests.Either way, I think we should consider incorporating the new test. However I decided
skip_on_cran()
might be wise.