Implement `til::au16` and `til::u16a` conversion functions & make first use in `WriteConsoleAImpl` #4493

german-one · 2020-02-06T21:54:42Z

Summary of the Pull Request

Implement conversion functions that complement the existing til::u8u16 and til::u16u8 for use with codepages other than UTF-8. A state class takes care of partials at the boundaries of DBCS-encoded text.
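To illustrate what "partials at the boundaries" means, here is a minimal, portable sketch. It is not the code from this PR: `DbcsStateSketch` and `IsLeadByte932` are hypothetical names, and the lead-byte ranges are hard-coded for codepage 932 (Shift-JIS) only, whereas real code on Windows would ask the system (for example via `IsDBCSLeadByteEx`). The sketch only separates complete characters from a dangling lead byte; it does not perform the conversion to UTF-16.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: lead-byte test for codepage 932 (Shift-JIS) only.
constexpr bool IsLeadByte932(unsigned char b) noexcept
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Hypothetical state class: remembers a lead byte that ended the previous chunk.
class DbcsStateSketch
{
public:
    // Returns the bytes that form complete characters and caches a trailing lead byte.
    std::string Complete(std::string_view chunk)
    {
        std::string text;
        text.swap(_partial); // start with the cached lead byte, if any
        text.append(chunk);

        // Walk character by character to detect a dangling lead byte at the end.
        std::size_t i = 0;
        while (i < text.size())
        {
            const std::size_t len = IsLeadByte932(static_cast<unsigned char>(text[i])) ? 2 : 1;
            if (i + len > text.size())
            {
                _partial.assign(text, i, std::string::npos); // keep it for the next call
                text.erase(i);
                break;
            }
            i += len;
        }
        return text;
    }

private:
    std::string _partial;
};
```

Feeding the sketch a three-byte chunk that ends in a lead byte yields the first complete character only; the next chunk then completes the second character.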

References

Proposed in #4422 (comment)

PR Checklist

Closes #xxx
CLA signed. If not, go over here and sign the CLA
Tests added/passed
Requires documentation to be updated
I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

Detailed Description of the Pull Request / Additional comments

These functions have the potential to supersede ConvertToW and ConvertToA by adding partials handling.
NOTE: The astate class supports only a subset of the possible codepages (see the remarks in the code).

In order to see how it works I updated WriteConsoleAImpl accordingly.
I plan to search the code base for more points to apply these functions to, and commit those updates in an upcoming PR.

Validation Steps Performed

long DBCS strings split into short chunks with an odd number of bytes, and outputted to the console using WriteConsoleA
unit test added


DHowett-MSFT · 2020-02-06T22:04:48Z

Initial thought: I'm not sure I love that it complicates u8state; I don't think that all u8-only consumers should be forced to have a codepage check and branch because conhost has some A-codepage legacy. I dunno; @miniksa?

miniksa · 2020-02-06T22:28:55Z

> Initial thought: I'm not sure I love that it complicates u8state; I don't think that all u8-only consumers should be forced to have a codepage check and branch because conhost has some A-codepage legacy. I dunno; @miniksa?

I have not read the code yet, but my immediate reaction is "I'd rather have u8state stay u8state and let all legacy codepages have astate so we can walk away from the A theoretically in Terminal."

german-one · 2020-02-06T22:39:11Z

@miniksa @DHowett-MSFT
They share a lot of code that would be redundant otherwise. And if you look at WriteConsoleAImpl, you'll see that you can just call til::au16 with any codepage ID, including UTF-8. The astate is always the same, and partials are discarded automatically if the codepage changes.

If you still don't like it, I will of course divide them into two.
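The "discarded automatically" behaviour could look roughly like the following sketch. `CodepageStateSketch` and `PartialFor` are hypothetical names chosen for illustration, not the astate interface from this PR; the only point shown is that a cached partial is tied to the codepage it was captured under.

```cpp
#include <cassert>
#include <string>

// Hypothetical state: a cached partial is only valid for the codepage it came from.
class CodepageStateSketch
{
public:
    // Returns the cached partial for `codepage`, dropping it first
    // if the codepage differs from the one used in the previous call.
    std::string& PartialFor(unsigned int codepage)
    {
        if (codepage != _codepage)
        {
            _partial.clear();
            _codepage = codepage;
        }
        return _partial;
    }

private:
    unsigned int _codepage = 0;
    std::string _partial;
};
```

With this shape, a caller that switches from codepage 932 to 65001 between two writes never sees a stale Shift-JIS lead byte prepended to UTF-8 input.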

miniksa · 2020-02-06T22:39:56Z

> @miniksa @DHowett-MSFT
> They share a lot of code that would be redundant otherwise. And if you look at WriteConsoleAImpl, you'll see that you can just call til::au16 with any codepage ID, including UTF-8. The astate is always the same, and partials are discarded automatically if the codepage changes.
>
> If you still don't like it, I will of course divide them into two.

Nah, I'll reconsider when I get to reading this. I have to fix the tests for the project first. Sorry, I've been busy doing that instead of reading PRs.

german-one · 2020-02-06T22:47:08Z

Haha, I was surprised anyway to get immediate reactions from you guys. I'm not in a hurry 🙂

src/inc/til/u8u16convert.h

german-one · 2020-02-07T17:42:50Z

@DHowett-MSFT I tried to make it easier for the compiler to remove the additional branching for u8u16 calls. But maybe this is only wishful thinking.

miniksa · 2020-02-07T18:25:17Z

> Haha, I was surprised anyway to get immediate reactions from you guys. I'm not in a hurry 🙂

Yeah, I don't know what got into me. Haha. I have you on my list, but I'm trying to track down the CI test failures now as top priority. So my full review will take a bit longer. Thanks for your patience.

german-one · 2020-02-10T20:41:41Z

This is annoying now. I have all kinds of const and non-const types in my test code, used til::at as both rvalue and lvalue, turned on all Microsoft analysis rules, and it passed the analysis without any warning. But eventually it fails here.
Are we really not able to avoid additional suppression of warnings if we want to access an array element?

miniksa · 2020-02-11T16:19:30Z

> This is annoying now. I have all kinds of const and non-const types in my test code, used til::at as both rvalue and lvalue, turned on all Microsoft analysis rules, and it passed the analysis without any warning. But eventually it fails here.
> Are we really not able to avoid additional suppression of warnings if we want to access an array element?

We're likely to approve a few #pragma warning(suppress:)s in library code because it keeps us from making mistakes elsewhere.

german-one · 2020-02-11T17:02:41Z

> We're likely to approve a few #pragma warning(suppress:)s in library code because it keeps us from making mistakes elsewhere.

@miniksa Actually that was not the point. I still think my approach to update til::at was right.
The current implementation limits it to class types that have a size() method. IMO it should work for most, if not all, types that support the subscript operator, such as random-access iterators, C arrays, and pointers. We should accept signed index types because iterators and pointers support negative offsets. And we should avoid casting the passed index to ptrdiff_t or size_t because the expected type size might be smaller. E.g. winrt::hstring::size_type is always uint32_t.
However, I'm disenchanted because I spent a few hours testing to avoid another rejected commit. There have been enough of them in this PR for the same reason already. And I still don't understand why it complained that I used pointer arithmetic at the points where I called at.

Nevermind. This is all rather off-topic now 🙂
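For readers following the til::at tangent, the generalization argued for above might be sketched as below. `at_sketch` is a hypothetical name and this is not the til::at that ships in the repository; the sketch simply forwards to the subscript operator and leaves the index type untouched, so signed offsets and narrower index types pass through unchanged. As the thread describes, a static analyzer may still flag the subscript inside such a template.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical generalization: accepts anything that supports operator[]
// (containers, C arrays, pointers, random-access iterators) and any index type.
template<class Subscriptable, class Index>
constexpr decltype(auto) at_sketch(Subscriptable&& s, const Index i)
{
    // No cast of `i`: the subscript operator of the passed type decides how to interpret it.
    return std::forward<Subscriptable>(s)[i];
}
```

Because the return type is `decltype(auto)`, the result of indexing an lvalue container is a reference and can be assigned to; indexing through a pointer with a negative offset works the same way plain pointer subscripting does.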

miniksa · 2020-02-11T17:11:57Z

> > We're likely to approve a few #pragma warning(suppress:)s in library code because it keeps us from making mistakes elsewhere.
>
> @miniksa Actually that was not the point. I still think my approach to update til::at was right.
> The current implementation limits it to class types that have a size() method. IMO it should work for most, if not all, types that support the subscript operator, such as random-access iterators, C arrays, and pointers. We should accept signed index types because iterators and pointers support negative offsets. And we should avoid casting the passed index to ptrdiff_t or size_t because the expected type size might be smaller. E.g. winrt::hstring::size_type is always uint32_t.
> However, I'm disenchanted because I spent a few hours testing to avoid another rejected commit. There have been enough of them in this PR for the same reason already. And I still don't understand why it complained that I used pointer arithmetic at the points where I called at.
>
> Nevermind. This is all rather off-topic now 🙂

It's fine. Thanks for your effort either way.

Just keep in mind that our analysis system is our best effort of what we think we need to do for static analysis. You might prove it wrong in ways that we haven't thought about yet!

german-one · 2020-02-12T12:50:05Z

> You might prove it wrong in ways that we haven't thought about yet!

I can only prove my local VS wrong, which didn't report any warning in the analysis. But of course it's my fault. I should have seen that additional overloads for arrays are necessary to avoid the array decay. I don't want to tinker with that any longer. At least not in this PR. Maybe I'll start another one when the time is right ...

german-one · 2020-02-17T23:14:14Z

The list of codepages without partials handling is a little shorter now, which also means that the support is better than what we have today.

german-one · 2020-02-17T23:17:15Z

Oh and CP54936 might be of interest to meet the GB 18030 standard.

german-one · 2020-02-21T01:06:58Z

I have to find out what's going wrong in #4673. Probably I'll commit the bug fix here.


german-one · 2020-03-07T18:10:56Z

@miniksa I'll most likely file an issue that the console API functions need an overhaul. And I'll certainly jump on this. However, whenever I investigate the code I stumble over the same thing. It's the read parameter of WriteConsoleAImpl (and the corresponding parameter in other functions), which I guess is related to the lpNumberOfCharsWritten parameter of the actual WriteConsoleA function. There is a semantic difference between 'read' and 'written', and I still don't understand what the actual definition of the value in read is. It could be:

- the bytes consumed, which is the same number of bytes that we received
- the bytes used for the output, which is the number of bytes we received, plus the bytes taken from the cache, minus the bytes captured in the cache

But how does this go together with lpNumberOfCharsWritten? And what is meant here?

- the number of glyphs written
- the number of character cells occupied for the output

Can you shed some light on that?
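The two candidate definitions of read differ only by the cache traffic, which a small worked example makes concrete. The struct and member names below are hypothetical and exist only to spell out the arithmetic from the question; they do not correspond to anything in WriteConsoleAImpl.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping for one write call.
struct ByteCountsSketch
{
    std::size_t received;  // bytes passed in by the caller
    std::size_t fromCache; // partial bytes taken from the cache (left by the previous call)
    std::size_t intoCache; // trailing partial bytes captured in the cache by this call

    // Definition 1: the bytes consumed equal the bytes received.
    constexpr std::size_t Consumed() const noexcept { return received; }

    // Definition 2: the bytes that actually contributed to the output of this call.
    constexpr std::size_t UsedForOutput() const noexcept { return received + fromCache - intoCache; }
};
```

For a three-byte DBCS chunk that ends in a lead byte, the first definition gives 3 and the second gives 2; for the following one-byte chunk that completes the character, the first gives 1 and the second gives 2.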

jsoref · 2020-03-25T21:41:55Z

Most of these are from master (you can look at master to see what's pending). It doesn't like au which is I think something you added.

jsoref · 2020-03-25T21:52:35Z

I made #5124 to get master to a green check mark. Any items that you see in the output in #4493 (comment) which aren't in jsoref@e80b602 are things that you'll need to add to a file.

That appears to be just:

au
GSM

That's reasonable as both appear to be present in your PR. Going forward, the spell checker should automatically annotate your commits instead of just saying "I found stuff".

I'd personally add both items to the main whitelist.txt file (it would like to be sorted using sort -u -f, but it doesn't really care too much).

german-one · 2020-03-25T21:59:16Z

Great, thanks!

jsoref · 2020-03-25T22:06:05Z

(And that's merged, so now it really should just be those two.)

german-one · 2020-03-25T22:28:16Z

Hmm, seems that I still should have added all annotated words? Sorry for my ignorance.

jsoref · 2020-03-25T22:32:07Z

You'd need to merge master again. You're missing e80b602.

(Or you can have faith that when this is merged, it'd be merged in w/ master which would have that commit and you'd be fine.)

jsoref · 2020-03-25T22:39:50Z

The bot is judging your branch on its own merits, not considering that it might be merged into master. In the normal case, you would be branching off a green check (passing) master and thus any new items will be your own responsibility. Unfortunately your branch started off before the spell checker was merged, and between when the spell checker was written and when it was merged additional items were added. You then merged in master and got the spell checker which was still annoyed at master at that point in time. I then wrote and had merged into master a fix to make master green. And you added things which cleared your contribution. Merging master at this point should thus give you green. (As would being merged into master, assuming everyone is happy w/ your commit.)

Sorry for the bumpy ride. It's hard getting things merged into a fast moving project. But this should be the end of that adventure. Anyone who rebases on top of master now will be past it / as would anyone who merges to master now.

german-one · 2020-03-27T15:56:51Z

Even if the misspellings are not marked to be resolved automatically by the bot (as I've seen in other PRs) that should be okay now since it obviously didn't find anything in the recent commits.

jsoref · 2020-03-27T16:22:51Z

Interesting... yeah, I could have the bot go back and look for previous comments and resolve them (or at least mark them as obsolete). I'll add that to my imaginary backlog.

In testing, I've actually been manually doing that on some of my test PRs, so, yeah... that's definitely worth doing.

(Right now, if it doesn't find any errors, it just won't leave a comment, and you'll get your green check mark. -- Here, you can see the check mark in german-one@ab4e0d7)

german-one · 2020-07-19T10:36:03Z

Way too stale now 😀
This PR doesn't reference an issue, which indicates that partials handling in a couple of codepages has never been a real problem in the past.

german-one added 4 commits February 6, 2020 02:01

implement til::au16 and til::u16a conversion functions & update relat…

a8bd2df

…ed code

eliminate redundant code, update comments, rearrange functions

b19b066

add unit test for DBCS partials

37324d5

make first use in WriteConsoleAImpl

fcc9bf1

pass static analysis

2571d23

german-one added 3 commits February 7, 2020 00:14

try harder to pass the SA

c7bdc90

have fun with C array

7f77579

get over the brace barrier?

c07cb48

DHowett-MSFT reviewed Feb 7, 2020

german-one added 3 commits February 7, 2020 01:57

see what til::at makes out of it

20f1d2b

try again to suppress warnings

aff87e0

make the CPINFO a private member and only update it if necessary

519c374

update til::at to make it applicable for arrays

26ac753

reverse til::at update

542bfd2

add partials handling to GB 18030 and GSM 7 bit codepages

fb74524

german-one added 4 commits March 7, 2020 00:13

funny business: try to call the pointer overload explicitely

fe60901

use rvalue reference

0d83b56

remove array overload, use pointer

385b735

make aState a public member of SCREEN_INFORMATION & use it in WriteCo…

11602df

…nsoleAImpl

german-one requested review from DHowett-MSFT and miniksa March 7, 2020 16:45

german-one and others added 4 commits March 8, 2020 20:54

spelling

f2a4b3a

risk on more red X: explicitely exclude arrays in til::at overloads

9008eff

undo, even that didn't work

bd2be6b

Merge branch 'master' into master

79f154e

german-one mentioned this pull request Mar 25, 2020

ci: run spell check in CI, fix remaining issues #4799

Merged

5 tasks

add au and GSM to whitelist.txt

b857c3e

german-one and others added 2 commits March 26, 2020 21:25

Merge remote-tracking branch 'upstream/master'

83443da

keep initialization style consistent

ab4e0d7

Merge remote-tracking branch 'upstream/master'

4983479

german-one closed this Jul 19, 2020

german-one mentioned this pull request Jul 20, 2020

Move u8State from WriteConsoleAImpl to SCREEN_INFORMATION #6982

Open

This was referenced Oct 21, 2020

Some multibyte chars, e.g. 'α', cannot be read by ReadFile() under codepage 932 (Japanese). #7589

Closed

[Epic] COOKED READ and character conversion fun with all our different Convert* functions #7777

Closed
