Performance throughout the game overview/rating #388
Some discussion of this in #340 - it's a bit strange to suggest adding categories when they already exist though :)
You're right, of course!
On first impression, the 1D layout seems more easily readable than 2D. Some questions to aid iteration:

- Is it possible to create an overall performance score? E.g. if every move lost no points/was the AI top move this gives 100% accuracy, and if every move lost more than 12 points/was the AI worst move this gives 0%. This would provide extra insights: "did I win because I played well, or because my opponent played terribly?", "I lost even though my performance was good, so this loss isn't so bad".
- Is it possible/would it be useful to combine the points lost/AI rank into one overall metric? An overall score for each move would be more concise.

Overall it would be great to give the user control over the level of granularity: whether 2D or 1D, which stats are shown, and whether the moves are graded separately by points lost/AI top move or combined into one metric. I hope these thoughts help; I love this feature already and am excited to use it.
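The linear score suggested above could be sketched as follows. This is only an illustration of the idea, assuming the 0-point and 12-point thresholds from the comment; the function names are hypothetical, not KaTrain's actual API:

```python
def move_score(point_loss: float, max_loss: float = 12.0) -> float:
    """Map a move's point loss to a 0-100 score:
    0 loss -> 100, loss >= max_loss -> 0, linear in between."""
    clamped = min(max(point_loss, 0.0), max_loss)
    return 100.0 * (1.0 - clamped / max_loss)

def game_accuracy(point_losses: list[float]) -> float:
    """Average per-move score over one player's moves."""
    return sum(move_score(pl) for pl in point_losses) / len(point_losses)
```

A player whose moves lost 0, 6, and 12 points would score (100 + 50 + 0) / 3 = 50 under this scheme.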
The Chess.com site really did a nice job with their game report. However, they probably have 100+ developers working for them. Here's a simple version of their accuracy report. It would allow users to customize the category names in the Teaching/Analysis settings. Instead of a separate Game Report, this could also be just another tab, unless you're planning to add additional information in the future. The accuracy stat would be a weighted average of the categories. Ideally, this accuracy information would update as users moved through the game tree, not just at the end of the game.

Note that all chess apps and sites (that I've seen) use strictly board evaluations for computing mistakes, i.e. top move - actual move. There's no reporting done on how much a move improves the prior position, i.e. actual move - prior move. We've discussed this before.

P.S. I don't know how useful the 2D performance table would be. It seems more like a curiosity rather than helpful information. But, I guess it might show how well your intuition (policy) is working versus your calculation (tree search). The accuracy information seems more helpful.
move_complexity = sum(policy over candidates) - sum(policy over candidates with point loss <= 0.5), i.e. what policy % went to bad moves the AI thought were worth considering. The formulas aren't great yet, but I like the layout and fields.
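A minimal sketch of that complexity formula, assuming each candidate move comes with its policy probability and point loss (the data layout and names here are illustrative, not the actual KaTrain structures):

```python
def move_complexity(candidates: list[tuple[float, float]],
                    loss_threshold: float = 0.5) -> float:
    """candidates: list of (policy_prob, point_loss) for the AI's candidate moves.
    Complexity = total candidate policy mass minus the mass on 'good' candidates
    (point loss <= threshold), i.e. the policy % spent on tempting bad moves."""
    total = sum(p for p, _ in candidates)
    good = sum(p for p, loss in candidates if loss <= loss_threshold)
    return total - good
```

For example, candidates [(0.5, 0.0), (0.3, 2.0), (0.1, 0.4)] give a complexity of 0.3: the network put 30% of its policy on a move that loses 2 points.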
^ this looks great! Being able to focus on one stage of the game is an excellent idea.
Sander, I like it! Much improved over my version. :-) I'm working on trying to understand your formulas. What is the cutoff point for the AI candidates, or is this determined by max_visits? Wouldn't higher visits skew the complexity rate upwards (more poor moves searched)? The accuracy formula seems reasonable. I want to research how some of the chess apps do it. I think the colors on the Teaching/Analysis settings should be re-ordered to match this for consistency. Good stuff!
It's using all candidate moves returned by the AI; this is of course influenced by visits, root noise, etc. I am not convinced this is the best approach, but it's the first thing that kind of did something reasonable.
I did some quick research on Go and Chess apps, and the only one that seems to calculate a game accuracy is Chess.com. They call theirs "Computer Aggregated Precision Score" or CAPS. It's a proprietary model that incorporates game mistakes and other "pattern of strength" algorithms. In other words, it is a black box. There's some controversy in the forums about how well it works. Apparently, the statistic can vary widely over games, and it does not give great predictive power into the rank of a player.

As for complexity, I think your idea has merit. I spent some time studying L&D problems earlier trying to understand why some were more complex than others. The number of reasonable-looking branches in the search tree has mostly to do with it. Whether you can tease this information out of differences between policy priors and search results will be interesting.

This feature may take lots of thought and testing. I vote to roll out something simple and get feedback on it. Maybe create a beta version that we can do some testing on.
I generally don't hide things, it's in branch and anyone can test it. Releasing is a lot of work though, and the last time I released for feedback I got zero comments, soooo |
testing another weighting in 3adde54 |
Want to test this a bit more properly. If someone could help collect a nice test set, that would be appreciated: a variety of around 50-100 SGF games from 15k to 7d, 19x19, with at least 200-250 moves played. They should have the BR and WR fields set (as in e.g. OGS).
I'll commit to scraping 50 games from OGS spread across 15k to 7d. Does it matter if they're even or handicap? |
Shouldn't matter. The idea is to see the numbers by player rank more systematically |
Here are 30 OGS games ranging from 9k to 4d.
10 OGS games 1d-4d.zip
Code as in dde545b (weighted by complexity ~ expected point loss if playing candidates with p=policy). Data: https://pastebin.com/k44TYjY9. Accuracy seems OK, a bit weird on 2 outliers. Complexity is a bit all over the place; may just remove it.
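The "expected point loss if playing candidates with p=policy" weighting could be sketched as below. This is a hypothetical reconstruction of the idea, not the code in that commit:

```python
def expected_policy_loss(candidates: list[tuple[float, float]]) -> float:
    """candidates: list of (policy_prob, point_loss).
    Expected point loss if the move were sampled with probability
    proportional to policy, renormalized over the returned candidates."""
    total_p = sum(p for p, _ in candidates)
    if total_p == 0:
        return 0.0
    return sum(p * loss for p, loss in candidates) / total_p
```

So a position where the policy is split 50/50 between a 0-point and a 2-point-loss move has an expected loss of 1.0, making it "harder" than one where nearly all policy mass sits on good moves.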
20 more OGS games |
Best result for now, as of 6a71266.
Keeping this unless there are any bright ideas.
The accuracy stat r^2 is looking pretty good. The complexity stat will need some more thinking.
Agree that 'ai approved' is a bit awkward. If you had category labels for point loss, you could use the label. Not sure you need to limit it to top 5; just the point-loss range seems good enough. Let me know if you need more games for testing.
Will probably just kill the complexity stat.
The idea is that top 1 is very network dependent, and no limit is very visits dependent; this should be less so (as seen by KataGo selfplay games ending up at near 100%).
I think this is a very nice data set. If you can figure out what's up with the 1-2 outliers, that might help though!
In the first outlier game (sunny25 vs lyq), both players missed the killing/saving of a group for many moves (163-202). This resulted in 20+ point loss swings for a significant portion of the game. In the second outlier game (silent1 vs sunny25), both players missed a severe cut for many moves (49-126). Then, they missed a double sente endgame sequence for many moves (131-187).
Also, if someone has or can make a texture that helps make the bars look like bars, that would be nice! (As in a transparency mask-only texture.)
This version of the panel seems a little cluttered. I'm also not sure how useful it is to see the proportion of points lost between the players; I would personally be interested mostly in the amount/% of my moves that fall into each category. If a user is that interested in the proportion between the players, they can compare the numbers.
^^^ Agree. I think Sander has the cleanest layout. You don’t want to make it more complicated than this. The X-axis scale for each item can usually be inferred from the label, which is good. I think showing the bars for all move classes is fine to achieve consistency. Except, I don’t understand the X-axis scale used in Sander’s version (what are the blunder lengths supposed to be?). I would think this should be the % of the time you played that move class within the game.
^ This looks great to me. |
Are you arguing mostly about content or format? If content, then I agree that showing the % of moves in each category is best. The format could either be as a pie chart, a stacked chart (like yours), or a bar chart (like Sander's). (Although, I'm not sure what Sander was trying to show in his mockup :-) As for format, a simple bar chart like Sander's is good enough for me, and it matches the style of the top section. But, I'd be Ok with either. |
I prefer Sander's latest iteration; a microsecond glance - "I got mostly green - yippee!" (Coloring the bars makes the data leap out at you.) In the same vein, I would also color the bars in the Key Statistics section; a nice blue would look good. |
@Dontbtme sure that looks better, but keep in mind the whole thing is a single grid layout of labels, and the line is the bottom of the header cell. Give it a try and you'll see how difficult simple things can be in kivy ;) |
Still, as is, any bar looks big, which is what I meant. Can't you limit the colors in the middle to around >0.5 etc. without changing the grid? Since the colors in the left and right columns only fill them in proportion to the %, why do the colors in the middle column have to fill it entirely? The colors in the middle are what pops out the most in your picture, when we should be focusing on the colors in the players' bars. I would even rather have no colors in the middle column if that's too complicated; that way, the colors for each mistake category would appear clearly and brightly in each player's column.
Closing this as it's soon released, but feel free to continue discussion. |
Hello, many thanks for this amazing trainer! Feature suggestion: I imagine each move being placed into categories (e.g. blunder, mistake, inaccuracy, okay, excellent, best move) based on the percentage change in winrate it causes. The percentage ranges for these categories could be user-definable.
An overall "accuracy" score out of 100 could then be generated for each player, based on the percentage of their moves the engine rates as best. These ideas are inspired by the analysis features of chess.com, which give an overall insight into the players' performance in a game; this would supplement the analysis of each individual move.
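The category binning could be sketched as below. The thresholds here are hypothetical placeholders (and use point loss rather than the winrate change mentioned above); the suggestion is that these bounds would be user-configurable:

```python
# Hypothetical (category, upper point-loss bound) pairs, in ascending order.
# In practice these would come from user-definable settings.
CATEGORIES = [
    ("best move", 0.0),
    ("excellent", 0.5),
    ("okay", 1.5),
    ("inaccuracy", 3.0),
    ("mistake", 6.0),
    ("blunder", float("inf")),
]

def categorize(point_loss: float) -> str:
    """Return the first category whose upper bound covers the move's loss."""
    for name, upper in CATEGORIES:
        if point_loss <= upper:
            return name
    return CATEGORIES[-1][0]  # unreachable with an inf-bounded last category
```

A move losing exactly 0 points would land in "best move", a 4-point loss in "mistake", and anything above 6 points in "blunder".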
Thanks again for your work, I'd love to hear your thoughts.