Performance throughout the game overview/rating #388

Closed
sektorate opened this issue Mar 10, 2021 · 54 comments
Labels
1.9 enhancement New feature or request

Comments

@sektorate

Hello, many thanks for this amazing trainer! Feature suggestion: I imagine each move being placed in categories (e.g. blunder, mistake, inaccuracy, okay, excellent, best move) based on the percentage change in winrate it causes. The percentage ranges for these categories could be user-definable.
An overall "accuracy" score out of 100 could then be generated for each player based on the percentage of their moves the engine rates as best. These ideas are inspired by the analysis features of chess.com, that give an overall insight into the players' performance in a game; this would supplement analysis of each individual move.
Thanks again for your work, I'd love to hear your thoughts.
[image: Screenshot (2)]
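A minimal sketch of the proposed categorization, assuming each move is already annotated with the winrate drop it caused (in percentage points) and whether it matched the engine's top move. The threshold values are hypothetical placeholders for the user-definable ranges, not values taken from any existing tool.

```python
# Hypothetical, user-definable thresholds: minimum winrate drop (percentage
# points) for each category, checked from the most severe category downwards.
DEFAULT_THRESHOLDS = [
    ("blunder", 20.0),
    ("mistake", 10.0),
    ("inaccuracy", 5.0),
    ("okay", 1.0),
]


def classify_move(winrate_drop, is_engine_best, thresholds=DEFAULT_THRESHOLDS):
    """Return the category name for one move."""
    if is_engine_best:
        return "best move"
    for name, min_drop in thresholds:
        if winrate_drop >= min_drop:
            return name
    # Small drops on moves that were not the engine's top choice.
    return "excellent"


def accuracy(moves):
    """Percentage (0-100) of a player's moves that the engine rated as best.

    `moves` is a list of (winrate_drop, is_engine_best) pairs.
    """
    if not moves:
        return 0.0
    best = sum(1 for _, is_best in moves if is_best)
    return 100.0 * best / len(moves)
```

Making the thresholds a plain list keeps them easy to expose as a setting.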

@sanderland sanderland added the enhancement New feature or request label Mar 11, 2021
@sanderland sanderland changed the title Enhancement: Accuracy and Move Categories Performance throughout the game overview/rating Mar 11, 2021
@sanderland
Owner

Some discussion of this in #340 - it's a bit strange to suggest adding categories when they already exist though :)

@sektorate
Author

You're right, of course!

@sektorate
Author

On first impression, 1d layout seems more easily readable than 2d. Some questions to aid iteration:

- Is it possible to create an overall performance score? E.g. if every move lost no points/was the AI top move, this gives 100% accuracy; if every move lost more than 12 points/was the AI worst move, it gives 0%. This would provide extra insights: "did I win because I played well, or because my opponent played terribly?", "I lost even though my performance was good, so this loss isn't so bad".

- Is it possible/would it be useful to combine the points lost/AI rank into one overall metric? An overall score for each move would be more concise.
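The linear 0-100% mapping from the first question could be sketched as follows, assuming each move's point loss is known. The 12-point cutoff comes from the comment; averaging the per-move scores is an assumed way to aggregate them, not an established method.

```python
def move_score(points_lost, max_loss=12.0):
    """Map one move's point loss to 0-100: no loss -> 100, max_loss or more -> 0."""
    # Negative losses (played move evaluated above the top move) count as no loss.
    clamped = min(max(points_lost, 0.0), max_loss)
    return 100.0 * (1.0 - clamped / max_loss)


def performance_score(point_losses, max_loss=12.0):
    """Assumption: the game score is the plain average of per-move scores.

    Returns None when the player has no moves.
    """
    if not point_losses:
        return None
    return sum(move_score(loss, max_loss) for loss in point_losses) / len(point_losses)
```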

Overall, it would be great to give the user control over the level of granularity, i.e. whether the layout is 2d or 1d, which stats are shown, and whether the moves are graded separately by points lost/AI top move or combined into one metric.

I hope these thoughts help, I love this feature already and am excited to use it.

@Eric-Wainwright

The Chess.com site really did a nice job with their game report. However, they probably have 100+ developers working for them. Here's a simple version of their accuracy report. It would allow users to customize the category names in the Teaching/Analysis settings. Instead of a separate Game Report, this could also be just another tab, unless you're planning to add additional information in the future.

The accuracy stat would be a weighted average of the categories. Ideally, this accuracy information would update as users moved through the game tree, not just at the end of the game.
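A weighted average over categories that also updates move by move could look like the sketch below. The category names and weights are assumptions chosen for illustration; the comment does not specify them.

```python
# Hypothetical weight per category (1.0 = perfect move, 0.0 = worst).
CATEGORY_WEIGHTS = {
    "best": 1.0,
    "excellent": 0.9,
    "okay": 0.7,
    "inaccuracy": 0.4,
    "mistake": 0.2,
    "blunder": 0.0,
}


def running_accuracy(categories, weights=CATEGORY_WEIGHTS):
    """Yield the accuracy (0-100) after each move, so it can be shown mid-game."""
    total = 0.0
    for count, category in enumerate(categories, start=1):
        total += weights[category]
        yield 100.0 * total / count
```

Because it is a generator, the value at any node of the game tree is just the last value yielded for the moves leading to that node.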

Note that all chess apps and sites (that I've seen) use strictly board evaluations for computing mistakes, i.e. top move - actual move. There's no reporting done on how much a move improves the prior position, i.e. actual move - prior move. We've discussed this before.
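The two measures differ only in what the played move is compared against. A small sketch, assuming all three evaluations are expected scores (e.g. points) from the perspective of the player who moved; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MoveEvaluation:
    # Assumption: all scores are from the mover's perspective.
    prior_score: float     # evaluation of the position before the move
    top_move_score: float  # evaluation after the engine's top move
    actual_score: float    # evaluation after the move actually played


def mistake_size(ev: MoveEvaluation) -> float:
    """What chess sites report: top move minus actual move (0 = best move)."""
    return ev.top_move_score - ev.actual_score


def improvement(ev: MoveEvaluation) -> float:
    """The unreported alternative: actual move minus the prior position."""
    return ev.actual_score - ev.prior_score
```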

P.S. I don't know how useful the 2D performance table would be. It seems more like a curiosity rather than helpful information. But, I guess it might show how well your intuition (policy) is working versus your calculation (tree search). The accuracy information seems more helpful.

[image]

[image]

@sanderland
Owner

sanderland commented Apr 14, 2021

[image]

movecomplexity = sum(policy over candidates) - sum(policy over candidates with point loss <= 0.5), i.e. the share of policy going to bad moves that the AI thought were worth considering.
complexity = average movecomplexity
weighted_loss = point loss weighted by min(movecomplexity, 0.25) for dark green moves, or by 0.25 for mistakes, i.e. trying to downweight obvious moves
accuracy = 100 * 0.75**weighted_loss
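Read literally, these four formulas translate into something like the following. This is a sketch, not the code in the branch: it assumes each candidate carries its policy prior and point loss, that "dark green" means a played point loss of at most 0.5, and that weighted_loss is a weighted average normalized by the sum of the weights.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    policy: float      # policy prior of the candidate move
    point_loss: float  # points lost compared with the best candidate


def move_complexity(candidates: List[Candidate], good_loss: float = 0.5) -> float:
    """Policy mass on candidates the AI considered but that lose more than good_loss."""
    total = sum(c.policy for c in candidates)
    good = sum(c.policy for c in candidates if c.point_loss <= good_loss)
    return total - good


def complexity(positions: List[List[Candidate]]) -> float:
    """Average move complexity over a player's positions."""
    if not positions:
        return 0.0
    return sum(move_complexity(c) for c in positions) / len(positions)


def weighted_loss(moves: List[Tuple[List[Candidate], float]],
                  cap: float = 0.25, dark_green_max: float = 0.5) -> float:
    """Weighted average point loss; obvious good moves get a small weight."""
    num = den = 0.0
    for candidates, loss in moves:
        if loss <= dark_green_max:  # assumed definition of a "dark green" move
            weight = min(move_complexity(candidates), cap)
        else:  # mistakes always get the full weight
            weight = cap
        num += weight * loss
        den += weight
    # Assumption: normalize by the total weight (weighted average).
    return num / den if den > 0 else 0.0


def accuracy(moves: List[Tuple[List[Candidate], float]]) -> float:
    return 100 * 0.75 ** weighted_loss(moves)
```

Capping the weight at 0.25 means an accurate move in a complex position counts at most as much as a mistake does.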

Formulas aren't great yet, but I like the layout and fields.
Midgame is just moves 50-150, which is also not perfect...
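Splitting stats by game stage with fixed move-number boundaries could be sketched as below. Treating everything before move 50 as the opening, moves 50-150 inclusive as the midgame, and the rest as the endgame is an assumption about how the 50-150 window is applied.

```python
def game_phase(move_number, midgame_start=50, midgame_end=150):
    """Classify a 1-based move number (assumed boundaries, midgame inclusive)."""
    if move_number < midgame_start:
        return "opening"
    if move_number <= midgame_end:
        return "midgame"
    return "endgame"


def losses_by_phase(point_losses):
    """Group point losses by phase; point_losses[i] belongs to move number i + 1."""
    phases = {"opening": [], "midgame": [], "endgame": []}
    for number, loss in enumerate(point_losses, start=1):
        phases[game_phase(number)].append(loss)
    return phases
```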

@sektorate
Author

^ This looks great! Being able to focus on one stage of the game is an excellent idea.

@Eric-Wainwright

Sander, I like it! Much improved over my version. :-)

I'm working on trying to understand your formulas. What is the cutoff point for the AI candidates, or is this determined by max_visits? Wouldn't higher visits skew the complexity rate upwards (more poor moves searched)?

The accuracy formula seems reasonable. I want to research to see how some of the chess apps do it.

I think the colors on the Teaching/Analysis settings should be re-ordered to match this for consistency.

Good stuff!

@sanderland
Owner

sanderland commented Apr 14, 2021

It's using all candidate moves returned by the AI; this is of course influenced by visits, root noise, etc.
The idea is that the sum of policy priors over low-point-loss candidates represents 'obvious good moves' and the remainder 'nice-looking but not good moves', and your move should be considered more important if the latter is big.

I am not convinced this is the best approach, but it's the first thing that kind of did something reasonable.

@sanderland
Copy link
Owner

> I think the colors on the Teaching/Analysis settings should be re-ordered to match this for consistency.

c2ca285

@Eric-Wainwright

I did some quick research on Go and Chess apps, and the only one that seems to calculate a game accuracy is Chess.com. They call theirs "Computer Aggregated Precision Score" or CAPS. It's a proprietary model that incorporates game mistakes and other "pattern of strength" algorithms. In other words, it is a black box.

There's some controversy in the forums about how well it works. Apparently, the statistic can vary widely over games, and it does not give a great predictive power into the rank of a player.

Links here and here.

As for complexity, I think your idea has merit. I spent some time studying L&D problems earlier trying to understand why some were more complex than others. The number of reasonable-looking branches in the search tree has mostly to do with it. Whether you can tease this information out of differences between policy priors and search results will be interesting.

This feature may take lots of thought and testing. I vote to roll out something simple and get feedback on it. Maybe create a beta version that we can do some testing on.

@sanderland
Owner

> This feature may take lots of thought and testing. I vote to roll out something simple and get feedback on it. Maybe create a beta version that we can do some testing on.

I generally don't hide things; it's in a branch and anyone can test it. Releasing is a lot of work, though, and the last time I released for feedback I got zero comments, soooo

@sanderland
Owner

Testing another weighting in 3adde54,
this time based on the policy-weighted point loss.
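"Policy-weighted point loss" can be read as the expected loss if a move were drawn from the candidates in proportion to their policy priors. A minimal sketch, assuming candidates are given as (policy, point_loss) pairs and that the priors are renormalized over the returned candidates only:

```python
def policy_weighted_point_loss(candidates):
    """Expected point loss when picking a candidate with probability ~ its policy.

    `candidates` is a list of (policy_prior, point_loss) pairs. Assumption: the
    priors are renormalized over these candidates, since they need not sum to 1.
    """
    total = sum(policy for policy, _ in candidates)
    if total <= 0:
        return 0.0
    return sum(policy * loss for policy, loss in candidates) / total
```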

@xiaoyifang
Contributor

[image]
This part could be replaced with a chart like this, which is more intuitive:
[image]

@sanderland
Owner

Want to test this a bit more properly. If someone could help collect a nice test set, that would be appreciated:

A variety of around 50-100 SGF games from 15k to 7d: 19x19, at least 200-250 moves played, and with the BR and WR fields set (as in e.g. OGS).
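Screening candidate SGF files against these criteria can be roughly automated. This regex-based check is a shortcut rather than a real SGF parser: it assumes every `;B[`/`;W[` node is a move (so variations would be counted too) and treats a missing SZ property as 19x19, the SGF default for Go.

```python
import re

# Rough patterns (assumption: no escaped "]" inside rank values).
MOVE_RE = re.compile(r";\s*[BW]\[")
SIZE_RE = re.compile(r"(?<![A-Z])SZ\[(\d+)\]")
RANK_RES = [re.compile(r"(?<![A-Z])BR\[[^\]]+\]"),
            re.compile(r"(?<![A-Z])WR\[[^\]]+\]")]


def suitable_for_test_set(sgf_text: str, min_moves: int = 200) -> bool:
    """True if the game is 19x19, has both ranks set, and has enough moves."""
    size_match = SIZE_RE.search(sgf_text)
    size = int(size_match.group(1)) if size_match else 19
    if size != 19:
        return False
    if not all(rank_re.search(sgf_text) for rank_re in RANK_RES):
        return False
    return len(MOVE_RE.findall(sgf_text)) >= min_moves
```

Running it over a folder of downloaded games (e.g. with `pathlib.Path.glob("*.sgf")` and `read_text()`) would leave only the games matching the request.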

@Eric-Wainwright

I'll commit to scraping 50 games from OGS spread across 15k to 7d. Does it matter if they're even or handicap?

@sanderland
Copy link
Owner

Shouldn't matter. The idea is to see the numbers by player rank more systematically.

@Eric-Wainwright

Here are 30 OGS games ranging from 9k to 4d.

10 OGS games 1d-4d.zip
10 OGS games 1k-4k.zip
10 OGS games 5k-9k.zip

@sanderland
Owner

sanderland commented May 2, 2021

Code as in dde545b (weighted by complexity ~ expected point loss if playing candidates with p = policy).
40b 7.9G @ 500 visits
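Weighting each move by the complexity of its position, then feeding the result into the 100 * 0.75**weighted_loss mapping from the earlier formula, gives a sketch like the one below. Reusing that mapping here is an assumption; the comment only describes the weights.

```python
def complexity_weighted_accuracy(moves, base=0.75):
    """Accuracy (0-100) from (complexity, played_point_loss) pairs.

    Complexity is any per-position number, e.g. the expected point loss when
    candidates are played with probability equal to their policy. Positions
    with higher complexity count more. Assumption: the 0.75 base is carried
    over from the earlier accuracy formula.
    """
    total_weight = sum(complexity for complexity, _ in moves)
    if total_weight <= 0:
        return 100.0
    weighted_loss = sum(c * loss for c, loss in moves) / total_weight
    return 100.0 * base ** weighted_loss
```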

[image]

data: https://pastebin.com/k44TYjY9

Accuracy seems OK, a bit weird on 2 outliers. Complexity is a bit all over the place; may just remove it.

@Eric-Wainwright
Copy link

20 more OGS games

10 OGS games 4d-7d.zip
10 OGS games 10k-15k.zip

@sanderland sanderland added the 1.9 label May 3, 2021
@sanderland
Owner

sanderland commented May 3, 2021

Best result for now, as of 6a71266.
Will remove complexity as a stat, as it's more about early/mid/endgame.

[image]
Data: https://pastebin.com/raw/GsURUGSa

Keeping this unless there are any bright ideas.

@Eric-Wainwright

The accuracy stat r^2 is looking pretty good. The complexity stat will need some more thinking.
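For reference, the r^2 of a straight-line fit between rank and accuracy can be computed as below. The rank-to-number scale (1k = 0, 1d = 1, 5k = -4) is an assumption made for illustration; rank strings in other formats would need extra handling.

```python
def rank_to_number(rank: str) -> int:
    """Assumed scale on one axis: 5k -> -4, 1k -> 0, 1d -> 1, 4d -> 4."""
    value, kind = int(rank[:-1]), rank[-1].lower()
    if kind == "k":
        return 1 - value
    if kind == "d":
        return value
    raise ValueError(f"unrecognised rank: {rank!r}")


def r_squared(xs, ys):
    """Coefficient of determination of a least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)
```

For a simple linear fit, this equals the squared Pearson correlation between rank and accuracy.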

> 'ai approved' stat (move in top 5 and pt loss <0.5, could have a better name)

Agree that 'ai approved' is a bit awkward. If you had category labels for pt loss, you could use the label. Not sure you need to limit it to top 5, but just the pt loss range seems good enough.

Let me know if you need more games for testing.

@sanderland
Copy link
Owner

> The accuracy stat r^2 is looking pretty good. The complexity stat will need some more thinking.

Will probably just kill the complexity stat.

> 'ai approved' stat (move in top 5 and pt loss <0.5, could have a better name)
>
> Agree that 'ai approved' is a bit awkward. If you had category labels for pt loss, you could use the label. Not sure you need to limit it to top 5, but just the pt loss range seems good enough.

The idea is that top 1 is very network-dependent and no limit is very visits-dependent; this should be less so (as seen by KataGo selfplay games ending up at near 100%).
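The 'ai approved' definition quoted above (played move within the engine's top 5 and losing under 0.5 points) is straightforward to compute per move. A sketch, assuming the played move's 1-based rank among the engine's candidates is known, with None meaning it was not a candidate:

```python
def ai_approved(rank, point_loss, top_n=5, max_loss=0.5):
    """True if the move is among the engine's top_n candidates and loses < max_loss.

    Assumption: `rank` is 1-based, or None if the move was not a candidate.
    """
    return rank is not None and rank <= top_n and point_loss < max_loss


def ai_approved_rate(moves, top_n=5, max_loss=0.5):
    """Percentage of (rank, point_loss) moves that count as 'ai approved'."""
    if not moves:
        return 0.0
    approved = sum(ai_approved(rank, loss, top_n, max_loss) for rank, loss in moves)
    return 100.0 * approved / len(moves)
```

Dropping the rank check gives the point-loss-only variant suggested above; the top-5 limit is what is meant to reduce the dependence on network and visits.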

> Let me know if you need more games for testing.

I think this is a very nice data set. If you can figure out what's up with the 1-2 outliers that might help though!

@Eric-Wainwright

In the first outlier game (sunny25 vs lyq), both players missed the killing/saving of a group for many moves (163 - 202). This resulted in 20+ point loss swings for a significant portion of the game.

33352666-265-sunny25-lyq.zip

In the second outlier game (silent1 vs sunny25), both players missed a severe cut for many moves (49 - 126). Then, they missed a double sente endgame sequence for many moves (131 - 187).

33352369-209-silent1-sunny25.zip

@sanderland
Owner

sanderland commented May 9, 2021

[image]

Thoughts on comparing the blunder classes to the other player, compared to all moves?
It's already proving confusing; may just remove the bars there.

@sanderland
Owner

Also if someone has/can make a texture that helps make the bars look like bars, that could be nice! (as in transparency mask-only texture)

@Dontbtme
Contributor

Dontbtme commented May 10, 2021

Here's a mockup I made of what I think would be the most useful to review my games.
It shows everything from each player's point of view, so no matter how well or bad White played, if Black played 104 green moves out of his own total of 152 moves, then his green moves percentage bar should be filled about 68%.
Now, if instead we want to share stats between players for some reason (meaning 'Black played 80% of the red mistakes played in the game, which means White played only 20% of them'), then that could be an option toggled in the Teaching/Analysis Settings. That way we could get the best of both worlds.
What do you think?
[image: Report-Mockup-Proposal]
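The two views in this proposal differ only in the denominator: a player's own move count, or the number of moves in a category across both players. A sketch, assuming each player's moves are already labelled with a category name such as "green" or "red" (hypothetical labels):

```python
from collections import Counter


def per_player_percentages(categories):
    """Share of one player's own moves in each category (the default view)."""
    if not categories:
        return {}
    counts = Counter(categories)
    return {cat: 100.0 * n / len(categories) for cat, n in counts.items()}


def shared_split(black_categories, white_categories, category):
    """How one category's moves are split between Black and White (optional view)."""
    black = black_categories.count(category)
    white = white_categories.count(category)
    total = black + white
    if total == 0:
        return 0.0, 0.0
    return 100.0 * black / total, 100.0 * white / total
```

With 104 green moves out of Black's 152, the first function gives about 68% for green; four red moves for Black against one for White give the 80%/20% kind of split.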

@sente361

Here is a suggestion for a more natural graphic for the Points Lost area:

[image]

@Dontbtme
Contributor

Dontbtme commented May 10, 2021

Even though a pyramid looks more natural from a graphical standpoint, the base may not always be the widest (for a double-digit kyu, maybe?). And I don't know, I feel we usually see best moves as "top" moves, visually speaking?
In any case, I came up with another mockup that combines Points Lost seen from a single player's perspective and from both players' perspectives.
How does it look?
[image: Report-Mockup-Proposal04]

@sektorate
Author

This version of the panel seems a little cluttered. I'm not sure how useful seeing the proportion of points lost between each player is either, I personally would be interested mostly in the amount/% of my moves that fall into each category. If a user is that interested in the proportion between each player, they can compare the numbers.
Having best moves shown at the top makes sense to me also.

@Eric-Wainwright

> This version of the panel seems a little cluttered. I'm not sure how useful seeing the proportion of points lost between each player is either.
> Having best moves shown at the top makes sense to me also.

^^^ Agree.

I think Sander has the cleanest layout. You don’t want to make it more complicated than this. The X-axis scale for each item can usually be inferred from the label, which is good.

I think showing the bars for all move classes is fine for consistency. However, I don't understand the X-axis scale used in Sander's version (what are the blunder lengths supposed to be?). I would think this should be the % of time you played that move class within the game.

@Dontbtme
Contributor

Dontbtme commented May 10, 2021

I'm also only interested in the % of moves that fall into each category (separately for each player).
I only added the % sharing both players' data as a whole because that was what Sander went for initially, or so it seemed to me.
Anyway, you'll see below my last mockup. This one is the closest to what I would go for if it was up to me.
Your thoughts?
[image: Report-Mockup-Proposal05]

@sektorate
Author

^ This looks great to me.

@Eric-Wainwright

Eric-Wainwright commented May 10, 2021

Are you arguing mostly about content or format? If content, then I agree that showing the % of moves in each category is best. The format could be a pie chart, a stacked chart (like yours), or a bar chart (like Sander's). (Although I'm not sure what Sander was trying to show in his mockup :-))

As for format, a simple bar chart like Sander's is good enough for me, and it matches the style of the top section. But I'd be OK with either.

@sanderland
Owner

I like the stacked chart, but it's a bit complicated to make, particularly hiding text dynamically.
Here's back to % by category and colourful.
[image]

As you see, 1 becomes basically 0 and many games just end up being greeeeeeeeeeeen.

@sanderland
Owner

[image]
Lower-level game.

@Eric-Wainwright

I like the latest iteration and certainly would be very happy with it. Although, I don't know why you wouldn't match the same style of bars in both sections:

[image]

@sente361

sente361 commented May 11, 2021

I prefer Sander's latest iteration; a microsecond glance - "I got mostly green - yippee!" (Coloring the bars makes the data leap out at you.) In the same vein, I would also color the bars in the Key Statistics section; a nice blue would look good.

@sente361

In my opinion it is simpler and easier to understand if the central "key" of the Points Lost section is presented in this form:

[image]

@xiaoyifang
Contributor

xiaoyifang commented May 12, 2021

I think the better value could get a prominent background color. For example:
Mean point loss: the lower, the better.
AI top 5: the higher, the better.
[image]
At the same time, avoid using too much color.
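Highlighting the better value requires knowing, per statistic, which direction counts as better. A minimal sketch with hypothetical stat keys:

```python
# Hypothetical stat keys; True means a higher value is better.
HIGHER_IS_BETTER = {
    "mean_point_loss": False,
    "ai_top5": True,
}


def better_player(stat, black_value, white_value):
    """Return 'black' or 'white' for the cell to highlight, or None on a tie."""
    if black_value == white_value:
        return None
    black_higher = black_value > white_value
    black_better = black_higher if HIGHER_IS_BETTER[stat] else not black_higher
    return "black" if black_better else "white"
```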

@Dontbtme
Contributor

Dontbtme commented May 12, 2021

In Sander's version, I think the middle part should be less prominent and somehow detached from the bars on the left and right, because to me it kind of seemed like every color was played a lot no matter what.
I tried two mockups. They're not great, but they'll show what I mean.
The first one shows the most data.
[image: Report-Mockup210512a]
The second one shows the same amount of data as Sander's.
[image: Report-Mockup210512c]
In any case, I don't think the Points Lost middle part should be as wide as the Key Statistics one; in Sander's version the colors really seemed to have been played a lot even when, say, no red or purple mistakes were played.

@sanderland
Owner

[image]
Maybe a better blue.

@Dontbtme sure that looks better, but keep in mind the whole thing is a single grid layout of labels, and the line is the bottom of the header cell. Give it a try and you'll see how difficult simple things can be in kivy ;)

@Dontbtme
Contributor

Dontbtme commented May 12, 2021

Still, as it is, any bar looks big; that's what I meant. Can't you limit the colors in the middle (around >0.5 etc.) without changing the grid? Since the colors in the left and right columns only fill them according to the %, why do the colors in the middle column have to fill it entirely? The colors in the middle are what pops out the most in your picture, when we should be focusing on the colors in the players' bars. I would even rather have no colors in the middle column if that's too complicated; that way the mistakes' data colors would appear clearly and brightly in each player's column.
Anyway, that's only my two cents (although I'm not sure about the dark blue you switched to in the key statistics either, since the overall UI is already some kind of dark blue, but I digress).
But anyway, if the above isn't convincing, then maybe that's just a matter of taste, in which case just ignore it and let's move on ^_^

@xiaoyifang
Contributor

[image]
This value seems wrong (>100%).

@sanderland
Owner

> [image]
> seems wrong value >100%

I think this bug was fixed and you have an old commit.

@Eric-Wainwright

Eric-Wainwright commented May 12, 2021

For statistics and the point losses, you could report both the percentage and absolute count where needed together, instead of having separate columns, or not reporting both values. I don't think the order matters. Also, there would then be no need to put "%" in the labels.

[image]

@xiaoyifang
Contributor

xiaoyifang commented May 13, 2021

As there are already many suggestions now :-), I recommend this:
[image]

In the upper area, the better value gets a prominent background color (green, etc.).
In the lower part, the players' bars can be aligned together.

The advantage of this is that with just one glance you know who did better and which value is higher,
without needing to read the numbers.

@sanderland
Owner

Closing this as it's soon released, but feel free to continue discussion.

6 participants