Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Small speedup in incremental accumulator updates #5576

Closed
wants to merge 1 commit into from

Conversation

MinetaS
Copy link
Contributor

@MinetaS MinetaS commented Sep 7, 2024

Instead of updating at most two accumulators, update all accumluators during incremental updates. Tests have shown that this change yields a small speedup of at least 0.5%, and up to 1% with shorter TC.

Passed STC:
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 54368 W: 14179 L: 13842 D: 26347
Ptnml(0-2): 173, 6122, 14262, 6449, 178
https://tests.stockfishchess.org/tests/view/66db038a9de3e7f9b33d1ad9

Passed 5+0.05:
LLR: 2.98 (-2.94,2.94) <0.00,2.00>
Total: 55040 W: 14682 L: 14322 D: 26036
Ptnml(0-2): 303, 6364, 13856, 6664, 333
https://tests.stockfishchess.org/tests/view/66dbc325dc53972b68218ba7

Passed non-regression LTC:
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 57390 W: 14555 L: 14376 D: 28459
Ptnml(0-2): 37, 5876, 16683, 6069, 30
https://tests.stockfishchess.org/tests/view/66dbc30adc53972b68218ba5

No functional change

Instead of updating at most two accumulators, update all accumluators
during incremental updates. Tests have shown that this change yields a
small speedup of at least 0.5%, and up to 1% with shorter TC.

Passed STC:
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 54368 W: 14179 L: 13842 D: 26347
Ptnml(0-2): 173, 6122, 14262, 6449, 178
https://tests.stockfishchess.org/tests/view/66db038a9de3e7f9b33d1ad9

Passed 5+0.05:
LLR: 2.98 (-2.94,2.94) <0.00,2.00>
Total: 55040 W: 14682 L: 14322 D: 26036
Ptnml(0-2): 303, 6364, 13856, 6664, 333
https://tests.stockfishchess.org/tests/view/66dbc325dc53972b68218ba7

Passed non-regression LTC:
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 57390 W: 14555 L: 14376 D: 28459
Ptnml(0-2): 37, 5876, 16683, 6069, 30
https://tests.stockfishchess.org/tests/view/66dbc30adc53972b68218ba5

No functional change
@peregrineshahin
Copy link
Contributor

Wow, congrats on the speedup!
I'm curious to ask a couple of questions...
Could you explain the rationale about how this idea came to mind, or works?
I wonder if this option is a new one...
Does such a change align well with what we think we know best about NNUE, or a surprising one to you?

@peregrineshahin
Copy link
Contributor

peregrineshahin commented Sep 8, 2024

@peregrineshahin unclear what you're even asking in your last two questions

the questions are general, I'm good with any reply about this patch.

@xu-shawn
Copy link
Contributor

xu-shawn commented Sep 8, 2024

@peregrineshahin unclear what you're even asking in your last two questions

They make perfect sense to me

@MinetaS
Copy link
Contributor Author

MinetaS commented Sep 8, 2024

I was about to sum up the logics behind but I had to sleep first, thanks peregrine for the questions. :)

Could you explain the rationale about how this idea came to mind, or works?

This is the current update strategy:

Let's assume there are not computed accumulators between two accumulators (that we are trying to update), and only State[1] and State[5] are computed. Because SF search is limited-depth DFS, starting from ply = 6 accumulators will be updated in the following order:

2/6, 3/7, 4/8, 5/9, 10, 11

As you see, for a few iterations, SF has to write to two different accumulators that are not close to each other, which is possibly not cache-friendly. Now, this patch updates all accumluators up to pos.state() successively, which could be slower for one single position but benefits the overall search.

Note that this explanation is not what I had in my mind first; I'd just been trying different approaches to find something works.

Does such a change align well with what we think we know best about NNUE, or a surprising one to you?

It depends on the shape of search tree and update/refresh cost model. I think experimenting with update/refresh cost model further is a good direction to improve this patch.

@cj5716
Copy link
Contributor

cj5716 commented Sep 8, 2024

I've been making some attempts to improve update behaviour for some time, and I think there are some optimisations we can make on this PR. I will try and launch some tests in the coming days. Good job on this PR!

PS. in my head the main saving is the fact that we save multiple adds and subs, i.e. if 1 is computed and current accumulator is 5, at the moment we will update 2 and 5. However, what is to note is that 3 and 4 are computed in the intermediate steps to update 5, ie. this PR saves some extra recomputation.

@MinetaS
Copy link
Contributor Author

MinetaS commented Sep 8, 2024

PS. in my head the main saving is the fact that we save multiple adds and subs, i.e. if 1 is computed and current accumulator is 5, at the moment we will update 2 and 5. However, what is to note is that 3 and 4 are computed in the intermediate steps to update 5, ie. this PR saves some extra recomputation.

Eventually 3 and 4 are computed in later evaluations so I thought the cost itself is equal. Also if the search tree is not wide (i.e. multiple moveCount=1 nodes and they are all beta cutoff), or the node is leaf, it actually becomes worse than what we do currently.

@vondele vondele added simplification A simplification patch 🚀 gainer Gains elo no-functional-change labels Sep 8, 2024
@cj5716
Copy link
Contributor

cj5716 commented Sep 8, 2024

Eventually 3 and 4 are computed in later evaluations so I thought the cost itself is equal.

if you think about it, it's more like:
old:
update 2 and 5:
1 -> apply 2 -> store 2 -> apply 3 -> apply 4 -> apply 5 -> store 5
update 3 and 4:
2 -> apply 3 -> store 3 -> apply 4 -> store 4

if you look at it, "apply 3" and "apply 4" are called twice

vs patch:
1 -> apply 2 -> store 2 -> apply 3 -> store 3 -> apply 4 -> store 4 -> apply 5 -> store 5

although, this is assuming the worst case so it makes sense why the extra savings don't gain as much as one might think in the theoretical sense

@MinetaS
Copy link
Contributor Author

MinetaS commented Sep 8, 2024

if you think about it, it's more like: old: update 2 and 5: 1 -> apply 2 -> store 2 -> apply 3 -> apply 4 -> apply 5 -> store 5 update 3 and 4: 2 -> apply 3 -> store 3 -> apply 4 -> store 4

if you look at it, "apply 3" and "apply 4" are called twice

vs patch: 1 -> apply 2 -> store 2 -> apply 3 -> store 3 -> apply 4 -> store 4 -> apply 5 -> store 5

Yes that's correct as well, although I'm not sure if populating removed/added lists is slow to that extent. Might be worth to check where the gains are from to optimize this even further.

@Disservin Disservin added the to be merged Will be merged shortly label Sep 9, 2024
@Disservin Disservin closed this in 2680c9c Sep 9, 2024
skystarspython pushed a commit to official-pikafish/Pikafish that referenced this pull request Sep 14, 2024
Instead of updating at most two accumulators, update all accumluators
during incremental updates. Tests have shown that this change yields a
small speedup of at least 0.5%, and up to 1% with shorter TC.

Passed STC:
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 54368 W: 14179 L: 13842 D: 26347
Ptnml(0-2): 173, 6122, 14262, 6449, 178
https://tests.stockfishchess.org/tests/view/66db038a9de3e7f9b33d1ad9

Passed 5+0.05:
LLR: 2.98 (-2.94,2.94) <0.00,2.00>
Total: 55040 W: 14682 L: 14322 D: 26036
Ptnml(0-2): 303, 6364, 13856, 6664, 333
https://tests.stockfishchess.org/tests/view/66dbc325dc53972b68218ba7

Passed non-regression LTC:
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 57390 W: 14555 L: 14376 D: 28459
Ptnml(0-2): 37, 5876, 16683, 6069, 30
https://tests.stockfishchess.org/tests/view/66dbc30adc53972b68218ba5

closes official-stockfish/Stockfish#5576

No functional change
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🚀 gainer Gains elo no-functional-change simplification A simplification patch to be merged Will be merged shortly
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants