[cpu] remove branch prediction logic #678

Merged
merged 2 commits into from
Aug 31, 2023

stnolting (Owner) commented on Aug 30, 2023

This PR removes the CPU front end's "branch prediction" logic, which halted instruction fetch while a branch/jump instruction was in progress until the destination address became available (resulting in less bus traffic / congestion).

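However, benchmarks show that this prediction actually lowers performance. In the CoreMark runs below, the version without prediction needs about 0.5% fewer active clock cycles: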

Example CoreMark run with branch prediction:

NEORV32: Hardware Performance Monitors (low words only)
 > Active clock cycles:          2199671829
 > Retired instructions:         596184106
 > Retired compr. instructions:  349937868
 > Instr.-fetch wait cycles:     34008506
 > Instr.-issue wait cycles:     286243045
 > Multi-cycle ALU wait cycles:  99331050
 > Load operations:              108277848
 > Store operations:             28390960
 > Load/store wait cycles:       5984501
 > Unconditional jumps:          16292334
 > Conditional branches (all):   115064467
 > Conditional branches (taken): 58094389
 > Entered traps:                0
 > Illegal operations:           0

Example CoreMark run without branch prediction:

NEORV32: Hardware Performance Monitors (low words only)
 > Active clock cycles:          2188425528 (faster!)
 > Retired instructions:         596184106
 > Retired compr. instructions:  349937868
 > Instr.-fetch wait cycles:     0
 > Instr.-issue wait cycles:     263816491
 > Multi-cycle ALU wait cycles:  99331050
 > Load operations:              108277848
 > Store operations:             28390960
 > Load/store wait cycles:       17164754
 > Unconditional jumps:          16292334
 > Conditional branches (all):   115064467
 > Conditional branches (taken): 58094389
 > Entered traps:                0
 > Illegal operations:           0

Adding caches results in the same speed-up factor when the prediction logic is removed. Additionally, removing the prediction logic reduces the core size and relaxes the critical path (= the branch taken / not taken logic).
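
For anyone who wants to reproduce the comparison, here is a minimal Python sketch that derives the cycle savings and cycles-per-instruction from the two HPM dumps above. The counter values are copied from those dumps; the dictionary keys and the `summarize` helper are hypothetical names chosen for illustration and are not part of the NEORV32 software framework.

```python
# Counter values copied from the two CoreMark runs quoted above.
# Key names are illustrative only (not NEORV32 identifiers).
with_prediction = {
    "cycles": 2199671829,
    "instructions": 596184106,
    "fetch_wait": 34008506,
    "issue_wait": 286243045,
    "load_store_wait": 5984501,
}
without_prediction = {
    "cycles": 2188425528,
    "instructions": 596184106,
    "fetch_wait": 0,
    "issue_wait": 263816491,
    "load_store_wait": 17164754,
}


def summarize(before, after):
    """Compare two HPM snapshots of the same workload."""
    saved = before["cycles"] - after["cycles"]
    wait_keys = [k for k in before if k.endswith("_wait")]
    return {
        "cycles_saved": saved,
        "speedup_percent": 100.0 * saved / before["cycles"],
        "cpi_before": before["cycles"] / before["instructions"],
        "cpi_after": after["cycles"] / after["instructions"],
        # Per-counter change (after minus before); negative means fewer wait cycles.
        "wait_delta": {k: after[k] - before[k] for k in wait_keys},
    }


result = summarize(with_prediction, without_prediction)
print(f"cycles saved: {result['cycles_saved']}")
print(f"speed-up:     {result['speedup_percent']:.2f} %")
print(f"CPI:          {result['cpi_before']:.3f} -> {result['cpi_after']:.3f}")
for name, delta in result["wait_delta"].items():
    print(f"{name:16s} {delta:+d}")
```

The wait-counter deltas are reported individually rather than summed, because for these two runs they do not add up to the net difference in active clock cycles.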

stnolting added the "HW" (hardware-related) and "optimization" (Make things faster, smaller and more efficient) labels on Aug 30, 2023
stnolting self-assigned this on Aug 30, 2023
stnolting marked this pull request as ready for review on August 30, 2023 19:04
stnolting merged commit 1df13f5 into main on Aug 31, 2023
8 checks passed
stnolting deleted the front_end branch on August 31, 2023 08:03