Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider splitting ComponentTicks #4884

Closed
james7132 opened this issue May 31, 2022 · 2 comments
Closed

Consider splitting ComponentTicks #4884

james7132 opened this issue May 31, 2022 · 2 comments
Labels
A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times S-Needs-Benchmarking This set of changes needs performance benchmarking to double-check that they help

Comments

@james7132
Copy link
Member

james7132 commented May 31, 2022

What problem does this solve or what need does it fill?

ComponentTicks is a lopsided datatype. The added ticks are immutable over the lifetime of the component and the changed ticks may be updated every frame. When using them in Changed<T> or Added<T> filters, half the memory fetched goes unused. When writing to it via Mut, cache lines are written out twice as often as they could.

This will likely be blocked on #4883 being resolved first before making any changes here.

What solution would you like?

Split Vec<UnsafeCell<ComponentTicks>> into two Vec<UnsafeCell<u32>> for added and changed ticks. Update related structs (i.e. Added<T>, Changed<T>, ChangeTrackers<T>, Mut<T>) to fetch only the ones needed.

Ideally, this should also enable autovectorization in query code for these filters, which should further speed up change detection filters.

What alternative(s) have you considered?

Leaving change detection as is.

@james7132 james7132 added A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times labels May 31, 2022
@alice-i-cecile
Copy link
Member

I'm fully on board with this plan. Before we merge a PR for this, I'd like a benchmark for change detection so we can see the impact.

@alice-i-cecile alice-i-cecile added the S-Needs-Benchmarking This set of changes needs performance benchmarking to double-check that they help label May 31, 2022
@james7132
Copy link
Member Author

Updated the issue to mention this is blocked on #4883.

@bors bors bot closed this as completed in 55ca7fc Nov 21, 2022
taiyoungjang pushed a commit to taiyoungjang/bevy that referenced this issue Dec 15, 2022
# Objective
Fixes bevyengine#4884. `ComponentTicks` stores both added and changed ticks contiguously in the same 8 bytes. This is convenient when passing around both together, but causes half the bytes fetched from memory for the purposes of change detection to effectively go unused. This is inefficient when most queries (no filter, mutating *something*) only write out to the changed ticks.

## Solution
Split the storage for change detection ticks into two separate `Vec`s inside `Column`. Fetch only what is needed during iteration.

This also potentially also removes one blocker from autovectorization of dense queries.

EDIT: This is confirmed to enable autovectorization of dense queries in `for_each` and `par_for_each`  where possible.  Unfortunately `iter` has other blockers that prevent it.

### TODO

 - [x] Microbenchmark
 - [x] Check if this allows query iteration to autovectorize simple loops.
 - [x] Clean up all of the spurious tuples now littered throughout the API

### Open Questions

 - ~~Is `Mut::is_added` absolutely necessary? Can we not just use `Added` or `ChangeTrackers`?~~ It's optimized out if unused.
 - ~~Does the fetch of the added ticks get optimized out if not used?~~ Yes it is.

---

## Changelog
Added: `Tick`, a wrapper around a single change detection tick.
Added: `Column::get_added_ticks`
Added: `Column::get_column_ticks`
Added: `SparseSet::get_added_ticks`
Added: `SparseSet::get_column_ticks`
Changed: `Column` now stores added and changed ticks separately internally.
Changed: Most APIs returning `&UnsafeCell<ComponentTicks>` now returns `TickCells` instead, which contains two separate `&UnsafeCell<Tick>` for either component ticks.
Changed: `Query::for_each(_mut)`, `Query::par_for_each(_mut)` will now leverage autovectorization to speed up query iteration where possible.

## Migration Guide
TODO
alradish pushed a commit to alradish/bevy that referenced this issue Jan 22, 2023
# Objective
Fixes bevyengine#4884. `ComponentTicks` stores both added and changed ticks contiguously in the same 8 bytes. This is convenient when passing around both together, but causes half the bytes fetched from memory for the purposes of change detection to effectively go unused. This is inefficient when most queries (no filter, mutating *something*) only write out to the changed ticks.

## Solution
Split the storage for change detection ticks into two separate `Vec`s inside `Column`. Fetch only what is needed during iteration.

This also potentially also removes one blocker from autovectorization of dense queries.

EDIT: This is confirmed to enable autovectorization of dense queries in `for_each` and `par_for_each`  where possible.  Unfortunately `iter` has other blockers that prevent it.

### TODO

 - [x] Microbenchmark
 - [x] Check if this allows query iteration to autovectorize simple loops.
 - [x] Clean up all of the spurious tuples now littered throughout the API

### Open Questions

 - ~~Is `Mut::is_added` absolutely necessary? Can we not just use `Added` or `ChangeTrackers`?~~ It's optimized out if unused.
 - ~~Does the fetch of the added ticks get optimized out if not used?~~ Yes it is.

---

## Changelog
Added: `Tick`, a wrapper around a single change detection tick.
Added: `Column::get_added_ticks`
Added: `Column::get_column_ticks`
Added: `SparseSet::get_added_ticks`
Added: `SparseSet::get_column_ticks`
Changed: `Column` now stores added and changed ticks separately internally.
Changed: Most APIs returning `&UnsafeCell<ComponentTicks>` now returns `TickCells` instead, which contains two separate `&UnsafeCell<Tick>` for either component ticks.
Changed: `Query::for_each(_mut)`, `Query::par_for_each(_mut)` will now leverage autovectorization to speed up query iteration where possible.

## Migration Guide
TODO
ItsDoot pushed a commit to ItsDoot/bevy that referenced this issue Feb 1, 2023
# Objective
Fixes bevyengine#4884. `ComponentTicks` stores both added and changed ticks contiguously in the same 8 bytes. This is convenient when passing around both together, but causes half the bytes fetched from memory for the purposes of change detection to effectively go unused. This is inefficient when most queries (no filter, mutating *something*) only write out to the changed ticks.

## Solution
Split the storage for change detection ticks into two separate `Vec`s inside `Column`. Fetch only what is needed during iteration.

This also potentially also removes one blocker from autovectorization of dense queries.

EDIT: This is confirmed to enable autovectorization of dense queries in `for_each` and `par_for_each`  where possible.  Unfortunately `iter` has other blockers that prevent it.

### TODO

 - [x] Microbenchmark
 - [x] Check if this allows query iteration to autovectorize simple loops.
 - [x] Clean up all of the spurious tuples now littered throughout the API

### Open Questions

 - ~~Is `Mut::is_added` absolutely necessary? Can we not just use `Added` or `ChangeTrackers`?~~ It's optimized out if unused.
 - ~~Does the fetch of the added ticks get optimized out if not used?~~ Yes it is.

---

## Changelog
Added: `Tick`, a wrapper around a single change detection tick.
Added: `Column::get_added_ticks`
Added: `Column::get_column_ticks`
Added: `SparseSet::get_added_ticks`
Added: `SparseSet::get_column_ticks`
Changed: `Column` now stores added and changed ticks separately internally.
Changed: Most APIs returning `&UnsafeCell<ComponentTicks>` now returns `TickCells` instead, which contains two separate `&UnsafeCell<Tick>` for either component ticks.
Changed: `Query::for_each(_mut)`, `Query::par_for_each(_mut)` will now leverage autovectorization to speed up query iteration where possible.

## Migration Guide
TODO
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times S-Needs-Benchmarking This set of changes needs performance benchmarking to double-check that they help
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants