
Refactor how CowData manages its allocations #74582

Open · wants to merge 1 commit into master from cowdata-64
Conversation

@RedworkDE (Member) commented Mar 8, 2023

Context: I was looking into making CowData use 64-bit indices to solve the various crashes and issues caused by overly large arrays, and found that the internal structure of CowData is fundamentally unsuited to that (and also really weird), so I went and reworked how the internal data storage is managed.

From the outside this PR shouldn't change any behavior, except that allocating somewhat larger arrays now works (and, on MSVC, there are no longer cases where operations return OK but actually failed).

There are a few more improvements that could / should be done, but I stopped before feature creep completely overtook this.

Needs #74555 to pass CI.

  • Use an explicit type for the data prefix to enable easier changes.
  • Explicitly store the capacity instead of using the next power of two.
  • If resizing to a significantly larger size (more than 2× the current capacity), assume that the caller has just resized to their final size and don't allocate any excess capacity.
  • Trim the capacity less aggressively, fixing edge cases where repeatedly adding and removing elements caused extreme performance issues.
  • Do not rely on overflow-checked operations to determine the maximum size, to avoid crashes with some compilers when using very large sizes.
  • On 64-bit systems, increase the maximum size to 0x7ffffff0, generally an increase by a factor of the element size.
  • On 32-bit systems, the maximum size remains limited by the size of the address space / available memory, but in some cases it may still be possible to allocate larger arrays than before.
  • Add a minimum capacity of 4 elements to remove some unnecessary reallocations for small arrays.
  • Remove a few unnecessary potential copies when resizing, adding or removing elements.
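The growth and trim rules in the list above can be sketched as a single policy function. This is a hedged sketch only: the `next_capacity` name and the exact trim threshold are illustrative, not the actual CowData implementation.

```cpp
#include <algorithm>

// Simplified model of the capacity policy described above (the thresholds
// are illustrative, not the exact CowData code).
static int next_capacity(int current_capacity, int requested_size) {
	constexpr int MIN_CAPACITY = 4; // New minimum capacity for small arrays.
	if (requested_size <= current_capacity) {
		// Trim only when usage drops well below capacity, so repeated
		// push/pop around a boundary does not reallocate every time.
		if (requested_size * 4 < current_capacity && current_capacity > MIN_CAPACITY) {
			return std::max(requested_size * 2, MIN_CAPACITY);
		}
		return current_capacity; // Keep the current allocation.
	}
	if (requested_size > current_capacity * 2) {
		// Large jump: assume the caller resized to its final size and
		// allocate exactly what was requested, with no excess capacity.
		return std::max(requested_size, MIN_CAPACITY);
	}
	// Incremental growth: double the capacity.
	return std::max({requested_size, current_capacity * 2, MIN_CAPACITY});
}
```

The key point is the asymmetry: growth is eager (doubling) but trimming is lazy, which is what removes the reallocation thrash described further below.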

What to do with the ERR_PROPAGATE macro?
I added this to check an Error return value and return it from the current method if it is not OK. Is this useful for other places in the engine as well, and should it be added to the other ERR macros? Should I just inline the check everywhere? Or can it stay as is?


What about alignment?
The old version simply didn't care about alignment and returned whatever it got from the allocator (which is malloc-aligned plus a 16-byte offset).
This version assumes that the allocator pointer is aligned correctly and then offsets the first element by a multiple of the alignment, but I am not sure the alignment is correct for types with large alignments, i.e. SIMD types. Should this be extended to ensure correct alignment for such types?
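The "offset the first element by a multiple of the alignment" step can be sketched as follows. The `Prefix` struct and `element_offset` helper are hypothetical stand-ins, not the PR's actual code.

```cpp
#include <cstddef>

// Round p_offset up to the next multiple of p_align (a power of two).
// This is the standard way to place the first array element after a
// header struct so that the element is aligned for its type.
constexpr size_t align_up(size_t p_offset, size_t p_align) {
	return (p_offset + p_align - 1) & ~(p_align - 1);
}

// Hypothetical stand-in for the prefix stored before the elements.
struct Prefix {
	int capacity;
	int size;
};

// Offset of the first element for a type with the given alignment.
constexpr size_t element_offset(size_t p_align) {
	return align_up(sizeof(Prefix), p_align);
}
```

Note that this only aligns the element relative to the start of the allocation; if the allocator guarantees only 16-byte alignment, a type with alignof(T) > 16 can still end up misaligned, which is exactly the open question above.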


About the buffers in max_size:
The 64-byte buffer against filling the entire address space with one array is arbitrary; it matters only for preventing overflows when computing the size of the allocation, since Godot itself will occupy far more than that, so such allocations would fail anyway.
The 15-index buffer is similarly arbitrary and serves to prevent overflows when computing size() + 1 and similar expressions.
For reference, in C# the maximum array size is 0x7fffffc7 bytes regardless of element size (a buffer of 56).
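The two buffers can be made concrete with a little arithmetic. This is a rough model of the reasoning only, not the real max_size() code; the function names are illustrative.

```cpp
#include <cstdint>

// Rough model of the reasoning above (illustrative, not the real code).
// The 64-byte buffer keeps allocation-size math from overflowing when an
// array would otherwise fill the whole 32-bit address space; the 15-index
// buffer keeps expressions like size() + 1 from overflowing an int.
constexpr int64_t model_max_size_32bit(uint32_t element_size, uint32_t prefix_bytes) {
	const uint64_t address_space = UINT32_MAX; // 4 GiB of addresses.
	return int64_t((address_space - 64 - prefix_bytes) / element_size);
}

constexpr int64_t model_max_size_64bit() {
	return INT32_MAX - 15; // == 0x7ffffff0, the new maximum element count.
}
```

In the same spirit, the C# limit of 0x7fffffc7 bytes is INT32_MAX minus a buffer of 56.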


About that performance hole in the current implementation:
When repeatedly increasing and decreasing the size of the allocation across a power of two, every operation reallocates the entire backing array, causing (in constructed examples, 1000×+) slowdowns compared to doing the same with just a few more or fewer elements.

func _ready():
	perf(43689)
	perf(43690)
	perf(43691)
func perf(size):
	var start = Time.get_ticks_msec()
	var arr = []
	arr.resize(size)
	for i in range(10000):
		arr.push_back(null)
		arr.pop_back()
	var end = Time.get_ticks_msec()
	print("%d took %dms" % [size, end - start])
	pass

Before: (on a production build)

43689 took 2ms
43690 took 4467ms
43691 took 1ms

After: (on a debug build)

43689 took 9ms
43690 took 9ms
43691 took 9ms

(And no, this didn't just move where the slowdown happens.)


Allowed array sizes:
Previously, (unpacked) arrays with 100 million+ elements would fail to allocate or crash.
Now all Arrays can have 2 billion elements (if you have the ~50 GB of RAM needed for that) and all failures correctly return error codes.
Unfortunately, allocating arrays with over 4 billion elements will appear to work, but the size actually gets truncated to 32 bits, so the result is smaller than requested. This happens at the GDScript-to-native boundary though, so I am not sure whether this can easily be fixed without breaking lots of things or actually making the index/size 64 bits wide. See #74676
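The truncation at that boundary boils down to a narrowing cast. The helper below is purely illustrative (not the actual boundary code); on mainstream two's-complement compilers the cast keeps only the low 32 bits.

```cpp
#include <cstdint>

// Illustration of the truncation described above (hypothetical helper):
// when a 64-bit requested size crosses an interface that takes a 32-bit
// int, the high bits are silently dropped, so an "oversized" resize can
// succeed with a smaller size than requested.
int32_t to_native_size(int64_t p_requested) {
	return static_cast<int32_t>(p_requested); // Keeps only the low 32 bits.
}
```

So a request for 2^32 + 1 elements quietly becomes a request for 1 element, with no error reported.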


Memory savings:
Because this PR doesn't over-allocate when resizing to the final required size, it saves quite a bit of memory.
https://github.com/Calinou/godot-reflection goes from 20MiB+9MiB (30% waste) to 20MiB+800KiB (4% waste)
See https://github.com/RedworkDE/godot/tree/resource-mon-cowdata and https://github.com/RedworkDE/godot/tree/resource-mon-cowdata-64 for adding a resource monitor for this.


Potential issues:

  • This touches a pretty core element of the engine
  • This tries to not over-allocate memory if possible, so buffer overruns are more likely to overrun the allocation entirely
  • It is no longer possible to marshal all PackedArray instances into C# arrays, but this issue previously already existed in the other direction.
  • ....

Potential advantages:

  • Allows bigger arrays
  • Paves the way for even bigger arrays
  • Probably decreases the overall memory use a bit
  • The whole code is no longer super weird
  • ....

@TokisanGames (Contributor) commented Jul 25, 2023

I built Godot 4.1.2-rc with this PR and #77306, using g++-mingw-w64-x86-64-posix 10.3.0-14ubuntu1+24.3. Then I tested Terrain3D saving a large data set, as described in 62585 and our issue.

I loaded up 16k x 8k of data, attempting to save 1.4 GB of memory (though we need 3 GB now for 16k terrains, and will be increasing that later). I saved using the inspector, which just uses ResourceSaver::save(), independent of our code. Upon saving, Godot did not crash, so that's an improvement.

However I get these errors:

# First one ~100 times:
ERROR: Index p_size = -2147483648 is out of bounds (max_size() = 2147483632).
   at: resize (./core/templates/cowdata.h:374)
ERROR: Method/function failed. Returning: ERR_FILE_CORRUPT
   at: parse_variant (core/io/resource_format_binary.cpp:656)
ERROR: Failed loading resource: res://new_terrain_3d_storage.res. Make sure resources have been imported by opening the.
   at: (core/io/resource_loader.cpp:273)

And now my resource file is corrupt. Previously, in 4.1.1-stable, Godot crashed but the file was untouched.

@RandomShaper (Member) commented:

Code-wise and rationale-wise, this looks good to me. Functionality-wise (perf, etc.), I would leave some benchmarking to others to really know how beneficial this is in real-world cases.

Besides, I've made a branch with this PR and one commit of mine on top to suggest those changes to be integrated here: https://github.com/RandomShaper/godot/commits/pr74582_suggestions.

@RedworkDE force-pushed the cowdata-64 branch 3 times, most recently from 51f90f0 to 6979ae1 on August 26, 2023 14:01
@akien-mga (Member) left a comment


Apologies for the slow review.

Just left a few minor suggestions.

Timeline wise, I think we're getting a bit too close to 4.3 beta to merge such a core change now, unless @godotengine/core members think it particularly safe or critical.
But I'd like to collect a few approvals from the team so we can merge early on for 4.4.

Comment on lines +57 to +64
#define ERR_PROPAGATE(m_expr) \
	if (true) {               \
		Error err = m_expr;   \
		if (unlikely(err)) {  \
			return err;       \
		}                     \
	} else                    \
		((void)0)
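For context, here is a self-contained sketch of how such a macro would be used; the `step`/`do_both` functions are hypothetical, and `unlikely` is stubbed out (Godot defines it as a branch-prediction hint). The `if (true) { ... } else ((void)0)` wrapper turns the expansion into a single statement that demands a trailing semicolon, so it nests safely inside unbraced if/else bodies.

```cpp
enum Error { OK = 0, FAILED = 1 };

#define unlikely(x) (x) // Stub; Godot's version is a branch-prediction hint.

#define ERR_PROPAGATE(m_expr) \
	if (true) {               \
		Error err = m_expr;   \
		if (unlikely(err)) {  \
			return err;       \
		}                     \
	} else                    \
		((void)0)

// Hypothetical helper that can succeed or fail.
static Error step(bool p_fail) {
	return p_fail ? FAILED : OK;
}

// Early-returns the first non-OK error, otherwise falls through to OK.
static Error do_both(bool p_fail_first) {
	ERR_PROPAGATE(step(p_fail_first));
	ERR_PROPAGATE(step(false));
	return OK;
}
```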

I don't think this needs a dedicated macro just for 4 cases in this class. It's a common pattern that we fully spell out when it's used.

Comment on lines +130 to +134
template <typename T>
constexpr auto ALIGN_AT(const T m_number, const T m_alignment) {
	return ((m_number + (m_alignment - 1)) / m_alignment) * m_alignment;
}


It's only used once in CowData, so I would keep it defined there, and likely directly inline.

	int capacity;
	int size;
};
static_assert(std::is_trivially_destructible<CowDataPrefix>::value, "");

Suggested change
static_assert(std::is_trivially_destructible<CowDataPrefix>::value, "");
static_assert(std::is_trivially_destructible<CowDataPrefix>::value, "CowDataPrefix should be trivially destructible.");

@akien-mga modified the milestones: 4.x, 4.4 on May 6, 2024
return rc;

CowDataPrefix *mem_new = (CowDataPrefix *)Memory::realloc_static(prefix, size_t(p_new_capacity) * sizeof(T) + prefix_offset());
ERR_FAIL_COND_V_MSG(!mem_new, ERR_OUT_OF_MEMORY, "Insufficient memory to allocate CowData");

Suggested change
ERR_FAIL_COND_V_MSG(!mem_new, ERR_OUT_OF_MEMORY, "Insufficient memory to allocate CowData");
ERR_FAIL_NULL_V_MSG(mem_new, ERR_OUT_OF_MEMORY, "Insufficient memory to allocate CowData.");


uint32_t *mem_new = (uint32_t *)Memory::alloc_static(_get_alloc_size(current_size), true);
CowDataPrefix *mem_new = (CowDataPrefix *)Memory::alloc_static(size_t(p_capacity) * sizeof(T) + prefix_offset());
ERR_FAIL_COND_V_MSG(!mem_new, ERR_OUT_OF_MEMORY, "Insufficient memory to allocate CowData");

Suggested change
ERR_FAIL_COND_V_MSG(!mem_new, ERR_OUT_OF_MEMORY, "Insufficient memory to allocate CowData");
ERR_FAIL_NULL_V_MSG(mem_new, ERR_OUT_OF_MEMORY, "Insufficient memory to allocate CowData.");
