
Try to fix macOS hang when building Python 2 #1343

Merged: 7 commits into main, Feb 15, 2024
Conversation

BillyONeal (Member):

  • Don't open multiple handles to the same file when looking for absolute paths.
  • Don't use std::future.

Rather than opening separate file handles, reading the first part with one and the rest with the other, introduce best_effort_read_contents_if_shebang, which performs the shebang check directly on the first read buffer.

This also fixes read_contents to avoid copying the entire file down by 3 bytes when the file starts with a BOM, using the same idea.
@BillyONeal BillyONeal marked this pull request as ready for review February 13, 2024 01:15
BillyONeal (Member, Author):

I kicked off 3 copies of the osx-x64 build to see if this helps:

Initial results are promising: all 3 of these got past Python 2 without issues. But without a full understanding of the cause of the problem, it is difficult to be sure.


std::vector<std::future<void>> workers;
workers.reserve(num_threads - 1);
WorkCallbackContext(F init_f, size_t work_count) : work(init_f), work_count(work_count), next_offset(0) { }
Contributor:
Suggested change:
- WorkCallbackContext(F init_f, size_t work_count) : work(init_f), work_count(work_count), next_offset(0) { }
+ WorkCallbackContext(F init_f, size_t work_count) : work(std::move(init_f)), work_count(work_count), next_offset(0) { }

BillyONeal (Member, Author):

This is a functor/callable, which is usually a pointer, and for which std::move is a pessimization. (This is also why these are passed by value.)

ras0219-msft (Contributor), Feb 14, 2024:

I understand why taking it by rvalue reference would be a pessimization, but how would std::move reduce performance? For simple pointers, isn't it the same as a copy?

BillyONeal (Member, Author):

std::move forces the value to be bound to an rvalue reference. This is why, for example, it breaks NRVO in return std::move(blah); cases. It likely wouldn't matter in this case; I'm just used to 'pass the functors by value' resulting from the mess that caused us to add _Pass_fn: passing functors by reference broke the ability to inline through function pointers. Maybe my paranoia here is out of date by now, but given that we control all the functors here and will never pass ones that are expensive to copy, I'm inclined to keep following the convention.

https://github.com/microsoft/STL/blob/bd3d740ae5de7255c720b8133c5d23aa131e0760/stl/inc/xutility#L558-L584

auto offset = next_offset.load(std::memory_order_relaxed);
while (offset < work_count)
{
    if (!next_offset.compare_exchange_weak(offset, offset + 1, std::memory_order_relaxed))
Contributor:

Could this use fetch_add instead, since we can assume that work_count + threads < SIZE_MAX?

BillyONeal (Member, Author):

I'm not sure we can make that assumption. (fetch_add was @Thomas1664's initial solution, and I requested this form to avoid overflow.)

I suppose we could assume this is unlikely and add edge-case handling for it, though I'm not sure that would be better... stand by.

Contributor:

Could this use fetch_add instead, since we can assume that work_count + threads < SIZE_MAX?

work_count could reach SIZE_MAX. I remember that some PRs that used smaller types in the context of the install plan were rejected.

Review threads:
include/vcpkg/base/parallel-algorithms.h (resolved)
include/vcpkg/base/parallel-algorithms.h (outdated, resolved)
src/vcpkg/base/files.cpp (resolved)
src/vcpkg/base/files.cpp (outdated, resolved)
src/vcpkg/base/files.cpp (outdated, resolved)
if (ptp_work)
{
    auto max_threads = (std::min)(work_count, static_cast<size_t>(get_concurrency()));
    max_threads = (std::min)(max_threads, SIZE_MAX - work_count); // to avoid overflow in fetch_add
Contributor:

Suggested change:
- max_threads = (std::min)(max_threads, SIZE_MAX - work_count); // to avoid overflow in fetch_add
+ if (SIZE_MAX - work_count > max_threads) Checks::unreachable(VCPKG_LINE_INFO);

I don't think trying to be fancy here to support more than 18,446,744,073,709,550,000-ish (or 4,294,966,000-ish) items is worth it.

Also, I think the current line is pessimistic by 1 item/thread -- if work_count == SIZE_MAX, 1 thread is permissible.

BillyONeal (Member, Author):

I don't think trying to be fancy here to support more than 18,446,744,073,709,550,000-ish (or 4,294,966,000-ish) items is worth it.

I think it is worth it, at least as long as 32-bit is not dead. (And arm-linux is still very much a thing, even if we won't get much parallelism there.)

Also, I think the current line is pessimistic by 1 item/thread -- if work_count == SIZE_MAX, 1 thread is permissible.

Sure. (Note that work_count == SIZE_MAX is impossible because the end() element can't exist in the address space. :) )

BillyONeal (Member, Author):

(Moreover, a dependency from this stuff on Checks:: makes this more complex, not less)

@BillyONeal BillyONeal merged commit 05320e4 into main Feb 15, 2024
5 checks passed
@BillyONeal BillyONeal deleted the try-to-fix-macos-hang branch February 15, 2024 18:56