Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tell in detail why remote building fails on -v #3927

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

nh2
Copy link
Contributor

@nh2 nh2 commented Aug 13, 2020

Motivation

See #1572. Further improves upon #3897 (CC @bburdette).

Also further improve the final error message, giving a practical example of
how to set machine features because that's what most people need to succeed
in building large packages remotely.

Context

Checklist for maintainers

Maintainers: tick if completed or explain if not relevant

  • agreed on idea
  • agreed on implementation strategy
  • tests, as appropriate
    • functional tests - tests/**.sh
    • unit tests - src/*/tests
    • integration tests - tests/nixos/*
  • documentation in the manual
  • code and comments are self-explanatory
  • commit message explains why the change was made
  • new feature or incompatible change: updated release notes

Priorities

Add 👍 to pull requests you find important.

@nh2 nh2 force-pushed the issue-1572-log-remote-builder-decline-reasons branch 2 times, most recently from 7468a80 to 5063f8c Compare August 18, 2020 00:58
src/libstore/build.cc Outdated Show resolved Hide resolved
@bburdette
Copy link
Contributor

Got a screenshot of output from this?

@nh2 nh2 force-pushed the issue-1572-log-remote-builder-decline-reasons branch from 581e999 to ddd4556 Compare September 4, 2020 13:43
@nh2
Copy link
Contributor Author

nh2 commented Sep 4, 2020

I've squashed the commit suggestions.

@nh2
Copy link
Contributor Author

nh2 commented Sep 4, 2020

Got a screenshot of output from this?

@bburdette I didn't make a screenshot but it's litearlly just another of the many lines in the -v output, like

...
declined remote machine 'static-haskell-nix-ci' for building '/nix/store/package-hash1234' because the derivation does not have all mandatory features to build on that machine: [ big-parallel ]
...

@domenkozar
Copy link
Member

@nh2 could you rebase and we'll merge it?

@nh2
Copy link
Contributor Author

nh2 commented Nov 25, 2020

@nh2
Copy link
Contributor Author

nh2 commented Jun 2, 2021

@Ericson2314 short ping :)

@domenkozar
Copy link
Member

🙏 this one is a huge UX improvement, it would be a shame to let it stale.

@domenkozar domenkozar added the error-messages Confusing messages and better diagnostics label Jul 28, 2021
@stale
Copy link

stale bot commented Apr 16, 2022

I marked this as stale due to inactivity. → More info

@stale stale bot added the stale label Apr 16, 2022
@nh2
Copy link
Contributor Author

nh2 commented May 27, 2022

@Ericson2314 Short ping 2 years later ^ :)

@stale stale bot removed the stale label May 27, 2022
@bitonic
Copy link

bitonic commented May 27, 2022

Just to chime in, I lost half an hour on this today. It'd be great to get this merged.

@fricklerhandwerk fricklerhandwerk added the UX The way in which users interact with Nix. Higher level than UI. label Sep 9, 2022
@Ericson2314
Copy link
Member

Ericson2314 commented Mar 11, 2023

Ah! I am seeing and all the pings this now :(

@Ericson2314 Ericson2314 self-assigned this Mar 11, 2023
@Ericson2314
Copy link
Member

As penance, I will just rebase this myself.

@Ericson2314 Ericson2314 force-pushed the issue-1572-log-remote-builder-decline-reasons branch from ddd4556 to 28a52ff Compare March 11, 2023 21:44
@Ericson2314
Copy link
Member

@nh2 I have rebased, and applied the PR description template.

@Ericson2314
Copy link
Member

As to the content of the new error messages, I hope @fricklerhandwerk can review them.

Also further improve the final error message, giving a practical example of
how to set machine features because that's what most people need to succeed
in building large packages remotely.
@Ericson2314 Ericson2314 force-pushed the issue-1572-log-remote-builder-decline-reasons branch from 28a52ff to b1d22a4 Compare March 11, 2023 21:50
Copy link
Contributor

@fricklerhandwerk fricklerhandwerk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be more explicit with the expectations that were failed as represented by the error message. I'd be in favor of having a very structured presentation similar to what we had in the profile conflict PR, e.g

error condition

  expected: ...

  actual: ...

hint on what to do

@roberth we may want to write a guideline :)

debug("declined remote machine '%s' for building '%s' because it does not have the needed system type '%s", m.storeUri, drvstr, neededSystem);
} else if (!m.allSupported(requiredFeatures)) {
std::string joinedFeatures = concatStringsSep(" ", requiredFeatures); // includes leading space
debug("declined remote machine '%s' for building '%s' because it does not have all required features: [ %s ]", m.storeUri, drvstr, joinedFeatures);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be explicit on what was expected and what is available.

debug("declined remote machine '%s' for building '%s' because it does not have all required features: [ %s ]", m.storeUri, drvstr, joinedFeatures);
} else if (!m.mandatoryMet(requiredFeatures)) {
std::string joinedFeatures = concatStringsSep(" ", requiredFeatures); // includes leading space
debug("declined remote machine '%s' for building '%s' because the derivation does not have all mandatory features to build on that machine: [ %s ]", m.storeUri, drvstr, joinedFeatures);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this means. I thought a machine may be lacking features to build a drv, how would it work the other way round?

Copy link
Contributor

@tomberek tomberek Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likely the message is backwards

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's accurate as is, but "decline" might not be the best word perhaps (for all of these)

Comment on lines +289 to +291
"For example, to enable the 'big-parallel' feature, use:"
"\n --builders 'yourbuilderhostname - - - - big-parallel'"
"\nhttps://nixos.org/nix/manual/stable/advanced-topics/distributed-builds"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should instead refer to the features section in man nix.conf as we did for other hints in error messages recently.

"For example, to enable the 'big-parallel' feature, use:"
"\n --builders 'yourbuilderhostname - - - - big-parallel'"
"\nhttps://nixos.org/nix/manual/stable/advanced-topics/distributed-builds"
"\nYou can enable -v verbosity to see why any machine "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"\nYou can enable -v verbosity to see why any machine "
"\nEnable -v verbosity to show why any machine "

} else if (neededSystem != "builtin" &&
std::find(m.systemTypes.begin(), m.systemTypes.end(), neededSystem)
== m.systemTypes.end()) {
debug("declined remote machine '%s' for building '%s' because it does not have the needed system type '%s", m.storeUri, drvstr, neededSystem);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which one does it have then?

@roberth
Copy link
Member

roberth commented Mar 12, 2023

I think we should be more explicit with the expectations that were failed as represented by the error message. I'd be in favor of having a very structured presentation similar to what we had in the profile conflict PR, e.g

error condition

  expected: ...

  actual: ...

hint on what to do

Explaining why nothing doesn't satisfy a set of constraints is always a recipe for long and complicated error messages. That's just the nature of the problem: there's a lot of data involved and we can't guess what should match up with what.
In this case we'd have to track exactly why each of the remote builders does not satisfy the requirements and pass that information to the exception constructor.

Realistically the error message would have to be

error condition

expected:

not builder 1
  reasons

not builder 2
  reasons

not builder 3
  reasons

...

You could sort the build machines by relevance and truncate after some limit (say, 3) to reduce the length of the message.

To make sure that the algorithm for checking and the algorithm for giving reasons are in sync, we'll want to use some abstractions that we don't have yet, or perhaps we should accept that we're computing and remembering too much in the happy flow.

@fricklerhandwerk
Copy link
Contributor

fricklerhandwerk commented Mar 12, 2023

If I read the code correctly, the debug outputs are per machine, so all we need at this point are slightly more verbose texts that simply also contain the expected properties. And I only propose to make them multiline and possibly indented to make sure that the additional visual noise doesn't preclude noticing the relevant bits.

I don't see why this would require additional infrastructure.

@roberth
Copy link
Member

roberth commented Mar 13, 2023

I assumed that you wanted to incorporate the details into the exception message instead, so that extra log verbosity isn't needed. If it remains a lvlDebug thing, the performance cost isn't too important.

@fricklerhandwerk
Copy link
Contributor

fricklerhandwerk commented Mar 13, 2023

@roberth I have no opinion on whether it should be in the exception. Setting up the infra to lift it to the exception level sounds like quite some work, so I'd rather merge it as debug messages and allow for following up in a more principled manner when time allows.

@Ericson2314
Copy link
Member

#7865 structured exceptions would help with this.

m.mandatoryMet(requiredFeatures))
{
if (!m.enabled) {
debug("declined remote machine '%s' because it is disabled", storeUri);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
debug("declined remote machine '%s' because it is disabled", storeUri);
debug("avoiding remote machine '%s' because it is disabled", storeUri);

How about avoid for all of these?

The problem with decline is that it seems like the machine is initiating the remote build or something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
error-messages Confusing messages and better diagnostics UX The way in which users interact with Nix. Higher level than UI.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants