Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Android emulator startup failure messages #1630

Merged
merged 2 commits into from
Feb 4, 2024

Conversation

rmartin16
Copy link
Member

@rmartin16 rmartin16 commented Feb 3, 2024

Changes

UX

  • Conceptually, I broke the errors in to two categories:
    • Briefcase detects the emulator closed
    • The user aborts the start up process
  • In the first case, the emulator's output probably includes the recommended steps for the user.
  • In the second case, Briefcase probably won't be able to retrieve the emulator's output because the emulator will likely still be running....and retrieving stdout from a running process without blocking is non-trivial....but potentially really valuable here

Presentation when Briefcase detects startup failure:
image

Presentation when the User aborts the startup:
image

PR Checklist:

  • All new features have been tested
  • All new features have been documented
  • I have read the CONTRIBUTING.md file
  • I will abide by the code of conduct

@rmartin16
Copy link
Member Author

rmartin16 commented Feb 3, 2024

As it relates to #1573, these changes may make it less likely for users to report when the issue occurs. However, such reports aren't really producing any valuable information anyway right now.

Given this, it'd be nice if we could add some instrumentation to include potentially valuable information about the emulator when it occurs.

The only idea I have for that is trying to retrieve any output to stdout from the emulator since it started. This is tricky but is possible with the output thread streamer using something like stop_func=lambda: time.time() > [1 sec in future]. That said, I don't have any reason to believe the output will contain anything useful....especially if the process is simply blocked on something after starting.

From the provided logs from users, we can see the emulator is in an offline state....I don't know if that's useful but it's about all we have to go on.

Another thought to help users would be to print a message telling users to send CTRL+C if the Starting emulator... Wait Bar takes longer than a few minutes...

Open to ideas...

@rmartin16
Copy link
Member Author

rmartin16 commented Feb 3, 2024

wow....didn't expect the integration tests to try out the new errors right away lol

https://github.com/beeware/briefcase/actions/runs/7769288597/job/21188546568?pr=1630

[edit]
this looks like a non-transient problem...

Copy link
Member

@freakboy3742 freakboy3742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The broad direction of this looks like a good improvement - anything that makes it easier to surface what might be going on in this case will be a good improvement. A couple of cosmetic suggestions inline; but I don't have any additional ideas on other ways to improve the error reporting.

What's really frustrating is that the people who are experiencing this problem are the least likely to be able to help us diagnose it... but I'm willing to bet that as soon as we find someone who (a) experiences this problem and (b) has enough skills to know how to generate a useful log message, we'll be able to identify the problem and a fix really quickly.

The test failure is really weird... it reads like the specific machine that is being assigned for the job is lacking disk or memory space... but I would have though re-running the job would get you a new machine in the pool...

self.tools.logger.info(emulator_popen.communicate(timeout=1)[0])
except subprocess.TimeoutExpired:
self.tools.logger.info(
"> Briefcase failed to retrieve emulator output "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this chevron intentional? If it's intentional, I'm not sure what it's communicating.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through a few iterations of what to print in this situation because initially, at least, I concluded that whatever is printed here could be construed as "the emulator failed to retrieve the logs". So, then I experimented with some bracket markup and then eventually just added "Briefcase" as the explicit actor that was trying to retrieve the logs.

At any rate, I can remove it.

# if the emulator exited, this should return its output immediately;
# if it is still running, this will quickly time out and print nothing.
with suppress(subprocess.TimeoutExpired):
self.tools.logger.info("Emulator output log", prefix=self.name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be inclined to bump this to warning or error. As currently rendered, it looks a bit like "normal output"; but if we're at this point, we're already in "something is in the process of breaking" mode.

It might also be worth highlighting that it's not just the output log - it's "output because of error".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah; I think I'm hesitant to make it red error output because that will automatically become the first thing the user looks at and perceived as the most important....and it might be. Mostly, I wanted to direct the user to where I wanted them to start reading...at least in the case of CTRL+C. For the other failure case, it's just a wall of red text.

Anyway, I think changing the emulator output heading to a yellow warning and saying Emulator Output Log for Startup Failure instead.

Copy link
Member

@freakboy3742 freakboy3742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this from a functionality point of view. The CI failure is the only thing standing in the way here.

I'm wondering if this might be a configuration change that has been introduced in a recent update to the Ubuntu system image... if the user's home directory has a restriction on the amount of free space, that might explain what we're seeing here.

If you want to try and debug this, Toga has a (commented out) mechanism for getting a shell session on the CI machine.

@rmartin16
Copy link
Member Author

rmartin16 commented Feb 4, 2024

It's something with the runner image released on the 1st.

I re-ran the most recent successful CI; it used the latest runner image and failed as well.

I may try to dig in later. I haven't found anything saying the available disk space was changed....intentionally anyway.

I suppose it's also possible the emulator is requiring more space for the qemu disks now...

@freakboy3742
Copy link
Member

Interestingly, the problem doesn't appear to be affecting Toga. Fresh PRs are passing Android CI without any problem

@rmartin16
Copy link
Member Author

Yeah...although, Toga isn't also building a bunch of Docker images beforehand (among a lot of other stuff)...so, I'm also wondering if something changed about the way Docker images are being managed...

I may just create a quick PR for app-build-verify to run df -h && du --max-depth 5 -h / | sort -h....that might help narrow down the issue to begin with...

@rmartin16
Copy link
Member Author

Unfortunately, re-running a previous run of CI uses the same workflows it ran the first time. If a workflow is updated (as we did in beeware/.github#88), you need a new commit; I'll push an update.

Copy link
Member

@freakboy3742 freakboy3742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good in CI, so 🚢 !

@freakboy3742 freakboy3742 merged commit e650795 into beeware:main Feb 4, 2024
44 checks passed
@rmartin16 rmartin16 deleted the android-emulator-debug branch February 4, 2024 07:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants