Android CI hangs #3878

Closed
niyaznigmatullin opened this issue Aug 26, 2022 · 4 comments

Comments

@niyaznigmatullin
Contributor

After updating chrono from 0.4.19 to 0.4.22, the Android CI hangs from time to time. Reverting to 0.4.19 makes the hangs stop. I tested this on my fork and on an Android emulator on my local machine.

The relevant changes in chrono between these versions are:

  • new timezone detection was introduced
  • timezone detection was fixed for different platforms, specifically for Android
  • a new dependency, iana_time_zone, is used for that

I spent some time investigating why it fails in a local emulator. Almost every time, it is the pr tests that don't terminate: the pr binary is waiting in do_exit(), and the test is waiting in libc::syscall -> __kernel_vsyscall. So I was running only the test_pr:: tests. The hang is unpredictable: for instance, if you run the binary normally and wait for its exit code, it hangs often; if you run it in the background with & and don't wait for its exit code, it doesn't hang.

My guess is that there is undefined behavior somewhere, either in one of the libraries or in our code.
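
For anyone who wants to measure how often a binary hangs when its exit code is awaited, here is a minimal sketch using only the standard library. It is not the harness used in this repository; `run_with_timeout` and `count_hangs` are hypothetical names, and `true` in `main` only stands in for the binary under test. Note that it relies on `Command::spawn`, so it is meant for counting flaky runs rather than for diagnosing the spawn path itself.

```rust
use std::io;
use std::process::{Command, ExitStatus, Stdio};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Runs `program` and waits for its exit status (hypothetical helper).
/// Returns `Ok(Some(status))` if it exits within `timeout`,
/// or `Ok(None)` if it was still running and had to be killed.
fn run_with_timeout(
    program: &str,
    args: &[&str],
    timeout: Duration,
) -> io::Result<Option<ExitStatus>> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn()?;

    let deadline = Instant::now() + timeout;
    loop {
        // try_wait() does not block: it returns None while the child is alive.
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            // Treat the run as a hang: kill the child and reap it.
            let _ = child.kill();
            let _ = child.wait();
            return Ok(None);
        }
        sleep(Duration::from_millis(10));
    }
}

/// Repeats the run `runs` times and counts how many timed out.
fn count_hangs(
    program: &str,
    args: &[&str],
    runs: usize,
    timeout: Duration,
) -> io::Result<usize> {
    let mut hangs = 0;
    for _ in 0..runs {
        if run_with_timeout(program, args, timeout)?.is_none() {
            hangs += 1;
        }
    }
    Ok(hangs)
}

fn main() -> io::Result<()> {
    // Assumption: `true` is a placeholder for the binary being investigated.
    let hangs = count_hangs("true", &[], 20, Duration::from_secs(5))?;
    println!("{hangs} of 20 runs timed out");
    Ok(())
}
```

Polling with `try_wait` keeps the sketch dependency-free; the 10 ms poll interval and the 5 s timeout are arbitrary choices.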

@sylvestre
Sponsor Contributor

well spotted

Do you have the full backtrace?

@niyaznigmatullin
Contributor Author

I explored further and got a different result this time.

So we have two processes alive while it hangs: the root test binary (25942) and the process that was forked from it for a single test (25946). The pr process (25952) had already terminated, but the others were unaware of that.

This is their strace output:
strace.tar.gz

And in 25952, exec was never even called. The last syscall of 25952 is futex. Its parent, 25946, was trying to read the output of 25952 from a pipe, but the read failed in the end. 25942 was just waiting for 25946 to terminate.
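
To illustrate why a parent that reads a child's output from a pipe can block indefinitely, here is a minimal sketch, again standard library only. It does not reproduce the futex hang; it only shows the waiting side, with the blocking read moved to a helper thread so the parent can give up after a timeout. `read_stdout_with_timeout` is a hypothetical name, and `echo` in `main` is a placeholder command.

```rust
use std::io::{self, Read};
use std::process::{Command, Stdio};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Reads all of the child's stdout (hypothetical helper).
/// Returns `Ok(Some(bytes))` if the pipe reached EOF within `timeout`,
/// or `Ok(None)` if the read was still blocked and the child was killed.
fn read_stdout_with_timeout(
    program: &str,
    args: &[&str],
    timeout: Duration,
) -> io::Result<Option<Vec<u8>>> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()?;
    let mut stdout = child.stdout.take().expect("stdout was requested as piped");

    // read_to_end() only returns at EOF, i.e. once every write end of the
    // pipe is closed. Run it on a helper thread so the parent can time out.
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut buf = Vec::new();
        let result = stdout.read_to_end(&mut buf).map(|_| buf);
        // The receiver may already be gone after a timeout; ignore that.
        let _ = tx.send(result);
    });

    match rx.recv_timeout(timeout) {
        Ok(result) => {
            // EOF was reached; reap the child. This wait can still block
            // if the child closed stdout but keeps running.
            child.wait()?;
            result.map(Some)
        }
        Err(_) => {
            // The read never finished: kill the child, which closes its end
            // of the pipe and lets the helper thread exit.
            let _ = child.kill();
            let _ = child.wait();
            Ok(None)
        }
    }
}

fn main() -> io::Result<()> {
    // Assumption: `echo hello` is a placeholder for the real child command.
    match read_stdout_with_timeout("echo", &["hello"], Duration::from_secs(5))? {
        Some(bytes) => println!("child wrote {} bytes", bytes.len()),
        None => println!("timed out waiting for the pipe to close"),
    }
    Ok(())
}
```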

@niyaznigmatullin
Contributor Author

Probably somewhat similar to this one: rust-lang/rust#88585

@niyaznigmatullin
Contributor Author

Temporarily fixed with #3918
