-
Notifications
You must be signed in to change notification settings - Fork 434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix connect to PS on MacOS/X #7885
Conversation
3138 tests run: 3005 passed, 0 failed, 133 skipped (full report)Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
dc09a95 at 2024-05-27T09:58:51.470Z :recycle: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to commit as is to unblock yourself.
But what is the root cause here? async connect should works on macOS as well
Review from compute/safekeepr team is required
I wrote my hypothesis is Slack: WaitEventSet is implemented at MacOS using kevent. Because if I use In any case it was not the first time I found bug in MacOS (or may be even FreeBSD) socket implementation. So I am not so surprised. It will be excellent if somebody managed to make ti work but I do not want spent more time on it. |
I'm less worried about macOS itself, then the fact that this issue could be a manifestation of some bug or incorrect usage in async code.
pgfarm has both macos and freebsd. Async libpq is used logical replication tests, so it is covered by that tests. |
There are two suspicious places in async connect implementation in libpagestore.c (comparing with fe-connect.c and libpqwalreceiver.c):
So the only real difference with libpqwalreceiver.c is that last one uses |
Ok, I managed to make async connect top PS work also at MacOS, doing exactly the same as in libpqwalreceiver.c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I vaguely remember having a similar issues with macOS in DMQ. (IIRC in some cases it also might have resulted in stuck connection attempts in linux; rewriting exactly as it is in walreceiver helped with that too).
(Don't forget to squash your commits into one) |
I still waiting for approval from somebody from @neondatabase/compute or @neondatabase/safekeepers team. |
I see nothing that prevents merging, you can merge now. @knizhnik |
## Problem After [0e4f182] which introduce async connect Neon is not able to connect to page server. ## Summary of changes Perform sync commit at MacOS/X ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Problem
After [0e4f182] which introduce async connect
Neon is not able to connect to page server.
Summary of changes
Perform sync commit at MacOS/X
Checklist before requesting a review
Checklist before merging