Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix connect to PS on MacOS/X #7885

Merged
merged 2 commits into from
May 27, 2024
Merged

Fix connect to PS on MacOS/X #7885

merged 2 commits into from
May 27, 2024

Conversation

knizhnik
Copy link
Contributor

Problem

After [0e4f182] which introduce async connect
Neon is not able to connect to page server.

Summary of changes

Perform sync commit at MacOS/X

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@knizhnik knizhnik requested review from a team as code owners May 25, 2024 19:03
@knizhnik knizhnik requested review from tristan957 and jcsp May 25, 2024 19:03
Copy link

github-actions bot commented May 25, 2024

3138 tests run: 3005 passed, 0 failed, 133 skipped (full report)


Code coverage* (full report)

  • functions: 31.4% (6447 of 20543 functions)
  • lines: 48.3% (49949 of 103355 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
dc09a95 at 2024-05-27T09:58:51.470Z :recycle:

Copy link
Contributor

@kelvich kelvich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to commit as is to unblock yourself.

But what is the root cause here? async connect should works on macOS as well

@knizhnik
Copy link
Contributor Author

Feel free to commit as is to unblock yourself.

Review from compute/safekeepr team is required

But what is the root cause here? async connect should works on macOS as well

I wrote my hypothesis is Slack: WaitEventSet is implemented at MacOS using kevent.
Looks like there are some issues with kevent when there are different events registered for the same socket.

Because if I use WaitLatchOrSocket (which creates wait event set and delete at the end) instead of WaitEventSetWait then connection is established but then still blocked in read.

In any case it was not the first time I found bug in MacOS (or may be even FreeBSD) socket implementation. So I am not so surprised.

It will be excellent if somebody managed to make ti work but I do not want spent more time on it.

@kelvich
Copy link
Contributor

kelvich commented May 26, 2024

I'm less worried about macOS itself, then the fact that this issue could be a manifestation of some bug or incorrect usage in async code.

Looks like there are some issues with kevent when there are different events registered for the same socket.

pgfarm has both macos and freebsd. Async libpq is used logical replication tests, so it is covered by that tests.

@knizhnik
Copy link
Contributor Author

I'm less worried about macOS itself, then the fact that this issue could be a manifestation of some bug or incorrect usage in async code.

Looks like there are some issues with kevent when there are different events registered for the same socket.

pgfarm has both macos and freebsd. Async libpq is used logical replication tests, so it is covered by that tests.

There are two suspicious places in async connect implementation in libpagestore.c (comparing with fe-connect.c and libpqwalreceiver.c):

  1. It is calling PQconnectPoll before WaitEventSetWait
  2. wes_write includes WL_SOCKET_READABLE
    But fixing this issues does't solve the problem.

So the only real difference with libpqwalreceiver.c is that last one uses WaitLatchOrSocket which creates temporary wait even set and releases it after operation. And @MMeent implementation precreates two event sets which are reused in several operations. This two event sets refers to the same socket. Which may cause some problems, although I have not founding internet that it can not be done or there are some related bugs.

@knizhnik
Copy link
Contributor Author

Ok, I managed to make async connect top PS work also at MacOS, doing exactly the same as in libpqwalreceiver.c

Copy link
Contributor

@kelvich kelvich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I vaguely remember having a similar issues with macOS in DMQ. (IIRC in some cases it also might have resulted in stuck connection attempts in linux; rewriting exactly as it is in walreceiver helped with that too).

@kelvich
Copy link
Contributor

kelvich commented May 27, 2024

(Don't forget to squash your commits into one)

@knizhnik
Copy link
Contributor Author

I still waiting for approval from somebody from @neondatabase/compute or @neondatabase/safekeepers team.

@koivunej koivunej requested a review from MMeent May 27, 2024 12:27
@petuhovskiy
Copy link
Member

image

I see nothing that prevents merging, you can merge now. @knizhnik

@koivunej koivunej mentioned this pull request May 27, 2024
@knizhnik knizhnik merged commit d61e924 into main May 27, 2024
64 checks passed
@knizhnik knizhnik deleted the macos_fixes branch May 27, 2024 12:57
koivunej pushed a commit that referenced this pull request May 27, 2024
## Problem

After [0e4f182] which introduce async
connect
Neon is not able to connect to page server.

## Summary of changes

Perform sync commit at MacOS/X

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants