Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected (nil, nil) result from wincred.GetGenericCredential #32

Closed
frazeradam opened this issue Sep 13, 2022 · 11 comments
Closed

Unexpected (nil, nil) result from wincred.GetGenericCredential #32

frazeradam opened this issue Sep 13, 2022 · 11 comments

Comments

@frazeradam
Copy link

Let me start off by saying that I don't think this is a bug in the package, but I'm hoping you might have an idea on why I'm getting both a nil credential and a nil error coming back. I've traced it to unexpected results in sysCredRead where the wrapped system call is returning 1 (indicating success), yet pcred remains nil. Unsurprisingly this unexpected output causes invalid output up the chain of calls and panics if a nil error is presumed to mean the credential pointer is valid. The only at all atypical thing I'm doing is calling it from within CGo, but I can't see any reason why that should make a difference.

@danieljoos
Copy link
Owner

Sorry for the late response.

That issue sounds interesting. I could imagine some out-of-memory issues. Can you reproduce that behaviour reliably on your end? Is there anything interesting returned from GetLastError() when called after CredRead?

@mislav
Copy link

mislav commented Apr 7, 2023

I have successfully reproduced this in GitHub CLI based on a user report cli/cli#7283, but have been pulling my hair for one whole day while trying to come up with an isolated reproduction case for this. As soon as I touched anything about the (large) program that reproduces this, GetGenericCredential would start working again as designed. Sprinkling in some debug statements revealed that the failure occurs for some pointer values of pcred but not for others:

wincred/sys.go

Line 68 in 90f7265

var pcred *sysCREDENTIAL

If my hunch is true, then this is a memory issue and is dependent on the current memory allocation state of the program at the time that the syscall runs. When reading the Windows API documentation on CredReadW, I wasn't really clear on whether it expects us to pre-allocate a struct in memory, so I've tried that and it seemed to work. I've sent a PR that illustrates my workaround, but please take it with a grain of salt.

@danieljoos
Copy link
Owner

Thanks for the update.
I will also try to reproduce this on my Windows machine.
Any specific OS version that this appears on?

@mislav
Copy link

mislav commented Apr 19, 2023

@danieljoos I've been able to reproduce this on both Windows 10 and Windows 11.

To most reliably reproduce, you can try compiling cli/cli from source (via script\build), logging in via bin\gh.exe auth login -h github.com (to store the credential to wincred), and running bin\gh.exe release --repo cli/cli view v2.26.0.

Things I have tried that did not have any effect:

  • Use runtime.KeepAlive for variables passed to syscalls during sysCredRead()
  • Add //go:uintptrescapes compiler directive to sysCredRead() - this is likely a no-op since the implementation of Proc.Call already contains that directive
  • Adding a mutex around sysCredRead to ensure that they do not happen concurrently
  • Upgrading the sys dependency to latest version

Something interesting I have noticed: a some prominent callers of wincred.GetGenericCredentials actually handle cases when both return values were nil, indicating that they might have stumbled on this problem before and worked around it:

The internal function that does conversion from Windows to Go struct also has a guard that returns nil early if pcred is nil:

wincred/conversion.go

Lines 48 to 51 in 90f7265

func sysToCredential(cred *sysCREDENTIAL) (result *Credential) {
if cred == nil {
return nil
}

@danieljoos
Copy link
Owner

@mislav Thank you for the details. I was able to reproduce it following your instructions.

I really don't know what's happening here.
The CreadReadW syscall doesn't set the output object - the given pointer remains nil.

For your particular example, this always happens for the published release lookup and never for the draft release lookup. Running both without go-routines in sequence, causes everything to work just fine.

What I investigated so far:

  • Created an example project with hundreds of goroutines reading a single credential in parallel. Worked without issues.
  • Debugged the OS threads these goroutines do run in: Both different from the main thread, one of them works, the other not. So there's no strict need for this to run on the main thread (I remember some windows functions need this).
  • Tried to lock the OS thread to the given goroutines - no difference
  • Tried to lock access to the cred-read function via mutex - no difference
  • Investigated on the context cancelling below both goroutines (in cli/cli) - that point of the code isn't even reached.

There seems to be something that the publish release lookup (in cli/cli) does, which causes problems with the cred-read. Then again, when run in sequence without goroutines, everything works just fine.

@danieljoos
Copy link
Owner

danieljoos commented May 6, 2023

I traced the calls to CredReadW using API Monitor and Process Monitor.
It showed that the actual call to the DLL function sets the pointer value to a non-NULL value.
I'll continue to look into the issue and see if I can find out why the pointer on the Go-side of things is still nil.

@danieljoos
Copy link
Owner

danieljoos commented May 6, 2023

I'm pretty sure to have found the issue. It is discussed here:
golang/go#34684

Moving away from LazyProc Calling syscall.SyscallN directly seems to solve it. I'm still trying to play around with it a bit and see if I can come up with a PR soon.

@mislav
Copy link

mislav commented May 8, 2023

@danieljoos Thanks so much for diving into this.

I was also looking at that Go issue, but it seemed that since that issue was opened, changes landed to Go that made it "safer" to use LazyProc.Call w.r.t. pointers. However, since the issue is still open, maybe there are still some problems with pointer conversions in that space, but I must admit I don't understand the root of the problem.

If switching to SyscallN addresses the issue, that would be very welcome.

@chkohner
Copy link

chkohner commented May 8, 2023

We've started hitting this as well. Looking forward to your PR.

@danieljoos
Copy link
Owner

danieljoos commented May 14, 2023

Please have a look at the changes if you like:
#42

If you're all fine with that, I'll merge it and tag a new version.

@danieljoos
Copy link
Owner

Closing this for the moment. Issue is hopefully resolved in version v1.2.0.
Please feel free to reopen the issue if you still see it happening.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants