Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proc.(*Process).Continue skips breakpoints [WIP] #259

Merged
merged 11 commits into from
Jan 9, 2016

Conversation

aarzilli
Copy link
Member

This is a fix for the problem unearthed by PR #255, it is submitted for discussion, not definitive.

There are two problems with (*Process).Continue

  1. The first is that between the time one breakpoint is triggered and the point where we loop over all threads and Halt eacho one the second goroutine may hit the breakpoint too, this second hit gets lost
  2. The second problem is that the way Continue resumes the process has a race condition with the tracee

Let's that thread A hit the breakpoint and thread B was sleeping when Continue called Halt on it. After we finish processing the breakpoint on A we call Continue again, Continue then loops through all threads, it calls resume on B, then moves onto thread A, sees that it's on a breakpoint so it calls (*Thread).Step().
If thread B happens to get to the breakpoint while (*Thread).Step on thread A is happening the breakpoint gets skipped, because we temporarily cleared it.

You can test this by reverting the changes and running TestBreakpointCounts. The fact that multiple breakpoints can be hit simultaneously needs to be propagated all the way to the client, either by changing CurrentBreakpoint in State to a slice or by returning one breakpoint at a time in Debugger.

@derekparker
Copy link
Member

The code changes look good, but I agree there is some more discussion that needs to happen here.

The fact that multiple breakpoints can be hit simultaneously needs to be propagated all the way to the client, either by changing CurrentBreakpoint in State to a slice or by returning one breakpoint at a time in Debugger.

There are some UI considerations with something like this. Let me organize some thoughts on this, because it warrants some discussion.

@aarzilli
Copy link
Member Author

I've also noticed that the test fails on OSX (ironically, by triggering the breakpoint too many times). Either there is a similar bug or my change broke it.

@aarzilli
Copy link
Member Author

aarzilli commented Oct 2, 2015

Thanks to @ignatov I found a bug with my changes to Continue which would cause it to never advance when a breakpoint was set on a 1 byte instruction, it was caused by a misunderstanding on my part of the behaviour of PC over soft-breakpoint instructions (on hindsight it's pretty obvious what happens).

I also found a second race condition in Continue when the traced program terminates. I've fixed this for linux on a second commit, I don't know if this also happens on OSX. I have seen this behaviour also in master, I don't think my patch introduced it.

@derekparker
Copy link
Member

@aarzilli please rebase on master. I'm going to review and also work on top of this branch, and continue the discussion above.

@aarzilli
Copy link
Member Author

aarzilli commented Oct 7, 2015

Done, note that this is still on top of #255.

@derekparker
Copy link
Member

#255 has been merged. Can you please rebase again, I'd like to work on merging your patches in.

@aarzilli aarzilli force-pushed the fix-247 branch 2 times, most recently from 9d546c3 to 7e95711 Compare October 10, 2015 07:26
@aarzilli
Copy link
Member Author

Rebased.

@@ -91,6 +91,7 @@ func (thread *Thread) Step() (err error) {
}

err = thread.singleStep()
pc, _ = thread.PC()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems leftover?

@derekparker
Copy link
Member

The test TestBreakpointCounts consistently fails for me on OSX, I'll try and dig in a bit further to see why. Could be related to the OSX test flickers. I would love to get rid of those...

@aarzilli
Copy link
Member Author

We also need to decide what to do with API and UI, right now we are counting breakpoint hits correctly but only reporting the first one of each group to the user.

@ignatov
Copy link

ignatov commented Oct 17, 2015

Please ping me when it will be merged.

@derekparker
Copy link
Member

We also need to decide what to do with API and UI, right now we are counting breakpoint hits correctly but only reporting the first one of each group to the user.

Thinking about this a bit more, I have some thoughts which I would love input on from @ignatov, @ironcladlou and @dlsniper since they are working on actual client integrations.

First let me ask, how you you all like this information presented to you? Specifically, since we have the concept of "current thread" and "current goroutine", is it enough for Delve to decide which "current breakpoint" to present to clients? If not, would it make sense to return a slice of current breakpoints? To me that would be confusing. Would it instead make more sense to continue to return information on the "current thread / goroutine" and additionally provide a slice containing all thread state, which includes current breakpoints. That way, you could easily use the "current breakpoint" of the "current thread", or check out all the other threads and see if they have a breakpoint set.

Would really like feedback on this, and I'd really like to get this merged in asap.

@ironcladlou
Copy link
Contributor

Having access to "all" plus an indicator of "current" sounds workable to me.

I have no background in debugger internals or debugger frontend development prior to messing with Delve/Sublime, so I'm winging it and happily defer to those who have already thought through these issues. I've been learning a lot from go-lang-idea-plugin, and since IntelliJ's debugger rocks, I normally assume whatever is needed to make the IntelliJ plugin work is probably more than enough for my little Sublime plugin. 😁

@aarzilli
Copy link
Member Author

I have moved Breakpoint and BreakpointInfo inside Thread and added a Threads slice to DebuggerState. Current breakpoints can be retrieved by iterating through Threads and looking at non nil Thread.Breakpoints, DebuggerState.CurrentThread.Breakpoint is the current thread's breakpoint.

Is this acceptable?

@derekparker
Copy link
Member

@aarzilli that seems fine to me.

@derekparker
Copy link
Member

The tests here seem to consistently fail on OSX. I'll try and dig into it so we can get this merged.

@aarzilli
Copy link
Member Author

Added a fix for #262

@aarzilli aarzilli mentioned this pull request Nov 10, 2015
@aarzilli
Copy link
Member Author

done.

@derekparker
Copy link
Member

@aarzilli pulled branch and ran tests several times, getting failures probably 80% of the time on OSX. Are you getting any failures in your OSX VM?

@aarzilli
Copy link
Member Author

aarzilli commented Dec 1, 2015

I see what I was seeing here. A rare failure where the process runs to completion immediately after we start it.

It did not happen with the time.Sleep, if it's still the race condition it would make sense that it's more frequent on real hardware.

@derekparker
Copy link
Member

@aarzilli update: didn't have time to look into this today. I'll be spending some time tomorrow trying to figure out the issue on OSX, with the goal being to get this merged in right after that's sorted out.

@aarzilli
Copy link
Member Author

Rebased on master at ad5097a (no conflicts)

@aarzilli
Copy link
Member Author

It seems that the behaviour of go/constant has changed on go-tip, that's what's breaking the tests.

@derekparker
Copy link
Member

@aarzilli interesting. I'm still trying to track down OSX issues, however I haven't had much time recently to dig into it. I may actually have time today, so let's hope I can figure something out and we can get this merged in. Unrelated, but I've also been working on implementing function calls. That work is sitting on a branch (maybe I'll send a WIP PR just to publicly track the work). I haven't touched it in a week or so (no time) but it was working in a very limited case, but there were things to consider that I wasn't sure about, which maybe would be another good reason to submit a WIP PR and begin discussion around a few things. Anyways, hopefully we can address the changes on tip and the OSX issues with this PR and keep pushing forward.

@derekparker
Copy link
Member

@aarzilli leaving these traces here in case they jump out to you for any reason:

make test-proc-run RUN=Next
go test -exec=/Users/derekparker/code/go/src/github.com/derekparker/delve/scripts/testsign -ldflags="-s -X main.Build=fd1f97d449541403a2fac75019a4df75d5d852be" -test.run="Next" github.com/derekparker/delve/proc
foo
fatal error: unexpected signal during runtime execution
hi 9
[signal 0xa code=0x2 addr=0xc80000331b pc=0x4012e14]

goroutine 50 [running]:
runtime.throw(0x453e1a0, 0x2a)
    /usr/local/go/src/runtime/panic.go:527 +0x90 fp=0xc82034b4f8 sp=0xc82034b4e0
runtime.sigpanic()
    /usr/local/go/src/runtime/sigpanic_unix.go:12 +0x5a fp=0xc82034b548 sp=0xc82034b4f8
runtime.mallocgc(0x9, 0x4379040, 0x1, 0x49672)
    /usr/local/go/src/runtime/malloc.go:585 +0x314 fp=0xc82034b618 sp=0xc82034b548
runtime.newarray(0x4379040, 0x9, 0x9f)
    /usr/local/go/src/runtime/malloc.go:777 +0xc9 fp=0xc82034b658 sp=0xc82034b618
runtime.makeslice(0x4369920, 0x9, 0x9, 0x0, 0x0, 0x0)
    /usr/local/go/src/runtime/slice.go:32 +0x165 fp=0xc82034b6a8 sp=0xc82034b658
github.com/derekparker/delve/proc.parseG(0xc8202c8f40, 0x8201d7200, 0x0, 0xc8203212e0, 0x0, 0x0)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/proc/variables.go:320 +0x94 fp=0xc82034b8d8 sp=0xc82034b6a8
github.com/derekparker/delve/proc.(*Thread).GetG(0xc8202c8f40, 0xc8200804d0, 0x0, 0x0)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/proc/threads.go:281 +0x347 fp=0xc82034b980 sp=0xc82034b8d8
github.com/derekparker/delve/proc.(*Breakpoint).checkCondition(0xc8200925a0, 0xc8202c8f40, 0x0)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/proc/breakpoints.go:116 +0x3d fp=0xc82034b9a8 sp=0xc82034b980
github.com/derekparker/delve/proc.(*Thread).SetCurrentBreakpoint(0xc8202c8f40, 0x0, 0x0)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/proc/threads.go:330 +0x10c fp=0xc82034ba10 sp=0xc82034b9a8
github.com/derekparker/delve/proc.(*Process).setCurrentBreakpoints(0xc82007a2a0, 0xc8202c8f40, 0x0, 0x0)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/proc/proc_darwin.go:318 +0x31 fp=0xc82034ba50 sp=0xc82034ba10
github.com/derekparker/delve/proc.(*Process).Continue(0xc82007a2a0, 0x0, 0x0)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/proc/proc.go:342 +0x183 fp=0xc82034bb00 sp=0xc82034ba50
github.com/derekparker/delve/proc.(*Process).Next(0xc82007a2a0, 0x0, 0x0)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/proc/proc.go:290 +0x3b1 fp=0xc82034bbf0 sp=0xc82034bb00
github.com/derekparker/delve/proc.TestNextConcurrent.func1(0xc82007a2a0, 0x44c59f0, 0xd, 0xc8200ac820, 0x47, 0xc820527ec0, 0x56)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/proc/proc_test.go:352 +0x74a fp=0xc82034bdc0 sp=0xc82034bbf0
github.com/derekparker/delve/proc.withTestProcess(0x44c59f0, 0xd, 0xc820340120, 0xc82034bf28)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/proc/proc_test.go:41 +0x339 fp=0xc82034bed0 sp=0xc82034bdc0
github.com/derekparker/delve/proc.TestNextConcurrent(0xc820340120)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/proc/proc_test.go:364 +0xb7 fp=0xc82034bf58 sp=0xc82034bed0
testing.tRunner(0xc820340120, 0x46bd4f8)
    /usr/local/go/src/testing/testing.go:456 +0x98 fp=0xc82034bf90 sp=0xc82034bf58
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc82034bf98 sp=0xc82034bf90
created by testing.RunTests
    /usr/local/go/src/testing/testing.go:561 +0x86d

goroutine 1 [chan receive]:
testing.RunTests(0x4574320, 0x46bd420, 0x25, 0x25, 0x1)
    /usr/local/go/src/testing/testing.go:562 +0x8ad
testing.(*M).Run(0xc820037f08, 0x0)
    /usr/local/go/src/testing/testing.go:494 +0x70
github.com/derekparker/delve/proc/test.RunTestsWithFixtures(0xc820037f08, 0x44bbcd0)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/proc/test/support.go:65 +0x30
github.com/derekparker/delve/proc.TestMain(0xc820037f08)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/proc/proc_test.go:26 +0x21
main.main()
    github.com/derekparker/delve/proc/_test/_testmain.go:124 +0x113

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1

goroutine 51 [chan receive, locked to thread]:
github.com/derekparker/delve/proc.(*Process).handlePtraceFuncs(0xc82007a2a0)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/proc/proc.go:656 +0xc3
created by github.com/derekparker/delve/proc.New
    /Users/derekparker/code/go/src/github.com/derekparker/delve/proc/proc.go:71 +0x1d4
FAIL    github.com/derekparker/delve/proc   2.064s
make: *** [test-proc-run] Error 1

AND

make test-proc-run RUN=Next                                                                                                            {2}
go test -exec=/Users/derekparker/code/go/src/github.com/derekparker/delve/scripts/testsign -ldflags="-s -X main.Build=fd1f97d449541403a2fac75019a4df75d5d852be" -test.run="Next" github.com/derekparker/delve/proc
foo
hi 0
hi 4
hi 6
hi 1
hi 8
hi 0
hi 8
hi 5
hi 9
hi 9
hi 7
hi 7
hi 4
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x8a0 pc=0x2041]

goroutine 7 [running]:
main.sayhi(0x2, 0x8201e0470)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:8 +0x1
created by main.main
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:18 +0x88

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0x8201e047c)
    /usr/local/go/src/runtime/sema.go:43 +0x26
sync.(*WaitGroup).Wait(0x8201e0470)
    /usr/local/go/src/sync/waitgroup.go:126 +0xb4
main.main()
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:20 +0xb9

goroutine 5 [runnable]:
main.sayhi(0x0, 0x8201e0470)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:8
created by main.main
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:18 +0x88

goroutine 6 [runnable]:
main.sayhi(0x1, 0x8201e0470)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:8
created by main.main
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:18 +0x88

goroutine 8 [runnable]:
main.sayhi(0x3, 0x8201e0470)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:8
created by main.main
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:18 +0x88

goroutine 9 [runnable]:
main.sayhi(0x4, 0x8201e0470)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:8
created by main.main
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:18 +0x88

goroutine 10 [runnable]:
main.sayhi(0x5, 0x8201e0470)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:9 +0xce
created by main.main
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:18 +0x88

goroutine 11 [runnable]:
main.sayhi(0x6, 0x8201e0470)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:8
created by main.main
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:18 +0x88

goroutine 12 [runnable]:
main.sayhi(0x7, 0x8201e0470)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:8
created by main.main
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:18 +0x88

goroutine 13 [runnable]:
main.sayhi(0x8, 0x8201e0470)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:8
created by main.main
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:18 +0x88

goroutine 14 [runnable]:
fmt.glob.func1(0x0, 0x0)
    /usr/local/go/src/fmt/print.go:128 +0x2f
sync.(*Pool).getSlow(0x191f20, 0x0, 0x0)
    /usr/local/go/src/sync/pool.go:141 +0x188
sync.(*Pool).Get(0x191f20, 0x0, 0x0)
    /usr/local/go/src/sync/pool.go:117 +0x141
fmt.newPrinter(0x1b5830)
    /usr/local/go/src/fmt/print.go:133 +0x27
fmt.Fprintln(0x5f89000, 0x820200010, 0x8201fa788, 0x2, 0x2, 0xca00, 0x0, 0x0)
    /usr/local/go/src/fmt/print.go:253 +0x2d
fmt.Println(0x8201fa788, 0x2, 0x2, 0xc0100, 0x0, 0x0)
    /usr/local/go/src/fmt/print.go:264 +0x73
main.sayhi(0x9, 0x8201e0470)
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:9 +0x1a0
created by main.main
    /Users/derekparker/code/go/src/github.com/derekparker/delve/_fixtures/parallel_next.go:18 +0x88
--- FAIL: TestNextConcurrentVariant2 (0.30s)
    proc_test.go:61: failed assertion at proc_test.go:405: Continue 2 - Process 47797 has exited with status 2
foo
Hello, World!
bye
hi
FAIL
FAIL    github.com/derekparker/delve/proc   5.646s
make: *** [test-proc-run] Error 1

Those are the panics I get. I also get errors like "expected to be at line 11, was at line 10", e.g.:

--- FAIL: TestNextConcurrentVariant2 (0.27s)
    proc_test.go:410: Program did not continue to correct next location expected 10 was parallel_next.go:11

The other failure condition I see is that the test will just run forever. Also, occasionally it passes. Do you see any of these errors running bare metal on Linux? If not, I suspect it may be hidden in the C code (or more concretely the interfacing between Go and C) that is causing the issue, since that would be the major difference there since on Darwin we call into C a lot of syscalls and such.

I'm going to keep debugging this for a bit and see if I get anywhere but I figured I'd leave those traces around for you or anyone else who may want to take a stab at debugging this on OSX.

@derekparker
Copy link
Member

@aarzilli can you rebase this on master? I'm going to spend some time tomorrow in this branch, trying to fix the OSX issues.

@aarzilli
Copy link
Member Author

Done, did you have the time to check if it's the same weird launch race condition that's 'fixed' by the time.Sleep?

@derekparker
Copy link
Member

@aarzilli yeah I did look into that but it didn't seem to fix it. I think I'm running into another issue, but I think there may still be a race somewhere between Delve and the debugged process manifesting itself oddly. Looking at the stack traces above, the panic always happens in the same spot, when we're trying to parse the G struct, I just haven't had much time to actually dig into it, but I'm going to start now and see what I come up with either today or tomorrow.

Breakpoints are skipped either because:
1. when multiple breakpoints are hit simultaneously only one is
processed
2. a thread hits a breakpoint while another thread is being
singlestepped over the breakpoint.

Additionally fixed a race condition between Continue and tracee
termination.
Next sets its temporary breakpoints with the condition that they
must only activate on the current goroutine, and then calls Continue
When Continue encounters a temporary breakpoint it clears all
the breakpoint.

User visible changes: breakpoints that get hit while executing Next
are not ignored.

This commit does not implement full conditional breakpoints
functionality, the only condition that can be set is on the
goroutine id.

Fixes race conditions in Next affecting TestNextConcurrent.
resume loops in continueOnce moved to a OS specific resume function,
this makes the problem easier to deal with and seems to be more
appropriate to a windows port given what transpired from discussion
of Pull Request go-delve#276
Sometimes after PtraceSingleStep the thread does not advance of a
single instruction but is, instead, blocked immediately by a SIGSTOP
Made singleStep repeat the process until a SIGTRAP is observed.

Unsure where the SIGSTOP comes from.
@aarzilli
Copy link
Member Author

aarzilli commented Jan 9, 2016

added the stuff, rebased on master.

@derekparker derekparker merged commit 4266479 into go-delve:master Jan 9, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants