Fixed pernicious bug in autobatching #1127

neubig · 2017-12-08T04:56:37Z

OK, it took me literally all day to find a bug that can be fixed in 10 characters of code.

Basically, parallel memcpy (used in autobatching) requires memory to be transferred to the GPU, but this was being done in an asynchronous fashion, and as a result the memory wasn't necessarily arriving at the GPU before the memcpy kernel started executing.

This probably fixes #1057

redpony · 2017-12-08T09:21:54Z

Does this actually affect things? If these are executed on the same stream, the kernel shouldn’t start executing until after the copy completes. Is the story that this forced synch is expensive?


neubig · 2017-12-08T13:53:48Z

I'm not actually sure whether the copy and the kernel are being executed on the same stream (probably not). Probably the best thing to do is enforce that the copy uses the same stream as the kernel, but I wasn't immediately sure how to do this.

And yes, I'm aware that the forced sync might be expensive; I haven't merged because I haven't had time to figure out the speed ramifications yet.

neubig · 2017-12-08T16:25:35Z

OK, so switching to the synchronous cudaMemcpy caused a performance degradation, but specifying a stream for cudaMemcpyAsync seems to be working for now. I'm not super-confident in this fix though, so after merging I'll continue to monitor for reported problems.
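
For readers following along, here is a minimal standalone sketch of the stream-ordering idea discussed above. It is not the code from dynet/exec.cc: the `scale` kernel, the buffer names, and the `CUDA_CHECK` macro are hypothetical and exist only for illustration. It assumes that when a host-to-device `cudaMemcpyAsync` and a kernel launch are issued on the same stream, the CUDA runtime runs them in issue order, so the kernel should not start reading the buffer before the copy has finished.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: bail out of main() on any CUDA runtime error.
#define CUDA_CHECK(call)                                                   \
  do {                                                                     \
    cudaError_t err_ = (call);                                             \
    if (err_ != cudaSuccess) {                                             \
      std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_));  \
      return 1;                                                            \
    }                                                                      \
  } while (0)

// Hypothetical kernel standing in for the kernel that consumes the copied data.
__global__ void scale(const float* in, float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = 2.0f * in[i];
}

int main() {
  const int n = 1024;
  const size_t bytes = n * sizeof(float);

  // Pinned host buffers, so the async copies can actually overlap with host work.
  float* h_in = nullptr;
  float* h_out = nullptr;
  CUDA_CHECK(cudaMallocHost(&h_in, bytes));
  CUDA_CHECK(cudaMallocHost(&h_out, bytes));
  for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

  float* d_in = nullptr;
  float* d_out = nullptr;
  CUDA_CHECK(cudaMalloc(&d_in, bytes));
  CUDA_CHECK(cudaMalloc(&d_out, bytes));

  cudaStream_t stream;
  CUDA_CHECK(cudaStreamCreate(&stream));

  // Copy and kernel are issued on the SAME stream, so they execute in order.
  CUDA_CHECK(cudaMemcpyAsync(d_in, h_in, bytes, cudaMemcpyHostToDevice, stream));
  scale<<<(n + 255) / 256, 256, 0, stream>>>(d_in, d_out, n);
  CUDA_CHECK(cudaGetLastError());
  CUDA_CHECK(cudaMemcpyAsync(h_out, d_out, bytes, cudaMemcpyDeviceToHost, stream));

  // The host waits once, at the point where it needs the result.
  CUDA_CHECK(cudaStreamSynchronize(stream));

  std::printf("h_out[10] == 2 * h_in[10]: %s\n",
              h_out[10] == 2.0f * h_in[10] ? "yes" : "no");

  CUDA_CHECK(cudaStreamDestroy(stream));
  CUDA_CHECK(cudaFree(d_in));
  CUDA_CHECK(cudaFree(d_out));
  CUDA_CHECK(cudaFreeHost(h_in));
  CUDA_CHECK(cudaFreeHost(h_out));
  return 0;
}
```

Compared with replacing the copy by a blocking `cudaMemcpy`, this keeps the host free to continue queuing work and defers the wait to a single `cudaStreamSynchronize`. That is presumably why the async-plus-stream variant avoided the slowdown reported above. If the copy and the kernel were issued on different streams, this ordering would not come for free and some explicit synchronization would be needed.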

Fixed pernicious bug in autobatching

804a48b

neubig added 2 commits December 8, 2017 10:49

Merge branch 'master' of github.com:clab/dynet into fix-parallel-memcpy

84bc764

Reverted to async and added stream

3a35c66

neubig merged commit bc0bf05 into master Dec 8, 2017

neubig mentioned this pull request Dec 8, 2017

CUBLAS related error #1057

Closed
