-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IO can be duplicated in MT #8438
Comments
Duplicate of #8140 |
Hmm that's related though I'm not sure if it's an exact dupe, since #8140 refers to the output getting "mangled" whereas here it's somehow being duplicated [??] The underlying cause might be related...with shorter streams it works almost as expected, with longer there appears to be duplication...? |
The different threads are competing against the internal buffer of STDOUT. puts tries to write the message then the LF which will trigger a buffer flush, but there are no protections so the same buffer may be printed by different threads in parallel which may still be mutated by other threads... I.e. output is mangled. |
Interesting. I would have expected to be able to write to an I/O from various fibers without mutated collisions. Maybe buffered IO should be synchronized internally...hmm... :) |
I think we should at least make the top-level |
No. IO isn't thread safe, and neither are STDIN, STDOUT or STDERR, for the same reasons that Array and Hash aren't. Each and every read/write would have to lock the IO and performance of every IO operation would be heavily impacted. I agree with @asterite that the global methods should be thread-safe. It's a little more wary to have the global STDIN , STDOUT and STDERR be also thread-safe, since they can (and are) used safely by Logger for example —messages are sent through a Channel. |
I agree. I crosses my initial thoughts about that as soon as I finished writing it.
👍 |
Would it be faster if we run a fiber for each of |
Hmm since underlying system calls to file descriptors are thread safe I expected IO to also be thread safe but I guess it's basically a judgment call. It seems a bit odd that it allows you to accidentally clobber internal buffers though, and write junk output, if you happen to write to the same file instance from two fibers. And the overhead of the IO is already so high, for file based IO might be worth synchronizing. It would also be nice if somehow clobbering arrays were detected in general, though that might be hard. Maybe a race detector compiler option or panic on concurrent read+write, like go: https://stackoverflow.com/a/38140573/32453. It might not be too hard to add "simple" race detection to Array for instance, maybe just have an integer that gets incremented then decremented so if a second thread enters, it can detect the integer is already 1 and raise? :) |
As far as I know, locks which are never contended are super cheap. I don't see why you'd summarily dismiss locking Please, benchmark this before throwing around "IO performance would be heavily impacted". |
But I guess for IO we'll have to use reentrant mutexes because |
@asterite you'd have a single mutex for flushing the buffer, and an atomic buffer is easy: store the pointer to the end of the buffer in an atomic, use This doesn't ensure any good ordering, but it does ensure each individual write is atomic, for cheap. |
I was impressed recently with how cheap Mutex.synchronize was for an experiment I ran.. :) |
Hello, is this issue closed now? It's been 2 years :) |
I don't think it has been fixed yet... |
Still a problem using Crystal 1.10.1:
|
I don't think putting a mutex on top-level methods would be a good idea. It would be super confusing if I'm totally up for testing the impact of mutexes. It could well be worth it with low overhead when uncontested. |
What about a compiler flag that decides how IO works (mutexed or raw)? |
Here's a return of experience, trying to print millions of lines to STDERR from many threads. I tried to trace the GC and the fiber scheduler using a mutex for MT and... performance got unbearably awful. I went to buffer each message (on the stack) then print it with a single We can reproduce the logic in simple crystal: # disable write buffering (for thread safety)
STDOUT.sync = true
STDOUT.flush_on_newline = true # probably not needed
1000.times do |n|
spawn do
# don't use #puts that writes twice:
STDOUT.print "hello [#{n}]\n"
end
end
sleep 2 $ crystal run -Dpreview_mt xyz.cr | wc -l
1000 |
The trick has already been used in #14220 for example. Though reading the patch I notice that I write then flush, which means there's still a time window for a race condition, but it works, so... 🤔 Ooooh WARNING: that only works for the STDIO objects! I forgot to say that only writes to pipes are guaranteed to be atomic up to |
Having to use class IO::FileDescriptor
def puts(obj) : self
buffer = String.build do |io|
io << obj
io << "\n" unless io.bytesize > 0 && io.buffer[bytesize - 1] === '\n'
end
write_string(buffer)
self
end
end For small strings, a vectored write ( |
We should avoid duplicating the string (no need to pressure the GC with thrown away allocations), and a workaround would be to call But |
The Win32 equivalent is |
@ysbaddaden |
I misread your example, because of I guess it would be fine it we can have a |
Apologies if I'm missing something.
This code:
If run in multi thread, the output will occasionally have repeated numbers in the lines. Output will seem to have extra lines. In non multi-thread mode, output is the expected 1000 lines always.
Thanks.
The text was updated successfully, but these errors were encountered: