Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IO can be duplicated in MT #8438

Open
rdp opened this issue Nov 5, 2019 · 25 comments
Open

IO can be duplicated in MT #8438

rdp opened this issue Nov 5, 2019 · 25 comments

Comments

@rdp
Copy link
Contributor

rdp commented Nov 5, 2019

Apologies if I'm missing something.

This code:

1000.times do |n|
  spawn do
    puts "hello [#{n}]".inspect
  end
end
sleep 2

If run in multi thread, the output will occasionally have repeated numbers in the lines. Output will seem to have extra lines. In non multi-thread mode, output is the expected 1000 lines always.

$ ./bin/crystal run -Dpreview_mt --release ~/bad4.cr | wc -l
Using compiled compiler at `.build/crystal'
1517 # some are repeated, some are garbled, garbled is expected/OK
$ ./bin/crystal run -Dpreview_mt --release ~/bad4.cr | wc -l
Using compiled compiler at `.build/crystal'
1510
$ ./bin/crystal run  --release ~/bad4.cr | wc -l
Using compiled compiler at `.build/crystal'
1000
$ ./bin/crystal -v
Using compiled compiler at `.build/crystal'
Crystal 0.32.0-dev [d4c8dfdcb] (2019-11-05)

Thanks.

@ysbaddaden
Copy link
Contributor

Duplicate of #8140

@ysbaddaden ysbaddaden marked this as a duplicate of #8140 Nov 5, 2019
@rdp
Copy link
Contributor Author

rdp commented Nov 5, 2019

Hmm that's related though I'm not sure if it's an exact dupe, since #8140 refers to the output getting "mangled" whereas here it's somehow being duplicated [??] The underlying cause might be related...with shorter streams it works almost as expected, with longer there appears to be duplication...?

@ysbaddaden
Copy link
Contributor

The different threads are competing against the internal buffer of STDOUT. puts tries to write the message then the LF which will trigger a buffer flush, but there are no protections so the same buffer may be printed by different threads in parallel which may still be mutated by other threads...

I.e. output is mangled.

@rdp
Copy link
Contributor Author

rdp commented Nov 5, 2019

Interesting. I would have expected to be able to write to an I/O from various fibers without mutated collisions. Maybe buffered IO should be synchronized internally...hmm... :)

@asterite
Copy link
Member

asterite commented Nov 5, 2019

I think we should at least make the top-level puts, print and printf use a global mutex. With these functions it's not clear something is being shared so it's probably more intuitive, even though slightly slower, to make it work as expected. With an explicit IO or doing STDOUT.puts, io.puts, etc. it's a bit more clear which object is being shared, though I wouldn't mind making STDOUT, STDERR and STDIN all also have a mutex so that they work as expected in MT. On second thought, maybe just doing it in top-level puts, print, etc. is fine because for IO in general there are many low-level methods like read and write and it's a bit hard to protect them all using a mutex.

@ysbaddaden
Copy link
Contributor

No. IO isn't thread safe, and neither are STDIN, STDOUT or STDERR, for the same reasons that Array and Hash aren't. Each and every read/write would have to lock the IO and performance of every IO operation would be heavily impacted.

I agree with @asterite that the global methods should be thread-safe. It's a little more wary to have the global STDIN , STDOUT and STDERR be also thread-safe, since they can (and are) used safely by Logger for example —messages are sent through a Channel.

@asterite
Copy link
Member

asterite commented Nov 5, 2019

IO isn't thread safe, and neither are STDIN, STDOUT or STDERR

I agree. I crosses my initial thoughts about that as soon as I finished writing it.

the global methods should be thread-safe

👍

@jhass
Copy link
Member

jhass commented Nov 5, 2019

Would it be faster if we run a fiber for each of STDIN, STDOUT and STDERR and have the global methods use a channel to send their data to them?

@rdp
Copy link
Contributor Author

rdp commented Nov 6, 2019

Hmm since underlying system calls to file descriptors are thread safe I expected IO to also be thread safe but I guess it's basically a judgment call. It seems a bit odd that it allows you to accidentally clobber internal buffers though, and write junk output, if you happen to write to the same file instance from two fibers. And the overhead of the IO is already so high, for file based IO might be worth synchronizing. It would also be nice if somehow clobbering arrays were detected in general, though that might be hard. Maybe a race detector compiler option or panic on concurrent read+write, like go: https://stackoverflow.com/a/38140573/32453. It might not be too hard to add "simple" race detection to Array for instance, maybe just have an integer that gets incremented then decremented so if a second thread enters, it can detect the integer is already 1 and raise? :)

@RX14
Copy link
Contributor

RX14 commented Nov 13, 2019

As far as I know, locks which are never contended are super cheap. I don't see why you'd summarily dismiss locking unbuffered_write/unbuffered_read on IOs. It will not decrease performance in the case where the IO is uncontended (two memory writes is noise next to a context switch), and make writing to an IO somewhat atomic. I suspect that appending to the IO::Buffered buffer can be made lockless until the buffer fills too.

Please, benchmark this before throwing around "IO performance would be heavily impacted".

@asterite
Copy link
Member

But I guess for IO we'll have to use reentrant mutexes because unbuffered_write will be called from, say, puts, and you'll want to synchronize it at that level too. And reentrant mutexes are slower. But yeah, we could benchmark it.

@RX14
Copy link
Contributor

RX14 commented Nov 13, 2019

@asterite you'd have a single mutex for flushing the buffer, and an atomic buffer is easy: store the pointer to the end of the buffer in an atomic, use #add() to add the length of data you're about to write (doing bounds checks of course) and then write to the old pointer returned by #add. When a fiber needs to flush the buffer (old_ptr + len > buf_size), it waits on the mutex.

This doesn't ensure any good ordering, but it does ensure each individual write is atomic, for cheap.

@rdp
Copy link
Contributor Author

rdp commented Nov 19, 2019

I was impressed recently with how cheap Mutex.synchronize was for an experiment I ran.. :)

@Dan-Do
Copy link

Dan-Do commented Nov 19, 2021

Hello, is this issue closed now? It's been 2 years :)

@rdp
Copy link
Contributor Author

rdp commented Nov 25, 2021

I don't think it has been fixed yet...

@soupglasses
Copy link

soupglasses commented Feb 12, 2024

Still a problem using Crystal 1.10.1:

% crystal run -Dpreview_mt --release ./bad4.cr | wc -l 
1150
% crystal run -Dpreview_mt --release ./bad4.cr | wc -l 
1255
% crystal run --release ./bad4.cr | wc -l
1000
% crystal -v
Crystal 1.10.1 (1980-01-01)

LLVM: 15.0.7
Default target: x86_64-unknown-linux-gnu

@straight-shoota
Copy link
Member

I don't think putting a mutex on top-level methods would be a good idea. It would be super confusing if ::puts and STDOUT.puts behave differently. And when only one of them is safe against parallel execution, there could still be duplicate content.
So we should either put a mutex on the entire IO or leave it entirely to external synchronization (e.g. via Log).

I'm totally up for testing the impact of mutexes. It could well be worth it with low overhead when uncontested.

@vvilliamperez
Copy link

What about a compiler flag that decides how IO works (mutexed or raw)?

@ysbaddaden
Copy link
Contributor

Here's a return of experience, trying to print millions of lines to STDERR from many threads.

I tried to trace the GC and the fiber scheduler using a mutex for MT and... performance got unbearably awful. I went to buffer each message (on the stack) then print it with a single LibC.write that is guaranteed to be atomic up to PIPE_BUF (POSIX says its 512 bytes but on Linux it's 4096), and performance was only somewhat impacted by the trace (nothing unbearable).

We can reproduce the logic in simple crystal:

# disable write buffering (for thread safety)
STDOUT.sync = true
STDOUT.flush_on_newline = true # probably not needed

1000.times do |n|
  spawn do
    # don't use #puts that writes twice:
    STDOUT.print "hello [#{n}]\n"
  end
end
sleep 2
$ crystal run -Dpreview_mt xyz.cr | wc -l
1000 

@ysbaddaden
Copy link
Contributor

The trick has already been used in #14220 for example. Though reading the patch I notice that I write then flush, which means there's still a time window for a race condition, but it works, so... 🤔

Ooooh IO::FileDescriptor.from_stdio disables buffering for STDIO objects! That means to avoid interleaved messages (up to PIPE_BUF size) we can just print with an explicit newline character instead of calling puts.

WARNING: that only works for the STDIO objects! I forgot to say that only writes to pipes are guaranteed to be atomic up to PIPE_BUF by POSIX. Writing to a Socket doesn't have that guarantee, even with the buffer disabled, and writing to an IO::Memory object from multiple threads is unsafe (from corrupted output to segfaults).

@straight-shoota
Copy link
Member

Having to use print with explicit new-lines is rather inconvenient compared to puts. Perhaps we can use some tricks to make this work?
We could consider creating a string buffer at the IO level in order to add a trailing newline.

class IO::FileDescriptor
  def puts(obj) : self
    buffer = String.build do |io|
      io << obj
      io << "\n" unless io.bytesize > 0 && io.buffer[bytesize - 1] === '\n'
    end
    write_string(buffer)
    self
  end
end

For small strings, a vectored write (writev) could be useful.

@ysbaddaden
Copy link
Contributor

We should avoid duplicating the string (no need to pressure the GC with thrown away allocations), and a workaround would be to call print with an explicit newline character.

But writev(2) is appealing since it will write the two buffers atomically... with an exception for pipes —when the pipe buffer is full?

@HertzDevil
Copy link
Contributor

The Win32 equivalent is WriteFileGather, which requires page-aligned buffers...?

@straight-shoota
Copy link
Member

@ysbaddaden print with explicit newline character is only a workaround for puts with an argument that is a string literal or string interpolation. Printing arbitrary objects typically involves writing individual small pieces. That's how most #to_s(IO) methods are implemented. This cannot be made thread-safe without explicit buffering.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Apr 23, 2024

I misread your example, because of IO#<< that might call #print(String) —or is it #write(String)? I never recall which is what— but it might also call obj.to_s(io) I think?. Frankly, the IO interface can be very confusing.

I guess it would be fine it we can have a #puts(String|Slice) overload that avoids a "#{str}\n".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Todo
Development

No branches or pull requests

10 participants