Occasional encoding errors #483

Open
nigelmegitt opened this issue May 8, 2018 · 10 comments

@nigelmegitt
Collaborator

Using the EBU-TT-D Encoder I'm occasionally getting Unicode errors like:

Unhandled Error
Traceback (most recent call last):
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/venv/lib/python2.7/site-packages/twisted/python/log.py", line 103, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/venv/lib/python2.7/site-packages/twisted/python/log.py", line 86, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/venv/lib/python2.7/site-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/venv/lib/python2.7/site-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/venv/lib/python2.7/site-packages/twisted/internet/selectreactor.py", line 149, in _doReadOrWrite
    why = getattr(selectable, method)()
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/venv/lib/python2.7/site-packages/twisted/internet/tcp.py", line 208, in doRead
    return self._dataReceived(data)
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/venv/lib/python2.7/site-packages/twisted/internet/tcp.py", line 214, in _dataReceived
    rval = self.protocol.dataReceived(data)
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/venv/lib/python2.7/site-packages/autobahn/twisted/websocket.py", line 131, in dataReceived
    self._dataReceived(data)
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/venv/lib/python2.7/site-packages/autobahn/websocket/protocol.py", line 1175, in _dataReceived
    self.consumeData()
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/venv/lib/python2.7/site-packages/autobahn/websocket/protocol.py", line 1187, in consumeData
    while self.processData() and self.state != WebSocketProtocol.STATE_CLOSED:
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/venv/lib/python2.7/site-packages/autobahn/websocket/protocol.py", line 1553, in processData
    fr = self.onFrameEnd()
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/venv/lib/python2.7/site-packages/autobahn/websocket/protocol.py", line 1674, in onFrameEnd
    self._onMessageEnd()
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/venv/lib/python2.7/site-packages/autobahn/twisted/websocket.py", line 159, in _onMessageEnd
    self.onMessageEnd()
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/venv/lib/python2.7/site-packages/autobahn/websocket/protocol.py", line 627, in onMessageEnd
    self._onMessage(payload, self.message_is_binary)
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/venv/lib/python2.7/site-packages/autobahn/twisted/websocket.py", line 162, in _onMessage
    self.onMessage(payload, isBinary)
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/ebu_tt_live/twisted/websocket.py", line 362, in onMessage
    self._write_to_consumer(payload, sequence_identifier=self._sequence_identifier)
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/ebu_tt_live/twisted/websocket.py", line 111, in _write_to_consumer
    self.consumer.write(data, **kwargs)
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/ebu_tt_live/twisted/websocket.py", line 208, in write
    self._custom_consumer.on_new_data(data, **kwargs)
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/ebu_tt_live/carriage/websocket.py", line 32, in on_new_data
    self.consumer_node.process_document(data, **kwargs)
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/ebu_tt_live/adapters/node_carriage.py", line 174, in process_document
    self.consumer_node.process_document(conv_doc, **new_kwargs)
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/ebu_tt_live/node/encoder.py", line 48, in process_document
    self.producer_carriage.emit_data(data=converted_doc, sequence_identifier='default', time_base='media', **kwargs)
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/ebu_tt_live/adapters/node_carriage.py", line 116, in emit_data
    self.producer_carriage.emit_data(conv_data, **new_kwargs)
  File "/Users/megitn02/Code/ebu/ebu-tt-live-toolkit/ebu_tt_live/carriage/filesystem.py", line 158, in emit_data
    destfile.write(data)
exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 1523: ordinal not in range(128)

This is annoying. I don't know what's causing it, but there's probably an easy fix (though possibly a dangerous one) - https://docs.python.org/2.7/howto/unicode.html#the-unicode-type suggests using codecs.open and setting errors='ignore' will at least make the error go away...
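
To illustrate the trade-off described above, here is a minimal sketch (not taken from the toolkit) of what happens to u'\xa3', the pound sign named in the traceback, under three encoding choices. The sample string and the helper name `try_ascii` are made up for illustration.

```python
# Hypothetical sample text containing the character from the traceback.
text = u"Price: \xa3 5"


def try_ascii(s):
    """Return s encoded as ASCII, or None if it contains non-ASCII characters."""
    try:
        return s.encode("ascii")
    except UnicodeEncodeError:
        # This is the same error class that the traceback reports.
        return None


strict_result = try_ascii(text)                 # strict ASCII encoding fails
ignored_result = text.encode("ascii", "ignore")  # the pound sign is silently dropped
utf8_result = text.encode("utf-8")               # the pound sign becomes two bytes

print(repr(strict_result))
print(repr(ignored_result))
print(repr(utf8_result))
```

The 'ignore' handler makes the error go away only by discarding the pound sign, so it masks the problem at the cost of losing content.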

@nigelmegitt nigelmegitt self-assigned this May 8, 2018
nigelmegitt added a commit that referenced this issue May 8, 2018
This strips out unencodable characters and if not fixes, at least masks #483 by using `codecs.open` and telling it to ignore errors.
@nigelmegitt
Collaborator Author

Using codecs.open with errors='ignore' doesn't fix the issue - it still sometimes arises. Need to do more digging into the content that triggers it and trace back to the source of the error. It could be something to do with a specific feed and the way that is made.

@nigelmegitt
Collaborator Author

Needs to be re-reviewed in the context of Python3, where the issue may no longer arise.

@spoeschel
Collaborator

The exception seems to be caused by characters that occupy more than a single byte in UTF-8, i.e. characters with a Unicode code point > 127 (outside the ASCII range). The German umlauts äöüÄÖÜ and the "sharp s" ß trigger it as well, so I'm affected, too.

With codecs.open and encoding='utf-8' (taken from Python 2's Unicode HOWTO), tested with the filesystem output, the exception doesn't occur.
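
A minimal sketch of that approach, assuming the data passed to the writer is already a text (unicode) string. The helper names `write_utf8` and `read_bytes`, the file name and the sample text are hypothetical and not taken from the toolkit's filesystem carriage.

```python
import codecs
import os
import tempfile


def write_utf8(path, text):
    """Write a text string to path, encoding it explicitly as UTF-8."""
    with codecs.open(path, "w", encoding="utf-8") as destfile:
        destfile.write(text)


def read_bytes(path):
    """Read the raw bytes back so the on-disk encoding can be inspected."""
    with open(path, "rb") as source:
        return source.read()


# The German letters mentioned above, plus the pound sign from the traceback.
sample = u"\xe4\xf6\xfc\xc4\xd6\xdc\xdf \xa3"

out_path = os.path.join(tempfile.mkdtemp(), "sample.xml")
write_utf8(out_path, sample)
raw = read_bytes(out_path)

# Each non-ASCII character here takes two bytes in UTF-8.
print(len(sample), len(raw))
```

Naming the encoding explicitly means the write no longer depends on the 'ascii' default that Python 2 falls back to when a unicode string is written to a plain file object.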

@nigelmegitt
Collaborator Author

Sounds promising @spoeschel, does this mean you can generate a test case? That would be great because even if we fix it for Python2, we will also need to check that it still works in Python3 when we migrate.

@nigelmegitt
Collaborator Author

@spoeschel I made a comment in #484 a long long time ago suggesting this was worth re-testing in Python3. I don't know if Python3 would work for you, but I've pushed a working Python3 build to the release/3.0 branch; if you have a repeatable test case would you be interested in trying that branch and seeing if this bug is indeed resolved by moving to Python3?

@spoeschel
Collaborator

I haven't yet familiarized myself with the testing subsystem, but I will create a test case for this.

When testing with the Python 3 branch, this issue indeed no longer occurs when using any of the German letters mentioned above.

However, I get an exception when using the WebSocket output with the Python 3 branch (the WS input works), regardless of whether any of the problematic letters are used. The filesystem output works, though. I will have a look into that and probably open a new issue.

@nigelmegitt
Collaborator Author

Thank you @spoeschel !

@spoeschel
Collaborator

> With codecs.open and encoding='utf-8' (taken from Python 2's Unicode HOWTO), tested with the filesystem output, the exception doesn't occur.

It turns out that this quick fix for the Python 2 branch only worked when I used the Resequencer. With the buffer-delay, the exception still occurs even though the UTF-8 encoding is set for writing the output file. So it seems that the Resequencer's processing somehow sanitizes the documents, whereas received documents that are forwarded to the output without such further processing still trigger the exception. It is probably easiest to go the Python 3 way here.
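
One way to make a writer indifferent to which node produced the data is to normalise it to text before writing. The sketch below rests on an assumption that this thread does not confirm: that the Resequencer path and the buffer-delay path hand the writer different string types (text versus encoded bytes). The names `to_text` and `write_document` are hypothetical and do not exist in the toolkit.

```python
import io


def to_text(data, encoding="utf-8"):
    """Return data as a text string, decoding it first if it arrived as bytes."""
    if isinstance(data, bytes):
        # Assumption: byte payloads are UTF-8 encoded.
        return data.decode(encoding)
    return data


def write_document(stream, data):
    """Write data to a text stream, whichever string type was received."""
    stream.write(to_text(data))


# An in-memory text stream stands in for the output file.
buf = io.StringIO()
write_document(buf, u"Gr\xfc\xdfe ".encode("utf-8"))  # bytes input
write_document(buf, u"\xa3")                          # text input
result = buf.getvalue()

print(repr(result))
```

If the payloads turn out to use a different encoding, the `encoding` argument would need to match it; the sketch only shows the normalisation step, not a diagnosis of the bug.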

@nigelmegitt
Collaborator Author

I think this is a strong argument for tying up the release/2.1.2 work, releasing it as our final Python2 release and moving all future work into release/3.0.

@spoeschel
Collaborator

I agree; this makes more sense than fixing a complex issue for a Python version that will be deprecated very soon anyway.

@nigelmegitt nigelmegitt added this to the Release 4 milestone Oct 18, 2019