UTF-8 encoded mails are not sent #30
Comments
Viktor Dick wrote at 2020-7-14 00:01 -0700:
According to the documentation in `smtplib.sendmail`,
msg may be a string containing characters in the ASCII range, or a byte
string. A string is encoded to bytes using the ascii codec, and lone
\r and \n characters are converted to \r\n characters.
I read this as: the message must be ASCII only.
This is not surprising as it is a major requirement for low level mail
messages. If the original (high level) message contains non-ASCII characters, it must be
encoded to ASCII (typically via "quoted-printable" or "base64").
Python's `email` package contains logic to do this.
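As a minimal sketch of the `email` package doing that encoding (the sample body and addresses are made up for illustration), `MIMEText` with a UTF-8 charset produces a message whose serialized form is pure ASCII:

```python
from email.mime.text import MIMEText

# Hypothetical sample message; the body contains a non-ASCII character.
msg = MIMEText("<p>Testü</p>", "html", "utf-8")
msg["To"] = "test@test.de"
msg["From"] = "no-reply@test.de"
msg["Subject"] = "Test"

# With a UTF-8 charset the body is transfer-encoded (base64 by default),
# so the serialized message contains only ASCII characters.
wire_text = msg.as_string()
wire_bytes = wire_text.encode("ascii")  # would raise if non-ASCII remained

# The original text can be recovered from the encoded payload.
decoded_body = msg.get_payload(decode=True).decode("utf-8")
```

A message built this way can be handed to `smtplib` as a `str` without hitting the ASCII restriction, at the cost of a body that is no longer human-readable on the wire.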
|
Thanks for the reply. I actually read this as: either ensure that you handled the encoding yourself (passing bytes) or only use ASCII. I am pretty sure that current mail clients assume that every MTA on the way supports https://tools.ietf.org/html/rfc6152. At least if I send a message with Thunderbird with default settings, it uses an 8bit encoding (in fact, UTF-8). More to the point, passing UTF-8 encoded messages to `smtplib.send` yields the expected result (assuming the correct headers are set and only the body uses UTF-8, not the headers).
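The two readings of the documentation can be illustrated with a small sketch. `prepare_for_smtp` is a hypothetical helper, not part of `smtplib`; it only mimics the documented handling (a `str` gets its line endings normalized and is encoded with the ascii codec, a byte string is passed through unchanged):

```python
import re


def prepare_for_smtp(msg):
    """Mimic the documented str/bytes handling of smtplib's sendmail.

    Hypothetical helper for illustration only.
    """
    if isinstance(msg, str):
        # Lone \r or \n become \r\n, then the text must be pure ASCII.
        msg = re.sub(r"(?:\r\n|\n|\r(?!\n))", "\r\n", msg).encode("ascii")
    return msg


# Bytes are passed through untouched, so an 8bit UTF-8 body survives.
utf8_bytes = prepare_for_smtp("Testü".encode("utf-8"))

# An ASCII-only string is accepted and gets CRLF line endings.
ascii_bytes = prepare_for_smtp("line one\nline two")

# A non-ASCII string fails at the encoding step.
try:
    prepare_for_smtp("Testü")
    str_failed = False
except UnicodeEncodeError:
    str_failed = True
```

So a caller that wants to send an 8bit body has to keep the message as bytes all the way down to `smtplib`.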
Viktor Dick wrote at 2020-7-14 01:49 -0700:
...
More to the point, passing UTF-8 encoded messages to `smtplib.send` yields the expected result (assuming the correct headers are set and only the body uses UTF-8, not the headers).
Rereading your original post I have the impression that
the cause of the problem is `Products.MailHost`: because messages are, at the lower level,
byte sequences,
`MailHost` should not convert a byte sequence into `str` - or at least
not by guessing an encoding.
I would not put encoding guessing into `zope.sendmail` or sniff
encoding information from the message headers. Encoding handling
is implemented elsewhere (e.g. the `email` package); there is no need
to implement it another time.
|
I agree. I posted the issue here because I could not exactly pinpoint the location in Is it possible to move the issue there? Otherwise I will close this and re-open another issue there (and take another look at what needs to be fixed there). |
@jugmac00 can you give me a hint how exactly you are using
What exactly am I missing? |
Viktor Dick wrote at 2020-7-15 17:16 +0000:
@jugmac00 can you give me a hint how exactly you are using `Products.MailHost` in Zope4, in particular with non-ASCII characters? My minimal working example for creating a failure is something like a PythonScript containing
mail = '''\
To: ***@***.***
From: <***@***.***>
Subject: Test
Mime-Version: 1.0
Content-Type: text/html; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
Testü
</body>
</html>'''
context.mailhost.send(messageText=mail)
What exactly am I missing?
I like to look at things in an abstract way:
Logically, a message is a sequence of bytes and especially
its body is a sequence of bytes. The `Content-Type: text; charset=UTF-8`
tells us that the body uses the `UTF-8` encoding.
Your message is not a sequence of bytes - but a string
representing a message. Especially, the body is not a sequence
of bytes. The conversion into a corresponding byte sequence is not trivial.
With `Content-Type: text; charset=UTF-8`, one could argue that
the body should be encoded with "UTF-8"; but, what about a binary
"content-type": how should a string body in this case be converted
into a sequence of bytes?
I think Python's `email` package has a bug here.
```
>>> from email import message_from_string, message_from_bytes
>>> mail = '''\
... To: test@test.de
... From: <no-reply@test.de>
... Subject: Test
... Mime-Version: 1.0
... Content-Type: text/html; charset=UTF-8; format=flowed
... Content-Transfer-Encoding: 8bit
...
... <!DOCTYPE HTML>
... <html>
... <head>
...
... <meta http-equiv="content-type" content="text/html; charset=UTF-8">
... </head>
... <body>
... Testü
... </body>
... </html>'''
>>> mb = message_from_bytes(mail.encode("utf-8"))
>>> ms = message_from_string(mail)
```
Here `mb` and `ms` should represent the same message.
But:
```
>>> mb.as_bytes()
b'To: test@test.de\nFrom: <no-reply@test.de>\nSubject: Test\nMime-Version: 1.0\nContent-Type: text/html; charset=UTF-8; format=flowed\nContent-Transfer-Encoding: 8bit\n\n<!DOCTYPE HTML>\n<html>\n<head>\n\n<meta http-equiv="content-type" content="text/html; charset=UTF-8">\n</head>\n<body>\nTest\xc3\xbc\n</body>\n</html>'
>>> ms.as_bytes()
Traceback (most recent call last):
...
File "/usr/local/lib/python3.9/email/generator.py", line 155, in _write_lines
self.write(line)
File "/usr/local/lib/python3.9/email/generator.py", line 406, in write
self._fp.write(s.encode('ascii', 'surrogateescape'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xfc' in position 4: ordinal not in range(128)
```
|
Dieter Maurer wrote at 2020-7-15 20:26 +0200:
Viktor Dick wrote at 2020-7-15 17:16 +0000:
...
I think Python's `email` package has a bug here.
I filed bug report "https://bugs.python.org/issue41307".
Unfortunately, this bug prevents the use of `as_bytes` (as I
proposed in an earlier comment) for the messages you are interested
in (unless we use a `Message` class where this bug is fixed).
|
Dieter Maurer wrote at 2020-7-15 21:25 +0200:
Dieter Maurer wrote at 2020-7-15 20:26 +0200:
>Viktor Dick wrote at 2020-7-15 17:16 +0000:
> ...
>I think Python's `email` package has a bug here.
I filed bug report "https://bugs.python.org/issue41307".
Unfortunately, this bug prevents the use of `as_bytes` (as I
proposed in an earlier comment) for the messages you are interested
in (unless we use a `Message` class where this bug is fixed).
"https://bugs.python.org/issue41307#msg373724" contains
a workaround for your kind of messages.
It applies "charset" and targets "8bit" transfer encodings.
For other transfer encodings, this likely needs to be
applied in addition to "charset".
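A rough sketch of what such a workaround could look like follows. This is an illustration under assumptions and not necessarily the code posted in the bug report: `payload_to_surrogateescape` is a hypothetical helper, and it only handles non-multipart messages that declare a charset and use an "8bit" transfer encoding. It re-expresses the `str` payload in the form `message_from_bytes` would have produced, so that `as_bytes` can serialize it:

```python
from email import message_from_string


def payload_to_surrogateescape(msg):
    """Store a str payload the way message_from_bytes would store it.

    Hypothetical helper for illustration; only non-multipart messages
    with a declared charset are changed.
    """
    charset = msg.get_content_charset()
    payload = msg.get_payload()
    if msg.is_multipart() or charset is None or not isinstance(payload, str):
        return msg
    # Encode with the declared charset, then map the bytes back to str
    # using the convention the email package uses for raw bytes.
    raw = payload.encode(charset)
    msg.set_payload(raw.decode("ascii", "surrogateescape"))
    return msg


mail = (
    "To: test@test.de\n"
    "Subject: Test\n"
    "Mime-Version: 1.0\n"
    "Content-Type: text/html; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "<body>Testü</body>"
)
ms = payload_to_surrogateescape(message_from_string(mail))
wire = ms.as_bytes()
```

For transfer encodings other than "8bit" the payload would additionally have to be transfer-encoded, which this sketch does not attempt.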
As a side note:
the Python 3 `email` package uses "surrogateescape" to convert
between a sequence of bytes and "str". To be specific:
byte sequence "bs" is represented as "str" by
`bs.decode("ascii", "surrogateescape")`;
applying `.encode("ascii", "surrogateescape")` to that "str" gives back the original
`bs`.
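The round trip can be checked directly; this small sketch uses a made-up sample value:

```python
# UTF-8 bytes containing a non-ASCII character.
bs = "Testü".encode("utf-8")

# Non-ASCII bytes are mapped to lone surrogate code points ...
s = bs.decode("ascii", "surrogateescape")

# ... and encoding with the same error handler restores the bytes.
restored = s.encode("ascii", "surrogateescape")
```

Here `s` is not the readable text "Testü" but a carrier for the raw bytes, which is why a payload obtained from `message_from_string` behaves differently from one obtained from `message_from_bytes`.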
|
@viktordick Could you check whether #32 solves your issue? |
@viktordick When I inherited my Zope app in 2015 I just continued to work with the idioms my predecessor left in the app. So, it is something like this...
|
…esponding messages could not be sent anyway due to #30
Thanks very much for your efforts so far! |
@jugmac00 I tested this and it results in a |
Viktor Dick wrote at 2020-7-18 02:11 -0700:
...
It is 2020 and I do not think that there is an MTA out there that is unable to use a Content-Transfer-Encoding of 8bit.
Python's `email` package is not yet fully "Content-Transfer-Encoding: 8bit"
ready.
|
This is based on the discussion in `https://github.com/zopefoundation/Products.MailHost/issues/30` and `https://github.com/zopefoundation/Products.MailHost/pull/32`. In order to allow messages with 8bit-encoding to be sent using smtplib, they have to be prepared as bytes. Since any preparation and header parsing and manipulation is done in `Products.MailHost`, `zope.sendmail` only needs to store messages internally as bytes and allow them to be passed as bytes.
@d-maurer and @viktordick - could you add a test somewhere, which makes sure the way I create emails ( see #30 (comment) ) will still work with your pull requests? Don't get me wrong - I trust you both you know what you do... it is just... if something gets fixed which worked for 15 years I am always a bit .. anxious :-) |
Jürgen Gmach wrote at 2020-7-20 12:42 -0700:
@d-maurer and @viktordick - could you add a test somewhere, which makes sure the way I create emails ( see #30 (comment) ) will still work with your pull requests?
Don't get me wrong - I trust you both you know what you do... it is just... if something gets fixed which worked for 15 years I am always a bit .. anxious :-)
As I wrote to Viktor, I do not mind if others contribute to "my" PR.
I invite you to add the test.
Note that it is difficult to have a black box test; it would involve
really sending the message - which all available tests avoid.
That's why Viktor's problem was not detected by a test.
As a consequence, your test may pass and nevertheless in real life
something goes wrong.
I therefore propose an alternative: include the versions
from the relevant branches into your environment and
make a functional test there (involving real sending).
Report back if you should observe any problems.
|
Is this test ok? Then I push it to your pr / branch (except I will put the import statements at the top).
|
Jürgen Gmach wrote at 2020-7-21 01:39 -0700:
...
@d-maurer
Is this test ok? Then I push it to your pr / branch (except I will put the import statements at the top).
I likely would try to avoid the "noqa: E501" - but that is not
a serious objection (I am often very unhappy with the `flake8`
enforced policies -- especially those regarding continuation lines
which are either too much indented or where the indentation
is the same as that for a nested level).
Go ahead with your push.
|
@d-maurer Thanks for your reply. I pushed the code to your branch, after I updated my code slightly to also pass the Python 2 tests. The two While possible, breaking those lines into multiple lines seemed very unnatural. |
This is based on the discussion in `https://github.com/zopefoundation/Products.MailHost/issues/30` and `https://github.com/zopefoundation/Products.MailHost/pull/32`. In order to allow messages with 8bit-encoding to be sent using smtplib, they have to be prepared as bytes. Since any preparation and header parsing and manipulation is done in `Products.MailHost`, `zope.sendmail` only needs to store messages internally as bytes and allow them to be passed as bytes. Co-authored-by: dieter <dieter@handshake.de>
I saw test failures earlier when I tried it in combination with other new packages. Now let's try it on its own. I expect failures due to the changes here: zopefoundation/Products.MailHost#30
Possible regression on Python 3, at least in Plone tests: #33 |
Since switching to Zope 4, we are unable to send mails that have a UTF-8 encoded body. Even if we transform the message to bytes ourselves, `Products.MailHost` converts it back to a string before sending it to `zope.sendmail`, and `smtplib` fails to encode the passed message using ASCII during the second phase of the two-phase commit. Our quick fix was to change `zope.sendmail` to encode the message using `utf-8` before passing it to `smtplib.sendmail`.
A really clean way would probably be to inspect the headers (which should be ASCII) and check for a `Content-Type` and `Content-Transfer-Encoding`. Maybe it is even possible to have a multipart message with different encodings (not sure). But since UTF-8 is such a popular choice, we used that for now.
I am not sure if the fix should happen in `zope.sendmail` or somewhere further up like `Products.MailHost`. If it should be `zope.sendmail`, I could try to supply a PR. Otherwise I would need to be pointed in the right direction.