FATAL: 'utf-8' codec can't decode byte 0xe7 in position 128: invalid continuation byte #149

PatrickSiqueira · 2020-10-15T13:44:41Z

Hello people!
I am currently using pg_activity-1.6.1 with PostgreSQL 12.3. After it has been running for a while, I get the following error:

Traceback (most recent call last):
  File "./pg_activity", line 332, in main
    procs, disp_procs)
  File "/var/lib/pgsql/install/pg_activity-1.6.1/pgactivity/UI.py", line 1191, in poll
    disp_proc)
  File "/var/lib/pgsql/install/pg_activity-1.6.1/pgactivity/UI.py", line 1337, in __poll_activities
    queries = self.data.pg_get_activities(self.duration_mode)
  File "/var/lib/pgsql/install/pg_activity-1.6.1/pgactivity/Data.py", line 510, in pg_get_activities
    ret = cur.fetchall()
  File "/usr/lib64/python3.6/site-packages/psycopg2/extras.py", line 100, in fetchall
    res = super(DictCursorBase, self).fetchall()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 128: invalid continuation byte
FATAL: 'utf-8' codec can't decode byte 0xe7 in position 128: invalid continuation byte

I did not get these errors with version 9 of PostgreSQL; they only started after the migration to version 12. Can you help me?

blogh · 2020-10-16T09:25:06Z

Hi,

Are you using third-party extensions on postgres?
We had a similar issue related to powa before: #130

Benoit

PatrickSiqueira · 2020-10-16T13:19:02Z

Hello,

Only pg_stat_statements, but I've been using it since version 9 of postgres. Could this problem be related to some incompatibility of pg_activity with postgres 12.3 or CentOS 8?

blogh · 2020-10-28T08:34:46Z

Hi, sorry for the long delay,

> could this problem be related to some incompatibility of pg_activity with postgres 12.3 or CentOS 8?

I don't think so.

I'll try to reproduce this on my side.

Benoit.

dlax · 2020-10-29T08:31:11Z

I could reproduce on a debian system with postgres 11 (from debian) by executing the following SQL while pg_activity is running:

$ psql postgres
denis@postgres=# CREATE DATABASE latin1 ENCODING 'latin1' TEMPLATE template0 LC_COLLATE 'fr_FR.latin1' LC_CTYPE 'fr_FR.latin1';
CREATE DATABASE
denis@postgres=# \c latin1 
You are now connected to database "latin1" as user "denis".
denis@latin1=# CREATE TABLE test (data text);
CREATE TABLE
denis@latin1=# BEGIN;
BEGIN
denis@latin1=# INSERT INTO test VALUES ('é');
INSERT 0 1
denis@latin1=#

With the last transaction kept uncommitted, the INSERT ... makes pg_activity crash as described above. I think this is because the é is decoded as utf-8 whereas the database uses another encoding.
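
The decoding failure itself can be shown without a database. This is a minimal sketch, assuming the query text reaches the client as raw LATIN1 bytes; the variable names are illustrative and not taken from pg_activity:

```python
# 'é' is the single byte 0xE9 in LATIN1. In UTF-8, 0xE9 announces a
# three-byte sequence, so the next byte must be a continuation byte.
raw = "INSERT INTO test VALUES ('é');".encode("latin-1")

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; 'exc' is unbound after the except block
    print(exc)
    # 'utf-8' codec can't decode byte 0xe9 in position 26: invalid continuation byte
```

The byte after 0xE9 is the closing quote (0x27), which is not a continuation byte, hence the message.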

dlax · 2020-10-29T14:49:00Z

Actually, the problem shows up by using psycopg2 directly:

>>> import psycopg2
>>> conn = psycopg2.connect(database="postgres")
>>> conn.encoding
'UTF8'
>>> cur = conn.cursor()
>>> cur.execute("select query from pg_stat_activity")
>>> cur.fetchall()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 26: invalid continuation byte

So perhaps it's not really related to pg_activity. Not sure what we can do about this.

@PatrickSiqueira, at the moment, you might try to use pg_activity with a database that has the correct encoding: pg_activity -d <dbname>.

dlax · 2020-10-29T14:51:00Z

And incidentally, psql seems smarter and just escapes characters that cannot be decoded:

$ psql postgres -t -c "select query from pg_stat_activity;"
 INSERT INTO test VALUES ('');
...
$ psql latin1 -t -c "select query from pg_stat_activity;"
 INSERT INTO test VALUES ('é');
...

dvarrazzo · 2020-10-30T17:06:48Z

psql is not smarter: it's dumber, so it works :) it just emits a stream of bytes to the console, and the console has a replace policy.

You can do the same by asking psycopg2 to return bytes instead of unicode strings: see https://www.psycopg.org/docs/faq.html#faq-bytes.
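
The console's replace policy can be imitated in Python once the value is available as bytes. A minimal sketch, assuming the query has already been fetched as bytes (for example by registering `psycopg2.extensions.BYTES` on the connection, as the linked FAQ describes); `decode_lossy` is a hypothetical helper name:

```python
def decode_lossy(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes, substituting U+FFFD for anything that is not valid."""
    return raw.decode(encoding, errors="replace")


# Stand-in for a LATIN1-encoded query read from pg_stat_activity.
raw = b"INSERT INTO test VALUES ('\xe9');"
shown = decode_lossy(raw)
print(shown)
```

Nothing is raised here; the undecodable byte is shown as the replacement character instead.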

dlax · 2020-10-30T17:09:34Z

@dvarrazzo, thanks for the tip and explanation.

dvarrazzo · 2020-10-30T17:19:02Z

Another option would be to cast the query field to bytea. psycopg2 returns a memoryview object for it, which you can convert to bytes, or you can create your own typecaster to convert Postgres bytea to Python bytes.

The result is the same: you are in control of the decoding. But this way you can choose column by column what to retrieve as unicode and what as bytes, and you can register the typecaster globally.
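
A sketch of the bytea route, under stated assumptions: the typecaster follows the pattern shown in the psycopg2 FAQ, psycopg2 is imported lazily so the rest of the block runs without it, and `to_bytes`, `decode_query` and `register_bytea_as_bytes` are hypothetical names, not pg_activity code.

```python
def to_bytes(value):
    """Normalise a bytea value (memoryview or bytes) to bytes."""
    return None if value is None else bytes(value)


def decode_query(value, encoding):
    """Decode a bytea value with an explicitly chosen encoding."""
    raw = to_bytes(value)
    return None if raw is None else raw.decode(encoding, errors="replace")


def register_bytea_as_bytes():
    """Globally make psycopg2 return bytes instead of memoryview for bytea."""
    import psycopg2  # imported here so the helpers above work without it
    import psycopg2.extensions

    def bytea2bytes(value, cur):
        m = psycopg2.BINARY(value, cur)
        return None if m is None else m.tobytes()

    caster = psycopg2.extensions.new_type(
        psycopg2.BINARY.values, "BYTEA2BYTES", bytea2bytes
    )
    psycopg2.extensions.register_type(caster)


# Stand-in for a value fetched with: SELECT query::bytea FROM pg_stat_activity
value = memoryview("INSERT INTO test VALUES ('é');".encode("latin-1"))
print(decode_query(value, "latin-1"))
```

Only the columns cast to bytea are affected, so other text columns still come back as ordinary strings.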

blogh · 2020-11-02T11:03:38Z

We can also get the encoding of the field from pg_database.

SELECT convert_from(query::bytea, pg_catalog.pg_encoding_to_char(b.encoding)),  
       pg_catalog.pg_encoding_to_char(b.encoding)                               
FROM pg_stat_activity a                                                         
     INNER JOIN pg_database b ON a.datid = b.oid;

The join filters out all the non-client backends. Right now we don't care, and it would be easy enough to fix.
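
For comparison, the same per-database decoding could be done on the client instead of with convert_from. This is only a sketch: it assumes each row carries the raw query bytes plus the name returned by pg_encoding_to_char, and `PG_TO_PYTHON` is a hypothetical, deliberately incomplete mapping.

```python
# Hypothetical and incomplete: PostgreSQL encoding name -> Python codec name.
PG_TO_PYTHON = {"UTF8": "utf-8", "LATIN1": "latin-1", "WIN1252": "cp1252"}


def decode_with_db_encoding(raw: bytes, pg_encoding: str) -> str:
    """Decode a query using the encoding of the database it ran in."""
    codec = PG_TO_PYTHON.get(pg_encoding, "utf-8")  # unknown names fall back to utf-8
    return raw.decode(codec, errors="replace")


# Stand-ins for (query bytes, database encoding) pairs.
rows = [
    (b"INSERT INTO test VALUES ('\xe9');", "LATIN1"),
    ("SELECT 'é'".encode("utf-8"), "UTF8"),
]
decoded = [decode_with_db_encoding(raw, enc) for raw, enc in rows]
print(decoded)
```

psycopg2 ships a fuller table of this kind as `psycopg2.extensions.encodings`, which would be the natural replacement for the small dictionary above.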

When the encoding of a database is not UTF8, queries with special characters might crash pg_activity with the message:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 26: invalid continuation byte

This patch fixes the issue by querying pg_database.encoding and using it to decode the string.

This commit adds:
* the ability to create a connection to a different backend with the asynchronous "execute" function
* a test case for the issue dalibo#149

blogh · 2021-09-24T12:49:04Z

It took a long while, but it's fixed.
Thanks for reporting and helping!

blogh added the bug label Oct 16, 2020

dlax mentioned this issue Oct 30, 2020

Ignore decode errors in conn_decode() psycopg/psycopg2#1179

Closed

dlax mentioned this issue May 12, 2021

Add tests for the "data" module #217

Merged

blogh added a commit to blogh/pg_activity that referenced this issue Jun 25, 2021

Add test for issue dalibo#149

752ed1d

blogh added a commit to blogh/pg_activity that referenced this issue Jun 25, 2021

Add test for issue dalibo#149

0bf3548

blogh added a commit to blogh/pg_activity that referenced this issue Jun 25, 2021

Fix test for dalibo#149

5fc3802

blogh added a commit to blogh/pg_activity that referenced this issue Sep 21, 2021

Add test for issue dalibo#149

8af1a29

blogh added a commit to blogh/pg_activity that referenced this issue Sep 23, 2021

Add test for issue dalibo#149

10484a3

blogh added a commit that referenced this issue Sep 23, 2021

Add test for issue #149

8ab2dc0

blogh closed this as completed Sep 24, 2021

blogh mentioned this issue Jan 21, 2022

Encoding bug, still not out of the woods ... #275

Closed

blogh mentioned this issue Jan 11, 2023

Encoding mess recap #332

Closed
