Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FATAL: 'utf-8' codec can't decode byte 0xe7 in position 128: invalid continuation byte #149

Closed
PatrickSiqueira opened this issue Oct 15, 2020 · 11 comments
Labels

Comments

@PatrickSiqueira
Copy link

Hello people!
I am currently using pg_activity-1.6.1 with PostgreSQL 12.3, after a certain run time, I get the following error:

Traceback (most recent call last):
File "./pg_activity", line 332, in main
procs, disp_procs)
File "/var/lib/pgsql/install/pg_activity-1.6.1/pgactivity/UI.py", line 1191, in poll
disp_proc)
File "/var/lib/pgsql/install/pg_activity-1.6.1/pgactivity/UI.py", line 1337, in __poll_activities
queries = self.data.pg_get_activities(self.duration_mode)
File "/var/lib/pgsql/install/pg_activity-1.6.1/pgactivity/Data.py", line 510, in pg_get_activities
ret = cur.fetchall()
File "/usr/lib64/python3.6/site-packages/psycopg2/extras.py", line 100, in fetchall
res = super(DictCursorBase, self).fetchall()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 128: invalid continuation byte
FATAL: 'utf-8' codec can't decode byte 0xe7 in position 128: invalid continuation byte

Previously with version 9 of PostgreSQL, I did not receive these errors, after the migration to version 12 is that I am having this. can you help me?

@blogh blogh added the bug label Oct 16, 2020
@blogh
Copy link
Collaborator

blogh commented Oct 16, 2020

Hi,

Are you using third party extensions on postgres ?
We had a similar issue related to powa before : #130

Benoit

@PatrickSiqueira
Copy link
Author

Hello,

Only pg_stat_statements, but I've been using it since version 9 of postgres, could this problem be related to some incompatibility of pg_activity with postgres 12.3 or CentOS 8?

@blogh
Copy link
Collaborator

blogh commented Oct 28, 2020

Hi sorry for the long delay,

could this problem be related to some incompatibility of pg_activity with postgres 12.3 or CentOS 8?
I dont think so.

I'll try to reproduce this on my side.

Benoit.

@dlax
Copy link
Member

dlax commented Oct 29, 2020

I could reproduce on a debian system with postgres 11 (from debian) by executing the following SQL while pg_activity is running:

$ psql postgres
denis@postgres=# CREATE DATABASE latin1 ENCODING 'latin1' TEMPLATE template0 LC_COLLATE 'fr_FR.latin1' LC_CTYPE 'fr_FR.latin1';
CREATE DATABASE
denis@postgres=# \c latin1 
You are now connected to database "latin1" as user "denis".
denis@latin1=# CREATE TABLE test (data text);
CREATE TABLE
denis@latin1=# BEGIN;
BEGIN
denis@latin1=# INSERT INTO test VALUES ('é');
INSERT 0 1
denis@latin1=# 

By keeping the last transaction uncommitted, the INSERT ... makes pg_activity crashes as described above. I think this is because the é is attempted to be decoded with utf-8 whereas the database uses another encoding.

@dlax
Copy link
Member

dlax commented Oct 29, 2020

Actually, the problem shows up by using psycopg2 directly:

>>> conn = psycopg2.connect(database="postgres")
>>> conn.encoding
'UTF8'
>>> cur = conn.cursor()
>>> cur.execute("select query from pg_stat_activity")
>>> cur.fetchall()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 26: invalid continuation byte

so perhaps it's not really related to pg_activity, not sure what we can do about this?

@PatrickSiqueira, at the moment, you might try to use pg_activity with a database that has the correct encoding: pg_activity -d <dbname>.

@dlax
Copy link
Member

dlax commented Oct 29, 2020

And incidentally, psql seems smarter and just escapes characters that cannot be decoded:

$ psql postgres -t -c "select query from pg_stat_activity;"
 INSERT INTO test VALUES ('');
...
$ psql latin1 -t -c "select query from pg_stat_activity;"
 INSERT INTO test VALUES ('é');
...

@dvarrazzo
Copy link

psql is not smarter: it's dumber, so it works :) it just emits a stream of bytes to the console, and the console has a replace policy.

You can do the same by asking psycopg2 to return bytes instead of unicode strings: see https://www.psycopg.org/docs/faq.html#faq-bytes.

@dlax
Copy link
Member

dlax commented Oct 30, 2020

@dvarrazzo, thanks for the tip and explanation.

@dvarrazzo
Copy link

Another option would be to cast the query field to bytea. psycopg2 returns a memoryview object for it, which you can convert to bytes, or you can create your own typecaster to convert Postgres bytea to Python bytes.

The result is the same, you get in control of the decoding, but this way you can choose column-by-column what to retrieve in as unicode and what as bytes, and you can register the typecaster globally.

@blogh
Copy link
Collaborator

blogh commented Nov 2, 2020

We can also get the encoding of the field from pg_database.

SELECT convert_from(query::bytea, pg_catalog.pg_encoding_to_char(b.encoding)),  
       pg_catalog.pg_encoding_to_char(b.encoding)                               
FROM pg_stat_activity a                                                         
     INNER JOIN pg_database b ON a.datid = b.oid;    

The join filters all the non client backends. But right now, we don't care and it would be easy enough to fix.

blogh added a commit to blogh/pg_activity that referenced this issue May 10, 2021
When the encoding of a database is not UTF8. Queries with special
caracters might crash pg_activity with the message :

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 26:
++  invalid continuation byte

This patch fixes the issue by querying pg_database.encoding and using it
to encode the string.
blogh added a commit to blogh/pg_activity that referenced this issue May 21, 2021
When the encoding of a database is not UTF8. Queries with special
caracters might crash pg_activity with the message :

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 26:
++  invalid continuation byte

This patch fixes the issue by querying pg_database.encoding and using it
to encode the string.
blogh added a commit to blogh/pg_activity that referenced this issue Jun 25, 2021
When the encoding of a database is not UTF8. Queries with special
caracters might crash pg_activity with the message :

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 26:
++  invalid continuation byte

This patch fixes the issue by querying pg_database.encoding and using it
to encode the string.
blogh added a commit to blogh/pg_activity that referenced this issue Jun 25, 2021
This commit adds :

* the ability to create a connexion to a different backend with the
  asynchronous "execute" function
* a test case for the issue dalibo#149
blogh added a commit to blogh/pg_activity that referenced this issue Jun 25, 2021
This commit adds :

* the ability to create a connexion to a different backend with the
  asynchronous "execute" function
* a test case for the issue dalibo#149
blogh added a commit to blogh/pg_activity that referenced this issue Jun 25, 2021
blogh added a commit to blogh/pg_activity that referenced this issue Sep 21, 2021
When the encoding of a database is not UTF8. Queries with special
caracters might crash pg_activity with the message :

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 26:
++  invalid continuation byte

This patch fixes the issue by querying pg_database.encoding and using it
to encode the string.
blogh added a commit to blogh/pg_activity that referenced this issue Sep 21, 2021
This commit adds :

* the ability to create a connexion to a different backend with the
  asynchronous "execute" function
* a test case for the issue dalibo#149
blogh added a commit to blogh/pg_activity that referenced this issue Sep 23, 2021
When the encoding of a database is not UTF8. Queries with special
caracters might crash pg_activity with the message :

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 26:
++  invalid continuation byte

This patch fixes the issue by querying pg_database.encoding and using it
to encode the string.
blogh added a commit to blogh/pg_activity that referenced this issue Sep 23, 2021
This commit adds :

* the ability to create a connexion to a different backend with the
  asynchronous "execute" function
* a test case for the issue dalibo#149
blogh added a commit that referenced this issue Sep 23, 2021
When the encoding of a database is not UTF8. Queries with special
caracters might crash pg_activity with the message :

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 26:
++  invalid continuation byte

This patch fixes the issue by querying pg_database.encoding and using it
to encode the string.
blogh added a commit that referenced this issue Sep 23, 2021
This commit adds :

* the ability to create a connexion to a different backend with the
  asynchronous "execute" function
* a test case for the issue #149
@blogh
Copy link
Collaborator

blogh commented Sep 24, 2021

It took a long while. but it's fixed..
Thanks for reporting and helping !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants