Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Do not use canonicaljson to magically handle decoding bytes from JSON #7802

Merged
merged 7 commits into from
Jul 10, 2020

Conversation

clokep
Copy link
Member

@clokep clokep commented Jul 8, 2020

Related to #7674, this is another reference to simplejson that can go away.

Unfortunately we can't actually remove this code until Python 3.6, which is when json gained support for parsing bytes directly. (Note that simplejson does actually parse both bytes and str.)

This adds a shim for decoding JSON from bytes. If Python 3.5 is not supported in the future we can switch to directly using json.loads.

This adds manual decoding of bytes to UTF-8 to be compatible with Python 3.5.

This only converts a couple of callers since this is a bit nuanced. Annoyingly there's a potential to actually break Python 3.6/3.7 here since you can't decode a str. We could add a type check and handle bytes or str as the input to the function if we think that would be better.

This attempts to handle some of our more generic code:

  • API error handling (in HttpResponseException)
  • The federation server
  • HTTP client
  • HTTP servlet

It also corrects a type hint in the CAS handler.

@clokep clokep requested a review from a team July 8, 2020 11:30
@@ -0,0 +1 @@
Switch from simplejson to the standard library json.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matches #7800, seemed silly to have a separate changelog entry for changing a comment. :)

@clokep clokep removed the request for review from a team July 9, 2020 12:08
@clokep
Copy link
Member Author

clokep commented Jul 9, 2020

There's some similar instances of this so I'll group it into this same PR.

@clokep clokep changed the title Clarify a comment about simplejson. Do not use canonicaljson to magically handle decoding bytes from JSON Jul 9, 2020
@clokep clokep requested a review from a team July 9, 2020 16:04
@clokep
Copy link
Member Author

clokep commented Jul 9, 2020

I'm curious on people's thoughts here, especially whether it makes sense to have a single function which handles str & bytes (as json.loads in Python >= 3.6 does).

@richvdh
Copy link
Member

richvdh commented Jul 9, 2020

I kinda feel like making the .decode() call explicit is a good thing, even on python 3.6 and later; as such my vote would be just to do json.loads(self.response.decode('utf-8')).

Not a strong opinion though.

@clokep
Copy link
Member Author

clokep commented Jul 9, 2020

That works too. I actually originally made a separate function for error handling reasons, but then realized that both UnicodeDecodeError and JSONDecodeError are both ValueError, so it makes the error handling quite sane.

@clokep
Copy link
Member Author

clokep commented Jul 9, 2020

Part of my reasoning was also that I was assuming that once we dropped support for Python 3.5 we wouldn't do the decoding and would just pass it directly into json.dumps. Those should be easy enough to find regardless of it being a separate function though.

@clokep clokep removed the request for review from a team July 10, 2020 13:44
@clokep clokep requested a review from a team July 10, 2020 15:56
@clokep
Copy link
Member Author

clokep commented Jul 10, 2020

I think this is ready now, as I mentioned above this doesn't try to hit every instance, but some of the core areas.

Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

babolivier pushed a commit that referenced this pull request Sep 1, 2021
* commit '66a4af8d9':
  Do not use canonicaljson to magically handle decoding bytes from JSON. (#7802)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants