Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: fix broken numbers at the end of the string #91

Merged
merged 2 commits into from
May 4, 2023

Conversation

yGuy
Copy link
Contributor

@yGuy yGuy commented Apr 28, 2023

When the string ends in the middle of a number, sometimes the JSON cannot be repaired.
This already works for simple sequences of digits, but this pull request adds the ability to also fix numbers if they end after a ., e, e+ or -.
This doesn't change the numbers any more than when the sequence is truncated in mid-air, but keeps the "nature of the number" by appending a '0' in these cases.

  • 1. becomes 1.0
  • 1.e becomes 1.e0
  • - becomes -0

and so on.

@josdejong
Copy link
Owner

Thanks Sebastian.

I'm a bit in doubt on what is best to do here. If a JSON document is truncated halfway a number, it is almost certainly not repaired like it originally was by just appending a zero. The same holds of course by a string that is halfway truncated, though fixing numbers like this feel a bit more significant than a missing piece of text.

From all repair possibilities, some are conservative at we're quite sure that the repaired JSON is correct (like a missing comma or missing quotes around a key). In other cases it's a bit of a guess, and we know that the contents of a repaired document are not correct, and it is more like we can make it valid JSON again and that's about it. The latter is the case for truncated JSON documents for example.

Would it make sense to introduce a config option to either apply only conservative fixes? And also, it would be helpful to get more information in the output about what repair actions have been applied, as discussed in #79. What do you think?

@yGuy
Copy link
Contributor Author

yGuy commented May 2, 2023

I understand your concern. However IMHO (as I wrote) the current implementation will also "fix" numbers that are half-way broken (12345) will become 12345, even if it was 123456789. The same arguments would hold for closing arrays, objects, etc. My PR was specifically meant to make it possible to parse JSONs that have been cut-off in the middle or rather "at the end". This works very well for the current version of the project except for that one use-case where numbers are cut off at the wrong place.

I was trying to use this project to interpret a streaming JSON response (generated by ChatGPT) and I would like to parse the (partial) results as early as possible. This worked beautifully for all my test cases except for when the GPT token ended with a dot, minus, or E in a number. That's what I was trying to improve.

I agree with you that it would be best to get some kind of feedback that tells you whether there were just cosmetic changes or the JSON was incomplete or worse syntax errors, however with the current state of the project and API I think the PR improves the situation and does not make it worse, because you can run into the very same issues with cut-off numbers, strings, arrays, and objects just the same.

@josdejong
Copy link
Owner

Ah, thanks, that is a very good point about some truncated numbers already being "fixed". So your PR will only make the current behavior more consistent, fixing all truncated numbers instead of some.

Ok let's go for this PR, and think through the ideas for a "conservative" mode separately.

Copy link
Owner

@josdejong josdejong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the PR, it looks very neat and well tested, thanks.

I made one small inline comment, can you have a look at that?

src/jsonrepair.ts Outdated Show resolved Hide resolved
@josdejong
Copy link
Owner

Thanks for the update. This PR is a nice improvement, I'll merge and publish it now.

@josdejong josdejong merged commit 9ad00fd into josdejong:develop May 4, 2023
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants