-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: fix broken numbers at the end of the string #91
Conversation
Thanks Sebastian. I'm a bit in doubt on what is best to do here. If a JSON document is truncated halfway a number, it is almost certainly not repaired like it originally was by just appending a zero. The same holds of course by a string that is halfway truncated, though fixing numbers like this feel a bit more significant than a missing piece of text. From all repair possibilities, some are conservative at we're quite sure that the repaired JSON is correct (like a missing comma or missing quotes around a key). In other cases it's a bit of a guess, and we know that the contents of a repaired document are not correct, and it is more like we can make it valid JSON again and that's about it. The latter is the case for truncated JSON documents for example. Would it make sense to introduce a config option to either apply only conservative fixes? And also, it would be helpful to get more information in the output about what repair actions have been applied, as discussed in #79. What do you think? |
I understand your concern. However IMHO (as I wrote) the current implementation will also "fix" numbers that are half-way broken (12345) will become 12345, even if it was 123456789. The same arguments would hold for closing arrays, objects, etc. My PR was specifically meant to make it possible to parse JSONs that have been cut-off in the middle or rather "at the end". This works very well for the current version of the project except for that one use-case where numbers are cut off at the wrong place. I was trying to use this project to interpret a streaming JSON response (generated by ChatGPT) and I would like to parse the (partial) results as early as possible. This worked beautifully for all my test cases except for when the GPT token ended with a dot, minus, or E in a number. That's what I was trying to improve. I agree with you that it would be best to get some kind of feedback that tells you whether there were just cosmetic changes or the JSON was incomplete or worse syntax errors, however with the current state of the project and API I think the PR improves the situation and does not make it worse, because you can run into the very same issues with cut-off numbers, strings, arrays, and objects just the same. |
Ah, thanks, that is a very good point about some truncated numbers already being "fixed". So your PR will only make the current behavior more consistent, fixing all truncated numbers instead of some. Ok let's go for this PR, and think through the ideas for a "conservative" mode separately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the PR, it looks very neat and well tested, thanks.
I made one small inline comment, can you have a look at that?
Thanks for the update. This PR is a nice improvement, I'll merge and publish it now. |
When the string ends in the middle of a number, sometimes the JSON cannot be repaired.
This already works for simple sequences of digits, but this pull request adds the ability to also fix numbers if they end after a
.
,e
,e+
or-
.This doesn't change the numbers any more than when the sequence is truncated in mid-air, but keeps the "nature of the number" by appending a '0' in these cases.
1.
becomes1.0
1.e
becomes1.e0
-
becomes-0
and so on.