feat: fix broken numbers at the end of the string #91

yGuy · 2023-04-28T13:56:57Z

When the string ends in the middle of a number, sometimes the JSON cannot be repaired.
This already works for simple sequences of digits, but this pull request adds the ability to also fix numbers if they end after a ., e, e+ or -.
This doesn't change the numbers any more than when the sequence is truncated in mid-air, but keeps the "nature of the number" by appending a '0' in these cases.

1. becomes 1.0
1.e becomes 1.e0
- becomes -0

and so on.

josdejong · 2023-05-02T07:40:50Z

Thanks Sebastian.

I'm a bit in doubt on what is best to do here. If a JSON document is truncated halfway a number, it is almost certainly not repaired like it originally was by just appending a zero. The same holds of course by a string that is halfway truncated, though fixing numbers like this feel a bit more significant than a missing piece of text.

From all repair possibilities, some are conservative at we're quite sure that the repaired JSON is correct (like a missing comma or missing quotes around a key). In other cases it's a bit of a guess, and we know that the contents of a repaired document are not correct, and it is more like we can make it valid JSON again and that's about it. The latter is the case for truncated JSON documents for example.

Would it make sense to introduce a config option to either apply only conservative fixes? And also, it would be helpful to get more information in the output about what repair actions have been applied, as discussed in #79. What do you think?

yGuy · 2023-05-02T09:14:03Z

I understand your concern. However IMHO (as I wrote) the current implementation will also "fix" numbers that are half-way broken (12345) will become 12345, even if it was 123456789. The same arguments would hold for closing arrays, objects, etc. My PR was specifically meant to make it possible to parse JSONs that have been cut-off in the middle or rather "at the end". This works very well for the current version of the project except for that one use-case where numbers are cut off at the wrong place.

I was trying to use this project to interpret a streaming JSON response (generated by ChatGPT) and I would like to parse the (partial) results as early as possible. This worked beautifully for all my test cases except for when the GPT token ended with a dot, minus, or E in a number. That's what I was trying to improve.

I agree with you that it would be best to get some kind of feedback that tells you whether there were just cosmetic changes or the JSON was incomplete or worse syntax errors, however with the current state of the project and API I think the PR improves the situation and does not make it worse, because you can run into the very same issues with cut-off numbers, strings, arrays, and objects just the same.

josdejong · 2023-05-03T18:03:03Z

Ah, thanks, that is a very good point about some truncated numbers already being "fixed". So your PR will only make the current behavior more consistent, fixing all truncated numbers instead of some.

Ok let's go for this PR, and think through the ideas for a "conservative" mode separately.

josdejong

I reviewed the PR, it looks very neat and well tested, thanks.

I made one small inline comment, can you have a look at that?

src/jsonrepair.ts

josdejong · 2023-05-04T08:20:35Z

Thanks for the update. This PR is a nice improvement, I'll merge and publish it now.

feat: fix broken numbers at the end of the string

c42d9dd

josdejong reviewed May 3, 2023

View reviewed changes

src/jsonrepair.ts Outdated Show resolved Hide resolved

style: simplify and rename expectDigitOrRepair

ce700fc

josdejong merged commit 9ad00fd into josdejong:develop May 4, 2023

This pull request was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: fix broken numbers at the end of the string #91

feat: fix broken numbers at the end of the string #91

yGuy commented Apr 28, 2023

josdejong commented May 2, 2023

yGuy commented May 2, 2023

josdejong commented May 3, 2023

josdejong left a comment

josdejong commented May 4, 2023

feat: fix broken numbers at the end of the string #91

feat: fix broken numbers at the end of the string #91

Conversation

yGuy commented Apr 28, 2023

josdejong commented May 2, 2023

yGuy commented May 2, 2023

josdejong commented May 3, 2023

josdejong left a comment

Choose a reason for hiding this comment

josdejong commented May 4, 2023