Request: Read additional JSON objects from a Stream #20

Open
pkoppstein opened this issue Nov 17, 2021 · 1 comment

Comments

@pkoppstein

As best as I've been able to tell, the JSON module currently provided
by wren-essentials does not directly support the efficient entity-by-entity
reading of JSON texts in files with streams of JSON entities, unless
each such entity is on a separate line.

Consider for example a file with the four lines:

{
  "a": 1
}
2

This illustrates what is no doubt the main need, but it is worth
keeping in mind that streams of JSON entities can be presented without
any newlines at all, e.g.:

1 2 3

For reference, jq's input built-in meets the requirements I have in mind.
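To make the request concrete, here is a minimal sketch of entity-by-entity reading, written in Python rather than Wren purely for illustration. It uses the standard library's `json.JSONDecoder.raw_decode`, which parses one value and reports where it stopped. The helper name `iter_json_values` is hypothetical, and the sketch works on a string already held in memory, so it shows the desired behaviour, not an efficient file-streaming implementation.

```python
import json


def iter_json_values(text):
    """Yield each top-level JSON value found in text, in order."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so step over
        # any spaces or newlines that separate consecutive values.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        # Parse exactly one value; pos moves to just past it.
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Inputs modeled on the two examples above.
multi_line = '{\n  "a": 1\n}\n2\n'
print(list(iter_json_values(multi_line)))  # [{'a': 1}, 2]
print(list(iter_json_values("1 2 3")))     # [1, 2, 3]
```

The point is that the boundaries between values are found by the parser itself, so newlines carry no special meaning.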

A related but also quite distinct need is to be able to process very large JSON entities
in a "stream-oriented manner" (in the sense of streaming XML parsers).
Again, jq's --stream option provides a convenient point of reference, especially
since jq allows this option to be used in conjunction with streams of JSON texts.

@joshgoebel changed the title from "ER: reading a stream of JSON texts" to "Request: Read additional JSON objects from a Stream" on Nov 17, 2021
@joshgoebel
Owner

joshgoebel commented Nov 17, 2021

The underlying parser pdjson (which we wrap) would seem to suggest this is supported:

By default only one value is read from the stream. The parser can be reset to read more objects. The overall line number and position are preserved.

So I'd imagine it's something that could be added if someone wanted to hook things up (and decide on an appropriate API). Perhaps even as simple as exposing access to json_reset from JSONStream... I think we'd be open to a PR (and tests) if it's truly that simple.

Of course, at this point you're already working with a string object; streaming from a file as you're reading it (in the MOST performant way possible) might be a lot harder. Even without any changes, I'd suggest that something simple like JSONL could be parsed by what we have now, with a little glue on top of the input stream to first split it into lines.
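As a rough illustration of that line-splitting glue, here is a sketch in Python (not Wren, and not the wren-essentials API). The helper name `iter_jsonl` is hypothetical; the assumption is only that a single-value parser is called once per non-empty line, which is the role the existing JSON module would play.

```python
import json


def iter_jsonl(lines):
    """Yield one parsed JSON value per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)


sample = '{"a": 1}\n2\n\n[3, 4]\n'
print(list(iter_jsonl(sample.splitlines())))  # [{'a': 1}, 2, [3, 4]]
```

This only covers the one-entity-per-line case; an entity spread over several lines, as in the original example, would still need the parser to be reset and resumed.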
