Shrink the raw report page size #146

Closed
Julian opened this issue Feb 23, 2023 · 27 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@Julian
Member

Julian commented Feb 23, 2023

Right now each HTML report when run on the suite is ~10MB, probably because of all the JSON blobs for schemas and instances that we write into each page.

A compressed zip of the site though is only a few MB total, so clearly stuff compresses well.

We likely anyhow should split the raw data from the DOM-related parts of the report (and have the latter read into the former from some simple client-side JS), but perhaps the above is a reason to even write the raw data out compressed and decompress bits of it client-side.
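As a rough way to check the "compresses well" observation, here is a minimal sketch that compresses a batch of repetitive JSON with zlib and compares sizes. The `cases` data is a hypothetical stand-in, not real Bowtie output, so the ratio it prints says nothing definite about an actual report.

```python
import json
import zlib

# Hypothetical stand-in for the repeated schema/instance blobs in a report.
cases = [
    {
        "description": f"case {i}",
        "schema": {"type": "object", "properties": {"foo": {"type": "integer"}}},
        "instance": {"foo": i},
    }
    for i in range(1000)
]

raw = json.dumps(cases).encode("utf-8")
compressed = zlib.compress(raw, level=9)

print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
```

Running the same measurement on the real output of bowtie suite would show whether the 1-2MB target is within reach.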

@Julian Julian added enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed labels Feb 23, 2023
@AgniveshChaubey

This comment was marked as off-topic.

@AgniveshChaubey
Collaborator

AgniveshChaubey commented Mar 9, 2023

Hello @Julian, all the problems I was having have been fixed, and Bowtie is now fully configured and running properly. Thanks for your kind support!

As you mentioned in the description, one way to reduce the size of the generated HTML report would be to split the raw data from the DOM-related parts of the report, and one way to do this is with the zlib library. We can easily compress the raw HTML using zlib's compress() function in order to make the report lightweight and can decompress and dynamically populate the data in the report by using some client-side JS code (let's say by using the fetch() method). Some feedback on this would be really helpful.

@Julian
Member Author

Julian commented Mar 9, 2023

Great, glad to hear it! Will hide the above comments then so that they're not confusing this specific issue.

> We can easily compress the raw HTML using zlib's compress() function in order to make the report lightweight and can decompress and dynamically populate the data in the report by using some client-side JS code

Compressing with zlib sounds fine to me, I didn't particularly compare compression formats -- the idea isn't specifically to get the smallest possible size, anything which takes us from ~10MB to around 1-2MB should be fine, and zlib is hopefully indeed easy to use both on the Python and JS side.

Don't forget though that there's no web server here, it's a pre-generated HTML file (which of course can still run JS when it's opened) -- so there's nowhere to fetch from, but you still can certainly lookup elements in the DOM (i.e. the data we currently emit) and decompress them on the fly.
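One way to picture "data in the DOM, decompressed on the fly" is the sketch below. It is an assumption about how this could be done, not Bowtie's implementation: the function names and the `report-data` element id are hypothetical. The Python side compresses the JSON, base64-encodes it so it is safe inside HTML, and writes it into a non-executing `<script>` element. The second function mimics in Python what client-side JS would do after reading the element's text.

```python
import base64
import json
import zlib


def embed_compressed(data: dict, element_id: str = "report-data") -> str:
    """Return a <script> element holding zlib-compressed, base64-encoded JSON."""
    raw = json.dumps(data).encode("utf-8")
    blob = base64.b64encode(zlib.compress(raw)).decode("ascii")
    # A non-JavaScript type means the browser keeps the text without running it.
    return f'<script id="{element_id}" type="application/octet-stream">{blob}</script>'


def extract_compressed(html: str) -> dict:
    """Reverse of embed_compressed, mimicking what client-side JS would do."""
    blob = html.split(">", 1)[1].rsplit("<", 1)[0]
    return json.loads(zlib.decompress(base64.b64decode(blob)))


data = {"run_info": {"started": "2023-03-13T11:03:35"}, "results": [1, 2, 3]}
tag = embed_compressed(data)
assert extract_compressed(tag) == data
```

Base64 adds roughly a third to the compressed size, which is the price of keeping everything in a single HTML file that needs no server.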

Let me know if that helps, but sounds like you're indeed on the right track if you're saying you intend to give this issue a shot!

@AgniveshChaubey
Collaborator

AgniveshChaubey commented Mar 10, 2023

That's cool! I've started working on it and will let you know if I require any help in between!

@AgniveshChaubey AgniveshChaubey mentioned this issue Mar 11, 2023
@Shrini3

Shrini3 commented Mar 11, 2023

Hey @Julian, I've successfully set up bowtie as per the documentation, but I'm not able to generate any bowtie-report file from the command given in the documentation. Please let me know if I'm missing anything.

Also, what is the role of podman when using bowtie?

@AgniveshChaubey
Collaborator

> Hey @Julian, I've successfully set up bowtie as per the documentation, but I'm not able to generate any bowtie-report file from the command given in the documentation. Please let me know if I'm missing anything.

Hi @Shrini3, could you share a bit more about the result you're getting? Btw, in order to keep this thread issue specific, these discussions would be best on Slack. Cheers!😃

@AgniveshChaubey
Collaborator

AgniveshChaubey commented Mar 12, 2023

Hi @Julian, zlib has been implemented to compress the raw data of the report and on the client side, I have used the pako library to decompress the data again. I have created a draft PR in order to show you the code. Please let me know if I'm on the right path.

Another question: how should the decompressed data be rendered into the HTML elements? Would it be a good choice to assign each element an id and render the data within it using JavaScript (by grabbing the element with getElementById() and setting its innerHTML or innerText property)?

@Julian
Member Author

Julian commented Mar 13, 2023

Hey @AgniveshChaubey -- the PR looks like it may indeed be on the right track, though if I run it, I get an error:

⊙  bowtie suite -i lua-jsonschema https://github.com/json-schema-org/JSON-Schema-Test-Suite/tree/main/tests/draft4/type.json | bowtie report
/Users/julian/Development/bowtie/venv/lib/python3.11/site-packages/click/types.py:82: ResourceWarning: unclosed <ssl.SSLSocket fd=6, family=2, type=1, proto=0, laddr=('10.254.6.90', 49470), raddr=('140.82.112.5', 443)>
  return self.convert(value, param, ctx)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/Users/julian/Development/bowtie/venv/lib/python3.11/site-packages/click/types.py:82: ResourceWarning: unclosed <ssl.SSLSocket fd=7, family=2, type=1, proto=0, laddr=('10.254.6.90', 49471), raddr=('140.82.113.9', 443)>
  return self.convert(value, param, ctx)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
2023-03-13 11:03.35 [info     ] Will speak                     dialect=http://json-schema.org/draft-04/schema#
2023-03-13 11:03.36 [warning  ] Implicit dialect not acknowledged. Proceeding, but implementation may not have configured itself to handle schemas without $schema. [ghcr.io/bowtie-json-schema/lua-jsonschema] dialect=http://json-schema.org/draft-04/schema# response=StartedDialect(ok=False)
2023-03-13 11:03.36 [info     ] Finished                       count=11
Traceback (most recent call last):
  File "/Users/julian/Development/bowtie/venv/bin//bowtie", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/julian/Development/bowtie/venv/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/julian/Development/bowtie/venv/lib/python3.11/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/julian/Development/bowtie/venv/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/julian/Development/bowtie/venv/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/julian/Development/bowtie/venv/lib/python3.11/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/julian/Development/bowtie/bowtie/_cli.py", line 107, in report
    report_data.encode()
    ^^^^^^^^^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'encode'

does it run properly for you?

You also probably will not be able to do things precisely as you have them because some things need to be available when rendering the template, rather than at page-load time -- if you're asking whether it's OK to reorder things there that's fine with me I think -- the main goal would be to get something identical to what we see right now but which is smaller -- so if you do want to instead dynamically render HTML by pulling elements out of the compressed blob and creating something dynamically using JS that sounds fine to me -- does that help at all? If not I can elaborate further!

Thanks again for giving this a shot.

@AgniveshChaubey
Collaborator

AgniveshChaubey commented Mar 13, 2023

Yes, it would be really helpful if you can elaborate a bit more!

> does it run properly for you?

Actually, I'm not sure how to run the tests in a development environment. But yes, I am getting the HTML report when running one of the following commands:

bowtie suite -i lua-jsonschema https://github.com/json-schema-org/JSON-Schema-Test-Suite/tree/main/tests/draft4/type.json | bowtie report 
python __main__.py suite -i lua-jsonschema https://github.com/json-schema-org/JSON-Schema-Test-Suite/tree/main/tests/draft4/type.json | bowtie report 


Edit:
Got the error. In the line compressed_data = zlib.compress(report_data.encode()), report_data is a dictionary object, and .encode() can only be applied to strings. That's why the error occurred: AttributeError: 'dict' object has no attribute 'encode'.
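A minimal sketch of one way around that error: serialize the dictionary to a JSON string first, then encode it to bytes for zlib. The `report_data` value here is a hypothetical stand-in. If the real dictionary holds objects that are not plain JSON types, `json.dumps` will raise a `TypeError` instead.

```python
import json
import zlib

# Hypothetical stand-in for the dict that the report command builds.
report_data = {"run_info": {"bowtie_version": "0.0.0"}, "results": []}

# A dict has no .encode(); serialize it to a JSON string first, then to bytes.
payload = json.dumps(report_data).encode("utf-8")
compressed_data = zlib.compress(payload)

# Round trip to confirm nothing was lost.
restored = json.loads(zlib.decompress(compressed_data).decode("utf-8"))
assert restored == report_data
```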

Please let me know how to test the changes locally to check if it works as expected.

@Julian
Member Author

Julian commented Mar 13, 2023

You might be running into the same thing another contributor was -- basically make sure Bowtie is installed in editable mode -- I added a note here: https://bowtie-json-schema.readthedocs.io/en/latest/contributing/#installation -- if you don't have -e added you won't see changes you make "affect" report generation.

Does that help perhaps?

@AgniveshChaubey
Collaborator

AgniveshChaubey commented Mar 14, 2023

Hi @Julian, I have been working hard for the past two days to reduce the HTML report size, but a new challenge comes up every time one is resolved. After doing trial and error for an entire day, I finally came up with a solution to keep the raw data in a separate file. (The main problem was figuring out how to parse custom Python objects to json file because only primitive object types are allowed for parsing and in order to parse objects with custom type, we need to create a custom serialization method). I also have successfully compressed the report data using zlib library.

As of now, I have some questions:

  1. To access the data from the separately created file (using client-side JS and XMLHttpRequest), we'll have to provide some permissions explicitly to the browser. Rendering the report this way will tremendously reduce the file size, but it adds some extra steps and might raise some security concerns as well. I wonder if there is any other way to read the data into the report without any extra permission?
  2. How to grab the dynamically generated compressed data inside jinja2 template? I tried with several ways but was not able to do so.
  3. Is this issue really a bit tricky, or am I just making it complex? Is there any simpler way to handle this issue? (Asking as it is labeled as good first issue)

@Julian
Member Author

Julian commented Mar 15, 2023

Hey @AgniveshChaubey!

First -- it's perfectly fine to shift to something else if things seem more challenging! There's certainly no issue with doing so, you may find it more fun to shift to making some purely UI-related cosmetic improvement and find that a bit smoother, and I can certainly help find something if you prefer to try that!

Will respond to your comment though in case it helps -- and to be sure, I haven't fully thought through this issue (otherwise I likely would have implemented it :), so I am trying to give helpful breadcrumbs but they very well may lead you down dead ends!

> After doing trial and error for an entire day, I finally came up with a solution to keep the raw data in a separate file. (The main problem was figuring out how to parse custom Python objects to json file because only primitive object types are allowed for parsing and in order to parse objects with custom type, we need to create a custom serialization method)

I don't 100% follow what you mean here -- the data we build a report from is already valid JSON -- i.e. what is spit out by bowtie suite ... is JSON, and we use that JSON data to feed into bowtie report. So it's precisely that JSON file that we could indeed split off and retrieve separately from the UI (such that bowtie report produces a piece of HTML that assumes it will make a request to retrieve that JSON from somewhere, which I think is what you were suggesting), but I don't think it should involve any additional serialization work. Let me know if I've misunderstood what you mean or what you've done.

> To access the data from the separately created file (using client-side JS and XMLHttpRequest), we'll have to provide some permissions explicitly to the browser.

Are you referring to reading (other) files from the local filesystem from the .HTML file?

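Just to make sure -- there are 2 ways someone may want to view the report:

  1. Hosted, right now at https://bowtie-json-schema.github.io/bowtie/ which essentially is an HTML file we just publish via GitHub pages
  2. Locally, by opening a generated HTML file directly from the filesystem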

I think the kind of permissions you're talking about probably apply more to the second than the first, yes? For the "hosted" HTML page, we could have it reach out to a JSON file which is hosted alongside the page (I mentioned something similar here in the context of JSON for badges -- I'm not 100% sure that works "easily" but I think it's very likely) -- in other words having the data external, retrieving it via client-side JS, and then dynamically rendering what we want, but here I think as you're pointing out that's easier for the hosted case than the local case (which I think is fine! we can probably concentrate on the hosted case for now, and if the local case requires someone to enable some extra permissions they can choose to do so).

> How to grab the dynamically generated compressed data inside jinja2 template? I tried with several ways but was not able to do so.

I think the above is related here -- I would wonder whether if we produce a gzipped JSON file and then try to retrieve it from Javascript, does GitHub pages properly serve it with the right Content-Encoding header such that it's automatically decompressed.
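For the producing side of that idea, a minimal sketch of writing a gzipped JSON file and reading it back with Python's standard library follows. The `.json.gz` file name and the `results` data are hypothetical. Browsers only decompress transparently when the server sends `Content-Encoding: gzip`; whether GitHub Pages does that for such a file is not confirmed anywhere in this thread, so the client may need to decompress the bytes itself.

```python
import gzip
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for the JSON that bowtie suite emits.
results = [{"case": i, "valid": i % 2 == 0} for i in range(100)]

path = Path(tempfile.mkdtemp()) / "bowtie-data.json.gz"

# "wt" opens the gzip stream in text mode so json.dump can write str to it.
with gzip.open(path, "wt", encoding="utf-8") as f:
    json.dump(results, f)

# Reading it back confirms the file holds the same data.
with gzip.open(path, "rt", encoding="utf-8") as f:
    restored = json.load(f)
```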

But again, there's still some things to check here so it's perfectly possible things are indeed not 100% straightforward!

I hope some of the above at least helps a bit, but let me know what you think.

@AgniveshChaubey
Collaborator

AgniveshChaubey commented Mar 15, 2023

> First -- it's perfectly fine to shift to something else if things seem more challenging! There's certainly no issue with doing so, you may find it more fun to shift to making some purely UI-related cosmetic improvement and find that a bit smoother, and I can certainly help find something if you prefer to try that!

I totally agree with you, but still I wanna work on this issue as I am learning a lot along the way. I have good experience with UI-related stuff, but this issue is quite new as well as challenging for me, and I am really getting more value working on it. If you are having some UI-related improvements in mind, do consider mentioning them, I'll surely check them out after this one!

> I don't 100% follow what you mean here -- the data we build a report from is already valid JSON - .....

below is the code to dump the data in a separate file....

@click.option(
    "--data",
    "-d",
    "data_file",
    help="Where to write the raw data used to generate the report.",
    default="bowtie-data.json",
    type=click.File("w")
)

inside report function

.....

    report_data = _report.from_input(input)  # to generate the report data

    # Write the raw data to a file
    with open("bowtie-data.json", 'w') as data_file:
        json.dump(report_data, data_file)
.....

when running the tests with these changes, the following error occurs which I have mentioned here

> (The main problem was figuring out how to parse custom Python objects to json file because only primitive object types are allowed for parsing and in order to parse objects with custom type, we need to create a custom serialization method).

TypeError: Object of type _Summary is not JSON serializable

Should I just use data_file.write(str(report_data)) instead of

with open("bowtie-data.json", 'w') as data_file:
    json.dump(report_data, data_file)

to write into the new file? (But this would be in string format.)
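To illustrate the difference between the two options, here is a hedged sketch. `Summary` is a hypothetical dataclass standing in for the real `_Summary`, whose definition is not shown in this thread, and `to_jsonable` is a hypothetical helper. Passing a `default=` function to `json.dumps` yields valid JSON, whereas `str()` yields a Python repr that a JSON parser will reject.

```python
import json
from dataclasses import asdict, dataclass, field, is_dataclass


@dataclass
class Summary:
    """Hypothetical stand-in for the real `_Summary` class."""

    implementations: list = field(default_factory=list)
    total_cases: int = 0


def to_jsonable(obj):
    """Called by json.dumps for any object it cannot serialize itself."""
    if is_dataclass(obj) and not isinstance(obj, type):
        return asdict(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


report_data = {"summary": Summary(implementations=["lua-jsonschema"], total_cases=11)}

# Valid JSON that client-side code could parse.
as_json = json.dumps(report_data, default=to_jsonable)

# A Python repr with single quotes and a constructor call: not JSON.
as_str = str(report_data)
```

As the replies below point out, the output of bowtie suite is already JSON, so this extra serialization step may not be needed at all.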

> Are you referring to reading (other) files from the local filesystem from the .HTML file?

I am referring to just reading the newly generated JSON file from the HTML file.

@Julian
Member Author

Julian commented Mar 15, 2023

> below is the code to dump the data in a separate file....

What's the need for this new file? What does it do differently than what the output of bowtie suite already does?

@AgniveshChaubey
Collaborator

AgniveshChaubey commented Mar 15, 2023

> What's the need for this new file? What does it do differently than what the output of bowtie suite already does?

This is a different approach I was trying, separate from the compress-decompress one.
The rough idea is to dump the entire data into a separate file. The HTML report would then be rendered with just some basic data (rendered directly when the template is rendered), and the rest of the data would be rendered using some client-side JavaScript (by retrieving the data from the newly generated file). This will reduce the file size, and the remaining data would be rendered on the fly.

This is the reason why I'm trying to create a new file.

> Hosted, right now at https://bowtie-json-schema.github.io/bowtie/ which essentially is an HTML file we just publish via GitHub pages

I would like to know a bit more about how hosted reports are updated with the data received by the user run tests?

@Julian
Member Author

Julian commented Mar 15, 2023

> The rough idea is to dump the entire data into a separate file.

Yes but that's already what bowtie suite does!

Right now, you run:

bowtie suite <some arguments>.

This spits out JSON data.

If you pipe that into bowtie report, it consumes that input data, combines it with some HTML templating and spits out a piece of HTML for you to load.

I agree it's good (both for compression and otherwise) to just leave all the data as JSON and render everything using some client-side JavaScript -- I'm again curious what you're proposing that isn't already just "leave the data exactly as it is currently emitted by bowtie suite" -- that data contains all the test results (everything that's needed to build a report currently), and is serializable already!

I'm not saying it's perfect as-is, I could certainly imagine tweaking it -- but I don't understand what you'd want out of a new file -- that one already should be exactly the data you're trying to get, no?

> I would like to know a bit more about how hosted reports are updated with the data received by the user run tests?

On a daily (or manually triggered) basis, we run bowtie suite <...> | bowtie report and then upload the results to GitHub pages -- that happens here -- is that what you're looking for?

@AgniveshChaubey
Collaborator

If I am not wrong, when piping bowtie suite ... with bowtie report, it would inject all the obtained data in the template, and thus, the report would become heavy. Keeping the raw data in a separate file would reduce the size of the report (as most of the weight will be in the data file) and thus, will take less time to load. I am not sure how to do this directly with the data coming out of bowtie suite... command, that's why I am creating a new file to keep the data within it.

If the above steps can be done directly (with bowtie suite...), some hints on this would be really helpful.

@Julian
Member Author

Julian commented Mar 15, 2023

Right you're not wrong, we don't want to keep piping the output into bowtie report in this case -- but if you run e.g. >out.json bowtie suite -i lua-jsonschema whatever now you have a file called out.json that you can use over and over again -- so if you have some HTML file you can go retrieve that file probably the same way you were looking into doing.

Does that help?

@AgniveshChaubey
Collaborator

AgniveshChaubey commented Mar 16, 2023

> I think the above is related here -- I would wonder whether if we produce a gzipped JSON file and then try to retrieve it from Javascript.....

As per the suggestion, the JSON string (coming from bowtie suite...) has been compressed using gzip and has been successfully decompressed as well in a dummy HTML report (this report is just for testing purposes, in order to avoid errors). I have carefully verified the JSON data before and after compression: no data is lost in the process, and the JSON before and after is exactly the same.

Before compression, the JSON data was accessible in the template with expressions such as {{ run_info.started }} and {{ run_info.bowtie_version }}. But when it is decompressed and saved in a jsonData variable (note: the data is still exactly the same as before), it is not accessible using JavaScript (console.log(jsonData.run_info.started), console.log(jsonData.run_info.bowtie_version), etc. show the value as undefined). What might be the possible reason for this?
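The thread does not settle why the lookup fails. One thing worth ruling out, sketched here in Python as an analogy and not as a diagnosis of the PR, is that a decompressed payload is still text until it is parsed. Keys can be looked up only after parsing, and only if the data really contains them. The `original` value is a hypothetical example whose shape is an assumption.

```python
import gzip
import json

# Hypothetical data; the real shape of the suite output is an assumption here.
original = {"run_info": {"started": "2023-03-13T11:03:35", "bowtie_version": "0.0.0"}}
compressed = gzip.compress(json.dumps(original).encode("utf-8"))

text = gzip.decompress(compressed).decode("utf-8")
# `text` is still a str: it looks like the data, but has no keys to look up.
assert isinstance(text, str)

# Parsing turns the text into a dict whose keys can be inspected and used.
json_data = json.loads(text)
print(sorted(json_data))
started = json_data["run_info"]["started"]
```

Printing the top-level keys of the parsed value is a quick way to confirm whether a name used in the template, such as run_info, exists in the raw data at all.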

I have updated the PR, and have added some console logs for reference. Please have a look over it.

@Julian
Member Author

Julian commented Jun 20, 2023

Fixed (I believe, though haven't double checked yet) by #301 -- thanks @harrel56!

@Julian Julian closed this as completed Jun 20, 2023