Suggestion: option to limit number of events saved per tag per run #253

Closed
mdfirman opened this issue Oct 18, 2018 · 1 comment

@mdfirman
Contributor

mdfirman commented Oct 18, 2018

edit: I made a mistake in my understanding of how TensorFlow worked; the issue has been updated.

TensorBoard only displays a maximum of 10 images per tag per run (see paragraph 2 in this thread). However, the events file grows linearly with the number of images written:

import numpy as np
from tensorboardX import SummaryWriter

for num_iters in [5, 10, 100]:

    # One run directory (and therefore one events file) per setting.
    writer = SummaryWriter(str(num_iters))

    for idx in range(num_iters):
        x = np.random.rand(3, 100, 100)   # random CHW image
        writer.add_image('Image', x, idx)

    writer.close()

This code writes three events files: one with 5 steps, one with 10, and one with 100.

It would be nice if the file with 100 steps were the same size as the one with 10 steps. However, it isn't; instead, the file size is proportional to the number of steps written:

Num events written    Filesize (bytes)    Desired filesize (bytes)
5                      140260              140260
10                     280860              280860
100                   2779338              280860

My feature request is for SummaryWriter to take an additional argument specifying how many unique steps to save per tag (e.g. inf by default, for backwards compatibility):

writer = SummaryWriter(str(num_iters), max_to_save=10)

I had a look through the code but I'm not sure how trivial this would be to implement. Any thoughts?
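To make the proposed behaviour concrete, here is a minimal sketch of a wrapper that could live in user code rather than in tensorboardX itself; the class name CappedWriter and the max_to_save argument are hypothetical, not part of any existing API:

from tensorboardX import SummaryWriter

class CappedWriter:
    """Hypothetical wrapper: forwards to SummaryWriter, but only writes
    the first max_to_save unique steps seen for each tag."""

    def __init__(self, logdir, max_to_save=float('inf')):
        self.writer = SummaryWriter(logdir)
        self.max_to_save = max_to_save
        self._steps = {}  # tag -> set of steps already written

    def add_image(self, tag, img, step):
        seen = self._steps.setdefault(tag, set())
        # Re-writes of an already-saved step are allowed; new steps are
        # dropped once the per-tag budget is exhausted.
        if step in seen or len(seen) < self.max_to_save:
            seen.add(step)
            self.writer.add_image(tag, img, step)

    def close(self):
        self.writer.close()

A smarter version could downsample (e.g. reservoir sampling, which is roughly what TensorBoard does on the display side) instead of simply keeping the first N steps.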

@lanpa
Owner

lanpa commented Oct 18, 2018

You can display more than 10 images with tensorflow/tensorboard#1138

As for limiting the file size, I have two thoughts:

  1. The events file is written sequentially, so removing previous steps frequently means a lot of IO overhead. It can be done by reading and rewriting the proto records, but I prefer post-processing over doing that during training:
  • post-processing (a separate program)
  • on the fly (very IO intensive)
  2. steps_to_save (a list of ints) would be better than max_to_save. Users should decide those step numbers themselves; they can be inferred from len(dataloader) and max_to_save. Once steps_to_save is determined, you can simply do (see the sketch after this list):

if step in steps_to_save:
    writer.add_something(xx, step)

and there is no need to change SummaryWriter's code.
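For example, a minimal sketch of that pattern (max_to_save and steps_to_save are just the hypothetical names from this thread):

import numpy as np
from tensorboardX import SummaryWriter

max_to_save = 10
num_steps = 100            # e.g. len(dataloader) * num_epochs in real training

# Spread the saved steps evenly over the run; the user decides the policy.
steps_to_save = set(np.linspace(0, num_steps - 1, max_to_save, dtype=int).tolist())

writer = SummaryWriter('capped_run')
for step in range(num_steps):
    x = np.random.rand(3, 100, 100)   # stand-in for a real image
    if step in steps_to_save:
        writer.add_image('Image', x, step)
writer.close()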

Both methods should work if your training completes gracefully.

But if you Ctrl-C your program, or it dies somehow, only method 1 is feasible (keep all the history and process it later).

If you are interested in implementing this, you should start with some IO stress tests.
For an example of reading the data back, see https://github.com/lanpa/tensorboard-dumper.
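For the post-processing route, a rough sketch of what such a separate program could look like, assuming TensorFlow is available for reading and writing the event records; the function name compact_events and the keep-the-last-N policy are just illustrative choices, not an agreed design:

import tensorflow as tf
from collections import defaultdict

def compact_events(src_path, dst_path, max_to_keep=10):
    # Rewrite an events file, keeping at most max_to_keep of the most
    # recent steps for each tag (plus events with no summary, such as
    # the file_version record).
    events = list(tf.compat.v1.train.summary_iterator(src_path))

    per_tag_steps = defaultdict(set)
    for event in events:
        for value in event.summary.value:
            per_tag_steps[value.tag].add(event.step)

    keep = {tag: set(sorted(steps)[-max_to_keep:])
            for tag, steps in per_tag_steps.items()}

    with tf.io.TFRecordWriter(dst_path) as writer:
        for event in events:
            if not event.summary.value:
                writer.write(event.SerializeToString())
            elif any(event.step in keep[v.tag] for v in event.summary.value):
                writer.write(event.SerializeToString())

Note that dst_path would still need to follow the usual events.out.tfevents.* naming convention for TensorBoard to pick it up.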

@lanpa lanpa closed this as completed Oct 22, 2019