Suggestion: option to limit number of events saved per tag per run #253

Closed
mdfirman opened this issue Oct 18, 2018 · 1 comment

@mdfirman
Contributor

mdfirman commented Oct 18, 2018

edit: I made a mistake in my understanding of how TensorFlow worked; the issue has been updated.

TensorBoard only displays a maximum of 10 images per tag per run (see paragraph 2 in this thread). However, the events file grows linearly with the number of images written:

import numpy as np
from tensorboardX import SummaryWriter

for num_iters in [5, 10, 100]:

    # One run directory (and therefore one events file) per setting.
    writer = SummaryWriter(str(num_iters))

    for idx in range(num_iters):
        x = np.random.rand(3, 100, 100)   # random CHW image
        writer.add_image('Image', x, idx)

    writer.close()

This code writes three events files: one with 5 steps, one with 10, and one with 100.

It would be nice if the file with 100 steps were the same size as the one with 10 steps. However, it isn't; instead, the file size is proportional to the number of steps written:

Num events written    Filesize (bytes)    Desired filesize (bytes)
5                      140260              140260
10                     280860              280860
100                   2779338              280860

My feature request is for SummaryWriter to take an additional argument specifying how many unique steps to save per tag (e.g. inf by default, for backwards compatibility):

writer = SummaryWriter(str(num_iters), max_to_save=10)

I had a look through the code but I'm not sure how trivial this would be to implement. Any thoughts?
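To make the proposed behaviour concrete, here is a minimal sketch of a wrapper that could live in user code rather than in tensorboardX itself; the class name CappedWriter and the max_to_save argument are hypothetical, not part of any existing API:

from tensorboardX import SummaryWriter

class CappedWriter:
    """Hypothetical wrapper: forwards to SummaryWriter, but only writes
    the first max_to_save unique steps seen for each tag."""

    def __init__(self, logdir, max_to_save=float('inf')):
        self.writer = SummaryWriter(logdir)
        self.max_to_save = max_to_save
        self._steps = {}  # tag -> set of steps already written

    def add_image(self, tag, img, step):
        seen = self._steps.setdefault(tag, set())
        # Re-writes of an already-saved step are allowed; new steps are
        # dropped once the per-tag budget is exhausted.
        if step in seen or len(seen) < self.max_to_save:
            seen.add(step)
            self.writer.add_image(tag, img, step)

    def close(self):
        self.writer.close()

A smarter version could downsample (e.g. reservoir sampling, which is roughly what TensorBoard does on the display side) instead of simply keeping the first N steps.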

@lanpa
Owner

lanpa commented Oct 18, 2018

You can display more than 10 images with tensorflow/tensorboard#1138

As for limiting the file size, I have two thoughts:

  1. The events file is written sequentially, so removing previous steps frequently means a lot of IO overhead. It can be done by reading and rewriting the proto records, but I prefer post-processing over doing that during training:
  • post-processing (a separate program)
  • on the fly (very IO intensive)
  2. steps_to_save (a list of ints) would be better than max_to_save. Users should decide those step numbers themselves; they can be inferred from len(dataloader) and max_to_save. Once steps_to_save is determined, you can simply do (see the sketch after this list):

if step in steps_to_save:
    writer.add_something(xx, step)

and there is no need to change SummaryWriter's code.
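For example, a minimal sketch of that pattern (max_to_save and steps_to_save are just the hypothetical names from this thread):

import numpy as np
from tensorboardX import SummaryWriter

max_to_save = 10
num_steps = 100            # e.g. len(dataloader) * num_epochs in real training

# Spread the saved steps evenly over the run; the user decides the policy.
steps_to_save = set(np.linspace(0, num_steps - 1, max_to_save, dtype=int).tolist())

writer = SummaryWriter('capped_run')
for step in range(num_steps):
    x = np.random.rand(3, 100, 100)   # stand-in for a real image
    if step in steps_to_save:
        writer.add_image('Image', x, step)
writer.close()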

Both methods should work if your training completes gracefully.

But if you Ctrl-C your program, or it dies somehow, only method 1 is feasible (keep all the history and process it later).

If you are interested in implementing this, you should start with some IO stress tests.
For an example of reading the data back, see https://github.com/lanpa/tensorboard-dumper.
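For the post-processing route, a rough sketch of what such a separate program could look like, assuming TensorFlow is available for reading and writing the event records; the function name compact_events and the keep-the-last-N policy are just illustrative choices, not an agreed design:

import tensorflow as tf
from collections import defaultdict

def compact_events(src_path, dst_path, max_to_keep=10):
    # Rewrite an events file, keeping at most max_to_keep of the most
    # recent steps for each tag (plus events with no summary, such as
    # the file_version record).
    events = list(tf.compat.v1.train.summary_iterator(src_path))

    per_tag_steps = defaultdict(set)
    for event in events:
        for value in event.summary.value:
            per_tag_steps[value.tag].add(event.step)

    keep = {tag: set(sorted(steps)[-max_to_keep:])
            for tag, steps in per_tag_steps.items()}

    with tf.io.TFRecordWriter(dst_path) as writer:
        for event in events:
            if not event.summary.value:
                writer.write(event.SerializeToString())
            elif any(event.step in keep[v.tag] for v in event.summary.value):
                writer.write(event.SerializeToString())

Note that dst_path would still need to follow the usual events.out.tfevents.* naming convention for TensorBoard to pick it up.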

@lanpa lanpa closed this as completed Oct 22, 2019