Skip to content
This repository has been archived by the owner on Mar 19, 2024. It is now read-only.

[MLH Phase 2] Refactor Logging System #284

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

akainth015
Copy link
Contributor

@akainth015 akainth015 commented Apr 8, 2021

The new logging system will be based around an event storage for each SelfSupervisionTask and events will be logged to their corresponding output device by VisslEventWriters like the JsonWriter and TensorboardWriter

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 8, 2021
if key == "ep"
else f"{stdout_data}{key}: {value}; "
evt_stg.put_scalars(
rolling_btime=rolling_btime, rolling_eta=rolling_eta_secs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I prefer keeping all times in human readable string. We may want to add a VisslEventStorage#put_string or change #put_scalars to #put_values and make it type agnostic.

@@ -126,6 +126,7 @@ config:
# ----------------------------------------------------------------------------------- #
TENSORBOARD_SETUP:
# whether to use tensorboard for the visualization
VISUALIZATION_SAMPLE_PERIOD: -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: comment what this is.
nit: move comment below USE_TENSORBOARD

@@ -52,7 +52,7 @@ class SSLClassyHookFunctions(Enum):
on_end = auto()


def default_hook_generator(cfg: AttrDict) -> List[ClassyHook]:
def default_hook_generator(cfg: AttrDict, event_storage) -> List[ClassyHook]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Type hint for event_storage.

@@ -116,6 +116,13 @@ def standard_train_step(task):
sample = next(task.data_iterator)

sample = construct_sample_for_model(sample, task)
vis_period = task.config.HOOKS.TENSORBOARD_SETUP.VISUALIZATION_SAMPLE_PERIOD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we comment this code block?

storage = task.event_storage
storage.put_images()
name = f"Model input sample: iteration: {task.iteration_num}"
for idx, vis_img in enumerate(sample["input"]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: If batch size is large, this will upload a lot of images -- do we want to offer the option to only sample a small portion of the inputs?

several backends (json, tensorboard, etc)
"""

def __init__(self, start_iter=0):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to add logic that ensures that we set the start_iter correctly upon loading a checkpoint, I don't see this logic anywherere?

@@ -86,6 +88,7 @@ def __init__(
)
logging.info("Setting up SSL Tensorboard Hook...")
self.tb_writer = tb_writer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need tb_writer anymore in this class? Can we refactor ActivationStatisticsMonitor to use event_storage?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confirming: none of these methods write to tensorboard anymore and that it's only handled in TensorBoardWriter#Write.

)
logging.info(stdout_data.strip())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I believe we lose the logging to stdout here, which imo we should definitely keep -- super helpful for debugging.

We could create another EventStorageWriter called StdOutWriter and add to event_storage_writers.

]

@property
def event_storage(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just use self.event_storage internally and externally.

Add a scalar `value` to the `HistoryBuffer` associated with `name`.
"""
history = self._history[name]
value = float(value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Do we need this?

@pranavsinghps1
Copy link
Contributor

@iseessel I see some of the comments are unaddressed, do we need to work on these? Lmk if there is anything I could do!

facebook-github-bot pushed a commit that referenced this pull request Dec 28, 2022
Summary: Pull Request resolved: fairinternal/ssl_scaling#284

Reviewed By: odelalleau

Differential Revision: D42220017

Pulled By: QuentinDuval

fbshipit-source-id: 742419aa859fdbe4bc80f1f9e9f4771fee0f41a2
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants