
Neptune integration #648

Merged 14 commits on Jan 14, 2020

Conversation

@jakubczakon (Contributor) commented Dec 24, 2019

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
  • Did you read the contributor guideline?
  • [ ] Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Creates integration described in #642

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

I sure did :)

Note

I had trouble finding how to set up local docs, so I am not sure whether those work properly.
Also, I added a note about logging to the README which links to docs that I am quite sure are not created yet.
So I presume there will be trouble here.

@jakubczakon
Contributor Author

It seems that the failed tests are related to test_restore_module.py
Specifically, the accuracy of loaded models is < 0.5

 AssertionError: this model is expected to get > 0.5 in test set (it got 0.4375)

Not sure how to proceed.
Is there any way you could help me with this?

@Borda (Member) left a comment


Nice addition, thx 👍
Mainly: remove the try/except in tests.

try:
    import neptune
except ImportError:
    raise ImportError('Missing neptune package. Run pip install neptune-client')
Member


pls add backticks (`) around the command

"""Verify that basic functionality of neptune logger works."""
tutils.reset_seed()

try:
Member


remove this try/except

"""Verify that pickling trainer with neptune logger works."""
tutils.reset_seed()

try:
Member


remove this try/except

self.experiment.set_property(key, value)

@rank_zero_only
def append_tags(self, tag, *tags):
Member


Having tag/tags as two variables sounds quite strange; rather make it generic with just a single var.

Contributor Author


The idea here is to support all the following options:

logger.append_tags('resnet')
logger.append_tags('resnet', 'augmentations')
logger.append_tags(['resnet', 'augmentations'])

But I think for simplicity I will go with the last option (list).

Member


Well, you can make a simple fork using a single variable:

def append_tags(self, tags):
    if not isinstance(tags, (list, set, tuple)):
        tags = [tags]  # make it an iterable if it is not yet
    ...
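The single-variable approach above could look like this in full. A minimal standalone sketch: the `TagLogger` class and its `_tags` list are hypothetical stand-ins for the real logger and the actual Neptune experiment call.

```python
class TagLogger:
    """Hypothetical stand-in for the Neptune logger; the `_tags` list
    replaces the real experiment call, for illustration only."""

    def __init__(self):
        self._tags = []

    def append_tags(self, tags):
        # Normalize: wrap a single tag so str and iterable inputs behave alike.
        if not isinstance(tags, (list, set, tuple)):
            tags = [tags]
        self._tags.extend(str(t) for t in tags)


logger = TagLogger()
logger.append_tags('resnet')                     # single tag
logger.append_tags(['resnet', 'augmentations'])  # list of tags
print(logger._tags)                              # ['resnet', 'resnet', 'augmentations']
```

This keeps the public API to one parameter while still accepting either a single string or any common iterable of tags.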


:param str log_name: The name of log, i.e. bboxes, visualisations, sample_images.
:param str|PIL.Image|matplotlib.figure.Figure image: The value of the log (data-point).
Can be one of the following types: PIL image, matplotlib.figure.Figure, path to image file (str)
Member


add spaces at the beginning, so it is assigned to the parameters
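For context, a sketch of what the requested indentation would look like: continuation lines of a `:param:` field are indented so Sphinx attaches them to that parameter rather than treating them as a new paragraph.

```
:param str log_name: The name of log, i.e. bboxes, visualisations, sample_images.
:param str|PIL.Image|matplotlib.figure.Figure image: The value of the log (data-point).
    Can be one of the following types: PIL image, matplotlib.figure.Figure, path to image file (str)
```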

Member


the same goes for the other parameters in the docstring...

Contributor Author


I think I added spaces wherever needed.

Do you mean I should refactor to pass neptune-specific parameters as one dictionary, or something like that?

Requires either an API Key (online mode) or a local directory path (offline mode)

:param str|None api_key: Required in online mode. Neptune API token, found on https://neptune.ml.
Read how to get your API key here https://docs.neptune.ml/python-api/tutorials/get-started.html#copy-api-token.
Member


add spaces at the beginning, so it is assigned to the parameters

@Borda
Member

Borda commented Dec 26, 2019

Regarding the failing test: pls try increasing the number of training epochs, e.g. doubling it...

@jakubczakon
Contributor Author

Ok on it (failing test) @Borda

@jakubczakon
Contributor Author

3 Travis jobs worked but 2 still fail.
Could you please take a look @Borda?

@Borda
Member

Borda commented Dec 26, 2019

It looks strange; it seems that the checkpoint is not created. Can you run the test locally?

@jakubczakon
Contributor Author

Locally it passes, and in those 3 other Travis jobs it passes as well.

@kuynzereb
Contributor

I have one hypothesis. The default checkpoint callback is initialized with save_top_k=1, so it saves only one checkpoint. And this checkpoint may not be from the last epoch, but the test tries to find the checkpoint from the last epoch.

@kuynzereb
Contributor

I have had a similar issue in #653, and setting save_top_k=-1 really helped.
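The hypothesis can be illustrated with a toy simulation. This is a hedged sketch, not the actual ModelCheckpoint code: the `kept_checkpoints` helper and the loss numbers are made up, but the selection rule mirrors the save_top_k semantics under discussion (keep the k lowest-loss checkpoints; -1 keeps all).

```python
def kept_checkpoints(val_losses, save_top_k):
    """Return the epochs whose checkpoints survive: the save_top_k
    lowest-loss epochs; save_top_k=-1 keeps every epoch."""
    epochs = list(range(len(val_losses)))
    if save_top_k == -1:
        return epochs
    # Rank epochs by validation loss, keep the best k, report in epoch order.
    ranked = sorted(epochs, key=lambda e: val_losses[e])
    return sorted(ranked[:save_top_k])


# Oscillating validation loss: the best epoch is 2, not the last one (3).
losses = [0.9, 0.5, 0.4, 0.6]
print(kept_checkpoints(losses, save_top_k=1))   # [2] -- only the best epoch survives
print(kept_checkpoints(losses, save_top_k=-1))  # [0, 1, 2, 3] -- incl. the last epoch
```

With save_top_k=1 a test that looks for the last epoch's checkpoint file can fail whenever the loss oscillates, which is why save_top_k=-1 fixes it.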

@jakubczakon
Contributor Author

Thank you @kuynzereb and @Borda it worked!

@Borda
Member

Borda commented Dec 27, 2019

I have one hypothesis. The default checkpoint callback is initialized with save_top_k=1, so it saves only one checkpoint. And this checkpoint may not be from the last epoch, but the test tries to find the checkpoint from the last epoch.

True, in theory it may happen that the next epoch does not lower the training/validation loss, as the loss is usually oscillating, and the test case is meant to be minimal in terms of time and computational power...

@jakubczakon
Contributor Author

Hi there @Borda, happy new year!

I was wondering if there is anything else I could do to help merge this PR?

@Borda
Member

Borda commented Jan 7, 2020

@jakubczakon nope, it is good from my side. Nice work!
@williamFalcon could you have a look? It LGTM.

@williamFalcon
Contributor

@jakubczakon great addition!
