Feat(wandb): Refactor to be more flexible #767

NanoCode012 · 2023-10-22T14:40:33Z

Closes #236

Breaking change:

Swap wandb_run_id -> wandb_name, following the W&B engineer's recommendation in "Improve documentation and implementation of W&B options" #236 (comment). I have added a warning and a backward-compatible fix for now.
Removes the WANDB_DISABLED env if wandb_project is passed. This should be the expected behavior.
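
A minimal sketch of what the backward-compatible fix could look like, assuming the config is a plain dict. The helper name `apply_wandb_name_fallback` and the warning text are hypothetical, not taken from this PR's diff:

```python
import logging

LOG = logging.getLogger(__name__)


def apply_wandb_name_fallback(cfg: dict) -> dict:
    """Hypothetical helper: reuse a legacy wandb_run_id as wandb_name."""
    updated = dict(cfg)  # copy so the caller's config is left untouched
    if updated.get("wandb_run_id") and not updated.get("wandb_name"):
        # Warn instead of failing so existing configs keep working.
        LOG.warning(
            "wandb_run_id sets the run ID; use wandb_name to set the run name."
        )
        updated["wandb_name"] = updated["wandb_run_id"]
    return updated
```

An explicitly set wandb_name is never overwritten, so only configs that still rely on the old key are affected.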

New feature:

Now allows any wandb_ config to be passed to env for more flexibility
More tests!
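
The env handling described above (forwarding any wandb_ option and clearing WANDB_DISABLED when a project is set) might be sketched roughly as follows. This assumes a plain dict config; the helper name `export_wandb_env` is hypothetical and the merged code may differ:

```python
import os


def export_wandb_env(cfg: dict) -> None:
    """Hypothetical helper: mirror wandb_* config keys into WANDB_* env vars."""
    for key, value in cfg.items():
        # Only non-empty string values under the wandb_ prefix are exported.
        if key.startswith("wandb_") and isinstance(value, str) and value:
            os.environ[key.upper()] = value

    if cfg.get("wandb_project"):
        # A project was passed, so a leftover WANDB_DISABLED should not win.
        os.environ.pop("WANDB_DISABLED", None)
```

Because the loop keys off the prefix, a new option such as wandb_entity needs no extra code to reach the environment.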

winglian

Does this PR make a major difference since it still points to the same keyword arg here?

NanoCode012 · 2023-10-22T16:25:16Z

Yes!

It should allow other wandb env variables without us having to add them manually.
We set run_id ourselves instead of letting wandb generate it automatically. The engineer said this wasn't good to do and recommended using name.
Fixed an unexpected conflict with WANDB_DISABLED.

Edit: I noticed one edge case from reading their docs on run_id. During resume, it might not continue the same run and could show up as a new, separate one instead.

IgnacioFDM · 2023-10-24T00:48:44Z

Is resume fixed with this? Currently it starts overriding old data (because step begins from 0 after resuming I think).

NanoCode012 · 2023-10-24T00:53:48Z

@IgnacioFDM, yes! That should fix this. Previously, we set run_id, which is their internal tracking method. Now, we use name. However, I'm not sure if this will create a new separate run or actually resume. If anyone has the bandwidth, feel free to do a quick test.

IgnacioFDM · 2023-10-24T02:55:44Z

Just tested it and each time you resume, it creates a new run (with the same name) on wandb.

NanoCode012 · 2023-10-24T03:24:05Z

@tcapelle, thanks for your code snippet a while back and sorry for this late PR. Following your feedback to use wandb_name instead of wandb_run_id, may I ask if you have any idea how to perform a resume in such a case? According to the comment above, it creates a new run instead.

tcapelle · 2023-10-24T11:53:51Z

Ohh great! Yeah, for resuming, you need the id, so you would need to keep both attributes. Sorry for missing that.

IgnacioFDM · 2023-10-24T12:37:07Z

I tried setting the WANDB_RUN_ID and it works like it did before this PR: it continues the run, but still leads to wrong data (either new steps don't show up, or they show up but it starts deleting earlier steps).

I think the issue arises from the fact that step (not global_step) resets to 0 when you resume.

NanoCode012 · 2023-11-29T14:39:47Z

Rebased! Updated it to support both wandb_name and wandb_run_id instead of forcing everything to name, following the idea from #824.

* Feat: Update to handle wandb env better
* chore: rename wandb_run_id to wandb_name
* feat: add new recommendation and update config
* fix: indent and pop disabled env if project passed
* feat: test env set for wandb and recommendation
* feat: update to use wandb_name and allow id
* chore: add info to readme
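
With both options supported, the display name and the run ID can be set independently, and the ID is what resuming relies on according to the discussion above. A rough sketch, assuming both values end up as environment variables. The helper name `export_run_identity` is hypothetical, and setting WANDB_RESUME is an assumption that is not part of this PR:

```python
import os
from typing import Optional


def export_run_identity(name: Optional[str], run_id: Optional[str]) -> None:
    """Hypothetical helper: export the run name and run ID separately."""
    if name:
        # Human-readable label; several runs may share the same name.
        os.environ["WANDB_NAME"] = name
    if run_id:
        # The ID identifies the run that a resume should continue.
        os.environ["WANDB_RUN_ID"] = run_id
        # Assumption: request a resume when a run with this ID already exists.
        os.environ.setdefault("WANDB_RESUME", "allow")
```

This only covers which run the data is attached to. It does not address the step counter resetting to 0 on resume that was reported in this thread.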

winglian reviewed Oct 22, 2023

NanoCode012 force-pushed the feat/wandb_refactor branch from 7193ae9 to b648845 Compare October 22, 2023 16:30

winglian mentioned this pull request Nov 5, 2023

Add the option wandb_run_name #824

Closed

NanoCode012 added 6 commits November 29, 2023 23:25

Feat: Update to handle wandb env better

2f3ddcf

chore: rename wandb_run_id to wandb_name

1a71047

feat: add new recommendation and update config

3ee7368

fix: indent and pop disabled env if project passed

bcd5882

feat: test env set for wandb and recommendation

f194e0b

feat: update to use wandb_name and allow id

bfbaa88

NanoCode012 force-pushed the feat/wandb_refactor branch from b648845 to bfbaa88 Compare November 29, 2023 14:37

NanoCode012 requested a review from winglian November 29, 2023 14:39

winglian approved these changes Dec 4, 2023

chore: add info to readme

646a9f0

NanoCode012 merged commit a1da39c into axolotl-ai-cloud:main Dec 4, 2023
4 checks passed

NanoCode012 deleted the feat/wandb_refactor branch December 4, 2023 13:17
