
minor fixes 20231211 #943

Closed · wants to merge 3 commits
Conversation

winglian (Collaborator): No description provided.

@winglian changed the title from "misc fixes" to "minor fixes 20231211" on Dec 12, 2023.
@NanoCode012 (Collaborator) left a comment: Could we also add nano to the base Docker image? :)

@@ -191,6 +191,7 @@ def load_model(

# TODO refactor as a kwarg
NanoCode012 (Collaborator): There's this TODO here.

winglian (Author): I'm not even sure what this TODO is for at the moment.

winglian (Author): Maybe it's for the .from_pretrained call, where these are kwargs for that? Should we add these to model_kwargs and remove them from all the from_pretrained calls?

NanoCode012 (Collaborator): This is just above:

    load_in_8bit = cfg.load_in_8bit
    load_in_4bit = cfg.load_in_4bit
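(For context, a minimal sketch of the model_kwargs consolidation being discussed, assuming the transformers from_pretrained API; cfg is taken from the surrounding code, while base_model is a hypothetical identifier, and the real axolotl call sites may differ:)

    from transformers import AutoModelForCausalLM

    # Hypothetical sketch: gather quantization flags once into model_kwargs,
    # then pass them to each from_pretrained call via **model_kwargs instead
    # of repeating load_in_8bit/load_in_4bit at every call site.
    model_kwargs = {}
    if cfg.load_in_8bit:
        model_kwargs["load_in_8bit"] = True
    if cfg.load_in_4bit:
        model_kwargs["load_in_4bit"] = True

    model = AutoModelForCausalLM.from_pretrained(
        base_model,  # hypothetical model name/path
        **model_kwargs,
    )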

@@ -535,7 +536,7 @@ def load_model(

     model, lora_config = load_adapter(model, cfg, cfg.adapter)

-    if cfg.ddp and not load_in_8bit:
+    if cfg.ddp and not load_in_8bit and not load_in_4bit:
         model.to(f"cuda:{cfg.local_rank}")
NanoCode012 (Collaborator): What's the side effect of this change? Does it mean the models will all live on gpu0 for 4-bit now?

winglian (Author): There's an edge case where I had to have this set when doing DPO QLoRA for Mixtral. Maybe it was a deepspeed thing and I should change this?
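(One commonly suggested alternative to the .to() call being guarded here is to pin a quantized model to the local rank via device_map at load time, so each DDP rank loads its own copy rather than everything defaulting to gpu0. A sketch against the transformers/bitsandbytes API, not the actual axolotl change; base_model is a hypothetical identifier:)

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # Hypothetical sketch: bitsandbytes-quantized models are placed via
    # device_map at load time; moving them afterwards with .to() is the
    # step the load_in_8bit/load_in_4bit guard above is working around.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        base_model,  # hypothetical model name/path
        quantization_config=bnb_config,
        device_map={"": cfg.local_rank},  # one full copy per DDP rank
    )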

@NanoCode012 (Collaborator): I found that you separated your apt installs into two separate Dockerfiles. Maybe these should also be moved to base:

https://github.com/OpenAccess-AI-Collective/axolotl/blob/7fabc4d95e9b34e0cfaace2a497cd4fedc43db3b/docker/Dockerfile#L13C1-L13C32

winglian added a commit that referenced this pull request on Jan 10, 2024.
@winglian mentioned this pull request on Jan 10, 2024.
@winglian closed this on Jan 10, 2024.
winglian added a commit that referenced this pull request on Jan 11, 2024.
2 participants