Update __init__.py #83

Open: wants to merge 4 commits into main

Koziev commented Dec 27, 2023

Make the MambaConfig class importable from outside the package, so that MambaLMHeadModel can be customized by passing a config argument to its constructor.
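A minimal sketch of what this change enables, assuming a simplified stand-in for MambaConfig. The field names (d_model, n_layer, vocab_size, ssm_cfg, pad_vocab_size_multiple) follow mamba_ssm's config dataclass as far as I know, while TinyLM is a hypothetical placeholder for MambaLMHeadModel so the example runs without CUDA kernels. In the package itself the export would presumably be a re-export line in mamba_ssm/__init__.py, such as `from mamba_ssm.models.config_mamba import MambaConfig`.

```python
from dataclasses import dataclass, field, replace


# Hypothetical, simplified stand-in for mamba_ssm's MambaConfig (not the real class).
@dataclass
class MambaConfig:
    d_model: int = 2560
    n_layer: int = 64
    vocab_size: int = 50277
    ssm_cfg: dict = field(default_factory=dict)
    pad_vocab_size_multiple: int = 8


# Hypothetical model that, like MambaLMHeadModel, receives its config in __init__.
class TinyLM:
    def __init__(self, config: MambaConfig):
        self.config = config
        vocab = config.vocab_size
        remainder = vocab % config.pad_vocab_size_multiple
        # Round the vocabulary size up to a multiple of pad_vocab_size_multiple.
        if remainder != 0:
            vocab += config.pad_vocab_size_multiple - remainder
        self.padded_vocab_size = vocab


# Build a small model for experiments instead of relying on the defaults.
small_cfg = MambaConfig(d_model=256, n_layer=4, vocab_size=32000)
model = TinyLM(small_cfg)

# dataclasses.replace derives a variant config without mutating the original.
deeper_cfg = replace(small_cfg, n_layer=8)
```

Keeping the config a dataclass makes every customizable value an explicit keyword argument, and `replace` gives cheap variants for hyperparameter sweeps.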

Making MambaConfig available from outside
Making MambaLMHeadModel more transformers.Trainer-friendly

Koziev commented Dec 28, 2023

  1. Minor changes to make MambaLMHeadModel more transformers.Trainer-friendly: return the loss as the first element of the forward output.
  2. Add a filter on config keys in from_pretrained, so that config.json can contain fields such as _name_or_path and architectures, as other models on the Hugging Face Hub do.
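A pure-Python sketch of the first change (no torch dependency, so it runs anywhere): the forward output puts the loss in position 0. As far as I know, transformers.Trainer.compute_loss takes `outputs["loss"]` when the model returns a dict (ModelOutput subclasses dict) and `outputs[0]` otherwise, so a plain tuple-like output only works if the loss comes first. CausalLMOutput, lm_forward, and trainer_style_loss are hypothetical names for illustration; the real model computes logits with torch.

```python
import math
from collections import namedtuple

# Hypothetical output type with the loss in position 0.
CausalLMOutput = namedtuple("CausalLMOutput", ["loss", "logits"])


def cross_entropy(scores, target):
    """Negative log-softmax probability of `target`, computed stably."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target]


def lm_forward(logits, labels=None):
    """Package per-position logits as (loss, logits).

    logits[t] holds the vocabulary scores at position t; with the usual
    next-token shift, logits[t] is scored against labels[t + 1].
    The loss is the mean over positions, or None when labels are not given.
    Assumes len(labels) == len(logits) >= 2 when labels are provided.
    """
    loss = None
    if labels is not None:
        pairs = list(zip(logits[:-1], labels[1:]))
        loss = sum(cross_entropy(s, t) for s, t in pairs) / len(pairs)
    return CausalLMOutput(loss=loss, logits=logits)


def trainer_style_loss(outputs):
    # Same selection rule as transformers.Trainer.compute_loss:
    # a dict is indexed by "loss", anything else by position 0.
    return outputs["loss"] if isinstance(outputs, dict) else outputs[0]
```

With uniform logits over a two-token vocabulary, every position contributes ln 2, so the mean loss is ln 2.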

Adding method `to_json_string` required by transformers.Trainer in some finetuning scenarios
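A sketch of a `to_json_string` method on a hypothetical, simplified MambaConfig dataclass. Some Trainer integrations (for example, logging callbacks) call `model.config.to_json_string()`, so a config object without it can fail there; the exact call sites depend on the transformers version.

```python
import json
from dataclasses import asdict, dataclass, field


# Hypothetical, simplified stand-in for MambaConfig (not the real class).
@dataclass
class MambaConfig:
    d_model: int = 2560
    n_layer: int = 64
    vocab_size: int = 50277
    ssm_cfg: dict = field(default_factory=dict)

    def to_json_string(self, use_diff: bool = True) -> str:
        # use_diff is accepted only so callers written for transformers'
        # PretrainedConfig.to_json_string(use_diff=...) do not fail; it is ignored.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


print(MambaConfig(d_model=768, n_layer=24).to_json_string())
```

`sort_keys=True` keeps the output stable across runs, which helps when the string is logged or diffed.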
RonanKMcGovern commented

Would be great if this could be merged and included in a pre-built wheel.

Implementing MambaConfig.to_dict() for compatibility with transformers.Trainer 4.36.2
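A sketch pairing `to_dict()` with the config-key filtering described in the second list item, using a hypothetical, simplified MambaConfig. Keys that are not dataclass fields (such as `_name_or_path` and `architectures`) are dropped when loading, and `to_dict()` returns only the real fields, so a dict round-trips cleanly. `from_dict` is a hypothetical helper name; in the PR the filtering lives in from_pretrained.

```python
from dataclasses import asdict, dataclass, field, fields


# Hypothetical, simplified stand-in for MambaConfig (not the real class).
@dataclass
class MambaConfig:
    d_model: int = 2560
    n_layer: int = 64
    vocab_size: int = 50277
    ssm_cfg: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        # asdict builds a fresh dict and recursively copies nested containers such as ssm_cfg.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "MambaConfig":
        # Keep only keys that are dataclass fields; extra Hub metadata is ignored.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


# Hypothetical config.json contents with Hub-style extra fields.
config_json = {
    "d_model": 768,
    "n_layer": 24,
    "vocab_size": 50277,
    "_name_or_path": "my-org/my-mamba",
    "architectures": ["MambaLMHeadModel"],
}
cfg = MambaConfig.from_dict(config_json)
```

Filtering silently ignores unknown keys rather than raising; a stricter variant could log the dropped names.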
albertfgu force-pushed the main branch 2 times, most recently from 6d45666 to 41d30ce on June 3, 2024 12:56