
How to convert an AutoModelForCausalLM object to a dspy model object? #1018

Closed
pawanGithub10 opened this issue May 13, 2024 · 7 comments

@pawanGithub10

import dspy

llm = dspy.HFModel(model='model')

This method takes a string as input for the model. If I have a quantized model object of the AutoModelForCausalLM class, how can I convert that model object to a dspy object?

Direct assignment gives an error at inference time:

llm = model  # previously created as an AutoModelForCausalLM object

llm("Testing testing, is anyone out there?")

Error after code line 4:

File /opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:623, in LlamaModel.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict)
    621             raise ValueError("You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time")
    622         elif input_ids is not None:
--> 623             batch_size, seq_length = input_ids.shape
    624         elif inputs_embeds is not None:
    625             batch_size, seq_length, _ = inputs_embeds.shape

AttributeError: 'str' object has no attribute 'shape'
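
For context, the raw transformers model expects tokenized tensors rather than a plain string, which is why assigning the model object directly to llm fails. A minimal sketch of the underlying call (the model path here is a placeholder):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "model"  # placeholder local path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# the model's forward/generate expects input_ids tensors, not a str
inputs = tokenizer("Testing testing, is anyone out there?", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))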

@Anindyadeep
Contributor

I see, but internally the HF module uses the AutoModel module to instantiate the weights. Can you explain why you need to pass an already loaded model to dspy instead of giving the weight path?

@pawanGithub10
Author

> I see, but internally the HF module uses the AutoModel module to instantiate the weights. Can you explain why you need to pass an already loaded model to dspy instead of giving the weight path?

Thanks for the reply. The reason is that I have a 4-bit quantized model and I want to use it directly. I tried saving it to Hugging Face first so that I could load it from a weight path, but then I got an error saying that Hugging Face does not support saving a 4-bit quantized model.
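
For reference, this is roughly the kind of 4-bit loading involved; the path and quantization settings below are placeholders rather than the exact notebook code:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# placeholder 4-bit quantization settings
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
)

# placeholder local path to the base weights
model = AutoModelForCausalLM.from_pretrained(
    "/tmp/models/llama2/7b",
    quantization_config=bnb_config,
)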

@Anindyadeep
Contributor

Can you please share the full code for the loading process and your approach? I would appreciate it.

@pawanGithub10
Author

pawanGithub10 commented May 14, 2024

dspy_4bitquantized_llama2_error.zip
I have attached the Jupyter notebook. In this notebook, when I convert the quantized model it searches for config.json, because I am passing the AutoModel variable. Please suggest a workaround or an API call that lets me use the quantized model.

@Anindyadeep
Contributor

Anindyadeep commented May 20, 2024

Hey @pawanGithub10, I have started raising a PR based on the issue you faced. Here is what some of the model-loading cases would look like:

from dsp.modules.hf_new import HFLocalModel
from transformers import AutoTokenizer, BitsAndBytesConfig 
from transformers import AutoModelForCausalLM

model_path = "../models/llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(
    model_path,
)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"


def case1():
    # case 1: pass a model path and let HFLocalModel apply 4-bit quantization
    model = HFLocalModel(
        model=model_path,
        tokenizer=tokenizer,
        load_in_4bit=True,
        bnb_config=BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype="float16",
            bnb_4bit_use_double_quant=False
        )
    )

    response = model("hello", do_sample=True)
    print(response)


def case2():
    # case 2: load and quantize the model with transformers first,
    # then pass the already loaded model object to HFLocalModel
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype="float16",
            bnb_4bit_use_double_quant=False
        )
    )

    model_ = HFLocalModel(
        model=model,
        tokenizer=tokenizer,
    )
    response = model_("hello", do_sample=True)
    print(response)

if __name__ == "__main__":
    case1()
    print("---------------------------")
    case2()

Additionally, PEFT models are now supported as well, along with multi-GPU support. The problem is that I can only test up to the PEFT part; I cannot test the multi-GPU support, since I have no access to a multi-GPU setup.
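
For context, loading a PEFT adapter with the peft library typically looks like the sketch below; the adapter path is a placeholder, and passing the resulting model object to HFLocalModel as in case2 is an assumption about the new interface:

from peft import PeftModel
from transformers import AutoModelForCausalLM

# load the base weights, then attach the adapter ("adapter_path" is a placeholder)
base = AutoModelForCausalLM.from_pretrained(model_path)
peft_model = PeftModel.from_pretrained(base, "adapter_path")

# assumption: an already loaded PEFT model could be passed like the model in case2
model_ = HFLocalModel(model=peft_model, tokenizer=tokenizer)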

@pawanGithub10
Author

pawanGithub10 commented May 22, 2024

@Anindyadeep thanks a lot for the detailed help, but I think I had missed the details in the documentation.

Init signature:
dspy.HFModel(
    model: str,
    checkpoint: Optional[str] = None,
    is_client: bool = False,
    hf_device_map: Literal['auto', 'balanced', 'balanced_low_0', 'sequential'] = 'auto',
    token: Optional[str] = None,
    model_kwargs: Optional[dict] = {},
)
Docstring: Abstract class for language models.
Init docstring:

Args:
    model (str): HF model identifier to load and use
    checkpoint (str, optional): load specific checkpoints of the model. Defaults to None.
    is_client (bool, optional): whether to access models via client. Defaults to False.
    hf_device_map (str, optional): HF config strategy to load the model.
        Recommended to use "auto", which will help loading large models using accelerate. Defaults to "auto".
    model_kwargs (dict, optional): additional kwargs to pass to the model constructor. Defaults to empty dict.
File: /opt/conda/lib/python3.11/site-packages/dsp/modules/hf.py
Type: ABCMeta
Subclasses: HFClientTGI, HFClientVLLM, Together, Anyscale, ChatModuleClient, HFClientSGLang

So after reading this, I made the following changes and the code works:

import torch
import dspy

# bnb_config is the BitsAndBytesConfig created earlier for 4-bit quantization
model_specific_param = {"torch_dtype": torch.float16, "quantization_config": bnb_config}
model_name = '/tmp/models/llama2/7b'
llm = dspy.HFModel(model=model_name, model_kwargs=model_specific_param)
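
With that in place, the quantized model can be set as the default LM for dspy modules, e.g. (a small usage sketch):

# use the quantized model as dspy's default LM
dspy.settings.configure(lm=llm)

# quick sanity check
print(llm("Testing testing, is anyone out there?"))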

@pawanGithub10
Author

As per the previous comment, I think the issue can be closed.
