
How can the Hugging Face version be run on an Intel CPU and an Intel GPU? #92

Closed
pengyb2001 opened this issue Jan 16, 2024 · 2 comments

Comments

@pengyb2001

I want to run this model on a CPU. I followed the usage shown in https://huggingface.co/IEITYuan/Yuan2-2B-hf/blob/main/README.md and modified it to:

import torch, transformers
import sys, os
sys.path.append(
    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)))
from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaTokenizer

print("Creating tokenizer...")
tokenizer = LlamaTokenizer.from_pretrained('/mnt/disk1/models/Yuan2-2B-hf', add_eos_token=False, add_bos_token=False, eos_token='<eod>')
tokenizer.add_tokens(['<sep>', '<pad>', '<mask>', '<predict>', '<FIM_SUFFIX>', '<FIM_PREFIX>', '<FIM_MIDDLE>','<commit_before>','<commit_msg>','<commit_after>','<jupyter_start>','<jupyter_text>','<jupyter_code>','<jupyter_output>','<empty_output>'], special_tokens=True)

print("Creating model...")
# Note: the GPU-specific arguments have been removed here
model = AutoModelForCausalLM.from_pretrained('/mnt/disk1/models/Yuan2-2B-hf', use_flash_attention=False)
print(model.config)

inputs = tokenizer("请问目前最先进的机器学习算法有哪些?", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, do_sample=False, max_length=100)
print(tokenizer.decode(outputs[0])) 

But I still get the error: ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run pip install flash_attn. Can this model be made to run on an Intel CPU and an Intel GPU?

@pengyb2001 pengyb2001 changed the title from "How can the Hugging Face version be run on a CPU?" to "How can the Hugging Face version be run on an Intel CPU and an Intel GPU?" Jan 16, 2024
@pengyb2001
Author

I found a method in another issue that gets the model running on an Intel CPU, but how to target an Intel GPU is still unclear.

Download the model files from https://huggingface.co/IEITYuan/Yuan2-2B-hf/tree/main to a local path yuan-2B-path, then disable flash attention by hand (a consolidated sketch follows this list):
1. In config.json, change "use_flash_attention" to false.
2. Comment out lines 35 and 36 of yuan_hf_model.py.
3. Change line 271 of yuan_hf_model.py to inference_hidden_states_memory = torch.empty(bsz, 2, hidden_states.shape[2], dtype=hidden_states.dtype)
4. In the calling code, load the model with model = AutoModelForCausalLM.from_pretrained(yuan-2B-path, device_map="cpu", trust_remote_code=True).eval()  # CPU
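Putting those steps together, here is a minimal CPU inference sketch. It assumes the config.json and yuan_hf_model.py edits above have already been applied, and yuan_2B_path is a placeholder for wherever the model was downloaded:

import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer

yuan_2B_path = '/path/to/Yuan2-2B-hf'  # placeholder: local model directory

# Same tokenizer setup as in the original report
tokenizer = LlamaTokenizer.from_pretrained(yuan_2B_path, add_eos_token=False, add_bos_token=False, eos_token='<eod>')
tokenizer.add_tokens(['<sep>', '<pad>', '<mask>', '<predict>', '<FIM_SUFFIX>', '<FIM_PREFIX>', '<FIM_MIDDLE>','<commit_before>','<commit_msg>','<commit_after>','<jupyter_start>','<jupyter_text>','<jupyter_code>','<jupyter_output>','<empty_output>'], special_tokens=True)

# device_map="cpu" keeps all weights on the CPU; trust_remote_code=True loads yuan_hf_model.py
model = AutoModelForCausalLM.from_pretrained(yuan_2B_path, device_map="cpu", trust_remote_code=True).eval()

inputs = tokenizer("What are the most advanced machine learning algorithms today?", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, do_sample=False, max_length=100)
print(tokenizer.decode(outputs[0]))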

@pengyb2001
Copy link
Author

Solved. The model can be run on an Intel GPU using ipex; see
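For reference, a minimal sketch of what ipex-based Intel GPU inference typically looks like. This is not from the original comment: it assumes intel-extension-for-pytorch is installed with XPU (Intel GPU) support, that the flash-attention edits above have been applied, and yuan_2B_path is again a placeholder:

import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, LlamaTokenizer

yuan_2B_path = '/path/to/Yuan2-2B-hf'  # placeholder: local model directory

tokenizer = LlamaTokenizer.from_pretrained(yuan_2B_path, add_eos_token=False, add_bos_token=False, eos_token='<eod>')
model = AutoModelForCausalLM.from_pretrained(yuan_2B_path, trust_remote_code=True).eval()

model = model.to('xpu')                             # "xpu" is the ipex device name for Intel GPUs
model = ipex.optimize(model, dtype=torch.bfloat16)  # apply ipex inference optimizations

inputs = tokenizer("What are the most advanced machine learning algorithms today?", return_tensors="pt")["input_ids"].to('xpu')
outputs = model.generate(inputs, do_sample=False, max_length=100)
print(tokenizer.decode(outputs[0]))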
