
CPU and M1/M2 GPU platform support #71

Closed
Conversation

xiezhq-hermann
Collaborator

A minimal modification to extend FlexGen to the CPU and Apple M1/M2 GPU (MPS) platforms.
Not yet fully tested with the various offloading settings.
@Ying1123 @merrymercy

@HIRANO-Satoshi

I tried it.

0af9051 *   main Merge branch 'xiezhq-hermann/main'
        |\  
18482fa | * xiezhq-hermann/main update CPU and m1/m2
980ca74 | *   merge latest main
        | |\  
332849a | * | enable CPU and M1/M2 platform
fea8321 * | | origin/main update version
896e1e0 * | | Update README.md
50ae8ad * | | Delete README.md
9d888e5 * | Move apps into flexgen package (#70)

Something seems wrong.

ppa-hirano:FlexGen hirano-s$ python3 -m flexgen.flex_opt --model facebook/opt-1.3b
model size: 2.443 GB, cache size: 0.398 GB, hidden size (prefill): 0.008 GB
init weight...
Exception in thread Thread-1 (copy_worker_func):
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/hirano-s/dev/FlexGen/flexgen/pytorch_backend.py", line 917, in copy_worker_func
    torch.cuda.set_device(device_id)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/cuda/__init__.py", line 326, in set_device
    torch._C._cuda_setDevice(device)
AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'
[identical tracebacks in Thread-2, Thread-3, and Thread-4 omitted]
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/hirano-s/dev/FlexGen/flexgen/flex_opt.py", line 1334, in <module>
    run_flexgen(args)
  File "/Users/hirano-s/dev/FlexGen/flexgen/flex_opt.py", line 1218, in run_flexgen
    model = OptLM(opt_config, env, args.path, policy)
  File "/Users/hirano-s/dev/FlexGen/flexgen/flex_opt.py", line 617, in __init__
    self.load_weight_stream = torch.cuda.Stream()
  File "/opt/homebrew/lib/python3.10/site-packages/torch/cuda/streams.py", line 34, in __new__
    return super(Stream, cls).__new__(cls, priority=priority, **kwargs)
TypeError: object.__new__() takes exactly one argument (the type to instantiate)
Exception ignored in: <function OptLM.__del__ at 0x11b250040>
Traceback (most recent call last):
  File "/Users/hirano-s/dev/FlexGen/flexgen/flex_opt.py", line 1148, in __del__
    self.delete_all_weights()
  File "/Users/hirano-s/dev/FlexGen/flexgen/flex_opt.py", line 803, in delete_all_weights
    self.delete_weight(j, 0)
  File "/Users/hirano-s/dev/FlexGen/flexgen/flex_opt.py", line 669, in delete_weight
    for x in self.weight_home[j].pop():
AttributeError: 'OptLM' object has no attribute 'weight_home'
ppa-hirano:FlexGen hirano-s$ 
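A sketch of the failure mode in the traceback above: on a CPU-only PyTorch build, `torch.cuda.set_device` raises `AttributeError` because the compiled CUDA bindings are absent. Guarding on `torch.cuda.is_available()` (or skipping torch entirely when it is not installed) avoids the crash; `set_device_safely` is a hypothetical helper, not FlexGen API.

```python
def set_device_safely(device_id: int) -> None:
    # Only bind a CUDA device when the CUDA backend actually exists.
    try:
        import torch
    except ImportError:
        return  # no PyTorch at all: nothing to configure
    if torch.cuda.is_available():
        torch.cuda.set_device(device_id)
    # on CPU-only or MPS builds, fall through without touching torch.cuda

set_device_safely(0)  # no-op on machines without CUDA
```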

@xiezhq-hermann
Collaborator Author

@HIRANO-Satoshi Did you run the code on your Mac? If so, you should add --platform "mps:0" to the command. If you tested it on a machine with an NVIDIA GPU, can you try the latest commit (I merged it for you) and rebuild FlexGen? I am not sure which code you just ran.


@HIRANO-Satoshi

I don't have an NVIDIA GPU. With --platform cpu, it started working. Thanks very much!

Maybe apps/completion.py needs the --platform option.

ppa-hirano:FlexGen hirano-s$ python3 -m flexgen.apps.completion --model facebook/opt-1.3b
...
 File "/opt/homebrew/lib/python3.10/site-packages/torch/cuda/__init__.py", line 326, in set_device
    torch._C._cuda_setDevice(device)
AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'

ppa-hirano:FlexGen hirano-s$ python3 -m flexgen.apps.completion --model facebook/opt-1.3b --platform cpu
usage: completion.py [-h] [--model MODEL] [--path PATH]
[--offload-dir OFFLOAD_DIR]
[--percent PERCENT [PERCENT ...]]
[--pin-weight [PIN_WEIGHT]] [--compress-weight]
[--compress-cache]
completion.py: error: unrecognized arguments: --platform cpu
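A hypothetical sketch of giving apps/completion.py the same --platform option that flex_opt accepts in this PR. Only --model and --platform are shown; the real completion.py has more options, and the default value here is an assumption.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirror the flex_opt interface so the two entry points stay consistent.
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, default="facebook/opt-1.3b")
    parser.add_argument(
        "--platform", type=str, default="cuda:0",
        help='compute device, e.g. "cuda:0", "mps:0", or "cpu"')
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.platform)
```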

@HIRANO-Satoshi
Copy link

Choosing a proper default without an explicit option would be better.

I'm curious how fast the Apple Neural Engine is.
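A minimal sketch of the "proper default" suggestion above: pick the best available backend when --platform is not given. The function name is hypothetical; the availability checks are standard PyTorch API.

```python
def default_platform() -> str:
    # Prefer CUDA, then Apple-silicon MPS, then plain CPU.
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed: only the CPU path makes sense
    if torch.cuda.is_available():
        return "cuda:0"
    mps = getattr(torch.backends, "mps", None)  # absent on older torch versions
    if mps is not None and mps.is_available():
        return "mps:0"
    return "cpu"
```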
