
Running segment_anything_fast example locally #3186

Open
yousofaly opened this issue Jun 11, 2024 · 1 comment

Labels
triaged Issue has been reviewed and triaged

yousofaly commented Jun 11, 2024

🐛 Describe the bug

I have followed the installation instructions in the main README file, followed by the instructions to run the segment_anything_fast example. I am encountering an odd error.

Error logs

java.lang.InterruptedException: null
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1765) ~[?:?]
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:435) ~[?:?]
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:229) ~[model-server.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1570) [?:?]
2024-06-11T13:46:54,390 [WARN ] W-9006-sam-fast_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: sam-fast, error: Worker died.
2024-06-11T13:46:54,390 [DEBUG] W-9006-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - W-9006-sam-fast_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2024-06-11T13:46:54,390 [WARN ] W-9006-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
2024-06-11T13:46:54,390 [INFO ] W-9006-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9006 in 55 seconds.
2024-06-11T13:46:54,390 [INFO ] W-9006-sam-fast_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9006-sam-fast_1.0-stdout
2024-06-11T13:46:54,390 [INFO ] W-9006-sam-fast_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9006-sam-fast_1.0-stderr
2024-06-11T13:46:54,396 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - Listening on addr:port: 127.0.0.1:9009
2024-06-11T13:46:54,401 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - Successfully loaded /opt/anaconda3/envs/trchsrv/lib/python3.10/site-packages/ts/configs/metrics.yaml.
2024-06-11T13:46:54,401 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - [PID]86770
2024-06-11T13:46:54,401 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - Torch worker started.
2024-06-11T13:46:54,401 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - Python runtime: 3.10.14
2024-06-11T13:46:54,401 [DEBUG] W-9009-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - W-9009-sam-fast_1.0 State change WORKER_STOPPED -> WORKER_STARTED
2024-06-11T13:46:54,402 [INFO ] W-9009-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /127.0.0.1:9009
2024-06-11T13:46:54,402 [DEBUG] W-9009-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1718138814402
2024-06-11T13:46:54,402 [INFO ] W-9009-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1718138814402
2024-06-11T13:46:54,402 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - Connection accepted: ('127.0.0.1', 9009).
2024-06-11T13:46:54,403 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - model_name: sam-fast, batchSize: 1
2024-06-11T13:46:54,463 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - Backend worker process died.
2024-06-11T13:46:54,463 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "/opt/anaconda3/envs/trchsrv/lib/python3.10/site-packages/ts/model_loader.py", line 108, in load
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - module, function_name = self._load_handler_file(handler)
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "/opt/anaconda3/envs/trchsrv/lib/python3.10/site-packages/ts/model_loader.py", line 153, in _load_handler_file
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - module = importlib.import_module(module_name)
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "/opt/anaconda3/envs/trchsrv/lib/python3.10/importlib/__init__.py", line 126, in import_module
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - return _bootstrap._gcd_import(name[level:], package, level)
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "", line 1050, in _gcd_import
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "", line 1027, in _find_and_load
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "", line 1006, in _find_and_load_unlocked
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "", line 688, in _load_unlocked
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "", line 883, in exec_module
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "", line 241, in _call_with_frames_removed
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "/Users/yousof/Desktop/serve/examples/large_models/segment_anything_fast/model_store/sam-fast/custom_handler.py", line 11, in
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - from segment_anything_fast import SamAutomaticMaskGenerator, sam_model_fast_registry
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "/opt/anaconda3/envs/trchsrv/lib/python3.10/site-packages/segment_anything_fast/__init__.py", line 7, in
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - from .build_sam import (
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "/opt/anaconda3/envs/trchsrv/lib/python3.10/site-packages/segment_anything_fast/build_sam.py", line 11, in
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - from .modeling import ImageEncoderViT, MaskDecoder, PromptEncoder, Sam, TwoWayTransformer
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "/opt/anaconda3/envs/trchsrv/lib/python3.10/site-packages/segment_anything_fast/modeling/__init__.py", line 7, in
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - from .sam import Sam
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "/opt/anaconda3/envs/trchsrv/lib/python3.10/site-packages/segment_anything_fast/modeling/sam.py", line 13, in
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - from .image_encoder import ImageEncoderViT
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "/opt/anaconda3/envs/trchsrv/lib/python3.10/site-packages/segment_anything_fast/modeling/image_encoder.py", line 15, in
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - from segment_anything_fast.flash_4 import _attention_rel_h_rel_w
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "/opt/anaconda3/envs/trchsrv/lib/python3.10/site-packages/segment_anything_fast/flash_4.py", line 23, in
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - import triton
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - ModuleNotFoundError: No module named 'triton'
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG -
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - During handling of the above exception, another exception occurred:
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG -
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "/opt/anaconda3/envs/trchsrv/lib/python3.10/site-packages/ts/model_service_worker.py", line 263, in
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - worker.run_server()
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "/opt/anaconda3/envs/trchsrv/lib/python3.10/site-packages/ts/model_service_worker.py", line 231, in run_server
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - self.handle_connection(cl_socket)
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "/opt/anaconda3/envs/trchsrv/lib/python3.10/site-packages/ts/model_service_worker.py", line 194, in handle_connection
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - service, result, code = self.load_model(msg)
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "/opt/anaconda3/envs/trchsrv/lib/python3.10/site-packages/ts/model_service_worker.py", line 131, in load_model
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - service = model_loader.load(
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "/opt/anaconda3/envs/trchsrv/lib/python3.10/site-packages/ts/model_loader.py", line 110, in load
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - module = self._load_default_handler(handler)
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "/opt/anaconda3/envs/trchsrv/lib/python3.10/site-packages/ts/model_loader.py", line 159, in _load_default_handler
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - module = importlib.import_module(module_name, "ts.torch_handler")
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "/opt/anaconda3/envs/trchsrv/lib/python3.10/importlib/__init__.py", line 126, in import_module
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - return _bootstrap._gcd_import(name[level:], package, level)
2024-06-11T13:46:54,464 [INFO ] nioEventLoopGroup-5-28 org.pytorch.serve.wlm.WorkerThread - 9009 Worker disconnected. WORKER_STARTED
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "", line 1050, in _gcd_import
2024-06-11T13:46:54,464 [DEBUG] W-9009-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "", line 1027, in _find_and_load
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "", line 992, in _find_and_load_unlocked
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "", line 241, in _call_with_frames_removed
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "", line 1050, in _gcd_import
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "", line 1027, in _find_and_load
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - File "", line 1004, in _find_and_load_unlocked
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - ModuleNotFoundError: No module named 'ts.torch_handler.custom_handler'
2024-06-11T13:46:54,464 [DEBUG] W-9009-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died., responseTimeout:300sec
java.lang.InterruptedException: null

Installation instructions

I have also tried to follow the Docker instructions and am encountering the same error. I am on a MacBook, so I have no access to a GPU, but as far as I can tell that shouldn't be an issue, since

./build_image.sh

installs a CPU version by default.

Model Packaging

The model was packaged according to the instructions in the sam_fast README.md file.

config.properties

config.properties is unchanged from the original cloned file.

Versions

I am running Python 3.10 and the following torch packages:

pytorch-labs-segment-anything-fast @ git+https://github.com/pytorch-labs/segment-anything-fast.git@3e9c47d2ef18ddf4f179128e8c0f677dd5e989b8
torch==2.2.2
torch-model-archiver @ file:///usr/share/miniconda/envs/__setup_conda/conda-bld/torch-model-archiver_1715885178714/work
torch-workflow-archiver @ file:///usr/share/miniconda/envs/__setup_conda/conda-bld/torch-workflow-archiver_1715885227278/work
torchao==0.1
torchserve @ file:///usr/share/miniconda/envs/__setup_conda/conda-bld/torchserve_1715885095944/work
torchvision==0.17.2

Repro instructions

git clone https://github.com/pytorch/serve.git
cd serve

Install dependencies

CUDA is optional

python ./ts_scripts/install_dependencies.py --cuda=cu121

Latest release

pip install torchserve torch-model-archiver torch-workflow-archiver

Nightly build

pip install torchserve-nightly torch-model-archiver-nightly torch-workflow-archiver-nightly

cd to the example folder examples/large_models/segment_anything_fast

cd ../examples/large_models/segment_anything_fast

install segment_anything_fast

chmod +x install_segment_anything_fast.sh
source install_segment_anything_fast.sh

download weights

wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

If you are not using A100 for inference, turn off the A100 specific optimization using

export SEGMENT_ANYTHING_FAST_USE_FLASH_4=0
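As I understand it, this flag is read from the environment by segment-anything-fast at import time to skip the A100-specific kernel path. A minimal sketch of that kind of env-var gating (the default value and exact semantics inside the library are assumptions on my part):

```python
import os


def use_flash_4() -> bool:
    """Sketch of env-var gating like SEGMENT_ANYTHING_FAST_USE_FLASH_4:
    "0" disables the A100-specific optimization; any other value (or an
    unset variable) leaves it enabled."""
    return os.environ.get("SEGMENT_ANYTHING_FAST_USE_FLASH_4", "1") != "0"
```

Note that exporting the variable only affects processes started from the same shell, so it must be set before launching torchserve.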

generate model archive

mkdir model_store
torch-model-archiver --model-name sam-fast --version 1.0 --handler custom_handler.py --config-file model-config.yaml --archive-format no-archive  --export-path model_store -f
mv sam_vit_h_4b8939.pth model_store/sam-fast/

start and run

torchserve --start --ncs --model-store model_store --models sam-fast
python inference.py
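For reference, once the server is up, inference.py presumably does something equivalent to posting raw image bytes to TorchServe's inference API (default port 8080, endpoint /predictions/<model_name>). A hedged stand-in using only the standard library — the image path and response handling are illustrative:

```python
import urllib.request


def prediction_url(model_name: str, host: str = "127.0.0.1", port: int = 8080) -> str:
    """Build the TorchServe inference endpoint for a registered model."""
    return f"http://{host}:{port}/predictions/{model_name}"


def predict(image_path: str, model_name: str = "sam-fast") -> bytes:
    """POST the raw image bytes to the endpoint (requires a running server)."""
    with open(image_path, "rb") as f:
        req = urllib.request.Request(prediction_url(model_name), data=f.read())
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    # Hypothetical usage; replace with a real image on disk.
    print(predict("kitten.jpg"))
```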

Possible Solution

No response

Collaborator

mreso commented Jun 11, 2024

Hi @yousofaly, sorry, but the example will not run on a MacBook as the segment_anything_fast fork is specifically optimized for running on GPU (specifically A100s).

In your case it fails because the triton package is missing; it is used to create a custom CUDA kernel that replaces an auto-generated one in torch.compile:

2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - import triton
2024-06-11T13:46:54,464 [INFO ] W-9009-sam-fast_1.0-stdout MODEL_LOG - ModuleNotFoundError: No module named 'triton'

You can try to use the original segment_anything package in the handler, but this might require some modifications to the original example.
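One quick way to confirm this diagnosis before starting the server is to probe whether triton is importable at all. A small sketch using the standard library (the macOS remark is my observation; Triton does not publish macOS wheels):

```python
import importlib.util


def has_module(name: str) -> bool:
    """Check whether a module can be found without importing it.
    find_spec returns None when the module is absent from sys.path."""
    return importlib.util.find_spec(name) is not None


# e.g. has_module("triton") is typically False on macOS,
# since Triton only ships Linux wheels.
```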

mreso added the triaged label Jun 11, 2024