Request for ONNX Conversion Script #6

Closed
godiclee opened this issue Jun 25, 2024 · 4 comments
Labels: enhancement (New feature or request)

@godiclee

Thank you for your amazing work on the project. Could you kindly provide a script to convert the SigClip-large model (specifically the image encoder) to ONNX format? I would greatly appreciate your assistance with this.

rhysdg self-assigned this on Jun 25, 2024
rhysdg added the enhancement (New feature or request) label on Jun 25, 2024
@rhysdg (Owner) commented Jun 25, 2024

Hey there @godiclee! I'm glad it's all proving of use to you. For sure, you just need to leverage Hugging Face's AutoProcessor and use opset_version=13, like so:

import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModel

variant = "google/siglip-large-patch16-384"
model = AutoModel.from_pretrained(variant).eval()
processor = AutoProcessor.from_pretrained(variant)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = ["a photo of 2 cats", "a photo of 2 dogs"]
inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")


inputs_list = ['input_ids', 'pixel_values']
outputs = [
    'logits_per_image',
    'logits_per_text',
    'image_embeds',
    'text_embeds',
    'text_model_hidden',
    'text_model_pooler',
    'vision_model_hidden',
    'vision_model_pooler',
]

dynamic_axes = {
    'input_ids': {0: 'text_batch_size', 1: 'sequence_length'},
    'pixel_values': {0: 'image_batch_size', 1: 'num_channels', 2: 'height', 3: 'width'},
    'logits_per_image': {0: 'image_batch_size', 1: 'text_batch_size'},
    'logits_per_text': {0: 'text_batch_size', 1: 'image_batch_size'},
    'image_embeds': {0: 'image_batch_size'},
    'text_embeds': {0: 'text_batch_size'},
}


# export
torch.onnx.export(
    model,
    (inputs['input_ids'], inputs['pixel_values']),
    "siglip_large/siglip-large.onnx",
    export_params=True,
    input_names=inputs_list,
    output_names=outputs,
    dynamic_axes=dynamic_axes,
    do_constant_folding=True,
    opset_version=13,
)
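
Since the original request was specifically for the image encoder, here is a minimal sketch of my own for exporting just the vision tower, building on the snippet above. It assumes model.vision_model accepts pixel_values and that tracing flattens its output into (last_hidden_state, pooler_output); verify the output order on your transformers version before relying on it:

# Sketch: image-encoder-only export (assumes model.vision_model takes
# pixel_values and returns last_hidden_state plus pooler_output).
torch.onnx.export(
    model.vision_model,
    (inputs['pixel_values'],),
    "siglip_large/siglip-large-vision.onnx",
    export_params=True,
    input_names=['pixel_values'],
    output_names=['last_hidden_state', 'pooler_output'],
    dynamic_axes={'pixel_values': {0: 'image_batch_size'}},
    do_constant_folding=True,
    opset_version=13,
)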

I've yet to formalise it in here as I'm trying to strip back the need for the transformers library for lightweight deployment on Jetson boards, Raspberry Pi, etc., but I'll likely have an export repo up and running soon. It also looks like full support is on its way at optimum too.
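
On the transformers-free angle, the preprocessing is simple enough to replicate by hand. A minimal sketch, assuming this variant's processor defaults (384x384 bicubic resize, rescale to [0, 1], then normalise with mean = std = 0.5 per channel); worth double-checking against processor.image_processor before trusting it:

import numpy as np
from PIL import Image

def preprocess(image, size=384):
    # Assumed SigLIP defaults: bicubic resize, rescale to [0, 1],
    # then (x - 0.5) / 0.5 per channel.
    image = image.convert("RGB").resize((size, size), Image.BICUBIC)
    pixels = np.asarray(image, dtype=np.float32) / 255.0
    pixels = (pixels - 0.5) / 0.5
    return pixels.transpose(2, 0, 1)[None, ...]  # HWC -> NCHW, add batch dim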

@rhysdg (Owner) commented Jun 25, 2024

^ Notice that I'm writing the export into a separate folder too: with a model that's 2 GB+ the weights have to stay external to the .onnx file, otherwise you'll run into a protobuf error (protobuf can't serialise a single file over 2 GB).
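
If you want a tidier layout, the onnx package can consolidate the scattered external tensors into a single sidecar file after export. A sketch (weights.bin is just a name I've picked):

import onnx

# Reload the exported graph and rewrite all external tensors into one file.
m = onnx.load("siglip_large/siglip-large.onnx")
onnx.save_model(
    m,
    "siglip_large/siglip-large.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="weights.bin",  # written next to the .onnx file
    size_threshold=1024,
)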

I've verified with Netron and the following snippet, and all is well ;)

import scipy.special
import onnxruntime

session = onnxruntime.InferenceSession(
    'siglip_large/siglip-large.onnx',
    providers=onnxruntime.get_available_providers())

# Reuses processor, texts, and image from the export snippet above
inputs = processor(text=texts, images=image, padding="max_length", return_tensors="np")

res = session.run(None, {'input_ids': inputs['input_ids'],
                         'pixel_values': inputs['pixel_values']})[0]

# SigLIP scores with a sigmoid rather than a softmax
res = scipy.special.expit(res)
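
As an extra sanity check (a sketch of my own, beyond what's in the thread), the ONNX probabilities can be compared against the eager PyTorch model:

import numpy as np
import torch

# SigLIP applies a sigmoid to logits_per_image, so compare post-sigmoid.
with torch.no_grad():
    pt = model(**processor(text=texts, images=image,
                           padding="max_length", return_tensors="pt"))
print(np.allclose(res, torch.sigmoid(pt.logits_per_image).numpy(), atol=1e-3))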

@rhysdg rhysdg pinned this issue Jun 25, 2024
@godiclee (Author)

Thanks a lot! It works well.

@rhysdg (Owner) commented Jul 3, 2024

> Thanks a lot! It works well.

Glad to hear it!
