Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request]: ImageLoader should also load images from remote URLs, not just local URIs #2401

Open
ahron1 opened this issue Jun 22, 2024 · 6 comments
Assignees
Labels
enhancement New feature or request

Comments

@ahron1
Copy link

ahron1 commented Jun 22, 2024

Describe the problem

Currently, the ImageLoader data loader loads only locally stored images.

For loading remote images, one needs to write a wrapper to fetch the images first. It will be very helpful ImageLoader would also directly load remote images over HTTP/S.

Describe the proposed solution

It is an easy fix. Just add an extra block to the _load_image definition in ImageLoader, to check if it is a remote or local URI. If it is a HTTP/S URI, use PIL and BytesIO to fetch the image before converting it to a NumPy array.

I already have the code working locally. Happy to submit a PR if people are open to the idea.

Alternatives considered

No response

Importance

would make my life easier

Additional Information

No response

@jeffchuber
Copy link
Contributor

@ahron1 thanks for suggesting this!

@atroyn what do you think here?

@jeffchuber
Copy link
Contributor

@ahron1 looking at the PR you opened - and reviewing the multimodal docs - I think that this functionality should be in a new loader class... something like ImageUrLoader. I'm not seeing the case where someone would want to pass both remote and local filesystem requests and the magic of supporting both could lead to confusion/bugs.

@ahron1
Copy link
Author

ahron1 commented Jun 24, 2024

Thanks for checking!

Indeed, it is better to split it off. With @tazarov 's idea about adding in authentication, it becomes a standalone remote image loader.

@atroyn
Copy link
Contributor

atroyn commented Jun 24, 2024

I agree that this ought to be standalone, though the counter-argument is that other data loaders, e.g. HuggingFace datasets and similar support both kinds of loading. I think in our case however, we do want these to be separate, since they're not about downloading data, but getting data into Chroma.

@ahron1
Copy link
Author

ahron1 commented Jun 25, 2024

HuggingFace was why I originally thought of handling both URIs and URLs together. But indeed, here it should be standalone. I just pushed the changes.

@atroyn atroyn self-assigned this Jun 25, 2024
@ahron1
Copy link
Author

ahron1 commented Jul 14, 2024

Hi @atroyn any update on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants