
Getting "ValueError: not enough values to unpack" when using text_embedding models #533

Closed
devnarekm opened this issue Apr 28, 2023 · 6 comments · Fixed by #535


devnarekm commented Apr 28, 2023

Hello!

I'm getting the following error when trying to deploy a model with the task type "text_embedding"

[screenshot: ValueError traceback]

Here is the command I used:

```sh
docker run -it --rm --network host \
  elastic/eland \
  eland_import_hub_model \
  --url "my_url" \
  -u "my_usrnm" -p "my_pswd" \
  --hub-model-id sentence-transformers/msmarco-MiniLM-L-12-v3 \
  --task-type text_embedding \
  --start
```

I tried all-mpnet-base-v2 as well as some other models and got the same error. Strangely, the task type text_classification works just fine.

I tried changing line 653 in eland/ml/pytorch/transformers.py so that it unpacks only one value, but that led to a further API error.

Any suggestions are appreciated!
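For context, this error is the standard Python failure mode when a call that used to return a pair starts returning a single value. A minimal sketch, using a hypothetical stand-in for the traced model's `sample_output`:

```python
def sample_output():
    # Hypothetical stand-in: the traced model now returns a single
    # value instead of a (tensor, tensor) pair
    return ("embedding",)

try:
    # The unpacking on eland's line 653 expects two values, so a
    # one-element result raises ValueError
    sample_embedding, pooled = sample_output()
except ValueError as e:
    print(e)  # not enough values to unpack (expected 2, got 1)
```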

devnarekm (Author) commented:

Update:

I managed to run this by applying two alterations:

1. In \usr\local\lib\python3.9\dist-packages\eland\ml\pytorch\transformers.py, change line 653 to:

```python
sample_embedding = self._traceable_model.sample_output()
```

Reason: only one value is returned, so unpacking into two variables fails.

After this change I started getting another error (now REST API related) stating that the request body couldn't be parsed due to an extra field "embedding_size", which leads to the second change.

2. In \usr\local\lib\python3.9\dist-packages\elasticsearch_sync\client_base.py, I added the following snippet at line 288:

```python
try:
    body["inference_config"]["text_embedding"].pop("embedding_size", None)
except (KeyError, TypeError):
    pass
```

Reason: as per the documentation here https://www.elastic.co/guide/en/elasticsearch/reference/current/put-trained-models.html, the request body's inference_config/text_embedding entry does not have a field 'embedding_size'.

This temporarily solves my problem, hope it gets resolved soon on your end!
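To illustrate the second workaround in isolation: the same effect can be had by stripping the field from the request body before it is sent. The `body` dict below is a hypothetical reconstruction of what eland builds, not the library's actual internal structure:

```python
# Hypothetical reconstruction of the PUT trained model request body
body = {
    "inference_config": {
        "text_embedding": {
            "tokenization": {"bert": {"do_lower_case": True}},
            "embedding_size": 384,  # rejected by pre-8.8 clusters
        }
    }
}

# Remove the field that Elasticsearch versions before 8.8 do not accept
body["inference_config"]["text_embedding"].pop("embedding_size", None)
```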

davidkyle (Member) commented:

Thanks for reporting this, @NarekMargaryan. I opened #535 to check the output of the model.

The embedding_size field was added in elastic/elasticsearch#95176 (version 8.8). It is helpful to know the number of dimensions the embedding has when creating the dense_vector field mapping.

It is documented in the 8.8 docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.8/put-trained-models.html

davismcphee added a commit to davismcphee/eland that referenced this issue May 12, 2023
melfebulu commented:

So, how can I use Eland with 8.7.1 now? And why can't I find \usr\local\lib\python3.9\dist-packages\elasticsearch_sync\client_base.py? The error is: ValueError: not enough values to unpack (expected 2, got 1)

melfebulu commented:

I tried this: commenting out the embedding_size lines.

```python
elif self._task_type == "text_embedding":
    sample_embedding = self._traceable_model.sample_output()
    # sample_embedding, _ = self._traceable_model.sample_output()
    # embedding_size = sample_embedding.size(-1)
    inference_config = TASK_TYPE_TO_INFERENCE_CONFIG[self._task_type](
        tokenization=tokenization_config,
        # embedding_size=embedding_size,
    )
```

davidkyle (Member) commented:

Hi @melfebulu the problem will be fixed in the next release.

If commenting out the code works for you, then great. Alternatively, you can check out the 8.7.0 release, which does not have the problematic code:

```sh
git checkout v8.7.0
```

One question, if you don't mind: are you building Eland from source, or installing the latest release via pip or a similar mechanism? Thanks.

melfebulu commented:

Thanks. I build from source; commenting it out is working now :)
