FEAT add method to call inference backend #105

Merged
8 commits merged into skops-dev:main from the inference branch on Aug 24, 2022

Conversation

adrinjalali (Member)

This PR introduces a wrapper to make it convenient for users to get outputs for array-like inputs.

I'm not sure about the name of the function.

I'll be adding tests.

@BenjaminBossan would you mind having a look and seeing if this looks okay as a start to you?

@adrinjalali added this to the 0.2 milestone on Aug 18, 2022
@adrinjalali (Member Author) commented on Aug 18, 2022

Note: this cannot be tested or released until huggingface/huggingface_hub#998 is released (unless we patch that variable in this library, but I'd rather not do that).

@BenjaminBossan (Collaborator)

LGTM so far.

Regarding the naming, I have no strong opinion. Is there a reason to prefer "output" over "prediction"? At least when it comes to the planned features, we would always call model.predict, right? Not sure if we ever would want to have, say, transform or decision_function. Alternatively, we could use the word "inference", since we're using the inference API.

@adrinjalali (Member Author)

Yeah, that's why I didn't call it predict. I'm thinking a feature we can add is to let users set a "predict_method" in the configuration file, and get predict_proba or transform instead of predict, for instance.
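
For illustration only, a sketch of what that future option might look like; the "predict_method" key, its placement, and the surrounding structure are hypothetical and nothing here is decided in this PR:

# Hypothetical configuration entry; the key name and placement are
# illustrative only and not part of this PR.
config = {
    "sklearn": {
        "predict_method": "predict_proba",  # instead of the default "predict"
    }
}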

@BenjaminBossan (Collaborator)

In that case, it makes sense to not call it predict (except if that feature would be provided through separate functions).

@adrinjalali (Member Author)

I'm still not sure on that design though. But we'll get there when we start working on it.

@adrinjalali (Member Author)

The classifier test passes, but the regression one fails; I need to figure out why.

@adrinjalali (Member Author)

Seems like huggingface/api-inference-community#83 is not deployed yet. Once that's done, the tests here should pass.

@adrinjalali (Member Author)

@BenjaminBossan why do you think the tests work for a classifier but not a regressor?

>       assert all(output == data.target[:5])
E       assert False
E        +  where False = all(array([206.11...128.45984241]) == 0    151.0\n1 ...dtype: float64
E           Full diff:
E           + array([206.11706979,  68.07234761, 176.88406035, 166.91796559,
E           +        128.45984241],
E           + )
E           - 0    151.0
E           - 1     75.0
E           - 2    141.0
E           - 3    206.0
E           - 4    135.0
E           - Name: target, dtype: float64)

@BenjaminBossan (Collaborator)

Please correct me if I misunderstand, but doesn't that check test if the prediction is 100% accurate? This may occur with classification but is almost impossible with regression.

@adrinjalali (Member Author)

so the issue was that I was comparing with the training data instead of the model output lol 🤦🏼

Question: should we add this to the user guide? If we do, the user guide will take at least 30 seconds more to run.

@BenjaminBossan (Collaborator)

> Question: should we add this to the user guide? If we do, the user guide will take at least 30 seconds more to run.

Hmm, maybe it's sufficient to just document it well, without actually running it?

@adrinjalali (Member Author)

ready for review

@BenjaminBossan (Collaborator) left a comment

Basically perfect for me. I have two minor comments, please take a look.

inputs=inputs
)

if isinstance(res, list):
@BenjaminBossan (Collaborator)

Can we rely on the assumption that if the output is a list, everything went well, else it didn't?

@BenjaminBossan (Collaborator)

Okay, and we can be sure that if an error is raised, the API will never return a list. Ideally, I would wish for a response object with a status code but api inference doesn't seem to support that.

On the other hand, when the response is ok, we know it's a list because of how the Pipelines are implemented. I guess this introduces some coupling, as future pipelines must always return a list as output too. For now it works, so we can proceed, but it doesn't feel very robust to me.

@adrinjalali (Member Author)

The response comes as a JSON object, so if there are only values in it, it's a list. But I agree that a more complex response, or warnings in the header, would maybe be better. For now I'd say we can merge, and we can change this if the logic on the backend changes. I don't think we can easily future-proof this bit here.
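
As a rough sketch of the pattern being discussed here (not the actual wrapper in this PR; the helper name, endpoint URL, payload shape, and error handling are assumptions for illustration):

import requests

def call_inference_api(repo_id, data, token):
    # Hypothetical helper showing the list-vs-error check discussed above;
    # the real wrapper goes through huggingface_hub.
    url = f"https://api-inference.huggingface.co/models/{repo_id}"
    res = requests.post(
        url,
        headers={"Authorization": f"Bearer {token}"},
        json={"data": data},  # payload shape is a stand-in, not the real contract
    ).json()

    # The backend pipelines return a plain JSON list on success; anything
    # else (typically a dict) is treated as an error payload.
    if isinstance(res, list):
        return res
    raise RuntimeError(f"Inference API returned an error: {res}")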

X_test = data.data.head(5)
y_pred = model.predict(X_test)
output = get_model_output(repo_id, data=X_test, token=HF_HUB_TOKEN)
assert np.allclose(output, y_pred)
@BenjaminBossan (Collaborator)

nit: How about moving the assert to the end of the test, so that cleanup is performed even if the assert fails? Even better would be a fixture or context manager that does the setup and cleanup, but moving the assert would already help.

@adrinjalali (Member Author)

Didn't end up using fixtures for this because I couldn't figure out how to do them easily, and the pytest docs are really sparse when it comes to using the request fixture.
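
For reference, a yield-based fixture is one way to get that setup/teardown behavior without touching request; a minimal sketch, where create_test_repo and delete_test_repo are hypothetical stand-ins for however the test suite pushes and removes a model repo:

import pytest

@pytest.fixture
def pushed_repo():
    # Setup: push a model repo to the Hub (hypothetical helper).
    repo_id = create_test_repo()
    # Hand the repo id to the test; everything after the yield runs as
    # teardown, even if the test's assert fails.
    yield repo_id
    delete_test_repo(repo_id)

def test_inference_backend(pushed_repo):
    ...  # call get_model_output(pushed_repo, ...) and assert against model.predict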

@BenjaminBossan (Collaborator) left a comment

LGTM.

Regarding the fixture, I agree that it would be overkill for this test.

If you want, I can merge, or do you want to wait for more reviews? (Edit: I see now that the tests failed, not sure why)


@BenjaminBossan merged commit 527d05c into skops-dev:main on Aug 24, 2022
@adrinjalali deleted the inference branch on August 24, 2022 at 15:47