FEAT add method to call inference backend #105
Conversation
Note: this cannot be tested or released until huggingface/huggingface_hub#998 is released (unless we patch that variable in this library, which I'd rather not do).
LGTM so far. Regarding the naming, I have no strong opinion. Is there a reason to prefer "output" over "prediction"? At least when it comes to the planned features, we would always call…
Yeah, that's why I didn't call it predict. I'm thinking a feature we can add is to let users add a "predict_method" to the configuration file, and get…
In that case, it makes sense to not call it predict (except if that feature would be provided through separate functions).
I'm still not sure on that design though. But we'll get there when we start working on it.
The classifier test passes, the regression one fails; I need to figure out why.
Seems like huggingface/api-inference-community#83 is not deployed yet. Once that's done, the tests here should pass.
@BenjaminBossan why do you think the tests work for a classifier but not a regressor?
Please correct me if I misunderstand, but doesn't that check test if the prediction is 100% accurate? This may occur with classification but is almost impossible with regression. |
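The point above can be illustrated with a tiny NumPy snippet (the arrays are made up for illustration, not taken from the test suite): exact equality can hold for discrete classifier labels, but essentially never for continuous regression outputs.

```python
import numpy as np

# A classifier's predicted labels can match the expected labels exactly,
# so an equality-based check may pass.
y_clf_true = np.array([0, 1, 1, 0])
y_clf_pred = np.array([0, 1, 1, 0])
assert (y_clf_true == y_clf_pred).all()

# A regressor's continuous outputs almost never reproduce the targets
# bit-for-bit, so the same exact-match check fails.
y_reg_true = np.array([1.0, 2.0, 3.0])
y_reg_pred = np.array([1.0001, 1.9998, 3.0002])  # typical regression output
assert not (y_reg_true == y_reg_pred).all()

# Comparing against the model's own output, or with a tolerance, works.
assert np.allclose(y_reg_true, y_reg_pred, atol=1e-3)
```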
So the issue was that I was comparing with the training data instead of the model output lol 🤦🏼 Question: should we add this to the user guide? If we do, the user guide will take at least 30 seconds more to run.
Hmm, maybe it's sufficient to just document it well, without actually running it? |
Ready for review.
Basically perfect for me. I have two minor comments, please take a look.
```python
    inputs=inputs
)

if isinstance(res, list):
```
Can we rely on the assumption that if the output is a list, everything went well, else it didn't?
Yes, on the inference side, we always raise when there's at least one warning or error: https://github.com/huggingface/api-inference-community/blob/5b9970bae8928019dd725f7f16832d9be753df3b/docker_images/sklearn/app/pipelines/tabular_classification.py#L55
Okay, and we can be sure that if an error is raised, the API will never return a list. Ideally, I would wish for a response object with a status code, but api-inference doesn't seem to support that.
On the other hand, when the response is ok, we know it's a list because of how the `Pipeline`s are implemented. I guess this introduces some coupling, as future pipelines must always return a list as output too. For now it works, so we can proceed, but it doesn't feel very robust to me.
The response comes as a JSON object, so if there are only values in it, it's a list. But I agree a more complex response, or having warnings in the header, might be better. For now I'd say we can merge, and we can change this if the logic on the backend changes. I don't think we can easily future-proof this bit here.
skops/hub_utils/tests/test_hf_hub.py

```python
X_test = data.data.head(5)
y_pred = model.predict(X_test)
output = get_model_output(repo_id, data=X_test, token=HF_HUB_TOKEN)
assert np.allclose(output, y_pred)
```
nit: How about moving the `assert` to the end of the test, so that cleanup is performed even if the assert fails? Even better would be a fixture or context manager that does the setup and cleanup, but moving the assert would already help.
Didn't end up using fixtures for this because I couldn't figure out how to do them easily, and the pytest docs are really sparse when it comes to using `request`.
LGTM.
Regarding the fixture, I agree that it would be overkill for this test.
If you want, I can merge, or do you want to wait for more reviews? (Edit: I see now that the tests failed, not sure why)
This PR introduces a wrapper to make it convenient for users to get outputs for array-like inputs.
I'm not sure about the name of the function.
I'll be adding tests.
@BenjaminBossan would you mind having a look and see if this looks okay as a start to you?
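Assuming the design discussed in this thread, the wrapper might look roughly like the sketch below. The function body, endpoint URL, and payload shape are assumptions for illustration, not the merged implementation; `post_fn` is injected only so the flow can be demonstrated without a network call (the real code would use an HTTP client such as `requests`).

```python
import json

def get_model_output(repo_id, data, token=None, post_fn=None):
    """Sketch: send array-like data to the Hub inference endpoint for
    repo_id and return the parsed prediction list.

    post_fn(url, headers=..., data=...) -> str is a stand-in for the
    actual HTTP POST; URL and payload shape are assumptions.
    """
    url = f"https://api-inference.huggingface.co/models/{repo_id}"
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    payload = json.dumps({"data": data})
    res = json.loads(post_fn(url, headers=headers, data=payload))
    # Mirror the check from the diff: a list means success, anything
    # else is treated as an error payload from the backend.
    if isinstance(res, list):
        return res
    raise RuntimeError(f"Inference failed: {res}")
```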