
fix: Frontend Failing Test: torch - reduction_ops.torch.mean #28278

Merged: 82 commits into ivy-llc:main on Apr 9, 2024

Conversation

@Kacper-W-Kozdon (Contributor) commented Feb 13, 2024

PR Description

torch.mean() from the torch frontend does not seem to be properly implemented, and ivy.mean() is missing the dtype argument. The comparison against TensorFlow's ground truth might also be done with the wrong TensorFlow function: TensorFlow seems to only have tf.reduce_mean(), which works slightly differently from torch.mean(), so adding mean() to the tf backend might be needed.
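For reference, this is the torch.mean() dtype behaviour being replicated; a quick check against stock PyTorch, independent of this PR:

```python
import torch

x = torch.tensor([1, 2, 3], dtype=torch.int32)
# torch.mean() rejects integer inputs unless dtype is given; with dtype,
# it casts the input first and reduces in that dtype
torch.mean(x, dtype=torch.float32)  # tensor(2.)
```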

  • The backends for mean() were tweaked to cover the behaviour of torch.mean() when the dtype argument is assigned a value (type-casting before the mean); a minimal sketch of this pattern follows this list.
  • A problematic condition on dtypes was removed from the test; it should be covered by the @with_supported_dtypes decorator instead. This was one of the reasons for the failing tests, as complex dtypes were substituted with 'float32' even though the associated input was still complex.
  • Small changes to the classification of arguments as keyword/positional, to match the backends.
  • Added casting to the native dtypes from the values passed to the dtype argument in the backends. This was another of the reasons for the failing tests.
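A minimal sketch of the pattern from the first bullet, using the TensorFlow backend as the example since tf.reduce_mean() itself has no dtype argument (illustrative only, not the merged source):

```python
import tensorflow as tf


def mean(x, /, axis=None, keepdims=False, *, dtype=None):
    # tf.reduce_mean() has no dtype argument, so mimic
    # torch.mean(..., dtype=...) by casting the input first
    if dtype is not None:
        x = tf.cast(x, dtype)
    return tf.reduce_mean(x, axis=axis, keepdims=keepdims)
```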

Related Issue

Closes #28276

Checklist

  • Did you add a function?
  • Did you add the tests?
  • Did you run your tests and are your tests passing?
  • Did pre-commit not fail on any check?
  • Did you follow the steps we provided?

Socials

@Kacper-W-Kozdon changed the title from "Fix torch mean test" to "fix: Frontend Failing Test: torch - reduction_ops.torch.mean" on Feb 13, 2024
@ivy-leaves added the PyTorch Frontend and Ivy Functional API labels on Feb 13, 2024
@ivy-leaves added the TensorFlow Frontend label on Feb 13, 2024
@Kacper-W-Kozdon (Contributor, Author) commented Feb 27, 2024

Hi, @Ishticode - all tests passing! Core and torch frontend alike! :)

@Ishticode (Contributor) left a review comment:

Could you fix the demos conflicts? I'll also wait on a fresh CI run; currently it's showing way too many failures. Thanks for the patience.

@Kacper-W-Kozdon (Contributor, Author) commented Mar 4, 2024

> Could you fix the demos conflicts? I'll also wait on a fresh CI run; currently it's showing way too many failures. Thanks for the patience.

On it! The CI failures are most likely due to the recent updates to docs/demos; this PR still contains the old docs/demos.

UPDATE:
A lot of the failed checks refer to files I did not touch and which do not call the ivy.statistics.mean() function. Frontends other than torch were failing on mean() as well, so I can look into that next, given that both the torch frontend and core ivy pass the tests (locally, though; you might want to verify independently). The only times they fail is when the tests use bfloat16, in which case the gradient tests do not reach the necessary precision on some values; to the best of my understanding, that is bound to happen with that dtype.
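(For context, a quick illustration, independent of ivy, of why bfloat16 tends to miss tight tolerances: it keeps only 8 significand bits, so values resolve to roughly 2-3 decimal digits.)

```python
import torch

# 0.1 is not representable in bfloat16; the nearest value is ~0.10009765625,
# an error far larger than typical float32 test tolerances
print(torch.tensor(0.1, dtype=torch.bfloat16).item())  # 0.10009765625
```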

I might be wrong, but it also seems that a lot of frontends were added recently.

@Kacper-W-Kozdon (Contributor, Author) commented

Hey @vedpatwardhan, there's a surprisingly large number of failed tests in the workflows, even though the frontend and core ivy tests passed. Any suggestions? A lot (I can't say if all) of the failures seem unrelated to the mean() function.

@vedpatwardhan (Contributor) commented

> Hey @vedpatwardhan, there's a surprisingly large number of failed tests in the workflows, even though the frontend and core ivy tests passed. Any suggestions? A lot (I can't say if all) of the failures seem unrelated to the mean() function.

I think it would be worth checking how many of those functions (the ones whose tests are failing) depend on ivy.mean, to see whether the failures are related to the changes made. Sometimes the intelligent tests pick up more tests than expected if the changed function is used in multiple places. Thanks @Kacper-W-Kozdon 😄
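(One hypothetical way to do that triage locally; the directory scanned and the substring matched are my assumptions, not part of ivy's tooling.)

```python
import pathlib

# crude scan: list source files under ivy/ that call ivy.mean, to
# cross-reference against the failing test names from the CI run
for path in pathlib.Path("ivy").rglob("*.py"):
    text = path.read_text(encoding="utf-8", errors="ignore")
    if "ivy.mean(" in text:
        print(path)
```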

@ivy-seed commented

This PR has been labelled as stale because it has been inactive for more than 7 days. If you would like to continue working on this PR, then please add another comment or this PR will be closed in 7 days.

@ivy-seed added the Stale label on Mar 22, 2024
@Kacper-W-Kozdon (Contributor, Author) commented Mar 23, 2024


Hey @vedpatwardhan, I'll try to go through the failing tests by this evening and let you know which of them use ivy.mean() in either the frontend or the backend.

EDIT:
One thing: the "newly introduced fails" do not come from my changes and seem to be an error in the code for the array_api_tests (probably a typo, judging from the error message). I'd prefer not to touch that in this PR, as the array_api_tests are quite a bit more abstract than the other tests.

@Kacper-W-Kozdon (Contributor, Author) commented Mar 23, 2024


Here are the failures which call mean() in any way:

  • ivy_tests/test_ivy/test_frontends/test_jax/test_numpy/test_statistical.py::test_jax_mean - this seems to be a frontend issue, since all of the backends fail. I did not change anything in the jax frontend; I might do that in the next step, but I think it would be better to do it in a separate PR (there's an issue dedicated to mean() failing on the jax frontend),
  • ivy_tests/test_ivy/test_frontends/test_tensorflow/test_math.py::test_tensorflow_reduce_std - uses paddle_backend.mean(), so it could be related to the changes I made, but I don't really see how. There are two error messages, and they seem to be directly linked to paddle.std(), not to the mean() function used inside it: [Hint: Expected input.numel() > 0, but received input.numel():0 <= 0:0.] (at /paddle/paddle/phi/kernels/funcs/reduce_function.h:1363), [Hint: Expected kernel_iter == iter->second.end() && kernel_key.backend() == Backend::CPU != true, but received kernel_iter == iter->second.end() && kernel_key.backend() == Backend::CPU:1 == true:1.] (at /paddle/paddle/phi/core/kernel_factory.cc:262),
  • ivy_tests/test_ivy/test_frontends/test_jax/test_numpy/test_statistical.py::test_jax_std - again, the paddle backend uses mean() for std(), but the error seems unrelated to my changes; error message: [Hint: Expected input.numel() > 0, but received input.numel():0 <= 0:0.] (at /paddle/paddle/phi/kernels/funcs/reduce_function.h:1363),
  • ivy_tests/test_ivy/test_frontends/test_tensorflow/test_keras/test_metrics.py::test_tensorflow_binary_accuracy, paddle backend - uses ivy.mean(), though I do not know if that is the failing point; error message: AssertionError: the results from backend paddle and ground truth framework tensorflow do not match: 1.0 != 0.0,
  • ivy_tests/test_ivy/test_frontends/test_torch/test_nn/test_functional/test_loss_functions.py::test_torch_binary_cross_entropy_with_logits, jax backend - uses mean() through the _reduce_loss() helper function; error: AssertionError: returned shape = (2,), ground-truth returned shape = (), returned values = [[array([0.7634394, 0.7634394], dtype=float32)], [array(0.7634394, dtype=float32)]]. I'd say this seems unrelated to mean() and is probably caused by improper dim expansion/reshaping,
  • ivy_tests/test_ivy/test_functional/test_experimental/test_nn/test_norms.py::test_group_norm, jax and tensorflow - uses ivy.mean(), but it also contains ivy.var(), and there are a lot of failed tests for var(); the error is related to dtypes: ivy.utils.exceptions.IvyBackendException: tensorflow: group_norm: tensorflow: var: The DTypes <class 'numpy.dtype[bfloat16]'> and <class 'numpy.dtypes.Float16DType'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is object,
  • I'll just list the failed tests which contain the var() function:
    ivy_tests/test_ivy/test_frontends/test_numpy/test_ndarray/test_ndarray.py::test_numpy_var, all backends;
    ivy_tests/test_ivy/test_frontends/test_numpy/test_statistics/test_averages_and_variances.py::test_numpy_var, paddle;
    ivy_tests/test_ivy/test_frontends/test_torch/test_reduction_ops.py::test_torch_var_mean, paddle;
    ivy_tests/test_ivy/test_frontends/test_jax/test_array.py::test_jax_array_var, paddle;
    ivy_tests/test_ivy/test_frontends/test_jax/test_numpy/test_statistical.py::test_jax_nanvar, all backends;
    ivy_tests/test_ivy/test_frontends/test_numpy/test_statistics/test_averages_and_variances.py::test_numpy_nanvar, paddle;
    ivy_tests/test_ivy/test_functional/test_core/test_statistical.py::test_var, jax, torch and paddle (!!!) - core var() is failing and it's present in a lot of the functions alongside mean(), which passed the core tests;
    ivy_tests/test_ivy/test_frontends/test_paddle/test_tensor/test_tensor.py::test_paddle_var, paddle;
    ivy_tests/test_ivy/test_frontends/test_torch/test_tensor.py::test_torch_var, torch;
    ivy_tests/test_ivy/test_frontends/test_jax/test_numpy/test_statistical.py::test_jax_var, paddle;
    ivy_tests/test_ivy/test_frontends/test_paddle/test_stat.py::test_paddle_var, paddle;
  • ivy_tests/test_ivy/test_frontends/test_paddle/test_nn/test_functional/test_loss.py::test_paddle_margin_ranking_loss, all backends - does not seem to contain mean(), but it contains negative(), and some of the failing functions with mean() also happened to contain negative(); however, this might be unrelated,
  • ivy_tests/test_ivy/test_frontends/test_tensorflow/test_keras/test_metrics.py::test_tensorflow_squared_hinge, all backends; error: TypeError: squared_hinge() got multiple values for argument 'y_true' (and I think the same issue applies to the next function),
  • ivy_tests/test_ivy/test_frontends/test_tensorflow/test_keras/test_metrics.py::test_tensorflow_hinge, jax, numpy, torch and paddle - contains mean(),
  • ivy_tests/test_ivy/test_functional/test_experimental/test_nn/test_losses.py::test_huber_loss, torch - contains mean(); error: AssertionError: the length of results from backend torch and ground truth framework tensorflow does not match; len(ret_np_flat) != len(ret_np_from_gt_flat), with ret_np_flat = [array([0., 0., 0.], dtype=float32), array([0., 0., 0.], dtype=float32), array([0., 0., 0.], dtype=float32)] and ret_np_from_gt_flat = [] - seems unrelated to mean() to me,
  • ivy_tests/test_ivy/test_frontends/test_tensorflow/test_raw_ops.py::test_tensorflow_Mean - contains mean(),
  • ivy_tests/test_ivy/test_frontends/test_torch/test_reduction_ops.py::test_torch_std, paddle - contains mean(),
  • ivy_tests/test_ivy/test_functional/test_experimental/test_core/test_linalg.py::test_higher_order_moment, numpy and tensorflow - contains ivy.mean(),
  • ivy_tests/test_ivy/test_functional/test_experimental/test_core/test_linalg.py::test_make_svd_non_negative, jax, torch and paddle - contains ivy.mean(),
  • ivy_tests/test_ivy/test_frontends/test_numpy/test_ndarray/test_ndarray.py::test_numpy_mean, all backends - that's a frontend issue (there is an open issue for that),
  • ivy_tests/test_ivy/test_frontends/test_numpy/test_statistics/test_averages_and_variances.py::test_numpy_mean and ivy_tests/test_ivy/test_frontends/test_numpy/test_statistics/test_averages_and_variances.py::test_numpy_nanmean, all backends - there is an open frontend issue,
  • ivy_tests/test_ivy/test_frontends/test_tensorflow/test_keras/test_metrics.py::test_tensorflow_poisson, only jax - contains ivy.mean(),
  • ivy_tests/test_ivy/test_frontends/test_torch/test_reduction_ops.py::test_torch_std_mean, only paddle and tensorflow - contains ivy.mean(); error: [Hint: Expected input.numel() > 0, but received input.numel():0 <= 0:0.] (at /paddle/paddle/phi/kernels/funcs/reduce_function.h:1363) - seems unrelated to mean(),
  • ivy_tests/test_ivy/test_frontends/test_tensorflow/test_keras/test_metrics.py::test_tensorflow_mean_absolute_percentage_error, tensorflow and jax - contains ivy.mean(); error: numpy.exceptions.DTypePromotionError: The DTypes <class 'numpy.dtype[bfloat16]'> and <class 'numpy.dtypes.Float16DType'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is object - does not seem to be related to mean() (a minimal reproduction of this dtype-promotion error follows this list),
  • ivy_tests/test_ivy/test_frontends/test_tensorflow/test_keras/test_metrics.py::test_tensorflow_metrics_mean_squared_logarithmic_error, jax, numpy and torch - contains ivy.mean(); inconclusive: it sometimes passes the tests (due to the number of functions I've been checking, num_examples is set to only 25),
  • ivy_tests/test_ivy/test_frontends/test_tensorflow/test_nn.py::test_tensorflow_moments, all backends - contains ivy.mean(); error: ValueError: too many values to unpack (expected 3) - seems unrelated to mean(),
  • ivy_tests/test_ivy/test_frontends/test_jax/test_array.py::test_jax_array_mean, only tensorflow - contains ivy.mean(),
  • ivy_tests/test_ivy/test_functional/test_experimental/test_nn/test_norms.py::test_batch_norm, tensorflow and jax - contains ivy.norm(),
  • ivy_tests/test_ivy/test_frontends/test_tensorflow/test_math.py::test_tensorflow_reduce_mean, torch - contains ivy.mean().

I think I caught at least most of the functions that use mean() anywhere and fail the tests. A lot of the failures come from the frontends (other than torch, which this PR deals with alongside core ivy, arrays and containers), and the errors pop up for all of the backends. All in all, it's significantly fewer than 97 tests: it would be at most ~27 (I might have missed some of the functions, but 4 of them are frontend-related) if all of the failures I listed were in separate tests.
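(A minimal reproduction sketch of the dtype-promotion error quoted for test_group_norm and test_tensorflow_mean_absolute_percentage_error above. It assumes numpy >= 1.25 for numpy.exceptions.DTypePromotionError and the ml_dtypes package as the provider of the bfloat16 NumPy dtype, which is my reading of the error text rather than something stated in the CI logs.)

```python
import numpy as np
from ml_dtypes import bfloat16

# numpy defines no common promotion between bfloat16 and float16, so any
# operation needing a shared result dtype fails like the CI runs above:
np.result_type(np.dtype(bfloat16), np.float16)
# -> numpy.exceptions.DTypePromotionError: The DTypes ... do not have a
#    common DType. For example they cannot be stored in a single array
#    unless the dtype is `object`.
```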

@Kacper-W-Kozdon (Contributor, Author) commented

@Ishticode would you mind checking the message right above as well and weighing in on the issue? :)

@Ishticode (Contributor) commented

Hi @Kacper-W-Kozdon,
Sorry for the silence on my end. I will take a detailed look today.

@Ishticode (Contributor) left a review comment:

It seems to me the CI has picked up more than it needs to. The essential relevant tests, like test_mean and test_torch_mean, are fine, and besides, the changes are simple enough: they just add a dtype parameter which, if specified, the input is cast to before the actual mean call.
The changes look good to me. I will go ahead and merge this, and will keep an eye on any relevant failures. Thank you very much @Kacper-W-Kozdon for your effort, patience and diligence. It's much appreciated. 🙂
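(A usage sketch of the behaviour being approved, assuming the merged ivy.mean signature exposes dtype as described in this PR; illustrative, not verbatim from ivy's tests.)

```python
import ivy

ivy.set_backend("tensorflow")
x = ivy.array([1.0, 2.0, 3.0, 4.0], dtype="float32")
# per this PR, dtype casts the input before the reduction,
# matching torch.mean(..., dtype=...) semantics
y = ivy.mean(x, dtype="float64")
print(y.dtype)  # float64
```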

@Ishticode merged commit 58de7f4 into ivy-llc:main on Apr 9, 2024 (41 of 141 checks passed)
Labels: Array API, Ivy Functional API, Paddle Backend, PyTorch Frontend, Stale, TensorFlow Frontend