
fix: Frontend Failing Test: torch - reduction_ops.torch.mean #28278

Merged: 82 commits into ivy-llc:main on Apr 9, 2024

Conversation

@Kacper-W-Kozdon (Contributor) commented Feb 13, 2024

PR Description

torch.mean() from the torch frontend does not seem to be properly implemented, and ivy.mean() is missing the dtype argument. The comparison against TensorFlow's ground truth might also be done with the wrong TensorFlow function: TensorFlow seems to only have tf.reduce_mean(), which works slightly differently from torch.mean(), so adding mean() to the tf backend might be needed.
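For reference, this is the torch.mean() dtype behaviour being replicated; a quick check against stock PyTorch, independent of this PR:

```python
import torch

x = torch.tensor([1, 2, 3], dtype=torch.int32)
# torch.mean() rejects integer inputs unless dtype is given; with dtype,
# it casts the input first and reduces in that dtype
torch.mean(x, dtype=torch.float32)  # tensor(2.)
```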

  • The backends for mean() were tweaked to cover the behaviour of torch.mean() when the dtype argument is assigned a value (type-casting before the mean); a minimal sketch of this pattern follows this list.
  • A problematic condition on dtypes was removed from the test; it should be covered by the @with_supported_dtypes decorator instead. This was one of the reasons for the failing tests, as complex dtypes were substituted with 'float32' even though the associated input was still complex.
  • Small changes to the classification of arguments as keyword/positional, to match the backends.
  • Added casting to the native dtypes from the values passed to the dtype argument in the backends. This was another of the reasons for the failing tests.
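A minimal sketch of the pattern from the first bullet, using the TensorFlow backend as the example since tf.reduce_mean() itself has no dtype argument (illustrative only, not the merged source):

```python
import tensorflow as tf


def mean(x, /, axis=None, keepdims=False, *, dtype=None):
    # tf.reduce_mean() has no dtype argument, so mimic
    # torch.mean(..., dtype=...) by casting the input first
    if dtype is not None:
        x = tf.cast(x, dtype)
    return tf.reduce_mean(x, axis=axis, keepdims=keepdims)
```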

Related Issue

Closes #28276

Checklist

  • Did you add a function?
  • Did you add the tests?
  • Did you run your tests and are your tests passing?
  • Did pre-commit not fail on any check?
  • Did you follow the steps we provided?

Socials

@Kacper-W-Kozdon changed the title from "Fix torch mean test" to "fix: Frontend Failing Test: torch - reduction_ops.torch.mean" on Feb 13, 2024
@ivy-leaves added the PyTorch Frontend and Ivy Functional API labels on Feb 13, 2024
@ivy-leaves added the TensorFlow Frontend label on Feb 13, 2024
@Kacper-W-Kozdon (Contributor, Author) commented Feb 27, 2024

Hi, @Ishticode - all tests passing! Core and torch frontend alike! :)

@Ishticode (Contributor) left a review comment:

Could you fix the demos conflicts? I'll also wait on a fresh CI run; currently it's showing way too many failures. Thanks for the patience.

@Kacper-W-Kozdon (Contributor, Author) commented Mar 4, 2024

> Could you fix the demos conflicts? I'll also wait on a fresh CI run; currently it's showing way too many failures. Thanks for the patience.

On it! The CI failures are most likely due to the recent updates to docs/demos; this PR still contains the old docs/demos.

UPDATE:
A lot of the failed checks refer to files I did not touch and which do not call the ivy.statistics.mean() function. Frontends other than torch were failing on mean() as well, so I can look into that next, given that both the torch frontend and core ivy pass the tests (locally, though; you might want to verify independently). The only times they fail is when the tests use bfloat16, in which case the gradient tests do not reach the necessary precision on some values; to the best of my understanding, that is bound to happen with that dtype.
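(For context, a quick illustration, independent of ivy, of why bfloat16 tends to miss tight tolerances: it keeps only 8 significand bits, so values resolve to roughly 2-3 decimal digits.)

```python
import torch

# 0.1 is not representable in bfloat16; the nearest value is ~0.10009765625,
# an error far larger than typical float32 test tolerances
print(torch.tensor(0.1, dtype=torch.bfloat16).item())  # 0.10009765625
```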

I might be wrong, but it also seems that a lot of frontends were added recently.

@Kacper-W-Kozdon (Contributor, Author) commented

Hey @vedpatwardhan, there's a surprisingly large number of failed tests in the workflows, even though the frontend and core ivy tests passed. Any suggestions? A lot (I can't say if all) of the failures seem unrelated to the mean() function.

@vedpatwardhan (Contributor) commented

> Hey @vedpatwardhan, there's a surprisingly large number of failed tests in the workflows, even though the frontend and core ivy tests passed. Any suggestions? A lot (I can't say if all) of the failures seem unrelated to the mean() function.

I think it would be worth checking how many of those functions (the ones whose tests are failing) depend on ivy.mean, to see whether the failures are related to the changes made. Sometimes the intelligent tests pick up more tests than expected if the changed function is used in multiple places. Thanks @Kacper-W-Kozdon 😄
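(One hypothetical way to do that triage locally; the directory scanned and the substring matched are my assumptions, not part of ivy's tooling.)

```python
import pathlib

# crude scan: list source files under ivy/ that call ivy.mean, to
# cross-reference against the failing test names from the CI run
for path in pathlib.Path("ivy").rglob("*.py"):
    text = path.read_text(encoding="utf-8", errors="ignore")
    if "ivy.mean(" in text:
        print(path)
```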

@ivy-seed commented

This PR has been labelled as stale because it has been inactive for more than 7 days. If you would like to continue working on this PR, then please add another comment or this PR will be closed in 7 days.

@ivy-seed added the Stale label on Mar 22, 2024
@Kacper-W-Kozdon (Contributor, Author) commented Mar 23, 2024


Hey @vedpatwardhan, I'll try to go through the failing tests by this evening and let you know which of them use ivy.mean() in either the frontend or the backend.

EDIT:
One thing: the "newly introduced fails" do not come from my changes and seem to be an error in the code for the array_api_tests (probably a typo, judging from the error message). I'd prefer not to touch that in this PR, as the array_api_tests are quite a bit more abstract than the other tests.

@Kacper-W-Kozdon (Contributor, Author) commented Mar 23, 2024


Here are the failures which call mean() in any way:

  • ivy_tests/test_ivy/test_frontends/test_jax/test_numpy/test_statistical.py::test_jax_mean - this seems to be a frontend issue, since all of the backends fail. I did not change anything in the jax frontend; I might do that in the next step, but I think it would be better to do it in a separate PR (there's an issue dedicated to mean() failing on the jax frontend),
  • ivy_tests/test_ivy/test_frontends/test_tensorflow/test_math.py::test_tensorflow_reduce_std - uses paddle_backend.mean(), so it could be related to the changes I made, but I don't really see how. There are two error messages, and they seem to be directly linked to paddle.std(), not to the mean() function used inside it: [Hint: Expected input.numel() > 0, but received input.numel():0 <= 0:0.] (at /paddle/paddle/phi/kernels/funcs/reduce_function.h:1363), [Hint: Expected kernel_iter == iter->second.end() && kernel_key.backend() == Backend::CPU != true, but received kernel_iter == iter->second.end() && kernel_key.backend() == Backend::CPU:1 == true:1.] (at /paddle/paddle/phi/core/kernel_factory.cc:262),
  • ivy_tests/test_ivy/test_frontends/test_jax/test_numpy/test_statistical.py::test_jax_std - again, the paddle backend uses mean() for std(), but the error seems unrelated to my changes; error message: [Hint: Expected input.numel() > 0, but received input.numel():0 <= 0:0.] (at /paddle/paddle/phi/kernels/funcs/reduce_function.h:1363),
  • ivy_tests/test_ivy/test_frontends/test_tensorflow/test_keras/test_metrics.py::test_tensorflow_binary_accuracy, paddle backend - uses ivy.mean(), though I do not know if that is the failing point; error message: AssertionError: the results from backend paddle and ground truth framework tensorflow do not match: 1.0 != 0.0,
  • ivy_tests/test_ivy/test_frontends/test_torch/test_nn/test_functional/test_loss_functions.py::test_torch_binary_cross_entropy_with_logits, jax backend - uses mean() through the _reduce_loss() helper function; error: AssertionError: returned shape = (2,), ground-truth returned shape = (), returned values = [[array([0.7634394, 0.7634394], dtype=float32)], [array(0.7634394, dtype=float32)]]. I'd say this seems unrelated to mean() and is probably caused by improper dim expansion/reshaping,
  • ivy_tests/test_ivy/test_functional/test_experimental/test_nn/test_norms.py::test_group_norm, jax and tensorflow - uses ivy.mean(), but it also contains ivy.var(), and there are a lot of failed tests for var(); the error is related to dtypes: ivy.utils.exceptions.IvyBackendException: tensorflow: group_norm: tensorflow: var: The DTypes <class 'numpy.dtype[bfloat16]'> and <class 'numpy.dtypes.Float16DType'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is object,
  • I'll just list the failed tests which contain the var() function:
    ivy_tests/test_ivy/test_frontends/test_numpy/test_ndarray/test_ndarray.py::test_numpy_var, all backends;
    ivy_tests/test_ivy/test_frontends/test_numpy/test_statistics/test_averages_and_variances.py::test_numpy_var, paddle;
    ivy_tests/test_ivy/test_frontends/test_torch/test_reduction_ops.py::test_torch_var_mean, paddle;
    ivy_tests/test_ivy/test_frontends/test_jax/test_array.py::test_jax_array_var, paddle;
    ivy_tests/test_ivy/test_frontends/test_jax/test_numpy/test_statistical.py::test_jax_nanvar, all backends;
    ivy_tests/test_ivy/test_frontends/test_numpy/test_statistics/test_averages_and_variances.py::test_numpy_nanvar, paddle;
    ivy_tests/test_ivy/test_functional/test_core/test_statistical.py::test_var, jax, torch and paddle (!!!) - core var() is failing and it's present in a lot of the functions alongside mean(), which passed the core tests;
    ivy_tests/test_ivy/test_frontends/test_paddle/test_tensor/test_tensor.py::test_paddle_var, paddle;
    ivy_tests/test_ivy/test_frontends/test_torch/test_tensor.py::test_torch_var, torch;
    ivy_tests/test_ivy/test_frontends/test_jax/test_numpy/test_statistical.py::test_jax_var, paddle;
    ivy_tests/test_ivy/test_frontends/test_paddle/test_stat.py::test_paddle_var, paddle;
  • ivy_tests/test_ivy/test_frontends/test_paddle/test_nn/test_functional/test_loss.py::test_paddle_margin_ranking_loss, all backends - does not seem to contain mean(), but it contains negative(), and some of the failing functions with mean() also happened to contain negative(); however, this might be unrelated,
  • ivy_tests/test_ivy/test_frontends/test_tensorflow/test_keras/test_metrics.py::test_tensorflow_squared_hinge, all backends; error: TypeError: squared_hinge() got multiple values for argument 'y_true' (and I think the same issue applies to the next function),
  • ivy_tests/test_ivy/test_frontends/test_tensorflow/test_keras/test_metrics.py::test_tensorflow_hinge, jax, numpy, torch and paddle - contains mean(),
  • ivy_tests/test_ivy/test_functional/test_experimental/test_nn/test_losses.py::test_huber_loss, torch - contains mean(); error: AssertionError: the length of results from backend torch and ground truth framework tensorflow does not match; len(ret_np_flat) != len(ret_np_from_gt_flat), with ret_np_flat = [array([0., 0., 0.], dtype=float32), array([0., 0., 0.], dtype=float32), array([0., 0., 0.], dtype=float32)] and ret_np_from_gt_flat = [] - seems unrelated to mean() to me,
  • ivy_tests/test_ivy/test_frontends/test_tensorflow/test_raw_ops.py::test_tensorflow_Mean - contains mean(),
  • ivy_tests/test_ivy/test_frontends/test_torch/test_reduction_ops.py::test_torch_std, paddle - contains mean(),
  • ivy_tests/test_ivy/test_functional/test_experimental/test_core/test_linalg.py::test_higher_order_moment, numpy and tensorflow - contains ivy.mean(),
  • ivy_tests/test_ivy/test_functional/test_experimental/test_core/test_linalg.py::test_make_svd_non_negative, jax, torch and paddle - contains ivy.mean(),
  • ivy_tests/test_ivy/test_frontends/test_numpy/test_ndarray/test_ndarray.py::test_numpy_mean, all backends - that's a frontend issue (there is an open issue for that),
  • ivy_tests/test_ivy/test_frontends/test_numpy/test_statistics/test_averages_and_variances.py::test_numpy_mean and ivy_tests/test_ivy/test_frontends/test_numpy/test_statistics/test_averages_and_variances.py::test_numpy_nanmean, all backends - there is an open frontend issue,
  • ivy_tests/test_ivy/test_frontends/test_tensorflow/test_keras/test_metrics.py::test_tensorflow_poisson, only jax - contains ivy.mean(),
  • ivy_tests/test_ivy/test_frontends/test_torch/test_reduction_ops.py::test_torch_std_mean, only paddle and tensorflow - contains ivy.mean(); error: [Hint: Expected input.numel() > 0, but received input.numel():0 <= 0:0.] (at /paddle/paddle/phi/kernels/funcs/reduce_function.h:1363) - seems unrelated to mean(),
  • ivy_tests/test_ivy/test_frontends/test_tensorflow/test_keras/test_metrics.py::test_tensorflow_mean_absolute_percentage_error, tensorflow and jax - contains ivy.mean(); error: numpy.exceptions.DTypePromotionError: The DTypes <class 'numpy.dtype[bfloat16]'> and <class 'numpy.dtypes.Float16DType'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is object - does not seem to be related to mean() (a minimal reproduction of this dtype-promotion error follows this list),
  • ivy_tests/test_ivy/test_frontends/test_tensorflow/test_keras/test_metrics.py::test_tensorflow_metrics_mean_squared_logarithmic_error, jax, numpy and torch - contains ivy.mean(); inconclusive: it sometimes passes the tests (due to the number of functions I've been checking, num_examples is set to only 25),
  • ivy_tests/test_ivy/test_frontends/test_tensorflow/test_nn.py::test_tensorflow_moments, all backends - contains ivy.mean(); error: ValueError: too many values to unpack (expected 3) - seems unrelated to mean(),
  • ivy_tests/test_ivy/test_frontends/test_jax/test_array.py::test_jax_array_mean, only tensorflow - contains ivy.mean(),
  • ivy_tests/test_ivy/test_functional/test_experimental/test_nn/test_norms.py::test_batch_norm, tensorflow and jax - contains ivy.norm(),
  • ivy_tests/test_ivy/test_frontends/test_tensorflow/test_math.py::test_tensorflow_reduce_mean, torch - contains ivy.mean().

I think I caught at least most of the functions that use mean() anywhere and fail the tests. A lot of the failures come from the frontends (other than torch, which this PR deals with alongside core ivy, arrays and containers), and the errors pop up for all of the backends. All in all, it's significantly fewer than 97 tests: it would be at most ~27 (I might have missed some of the functions, but 4 of them are frontend-related) if all of the failures I listed were in separate tests.
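(A minimal reproduction sketch of the dtype-promotion error quoted for test_group_norm and test_tensorflow_mean_absolute_percentage_error above. It assumes numpy >= 1.25 for numpy.exceptions.DTypePromotionError and the ml_dtypes package as the provider of the bfloat16 NumPy dtype, which is my reading of the error text rather than something stated in the CI logs.)

```python
import numpy as np
from ml_dtypes import bfloat16

# numpy defines no common promotion between bfloat16 and float16, so any
# operation needing a shared result dtype fails like the CI runs above:
np.result_type(np.dtype(bfloat16), np.float16)
# -> numpy.exceptions.DTypePromotionError: The DTypes ... do not have a
#    common DType. For example they cannot be stored in a single array
#    unless the dtype is `object`.
```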

@Kacper-W-Kozdon (Contributor, Author) commented

@Ishticode would you mind checking the message right above as well and weighing in on the issue? :)

@Ishticode (Contributor) commented

Hi @Kacper-W-Kozdon,
Sorry for the silence on my end. I will take a detailed look today.

@Ishticode (Contributor) left a review comment:

It seems to me the CI has picked up more than it needs to. The essential relevant tests, like test_mean and test_torch_mean, are fine, and besides, the changes are simple enough: they just add a dtype parameter which, if specified, the input is cast to before the actual mean call.
The changes look good to me. I will go ahead and merge this, and will keep an eye on any relevant failures. Thank you very much @Kacper-W-Kozdon for your effort, patience and diligence. It's much appreciated. 🙂
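(A usage sketch of the behaviour being approved, assuming the merged ivy.mean signature exposes dtype as described in this PR; illustrative, not verbatim from ivy's tests.)

```python
import ivy

ivy.set_backend("tensorflow")
x = ivy.array([1.0, 2.0, 3.0, 4.0], dtype="float32")
# per this PR, dtype casts the input before the reduction,
# matching torch.mean(..., dtype=...) semantics
y = ivy.mean(x, dtype="float64")
print(y.dtype)  # float64
```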

@Ishticode merged commit 58de7f4 into ivy-llc:main on Apr 9, 2024 (41 of 141 checks passed)
Labels: Array API, Ivy Functional API, Paddle Backend, PyTorch Frontend, Stale, TensorFlow Frontend