Is it possible to train a PyTorch SSD model on an M1 Mac - or is this not yet implemented? PtNDArrayEx.multiBoxPrior(PtNDArrayEx.java:697) UnsupportedOperationException: Not implemented #2693

Open
juliangamble opened this issue Jul 5, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@juliangamble
Contributor

Description

When running TrainPikachuTest on an M1 Mac with the PyTorch engine, I get the error UnsupportedOperationException: Not implemented.

Expected Behavior

The TrainPikachuTest runs as expected and a model is produced.

Error Message

Exception in thread "main" java.lang.UnsupportedOperationException: Not implemented
	at ai.djl.pytorch.engine.PtNDArrayEx.multiBoxPrior(PtNDArrayEx.java:697)
	at ai.djl.modality.cv.MultiBoxPrior.generateAnchorBoxes(MultiBoxPrior.java:68)
	at ai.djl.basicmodelzoo.cv.object_detection.ssd.SingleShotDetection.forwardInternal(SingleShotDetection.java:84)
	at ai.djl.nn.AbstractBaseBlock.forwardInternal(AbstractBaseBlock.java:128)
	at ai.djl.nn.AbstractBaseBlock.forward(AbstractBaseBlock.java:93)
	at ai.djl.training.Trainer.forward(Trainer.java:189)
	at ai.djl.training.EasyTrain.trainSplit(EasyTrain.java:122)
	at ai.djl.training.EasyTrain.trainBatch(EasyTrain.java:110)
	at ai.djl.training.EasyTrain.fit(EasyTrain.java:58)
	at ai.djl.examples.training.TrainPikachu.runExample(TrainPikachu.java:93)
	at ai.djl.examples.training.TrainPikachuTest.testDetection(TrainPikachuTest.java:52)
	at ai.djl.examples.training.TrainPikachuTest.main(TrainPikachuTest.java:30)

How to Reproduce?

Run the class TrainPikachuTest on an M1 Mac

Steps to reproduce

  1. Run the TrainPikachuTest class with DJL_DEFAULT_ENGINE=PyTorch

What have you tried to solve it?

  1. Debugging through the code - and looking at the implementation of the class.
  2. Looking for other examples of training doing SingleShotDetection. (Didn't find any).

Environment Info

DJL_DEFAULT_ENGINE=PyTorch
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home
@juliangamble juliangamble added the bug Something isn't working label Jul 5, 2023
@zachgk
Contributor

zachgk commented Jul 5, 2023

MXNet has several helper operators specific to SSD, and the DJL SSD model you are using relies on them. Unfortunately, MXNet doesn't support M1, and the model doesn't run on PyTorch because those operators are not yet implemented in PtNDArrayEx.

If you are interested in contributing here, you could build an implementation of SSD that does not rely on those operators or you could add the missing implementations as part of PtNDArrayEx.
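
To make the second option more concrete, here is a rough plain-Java sketch of what an anchor-box generator in the style of MXNet's MultiBoxPrior computes. It is only an illustration under stated assumptions, not the DJL or MXNet code: the class name AnchorBoxSketch, the method signature, and the flat float[] output are hypothetical, and the default step (1/height, 1/width) and 0.5 offsets are assumed from my reading of the MXNet operator. Check the formulas against multibox_prior.cc before relying on them.

```java
import java.util.Arrays;

/** Hypothetical plain-Java sketch; not the DJL or MXNet implementation. */
final class AnchorBoxSketch {

    /**
     * Returns anchor boxes as flattened [xMin, yMin, xMax, yMax] corners in
     * normalized coordinates. Each feature-map cell gets
     * (sizes.length + ratios.length - 1) boxes.
     */
    static float[] multiBoxPrior(int height, int width, float[] sizes, float[] ratios) {
        if (sizes.length == 0 || ratios.length == 0) {
            throw new IllegalArgumentException("sizes and ratios must be non-empty");
        }
        int boxesPerCell = sizes.length + ratios.length - 1;
        float[] out = new float[height * width * boxesPerCell * 4];
        int idx = 0;
        for (int r = 0; r < height; r++) {
            // Assumed defaults: step = 1 / dimension, offset = 0.5 (cell center).
            float centerY = (r + 0.5f) / height;
            for (int c = 0; c < width; c++) {
                float centerX = (c + 0.5f) / width;
                // Every size paired with the first ratio.
                float firstRatio = (float) Math.sqrt(ratios[0]);
                for (float size : sizes) {
                    idx = put(out, idx, centerX, centerY, size, firstRatio, height, width);
                }
                // The first size paired with each remaining ratio.
                for (int j = 1; j < ratios.length; j++) {
                    float sqrtRatio = (float) Math.sqrt(ratios[j]);
                    idx = put(out, idx, centerX, centerY, sizes[0], sqrtRatio, height, width);
                }
            }
        }
        return out;
    }

    /** Writes one box centered at (cx, cy) and returns the next write index. */
    private static int put(float[] out, int idx, float cx, float cy,
                           float size, float sqrtRatio, int height, int width) {
        // Width is rescaled by height / width so boxes keep their aspect ratio
        // on non-square feature maps.
        float halfW = size * height / width * sqrtRatio / 2f;
        float halfH = size / sqrtRatio / 2f;
        out[idx++] = cx - halfW;
        out[idx++] = cy - halfH;
        out[idx++] = cx + halfW;
        out[idx++] = cy + halfH;
        return idx;
    }

    public static void main(String[] args) {
        float[] boxes = multiBoxPrior(2, 2, new float[] {0.5f, 0.25f}, new float[] {1f, 2f});
        System.out.println(boxes.length / 4 + " boxes");
        System.out.println("first box: " + Arrays.toString(Arrays.copyOf(boxes, 4)));
    }
}
```

A real PtNDArrayEx implementation would build the same values with NDArray operations and return them in the shape the SSD block expects. The loop form above is only meant to pin down the arithmetic.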

@juliangamble
Contributor Author

@zachgk thanks for getting back to me. Thanks for creating an opportunity to contribute.

I'm sizing it up and working out a specification and a way to measure whether it is working.
For the specification, it seems to be this source file:
https://github.com/apache/mxnet/blob/master/src/operator/contrib/multibox_prior.cc
Please help me out if you know a better one.

For measuring whether it is working, I'm looking in here and not finding a corresponding test:
https://github.com/apache/mxnet/tree/master/tests/cpp/operator

Can you help me out with how you would measure a working implementation?

@zachgk
Contributor

zachgk commented Jul 10, 2023

Probably the easiest way to test whether it is working is to use a hard-coded value for inputs and outputs. We have some examples in OptimizerTest.

So, find some known sample data and put it into the integration suite so that it runs in all engines. This ensures that all engines have matching behavior (including between the MXNet version and your new implementation). It also ensures that the behavior won't change silently, because any change would also require changing the values in the test.
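
As a rough illustration of the hard-coded-value approach, the sketch below compares an actual output against expected values within a tolerance. It is plain Java with no DJL dependency. The class name, the allClose helper, the tolerances, and the sample numbers are all hypothetical; a real integration test would take the actual values from the engine under test and the expected values from a trusted reference run.

```java
/** Hypothetical test helper; names, tolerances, and values are illustrative only. */
final class HardCodedValueCheck {

    /** Returns true when every element satisfies |actual - expected| <= atol + rtol * |expected|. */
    static boolean allClose(float[] actual, float[] expected, float rtol, float atol) {
        if (actual.length != expected.length) {
            return false;
        }
        for (int i = 0; i < actual.length; i++) {
            float tolerance = atol + rtol * Math.abs(expected[i]);
            // NaN never compares greater than the tolerance, so reject it explicitly.
            if (Float.isNaN(actual[i]) || Math.abs(actual[i] - expected[i]) > tolerance) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Expected values would be recorded once from a reference implementation.
        float[] expected = {0.25f, 0.25f, 0.75f, 0.75f};
        // Stand-in for the output of the operator under test.
        float[] actual = {0.2500001f, 0.25f, 0.75f, 0.7499999f};
        if (!allClose(actual, expected, 1e-5f, 1e-6f)) {
            throw new AssertionError("output does not match the hard-coded reference");
        }
        System.out.println("output matches the reference within tolerance");
    }
}
```

Comparing with a tolerance rather than exact equality leaves room for small floating-point differences between engines while still catching real behavioral changes.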

@juliangamble
Contributor Author

I'll get back to you - I'm writing a test.

@juliangamble
Contributor Author

I've done a pull request on this.
#2715
The two different unit tests nearly match up, but not quite - so I'm asking for some help on this.

juliangamble added a commit to juliangamble/djl that referenced this issue Jul 19, 2023
frankfliu pushed a commit that referenced this issue Aug 14, 2023
* Issue #2693 Implement PtNDArrayEx.multiBoxPrior with validation