Runtime error when training Hopenet #6
@developer-mayuan You should filter the 300W_LP dataset to make sure the three angles in the train list are between -99° and 99°.
@MichaelYSC Thank you very much for your help! I don't think this is the reason for the problem. In 'dataset.py', the following code deals with the binning issue:
According to the API manual of
But I will check it anyway, thank you very much for your inspiration!
@developer-mayuan As you described, np.digitize() returns 0 when x < bins[0], so binned_pose = np.digitize( ) - 1 becomes -1 and is passed to CrossEntropyLoss. But CrossEntropyLoss expects a class index in the range 0 to C-1, so that is the problem.
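The off-by-one described above can be reproduced in a few lines. This is a sketch, not the repo's actual `dataset.py` code: the bin edges below (66 bins of 3° over [-99°, 99°]) mirror the Hopenet setup, and the clipping fix is one possible workaround alongside the filtering suggested earlier.

```python
import numpy as np

# Hypothetical bin edges mirroring Hopenet's 66 pose bins of 3 degrees each.
bins = np.array(range(-99, 102, 3))  # edges -99, -96, ..., 99

# np.digitize returns 0 for any value below bins[0], so subtracting 1
# yields -1 -- an invalid class index for CrossEntropyLoss.
pose = -120.0  # an out-of-range yaw angle from an unfiltered sample
binned = np.digitize(pose, bins) - 1
print(binned)  # -1: this is what triggers the device-side assert

# One fix: clip (or filter out) angles before binning.
clipped = np.clip(pose, -99, 99 - 1e-6)
binned_ok = np.digitize(clipped, bins) - 1
print(binned_ok)  # 0: a valid class index
```

The same edge case exists on the upper end: an angle of exactly 99° digitizes past the last bin, which is why the clip above stops just short of 99.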
@developer-mayuan
@ytgcljj The dlib version had some problems when I ran it, so I wrote my own version with the same input format. Anyway, you could give the previous code a try. The following is the command line: You can follow my folder structure like the image shown below. You need to download the dlib face detector model and the Hopenet model from the links given in the readme file. You can also refer to my code, which modifies the dlib+hopenet code.
@MichaelYSC Yes, you are right, I will think about how to modify it! Thank you very much for your help!
@MichaelYSC After filtering out the data with poses outside [-99, 99], the training program runs smoothly. But I still have a problem: the loss at the beginning is extremely large. For example:
I would like to know whether the loss at the beginning looks normal in your case. Thanks.
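The filtering step mentioned above can be sketched as follows. The names are hypothetical (in 300W_LP the angles actually come from each sample's .mat file under 'Pose_Para', in radians); this only illustrates the filter rule itself.

```python
# Drop any sample whose yaw/pitch/roll (in degrees) falls outside (-99, 99),
# so that every pose lands in a valid bin during training.
def keep_sample(yaw, pitch, roll, limit=99.0):
    """Return True only if all three Euler angles are strictly within (-limit, limit)."""
    return all(-limit < a < limit for a in (yaw, pitch, roll))

# Hypothetical (name, yaw, pitch, roll) entries standing in for a train list.
samples = [
    ("image_00001", 12.0, -5.0, 3.0),    # in range: kept
    ("image_00002", -120.0, 10.0, 0.0),  # yaw out of range: dropped
]
filtered = [name for name, y, p, r in samples if keep_sample(y, p, r)]
print(filtered)  # ['image_00001']
```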
@developer-mayuan Did you train on the 300W_LP dataset? It looks normal in my case.
@MichaelYSC Yes, I did train on 300W_LP, but I trained from scratch. Could this be the reason?
@developer-mayuan I am not sure. Maybe we should wait for natanielruiz to release the train list file.
@MichaelYSC Yes, let's do that. :)
Hi everyone, I don't have much time until the end of this week, but for now I can say two things: I will release the 300W-LP list at the end of the week when I get a second. Thank you for your patience!
@natanielruiz Thank you very much for your response. I really appreciate your help!
300W_LP_filename_filtered.txt
Here are the filtered filename lists for 300W-LP and AFLW2000. Have fun!
Hi natanielruiz:
First, I want to say thank you for your great work! I tested your pretrained model on my own dataset and it works great; the results are accurate and robust. Now I would like to fine-tune your network with my own dataset, but I found I cannot do it.
I did prepare the 300W_LP dataset and generated the file list based on the input expected by your code. (By the way, maybe you could provide the file-list generation code in your repository, which would make it self-contained.)
Then, when I ran your train_hopenet.py code, sometimes I could get results for 1 or 2 epochs, but it always eventually gave me the following error message:
I did some searching, and the most promising answer is at the following link:
https://discuss.pytorch.org/t/runtimeerror-cuda-runtime-error-59-device-side-assert-triggered-at-opt-conda-conda-bld-pytorch-1503970438496-work-torch-lib-thc-generic-thcstorage-c-32/9669/5
It seems like in some cases the target index is out of bounds for the output. The following is my running environment:
Python 2.7.14 (with Anaconda)
Using conda virtual environment
pytorch 0.2.0 py27hc03bea1_4cu80 [cuda80] soumith
torchvision 0.1.9 py27hdb88a65_1 soumith
I would like to know whether you have met this kind of problem before, and whether you can give me some ideas about how to solve it. Thank you very much for your help!
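For anyone hitting the same `device-side assert`: the symptom is consistent with a label outside [0, C-1] reaching CrossEntropyLoss on the GPU, where the error surfaces as an opaque CUDA assert instead of a readable message. A cheap host-side guard (a hypothetical helper, not part of the repo) can catch the bad label before it reaches the loss:

```python
# Sanity-check class indices on the CPU before feeding them to the loss,
# so bad labels fail with a readable error instead of a CUDA assert.
def check_targets(targets, num_classes=66):
    """Raise ValueError if any target is outside [0, num_classes - 1]."""
    bad = [t for t in targets if not (0 <= t < num_classes)]
    if bad:
        raise ValueError(
            "targets out of range [0, %d]: %s" % (num_classes - 1, bad)
        )
    return True

print(check_targets([0, 33, 65]))  # True
# check_targets([0, -1, 65])       # would raise ValueError, exposing the -1 label
```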