please fix coding errors in pickup_best_model.lua script #18

Open
garzy opened this issue Oct 7, 2020 · 14 comments

@garzy

garzy commented Oct 7, 2020

I had to add this at the beginning of the script:

require 'nn'
require 'cunn'

And

local path = arguments.model_path  
  path = path .. "NoLimit/"

because game_settings.nl is nil.

And replace this

 local best_model_path = path .. '/epoch_' .. epoch .. net_type_str .. '.model'

with this

 local best_model_path = path .. '/epoch_' .. best_epoch .. net_type_str .. '.model'

because the original line fails with a nil error too.
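
The rename matters because Lua raises a runtime error when `..` is applied to a nil value. A minimal sketch of the path construction, assuming a hypothetical helper `build_model_path` (not part of the repository) and assuming `net_type_str` is '_gpu', as the file names later in this thread suggest:

```lua
-- Hypothetical helper; the name and signature are illustrative only.
-- Builds '<path>/epoch_<n><suffix>.model' using the pattern quoted above.
local function build_model_path(path, best_epoch, net_type_str)
  assert(best_epoch ~= nil, "best_epoch must be set before building the path")
  return path .. '/epoch_' .. best_epoch .. net_type_str .. '.model'
end

-- Concatenating a variable that was never assigned fails at runtime:
local epoch = nil
local ok = pcall(function() return 'epoch_' .. epoch end)
print(ok)

-- Assumed example values taken from the log output in this thread.
print(build_model_path('../Data/Models/NoLimit/river', 201, '_gpu'))
```

The explicit assert is a design choice for the sketch: it reports which value is missing instead of the generic concatenation error.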

Finally, it's crashing at line:

  torch.save(final_model_file_name, best_model)

error thrown:

 /home/torch/install/share/lua/5.2/torch/File.lua:136: attempt to call field 'insert' (a nil value)
stack traceback:
        /home/torch/install/share/lua/5.2/torch/File.lua:136: in function 'writeObject'
        /home/torch/install/share/lua/5.2/torch/File.lua:388: in function 'save'
        Training/pickup_best_model.lua:76: in function 'select_best_model'
        Training/pickup_best_model.lua:90: in main chunk
        [C]: in function 'dofile'
        /home/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk

I'm on Kubuntu 16.04 with Torch, Lua 5.2, cutorch, CUDA, and an Nvidia GTX 1060 with 6 GB of RAM.

@garzy garzy changed the title please fix coding error in pickup_best_model.lua script please fix coding errors in pickup_best_model.lua script Oct 7, 2020
@aikupoker
Owner

Hi @garzy

Thanks for reporting this issue.

I have fixed the errors. Could you remove your local changes and update your local repository?

@garzy
Author

garzy commented Oct 7, 2020

I've updated the file and launched it again, but it still crashes at this line:

  torch.save(final_model_file_name, best_model)

It throws the same exception as above.

Could it be a problem with the type returned by local best_model = torch.load(best_model_path)?

@aikupoker
Owner

Could you print the complete log output?

@garzy
Author

garzy commented Oct 7, 2020

Selecting best model with less Validation Huber Loss ...
best epoch: 201
best loss: 0.076074071484905
best model path ../Data/Models/NoLimit/river//epoch_201_gpu.info
saving final model
/home/torch/install/bin/lua: /home/torch/install/share/lua/5.2/torch/File.lua:136: attempt to call field 'insert' (a nil value)
stack traceback:
        /home/torch/install/share/lua/5.2/torch/File.lua:136: in function 'writeObject'
        /home/torch/install/share/lua/5.2/torch/File.lua:388: in function 'save'
        Training/pickup_best_model.lua:83: in function 'select_best_model'
        Training/pickup_best_model.lua:97: in main chunk
        [C]: in function 'dofile'
        /home/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: in ?
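
The first lines of this log show the script choosing the epoch with the lowest validation loss before it tries to save. A minimal sketch of that selection step, assuming a hypothetical `losses` table that maps epoch numbers to validation losses (the function name and the loss values are made up for illustration and do not come from the repository):

```lua
-- Illustrative sketch: pick the epoch with the lowest validation loss.
-- `losses` maps epoch number -> validation loss; the values are invented.
local function select_best_epoch(losses)
  local best_epoch, best_loss = nil, math.huge
  for epoch, loss in pairs(losses) do
    if loss < best_loss then
      best_epoch, best_loss = epoch, loss
    end
  end
  return best_epoch, best_loss
end

local losses = { [1] = 0.31, [2] = 0.12, [3] = 0.19 }
local best_epoch, best_loss = select_best_epoch(losses)
print('best epoch: ' .. best_epoch)
print('best loss: ' .. best_loss)
```

For an empty table the function returns nil as the epoch, so a caller would need to check for that before building a file path from it.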

@aikupoker
Owner

Please update your local "deeper-stacker" repo again (master branch) and retry.

Thanks!

@garzy
Author

garzy commented Oct 7, 2020

Same problem :(

best epoch: 201
best loss: 0.076074071484905
best model info path ../Data/Models/NoLimit/river/epoch_201_gpu.info
saving final model
/home/torch/install/bin/lua: /home/torch/install/share/lua/5.2/torch/File.lua:136: attempt to call field 'insert' (a nil value)
stack traceback:
        /home/torch/install/share/lua/5.2/torch/File.lua:136: in function 'writeObject'
        /home/torch/install/share/lua/5.2/torch/File.lua:388: in function 'save'
        Training/pickup_best_model.lua:85: in function 'select_best_model'
        Training/pickup_best_model.lua:104: in main chunk
        [C]: in function 'dofile'
        /home/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: in ?

@aikupoker
Owner

Could you run ls -lah ../Data/Models/NoLimit/river/ and post the output?

@garzy
Author

garzy commented Oct 7, 2020

...
-rw-rw-r-- 1 kml kml  119 oct  7 01:36 epoch_86_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:36 epoch_86_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:37 epoch_87_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:37 epoch_87_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:38 epoch_88_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:38 epoch_88_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:38 epoch_89_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:38 epoch_89_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 00:47 epoch_8_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 00:47 epoch_8_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:39 epoch_90_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:39 epoch_90_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:40 epoch_91_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:40 epoch_91_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:40 epoch_92_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:40 epoch_92_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:41 epoch_93_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:41 epoch_93_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:42 epoch_94_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:42 epoch_94_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:42 epoch_95_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:42 epoch_95_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:43 epoch_96_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:43 epoch_96_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:43 epoch_97_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:43 epoch_97_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:44 epoch_98_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:44 epoch_98_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:45 epoch_99_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:45 epoch_99_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 00:47 epoch_9_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 00:47 epoch_9_gpu.model
-rw-rw-r-- 1 kml kml    0 oct  7 10:56 final__gpu.model
-rw-rw-r-- 1 kml kml    8 oct  6 19:47 .gitkeep

@garzy
Author

garzy commented Oct 7, 2020

There are two underscores in final__gpu.model... maybe that's the problem?

@aikupoker
Owner

aikupoker commented Oct 7, 2020

I fixed two typos in the master branch. Try again.

@garzy
Author

garzy commented Oct 7, 2020

same problem

saving final model
/home/torch/install/bin/lua: /home/torch/install/share/lua/5.2/torch/File.lua:136: attempt to call field 'insert' (a nil value)
stack traceback:
        /home/torch/install/share/lua/5.2/torch/File.lua:136: in function 'writeObject'
        /home/torch/install/share/lua/5.2/torch/File.lua:388: in function 'save'
        Training/pickup_best_model.lua:85: in function 'select_best_model'
        Training/pickup_best_model.lua:104: in main chunk
        [C]: in function 'dofile'
        /home/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: in ?

Don't worry, the trained models may be corrupted. I generated them on Kubuntu 18.04, where I ended up getting segmentation faults (core dumped). While trying to fix that I found that I need Kubuntu 16 instead, but on the fresh Kubuntu 16 install I went straight to step 4: th Training/main_train.lua 4

I'm going to redo everything from step 3: th Training/raw_converter.lua 4

@garzy
Author

garzy commented Oct 7, 2020

After repeating the steps I get the same error:

/deeper-stacker/Source$ th Training/pickup_best_model.lua 4
Selecting best model with less Validation Huber Loss ...
best epoch: 204 of total: 350 epochs
best loss: 0.074449650388494
best model info path ../Data/Models/NoLimit/river/epoch_204_gpu.info
saving final model
/home/torch/install/bin/lua: /home/torch/install/share/lua/5.2/torch/File.lua:136: attempt to call field 'insert' (a nil value)
stack traceback:
        /home/torch/install/share/lua/5.2/torch/File.lua:136: in function 'writeObject'
        /home/torch/install/share/lua/5.2/torch/File.lua:388: in function 'save'
        Training/pickup_best_model.lua:85: in function 'select_best_model'
        Training/pickup_best_model.lua:104: in main chunk
        [C]: in function 'dofile'
        /home/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: in ?

@garzy
Author

garzy commented Oct 7, 2020

I can continue without running the pickup_best_model script by doing this:

cp epoch_204_gpu.info final_gpu.info
cp epoch_204_gpu.model final_gpu.model

When I execute torch.load('final_cpu.info'), the model seems to load fine.

Then, I continue with turn generation:

kml@kubuntu:~/deeper-stacker$ cd Source && th DataGeneration/main_data_generation.lua 3
Generating data ...
6sAh9s5c 1 292NN information:
learning_rate   0.0001
valid_loss      0.074449650388494
gpu     true
epoch   204
NN architecture:
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> output]
  (1): nn.ConcatTable {
    input
      |`-> (1): nn.Sequential {
      |      [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> output]
      |      (1): nn.Linear(1009 -> 500)
      |      (2): nn.BatchNormalization (2D) (500)
      |      (3): nn.PReLU
      |      (4): nn.Linear(500 -> 500)
      |      (5): nn.BatchNormalization (2D) (500)
      |      (6): nn.PReLU
      |      (7): nn.Linear(500 -> 500)
      |      (8): nn.BatchNormalization (2D) (500)
      |      (9): nn.PReLU
      |      (10): nn.Linear(500 -> 1008)
      |    }
       `-> (2): nn.Sequential {
             [input -> (1) -> output]
             (1): nn.Narrow
           }
       ... -> output
  }
  (2): nn.ConcatTable {
    input
      |`-> (1): nn.Sequential {
      |      [input -> (1) -> output]
      |      (1): nn.SelectTable(1)
      |    }
       `-> (2): nn.Sequential {
             [input -> (1) -> (2) -> (3) -> output]
             (1): nn.DotProduct
             (2): nn.Replicate
             (3): nn.MulConstant
           }
       ... -> output
  }
  (3): nn.CAddTable
}
nextround init_bucket time: 1.1490240097046
    avgTime: 123.4568271637
AdAs8s8h 2 979nextround init_bucket time: 0.58787417411804
    avgTime: 73.452112078667
4hTdAd7h 3 1712nextround init_bucket time: 1.2796399593353
    avgTime: 57.244448343913
Th2d3s5c 4 14861nextround init_bucket time: 0.56205201148987
    avgTime: 44.798284769058
2s2hTsJc 5 100nextround init_bucket time: 0.64476418495178

@yffbit

yffbit commented Feb 9, 2021

This error is weird. When I run th Training/pickup_best_model.lua 4 directly, the error occurs. When I debug the pickup_best_model.lua file in VS Code, no error occurs and the final_gpu.model works fine. When I enter the torch environment and execute torch.save(final_model_file_name, best_model) manually, no error occurs. My environment is Windows 10, LuaJIT, cutorch.
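
Since the failure depends on how the script is launched, one thing that could be checked is the state of the Lua standard library at the moment of saving. The message "attempt to call field 'insert' (a nil value)" means some value was indexed with insert and the result was nil. Assuming the failing call is the standard table.insert (the thread does not confirm this), a check like the one below, placed just before torch.save, would show whether the global table has been replaced. `stdlib_table_intact` is a hypothetical helper, and the shadowing shown is only a simulation of one possible cause:

```lua
-- Hypothetical diagnostic; not part of the repository.
-- Returns true when the global `table` still exposes the standard insert.
local function stdlib_table_intact()
  return type(table) == 'table' and type(table.insert) == 'function'
end

print('table.insert available: ' .. tostring(stdlib_table_intact()))

-- Simulate the suspected failure mode: a global named `table`
-- shadows the standard library (for example an accidental assignment).
local saved = table
table = { rows = 3 }
local broken = not stdlib_table_intact()
table = saved -- restore the standard library
print('shadowing detected: ' .. tostring(broken))
```

If the check passes in the failing run, this assumption can be ruled out and the cause lies elsewhere.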
