please fix coding errors in pickup_best_model.lua script #18

garzy · 2020-10-07T06:32:53Z

I had to add this at the beginning:

require 'nn'
require 'cunn'

And

local path = arguments.model_path  
  path = path .. "NoLimit/"

because game_settings.nl is nil

And replace this

 local best_model_path = path .. '/epoch_' .. epoch .. net_type_str .. '.model'

with this

 local best_model_path = path .. '/epoch_' .. best_epoch .. net_type_str .. '.model'

because epoch is nil there and raises an error too.
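
Both nil errors above come from concatenating a value that was never set. A minimal sketch in plain Lua of building the checkpoint path with explicit checks follows. The helper build_model_path and its argument names are hypothetical, made up for illustration; they are not functions from the repository.

```lua
-- Hypothetical helper (not part of deeper-stacker): builds a checkpoint path
-- and fails with a readable message instead of a bare nil-concatenation error.
local function build_model_path(dir, epoch, net_type_str)
  assert(type(dir) == 'string', 'model directory is nil or not a string')
  assert(type(epoch) == 'number', 'epoch is nil or not a number')
  assert(type(net_type_str) == 'string', 'net type suffix is nil or not a string')
  -- Strip trailing slashes so the result never contains "//".
  dir = dir:gsub('/+$', '')
  return dir .. '/epoch_' .. epoch .. net_type_str .. '.model'
end

print(build_model_path('../Data/Models/NoLimit/river/', 201, '_gpu'))
-- prints ../Data/Models/NoLimit/river/epoch_201_gpu.model
```

Checking the inputs up front means a missing setting is reported by name rather than surfacing later as a concatenation error.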

Finally, it's crashing at line:

  torch.save(final_model_file_name, best_model)

error thrown:

 /home/torch/install/share/lua/5.2/torch/File.lua:136: attempt to call field 'insert' (a nil value)
stack traceback:
        /home/torch/install/share/lua/5.2/torch/File.lua:136: in function 'writeObject'
        /home/torch/install/share/lua/5.2/torch/File.lua:388: in function 'save'
        Training/pickup_best_model.lua:76: in function 'select_best_model'
        Training/pickup_best_model.lua:90: in main chunk
        [C]: in function 'dofile'
        /home/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk

I'm on Kubuntu 16.04 with torch, Lua 5.2, cutorch, CUDA, and an Nvidia GTX 1060 with 6 GB of RAM.
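
The message "attempt to call field 'insert' (a nil value)" means that some table was indexed with the key insert and the lookup returned nil. One way this can happen, sketched below in plain Lua, is when the name table no longer refers to the standard library at the point of the call. Whether that is what happens inside File.lua here is an assumption; nothing in this thread establishes the cause.

```lua
-- Sketch only: reproduces the *kind* of error, not the torch code path.
local function call_insert(lib)
  -- `lib` stands in for whatever `table` resolves to at the call site.
  local list = {}
  lib.insert(list, 'x')
  return #list
end

-- With the real standard library the call works.
print(call_insert(table))  -- prints 1

-- With a table that lacks `insert`, Lua raises a "call a nil value" error.
-- The exact wording of the message differs between Lua versions.
local ok, err = pcall(call_insert, {})
print(ok, err)
```

Printing type(table) and type(table.insert) just before the torch.save call would confirm or rule out this explanation.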

aikupoker · 2020-10-07T08:01:39Z

Hi @garzy

Thanks for reporting this issue.

I have fixed the errors. Could you remove your local changes and update your local repository?

garzy · 2020-10-07T08:32:30Z

I've updated the file and launched it again, but it still crashes at this line:

  torch.save(final_model_file_name, best_model)

throwing the same exception as above.

Could the problem be the type of the value returned by local best_model = torch.load(best_model_path)?

aikupoker · 2020-10-07T08:47:18Z

Could you print the complete log output?

garzy · 2020-10-07T08:48:51Z

Selecting best model with less Validation Huber Loss ...
best epoch: 201
best loss: 0.076074071484905
best model path ../Data/Models/NoLimit/river//epoch_201_gpu.info
saving final model
/home/torch/install/bin/lua: /home/torch/install/share/lua/5.2/torch/File.lua:136: attempt to call field 'insert' (a nil value)
stack traceback:
        /home/torch/install/share/lua/5.2/torch/File.lua:136: in function 'writeObject'
        /home/torch/install/share/lua/5.2/torch/File.lua:388: in function 'save'
        Training/pickup_best_model.lua:83: in function 'select_best_model'
        Training/pickup_best_model.lua:97: in main chunk
        [C]: in function 'dofile'
        /home/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: in ?

aikupoker · 2020-10-07T08:52:55Z

Please update your local "deeper-stacker" repo on the master branch again and retry.

Thanks!

garzy · 2020-10-07T08:58:26Z

Same problem :(

best epoch: 201
best loss: 0.076074071484905
best model info path ../Data/Models/NoLimit/river/epoch_201_gpu.info
saving final model
/home/torch/install/bin/lua: /home/torch/install/share/lua/5.2/torch/File.lua:136: attempt to call field 'insert' (a nil value)
stack traceback:
        /home/torch/install/share/lua/5.2/torch/File.lua:136: in function 'writeObject'
        /home/torch/install/share/lua/5.2/torch/File.lua:388: in function 'save'
        Training/pickup_best_model.lua:85: in function 'select_best_model'
        Training/pickup_best_model.lua:104: in main chunk
        [C]: in function 'dofile'
        /home/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: in ?

aikupoker · 2020-10-07T09:06:14Z

Could you run ls -lah ../Data/Models/NoLimit/river/ and post the output?

garzy · 2020-10-07T09:35:22Z

...
-rw-rw-r-- 1 kml kml  119 oct  7 01:36 epoch_86_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:36 epoch_86_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:37 epoch_87_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:37 epoch_87_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:38 epoch_88_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:38 epoch_88_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:38 epoch_89_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:38 epoch_89_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 00:47 epoch_8_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 00:47 epoch_8_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:39 epoch_90_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:39 epoch_90_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:40 epoch_91_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:40 epoch_91_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:40 epoch_92_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:40 epoch_92_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:41 epoch_93_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:41 epoch_93_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:42 epoch_94_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:42 epoch_94_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:42 epoch_95_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:42 epoch_95_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:43 epoch_96_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:43 epoch_96_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:43 epoch_97_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:43 epoch_97_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:44 epoch_98_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:44 epoch_98_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 01:45 epoch_99_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 01:45 epoch_99_gpu.model
-rw-rw-r-- 1 kml kml  119 oct  7 00:47 epoch_9_gpu.info
-rw-rw-r-- 1 kml kml 112M oct  7 00:47 epoch_9_gpu.model
-rw-rw-r-- 1 kml kml    0 oct  7 10:56 final__gpu.model
-rw-rw-r-- 1 kml kml    8 oct  6 19:47 .gitkeep

garzy · 2020-10-07T09:37:54Z

There are two underscores in final__gpu.model... maybe that's the cause?
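
If the final file name is assembled the same way as the epoch file names, a prefix ending in an underscore followed by a suffix that already starts with one would produce exactly this name. A small sketch, assuming the suffix is '_gpu' (inferred from epoch_201_gpu) and a hypothetical prefix 'final_':

```lua
-- Assumption: the suffix already carries its leading underscore,
-- as the epoch file names (epoch_201_gpu) suggest.
local net_type_str = '_gpu'

-- Hypothetical prefixes, for illustration only.
local doubled = 'final_' .. net_type_str .. '.model'
local fixed   = 'final'  .. net_type_str .. '.model'

print(doubled)  -- prints final__gpu.model
print(fixed)    -- prints final_gpu.model
```

Note that the listing shows final__gpu.model with a size of 0, which fits the save failing partway through rather than the file name itself being the cause of the crash.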

aikupoker · 2020-10-07T09:53:56Z

I fixed two typos on the master branch. Try again.

garzy · 2020-10-07T10:06:04Z

same problem

saving final model
/home/torch/install/bin/lua: /home/torch/install/share/lua/5.2/torch/File.lua:136: attempt to call field 'insert' (a nil value)
stack traceback:
        /home/torch/install/share/lua/5.2/torch/File.lua:136: in function 'writeObject'
        /home/torch/install/share/lua/5.2/torch/File.lua:388: in function 'save'
        Training/pickup_best_model.lua:85: in function 'select_best_model'
        Training/pickup_best_model.lua:104: in main chunk
        [C]: in function 'dofile'
        /home/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: in ?

Don't worry. My trained models may be corrupted: I generated them on Kubuntu 18.04, where I eventually hit segmentation faults (core dumped). While trying to fix that I found that I need Kubuntu 16 instead, but on the fresh Kubuntu 16 install I went straight to step 4, th Training/main_train.lua 4.

I'm going to redo everything from step 3, th Training/raw_converter.lua 4.

garzy · 2020-10-07T17:57:36Z

After repeating the steps I get the same error:

/deeper-stacker/Source$ th Training/pickup_best_model.lua 4
Selecting best model with less Validation Huber Loss ...
best epoch: 204 of total: 350 epochs
best loss: 0.074449650388494
best model info path ../Data/Models/NoLimit/river/epoch_204_gpu.info
saving final model
/home/torch/install/bin/lua: /home/torch/install/share/lua/5.2/torch/File.lua:136: attempt to call field 'insert' (a nil value)
stack traceback:
        /home/torch/install/share/lua/5.2/torch/File.lua:136: in function 'writeObject'
        /home/torch/install/share/lua/5.2/torch/File.lua:388: in function 'save'
        Training/pickup_best_model.lua:85: in function 'select_best_model'
        Training/pickup_best_model.lua:104: in main chunk
        [C]: in function 'dofile'
        /home/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: in ?

garzy · 2020-10-07T18:45:46Z

I can continue without running the pickup_best_model script by doing this:

cp epoch_204_gpu.info final_gpu.info
cp epoch_204_gpu.model final_gpu.model

When I run torch.load('final_cpu.info'), the model seems to load fine.

Then, I continue with turn generation:

kml@kubuntu:~/deeper-stacker$ cd Source && th DataGeneration/main_data_generation.lua 3
Generating data ...
6sAh9s5c 1 292NN information:
learning_rate   0.0001
valid_loss      0.074449650388494
gpu     true
epoch   204
NN architecture:
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> output]
  (1): nn.ConcatTable {
    input
      |`-> (1): nn.Sequential {
      |      [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> output]
      |      (1): nn.Linear(1009 -> 500)
      |      (2): nn.BatchNormalization (2D) (500)
      |      (3): nn.PReLU
      |      (4): nn.Linear(500 -> 500)
      |      (5): nn.BatchNormalization (2D) (500)
      |      (6): nn.PReLU
      |      (7): nn.Linear(500 -> 500)
      |      (8): nn.BatchNormalization (2D) (500)
      |      (9): nn.PReLU
      |      (10): nn.Linear(500 -> 1008)
      |    }
       `-> (2): nn.Sequential {
             [input -> (1) -> output]
             (1): nn.Narrow
           }
       ... -> output
  }
  (2): nn.ConcatTable {
    input
      |`-> (1): nn.Sequential {
      |      [input -> (1) -> output]
      |      (1): nn.SelectTable(1)
      |    }
       `-> (2): nn.Sequential {
             [input -> (1) -> (2) -> (3) -> output]
             (1): nn.DotProduct
             (2): nn.Replicate
             (3): nn.MulConstant
           }
       ... -> output
  }
  (3): nn.CAddTable
}
nextround init_bucket time: 1.1490240097046
    avgTime: 123.4568271637
AdAs8s8h 2 979nextround init_bucket time: 0.58787417411804
    avgTime: 73.452112078667
4hTdAd7h 3 1712nextround init_bucket time: 1.2796399593353
    avgTime: 57.244448343913
Th2d3s5c 4 14861nextround init_bucket time: 0.56205201148987
    avgTime: 44.798284769058
2s2hTsJc 5 100nextround init_bucket time: 0.64476418495178

yffbit · 2021-02-09T12:53:12Z

This error is weird. When I run th Training/pickup_best_model.lua 4 directly, the error occurs. When I debug the pickup_best_model.lua file in VS Code, no error occurs and final_gpu.model works fine. When I enter the torch environment and execute torch.save(final_model_file_name, best_model) manually, no error occurs either. My environment is Windows 10, LuaJIT, cutorch.

garzy changed the title ~~please fix coding error in pickup_best_model.lua script~~ please fix coding errors in pickup_best_model.lua script Oct 7, 2020
