Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Training does not start #17

Open
acDante opened this issue Jan 12, 2019 · 2 comments
Open

Training does not start #17

acDante opened this issue Jan 12, 2019 · 2 comments

Comments

@acDante
Copy link

acDante commented Jan 12, 2019

Hi, Luheng:
Thanks for your great work! I encountered some strange errors during training. I used the following to start training your model :
python python/train.py --config=./config/srl_config.json --model=./output --train=./sample_data/sentences_with_gold.txt --dev=./sample_data/sentences_with_gold.txt --task=srl

And I got these outputs in the terminal:

/scratch/users/duxi/miniconda3/envs/deep_srl/lib/python2.7/site-packages/theano/gpuarray/dnn.py:135: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to version 5.1.
warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 6021 on context None
Mapped name None to device cuda: GeForce GTX TITAN X (0000:04:00.0)
Task: srl
Embedding size=100
Extracting features
Extraced 19 words and 9 tags
Max training sentence length: 9
Max development sentence length: 9
Warning: not using official gold predicates. Not for formal evaluation.
Dev data has 1 batches.
Data loading duration was 0:00:14.
[WARNING] Log directory ./output is not empty, previous checkpoints might be overwritten
Preparation duration was 0:00:00.
Using 2 feature types, projected output dim=200.
('lstm_0_rdrop', 0.1, True)
<neural_srl.theano.layer.HighwayLSTMLayer object at 0x7fe5782bab50>
('lstm_1_rdrop', 0.1, True)
<neural_srl.theano.layer.HighwayLSTMLayer object at 0x7fe5782620d0>
('lstm_2_rdrop', 0.1, True)
<neural_srl.theano.layer.HighwayLSTMLayer object at 0x7fe56bf82c10>
('lstm_3_rdrop', 0.1, True)
<neural_srl.theano.layer.HighwayLSTMLayer object at 0x7fe570087f90>
('lstm_4_rdrop', 0.1, True)
<neural_srl.theano.layer.HighwayLSTMLayer object at 0x7fe5781912d0>
('lstm_5_rdrop', 0.1, True)
<neural_srl.theano.layer.HighwayLSTMLayer object at 0x7fe570090f90>
('lstm_6_rdrop', 0.1, True)
<neural_srl.theano.layer.HighwayLSTMLayer object at 0x7fe57809b590>
('lstm_7_rdrop', 0.1, True)
<neural_srl.theano.layer.HighwayLSTMLayer object at 0x7fe578203f90>
embedding_0 embedding_0 [ 19 100]
embedding_1 embedding_1 [ 2 100]
lstm_0_W lstm_0_W [ 200 1800]
lstm_0_U lstm_0_U [ 300 1500]
lstm_0_b lstm_0_b [1800]
lstm_1_W lstm_1_W [ 300 1800]
lstm_1_U lstm_1_U [ 300 1500]
lstm_1_b lstm_1_b [1800]
lstm_2_W lstm_2_W [ 300 1800]
lstm_2_U lstm_2_U [ 300 1500]
lstm_2_b lstm_2_b [1800]
lstm_3_W lstm_3_W [ 300 1800]
lstm_3_U lstm_3_U [ 300 1500]
lstm_3_b lstm_3_b [1800]
lstm_4_W lstm_4_W [ 300 1800]
lstm_4_U lstm_4_U [ 300 1500]
lstm_4_b lstm_4_b [1800]
lstm_5_W lstm_5_W [ 300 1800]
lstm_5_U lstm_5_U [ 300 1500]
lstm_5_b lstm_5_b [1800]
lstm_6_W lstm_6_W [ 300 1800]
lstm_6_U lstm_6_U [ 300 1500]
lstm_6_b lstm_6_b [1800]
lstm_7_W lstm_7_W [ 300 1800]
lstm_7_U lstm_7_U [ 300 1500]
lstm_7_b lstm_7_b [1800]
softmax_W softmax_W [300 9]
softmax_b softmax_b [9]

After these output, I never got other terminal output and the file "./output/checkpoints.tsv" remains empty even after the training is started for a long time. It seems the training does not make any progress at all. I am not sure if this is a GPU-specific issue: I am using cuda8.0 + cudnn8.0 and here is my theano configuration file:

[global]
device = cuda
floatX = float64
mode = FAST_RUN

[cuda]
root=/usr/local/cuda-8.0/

[dnn]
enable=True
include_path=/usr/local/cuda-8.0/include
library_path=/usr/local/cuda-8.0/lib64

[lib]
cnmem = 0.8

[nvcc]
fastmath = True

[gcc]
cxxflags=-Wno-narrowing
~

Could you give me any ideas about the potential reason ?

@Huijun-Cui
Copy link

hi do you solve this problem , I also account this problem ,

@acDante
Copy link
Author

acDante commented Mar 30, 2019

I guess this problem is caused by incompatible GPU configuration, but I did not solve this problem .. I would suggest you to take a look at allennlp. This model is also implemented there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants