Error: RuntimeError: CPU memory allocation failed #2

Open
starevelyn opened this issue Jan 4, 2018 · 8 comments

@starevelyn

root@liangzhiNLP:/home/liangzhi/liangxingzheng/multi-criteria-cws/multi-criteria-cws# ./script/train.sh joint-10in1 --dynet-seed 10364 --python-seed 840868838938890892
[dynet] random seed: 10364
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
model.py --dataset dataset/joint-10in1/dataset.pkl --num-epochs 60 --word-embeddings data/embedding/character.vec --log-dir result/joint-10in1 --dropout 0.2 --learning-rate 0.01 --learning-rate-decay 0.9 --hidden-dim 100 --dynet-seed 22059 --bigram --skip-dev --dynet-seed 10364 --python-seed 840868838938890892

Namespace(always_model=False, batch_size=20, bigram=True, char_embedding_dim=100, char_embeddings=None, char_hidden_dim=100, clip_norm=None, dataset='dataset/joint-10in1/dataset.pkl', debug=False, dropout=0.2, dynet_autobatch=None, dynet_gpus=None, dynet_mem=None, dynet_seed=10364, dynet_weight_decay=None, hidden_dim=100, learning_rate=0.01, learning_rate_decay=0.9, log_dir='result/joint-10in1', lowercase_words=False, lstm_layers=1, no_model=False, no_we=False, no_we_update=False, num_epochs=60, old_model=None, python_seed=840868838938890892, skip_dev=True, subset=None, task_name='2018-01-04-15-01-54', test=False, tie_two_embeddings=False, use_char_rnn=False, word_embeddings='data/embedding/character.vec')
Python random seed: 840868838938890892

Memory pool info for each devices:
Device CPU - FOR Memory 128MB, BACK Memory 128MB, PARAM Memory 128MB, SCRATCH Memory 128MB.
CPU memory allocation failed n=570425344 align=32
Traceback (most recent call last):
File "model.py", line 492, in <module>
tie_two_embeddings=options.tie_two_embeddings
File "model.py", line 56, in __init__
self.bigram_lookup = self.model.add_lookup_parameters((len(b2i), word_embedding_dim))
File "_dynet.pyx", line 1183, in _dynet.ParameterCollection.add_lookup_parameters
File "_dynet.pyx", line 1210, in _dynet.ParameterCollection.add_lookup_parameters
RuntimeError: CPU memory allocation failed

What causes this error? Do I need to change the code, or is it an environment problem?

@hankcs
Owner

hankcs commented Jan 4, 2018

Not sure, but it looks like the machine ran out of memory. Try one with more RAM; my experiments ran on a machine with 8 GB.
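To put the failed request in perspective: the log reports `n=570425344`, which is exactly 544 MiB requested in a single allocation while creating the bigram lookup table. The sketch below does that arithmetic. It assumes parameters are stored as 32-bit floats and an embedding dimension of 100; the real bigram vocabulary size `len(b2i)` is not shown in the issue, so the row count used here is hypothetical. During training, gradients and (for momentum SGD) momentum buffers of the same shape may also be held, so actual usage is likely a multiple of the table size.

```python
# Rough memory arithmetic for a lookup (embedding) table.
# Assumption: each parameter is a 32-bit float (4 bytes).

BYTES_PER_FLOAT32 = 4
MIB = 1024 ** 2


def lookup_table_bytes(rows: int, dim: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Bytes needed to store a rows x dim table of values."""
    return rows * dim * bytes_per_value


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes (MiB)."""
    return num_bytes / MIB


if __name__ == "__main__":
    # The failing request from the log above.
    print(f"failed request: {to_mib(570425344):.1f} MiB")  # 544.0 MiB
    # Hypothetical bigram vocabulary size; len(b2i) is not given in the issue.
    hypothetical_bigrams = 1_000_000
    table = lookup_table_bytes(hypothetical_bigrams, dim=100)
    print(f"{hypothetical_bigrams} x 100 float32 table: {to_mib(table):.1f} MiB")
```

With a bigram vocabulary in the millions, a single table of this kind already reaches hundreds of MiB, which is consistent with the maintainer's suggestion to use a machine with more RAM.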

@hankcs hankcs added the question label Jan 4, 2018
@starevelyn
Author

It was indeed a memory problem. Thanks a lot!

@starevelyn
Author

Python random seed: 840868838938890892
Training Algorithm: <class '_dynet.MomentumSGDTrainer'>
Number training instances: 2533999
Number dev instances: 262929
Epoch 1 out of 60
Traceback (most recent call last):
File "model.py", line 528, in <module>
loss_expr = model.neg_log_loss(instance.sentence, instance.tags)
File "model.py", line 192, in neg_log_loss
forward_score = self.forward(observations)
File "model.py", line 212, in forward
alphas_t.append(log_sum_exp(next_tag_expr))
File "model.py", line 202, in log_sum_exp
return max_score_expr + dy.log(dy.sum_cols(dy.transpose(dy.exp(scores - max_score_expr_broadcast))))
AttributeError: module 'dynet' has no attribute 'sum_cols'
After adding more RAM, I now get this error instead...
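This error comes from the installed DyNet lacking `dy.sum_cols`, not from the math on that line. For readers unfamiliar with it, the failing line (model.py line 202) computes a numerically stable log-sum-exp over scores, apparently as part of a CRF-style forward algorithm (`forward` appends `log_sum_exp(next_tag_expr)` to `alphas_t`). Here is a plain-Python sketch of the same trick, independent of DyNet: subtract the maximum before exponentiating, then add it back.

```python
import math
from typing import Sequence


def log_sum_exp(scores: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(s) for s in scores)).

    Subtracting the maximum keeps every exponent <= 0, so math.exp cannot
    overflow; this mirrors max_score + log(sum(exp(scores - max_score))).
    """
    if not scores:
        raise ValueError("log_sum_exp of an empty sequence is undefined")
    max_score = max(scores)
    return max_score + math.log(sum(math.exp(s - max_score) for s in scores))


if __name__ == "__main__":
    # A naive math.log(math.exp(1000.0) + math.exp(1000.0)) raises OverflowError.
    print(log_sum_exp([1000.0, 1000.0]))  # equals 1000 + log(2)
```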

@hankcs
Owner

hankcs commented Jan 4, 2018

The Dynet version doesn't match; it must be 2.0.1: https://github.com/clab/dynet/releases/tag/2.0.1

@starevelyn
Author

starevelyn commented Jan 8, 2018

AttributeError: module 'dynet' has no attribute 'sum_cols'
starevelyn@starevelyn-OptiPlex-7020:~/multi-criteria-cws$ pip3 list
apturl (0.5.2)
beautifulsoup4 (4.4.1)
blinker (1.3)
Brlapi (0.6.4)
chardet (2.3.0)
checkbox-support (0.22)
cmake (0.9.0)
command-not-found (0.3)
cryptography (1.2.3)
Cython (0.27.3)
defer (1.0.6)
dyNET (2.0.2)
I still get this error...

starevelyn@starevelyn-OptiPlex-7020:~/multi-criteria-cws$ python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import dynet
[dynet] random seed: 3213626540
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
>>> dynet.sum_cols()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'dynet' has no attribute 'sum_cols'

What is going on here?

@hankcs
Owner

hankcs commented Jan 8, 2018

Dynet版本号不匹配,必须是2.0.1:https://github.com/clab/dynet/releases/tag/2.0.1

but you installed dyNET (2.0.2).
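Because the code depends on one specific DyNet release, a mismatch like this only surfaces deep inside training. One option is to fail fast at startup. The sketch below is hypothetical (the names `check_dynet_version` and `installed_dynet_version` are illustrative, not part of this project); it reads the pip-reported version of the `dyNET` distribution via `importlib.metadata`, which requires Python 3.8 or newer.

```python
from importlib import metadata
from typing import Optional

# Hypothetical pin: the version the maintainer reports as verified.
REQUIRED_DYNET_VERSION = "2.0.1"


def check_dynet_version(installed: str, required: str = REQUIRED_DYNET_VERSION) -> None:
    """Raise a clear error when the installed DyNet is not the pinned version."""
    if installed != required:
        raise RuntimeError(
            f"this code was verified with DyNet {required}, but {installed} is installed"
        )


def installed_dynet_version() -> Optional[str]:
    """Return the pip-reported version of the 'dyNET' distribution, or None if absent."""
    try:
        return metadata.version("dyNET")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_dynet_version()
    if version is None:
        print("DyNet does not appear to be installed via pip")
    else:
        check_dynet_version(version)
        print(f"DyNet {version} OK")
```

A caveat: a DyNet built from source may report its version as 0.0.0 (as the maintainer notes later in this thread), so a strict equality check like this would reject such a build even if it is the right commit.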

@starevelyn
Author

I reinstalled using the command from the earlier question, and I think I got the right version this time, but I still get an error...
liangzhi@liangzhiNLP:~$ python3
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import dynet
[dynet] random seed: 1811970545
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
>>> dynet.__version__
2.0
The error is:
Python random seed: 840868838938890892
Training Algorithm: <class '_dynet.MomentumSGDTrainer'>
Number training instances: 2533999
Number dev instances: 262929
Epoch 1 out of 60
126700/126700 [==============================] - 7497s - train loss: 1.0665
Traceback (most recent call last):
File "model.py", line 549, in <module>
trainer.learning_rate *= options.learning_rate_decay
AttributeError: '_dynet.MomentumSGDTrainer' object has no attribute 'learning_rate'

I checked, and version 2.0 really doesn't have a learning_rate attribute??
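The failing line applies multiplicative learning-rate decay once per epoch (`trainer.learning_rate *= options.learning_rate_decay`), which is why the crash appears only after the first epoch's progress bar completes. A minimal sketch of the resulting schedule, using the values from the command line in this issue (`--learning-rate 0.01 --learning-rate-decay 0.9 --num-epochs 60`) and the default `batch_size=20` from the Namespace; the helper names are hypothetical.

```python
def batches_per_epoch(num_instances: int, batch_size: int) -> int:
    """Number of minibatches per epoch when the last, partial batch is kept."""
    return (num_instances + batch_size - 1) // batch_size


def learning_rate_at_epoch(initial_lr: float, decay: float, epoch: int) -> float:
    """Learning rate after `epoch` completed epochs of multiplicative decay.

    Equivalent to starting at initial_lr and executing lr *= decay
    once at the end of each completed epoch.
    """
    return initial_lr * decay ** epoch


if __name__ == "__main__":
    # 2533999 training instances in batches of 20.
    print(batches_per_epoch(2533999, 20))  # 126700, matching the progress bar
    for epoch in (0, 1, 10, 59):
        print(epoch, learning_rate_at_epoch(0.01, 0.9, epoch))
```

The 126700 batches per epoch match the `126700/126700` progress bar in the log, which is consistent with the loop iterating over minibatches of 20 sentences.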

@hankcs
Owner

hankcs commented Jan 9, 2018

Thanks for the report, and sorry for giving you the wrong version number. The correct version is https://github.com/clab/dynet/releases/tag/2.0.1 , which I have verified repeatedly.

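At the time, the Dynet I had compiled from source reported its version only as dyNET (0.0.0), and the experiments for the paper started in August, so I guessed v2.0 from Dynet's release notes. With 2.0 the program does start, but at the end of each epoch it fails because learning_rate cannot be found. Judging from the git commit hash (87df34103625102493f8c660684146a636e2482c), my build falls somewhere between 2.0 and 2.0.1. After repeated testing, I found that 2.0.1 runs correctly.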

Please reinstall 2.0.1 following #1 (comment). Thanks.
