Error when testing vgg_16_cifar.py #9

Closed
quietsmile opened this issue Aug 31, 2016 · 7 comments
quietsmile commented Aug 31, 2016

Installation succeeded on Ubuntu 14.04 with CUDA 7.5 and cuDNN 5.1.5.
However, running demo/image_classification/train.sh fails. The error output is as follows:

[INFO 2016-08-31 17:20:21,497 layers.py:1430] channels=512 size=8192
[INFO 2016-08-31 17:20:21,497 layers.py:1430] output size for conv_8 is 4
[INFO 2016-08-31 17:20:21,498 layers.py:1430] channels=512 size=8192
[INFO 2016-08-31 17:20:21,499 layers.py:1430] output size for conv_9 is 4
[INFO 2016-08-31 17:20:21,501 layers.py:1490] output size for pool_3 is 2_2
[INFO 2016-08-31 17:20:21,502 layers.py:1490] output size for pool_4 is 1_1
[INFO 2016-08-31 17:20:21,507 networks.py:960] The input order is [image, label]
[INFO 2016-08-31 17:20:21,507 networks.py:963] The output order is [cost_0]
I0831 17:20:21.523936 13974 Trainer.cpp:169] trainer mode: Normal
I0831 17:20:21.546594 13974 PyDataProvider2.cpp:219] loading dataprovider image_provider::processData
[INFO 2016-08-31 17:20:21,682 image_provider.py:52] Image size: 32
[INFO 2016-08-31 17:20:21,682 image_provider.py:53] Meta path: data/cifar-out/batches/batches.meta
[INFO 2016-08-31 17:20:21,682 image_provider.py:58] DataProvider Initialization finished
I0831 17:20:21.682675 13974 PyDataProvider2.cpp:219] loading dataprovider image_provider::processData
[INFO 2016-08-31 17:20:21,682 image_provider.py:52] Image size: 32
[INFO 2016-08-31 17:20:21,682 image_provider.py:53] Meta path: data/cifar-out/batches/batches.meta
[INFO 2016-08-31 17:20:21,682 image_provider.py:58] DataProvider Initialization finished
I0831 17:20:21.683006 13974 GradientMachine.cpp:134] Initing parameters..
I0831 17:20:22.312453 13974 GradientMachine.cpp:141] Init parameters done.
.........
I0831 17:20:52.894659 13974 TrainerInternal.cpp:162] Batch=100 samples=12800 AvgCost=2.35864 CurrentCost=2.35864 Eval: classification_error_evaluator=0.833906 CurrentEval: classification_error_evaluator=0.833906
.........
I0831 17:21:00.884374 13974 TrainerInternal.cpp:162] Batch=200 samples=25600 AvgCost=2.15774 CurrentCost=1.95684 Eval: classification_error_evaluator=0.792148 CurrentEval: classification_error_evaluator=0.750391
.........
I0831 17:21:08.731333 13974 TrainerInternal.cpp:162] Batch=300 samples=38400 AvgCost=2.01417 CurrentCost=1.72705 Eval: classification_error_evaluator=0.753672 CurrentEval: classification_error_evaluator=0.676719
.........I0831 17:21:15.873359 13974 TrainerInternal.cpp:179] Pass=0 Batch=391 samples=50048 AvgCost=1.90795 Eval: classification_error_evaluator=0.71814
F0831 17:21:18.497601 13974 hl_cuda_cudnn.cc:779] Check failed: CUDNN_STATUS_SUCCESS == cudnnStat (0 vs. 5) Cudnn Error: CUDNN_STATUS_INVALID_VALUE
*** Check failure stack trace: ***
@ 0x7f609f255daa (unknown)
@ 0x7f609f255ce4 (unknown)
@ 0x7f609f2556e6 (unknown)
@ 0x7f609f258687 (unknown)
@ 0x8a98d4 hl_convolution_forward()
@ 0x5c66fc paddle::CudnnConvLayer::forward()
@ 0x62305c paddle::NeuralNetwork::forward()
@ 0x6b54af paddle::Tester::testOneBatch()
@ 0x6b5dc2 paddle::Tester::testOnePeriod()
@ 0x69a28c paddle::Trainer::trainOnePass()
@ 0x69d687 paddle::Trainer::train()
@ 0x53b0b3 main
@ 0x7f609e461ec5 (unknown)
@ 0x546695 (unknown)
@ (nil) (unknown)

Switching the cuDNN version to 5.0.5 or 4.0.4 gives the same error.
Please help!
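
For anyone triaging a similar log, the fatal line can be pulled out and its numeric status decoded mechanically. The following is a minimal sketch, not part of PaddlePaddle: the helper name `find_cudnn_failure` and the regular expression are assumptions made for illustration, and the status table is assumed to match the `cudnnStatus_t` enum in the cuDNN v5-era `cudnn.h`.

```python
import re

# Assumed to mirror cudnnStatus_t from the cuDNN v5-era cudnn.h.
CUDNN_STATUS = {
    0: "CUDNN_STATUS_SUCCESS",
    1: "CUDNN_STATUS_NOT_INITIALIZED",
    2: "CUDNN_STATUS_ALLOC_FAILED",
    3: "CUDNN_STATUS_BAD_PARAM",
    4: "CUDNN_STATUS_INTERNAL_ERROR",
    5: "CUDNN_STATUS_INVALID_VALUE",
    6: "CUDNN_STATUS_ARCH_MISMATCH",
    7: "CUDNN_STATUS_MAPPING_ERROR",
    8: "CUDNN_STATUS_EXECUTION_FAILED",
    9: "CUDNN_STATUS_NOT_SUPPORTED",
    10: "CUDNN_STATUS_LICENSE_ERROR",
}

# glog fatal lines start with "F<mmdd> <time> <pid> <file>:<line>]".
CHECK_RE = re.compile(
    r"^F\d{4} [\d:.]+\s+\d+ (?P<file>[\w.]+):(?P<line>\d+)\] "
    r"Check failed: .*\((?P<expected>\d+) vs\. (?P<actual>\d+)\)"
)


def find_cudnn_failure(log_text):
    """Return source file, line, and decoded status of the first fatal check, or None."""
    for line in log_text.splitlines():
        match = CHECK_RE.match(line)
        if match:
            code = int(match.group("actual"))
            return {
                "file": match.group("file"),
                "line": int(match.group("line")),
                "status": CUDNN_STATUS.get(code, "unknown"),
            }
    return None
```

Applied to the log above, this reports the failed check at hl_cuda_cudnn.cc:779 and decodes status 5 to the same name the log prints.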

@reyoung
Collaborator

reyoung commented Aug 31, 2016

Please run the command paddle version to print the compile flags, and paste the output here. Thanks.

@gangliao
Contributor

Hi, can you post your GPU model? For instance, K40?

@quietsmile
Author

PaddlePaddle 0.8.0b, compiled with
with_avx: ON
with_gpu: ON
with_double: OFF
with_python: ON
with_rdma: OFF
with_glog: ON
with_gflags: ON
with_metric_learning:
with_timer: OFF
with_predict_sdk:

GPU: GTX Titan X, driver 352.39
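
When comparing builds across machines, the flag listing pasted above can be turned into a dictionary. This is a rough sketch only: `parse_paddle_version` is a hypothetical helper, not a PaddlePaddle API, and it assumes the output keeps the `with_<name>: ON|OFF` layout shown in this comment.

```python
def parse_paddle_version(text):
    """Map each with_* flag to True (ON), False (OFF), or None (empty or unrecognized)."""
    flags = {}
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        # Skip lines without a colon and lines that are not with_* flags.
        if not sep or not key.startswith("with_"):
            continue
        flags[key] = {"ON": True, "OFF": False}.get(value.strip())
    return flags
```

Flags printed without a value, such as with_metric_learning here, come back as None, so they are not mistaken for OFF.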

@qingqing01
Contributor

@quietsmile Hi, we saw no problem when testing on Tesla K20/K40 with CUDA 7.5 and cuDNN 5.1 or 4.0. However, we don't have a GTX Titan X environment and weren't able to replicate this problem. We will solve it later.

@wangjiangb

I have added a change list to fix it.

@qingqing01
Contributor

@quietsmile We have fixed this problem on a GTX 980; see 341486d.

@hedaoyuan
Contributor

Fixed in #107; closing this issue.
