Segmentation Fault with hybrid Overfeat-Alexnet architecture for large input images #1260
Comments
Probably running out of memory; try reducing the batch_size.
Sergio
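A minimal sketch of what that change could look like in the data layer of the prototxt posted below, assuming memory is indeed the limit; the value 10 is only an illustrative choice, not a recommendation from this thread:

data_param {
  source: "random_train_leveldb"
  batch_size: 10   # reduced from 50 to lower per-iteration memory use (illustrative value)
}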
I'm wondering if this configuration will help to improve accuracy, speed, or memory usage.
@sguada @sunbaigui
@andreesteva, follow-up questions belong on the caffe-users mailing list. I don't think there is any bug here, since it works with 315x315 images, so there is probably a problem in your prototxt definition. For instance, when you change the size of the images, do you change the mean_file accordingly? Otherwise you should use mean_values instead (#1070).
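For reference, a sketch of the mean_values alternative being suggested, assuming the mean_value field from #1070 is available in the build; the per-channel numbers 104/117/123 are the commonly used ImageNet BGR means and are only an example, not values from this issue:

transform_param {
  crop_size: 227
  mirror: true
  # per-channel means replace the size-dependent mean_file
  mean_value: 104
  mean_value: 117
  mean_value: 123
}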
@sguada On the old caffe version: [...] On the new caffe version, with cudnn: [...] From caffe version to version, I've encountered times when the examples haven't been fully updated (e.g. the argument syntax for creating leveldb files changed). Could it possibly be due to that? If helpful, I can upload a package to GitHub that would let you recreate the error.
Could you try the new caffe version without cudnn?
Sergio
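Disabling cuDNN is normally a build-time switch; a sketch assuming the standard Makefile.config option:

# In Makefile.config, comment out the cuDNN switch, then rebuild:
# USE_CUDNN := 1
make clean && make all -j8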
I'll try a version without cudnn. In the meantime, I've uploaded the code I used for the old caffe and the new caffe versions. It includes all the training and validation images, as well as the leveldb and lmdb files used. Copy either folder into $CAFFE_ROOT/examples/ and then run it to recreate the error I'm seeing.
I'm sorry, but I'm not planning to recreate your error; can you just post the log?
Sergio
Sure. Is this a file that is generated somewhere, or just the stdout output when you run caffe?
It is the output generated when you run caffe, but you can redirect it with:
Sergio
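A hedged guess at the kind of redirect meant here: Caffe logs through glog, so capturing stderr is the important part (the binary path and file names are illustrative):

GLOG_logtostderr=1 ./build/tools/caffe train --solver=solver.prototxt 2>&1 | tee train.log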
The segfault is not very informative; can you also try to use gdb to find where the segmentation fault is happening?
Sergio
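A sketch of the gdb workflow being suggested (binary path and solver name are illustrative):

gdb --args ./build/tools/caffe train --solver=solver.prototxt
(gdb) run
# ... wait for the segmentation fault ...
(gdb) bt    # print the backtrace at the point of the crash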
OK, I'll try that.
If you use bash, then put it in your .bashrc, so it is there all the time.
Sergio
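The exact line isn't quoted in the thread; assuming it refers to glog's stderr switch from the previous comment, it would be something like:

# ~/.bashrc (assumption: the variable being referred to is glog's logging switch)
export GLOG_logtostderr=1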
OK, I put up the log file. I'm trying now with gdb.
I also just put up the gdb file, which isn't very illuminating either. I hope you can make more sense of it than I can.
Given this warning, my guess is that your LMDB is empty. Could you check?
Sergio
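One quick way to check, assuming the lmdb command-line utilities are installed (the directory name is illustrative; it should match the source in the data layer):

mdb_stat random_train_lmdb    # "Entries:" should be non-zero for a populated LMDB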
I checked the file size and even tried increasing the number of images generated. The lmdb_folder/data.mdb files are substantial, and when I rerun gdb --args caffe ... I end up with exactly the same output.
Try with backend: LEVELDB instead.
Sergio
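A sketch of that change in the data layer, using the leveldb source already named in the prototxt below; backend is a field of data_param in Caffe builds of this era:

data_param {
  source: "random_train_leveldb"
  backend: LEVELDB    # switch from LMDB to the leveldb files generated earlier
  batch_size: 50
}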
The same issue persists.
I don't know; please post your question to the caffe-users mailing list.
That's alright. Thank you so much for your help regardless.
I am trying to implement a Krizhevsky (AlexNet) net with the last 3 fully connected layers converted to Overfeat-style 1x1 convolution layers. I can get it to run with input images from 256x256 up to 315x315. Ultimately I'd like the net to accept 400x640 images, but at the moment anything larger than 315x315 causes a segmentation fault.
Does anyone know what might be happening? Below are the solver.prototxt and train_val_segfault.prototxt that I'm using.
solver.prototxt:
net: "models/bvlc_reference_caffenet/train_val.prototxt"
net: "train_val_segfault.prototxt"
test_iter: 1000
test_interval: 1000
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 450000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "models/bvlc_reference_caffenet/caffenet_train"
snapshot_prefix: "randcaffenet"
solver_mode: GPU
train_val_segfault.prototxt:
name: "CaffeNet"
layers {
name: "data"
type: DATA
top: "data"
top: "label"
data_param {
source: "random_train_leveldb"
batch_size: 50
}
transform_param {
crop_size: 227
mean_file: "random_image_mean.binaryproto"
mirror: true
}
include: { phase: TRAIN }
}
layers {
name: "data"
type: DATA
top: "data"
top: "label"
data_param {
source: "random_val_leveldb"
batch_size: 50
}
transform_param {
crop_size: 227
mean_file: "random_image_mean.binaryproto"
mirror: false
}
include: { phase: TEST }
}
layers {
name: "conv1"
type: CONVOLUTION
bottom: "data"
top: "conv1"
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
name: "relu1"
type: RELU
bottom: "conv1"
top: "conv1"
}
layers {
name: "pool1"
type: POOLING
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layers {
name: "norm1"
type: LRN
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layers {
name: "conv2"
type: CONVOLUTION
bottom: "norm1"
top: "conv2"
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 1
}
}
}
layers {
name: "relu2"
type: RELU
bottom: "conv2"
top: "conv2"
}
layers {
name: "pool2"
type: POOLING
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layers {
name: "norm2"
type: LRN
bottom: "pool2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layers {
name: "conv3"
type: CONVOLUTION
bottom: "norm2"
top: "conv3"
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
name: "relu3"
type: RELU
bottom: "conv3"
top: "conv3"
}
layers {
name: "conv4"
type: CONVOLUTION
bottom: "conv3"
top: "conv4"
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 1
}
}
}
layers {
name: "relu4"
type: RELU
bottom: "conv4"
top: "conv4"
}
layers {
name: "conv5"
type: CONVOLUTION
bottom: "conv4"
top: "conv5"
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 1
}
}
}
layers {
name: "relu5"
type: RELU
bottom: "conv5"
top: "conv5"
}
layers {
name: "pool5"
type: POOLING
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layers {
name: "fc6-conv"
type: CONVOLUTION
bottom: "pool5"
top: "fc6-conv"
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 4096
kernel_size: 1
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layers {
name: "relu6"
type: RELU
bottom: "fc6-conv"
top: "fc6-conv"
}
layers {
name: "drop6"
type: DROPOUT
bottom: "fc6-conv"
top: "fc6-conv"
dropout_param {
dropout_ratio: 0.5
}
}
layers {
name: "fc7-conv"
type: CONVOLUTION
bottom: "fc6-conv"
top: "fc7-conv"
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 4096
kernel_size: 1
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layers {
name: "relu7"
type: RELU
bottom: "fc7-conv"
top: "fc7-conv"
}
layers {
name: "drop7"
type: DROPOUT
bottom: "fc7-conv"
top: "fc7-conv"
dropout_param {
dropout_ratio: 0.5
}
}
layers {
name: "fc8-conv"
type: CONVOLUTION
bottom: "fc7-conv"
top: "fc8-conv"
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 1000
kernel_size: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
name: "accuracy"
type: ACCURACY
bottom: "fc8-conv"
bottom: "label"
top: "accuracy"
include: { phase: TEST }
}
layers {
name: "loss"
type: SOFTMAX_LOSS
bottom: "fc8-conv"
bottom: "label"
top: "loss"
}