
The drop connect rate (aka survival rate) is incorrect #200

xhluca opened this issue Dec 20, 2020 · 9 comments

@xhluca

xhluca commented Dec 20, 2020

I originally posted this as an issue here: qubvel/efficientnet#135

However, I noticed that the two implementations are the same and that the error exists here as well, so I decided to post it here.


I just verified against the reference tf.keras implementation, and here are the results. Below is the output for B5.

This implementation's drop connect rate

(index, name, rate)

0 block1b_drop 0.9875
1 block1c_drop 0.975
2 block2b_drop 0.95
3 block2c_drop 0.9375
4 block2d_drop 0.925
5 block2e_drop 0.9125
6 block3b_drop 0.8875
7 block3c_drop 0.875
8 block3d_drop 0.8625
9 block3e_drop 0.85
10 block4b_drop 0.825
11 block4c_drop 0.8125
12 block4d_drop 0.8
13 block4e_drop 0.7875
14 block4f_drop 0.775
15 block4g_drop 0.7625
16 block5b_drop 0.7375
17 block5c_drop 0.725
18 block5d_drop 0.7124999999999999
19 block5e_drop 0.7
20 block5f_drop 0.6875
21 block5g_drop 0.675
22 block6b_drop 0.6499999999999999
23 block6c_drop 0.6375
24 block6d_drop 0.625
25 block6e_drop 0.6125
26 block6f_drop 0.6
27 block6g_drop 0.5874999999999999
28 block6h_drop 0.575
29 block6i_drop 0.5625
30 block7b_drop 0.5375
31 block7c_drop 0.5249999999999999
32 top_dropout 0.6

TensorFlow's drop connect rate

0 block1b_drop 0.9948717948717949
1 block1c_drop 0.9897435897435898
2 block2b_drop 0.9794871794871794
3 block2c_drop 0.9743589743589743
4 block2d_drop 0.9692307692307692
5 block2e_drop 0.9641025641025641
6 block3b_drop 0.9538461538461538
7 block3c_drop 0.9487179487179487
8 block3d_drop 0.9435897435897436
9 block3e_drop 0.9384615384615385
10 block4b_drop 0.9282051282051282
11 block4c_drop 0.9230769230769231
12 block4d_drop 0.9179487179487179
13 block4e_drop 0.9128205128205128
14 block4f_drop 0.9076923076923077
15 block4g_drop 0.9025641025641026
16 block5b_drop 0.8923076923076922
17 block5c_drop 0.8871794871794871
18 block5d_drop 0.882051282051282
19 block5e_drop 0.8769230769230769
20 block5f_drop 0.8717948717948718
21 block5g_drop 0.8666666666666667
22 block6b_drop 0.8564102564102564
23 block6c_drop 0.8512820512820513
24 block6d_drop 0.8461538461538461
25 block6e_drop 0.841025641025641
26 block6f_drop 0.8358974358974359
27 block6g_drop 0.8307692307692307
28 block6h_drop 0.8256410256410256
29 block6i_drop 0.8205128205128205
30 block7b_drop 0.8102564102564103
31 block7c_drop 0.8051282051282052
32 top_dropout 0.6

At index 18 (block5d_drop) it's off by a significant amount: 0.7125 vs. 0.8821.
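
For reference, here is a minimal sketch of how such a listing can be produced (an assumed reproduction script, not the exact one used; swap the constructor for whichever implementation is being checked):

    import tensorflow as tf

    # Assumed reproduction: build B5 without weights and list every
    # Dropout layer's survival rate (survival = 1 - dropout rate).
    model = tf.keras.applications.EfficientNetB5(weights=None)

    drop_layers = [l for l in model.layers
                   if isinstance(l, tf.keras.layers.Dropout)]
    for i, layer in enumerate(drop_layers):
        print(i, layer.name, 1 - layer.rate)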

@darcula1993

I checked the drop rate per block and it looks fine:

block1a_ 1.0
block1b_ 0.9875
block1c_ 0.975
block2a_ 0.9625
block2b_ 0.95
block2c_ 0.9375
block2d_ 0.925
block2e_ 0.9125
block3a_ 0.9
block3b_ 0.8875
block3c_ 0.875
block3d_ 0.8625
block3e_ 0.85
block4a_ 0.8375
block4b_ 0.825
block4c_ 0.8125
block4d_ 0.8
block4e_ 0.7875
block4f_ 0.775
block4g_ 0.7625
block5a_ 0.75
block5b_ 0.7375
block5c_ 0.725
block5d_ 0.7124999999999999
block5e_ 0.7
block5f_ 0.6875
block5g_ 0.675
block6a_ 0.6625
block6b_ 0.6499999999999999
block6c_ 0.6375
block6d_ 0.625
block6e_ 0.6125
block6f_ 0.6
block6g_ 0.5874999999999999
block6h_ 0.575
block6i_ 0.5625
block7a_ 0.55
block7b_ 0.5375
block7c_ 0.5249999999999999

@xhluca

xhluca commented Dec 22, 2020

@darcula1993 I'm confused. Shouldn't the block rate be at ~0.8 for the final block since the drop_connect_rate is 0.2 by default?

@xhluca

xhluca commented Dec 22, 2020

It turns out I pasted different values. However, the problem remains as indicated.

@darcula1993

        for j in range(round_repeats(args.pop('repeats'))):
            # The first block needs to take care of stride and filter size increase.
            if j > 0:
                args['strides'] = 1
                args['filters_in'] = args['filters_out']
            x = block(x, activation_fn, drop_connect_rate * b / blocks,
                      name='block{}{}_'.format(i + 1, chr(j + 97)), **args)
            b += 1

I checked the code, and it seems that b can be greater than the number of blocks; I'm not sure why.
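
A plausible explanation: the denominator blocks is computed from the unscaled repeats in blocks_args, while the loop above increments b once per depth-scaled block, so b / blocks overshoots whenever depth_coefficient > 1. Here is a sketch of the corresponding fix, mirroring what tf.keras does (an illustration, not the repository's actual patch):

    import math

    def round_repeats(repeats, depth_coefficient):
        # Scale a block's repeat count by the model's depth coefficient.
        return int(math.ceil(depth_coefficient * repeats))

    # Buggy: counts only the unscaled repeats (16 for B5).
    # blocks = float(sum(args['repeats'] for args in blocks_args))

    # Fixed: count the depth-scaled blocks, matching how often b
    # is incremented in the build loop.
    def total_blocks(blocks_args, depth_coefficient):
        return float(sum(round_repeats(args['repeats'], depth_coefficient)
                         for args in blocks_args))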

@xhluca

xhluca commented Dec 23, 2020

I've observed the same thing.

@fmbahrt

fmbahrt commented Jan 2, 2021

Qubvel's implementation does not calculate the total number of blocks correctly for configurations larger than B0.
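
The B5 tables above bear this out. With depth_coefficient = 2.2, the base repeats [1, 2, 2, 3, 3, 4, 1] sum to 16, but rounding each one up after scaling gives 39 blocks (block1a through block7c), and 1 - 0.2 * b / blocks then reproduces both tables for the last block:

    import math

    depth_coefficient = 2.2            # B5
    base_repeats = [1, 2, 2, 3, 3, 4, 1]

    unscaled = sum(base_repeats)       # 16, the buggy denominator
    scaled = sum(math.ceil(depth_coefficient * r) for r in base_repeats)  # 39

    b = 38  # block7c, the last of the 39 depth-scaled blocks
    print(1 - 0.2 * b / unscaled)      # 0.525      -> buggy table
    print(1 - 0.2 * b / scaled)        # 0.80512... -> tf.keras table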

@innat

innat commented Jan 3, 2021

In practice it does perform better than the official one. -_-

@xhluca

xhluca commented Jan 4, 2021

In practice it does perform better than the official one. -_-

I've only observed better performance in one case, so I'm not sure it generalizes. In that case, the improved performance does suggest that extremely low survival rates (<0.3) might be a good regularization approach.

@innat

innat commented Jan 5, 2021

Well, I'm not sure; maybe I need to look at it again properly. In fact, I spent almost a week assuming there was probably some problem with my data loader while using the official EfficientNet. But when I used the non-official implementation, it was just fine.
