implemented resnet18 and resnet34 #16363

Closed
wants to merge 22 commits into from

zaccharieramzi

This should solve this issue: keras-team/keras-applications#151

Which has duplicates here:

I don't know how to test this, which is why I am opening it as a draft PR.
To keep this easy to review, I haven't implemented the V2 variants, and I haven't trained the networks, so no pretrained weights are available yet.

Note: this reopens #16358, which I messed up by using the wrong emails in the commits.

@gbaned gbaned requested a review from qlzh727 April 5, 2022 14:35
@google-ml-butler google-ml-butler bot added the keras-team-review-pending Pending review by a Keras team member. label Apr 5, 2022
@gbaned gbaned removed the keras-team-review-pending Pending review by a Keras team member. label Apr 5, 2022
@zaccharieramzi
Author

Adding the model summaries here for info:

Resnet18:

Model: "resnet18"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_1 (InputLayer)           [(None, 224, 224, 3  0           []                               
                                )]                                                                
                                                                                                  
 conv1_pad (ZeroPadding2D)      (None, 230, 230, 3)  0           ['input_1[0][0]']                
                                                                                                  
 conv1_conv (Conv2D)            (None, 112, 112, 64  9472        ['conv1_pad[0][0]']              
                                )                                                                 
                                                                                                  
 conv1_bn (BatchNormalization)  (None, 112, 112, 64  256         ['conv1_conv[0][0]']             
                                )                                                                 
                                                                                                  
 conv1_relu (Activation)        (None, 112, 112, 64  0           ['conv1_bn[0][0]']               
                                )                                                                 
                                                                                                  
 pool1_pad (ZeroPadding2D)      (None, 114, 114, 64  0           ['conv1_relu[0][0]']             
                                )                                                                 
                                                                                                  
 pool1_pool (MaxPooling2D)      (None, 56, 56, 64)   0           ['pool1_pad[0][0]']              
                                                                                                  
 conv2_block1_1_conv (Conv2D)   (None, 56, 56, 64)   36928       ['pool1_pool[0][0]']             
                                                                                                  
 conv2_block1_1_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block1_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block1_1_relu (Activatio  (None, 56, 56, 64)  0           ['conv2_block1_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv2_block1_0_conv (Conv2D)   (None, 56, 56, 64)   4160        ['pool1_pool[0][0]']             
                                                                                                  
 conv2_block1_2_conv (Conv2D)   (None, 56, 56, 64)   36928       ['conv2_block1_1_relu[0][0]']    
                                                                                                  
 conv2_block1_0_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block1_0_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block1_2_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block1_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block1_add (Add)         (None, 56, 56, 64)   0           ['conv2_block1_0_bn[0][0]',      
                                                                  'conv2_block1_2_bn[0][0]']      
                                                                                                  
 conv2_block1_out (Activation)  (None, 56, 56, 64)   0           ['conv2_block1_add[0][0]']       
                                                                                                  
 conv2_block2_1_conv (Conv2D)   (None, 56, 56, 64)   36928       ['conv2_block1_out[0][0]']       
                                                                                                  
 conv2_block2_1_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block2_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block2_1_relu (Activatio  (None, 56, 56, 64)  0           ['conv2_block2_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv2_block2_2_conv (Conv2D)   (None, 56, 56, 64)   36928       ['conv2_block2_1_relu[0][0]']    
                                                                                                  
 conv2_block2_2_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block2_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block2_add (Add)         (None, 56, 56, 64)   0           ['conv2_block1_out[0][0]',       
                                                                  'conv2_block2_2_bn[0][0]']      
                                                                                                  
 conv2_block2_out (Activation)  (None, 56, 56, 64)   0           ['conv2_block2_add[0][0]']       
                                                                                                  
 conv3_block1_1_conv (Conv2D)   (None, 28, 28, 128)  73856       ['conv2_block2_out[0][0]']       
                                                                                                  
 conv3_block1_1_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block1_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block1_1_relu (Activatio  (None, 28, 28, 128)  0          ['conv3_block1_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv3_block1_0_conv (Conv2D)   (None, 28, 28, 128)  8320        ['conv2_block2_out[0][0]']       
                                                                                                  
 conv3_block1_2_conv (Conv2D)   (None, 28, 28, 128)  147584      ['conv3_block1_1_relu[0][0]']    
                                                                                                  
 conv3_block1_0_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block1_0_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block1_2_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block1_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block1_add (Add)         (None, 28, 28, 128)  0           ['conv3_block1_0_bn[0][0]',      
                                                                  'conv3_block1_2_bn[0][0]']      
                                                                                                  
 conv3_block1_out (Activation)  (None, 28, 28, 128)  0           ['conv3_block1_add[0][0]']       
                                                                                                  
 conv3_block2_1_conv (Conv2D)   (None, 28, 28, 128)  147584      ['conv3_block1_out[0][0]']       
                                                                                                  
 conv3_block2_1_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block2_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block2_1_relu (Activatio  (None, 28, 28, 128)  0          ['conv3_block2_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv3_block2_2_conv (Conv2D)   (None, 28, 28, 128)  147584      ['conv3_block2_1_relu[0][0]']    
                                                                                                  
 conv3_block2_2_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block2_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block2_add (Add)         (None, 28, 28, 128)  0           ['conv3_block1_out[0][0]',       
                                                                  'conv3_block2_2_bn[0][0]']      
                                                                                                  
 conv3_block2_out (Activation)  (None, 28, 28, 128)  0           ['conv3_block2_add[0][0]']       
                                                                                                  
 conv4_block1_1_conv (Conv2D)   (None, 14, 14, 256)  295168      ['conv3_block2_out[0][0]']       
                                                                                                  
 conv4_block1_1_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block1_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block1_1_relu (Activatio  (None, 14, 14, 256)  0          ['conv4_block1_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv4_block1_0_conv (Conv2D)   (None, 14, 14, 256)  33024       ['conv3_block2_out[0][0]']       
                                                                                                  
 conv4_block1_2_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block1_1_relu[0][0]']    
                                                                                                  
 conv4_block1_0_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block1_0_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block1_2_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block1_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block1_add (Add)         (None, 14, 14, 256)  0           ['conv4_block1_0_bn[0][0]',      
                                                                  'conv4_block1_2_bn[0][0]']      
                                                                                                  
 conv4_block1_out (Activation)  (None, 14, 14, 256)  0           ['conv4_block1_add[0][0]']       
                                                                                                  
 conv4_block2_1_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block1_out[0][0]']       
                                                                                                  
 conv4_block2_1_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block2_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block2_1_relu (Activatio  (None, 14, 14, 256)  0          ['conv4_block2_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv4_block2_2_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block2_1_relu[0][0]']    
                                                                                                  
 conv4_block2_2_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block2_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block2_add (Add)         (None, 14, 14, 256)  0           ['conv4_block1_out[0][0]',       
                                                                  'conv4_block2_2_bn[0][0]']      
                                                                                                  
 conv4_block2_out (Activation)  (None, 14, 14, 256)  0           ['conv4_block2_add[0][0]']       
                                                                                                  
 conv5_block1_1_conv (Conv2D)   (None, 7, 7, 512)    1180160     ['conv4_block2_out[0][0]']       
                                                                                                  
 conv5_block1_1_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block1_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block1_1_relu (Activatio  (None, 7, 7, 512)   0           ['conv5_block1_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv5_block1_0_conv (Conv2D)   (None, 7, 7, 512)    131584      ['conv4_block2_out[0][0]']       
                                                                                                  
 conv5_block1_2_conv (Conv2D)   (None, 7, 7, 512)    2359808     ['conv5_block1_1_relu[0][0]']    
                                                                                                  
 conv5_block1_0_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block1_0_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block1_2_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block1_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block1_add (Add)         (None, 7, 7, 512)    0           ['conv5_block1_0_bn[0][0]',      
                                                                  'conv5_block1_2_bn[0][0]']      
                                                                                                  
 conv5_block1_out (Activation)  (None, 7, 7, 512)    0           ['conv5_block1_add[0][0]']       
                                                                                                  
 conv5_block2_1_conv (Conv2D)   (None, 7, 7, 512)    2359808     ['conv5_block1_out[0][0]']       
                                                                                                  
 conv5_block2_1_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block2_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block2_1_relu (Activatio  (None, 7, 7, 512)   0           ['conv5_block2_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv5_block2_2_conv (Conv2D)   (None, 7, 7, 512)    2359808     ['conv5_block2_1_relu[0][0]']    
                                                                                                  
 conv5_block2_2_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block2_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block2_add (Add)         (None, 7, 7, 512)    0           ['conv5_block1_out[0][0]',       
                                                                  'conv5_block2_2_bn[0][0]']      
                                                                                                  
 conv5_block2_out (Activation)  (None, 7, 7, 512)    0           ['conv5_block2_add[0][0]']       
                                                                                                  
 avg_pool (GlobalAveragePooling  (None, 512)         0           ['conv5_block2_out[0][0]']       
 2D)                                                                                              
                                                                                                  
 predictions (Dense)            (None, 1000)         513000      ['avg_pool[0][0]']               
                                                                                                  
==================================================================================================
Total params: 11,708,328
Trainable params: 11,698,600
Non-trainable params: 9,728
__________________________________________________________________________________________________
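
As a sanity check on the Resnet18 summary above, the parameter counts can be reproduced with plain arithmetic. This is only a sketch, not the code from this PR: the helper names (`conv_params`, `bn_params`, `basic_block`, `resnet_counts`) are made up for illustration. It assumes what the summary shows: every Conv2D carries a bias, every BatchNormalization holds four vectors per channel (two trainable, two non-trainable), and the first block of each stage has a 1x1 projection shortcut (`*_0_conv`, `*_0_bn`).

```python
def conv_params(kernel, c_in, c_out):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * c_in * c_out + c_out


def bn_params(channels):
    """gamma and beta (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * channels


def basic_block(c_in, c_out, projection):
    """Return (parameter count, total BatchNormalization channels) for one block."""
    total = conv_params(3, c_in, c_out) + bn_params(c_out)    # *_1_conv, *_1_bn
    total += conv_params(3, c_out, c_out) + bn_params(c_out)  # *_2_conv, *_2_bn
    bn_channels = 2 * c_out
    if projection:
        # 1x1 shortcut convolution and its batch norm: *_0_conv, *_0_bn
        total += conv_params(1, c_in, c_out) + bn_params(c_out)
        bn_channels += c_out
    return total, bn_channels


def resnet_counts(blocks_per_stage, classes=1000):
    """Return (total, trainable, non-trainable) parameter counts."""
    total = conv_params(7, 3, 64) + bn_params(64)  # conv1_conv, conv1_bn
    bn_channels = 64
    c_in = 64
    for stage, n_blocks in enumerate(blocks_per_stage):
        c_out = 64 * 2 ** stage  # 64, 128, 256, 512
        for i in range(n_blocks):
            block_total, block_bn = basic_block(c_in, c_out, projection=(i == 0))
            total += block_total
            bn_channels += block_bn
            c_in = c_out
    total += c_in * classes + classes  # predictions (Dense)
    non_trainable = 2 * bn_channels    # moving mean and variance only
    return total, total - non_trainable, non_trainable


print(resnet_counts([2, 2, 2, 2]))  # (11708328, 11698600, 9728)
```

The three numbers match the "Total params", "Trainable params" and "Non-trainable params" lines of the Resnet18 summary. The same function accepts other block layouts through `blocks_per_stage`.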

Resnet34:

Model: "resnet34"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_1 (InputLayer)           [(None, 224, 224, 3  0           []                               
                                )]                                                                
                                                                                                  
 conv1_pad (ZeroPadding2D)      (None, 230, 230, 3)  0           ['input_1[0][0]']                
                                                                                                  
 conv1_conv (Conv2D)            (None, 112, 112, 64  9472        ['conv1_pad[0][0]']              
                                )                                                                 
                                                                                                  
 conv1_bn (BatchNormalization)  (None, 112, 112, 64  256         ['conv1_conv[0][0]']             
                                )                                                                 
                                                                                                  
 conv1_relu (Activation)        (None, 112, 112, 64  0           ['conv1_bn[0][0]']               
                                )                                                                 
                                                                                                  
 pool1_pad (ZeroPadding2D)      (None, 114, 114, 64  0           ['conv1_relu[0][0]']             
                                )                                                                 
                                                                                                  
 pool1_pool (MaxPooling2D)      (None, 56, 56, 64)   0           ['pool1_pad[0][0]']              
                                                                                                  
 conv2_block1_1_conv (Conv2D)   (None, 56, 56, 64)   36928       ['pool1_pool[0][0]']             
                                                                                                  
 conv2_block1_1_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block1_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block1_1_relu (Activatio  (None, 56, 56, 64)  0           ['conv2_block1_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv2_block1_0_conv (Conv2D)   (None, 56, 56, 64)   4160        ['pool1_pool[0][0]']             
                                                                                                  
 conv2_block1_2_conv (Conv2D)   (None, 56, 56, 64)   36928       ['conv2_block1_1_relu[0][0]']    
                                                                                                  
 conv2_block1_0_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block1_0_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block1_2_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block1_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block1_add (Add)         (None, 56, 56, 64)   0           ['conv2_block1_0_bn[0][0]',      
                                                                  'conv2_block1_2_bn[0][0]']      
                                                                                                  
 conv2_block1_out (Activation)  (None, 56, 56, 64)   0           ['conv2_block1_add[0][0]']       
                                                                                                  
 conv2_block2_1_conv (Conv2D)   (None, 56, 56, 64)   36928       ['conv2_block1_out[0][0]']       
                                                                                                  
 conv2_block2_1_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block2_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block2_1_relu (Activatio  (None, 56, 56, 64)  0           ['conv2_block2_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv2_block2_2_conv (Conv2D)   (None, 56, 56, 64)   36928       ['conv2_block2_1_relu[0][0]']    
                                                                                                  
 conv2_block2_2_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block2_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block2_add (Add)         (None, 56, 56, 64)   0           ['conv2_block1_out[0][0]',       
                                                                  'conv2_block2_2_bn[0][0]']      
                                                                                                  
 conv2_block2_out (Activation)  (None, 56, 56, 64)   0           ['conv2_block2_add[0][0]']       
                                                                                                  
 conv2_block3_1_conv (Conv2D)   (None, 56, 56, 64)   36928       ['conv2_block2_out[0][0]']       
                                                                                                  
 conv2_block3_1_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block3_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block3_1_relu (Activatio  (None, 56, 56, 64)  0           ['conv2_block3_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv2_block3_2_conv (Conv2D)   (None, 56, 56, 64)   36928       ['conv2_block3_1_relu[0][0]']    
                                                                                                  
 conv2_block3_2_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block3_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block3_add (Add)         (None, 56, 56, 64)   0           ['conv2_block2_out[0][0]',       
                                                                  'conv2_block3_2_bn[0][0]']      
                                                                                                  
 conv2_block3_out (Activation)  (None, 56, 56, 64)   0           ['conv2_block3_add[0][0]']       
                                                                                                  
 conv3_block1_1_conv (Conv2D)   (None, 28, 28, 128)  73856       ['conv2_block3_out[0][0]']       
                                                                                                  
 conv3_block1_1_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block1_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block1_1_relu (Activatio  (None, 28, 28, 128)  0          ['conv3_block1_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv3_block1_0_conv (Conv2D)   (None, 28, 28, 128)  8320        ['conv2_block3_out[0][0]']       
                                                                                                  
 conv3_block1_2_conv (Conv2D)   (None, 28, 28, 128)  147584      ['conv3_block1_1_relu[0][0]']    
                                                                                                  
 conv3_block1_0_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block1_0_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block1_2_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block1_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block1_add (Add)         (None, 28, 28, 128)  0           ['conv3_block1_0_bn[0][0]',      
                                                                  'conv3_block1_2_bn[0][0]']      
                                                                                                  
 conv3_block1_out (Activation)  (None, 28, 28, 128)  0           ['conv3_block1_add[0][0]']       
                                                                                                  
 conv3_block2_1_conv (Conv2D)   (None, 28, 28, 128)  147584      ['conv3_block1_out[0][0]']       
                                                                                                  
 conv3_block2_1_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block2_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block2_1_relu (Activatio  (None, 28, 28, 128)  0          ['conv3_block2_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv3_block2_2_conv (Conv2D)   (None, 28, 28, 128)  147584      ['conv3_block2_1_relu[0][0]']    
                                                                                                  
 conv3_block2_2_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block2_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block2_add (Add)         (None, 28, 28, 128)  0           ['conv3_block1_out[0][0]',       
                                                                  'conv3_block2_2_bn[0][0]']      
                                                                                                  
 conv3_block2_out (Activation)  (None, 28, 28, 128)  0           ['conv3_block2_add[0][0]']       
                                                                                                  
 conv3_block3_1_conv (Conv2D)   (None, 28, 28, 128)  147584      ['conv3_block2_out[0][0]']       
                                                                                                  
 conv3_block3_1_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block3_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block3_1_relu (Activatio  (None, 28, 28, 128)  0          ['conv3_block3_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv3_block3_2_conv (Conv2D)   (None, 28, 28, 128)  147584      ['conv3_block3_1_relu[0][0]']    
                                                                                                  
 conv3_block3_2_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block3_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block3_add (Add)         (None, 28, 28, 128)  0           ['conv3_block2_out[0][0]',       
                                                                  'conv3_block3_2_bn[0][0]']      
                                                                                                  
 conv3_block3_out (Activation)  (None, 28, 28, 128)  0           ['conv3_block3_add[0][0]']       
                                                                                                  
 conv3_block4_1_conv (Conv2D)   (None, 28, 28, 128)  147584      ['conv3_block3_out[0][0]']       
                                                                                                  
 conv3_block4_1_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block4_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block4_1_relu (Activatio  (None, 28, 28, 128)  0          ['conv3_block4_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv3_block4_2_conv (Conv2D)   (None, 28, 28, 128)  147584      ['conv3_block4_1_relu[0][0]']    
                                                                                                  
 conv3_block4_2_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block4_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block4_add (Add)         (None, 28, 28, 128)  0           ['conv3_block3_out[0][0]',       
                                                                  'conv3_block4_2_bn[0][0]']      
                                                                                                  
 conv3_block4_out (Activation)  (None, 28, 28, 128)  0           ['conv3_block4_add[0][0]']       
                                                                                                  
 conv4_block1_1_conv (Conv2D)   (None, 14, 14, 256)  295168      ['conv3_block4_out[0][0]']       
                                                                                                  
 conv4_block1_1_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block1_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block1_1_relu (Activatio  (None, 14, 14, 256)  0          ['conv4_block1_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv4_block1_0_conv (Conv2D)   (None, 14, 14, 256)  33024       ['conv3_block4_out[0][0]']       
                                                                                                  
 conv4_block1_2_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block1_1_relu[0][0]']    
                                                                                                  
 conv4_block1_0_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block1_0_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block1_2_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block1_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block1_add (Add)         (None, 14, 14, 256)  0           ['conv4_block1_0_bn[0][0]',      
                                                                  'conv4_block1_2_bn[0][0]']      
                                                                                                  
 conv4_block1_out (Activation)  (None, 14, 14, 256)  0           ['conv4_block1_add[0][0]']       
                                                                                                  
 conv4_block2_1_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block1_out[0][0]']       
                                                                                                  
 conv4_block2_1_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block2_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block2_1_relu (Activatio  (None, 14, 14, 256)  0          ['conv4_block2_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv4_block2_2_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block2_1_relu[0][0]']    
                                                                                                  
 conv4_block2_2_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block2_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block2_add (Add)         (None, 14, 14, 256)  0           ['conv4_block1_out[0][0]',       
                                                                  'conv4_block2_2_bn[0][0]']      
                                                                                                  
 conv4_block2_out (Activation)  (None, 14, 14, 256)  0           ['conv4_block2_add[0][0]']       
                                                                                                  
 conv4_block3_1_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block2_out[0][0]']       
                                                                                                  
 conv4_block3_1_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block3_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block3_1_relu (Activatio  (None, 14, 14, 256)  0          ['conv4_block3_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv4_block3_2_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block3_1_relu[0][0]']    
                                                                                                  
 conv4_block3_2_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block3_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block3_add (Add)         (None, 14, 14, 256)  0           ['conv4_block2_out[0][0]',       
                                                                  'conv4_block3_2_bn[0][0]']      
                                                                                                  
 conv4_block3_out (Activation)  (None, 14, 14, 256)  0           ['conv4_block3_add[0][0]']       
                                                                                                  
 conv4_block4_1_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block3_out[0][0]']       
                                                                                                  
 conv4_block4_1_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block4_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block4_1_relu (Activatio  (None, 14, 14, 256)  0          ['conv4_block4_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv4_block4_2_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block4_1_relu[0][0]']    
                                                                                                  
 conv4_block4_2_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block4_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block4_add (Add)         (None, 14, 14, 256)  0           ['conv4_block3_out[0][0]',       
                                                                  'conv4_block4_2_bn[0][0]']      
                                                                                                  
 conv4_block4_out (Activation)  (None, 14, 14, 256)  0           ['conv4_block4_add[0][0]']       
                                                                                                  
 conv4_block5_1_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block4_out[0][0]']       
                                                                                                  
 conv4_block5_1_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block5_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block5_1_relu (Activatio  (None, 14, 14, 256)  0          ['conv4_block5_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv4_block5_2_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block5_1_relu[0][0]']    
                                                                                                  
 conv4_block5_2_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block5_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block5_add (Add)         (None, 14, 14, 256)  0           ['conv4_block4_out[0][0]',       
                                                                  'conv4_block5_2_bn[0][0]']      
                                                                                                  
 conv4_block5_out (Activation)  (None, 14, 14, 256)  0           ['conv4_block5_add[0][0]']       
                                                                                                  
 conv4_block6_1_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block5_out[0][0]']       
                                                                                                  
 conv4_block6_1_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block6_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block6_1_relu (Activatio  (None, 14, 14, 256)  0          ['conv4_block6_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv4_block6_2_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block6_1_relu[0][0]']    
                                                                                                  
 conv4_block6_2_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block6_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block6_add (Add)         (None, 14, 14, 256)  0           ['conv4_block5_out[0][0]',       
                                                                  'conv4_block6_2_bn[0][0]']      
                                                                                                  
 conv4_block6_out (Activation)  (None, 14, 14, 256)  0           ['conv4_block6_add[0][0]']       
                                                                                                  
 conv5_block1_1_conv (Conv2D)   (None, 7, 7, 512)    1180160     ['conv4_block6_out[0][0]']       
                                                                                                  
 conv5_block1_1_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block1_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block1_1_relu (Activatio  (None, 7, 7, 512)   0           ['conv5_block1_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv5_block1_0_conv (Conv2D)   (None, 7, 7, 512)    131584      ['conv4_block6_out[0][0]']       
                                                                                                  
 conv5_block1_2_conv (Conv2D)   (None, 7, 7, 512)    2359808     ['conv5_block1_1_relu[0][0]']    
                                                                                                  
 conv5_block1_0_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block1_0_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block1_2_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block1_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block1_add (Add)         (None, 7, 7, 512)    0           ['conv5_block1_0_bn[0][0]',      
                                                                  'conv5_block1_2_bn[0][0]']      
                                                                                                  
 conv5_block1_out (Activation)  (None, 7, 7, 512)    0           ['conv5_block1_add[0][0]']       
                                                                                                  
 conv5_block2_1_conv (Conv2D)   (None, 7, 7, 512)    2359808     ['conv5_block1_out[0][0]']       
                                                                                                  
 conv5_block2_1_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block2_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block2_1_relu (Activatio  (None, 7, 7, 512)   0           ['conv5_block2_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv5_block2_2_conv (Conv2D)   (None, 7, 7, 512)    2359808     ['conv5_block2_1_relu[0][0]']    
                                                                                                  
 conv5_block2_2_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block2_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block2_add (Add)         (None, 7, 7, 512)    0           ['conv5_block1_out[0][0]',       
                                                                  'conv5_block2_2_bn[0][0]']      
                                                                                                  
 conv5_block2_out (Activation)  (None, 7, 7, 512)    0           ['conv5_block2_add[0][0]']       
                                                                                                  
 conv5_block3_1_conv (Conv2D)   (None, 7, 7, 512)    2359808     ['conv5_block2_out[0][0]']       
                                                                                                  
 conv5_block3_1_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block3_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block3_1_relu (Activatio  (None, 7, 7, 512)   0           ['conv5_block3_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv5_block3_2_conv (Conv2D)   (None, 7, 7, 512)    2359808     ['conv5_block3_1_relu[0][0]']    
                                                                                                  
 conv5_block3_2_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block3_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block3_add (Add)         (None, 7, 7, 512)    0           ['conv5_block2_out[0][0]',       
                                                                  'conv5_block3_2_bn[0][0]']      
                                                                                                  
 conv5_block3_out (Activation)  (None, 7, 7, 512)    0           ['conv5_block3_add[0][0]']       
                                                                                                  
 avg_pool (GlobalAveragePooling  (None, 512)         0           ['conv5_block3_out[0][0]']       
 2D)                                                                                              
                                                                                                  
 predictions (Dense)            (None, 1000)         513000      ['avg_pool[0][0]']               
                                                                                                  
==================================================================================================
Total params: 21,827,624
Trainable params: 21,810,472
Non-trainable params: 17,152
__________________________________________________________________________________________________

It turns out that these parameter counts do not match PyTorch's numbers, which is something I do not understand.
For info, the same happens for ResNet50 (already implemented), and you can see that in the following colab: https://colab.research.google.com/drive/1RCmWkpwuKFapzzPacbqodxz0mqt9Igft?usp=sharing

This appears to be because TF's convs have biases while PyTorch's do not, and also because of how PyTorch counts BN's params.

However, the last dimension before the dense layer matches, and the size (WH) of the feature maps matches as well.
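As a sanity check on where the gap comes from, here is a minimal sketch in plain Python that reproduces a few of the per-layer counts from the summary above. The helper names (`conv2d_params`, `batchnorm_params`) are made up for illustration. The assumption is that the Keras summary counts four values per channel for a BatchNormalization layer (gamma, beta, moving mean, moving variance), while a count of learnable parameters only covers gamma and beta.

```python
def conv2d_params(kernel_size, in_channels, out_channels, use_bias=True):
    """Number of values in a square 2D convolution (kernel plus optional bias)."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if use_bias else 0)


def batchnorm_params(channels, count_running_stats=True):
    """gamma and beta per channel, plus moving mean/variance if they are counted."""
    return channels * (4 if count_running_stats else 2)


# conv3_block2_1_conv: 3x3 kernel, 128 -> 128 channels
with_bias = conv2d_params(3, 128, 128, use_bias=True)
without_bias = conv2d_params(3, 128, 128, use_bias=False)

# conv3_block2_1_bn: 128 channels
bn_with_stats = batchnorm_params(128, count_running_stats=True)
bn_learnable_only = batchnorm_params(128, count_running_stats=False)

print(with_bias, without_bias)           # 147584 147456
print(bn_with_stats, bn_learnable_only)  # 512 256
```

The first number on each line is the value shown in the summary; the second is what the same layer would contribute without a conv bias, or with only the learnable BN values counted.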

@zaccharieramzi
Author

zaccharieramzi commented Apr 6, 2022

So 2 things w.r.t. the comparison with PyTorch:

  • indeed the only difference in the trainable parameter count is the use of bias in Keras. Imo, there shouldn't be any bias in the convolutions given we have affine BatchNorm just afterwards. An option to enable or disable the bias would be nice; I am going to implement it.
  • the batch norm in PyTorch indeed doesn't count the running stats as parameters but as buffers.

Side note: the default momentum values for the batch norm in Keras and PyTorch are not the same: 0.9 for PyTorch and 0.99 for Keras (both expressed in Keras's convention; PyTorch's own momentum argument defaults to 0.1). This, coupled with the use of bias in TF, means that training will differ between the 2 frameworks.

I think it would be nice to implement the possibility to change the batch norm momentum to fit PyTorch's one, I am going to open a new issue and a new PR about this.
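To make the momentum difference concrete, here is a small sketch of the two running-statistic update rules in plain Python. The function names are hypothetical; the update formulas are the ones documented for Keras's BatchNormalization (default momentum 0.99) and PyTorch's BatchNorm2d (default momentum 0.1), which weight the old and new values in opposite ways.

```python
def keras_style_update(moving, batch_stat, momentum=0.99):
    """Keras convention: momentum is the weight kept on the old moving value."""
    return moving * momentum + batch_stat * (1.0 - momentum)


def pytorch_style_update(running, batch_stat, momentum=0.1):
    """PyTorch convention: momentum is the weight given to the new batch value."""
    return (1.0 - momentum) * running + momentum * batch_stat


# One update step starting from 0.0 with a batch statistic of 1.0
keras_default = keras_style_update(0.0, 1.0)
pytorch_default = pytorch_style_update(0.0, 1.0)

# Setting the Keras momentum to 0.9 reproduces the PyTorch default
keras_matched = keras_style_update(0.0, 1.0, momentum=0.9)

print(keras_default, pytorch_default, keras_matched)
```

With the defaults, the Keras running statistics move about ten times more slowly per step than the PyTorch ones, which is why a configurable momentum would help when trying to match a PyTorch training run.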

@zaccharieramzi zaccharieramzi marked this pull request as ready for review April 6, 2022 16:18
@qlzh727
Member

qlzh727 commented Apr 6, 2022

Thanks for the PR. Could you make sure the weights for ImageNet are also available? Also, please make sure to run the evaluation on the ImageNet eval set, and report the accuracy number in the PR.

@zaccharieramzi
Author

@qlzh727 should I train the models also for the no bias case?

Also, could you point me to the scripts that were used to train the bigger models? I couldn't find them, but maybe I didn't look well enough.

@gbaned gbaned requested review from qlzh727 and removed request for qlzh727 April 7, 2022 11:09
@google-ml-butler google-ml-butler bot added the keras-team-review-pending Pending review by a Keras team member. label Apr 7, 2022
@zaccharieramzi
Author

@qlzh727 I was looking for an official script to train a classification model on imagenet, and stumbled upon this: https://github.com/tensorflow/models

There is an example there for training classification models, but I also noticed that there is already an implementation of ResNet without the bias and with the basic blocks here. I don't think the weights are available, but now my question is more: should we re-implement it here given it's already present in this other repo?

Basically, is there a difference in concern between keras applications and tensorflow models?

@zaccharieramzi
Author

zaccharieramzi commented Apr 7, 2022

I just noticed that one additional difference with the PyTorch implementation (in both keras applications and tensorflow models) is the initialization strategy for the convolution weights.

| Framework | Init strategy |
|---|---|
| PyTorch | He normal, nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu") |
| Keras | Glorot uniform, default of Conv2D |
| TensorFlow | Variance Scaling, at least by default |
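For a sense of scale, the sketch below compares the spread of the first two schemes for a single 3x3 convolution with 128 input and 128 output channels, in plain Python. The helper names are hypothetical. It relies on the standard definitions: He normal with fan_out uses a standard deviation of sqrt(2 / fan_out), and Glorot uniform samples from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)), where the fans include the kernel's receptive field.

```python
import math


def conv_fans(kernel_size, in_channels, out_channels):
    """fan_in and fan_out of a square conv kernel, including the receptive field."""
    receptive_field = kernel_size * kernel_size
    return receptive_field * in_channels, receptive_field * out_channels


def he_normal_fan_out_std(kernel_size, in_channels, out_channels):
    """Std of He normal init in fan_out mode with a ReLU gain."""
    _, fan_out = conv_fans(kernel_size, in_channels, out_channels)
    return math.sqrt(2.0 / fan_out)


def glorot_uniform_std(kernel_size, in_channels, out_channels):
    """Std of a uniform distribution on [-limit, limit] with the Glorot limit."""
    fan_in, fan_out = conv_fans(kernel_size, in_channels, out_channels)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return limit / math.sqrt(3.0)


he_std = he_normal_fan_out_std(3, 128, 128)
glorot_std = glorot_uniform_std(3, 128, 128)
print(he_std, glorot_std, he_std / glorot_std)
```

When fan_in equals fan_out, as in this layer, the He normal weights start out with a standard deviation sqrt(2) times larger than the Glorot uniform ones.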

@qlzh727
Member

qlzh727 commented Apr 7, 2022

@qlzh727 should I train the models also for the no bias case?

Also, could you point me to the scripts that were used to train the bigger models? I couldn't find them, but maybe I didn't look well enough.

We currently don't have any script for retraining the models. Keras Applications is used for fine-tuning, and we usually reuse the weights/checkpoints from the original paper (if they were published).

@qlzh727
Member

qlzh727 commented Apr 7, 2022

@qlzh727 I was looking for an official script to train a classification model on imagenet, and stumbled upon this: https://github.com/tensorflow/models

There is an example there for training classification models, but I also noticed that there is already an implementation of ResNet without the bias and with the basic blocks here. I don't think the weights are available, but now my question is more: should we re-implement it here given it's already present in this other repo?

Basically, is there a difference in concern between keras applications and tensorflow models?

tensorflow-models is more focused on end-to-end solutions, and if that's already available in tf-models, we can probably skip it here in keras.applications (given that you can't get any existing weights).

@divyashreepathihalli divyashreepathihalli removed the keras-team-review-pending Pending review by a Keras team member. label Apr 7, 2022
@zaccharieramzi
Author

zaccharieramzi commented Apr 7, 2022

Well, the original paper did train both resnet 18 and 34, but I'm not sure in which framework or even whether the weights are available.
Do you know where you obtained the resnet 50 weights?

Another solution would be to translate the ones from PyTorch, potentially forcing the bias to 0 for the original implementations with bias. Wdyt?
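A minimal sketch of the layout side of such a port, in plain Python with nested lists to keep it dependency-free. It assumes the standard layouts: a PyTorch Conv2d weight is indexed (out_channels, in_channels, kH, kW) and a Keras Conv2D kernel is indexed (kH, kW, in_channels, out_channels). The names `torch_conv_to_keras` and `zero_bias` are hypothetical, and this is only the reordering step, not a full porting script.

```python
def torch_conv_to_keras(weight):
    """Reorder a conv kernel from [out][in][kh][kw] to [kh][kw][in][out]."""
    out_ch = len(weight)
    in_ch = len(weight[0])
    kh = len(weight[0][0])
    kw = len(weight[0][0][0])
    return [
        [
            [[weight[o][i][h][w] for o in range(out_ch)] for i in range(in_ch)]
            for w in range(kw)
        ]
        for h in range(kh)
    ]


def zero_bias(out_channels):
    """Bias vector of zeros for a Keras conv that was built with a bias term."""
    return [0.0] * out_channels


# Tiny kernel with out_channels=2, in_channels=1, kh=1, kw=2
torch_weight = [[[[1.0, 2.0]]], [[[3.0, 4.0]]]]
keras_kernel = torch_conv_to_keras(torch_weight)
keras_bias = zero_bias(len(torch_weight))

print(keras_kernel)  # [[[[1.0, 3.0]], [[2.0, 4.0]]]]
print(keras_bias)    # [0.0, 0.0]
```

With NumPy arrays the same reordering is `np.transpose(w, (2, 3, 1, 0))`; the explicit loops here are only meant to show which axis goes where.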

EDIT

One last thing is that if we do not include the resnet 18 and 34 here, it might still be nice to have a pointer to tensorflow/models, in order for people looking for an implementation to find it easily (this is not the case rn, see keras-team/keras-applications#151)

@qlzh727 qlzh727 requested a review from fchollet May 10, 2022 20:30
@google-ml-butler google-ml-butler bot added the keras-team-review-pending Pending review by a Keras team member. label May 10, 2022
@zaccharieramzi
Author

@qlzh727 Indeed since I am porting from PyTorch I needed to use their preprocessing.
I was not able to find the weights of the resnet34 in caffe, and the resnet18 weights appear to be only available here.

Here are my tentative answers:

  • Since anyway we wanted to retrain the models (cf this comment), it's only going to be a temporary issue. We can simply document it well, in particular in the model and preprocessing docs. There could by the way be a tf.keras.applications.resnet18.preprocess_input similarly to what exists for resnet50.
  • In the current state we could do the correction of preprocessing in the model, before retraining.

If however, you have at your disposal the caffe weights for both models (and by any chance the script to port them), I can definitely do the porting, and checks.
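For reference, the two preprocessing conventions being discussed can be sketched per pixel in plain Python. The function names are hypothetical; the constants are the ones used by the "torch" and "caffe" modes of Keras's ImageNet preprocess_input, with the existing resnet50 preprocessing being the caffe-style one. Inputs are assumed to be RGB values in [0, 255].

```python
TORCH_MEAN = (0.485, 0.456, 0.406)           # RGB order
TORCH_STD = (0.229, 0.224, 0.225)            # RGB order
CAFFE_MEAN_BGR = (103.939, 116.779, 123.68)  # BGR order


def preprocess_torch(rgb_pixel):
    """Scale to [0, 1], then normalize each RGB channel by mean and std."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb_pixel, TORCH_MEAN, TORCH_STD)
    )


def preprocess_caffe(rgb_pixel):
    """Reorder RGB to BGR, then subtract the per-channel mean (no scaling)."""
    bgr_pixel = rgb_pixel[::-1]
    return tuple(value - mean for value, mean in zip(bgr_pixel, CAFFE_MEAN_BGR))


pixel = (255.0, 0.0, 128.0)
print(preprocess_torch(pixel))
print(preprocess_caffe(pixel))
```

The two outputs differ in channel order, scale and offset, so weights ported from PyTorch need the torch-style preprocessing to give sensible predictions.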

@qlzh727 qlzh727 removed the keras-team-review-pending Pending review by a Keras team member. label May 12, 2022
@zaccharieramzi
Author

I just found out something about the way torch applies batch norm at eval time that might explain the difference in accuracy I noticed here.

You can read about it here.

@KaleabTessera

Any progress on this? This would be really great to have!

@gbaned
Collaborator

gbaned commented Jul 6, 2022

@zaccharieramzi Can you please resolve conflicts? Thank you!

@zaccharieramzi
Author

@gbaned should be done

@gbaned gbaned requested review from qlzh727 and removed request for qlzh727 August 5, 2022 07:39
@google-ml-butler google-ml-butler bot added the keras-team-review-pending Pending review by a Keras team member. label Aug 5, 2022
@qlzh727
Member

qlzh727 commented Aug 8, 2022

Sorry for the long wait. Since end users could easily miss the preprocessing API with the PyTorch format, how about we include the preprocessing as part of the model, and control it via an include_preprocessing flag on the model? We have taken this approach for several other models in the applications.

@LukeWood
Contributor

Sorry for the long wait. Since end users could easily miss the preprocessing API with the PyTorch format, how about we include the preprocessing as part of the model, and control it via an include_preprocessing flag on the model? We have taken this approach for several other models in the applications.

Because ResNet18/34 require a different input preprocessing from the other ResNets, we would probably need to retrain these weights. Let's migrate this to a PR on keras-cv. Please send a pull request to KerasCV, and place the model in the models package:

https://github.com/keras-team/keras-cv/tree/master/keras_cv/models

From there, we can retrain the models.

@zaccharieramzi
Author

@LukeWood sure, opening this PR keras-team/keras-cv#805

@fchollet
Member

@LukeWood sure, opening this PR keras-team/keras-cv#805

Thank you. Let's move the discussion to the KerasCV PR.

@fchollet fchollet closed this Sep 22, 2022
@zaccharieramzi
Author

Just mentioning for those following the conversation that the corresponding PR in keras-cv has been merged.
