Knowledge Distillation Methods with TensorFlow

Knowledge distillation is a method for improving a student network with knowledge from a teacher network. New knowledge distillation methods are proposed every year, but each paper runs its experiments on different networks and compares against different methods. Each method is also implemented separately by its own authors, so a new researcher who wants to study knowledge distillation has to find or implement all of the methods, which is hard work. To reduce this burden, I am publishing code adapted from my research code. I will keep updating the code and adding knowledge distillation algorithms, all implemented in TensorFlow.

If you would like a new method to be added, please let me know :)

Implemented Knowledge Distillation Methods

The methods below are implemented, and I group them into categories based on the insight from TAKD. I think these categories are meaningful, but if you see a problem with them, please let me know :)

Response-based Knowledge

Knowledge is defined by the neural response of a hidden layer or of the output layer of the network.
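As an illustration of output-layer response knowledge, the sketch below computes a soft-logits style loss: the teacher's and student's logits are softened with a temperature, and the student is penalized by the KL divergence from the teacher's softened distribution. This is a minimal, framework-free sketch in plain Python, not the repository's TensorFlow code; the function names and the default temperature of 4.0 are hypothetical choices.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (max-shifted for stability)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_logits_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across temperatures.
    The default temperature is an illustrative assumption.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return kl * temperature ** 2


print(soft_logits_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

In practice this term is usually added to the ordinary cross-entropy loss on the ground-truth labels, with a weight that is tuned per experiment.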

Multi-connection Knowledge

Knowledge is increased by sensing several points of the teacher network.
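One way to use several points of the teacher is to sum a matching loss over a list of (student, teacher) feature-map pairs. The sketch below does this with spatial attention maps, loosely modeled on attention transfer (AT) from the results table: each map is the channel-wise sum of squared activations, flattened and L2-normalized, so student and teacher may have different channel counts as long as their spatial sizes match. It is a plain-Python sketch under those assumptions; the squared-distance form and the function names are illustrative choices, not the repository's implementation.

```python
import math


def attention_map(feature):
    """Spatial attention map of a feature map shaped [height][width][channels].

    Sums squared activations over channels, flattens, and L2-normalizes.
    """
    flat = [sum(c * c for c in pixel) for row in feature for pixel in row]
    norm = math.sqrt(sum(a * a for a in flat)) or 1.0  # avoid dividing by zero
    return [a / norm for a in flat]


def multi_point_at_loss(student_feats, teacher_feats):
    """Sum of squared attention-map distances over every connection point.

    student_feats[k] and teacher_feats[k] must share the same spatial size.
    """
    loss = 0.0
    for fs, ft in zip(student_feats, teacher_feats):
        a_s, a_t = attention_map(fs), attention_map(ft)
        loss += sum((s - t) ** 2 for s, t in zip(a_s, a_t))
    return loss


# One connection point: a 2x2 student map with 1 channel vs. a teacher map with 3 channels.
student = [[[1.0], [0.0]], [[0.0], [0.0]]]
teacher = [[[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
print(multi_point_at_loss([student], [teacher]))
```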

Shared-representation Knowledge

Knowledge is defined by the relation between two feature maps.
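A concrete example of a relation between two feature maps is the FSP matrix used by the FSP method in the results table: for maps F1 of shape h×w×c1 and F2 of shape h×w×c2, the entry G[i][j] is the average over all spatial positions of F1[s][t][i]·F2[s][t][j]. The student is then trained to match the teacher's FSP matrices. The plain-Python sketch below assumes both maps in a pair share the same spatial size and that student and teacher pairs produce matrices of the same shape; the function names are hypothetical.

```python
def fsp_matrix(f1, f2):
    """FSP matrix between feature maps shaped [h][w][c1] and [h][w][c2].

    G[i][j] = (1 / (h * w)) * sum over positions (s, t) of f1[s][t][i] * f2[s][t][j]
    """
    h, w = len(f1), len(f1[0])
    c1, c2 = len(f1[0][0]), len(f2[0][0])
    g = [[0.0] * c2 for _ in range(c1)]
    for s in range(h):
        for t in range(w):
            for i in range(c1):
                for j in range(c2):
                    g[i][j] += f1[s][t][i] * f2[s][t][j]
    return [[v / (h * w) for v in row] for row in g]


def fsp_loss(student_pair, teacher_pair):
    """Squared L2 distance between student and teacher FSP matrices."""
    gs = fsp_matrix(*student_pair)
    gt = fsp_matrix(*teacher_pair)
    return sum((a - b) ** 2 for rs, rt in zip(gs, gt) for a, b in zip(rs, rt))


f1 = [[[1.0], [2.0]]]             # 1x2 map, 1 channel
f2 = [[[3.0, 0.0], [1.0, 1.0]]]   # 1x2 map, 2 channels
print(fsp_matrix(f1, f2))
```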

Relational Knowledge

Knowledge is defined by relations among the data samples within a batch.
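Relational knowledge can be illustrated with the distance-wise term of RKD, which appears in the results table: instead of matching individual outputs, the student matches the teacher's pairwise distances between samples, each divided by the mean distance in the batch, using a Huber loss. Because only distances are compared, student and teacher embeddings may have different dimensions. This plain-Python sketch averages over pairs and uses a Huber threshold of 1.0; both are illustrative assumptions, as are the function names.

```python
import itertools
import math


def normalized_distances(embeddings):
    """Pairwise Euclidean distances divided by their mean (needs at least 2 samples)."""
    pairs = itertools.combinations(range(len(embeddings)), 2)
    dists = [math.dist(embeddings[i], embeddings[j]) for i, j in pairs]
    mu = sum(dists) / len(dists) or 1.0  # guard against all-identical embeddings
    return [d / mu for d in dists]


def huber(x, delta=1.0):
    """Huber loss: quadratic near zero, linear beyond delta."""
    return 0.5 * x * x if abs(x) <= delta else delta * (abs(x) - 0.5 * delta)


def rkd_distance_loss(student_emb, teacher_emb):
    """Average Huber loss between normalized student and teacher pairwise distances."""
    ds = normalized_distances(student_emb)
    dt = normalized_distances(teacher_emb)
    return sum(huber(s - t) for s, t in zip(ds, dt)) / len(ds)


# The teacher's 3-D embeddings are a scaled copy of the student's 2-D geometry,
# so the normalized distances agree and the loss is (numerically) zero.
student = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
teacher = [[0.0, 0.0, 0.0], [1.5, 2.0, 0.0], [3.0, 4.0, 0.0]]
print(rkd_distance_loss(student, teacher))
```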

Experimental Results

The table and plots below show sample results using ResNet.

I use the same hyperparameters to train every network and tune only the hyperparameters of each distillation algorithm, so the results may not be optimal. All numerical values and plots are averages over five trials.

Network architecture

The teacher network is ResNet32 and the student is ResNet8. The student network is well converged (neither overfitting nor underfitting), so that the performance of each distillation algorithm can be evaluated precisely. Note that the implemented ResNet has doubled depth.

Training/Validation plots

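| Method | Last accuracy (%) | Best accuracy (%) |
|---|---|---|
| Student | 71.76 | 71.92 |
| Teacher | 78.96 | 79.08 |
| Soft-logits | 71.79 | 72.08 |
| FitNet | 72.74 | 72.96 |
| AT | 72.31 | 72.60 |
| FSP | 72.65 | 72.91 |
| DML | 73.27 | 73.47 |
| KD-SVD | 73.68 | 73.78 |
| AB | 73.08 | 73.41 |
| RKD | 73.40 | 73.48 |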
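To compare the methods against the baseline, the gain of each distillation method over the plain student can be computed directly from the accuracies in the table. The short Python snippet below copies those numbers and ranks the methods by improvement in last accuracy; the helper name is hypothetical and no new results are introduced.

```python
# Last / best accuracy (%) copied from the results table.
results = {
    "Student": (71.76, 71.92),
    "Teacher": (78.96, 79.08),
    "Soft-logits": (71.79, 72.08),
    "FitNet": (72.74, 72.96),
    "AT": (72.31, 72.60),
    "FSP": (72.65, 72.91),
    "DML": (73.27, 73.47),
    "KD-SVD": (73.68, 73.78),
    "AB": (73.08, 73.41),
    "RKD": (73.40, 73.48),
}


def gains_over_student(table):
    """Percentage-point gain of each distillation method over the plain student."""
    base_last, base_best = table["Student"]
    return {
        name: (round(last - base_last, 2), round(best - base_best, 2))
        for name, (last, best) in table.items()
        if name not in ("Student", "Teacher")
    }


gains = gains_over_student(results)
for name, (d_last, d_best) in sorted(gains.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:12s} last {d_last:+.2f}  best {d_best:+.2f}")
```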

Plan to do

  • Implement zero-shot knowledge distillation (already implemented in another repository; merging it here is in progress)