
probably mistaken implementation of RKD method #17

Open
wisdom0530 opened this issue Nov 13, 2019 · 3 comments

Comments

@wisdom0530

Hi!
I am very grateful for your code; it helps me a lot.
I have two questions:
(1) In the RKD paper, the authors say in the image classification section: "RKD-D and RKD-A are applied on the last pooling layer of the teacher and the student." However, in the code you provide, the RKD method is applied to the logits. I think it may be a mistake.
(2) What is the TensorFlow version used in this code? Could you please add a requirements file to this project?

@sseung0703
Owner

sseung0703 commented Nov 13, 2019

Thank you for your comments.
(1) I checked the paper again and found what you pointed out. You're right; it is my mistake. I was confused by this line on page 5: "We apply RKD-D and RKD-A on the final embedding outputs of the teacher and the student." I'll correct this error and update the experimental results.
(2) I used TF 1.13 for all the experiments. Your suggestion is useful for my repo; I'll add a requirements file soon.


@sseung0703
Owner

I checked the paper carefully, and I found that

"RKD-D and RKD-A are applied on the last pooling layer of the teacher and the student, as they produce the final embedding before classification."
and
"As the prototypical networks build on shallow networks that consist of only 4 convolutional layers, we use the same architecture for the student model and the teacher, i.e., self-distillation, rather than using a smaller student network. We apply RKD, FitNet, and Attention on the final embedding output of the teacher and the student."

It is very interesting: the only difference is the amount of data, yet it makes the authors change the inputs to their algorithm.
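For reference, the fix under discussion is which tensor the relational loss should consume: the penultimate embeddings (last pooling layer output) rather than the logits. Below is a minimal NumPy sketch of the RKD-D (distance-wise) loss from the paper, with mean-distance normalization and a Huber penalty; the function names are illustrative and not taken from this repo's TF 1.13 code:

```python
import numpy as np

def pairwise_dist(emb):
    """Euclidean distance matrix between rows of emb (batch of embeddings)."""
    sq = np.sum(emb ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    return np.sqrt(np.maximum(d2, 0.0))  # clamp tiny negatives from rounding

def rkd_distance_loss(teacher_emb, student_emb):
    """RKD-D: match pairwise distance structure of student to teacher.

    Inputs are penultimate embeddings (e.g. last pooling layer outputs),
    not logits, per the paper's image classification setup.
    """
    td = pairwise_dist(teacher_emb)
    sd = pairwise_dist(student_emb)
    mask = ~np.eye(td.shape[0], dtype=bool)   # ignore self-distances
    td = td / td[mask].mean()                  # normalize by mean distance
    sd = sd / sd[mask].mean()
    diff = np.abs(td - sd)[mask]
    # Huber (smooth L1) loss between the two relational structures
    return np.mean(np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5))
```

Because of the mean-distance normalization, uniformly rescaling the student embeddings leaves the loss unchanged; only the relational structure matters, which is the point of RKD.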
