
Feature selection #4

Open
valbarriere opened this issue Feb 7, 2019 · 16 comments

Comments

@valbarriere

Hi Paul,
Thanks for sharing the code.
I have a question about the feature selection, which is not mentioned in your paper.
Since we don't have the file /media/bighdd5/Paul/mosi/fs_mask.pkl, could you tell us which parameters work best on that dataset and how you obtained them?
Cheers,
Valentin

@ghost

ghost commented Feb 8, 2019

@valbarriere the feature selection was done in a previous paper:
Multimodal sentiment analysis with word-level fusion and reinforcement learning
This is only done for CMU-MOSI.

And here are the values (first for covarep and then facet):
[[1, 3, 6, 25, 60], [0, 2, 5, 10, 11, 12, 14, 17, 20, 21, 22, 24, 25, 29, 30, 31, 32, 36, 37, 40]]
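
If it helps, here is a minimal sketch of how those indices would be applied, assuming the COVAREP and FACET features are numpy arrays of shape (num_segments, seq_len, num_features); the names are illustrative, not from the original pipeline:

    import numpy as np

    COVAREP_IDX = [1, 3, 6, 25, 60]
    FACET_IDX = [0, 2, 5, 10, 11, 12, 14, 17, 20, 21, 22, 24, 25, 29, 30, 31, 32,
                 36, 37, 40]

    def select_features(covarep, facet):
        # keep only the selected acoustic and visual feature columns
        return (np.asarray(covarep)[:, :, COVAREP_IDX],
                np.asarray(facet)[:, :, FACET_IDX])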

@valbarriere
Author

Ok thanks! I just saw you already linked the ICMI paper in another SDK issue yesterday.

Since I'm here, did you also use padding on the POM dataset (for MOSI, all sequences have length 20)? I couldn't find any information about that in the paper. I'm trying to replicate the results in order to compare my model with the MFN on POM.

@ghost

ghost commented Feb 9, 2019

We actually did. You can get the exact POM data from here: http://immortal.multicomp.cs.cmu.edu/raw_datasets/old_processed_data/pom/data/

We actually compute the expected audio, visual, and verbal contexts per sentence (average word embeddings per sentence), since LSTMs are not good with long sequences. POM and ICT-MMMO are the only datasets we do this for. I think the data is already in this format.
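
Roughly, the sentence-level averaging looks like this (a minimal sketch, assuming word-aligned features are available as one (num_words, feat_dim) array per sentence; the names are illustrative, not from the released data pipeline):

    import numpy as np

    def sentence_contexts(word_feats_per_sentence):
        # word_feats_per_sentence: list with one (num_words, feat_dim) array per sentence
        # (word-aligned acoustic, visual or word-embedding features)
        # returns a (num_sentences, feat_dim) array: one expected context per sentence
        return np.stack([np.asarray(w).mean(axis=0) for w in word_feats_per_sentence])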

@valbarriere
Author

Great, thanks! I also started running the experiments on ICT-MMMO, MOUD and YOUTUBE. But I think it would be better with the new configurations used in "Found in Translation: Learning Robust Joint Representations by Cyclic Translations Between Modalities". The results of the MFN are really different in that article (going from 87.5 to 73.8 for ICT-MMMO). Do you also have easy access to them?

Finally, for the POM dataset there are 17 labels per video; can you tell me where to find the names of the labels associated with each of the 17 columns?

Thanks again!

@ghost

ghost commented Feb 11, 2019

I am actually not an author on that paper, so I don't know how the experiments were done. Let me add @pliang279 to the chain as well. Paul can probably answer the question about the label names too.

@valbarriere
Author

Ok, thanks. I'll wait for Paul's answer. Should I send him an email?

@ghost

ghost commented Feb 14, 2019

@valbarriere I think that would be a good idea.

@valbarriere
Author

OK, I just sent @pliang279 an email. I will summarize here whatever comes out of the discussion as soon as I have answers.

@pliang279
Owner

pliang279 commented Feb 15, 2019

Hey @valbarriere, I just saw your email. Here are some answers:

  1. Yes, the MMMO dataset (and the Youtube dataset) changed during the course of 2018, since we changed our video and audio feature extractor versions as well as their sampling rates. In subsequent papers, all models were retrained on these new versions of the datasets. I will upload these new datasets now.

  2. Here are the names of labels:

0 confident
1 passionate
2 voice pleasant
3 dominant
4 credible
5 vivid
6 expertise
7 entertaining
8 reserved
9 trusting
10 relaxed
11 outgoing
12 thorough
13 nervous
14 sentiment
15 persuasive
16 humorous

We did not report results on index 14 (sentiment) since we ran the model on 3 other sentiment analysis datasets. (A small lookup sketch follows this list.)

  3. The hyperparameters for POM are different from those for MOSI.
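
For convenience, here is the same mapping as code (a minimal sketch; pom_labels is an illustrative name and is assumed to be an array of shape (num_videos, 17) in the column order above):

    import numpy as np

    POM_TRAITS = ["confident", "passionate", "voice pleasant", "dominant",
                  "credible", "vivid", "expertise", "entertaining", "reserved",
                  "trusting", "relaxed", "outgoing", "thorough", "nervous",
                  "sentiment", "persuasive", "humorous"]

    def trait_column(pom_labels, trait):
        # pull out the label column for one trait, e.g. trait_column(pom_labels, "persuasive")
        return np.asarray(pom_labels)[:, POM_TRAITS.index(trait)]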

@valbarriere
Author

Thanks for the details @pliang279! I still have 2 questions:

  1. Where can I find the uploaded versions of the datasets?

  2. Can you tell me the hyperparameter grid you used, so that I can reproduce your results on POM, as for MOSI?

@valbarriere
Author

Hi @A2Zadeh, @pliang279, just to be sure: I know the hyperparameters of the best models on POM and MOSI should not be the same; I'm talking about the grid used to search for the best hyperparameters.

I'm trying to replicate the MFN results on the POM dataset but cannot reach your performance (I stop after 100 runs, which I think is fair...). Did you use the same hyperparameter grid on the POM dataset as the one used on MOSI (below)? I cannot reproduce the article's results with this grid...

	import random

	# hl, ha, hv: hidden sizes of the language, audio and visual LSTMs
	config = dict()
	hl = random.choice([32,64,88,128,156,256])
	ha = random.choice([8,16,32,48,64,80])
	hv = random.choice([8,16,32,48,64,80])
	config["h_dims"] = [hl,ha,hv]
	config["memsize"] = random.choice([64,128,256,300,400])
	config["windowsize"] = 2
	config["batchsize"] = random.choice([32,64,128,256])
	config["num_epochs"] = 50
	config["lr"] = random.choice([0.001,0.002,0.005,0.008,0.01])
	config["momentum"] = random.choice([0.1,0.3,0.5,0.6,0.8,0.9])
	# layer sizes and dropout for the MFN attention, gating and output sub-networks
	NN1Config = dict()
	NN1Config["shapes"] = random.choice([32,64,128,256])
	NN1Config["drop"] = random.choice([0.0,0.2,0.5,0.7])
	NN2Config = dict()
	NN2Config["shapes"] = random.choice([32,64,128,256])
	NN2Config["drop"] = random.choice([0.0,0.2,0.5,0.7])
	gamma1Config = dict()
	gamma1Config["shapes"] = random.choice([32,64,128,256])
	gamma1Config["drop"] = random.choice([0.0,0.2,0.5,0.7])
	gamma2Config = dict()
	gamma2Config["shapes"] = random.choice([32,64,128,256])
	gamma2Config["drop"] = random.choice([0.0,0.2,0.5,0.7])
	outConfig = dict()
	outConfig["shapes"] = random.choice([32,64,128,256])
	outConfig["drop"] = random.choice([0.0,0.2,0.5,0.7])
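
For reference, my random search over this grid looks roughly like the sketch below; sample_config is assumed to wrap the draws above, and train_and_eval is a hypothetical stand-in for the MFN training call, assumed to return the validation MAE for the sampled configs:

    best_mae, best_configs = float("inf"), None
    for run in range(100):                 # 100 random draws from the grid above
        configs = sample_config()          # assumed wrapper returning (config, NN1Config, ..., outConfig)
        val_mae = train_and_eval(configs)  # hypothetical: trains the MFN, returns validation MAE
        if val_mae < best_mae:
            best_mae, best_configs = val_mae, configs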

@ghost

ghost commented Feb 24, 2019

@valbarriere that is strange. Do you let your models train for a large number of epochs? Do you use Adam? How close do you get to the paper results?

@valbarriere
Author

I stop after 30 epochs (I saw that after 30 epochs it generally does not improve), Adam, and 100 runs of the grid search.

I just started a new test on the first column with the stopping criterion set at 50 epochs, and couldn't obtain better results (even worse than before: 1.021 for the best model).

The best MAE I got, for the first 10 columns:
1.001 instead of 0.952
1.015 instead of 0.993
0.892 instead of 0.882
0.876 instead of 0.835
0.986 instead of 0.903
0.959 instead of 0.908
0.918 instead of 0.886
0.948 instead of 0.913
0.848 instead of 0.821
0.528 instead of 0.521
0.575 instead of 0.566

Maybe it is the number of runs... How many runs did you try before obtaining the best results for each of the columns?

@ghost

ghost commented Feb 25, 2019

Well, we definitely do a lot of runs on the validation set. However, we also do multitask learning, in which we output all the values at the same time as opposed to just one value at a time. It helps a bit with the performance. I think 50 epochs is also too low; we were doing around 2500 and picked the best one on validation. Hope this helps. Keep us in the loop on how the experiments go.
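
Roughly, the multitask output looks like this (a minimal sketch, not the exact code from the repo; the class and variable names are illustrative): a shared representation, e.g. the MFN's final output state, is mapped to all 17 POM traits at once, and the best checkpoint is later picked per trait on the validation set.

    import torch.nn as nn

    class MultitaskHead(nn.Module):
        # maps a shared representation to all 17 POM traits jointly
        def __init__(self, shared_dim, num_traits=17):
            super().__init__()
            self.out = nn.Linear(shared_dim, num_traits)

        def forward(self, shared_repr):
            return self.out(shared_repr)  # one regression output per trait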

@valbarriere
Author

Ok, thanks for the information.

To summarize: multitask learning over the different traits, each model trained for 2500 epochs (50 times more than for MOSI, where you stopped at 50 epochs, which seems a lot), and you took the best on the validation set. You did that “definitely a lot of times” over different hyperparameter values.

Since you did multitask learning, is there one single model that reaches the best performance for all the speaker traits, or are there several best models learned in a multitask fashion (one per trait, for example)?

I'll keep you posted about the results. Thanks again!

@ghost

ghost commented Feb 26, 2019

@valbarriere great. Yes, we pick the best model for each trait; there is no single model that does best on all of them. In a way, we use the other POM labels to help with training (the other POM labels are not inputs to the model but additional outputs). It goes without saying that the baselines in our tables are trained the same way.
