
Feature selection #4

Open
valbarriere opened this issue Feb 7, 2019 · 16 comments

Comments

@valbarriere

Hi Paul,
Thanks for sharing the code.
I have a question about the feature selection, which is not mentioned in your paper.
Since we don't have the file /media/bighdd5/Paul/mosi/fs_mask.pkl, could you tell us which parameters work best on that dataset and how you obtained them?
Cheers,
Valentin

@ghost

ghost commented Feb 8, 2019

@valbarriere the feature selection was done in a previous paper:
Multimodal sentiment analysis with word-level fusion and reinforcement learning
This is only done for CMU-MOSI.

And here are the values (first for covarep and then facet):
[[1, 3, 6, 25, 60], [0, 2, 5, 10, 11, 12, 14, 17, 20, 21, 22, 24, 25, 29, 30, 31, 32, 36, 37, 40]]
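
If it helps, here is a minimal sketch of how those indices would be applied, assuming the COVAREP and FACET features are numpy arrays of shape (num_segments, seq_len, num_features); the names are illustrative, not from the original pipeline:

    import numpy as np

    COVAREP_IDX = [1, 3, 6, 25, 60]
    FACET_IDX = [0, 2, 5, 10, 11, 12, 14, 17, 20, 21, 22, 24, 25, 29, 30, 31, 32,
                 36, 37, 40]

    def select_features(covarep, facet):
        # keep only the selected acoustic and visual feature columns
        return (np.asarray(covarep)[:, :, COVAREP_IDX],
                np.asarray(facet)[:, :, FACET_IDX])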

@valbarriere
Author

Ok thanks! I just saw you already linked the ICMI paper in another SDK issue yesterday.

Since I'm here, did you also use padding on the POM dataset (for MOSI, all sequences have length 20)? I couldn't find any information about that in the paper. I'm trying to replicate the results in order to compare my model with the MFN on POM.

@ghost

ghost commented Feb 9, 2019

We actually did. You can get the exact POM data from here: http://immortal.multicomp.cs.cmu.edu/raw_datasets/old_processed_data/pom/data/

We actually compute the expected audio, visual, and verbal contexts per sentence (average word embeddings per sentence), since LSTMs are not good with long sequences. POM and ICT-MMMO are the only datasets we do this for. I think the data is already in this format.
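
Roughly, the sentence-level averaging looks like this (a minimal sketch, assuming word-aligned features are available as one (num_words, feat_dim) array per sentence; the names are illustrative, not from the released data pipeline):

    import numpy as np

    def sentence_contexts(word_feats_per_sentence):
        # word_feats_per_sentence: list with one (num_words, feat_dim) array per sentence
        # (word-aligned acoustic, visual or word-embedding features)
        # returns a (num_sentences, feat_dim) array: one expected context per sentence
        return np.stack([np.asarray(w).mean(axis=0) for w in word_feats_per_sentence])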

@valbarriere
Author

Great, thanks! I also started running the experiments on ICT-MMMO, MOUD and YOUTUBE. But I think it would be better with the new configurations used in "Found in Translation: Learning Robust Joint Representations by Cyclic Translations Between Modalities". The results of the MFN are really different in that article (going from 87.5 to 73.8 for ICT-MMMO). Do you also have easy access to them?

Finally, for the POM dataset there are 17 labels per video; can you tell me where to find the names of the labels associated with each of the 17 columns?

Thanks again!

@ghost

ghost commented Feb 11, 2019

I am actually not an author on that paper, so I don't know how the experiments were done. Let me add @pliang279 to the chain as well. Paul can probably answer the question about the label names too.

@valbarriere
Author

Ok, thanks. I'll wait for Paul's answer. Should I send him an email?

@ghost

ghost commented Feb 14, 2019

@valbarriere I think that would be a good idea.

@valbarriere
Author

OK, I just sent @pliang279 an email. I will summarize here whatever comes out of the discussion as soon as I have answers.

@pliang279
Owner

pliang279 commented Feb 15, 2019

Hey @valbarriere, I just saw your email. Here are some answers:

  1. Yes, the MMMO dataset (and the Youtube dataset) changed during the course of 2018, since we changed our video and audio feature extractor versions as well as their sampling rates. In subsequent papers, all models were retrained on these new versions of the datasets. I will upload these new datasets now.

  2. Here are the names of labels:

0 confident
1 passionate
2 voice pleasant
3 dominant
4 credible
5 vivid
6 expertise
7 entertaining
8 reserved
9 trusting
10 relaxed
11 outgoing
12 thorough
13 nervous
14 sentiment
15 persuasive
16 humorous

We did not report results on index 14 (sentiment) since we ran the model on 3 other sentiment analysis datasets. (A small lookup sketch follows this list.)

  3. The hyperparameters for POM are different from those for MOSI.
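
For convenience, here is the same mapping as code (a minimal sketch; pom_labels is an illustrative name and is assumed to be an array of shape (num_videos, 17) in the column order above):

    import numpy as np

    POM_TRAITS = ["confident", "passionate", "voice pleasant", "dominant",
                  "credible", "vivid", "expertise", "entertaining", "reserved",
                  "trusting", "relaxed", "outgoing", "thorough", "nervous",
                  "sentiment", "persuasive", "humorous"]

    def trait_column(pom_labels, trait):
        # pull out the label column for one trait, e.g. trait_column(pom_labels, "persuasive")
        return np.asarray(pom_labels)[:, POM_TRAITS.index(trait)]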

@valbarriere
Author

Thanks for the details @pliang279! I still have 2 questions:

  1. Where can I find the uploaded versions of the datasets?

  2. Can you tell me the hyperparameter grid you used, so that I can reproduce your results on POM, as for MOSI?

@valbarriere
Author

Hi @A2Zadeh, @pliang279, just to be sure: I know the hyperparameters of the best models on POM and MOSI should not be the same; I'm talking about the grid used to search for the best hyperparameters.

I'm trying to replicate the MFN results on the POM dataset but cannot reach your performance (I stop after 100 runs, which I think is fair...). Did you use the same hyperparameter grid on the POM dataset as the one used on MOSI (below)? I cannot reproduce the article's results with this grid...

	import random

	# hl, ha, hv: hidden sizes of the language, audio and visual LSTMs
	config = dict()
	hl = random.choice([32,64,88,128,156,256])
	ha = random.choice([8,16,32,48,64,80])
	hv = random.choice([8,16,32,48,64,80])
	config["h_dims"] = [hl,ha,hv]
	config["memsize"] = random.choice([64,128,256,300,400])
	config["windowsize"] = 2
	config["batchsize"] = random.choice([32,64,128,256])
	config["num_epochs"] = 50
	config["lr"] = random.choice([0.001,0.002,0.005,0.008,0.01])
	config["momentum"] = random.choice([0.1,0.3,0.5,0.6,0.8,0.9])
	# layer sizes and dropout for the MFN attention, gating and output sub-networks
	NN1Config = dict()
	NN1Config["shapes"] = random.choice([32,64,128,256])
	NN1Config["drop"] = random.choice([0.0,0.2,0.5,0.7])
	NN2Config = dict()
	NN2Config["shapes"] = random.choice([32,64,128,256])
	NN2Config["drop"] = random.choice([0.0,0.2,0.5,0.7])
	gamma1Config = dict()
	gamma1Config["shapes"] = random.choice([32,64,128,256])
	gamma1Config["drop"] = random.choice([0.0,0.2,0.5,0.7])
	gamma2Config = dict()
	gamma2Config["shapes"] = random.choice([32,64,128,256])
	gamma2Config["drop"] = random.choice([0.0,0.2,0.5,0.7])
	outConfig = dict()
	outConfig["shapes"] = random.choice([32,64,128,256])
	outConfig["drop"] = random.choice([0.0,0.2,0.5,0.7])
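
For reference, my random search over this grid looks roughly like the sketch below; sample_config is assumed to wrap the draws above, and train_and_eval is a hypothetical stand-in for the MFN training call, assumed to return the validation MAE for the sampled configs:

    best_mae, best_configs = float("inf"), None
    for run in range(100):                 # 100 random draws from the grid above
        configs = sample_config()          # assumed wrapper returning (config, NN1Config, ..., outConfig)
        val_mae = train_and_eval(configs)  # hypothetical: trains the MFN, returns validation MAE
        if val_mae < best_mae:
            best_mae, best_configs = val_mae, configs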

@ghost

ghost commented Feb 24, 2019

@valbarriere that is strange. Do you let your models train for a large number of epochs? Do you use Adam? How close do you get to the paper results?

@valbarriere
Author

I stop after 30 epochs (I saw that after 30 epochs it generally does not improve), Adam, and 100 runs of the grid search.

I just started a new test on the first column with the stopping criterion set at 50 epochs, and couldn't obtain better results (even worse than before: 1.021 for the best model).

The best MAE I got, for the first 10 columns:
1.001 instead of 0.952
1.015 instead of 0.993
0.892 instead of 0.882
0.876 instead of 0.835
0.986 instead of 0.903
0.959 instead of 0.908
0.918 instead of 0.886
0.948 instead of 0.913
0.848 instead of 0.821
0.528 instead of 0.521
0.575 instead of 0.566

Maybe it is the number of runs... How many runs did you try before obtaining the best results for each of the columns?

@ghost

ghost commented Feb 25, 2019

Well, we definitely do a lot of runs on the validation set. However, we also do multitask learning, in which we output all the values at the same time as opposed to just one value at a time. It helps a bit with the performance. I think 50 epochs is also too low; we were doing around 2500 and picked the best one on validation. Hope this helps. Keep us in the loop on how the experiments go.
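
Roughly, the multitask output looks like this (a minimal sketch, not the exact code from the repo; the class and variable names are illustrative): a shared representation, e.g. the MFN's final output state, is mapped to all 17 POM traits at once, and the best checkpoint is later picked per trait on the validation set.

    import torch.nn as nn

    class MultitaskHead(nn.Module):
        # maps a shared representation to all 17 POM traits jointly
        def __init__(self, shared_dim, num_traits=17):
            super().__init__()
            self.out = nn.Linear(shared_dim, num_traits)

        def forward(self, shared_repr):
            return self.out(shared_repr)  # one regression output per trait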

@valbarriere
Author

Ok, thanks for the information.

To summarize: multitask learning over the different traits, each model trained for 2500 epochs (50 times more than for MOSI, where you stopped at 50 epochs, which seems a lot), and you took the best on the validation set. You did that “definitely a lot of times” over different hyperparameter values.

Since you did multitask learning, is there one single model that reaches the best performance for all the speaker traits, or are there several best models learned in a multitask fashion (one per trait, for example)?

I'll keep you posted about the results. Thanks again!

@ghost

ghost commented Feb 26, 2019

@valbarriere great. Yes, we pick the best model for each trait; there is no single model that does best on all of them. In a way, we use the other POM labels to help with training (the other POM labels are not inputs to the model but additional outputs). It goes without saying that the baselines in our tables are trained the same way.
