can deepFM use sparse data format? #10

Open · sddi opened this issue Dec 26, 2017 · 10 comments
sddi commented Dec 26, 2017

I tried using deepFM.py with the sparse dataset a8a.train, whose format is "label index:value index:value ...".
I see that in S1_4.txt a feature still appears on the line when its value is 0, but in a8a.train zero-valued features are omitted.
When I run python deepFM.py, I get "Input to reshape is a tensor with 5528 values, but the requested shape requires a multiple of 672".
Does the code not support this format?
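
That error is what TensorFlow raises when a flattened tensor cannot be evenly split into fixed-width rows. A minimal sketch that reproduces the same failure mode (the factorization of 672 into field count and embedding size is a guess, not taken from deepFM.py):

```python
import tensorflow as tf

# Hypothetical sizes: 672 could be field_num * embed_dim, e.g. 21 * 32.
# 5528 is not a multiple of 672, so this reshape fails with the same
# "requested shape requires a multiple of 672" error.
flat = tf.random.normal([5528])
rows = tf.reshape(flat, [-1, 672])  # raises tf.errors.InvalidArgumentError
```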

sddi changed the title from "can deepFM using sparse data format?" to "can deepFM use sparse data format?" on Dec 28, 2017
Leavingseason (Owner) commented

Hi sddi,
deepFM reads sparse data as input, but note that each instance must have exactly the same number of features, which is the "field number" in the paper. So if a field is empty or missing, you should append a fake zero-valued feature for it.
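
A minimal preprocessing sketch of that padding rule, assuming one feature per field; the per-field index ranges and the choice of each field's first index as the fake feature are illustrative, not taken from the repo:

```python
# Illustrative field layout: feature indices 0-100 belong to field 0,
# 101-1000 to field 1 (matching the userID/itemID example later in this thread).
FIELD_RANGES = [(0, 100), (101, 1000)]

def pad_fields(libsvm_line):
    """Rewrite a libsvm-style line so every field contributes one index:value pair."""
    parts = libsvm_line.split()
    label, pairs = parts[0], parts[1:]
    present = {}
    for pair in pairs:
        idx, val = pair.split(':')
        idx = int(idx)
        for f, (lo, hi) in enumerate(FIELD_RANGES):
            if lo <= idx <= hi:
                present[f] = (idx, val)  # keeps the last feature seen per field
    out = [label]
    for f, (lo, _) in enumerate(FIELD_RANGES):
        idx, val = present.get(f, (lo, '0'))  # fake zero value for empty fields
        out.append(f'{idx}:{val}')
    return ' '.join(out)

print(pad_fields('1 36:1'))  # -> '1 36:1 101:0'
```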


sddi commented Jan 5, 2018

Hi @Leavingseason, thank you for answering. I have millions of features; if I append a fake zero value for each one, the input file could become very large. Could you update the code to support a libsvm-like input format (index:value, where zero-valued features are omitted from the file)?

Leavingseason (Owner) commented

Hi sddi,
How many fields of features (not the number of features) do you have? Actually we do not require appending a zero value for every feature; instead, for each field, if there is no feature under it, we append one fake zero value. The deepFM model uses field-wise dense embeddings as the input to the deep neural network, so the number of fields cannot be too large.
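
A rough sketch of why the field count bounds the width of the DNN input (all sizes here are illustrative):

```python
import tensorflow as tf

# With exactly one feature index per field, the DNN input is the
# concatenation of the per-field embeddings, so its width is
# field_num * embed_dim and grows linearly with the number of fields.
num_features, field_num, embed_dim = 1001, 2, 8
embeddings = tf.Variable(tf.random.normal([num_features, embed_dim]))

feat_idx = tf.constant([[36, 108]])                       # [batch, field_num]
field_emb = tf.nn.embedding_lookup(embeddings, feat_idx)  # [batch, field_num, embed_dim]
dnn_input = tf.reshape(field_emb, [-1, field_num * embed_dim])  # [batch, 16]
```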


sddi commented Jan 8, 2018

Oh, I see, @Leavingseason!
For instance, I have two fields of features: userID features (indices from 0 to 100) and itemID features (indices from 101 to 1000). For one sample, the line in the input file might be "1 36:1 108:1 123:1 365:1". Is that OK?

Leavingseason (Owner) commented

That's partially right. Currently my code supports at most one feature per field, which follows the original paper's framework, so for the itemID field you can keep only one itemID. I understand your concern; in the real world, multiple features under one field happen a lot. We have a corresponding version of the code that handles this case by leveraging sparse embedding lookup (https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup_sparse), and the input format becomes fieldID:featureID:value. We will consider releasing this version.
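
A hedged sketch of how tf.nn.embedding_lookup_sparse handles multiple features under one field; the batch construction is an assumption about how the fieldID:featureID:value format could be consumed, not the released code:

```python
import tensorflow as tf

num_features, embed_dim = 1001, 8
embeddings = tf.Variable(tf.random.normal([num_features, embed_dim]))

# One SparseTensor per field: rows are instances, columns enumerate the
# features active under that field. Here, field 2 of a single instance
# (e.g. "1 ... 2:108:1 2:123:1") holds two features.
sp_ids = tf.sparse.SparseTensor(indices=[[0, 0], [0, 1]],
                                values=tf.constant([108, 123], tf.int64),
                                dense_shape=[1, 2])
sp_weights = tf.sparse.SparseTensor(indices=[[0, 0], [0, 1]],
                                    values=tf.constant([1.0, 1.0]),
                                    dense_shape=[1, 2])

# Pools (here: sums) the embeddings of all features under the field,
# yielding one dense vector per instance regardless of feature count.
field_emb = tf.nn.embedding_lookup_sparse(embeddings, sp_ids, sp_weights,
                                          combiner='sum')  # [1, embed_dim]
```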


sddi commented Jan 10, 2018

OK, thank you very much! I am looking forward to your new version! :D

waitingc commented

Has the version that supports "multiple features under one field" been released? Thanks.

Leavingseason (Owner) commented

Not yet. All right, since some people are interested in this version, I will release a preview of the code, which is currently very ugly. I will try to find some time within two days (it is so sad that the KDD deadline is near...).

Leavingseason (Owner) commented

Done.


CheungZeeCn commented May 10, 2018

@Leavingseason hello, a question about the format: in fieldID:featureID:value, if fieldID==1 has 3 featureIDs and fieldID==2 has 2 featureIDs, does the encoding of the featureIDs under fieldID==2 need to continue from the featureIDs of fieldID==1? For example:

0 1:1:1 1:2:1 1:3:1 2:1:1 # here the featureIDs under fieldID==2 may be re-encoded (restart from 1)
0 1:1:1 1:2:1 1:3:1 2:4:1 # here the featureIDs under fieldID==2 may not be re-encoded and must continue from field 1's numbering. Thanks!
