
Debias #23

Open · wants to merge 1 commit into base: main
Conversation

pepsi2222 (Collaborator) commented:
[feat&fix] add ExpoMF, PDA, DICE

def _get_query_encoder(self, train_data):
    int = torch.nn.Embedding(train_data.num_users, self.embed_dim, padding_idx=0)
    pop = torch.nn.Embedding(train_data.num_users, self.embed_dim, padding_idx=0)
    class DICEQueryEncoder(torch.nn.Module):
XuHwang (Collaborator) commented on Sep 15, 2022:

The int and pop in the query encoder could be defined as one Embedding with dimension 2*self.embed_dim. @pepsi2222

Suggested change:
-    class DICEQueryEncoder(torch.nn.Module):
+    torch.nn.Embedding(train_data.num_users, 2*self.embed_dim, padding_idx=0)

            self.pop = pop
        def forward(self, batch):
            return torch.cat((self.int(batch), self.pop(batch)), dim=-1)
    return DICEItemEncoder(int, pop)
XuHwang (Collaborator) commented:

Same as the comment on the query encoder above.
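
A minimal sketch of the single-table variant for the query encoder (the item encoder would be analogous); names and placement are illustrative, not the PR's final code:

    def _get_query_encoder(self, train_data):
        class DICEQueryEncoder(torch.nn.Module):
            def __init__(self, num_users, embed_dim):
                super().__init__()
                # one table holds both views: first half interest, second half popularity
                self.emb = torch.nn.Embedding(num_users, 2 * embed_dim, padding_idx=0)
            def forward(self, batch):
                # the two views are already adjacent, so no torch.cat is needed
                return self.emb(batch)
        return DICEQueryEncoder(train_data.num_users, self.embed_dim)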

XuHwang (Collaborator) left a comment:

Nice job! But some changes should be made according to the comments.

        return output

    def _get_sampler(self, train_data):
        class PopularSamplerWithMargin(Sampler):
XuHwang (Collaborator) commented:

To be discussed: I'm confused about what pool means and why negative items are sampled only from popular or unpopular items when their counts are smaller than pool. @pepsi2222

    output['mask'] = mask
    output['score'] = {'pos_int_score': pos_int_score, 'pos_pop_score': pos_pop_score, 'pos_click_score': pos_click_score,
                       'neg_int_score': neg_int_score, 'neg_pop_score': neg_pop_score, 'neg_click_score': neg_click_score}
    output['query'] = {'query_int': query.chunk(2, -1)[0], 'query_pop': query.chunk(2, -1)[1]}
XuHwang (Collaborator) commented:

The chunk() operations are duplicated here; query_int and query_pop are already defined in line 117.

Suggested change:
-    output['query'] = {'query_int': query.chunk(2, -1)[0], 'query_pop': query.chunk(2, -1)[1]}
+    output['query'] = {'query_int': query_int, 'query_pop': query_pop}
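
For reference, chunk(2, -1) just halves the last dimension, so both views can be computed once and reused; a minimal, self-contained illustration (shapes are made up):

    import torch

    query = torch.randn(4, 8)                      # 4 queries, 2 * embed_dim = 8
    query_int, query_pop = query.chunk(2, dim=-1)  # two (4, 4) views of the same tensor
    assert query_int.shape == query_pop.shape == torch.Size([4, 4])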

from recstudio.model.mf.bpr import BPR
from recstudio.model import basemodel, loss_func
import time

XuHwang (Collaborator) commented:

You'd better attach the title and URL of the paper corresponding to the model in a comment here. @pepsi2222
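
For example, a header comment of this shape (the title/URL below are my reading of which paper PDA refers to, and should be double-checked):

    # PDA
    # Causal Intervention for Leveraging Popularity Bias in Recommendation. SIGIR 2021.
    # https://arxiv.org/abs/2105.06067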

@@ -12,7 +12,8 @@ class ALSDataset(MFDataset):
     So the data provided should be ``<u, Iu>`` and ``<i, Ui>`` alternatively.
     """

-    def build(self, split_ratio, shuffle=True, split_mode='user_entry', **kwargs):
+    def build(self, split_ratio, shuffle=True, split_mode='user_entry', excluding_hist=False, **kwargs):
+        self.excluding_hist = excluding_hist
XuHwang (Collaborator) commented on Sep 15, 2022:

I think excluding_hist is not a good name here. How about return_hist? @pepsi2222 @Xiuchen519


    if num_pop_items < self.pool:
        for cnt in range(num_neg):
            idx = torch.randint(num_unpop_items, (1,))
XuHwang (Collaborator) commented:

Why not use torch.randint(num_unpop_items, (num_neg,)) instead of the for loop?

Suggested change:
-            idx = torch.randint(num_unpop_items, (1,))
+            idx = torch.randint(num_unpop_items, (num_neg,))
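
A self-contained sketch of why the single call suffices (numbers are made up):

    import torch

    num_unpop_items, num_neg = 100, 5

    # loop version: num_neg separate draws of one index each
    idx_loop = torch.cat([torch.randint(num_unpop_items, (1,)) for _ in range(num_neg)])

    # vectorized version: one draw of num_neg indices
    idx_vec = torch.randint(num_unpop_items, (num_neg,))

    assert idx_loop.shape == idx_vec.shape == (num_neg,)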

# data to device
batch = self._to_device(batch, self.device)
# update latent user/item factors
a = self._expectation(batch)
XuHwang (Collaborator) commented:

Maybe the method does not need to be overridden. You can add a conditional statement in the training_step method to implement the EM algorithm, as below:

    if batch_idx % 2 == 0:
        do expectation
    else:
        do maximization
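
A minimal sketch of that training_step, assuming an M-step counterpart to the _expectation call shown above (the _maximization name is hypothetical):

    def training_step(self, batch, batch_idx):
        batch = self._to_device(batch, self.device)
        if batch_idx % 2 == 0:
            return self._expectation(batch)   # E-step
        else:
            return self._maximization(batch)  # M-step (hypothetical counterpart)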

        excluding_hist=self.config.get('excluding_hist', False),
        method=self.config.get('sampling_method', 'none'), return_query=True)
    pos_score = self.score_func(query, pos_item_vec)
    pos_score = pos_item_vec.split([pos_item_vec.shape[-1]-1, 1], dim=-1)[1] ** self.config['gamma'] * pos_score
XuHwang (Collaborator) commented:

I wonder whether this line is duplicated, because the operation has already been done in the scorer in line 53. @pepsi2222
If so, the method doesn't need to be overridden.
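
If the weighting already lives in the scorer, it would look roughly like this sketch (not the PR's actual scorer; the popularity-in-last-column layout is inferred from the split above):

    import torch

    class PDAScorer(torch.nn.Module):
        def __init__(self, gamma):
            super().__init__()
            self.gamma = gamma
        def forward(self, query, item_vec):
            # last column carries item popularity; the rest is the item embedding
            emb, pop = item_vec.split([item_vec.shape[-1] - 1, 1], dim=-1)
            inner = (query * emb).sum(-1)
            return pop.squeeze(-1) ** self.gamma * inner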

    def _get_query_encoder(self, train_data):
        return torch.nn.Embedding(train_data.num_users, self.embed_dim, padding_idx=0)

    def _init_parameter(self):
XuHwang (Collaborator) commented:

You can define init_method: normal and init_range: 0.1 in pmf.yaml without overriding the method.
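
i.e., roughly this in pmf.yaml (keys taken from the comment; the rest of the file is unchanged):

    init_method: normal
    init_range: 0.1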
