Add Highway & Attention Component And AITM model #471

Merged
merged 6 commits into from
Jun 14, 2024
2 changes: 1 addition & 1 deletion README.md
@@ -63,7 +63,7 @@ Running Platform:
- [DSSM](docs/source/models/dssm.md) / [MIND](docs/source/models/mind.md) / [DropoutNet](docs/source/models/dropoutnet.md) / [CoMetricLearningI2I](docs/source/models/co_metric_learning_i2i.md) / [PDN](docs/source/models/pdn.md)
- [W&D](docs/source/models/wide_and_deep.md) / [DeepFM](docs/source/models/deepfm.md) / [MultiTower](docs/source/models/multi_tower.md) / [DCN](docs/source/models/dcn.md) / [FiBiNet](docs/source/models/fibinet.md) / [MaskNet](docs/source/models/masknet.md) / [PPNet](docs/source/models/ppnet.md) / [CDN](docs/source/models/cdn.md)
- [DIN](docs/source/models/din.md) / [BST](docs/source/models/bst.md) / [CL4SRec](docs/source/models/cl4srec.md)
- [MMoE](docs/source/models/mmoe.md) / [ESMM](docs/source/models/esmm.md) / [DBMTL](docs/source/models/dbmtl.md) / [PLE](docs/source/models/ple.md)
- [MMoE](docs/source/models/mmoe.md) / [ESMM](docs/source/models/esmm.md) / [DBMTL](docs/source/models/dbmtl.md) / [AITM](docs/source/models/aitm.md) / [PLE](docs/source/models/ple.md)
- [HighwayNetwork](docs/source/models/highway.md) / [CMBF](docs/source/models/cmbf.md) / [UNITER](docs/source/models/uniter.md)
- More models in development

Binary file added docs/images/models/aitm.jpg
15 changes: 8 additions & 7 deletions docs/source/component/backbone.md
@@ -1111,13 +1111,14 @@ Results on the MovieLens-1M dataset:

## 2. Feature Crossing Components

| Class | Function | Description | Example |
| -------------- | ---------------- | ------------ | -------------------------------------------------------------------------------------------------------------------------- |
| FM | Second-order crossing | Component of the DeepFM model | [Case 2](#deepfm) |
| DotInteraction | Second-order inner-product crossing | Component of the DLRM model | [Case 4](#dlrm) |
| Cross | Bit-wise crossing | Component of the DCN v2 model | [Case 3](#dcn) |
| BiLinear | Bilinear | Component of the FiBiNet model | [fibinet_on_movielens.config](https://github.com/alibaba/EasyRec/tree/master/examples/configs/fibinet_on_movielens.config) |
| FiBiNet | SENet & BiLinear | FiBiNet model | [fibinet_on_movielens.config](https://github.com/alibaba/EasyRec/tree/master/examples/configs/fibinet_on_movielens.config) |
| Class | Function | Description | Example |
| -------------- | --------------------- | ---------------- | -------------------------------------------------------------------------------------------------------------------------- |
| FM | Second-order crossing | Component of the DeepFM model | [Case 2](#deepfm) |
| DotInteraction | Second-order inner-product crossing | Component of the DLRM model | [Case 4](#dlrm) |
| Cross | Bit-wise crossing | Component of the DCN v2 model | [Case 3](#dcn) |
| BiLinear | Bilinear | Component of the FiBiNet model | [fibinet_on_movielens.config](https://github.com/alibaba/EasyRec/tree/master/examples/configs/fibinet_on_movielens.config) |
| FiBiNet | SENet & BiLinear | FiBiNet model | [fibinet_on_movielens.config](https://github.com/alibaba/EasyRec/tree/master/examples/configs/fibinet_on_movielens.config) |
| Attention | Dot-product attention | Component of the Transformer model | |

## 3. Feature Importance Learning Components

27 changes: 27 additions & 0 deletions docs/source/component/component.md
@@ -79,6 +79,33 @@
| senet | SENet | | protobuf message |
| mlp | MLP | | protobuf message |

- Attention

Dot-product attention layer, a.k.a. Luong-style attention.

The calculation follows the steps:

1. Calculate attention scores using query and key with shape (batch_size, Tq, Tv).
1. Use scores to calculate a softmax distribution with shape (batch_size, Tq, Tv).
1. Use the softmax distribution to create a linear combination of value with shape (batch_size, Tq, dim).

| Parameter | Type | Default | Description |
| ----------------------- | ------ | ----- | ----------- |
| use_scale | bool | False | If True, will create a scalar variable to scale the attention scores. |
| score_mode | string | dot | Function to use to compute attention scores, one of {"dot", "concat"}. "dot" refers to the dot product between the query and key vectors. "concat" refers to the hyperbolic tangent of the concatenation of the query and key vectors. |
| dropout | float | 0.0 | Float between 0 and 1. Fraction of the units to drop for the attention scores. |
| seed | int | None | A Python integer to use as the random seed in case of dropout. |
| return_attention_scores | bool | False | If True, returns the attention scores (after masking and softmax) as an additional output argument. |
| use_causal_mask | bool | False | Set to True for decoder self-attention. Adds a mask such that position i cannot attend to positions j > i. This prevents the flow of information from the future towards the past. |

- inputs: List of the following tensors:
  - query: Query tensor of shape (batch_size, Tq, dim).
  - value: Value tensor of shape (batch_size, Tv, dim).
  - key: Optional key tensor of shape (batch_size, Tv, dim). If not given, will use value for both key and value, which is the most common case.
- output:
  - Attention outputs of shape (batch_size, Tq, dim).
  - (Optional) Attention scores after masking and softmax with shape (batch_size, Tq, Tv).
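
The three calculation steps listed for the Attention component can be sketched in plain Python for a single example (no batch dimension). This is an illustration only, not the code added in `easy_rec/python/layers/keras/attention.py`: the function names are hypothetical, and the `scale` argument is a fixed float standing in for the learnable scalar that `use_scale` would create.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query, value, key=None, scale=None,
                          use_causal_mask=False):
    """query is (Tq, dim); value and key are (Tv, dim). Hypothetical helper."""
    if key is None:
        key = value  # most common case: value is used as the key as well
    dim = len(value[0])
    outputs, all_weights = [], []
    for i, q in enumerate(query):
        # Step 1: dot-product score between query position i and every key.
        scores = [sum(a * b for a, b in zip(q, k)) for k in key]
        if scale is not None:
            scores = [s * scale for s in scores]
        if use_causal_mask:
            # Position i must not attend to positions j > i.
            scores = [s if j <= i else float("-inf")
                      for j, s in enumerate(scores)]
        # Step 2: softmax over the Tv axis.
        weights = softmax(scores)
        # Step 3: linear combination of the value vectors.
        out = [sum(w * v[d] for w, v in zip(weights, value))
               for d in range(dim)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights
```

With `use_causal_mask=True`, the first query position can only attend to the first value position, so its output equals the first value vector.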

## 3. Feature Importance Learning Components

- SENet
118 changes: 118 additions & 0 deletions docs/source/models/aitm.md
@@ -0,0 +1,118 @@
# AITM

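### Introduction

In recommendation scenarios, a user's conversion path often has several intermediate steps (impression -> click -> conversion). AITM is a multi-task model framework that makes full use of the samples at every step of the path to improve the model's estimate of the conversion rate at the later steps.

![AITM](../../images/models/aitm.jpg)

1. (a) Expert-Bottom pattern, e.g. [MMoE](mmoe.md)
1. (b) Probability-Transfer pattern, e.g. [ESMM](esmm.md)
1. (c) Adaptive Information Transfer Multi-task (AITM) framework.

Two key characteristics:

1. It uses an attention mechanism to fuse the feature representations that correspond to the different targets;
1. It introduces a behavior-calibration auxiliary loss function.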
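
A minimal sketch of the attention-based fusion, following the formulation in the AITM paper referenced in this document rather than EasyRec's actual implementation. The function name `ait_fuse` and the projection arguments `h1`, `h2`, `h3` are hypothetical; in a real model they would be learned dense projections, and here they default to the identity so the example stays self-contained.

```python
import math


def _identity(u):
    return u


def ait_fuse(p_prev, q_t, h1=_identity, h2=_identity, h3=_identity):
    """Fuse the representation transferred from the previous task (p_prev)
    with the current tower's own representation (q_t)."""
    candidates = [p_prev, q_t]
    values = [h1(u) for u in candidates]
    keys = [h2(u) for u in candidates]
    queries = [h3(u) for u in candidates]
    k = len(keys[0])
    # One scaled dot-product logit per candidate: <h2(u), h3(u)> / sqrt(k).
    logits = [sum(a * b for a, b in zip(key, qry)) / math.sqrt(k)
              for key, qry in zip(keys, queries)]
    # Softmax over the two candidates.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the projected candidates.
    dim = len(values[0])
    fused = [sum(w * v[d] for w, v in zip(weights, values))
             for d in range(dim)]
    return fused, weights
```

When both candidates are identical the two weights are equal and the fused vector is unchanged; a candidate with a larger self-similarity logit receives the larger weight.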

### Configuration

```protobuf
model_config {
  model_name: "AITM"
  model_class: "MultiTaskModel"
  feature_groups {
    group_name: "all"
    feature_names: "user_id"
    feature_names: "cms_segid"
    ...
    feature_names: "tag_brand_list"
    wide_deep: DEEP
  }
  backbone {
    blocks {
      name: "mlp"
      inputs {
        feature_group_name: "all"
      }
      keras_layer {
        class_name: 'MLP'
        mlp {
          hidden_units: [512, 256]
        }
      }
    }
  }
  model_params {
    task_towers {
      tower_name: "ctr"
      label_name: "clk"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [256, 128]
      }
      use_ait_module: true
      weight: 1.0
    }
    task_towers {
      tower_name: "cvr"
      label_name: "buy"
      losses {
        loss_type: CLASSIFICATION
      }
      losses {
        loss_type: ORDER_CALIBRATE_LOSS
      }
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [256, 128]
      }
      relation_tower_names: ["ctr"]
      use_ait_module: true
      ait_project_dim: 128
      weight: 1.0
    }
    l2_regularization: 1e-6
  }
  embedding_regularization: 5e-6
}
```

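- model_name: any custom string; it serves only as a comment

- model_class: 'MultiTaskModel'; no change needed. Every multi-objective ranking model built from components uses this name

- feature_groups: configures a group of features.

- backbone: the backbone network built from components; see the [reference documentation](../component/backbone.md)

  - blocks: a directed acyclic graph (DAG) made up of several `component blocks`; the framework runs the code associated with each `component block` in the topological order of the DAG and builds a subgraph of the TF Graph
  - name/inputs: each `block` has a unique name and one or more inputs and outputs
  - keras_layer: loads the custom or built-in Keras layer specified by `class_name` and runs its logic; see the [reference documentation](../component/backbone.md#keraslayer)
  - mlp: parameters of the MLP model; see the [reference documentation](../component/component.md#id1)

- model_params: AITM-related parameters

  - task_towers: configure one task_towers entry per task
    - tower_name
    - dnn: parameters of the deep part
      - hidden_units: number of channels, i.e. neurons, in each DNN layer
    - use_ait_module: if true, use the `AITM` model; otherwise, use the [DBMTL](dbmtl.md) model
    - ait_project_dim: dimension of the representation vector of each tower; usually set to the dimension of the last hidden layer
    - Defaults to a binary classification task: num_class defaults to 1, weight defaults to 1.0, loss_type defaults to CLASSIFICATION, and metrics_set is auc
    - loss_type: ORDER_CALIBRATE_LOSS is an auxiliary loss function that calibrates the predictions using the dependency between targets; see the original paper
    - Note: label_fields must align one-to-one with task_towers.
  - embedding_regularization: applies regularization to the embeddings to prevent overfitting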
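
A small sketch of what an order-calibration penalty can look like, based on the behavioral expectation calibrator described in the AITM paper: a later-step probability (e.g. `cvr`) should not exceed the probability of the step it depends on (e.g. `ctr`). Whether EasyRec's `ORDER_CALIBRATE_LOSS` uses exactly this form is an assumption, and the function name is hypothetical.

```python
def order_calibrate_loss(upstream_probs, downstream_probs):
    """Mean hinge penalty over samples where the downstream probability
    exceeds the upstream probability (hypothetical helper)."""
    if len(upstream_probs) != len(downstream_probs):
        raise ValueError("both lists must have one entry per sample")
    penalties = [max(down - up, 0.0)
                 for up, down in zip(upstream_probs, downstream_probs)]
    return sum(penalties) / len(penalties)
```

Samples that already respect the ordering contribute zero, so the penalty only pushes on predictions that violate the dependency between the two targets.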

### Example Config

- [AITM_demo.config](https://github.com/alibaba/EasyRec/blob/master/samples/model_config/aitm_on_taobao.config)

### Reference Paper

[AITM: Modeling the Sequential Dependence among Audience Multi-step Conversions with Multi-task Learning in Targeted Display Advertising](https://arxiv.org/pdf/2105.08489.pdf)
6 changes: 4 additions & 2 deletions docs/source/models/loss.md
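@@ -19,6 +19,7 @@ EasyRec supports two ways to configure loss functions: 1) using a single loss function; 2
| PAIRWISE_LOGISTIC_LOSS | Pair-level logistic loss; supports custom pair grouping |
| JRC_LOSS | Binary classification + listwise ranking loss |
| F1_REWEIGHTED_LOSS | A loss function that can adjust the relative weight of recall and precision in binary classification; effective against imbalance between positive and negative samples |
| ORDER_CALIBRATE_LOSS | An auxiliary loss function that calibrates the predictions using the dependency between targets; see the [AITM](aitm.md) model |

- Note: SOFTMAX_CROSS_ENTROPY_WITH_NEGATIVE_MINING
- Supports parameter configuration that upgrades it to [support vector guided softmax loss](https://128.84.21.199/abs/1812.11317) ,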
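@@ -71,9 +72,9 @@ EasyRec supports two ways to configure loss functions: 1) using a single loss function; 2

- f1_beta_square: values greater than 1 make the model focus more on recall; values less than 1 make the model focus more on precision
- The F1 score, also called the balanced F score, is defined as the harmonic mean of precision and recall.
- ![f1 score](../images/other/f1_score.svg)
- ![f1 score](../../images/other/f1_score.svg)
- More generally, the F_beta score is defined as:
- ![f_beta score](../images/other/f_beta_score.svg)
- ![f_beta score](../../images/other/f_beta_score.svg)
- f1_beta_square is the square of the beta coefficient in the formula above.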
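
As a worked check of the standard F_beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), the sketch below computes the score directly from precision and recall. It illustrates the metric only, not the F1_REWEIGHTED_LOSS implementation; the function name is hypothetical and the argument name mirrors the `f1_beta_square` option.

```python
def f_beta_score(precision, recall, f1_beta_square=1.0):
    """F_beta score, where f1_beta_square is beta squared."""
    denom = f1_beta_square * precision + recall
    if denom == 0.0:
        return 0.0  # both precision and recall are zero
    return (1.0 + f1_beta_square) * precision * recall / denom
```

With precision 0.5 and recall 1.0, raising `f1_beta_square` from 1.0 to 4.0 raises the score, which matches the statement that values greater than 1 reward recall more.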

- Parameter configuration for PAIRWISE_FOCAL_LOSS
@@ -159,3 +160,4 @@ EasyRec supports two ways to configure loss functions: 1) using a single loss function; 2

- 《 Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics 》
- 《 [Reasonable Effectiveness of Random Weighting: A Litmus Test for Multi-Task Learning](https://arxiv.org/abs/2111.10603) 》
- [AITM: Modeling the Sequential Dependence among Audience Multi-step Conversions with Multi-task Learning in Targeted Display Advertising](https://arxiv.org/pdf/2105.08489.pdf)
1 change: 1 addition & 0 deletions docs/source/models/multi_target.rst
@@ -7,5 +7,6 @@
esmm
mmoe
dbmtl
aitm
ple
simple_multi_task
1 change: 1 addition & 0 deletions easy_rec/python/layers/keras/__init__.py
@@ -1,3 +1,4 @@
from .attention import Attention
from .auxiliary_loss import AuxiliaryLoss
from .blocks import MLP
from .blocks import Gate