Commit
Merge pull request #48 from hitsz-ids/feature-Data_Processor
Update SDG's New Data Processor Structure
Showing 19 changed files with 66 additions and 27 deletions.
File renamed without changes.
@@ -0,0 +1,5 @@
# The design draws on SDV, but we aim to go further.
# It mainly describes the following kinds of information:
# - Constraint relationships between columns
# - Relationships between columns and values
# - Relationships between columns and other rules; for example, we will support regular expressions first
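The three kinds of constraint listed above (column-to-column, column-to-value, column-to-rule) can be sketched as small predicate objects applied to each row. This is a minimal sketch, not the project's implementation: the class names `ColumnRelation`, `ValueRange`, `RegexRule` and the helper `filter_valid` are hypothetical, and rows are assumed to be plain dicts rather than DataFrame rows.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class ColumnRelation:
    """Hypothetical column-to-column constraint, e.g. min_age <= max_age."""
    left: str
    right: str
    op: Callable[[Any, Any], bool]

    def is_valid(self, row: Row) -> bool:
        return self.op(row[self.left], row[self.right])


@dataclass
class ValueRange:
    """Hypothetical column-to-value constraint: low <= value <= high."""
    column: str
    low: float
    high: float

    def is_valid(self, row: Row) -> bool:
        return self.low <= row[self.column] <= self.high


@dataclass
class RegexRule:
    """Hypothetical column-to-rule constraint backed by a regular expression."""
    column: str
    pattern: str

    def is_valid(self, row: Row) -> bool:
        # fullmatch: the whole cell value must match, not just a prefix
        return re.fullmatch(self.pattern, str(row[self.column])) is not None


def filter_valid(rows: List[Row], constraints) -> List[Row]:
    """Keep only the rows that satisfy every constraint (reject-sampling style)."""
    return [r for r in rows if all(c.is_valid(r) for c in constraints)]
```

Filtering generated rows after the fact is only one option; a generator could also apply such predicates during sampling.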
@@ -0,0 +1,18 @@
# Formatter: a column format conversion tool, described as follows:
# - Implements parsing for each column type, e.g., converting DateTime values into timestamps;
# - Provides format conversion for each column type
# - Both input and output are column data
#
# How this differs from transform:
# - When the input is a single column and the task is format conversion, use a formatter
# - When the input to a conversion is the entire table, use a data transformer
# - Typically, a Data Transformer implementation calls different formatters depending on the column
# - Provides an extract method
#


class BaseFormatter(object):
    # def extract_xxx(self):
    #     pass

    pass
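The formatter/transformer split described above can be illustrated with a minimal sketch: a formatter takes one column and returns one column, and a table-level transformer dispatches a formatter per column. Only `BaseFormatter` and the `extract` method name come from the diff; `DatetimeFormatter`, `TableTransformer`, the default datetime format, and the choice of UTC seconds as the timestamp unit are assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


class BaseFormatter:
    """Column in, column out (mirrors the BaseFormatter stub in the diff)."""

    def extract(self, column: List[Any]) -> List[Any]:
        raise NotImplementedError


class DatetimeFormatter(BaseFormatter):
    """Hypothetical: parse datetime strings into POSIX timestamps (seconds, UTC)."""

    def __init__(self, fmt: str = "%Y-%m-%d %H:%M:%S"):
        self.fmt = fmt

    def extract(self, column: List[Any]) -> List[float]:
        return [
            # Treat naive strings as UTC so the result does not depend on local time
            datetime.strptime(v, self.fmt).replace(tzinfo=timezone.utc).timestamp()
            for v in column
        ]


class TableTransformer:
    """Hypothetical table-level transformer that calls a formatter per column."""

    def __init__(self, formatters: Dict[str, BaseFormatter]):
        self.formatters = formatters

    def transform(self, table: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
        # Columns without a registered formatter are copied through unchanged
        return {
            name: self.formatters[name].extract(col) if name in self.formatters else list(col)
            for name, col in table.items()
        }
```

Keeping formatters column-scoped means the table-level logic only has to decide which formatter applies to which column.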
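@@ -0,0 +1,7 @@
# As the name suggests, Metadata records a table's metadata. In the first phase, it is described as follows:
# - We will draw on SDV's metadata management approach, but will not copy it wholesale;
# - We mainly provide descriptions of table metadata;
# - We provide descriptions of metadata across tables (table-to-table relationships);
# - We provide necessary problem checks, e.g., DAG detection;
# - We provide sufficient manual interfaces for modifying and configuring metadata by hand;
# - More practical features will be provided in the future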
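The DAG check mentioned above can be sketched with Kahn's topological sort over table relationships: if every table can be ordered parent-before-child, the relationships form a DAG; otherwise there is a cycle. This is a minimal sketch under assumptions: the function name `topological_order` and the `(parent_table, child_table)` foreign-key representation are hypothetical, and every table referenced by a key is assumed to appear in `tables`.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def topological_order(
    tables: List[str], foreign_keys: List[Tuple[str, str]]
) -> Optional[List[str]]:
    """Return a parent-before-child table order, or None if there is a cycle.

    foreign_keys holds (parent_table, child_table) pairs: the child references the parent.
    """
    indegree = {t: 0 for t in tables}
    children: Dict[str, List[str]] = {t: [] for t in tables}
    for parent, child in foreign_keys:
        children[parent].append(child)
        indegree[child] += 1

    # Start from tables that reference no other table
    queue = deque(t for t in tables if indegree[t] == 0)
    order: List[str] = []
    while queue:
        t = queue.popleft()
        order.append(t)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)

    # Tables left unprocessed sit on (or behind) a cycle
    return order if len(order) == len(tables) else None
```

Besides detecting cycles, the returned order is also a natural generation order for multi-table synthesis: parents first, then the tables that reference them.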
File renamed without changes.
Empty file.
Empty file.
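@@ -0,0 +1,6 @@
# The PII Generator module is dedicated to generating columns of PII type
# Random generation is a simple, blunt, yet effective approach, but we will go further
# This module is mainly responsible for:
# - Providing column-wise batch generation methods for each type of PII object (column)
# - Providing randomized generation methods for each type of PII object
# - Generating PII objects from constraints such as region and place of registration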
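Batch, randomized, region-constrained generation as listed above might look like the following minimal sketch for a phone-number-like column. Everything here is hypothetical: `generate_phone_column`, the `REGION_PREFIXES` table and its values are placeholders invented for illustration, not real numbering-plan data or the module's API.

```python
import random
from typing import Dict, List, Optional

# Hypothetical lookup table: region -> allowed number prefixes.
# These values are placeholders for illustration, not real numbering-plan data.
REGION_PREFIXES: Dict[str, List[str]] = {
    "region_a": ["1380", "1390"],
    "region_b": ["1500", "1510"],
}


def generate_phone_column(
    n: int, region: Optional[str] = None, seed: Optional[int] = None, digits: int = 11
) -> List[str]:
    """Batch-generate n phone-like strings; restrict the prefix to a region if given."""
    rng = random.Random(seed)  # a local RNG makes a seeded column reproducible
    if region is not None:
        prefixes = REGION_PREFIXES[region]
    else:
        prefixes = [p for ps in REGION_PREFIXES.values() for p in ps]

    column = []
    for _ in range(n):
        prefix = rng.choice(prefixes)
        # Fill the remaining positions with random digits
        suffix = "".join(rng.choice("0123456789") for _ in range(digits - len(prefix)))
        column.append(prefix + suffix)
    return column
```

Generating the whole column in one call, rather than value by value, matches the batch-oriented interface the comments describe.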
@@ -0,0 +1,4 @@
# The Sampler module is mainly intended for the following cases:
# - Large-scale databases;
# - Other necessary cases, including csv, xls, etc.;
# - Samplers required by specific models;
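For large sources, one common way to sample without loading everything into memory is reservoir sampling (Algorithm R), which draws a uniform sample of fixed size from a stream of unknown length. This is a minimal sketch of that swapped-in technique, not the module's design; `reservoir_sample` and `sample_csv` are hypothetical names, and CSV is used only because the comments mention it.

```python
import csv
import io
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Uniformly sample up to k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row  # replace with probability k / (i + 1)
    return reservoir


def sample_csv(stream, k: int, seed: Optional[int] = None) -> List[dict]:
    """Sample k rows from a CSV text stream, reading it row by row."""
    return reservoir_sample(csv.DictReader(stream), k, seed)


if __name__ == "__main__":
    data = "id,v\n" + "".join(f"{i},{i * 2}\n" for i in range(100))
    print(sample_csv(io.StringIO(data), 3, seed=0))
```

The same `reservoir_sample` function works for any row iterator, such as a database cursor, since it only needs a single pass.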
File renamed without changes.
Empty file.