This inspector determines whether a column matches the regular expression given by the user and outputs the column names.
Add an Inspector that accepts two parameters:
A user-defined regular expression (string type);
A PII flag (bool type): whether the matching columns contain private information.
🏕Solution(optional)
The content is as follows:
Inherit sdgx.data_models.inspector.base.Inspector and implement the fit method;
Inherit sdgx.data_models.inspector.base.Inspector and implement the inspect method;
Provide complete examples of using this Inspector to infer data types;
Add the necessary test cases.
🍰Detail(optional)
For the __init__ method:
This method should accept the regular expression as an input parameter;
The regular expression should be validated before use (for example, that it is a string and that it compiles).
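The checks on the regular expression could look roughly like the sketch below. The helper name `compile_user_pattern` is hypothetical and not part of sdgx; it only uses the standard `re` module and assumes that an invalid pattern should be reported as a `ValueError`.

```python
import re


def compile_user_pattern(pattern):
    """Validate a user-supplied regular expression and return it compiled.

    Hypothetical helper for illustration; not an existing sdgx function.
    """
    if not isinstance(pattern, str):
        raise TypeError("pattern must be a string")
    if not pattern:
        raise ValueError("pattern must not be empty")
    try:
        return re.compile(pattern)
    except re.error as exc:
        # Surface the compile failure early, at construction time.
        raise ValueError(f"invalid regular expression {pattern!r}: {exc}") from exc
```

Failing in `__init__` rather than in `fit` means a bad pattern is caught before any data is read.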
For the fit method, the input parameters should be:
raw_data (pd.DataFrame): the input data;
It is recommended to add a match_rate parameter (default 0.8 or another value). This parameter is between 0 and 1: when the fraction of values in a column that match the regular expression is at least match_rate, the column should appear in the inspect results.
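As a rough illustration of the match_rate rule, the sketch below computes the fraction of matching values in one column and compares it with the threshold. The function names are hypothetical, and two choices are assumptions rather than requirements from this issue: values must match the whole pattern (`fullmatch`), and `None` values are skipped.

```python
import re


def column_match_rate(values, pattern):
    """Return the fraction of non-missing values that fully match pattern."""
    compiled = re.compile(pattern)
    checked = 0
    matched = 0
    for value in values:
        if value is None:  # assumption: None marks a missing value and is skipped
            continue
        checked += 1
        if compiled.fullmatch(str(value)):  # assumption: whole-value match
            matched += 1
    return matched / checked if checked else 0.0


def should_report(values, pattern, match_rate=0.8):
    """Decide whether a column should appear in the inspect results."""
    if not 0 <= match_rate <= 1:
        raise ValueError("match_rate must be between 0 and 1")
    return column_match_rate(values, pattern) >= match_rate
```

With this rule, a column in which 4 of 5 values match is reported at the default threshold of 0.8 but not at 0.9.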
For the inspect method:
Like other inspectors, it should output the names of the columns that match the data type inferred by this inspector.
It should also output the PII attribute so that it can easily be updated into the metadata.
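Putting the pieces together, a minimal sketch of such an inspector might look like this. Everything here is an assumption for illustration: the `Inspector` class below is a stand-in for `sdgx.data_models.inspector.base.Inspector` so that the example runs on its own, the class name `RegexInspector` is hypothetical, and the result keys `regex_columns` and `pii_columns` are placeholders rather than confirmed metadata field names.

```python
import re


class Inspector:
    """Stand-in for sdgx.data_models.inspector.base.Inspector (assumed interface)."""

    def fit(self, raw_data):
        raise NotImplementedError

    def inspect(self):
        raise NotImplementedError


class RegexInspector(Inspector):
    """Hypothetical inspector reporting columns whose values match a pattern."""

    def __init__(self, pattern, pii=False, match_rate=0.8):
        self.pattern = re.compile(pattern)  # raises re.error on an invalid pattern
        self.pii = pii
        self.match_rate = match_rate
        self.regex_columns = []

    def fit(self, raw_data):
        # .items() yields (column name, values) for a pd.DataFrame and for a dict.
        self.regex_columns = []
        for name, values in raw_data.items():
            cells = [str(v) for v in values if v is not None]
            hits = sum(1 for cell in cells if self.pattern.fullmatch(cell))
            if cells and hits / len(cells) >= self.match_rate:
                self.regex_columns.append(name)

    def inspect(self):
        # Key names are illustrative only; the real metadata keys may differ.
        result = {"regex_columns": list(self.regex_columns)}
        if self.pii:
            result["pii_columns"] = list(self.regex_columns)
        return result
```

Because `fit` only relies on `.items()`, the sketch can be tried with a plain dict of lists as well as with a DataFrame.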
🍰Example(optional)
inspectors = InspectorManager().init_inspcetors(
    include_inspectors, exclude_inspectors, **(inspector_init_kwargs or {})
)
for inspector in inspectors:
    inspector.fit(df)
metadata = Metadata(primary_keys=[df.columns[0]], column_list=list(df.columns))
for inspector in inspectors:
    metadata.update(inspector.inspect())
🚅Search before asking
I have searched for issues similar to this one.