Currently, HMA cannot run on certain multi-table schemas. We issue a warning when a schema will generate too many columns, and we should provide a utility function to easily reduce a multi-table schema so it can successfully run on HMA.
Expected behavior
Add a new utility function utils.simplify_schema:
Parameters:
data - the data dictionary
metadata - the MultiTableMetadata for this dataset
Returns:
A data dictionary mapping table names to simplified tables
For every root table, drop any table that is more than 2 levels below the root (i.e. keep only direct children and grandchildren) and count the number of tables still connected to the root
Select the root with the greatest number of descendant tables
Calculate the number of extended columns we can add to the root (we can reuse the logic used to generate the warning in HMA)
Allocate a # of augmented columns to each child relationship
For each child:
Determine the number of modelable columns and add the number of child relationships for that child
If the number of modelable columns will generate more than the allowed number of extended columns, drop modelable columns from the child
Try to keep a variety of sdtypes
If we cannot drop columns so that we will not exceed the maximum number of extended_columns, drop any grandchild tables until we can
For each grandchild:
Drop all modelable columns (grandchildren should only generate a num_rows column in their parents)
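The graph steps above (depth-limited pruning, root selection) and the "keep a variety of sdtypes" rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the proposed implementation: it works on plain Python structures rather than a real MultiTableMetadata object, the helper names (descendants_within_depth, select_root, pick_columns_by_sdtype) are hypothetical, and the round-robin strategy for mixing sdtypes is only one possible reading of that step. The relationship keys parent_table_name and child_table_name mirror the keys used in multi-table metadata relationships. How the extended-column budget is computed and allocated is not covered here.

```python
from collections import defaultdict, deque


def descendants_within_depth(root, relationships, max_depth=2):
    """Return {table_name: depth} for tables at most max_depth levels below root."""
    children = defaultdict(set)
    for rel in relationships:
        children[rel["parent_table_name"]].add(rel["child_table_name"])

    depths = {}
    queue = deque([(root, 0)])
    while queue:
        table, depth = queue.popleft()
        if depth == max_depth:
            continue  # anything below this level would be dropped
        for child in sorted(children[table]):
            if child != root and child not in depths:
                depths[child] = depth + 1
                queue.append((child, depth + 1))
    return depths


def select_root(table_names, relationships, max_depth=2):
    """Pick the root (a table that is never a child) with the most kept descendants."""
    child_tables = {rel["child_table_name"] for rel in relationships}
    roots = [name for name in table_names if name not in child_tables]
    # max() raises ValueError if there is no root; ties keep the first root listed.
    return max(
        roots,
        key=lambda r: len(descendants_within_depth(r, relationships, max_depth)),
    )


def pick_columns_by_sdtype(columns, limit):
    """Keep at most `limit` columns, cycling through sdtypes (hypothetical strategy).

    `columns` maps column name to sdtype.
    """
    by_sdtype = defaultdict(deque)
    for name, sdtype in columns.items():
        by_sdtype[sdtype].append(name)

    kept = []
    while len(kept) < limit and by_sdtype:
        for sdtype in list(by_sdtype):
            if len(kept) == limit:
                break
            kept.append(by_sdtype[sdtype].popleft())
            if not by_sdtype[sdtype]:
                del by_sdtype[sdtype]
    return kept


# Illustrative schema: users -> sessions -> transactions -> refunds, stores -> employees
relationships = [
    {"parent_table_name": "users", "child_table_name": "sessions"},
    {"parent_table_name": "sessions", "child_table_name": "transactions"},
    {"parent_table_name": "transactions", "child_table_name": "refunds"},
    {"parent_table_name": "stores", "child_table_name": "employees"},
]
table_names = ["users", "sessions", "transactions", "refunds", "stores", "employees"]

root = select_root(table_names, relationships)
kept_tables = descendants_within_depth(root, relationships)
kept_columns = pick_columns_by_sdtype(
    {
        "age": "numerical",
        "income": "numerical",
        "city": "categorical",
        "joined": "datetime",
        "plan": "categorical",
    },
    limit=3,
)
```

In this example the root is users, the refunds table is dropped because it sits three levels below the root, and the three kept columns cover one column of each sdtype.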
Additional context
We should also change the warning in HMA to point to this utility function:
>>> synthesizer = HMASynthesizer(metadata)
PerformanceAlert: Using the HMASynthesizer on this metadata schema is not recommended because HMA will generate a large number of columns
Table Name      # Columns in Metadata    Est # Columns
users           12                       123123123
transactions
...
We recommend simplifying your metadata schema using utils.simplify_schema
Problem Description
Currently, HMA cannot run on certain multi-table schemas. We issue a warning when a schema will generate too many columns, and we should provide a utility function to easily reduce a multi-table schema so it can successfully run on HMA.
Expected behavior
Add a new utility function utils.simplify_schema:
Parameters:
data - the data dictionary
metadata - the MultiTableMetadata for this dataset
Returns:
MultiTableMetadata for the simplified data schema
Algorithm overview
For every root table:
Select the root with the greatest number of descendant tables
Calculate the number of extended columns we can add to the root (we can reuse the logic used to generate the warning in HMA)
Allocate a # of augmented columns to each child relationship
For each child:
For each grandchild:
Drop all modelable columns (grandchildren should only generate a num_rows column in their parents)
Additional context
We should also change the warning in HMA to point to this utility function: