[0.2.0] Metadata Implementation #81

Merged
merged 37 commits into from
Dec 27, 2023
Changes from 4 commits
Commits
de083b8
Update file structure
MooooCat Dec 19, 2023
7834da0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 19, 2023
dd87ee1
Update single_table.py
MooooCat Dec 20, 2023
14e83e6
Merge branch 'main' into feature-metadata
MooooCat Dec 21, 2023
06a14f2
Update metadata (still draft)
MooooCat Dec 21, 2023
ac89e8e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 21, 2023
1c322d0
Update metadata and reset file structure
MooooCat Dec 23, 2023
462b3cb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 23, 2023
fb1565c
add numeric inspector
MooooCat Dec 23, 2023
f6da1d3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 23, 2023
7e65c83
fix metadata initialization
MooooCat Dec 23, 2023
329755b
fix type hits error in py38
MooooCat Dec 23, 2023
f2d50f5
add inspectors
MooooCat Dec 23, 2023
1f4ff08
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 23, 2023
10f0175
Update relationship.py
MooooCat Dec 24, 2023
c027feb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 24, 2023
9b026fc
Update multi-table-combiner
MooooCat Dec 25, 2023
c768a24
add composite key list in relationship
MooooCat Dec 25, 2023
8af1e1d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 25, 2023
ed30dad
Apply suggestions from code review
MooooCat Dec 25, 2023
f2e4954
Apply suggestions from code review
MooooCat Dec 25, 2023
44d0c61
use a single list for primary key(s)
MooooCat Dec 25, 2023
75985e0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 25, 2023
cfd2a11
Sync with main.
MooooCat Dec 25, 2023
644568f
Update expections
MooooCat Dec 25, 2023
c1b239c
Update check functions
MooooCat Dec 25, 2023
d6077d6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 25, 2023
89e4d1f
Apply suggestions from code review
MooooCat Dec 25, 2023
143eeb9
Apply suggestions from code review
MooooCat Dec 25, 2023
49e02c4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 25, 2023
b2889da
update some test case
MooooCat Dec 26, 2023
8faa6e2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 26, 2023
4cf8db0
update test cases
MooooCat Dec 27, 2023
98f4596
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 27, 2023
b50e41b
Merge branch 'main' into feature-metadata
MooooCat Dec 27, 2023
7687f77
update metadata save and load test cases.
MooooCat Dec 27, 2023
ad312c8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 27, 2023
4 changes: 1 addition & 3 deletions sdgx/data_models/metadata.py
@@ -34,9 +34,7 @@ class Metadata(BaseModel):

     # for primary key
     # compatible with single primary key or composite primary key
-    primary_key: str = "primary_key undefined"
-    _composite_primary_key: bool = False
-    primary_key_list: list = []
+    primary_keys: List[str] = []

     # variables related to columns
     # column_list is used to store all columns' name
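
The hunk above replaces three key-related fields with a single `primary_keys` list. A minimal sketch of how one list can cover both the single-key and the composite-key case follows. It uses a plain dataclass rather than the pydantic `Metadata` model, and `TableKeys`, `has_primary_key`, and `is_composite` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TableKeys:
    """Hypothetical stand-in for the key-related part of Metadata."""

    # One list covers both cases: a single entry is a single primary key,
    # several entries together form a composite primary key.
    primary_keys: List[str] = field(default_factory=list)

    @property
    def has_primary_key(self) -> bool:
        # An empty list replaces the old "primary_key undefined" sentinel.
        return len(self.primary_keys) > 0

    @property
    def is_composite(self) -> bool:
        # Replaces the old boolean flag: derived from the list length.
        return len(self.primary_keys) > 1


single = TableKeys(primary_keys=["user_id"])
composite = TableKeys(primary_keys=["order_id", "line_no"])
undefined = TableKeys()
```

Deriving the composite flag from the list length means the flag and the list cannot disagree, which may be one reason a single list is simpler to validate.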
56 changes: 54 additions & 2 deletions sdgx/data_models/multi_table_combiner.py
@@ -1,2 +1,54 @@
-class MultiTableCombiner:
-    pass
+from typing import Any, Dict, List
+
+from pydantic import BaseModel
+
+from sdgx.data_models.metadata import Metadata
+from sdgx.data_models.relationship import Relationship
+
+
+class MultiTableCombiner(BaseModel):
+    """MultiTableCombiner: combine different tables using relationship
+
+    Args:
+        metadata_dict (Dict[str, Any]):
+
+        relationships (List[Any]):
+    """
+
+    metadata_version: str = "1.0"
+
+    metadata_dict: Dict[str, Any] = {}
+    relationships: List[Any] = []
+
+    def check(self):
+        """Do necessary checks:
+
+        - Whether number of tables corresponds to relationships.
+        - Whether table names corresponds to the relationship between tables;
+        """
+
+        # count check
+        relationship_cnt = len(self.relationships)
+        metadata_cnt = len(self.metadata_dict.keys())
+        if metadata_cnt != relationship_cnt + 1:
+            raise ValueError("Number of tables should corresponds to relationships.")
Collaborator

Can we add a new exception in sdgx/exceptions.py? This will help us distinguish expected exceptions from unexpected ones.

Contributor Author

This is something I didn't consider. I'll add two new exceptions, MultiTableCombineError and MetadataError, in sdgx.exceptions.

Collaborator

For better unit testing and bug fixing, I'd suggest syncing with the main branch, which already includes exceptions with error codes and exit codes.
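
The suggestion in this thread can be sketched roughly as follows. Only the names `MultiTableCombineError` and `MetadataError` come from the discussion. The base class `SdgxError`, the `EXIT_CODE` attribute and its values, and the helpers `check_table_count` and `run_check` are assumptions made for illustration and may not match what sdgx/exceptions.py actually defines.

```python
class SdgxError(Exception):
    """Hypothetical base class for expected, library-raised errors."""

    EXIT_CODE = 1  # illustrative value, not taken from the project


class MetadataError(SdgxError):
    EXIT_CODE = 2  # illustrative value


class MultiTableCombineError(SdgxError):
    EXIT_CODE = 3  # illustrative value


def check_table_count(table_count: int, relationship_count: int) -> None:
    """Count check from check(), raising a dedicated exception type."""
    if table_count != relationship_count + 1:
        raise MultiTableCombineError(
            f"Expected {relationship_count + 1} tables for "
            f"{relationship_count} relationships, got {table_count}."
        )


def run_check(table_count: int, relationship_count: int) -> int:
    """Return 0 on success, or the exit code of an expected error."""
    try:
        check_table_count(table_count, relationship_count)
    except SdgxError as exc:
        # Only expected errors are caught; anything else still propagates.
        return exc.EXIT_CODE
    return 0
```

With a shared base class, callers and unit tests can catch expected failures without also swallowing unrelated `ValueError`s raised elsewhere.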


+        # table name check
+        table_names_from_relationships = set()
+
+        # each relationship's table must have metadata
+        table_names = list(self.metadata_dict.keys())
+        for each_r in self.relationships:
+            if each_r.parent_table not in table_names:
+                raise ValueError(f"Metadata of parent table {each_r.parent_table} is missing.")
+            if each_r.child_table not in table_names:
+                raise ValueError(f"Metadata of child table {each_r.child_table} is missing.")
+            table_names_from_relationships.add(each_r.parent_table)
+            table_names_from_relationships.add(each_r.child_table)
+
+        # each table in metadata must in a relationship
+        for each_t in table_names:
+            if each_t not in table_names_from_relationships:
+                raise ValueError(f"Table {each_t} has not relationship.")
+
+        return True
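
To make the three checks in `check()` concrete, here is a self-contained sketch that mirrors the same logic without importing sdgx or pydantic. `Rel` and the module-level `check` function are hypothetical stand-ins, the error messages are paraphrased, and the table names are invented sample data.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Rel:
    """Hypothetical stand-in for Relationship (table names only)."""

    parent_table: str
    child_table: str


def check(metadata_dict: Dict[str, Any], relationships: List[Rel]) -> bool:
    # Count check: n tables are expected to be linked by n - 1 relationships.
    if len(metadata_dict) != len(relationships) + 1:
        raise ValueError("Number of tables should correspond to relationships.")

    # Every table named in a relationship must have metadata.
    seen = set()
    for rel in relationships:
        for name in (rel.parent_table, rel.child_table):
            if name not in metadata_dict:
                raise ValueError(f"Metadata of table {name} is missing.")
            seen.add(name)

    # Every table with metadata must appear in at least one relationship.
    for name in metadata_dict:
        if name not in seen:
            raise ValueError(f"Table {name} has no relationship.")
    return True


tables = {"users": {}, "orders": {}, "items": {}}
rels = [Rel("users", "orders"), Rel("orders", "items")]
ok = check(tables, rels)
```

Note that the count check only accepts schemas where the number of tables is exactly one more than the number of relationships, so a schema with two relationships between the same pair of tables would be rejected by it.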
10 changes: 8 additions & 2 deletions sdgx/data_models/relationship.py
@@ -1,3 +1,5 @@
+from typing import List
+
 from pydantic import BaseModel


@@ -16,8 +18,12 @@ class Relationship(BaseModel):
     parent_table: str
     child_table: str

-    # keys
-    child_table_foreign_key: str
+    # foreign keys
+    child_table_foreign_key: str = "foreign key undefined"
+
+    # for composite keys
+    composite_foreign_key: bool = False
+    child_table_composite_foreign_key: List[str] = []
MooooCat marked this conversation as resolved.

     def __init__(self, **kwargs):
         super().__init__(**kwargs)
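
The new fields in this hunk let a relationship carry either a single foreign key or a composite one. A small sketch of how calling code might read them uniformly is shown below. The field names match the diff, but `RelationshipSketch` is a plain dataclass rather than the pydantic model, and the `foreign_key_columns` helper and the sample table and column names are hypothetical additions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RelationshipSketch:
    """Stand-in for Relationship with the fields shown in the diff."""

    parent_table: str
    child_table: str
    child_table_foreign_key: str = "foreign key undefined"
    composite_foreign_key: bool = False
    child_table_composite_foreign_key: List[str] = field(default_factory=list)

    def foreign_key_columns(self) -> List[str]:
        """Hypothetical helper: return the foreign key column(s) as a list."""
        if self.composite_foreign_key:
            return list(self.child_table_composite_foreign_key)
        return [self.child_table_foreign_key]


simple = RelationshipSketch("users", "orders", child_table_foreign_key="user_id")
composite = RelationshipSketch(
    "orders",
    "order_lines",
    composite_foreign_key=True,
    child_table_composite_foreign_key=["order_id", "region"],
)
```

Returning a list in both cases keeps downstream code free of branching on the `composite_foreign_key` flag.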