Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Base Class of Metric #60

Merged
merged 13 commits into from
Dec 18, 2023
Merged

Update Base Class of Metric #60

merged 13 commits into from
Dec 18, 2023

Conversation

MooooCat
Copy link
Contributor

@MooooCat MooooCat commented Dec 11, 2023

Description

Updated several files in the sdgx/metric/ directory

Motivation and Context

Update the metric base class used to evaluate the quality of synthetic data

How has this been tested?

After the specific metric is implemented, that metric will be integrated into the test/ directory and tested using pytest.

Types of changes

  • Maintenance (no change in code, maintain the project's CI, docs, etc.)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

Copy link
Contributor

sweep-ai bot commented Dec 11, 2023

Apply Sweep Rules to your PR?

  • Apply: All new business logic should have corresponding unit tests.
  • Apply: Refactor large functions to be more modular.
  • Apply: Add docstrings to all functions and file headers.

sdgx/metrics/column/base.py Outdated Show resolved Hide resolved
sdgx/metrics/single_table/base.py Outdated Show resolved Hide resolved
sdgx/metrics/single_table/base.py Outdated Show resolved Hide resolved
@Wh1isper
Copy link
Collaborator

Metric isn't the focus at the moment, I'm in favour of us being able to put a demo on it first, but it might not look like this.

@Wh1isper
Copy link
Collaborator

When we consider metrics as a standalone module, what we expect is basically to input the generated table with the original table. Then calculate something like a score or an analysis report.

But considering that the generated and raw data are not necessarily csv files and pandas can't handle very large amounts of data, then we will get a restricted metrics class(mostly by performance). Maybe we should separate it to model and data interface.

Co-authored-by: Zhongsheng Ji <9573586@qq.com>
@MooooCat
Copy link
Contributor Author

Metric isn't the focus at the moment, I'm in favour of us being able to put a demo on it first, but it might not look like this.

From an algorithmic perspective, metric can become a standalone module (so as to models). We now need to complete the entire process through adding the metric module.

Currently @sjh120 is implementing Jensen–Shannon Divergence(JSD Feature) and adding it to the demo of single-table data generation example, we can suggest him to merge JSD Feature and new demo into this branch.

@Wh1isper
Copy link
Collaborator

and adding it to the demo of single-table data generation example

Good! It would make sense if this PR could contain working demos.

@Wh1isper Wh1isper removed the request for review from wunder957 December 11, 2023 06:37
@MooooCat
Copy link
Contributor Author

But considering that the generated and raw data are not necessarily csv files and pandas can't handle very large amounts of data, then we will get a restricted metrics class(mostly by performance). Maybe we should separate it to model and data interface.

This is indeed an important factor and has been ignored (even by other simulation data projects).

I have a suggestion, please evaluate if it is suitable.

Developers first implement the calculate method and demos which can handle non-big data situations (this can be completed this week and next week), then we use new data interfaces to make performance optimization for large-scale data situation.

Maybe the same approach can be used for the our models (Synthesizer).

@Wh1isper
Copy link
Collaborator

Wh1isper commented Dec 11, 2023

Developers first implement the calculate method and demos which can handle non-big data situations (this can be completed this week and next week)

Yes. I'm glad to see that there are practices that can be referenced, it's much better than imagining things out of thin air.

MooooCat and others added 8 commits December 18, 2023 12:18
* Feature metric jsd (#66)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestions from code review

Co-authored-by: Zhongsheng Ji <9573586@qq.com>

* jsd

* jsd

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* base更新

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Delete .idea directory

---------


Co-authored-by: Jinhang Su <171846802@qq.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: MoooCat <141886018+MooooCat@users.noreply.github.com>
Co-authored-by: Zhongsheng Ji <9573586@qq.com>


* Apply suggestions from code review

Co-authored-by: Zhongsheng Ji <9573586@qq.com>

* Update type hint and comments.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestions from code review

Co-authored-by: Zhongsheng Ji <9573586@qq.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------


Co-authored-by: Jinhang Su <171846802@qq.com>
Co-authored-by: sjh120 <171846802@qq.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Zhongsheng Ji <9573586@qq.com>
unnecessary imports are also removed.
The format of some codes has also been adjusted.
@MooooCat
Copy link
Contributor Author

At present, this PR contains:

  • three base classes of synthetic data metric;
  • one implementation of column metric;
  • one example of column metric.

I prefer to merge this PR at this stage.

@MooooCat MooooCat marked this pull request as ready for review December 18, 2023 08:39
@MooooCat MooooCat merged commit 7b66566 into main Dec 18, 2023
11 checks passed
@Wh1isper
Copy link
Collaborator

@all-contributors please add @sjh120 for code

Copy link
Contributor

@Wh1isper

I've put up a pull request to add @sjh120! 🎉

@Wh1isper Wh1isper deleted the feature-metric-base branch December 18, 2023 09:23
Wh1isper added a commit that referenced this pull request Dec 18, 2023
* Update metric base-class

* Update column metric base class

* Update Metric Base Class

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestions from code review

Co-authored-by: Zhongsheng Ji <9573586@qq.com>

* Feature: Add metric jsd (#66) (#71)

* Feature metric jsd (#66)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestions from code review

Co-authored-by: Zhongsheng Ji <9573586@qq.com>

* jsd

* jsd

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* base更新

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Delete .idea directory

---------


Co-authored-by: Jinhang Su <171846802@qq.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: MoooCat <141886018+MooooCat@users.noreply.github.com>
Co-authored-by: Zhongsheng Ji <9573586@qq.com>


* Apply suggestions from code review

Co-authored-by: Zhongsheng Ji <9573586@qq.com>

* Update type hint and comments.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestions from code review

Co-authored-by: Zhongsheng Ji <9573586@qq.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------


Co-authored-by: Jinhang Su <171846802@qq.com>
Co-authored-by: sjh120 <171846802@qq.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Zhongsheng Ji <9573586@qq.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo error

unnecessary imports are also removed.

* Add type hints and comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update input check methods

The format of some codes has also been adjusted.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Zhongsheng Ji <9573586@qq.com>
Co-authored-by: Jinhang Su <171846802@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants