[0.1.0] Refactoring CTGAN for DataLoader #72

Merged
merged 13 commits into from
Dec 18, 2023

Conversation

Wh1isper
Collaborator

@Wh1isper Wh1isper commented Dec 18, 2023

Description

Motivation and Context

  • Rewrite CTGAN based on MIT-licensed code.
  • Implement and test part of the Synthesizer.
  • Introduce an optimized CTGAN component for DataLoader's chunked reads.

How has this been tested?

Types of changes

  • Maintenance (no change in code, maintain the project's CI, docs, etc.)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@Wh1isper Wh1isper added this to the 0.1.0 milestone Dec 18, 2023
Contributor

sweep-ai bot commented Dec 18, 2023

Apply Sweep Rules to your PR?

  • Apply: All new business logic should have corresponding unit tests.
  • Apply: Refactor large functions to be more modular.
  • Apply: Add docstrings to all functions and file headers.

@Wh1isper Wh1isper marked this pull request as draft December 18, 2023 04:52
@Wh1isper Wh1isper marked this pull request as ready for review December 18, 2023 09:10
@Wh1isper
Collaborator Author

Wh1isper commented Dec 18, 2023

@MooooCat
This PR supports DataLoader at the interface level, but does not yet support chunk reading in all model components.

I'll draft another PR adding a lazy ndarray loader for the Sampler.

The docs are not updated yet; I will update all of them before the release.

@Wh1isper Wh1isper marked this pull request as draft December 18, 2023 09:39
@Wh1isper Wh1isper marked this pull request as ready for review December 18, 2023 10:01
@Wh1isper
Collaborator Author

I believe DataLoader uses less memory when it fetches whole columns: https://github.com/hitsz-ids/synthetic-data-generator/blob/0.1.0-ctgan/sdgx/data_loader.py#L106
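
To illustrate the idea of chunk reads versus whole-column fetches, here is a minimal sketch using only the Python standard library. It is not the sdgx DataLoader API: `ToyChunkLoader`, `iter_chunks`, and `column` are hypothetical names chosen for illustration, and the in-memory CSV stands in for a real data source.

```python
import csv
import io
from typing import Callable, Iterator, List, TextIO


class ToyChunkLoader:
    """Hypothetical stand-in for a chunk-reading loader (not the sdgx API)."""

    def __init__(self, open_source: Callable[[], TextIO]):
        # open_source returns a fresh text stream on every call, so each
        # pass re-reads the data instead of caching the whole table.
        self._open = open_source

    def iter_chunks(self, chunksize: int) -> Iterator[List[dict]]:
        """Yield lists of at most `chunksize` rows."""
        with self._open() as stream:
            chunk: List[dict] = []
            for row in csv.DictReader(stream):
                chunk.append(row)
                if len(chunk) == chunksize:
                    yield chunk
                    chunk = []
            if chunk:  # emit the final, possibly shorter, chunk
                yield chunk

    def column(self, name: str) -> List[str]:
        """Stream the rows and keep only one field from each."""
        with self._open() as stream:
            return [row[name] for row in csv.DictReader(stream)]


data = "age,city\n31,Paris\n45,Lima\n28,Oslo\n"
loader = ToyChunkLoader(lambda: io.StringIO(data))

chunk_sizes = [len(chunk) for chunk in loader.iter_chunks(chunksize=2)]
ages = loader.column("age")
```

In this sketch, `column` holds one field per row, while `iter_chunks` holds at most `chunksize` full rows at a time. Neither needs the full table in memory at once.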

@MooooCat
Contributor

This PR supports DataLoader on the interface, but does not yet support chunk reading on all model components

Sure, we can finish these components one by one; the files changed are enough for this PR.

I may spend a little time running and understanding the new modules such as DataLoader, which won't take long.

If we provide some examples of DataLoader and other modules in the docs, developers will be able to understand how to use them more quickly.

@MooooCat MooooCat merged commit 2dab1e1 into main Dec 18, 2023
11 checks passed
@Wh1isper Wh1isper deleted the 0.1.0-ctgan branch December 18, 2023 14:31