Commit

Update docstring and user guides, some bug fix(#84)
Wh1isper committed Dec 20, 2023
1 parent b3b6aee commit e636c80
Showing 19 changed files with 440 additions and 65 deletions.
21 changes: 9 additions & 12 deletions README.md
@@ -80,27 +80,24 @@ from sdgx.models.ml.single_table.ctgan import CTGANSynthesizerModel
from sdgx.synthesizer import Synthesizer
from sdgx.utils import download_demo_data

# This will download demo data to ./dataset
dataset_csv = download_demo_data()

# Create data connector for csv file
data_connector = CsvConnector(path=dataset_csv)

# Initialize synthesizer, use CTGAN model
synthesizer = Synthesizer(
model=CTGANSynthesizerModel(epochs=1), # For quick demo
data_connector=data_connector,
)

# Fit the model
synthesizer.fit()
sampled_data = synthesizer.sample(1000)
synthesizer.cleanup() # Clean all cache

# Optional, use JSD for metrics
from sdgx.metrics.column.jsd import JSD

JSD = JSD()

selected_columns = ["workclass"]
isDiscrete = True
metrics = JSD.calculate(data_connector.read(), sampled_data, selected_columns, isDiscrete)

print("JSD metric of column %s: %g" % (selected_columns[0], metrics))
# Sample
sampled_data = synthesizer.sample(1000)
print(sampled_data)
```
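
The JSD lines in the hunk above score how closely a sampled column follows the real one. As a rough, library-independent sketch of what a Jensen-Shannon divergence computes for a discrete column (this is not sdgx's implementation; the function name `js_divergence` and the sample values are hypothetical):

```python
import math
from collections import Counter


def js_divergence(real, synthetic):
    """Jensen-Shannon divergence (log base 2) between two discrete samples."""
    # Use the union of categories so both distributions share one support.
    categories = sorted(set(real) | set(synthetic))
    real_counts, synth_counts = Counter(real), Counter(synthetic)
    p = [real_counts[c] / len(real) for c in categories]
    q = [synth_counts[c] / len(synthetic) for c in categories]
    # Mixture distribution that both p and q are compared against.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical values standing in for a "workclass" column.
real_column = ["Private", "Private", "State-gov", "Self-emp"]
sampled_column = ["Private", "State-gov", "State-gov", "Self-emp"]
score = js_divergence(real_column, sampled_column)
print("JSD of column workclass: %g" % score)
```

With base-2 logarithms the score lies between 0 (identical distributions) and 1 (no shared categories), so lower values suggest the synthetic column is closer to the real one.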

#### Comparison
21 changes: 9 additions & 12 deletions README_ZH_CN.md
@@ -79,27 +79,24 @@ from sdgx.models.ml.single_table.ctgan import CTGANSynthesizerModel
from sdgx.synthesizer import Synthesizer
from sdgx.utils import download_demo_data

# This will download demo data to ./dataset
dataset_csv = download_demo_data()

# Create data connector for csv file
data_connector = CsvConnector(path=dataset_csv)

# Initialize synthesizer, use CTGAN model
synthesizer = Synthesizer(
model=CTGANSynthesizerModel(epochs=1), # For quick demo
data_connector=data_connector,
)

# Fit the model
synthesizer.fit()
sampled_data = synthesizer.sample(1000)
synthesizer.cleanup() # Clean all cache

# Optional, use JSD for metrics
from sdgx.metrics.column.jsd import JSD

JSD = JSD()

selected_columns = ["workclass"]
isDiscrete = True
metrics = JSD.calculate(data_connector.read(), sampled_data, selected_columns, isDiscrete)

print("JSD metric of column %s: %g" % (selected_columns[0], metrics))
# Sample
sampled_data = synthesizer.sample(1000)
print(sampled_data)
```

#### Comparison
6 changes: 5 additions & 1 deletion docs/source/api_reference/data_connectors/index.rst
@@ -1,13 +1,17 @@
DataConnector
========================================================

.. toctree::
:maxdepth: 1

Base Class for DataConnector <base>

Built-in DataConnector
-----------------------------

.. toctree::
:maxdepth: 2

DataConnector <base>
CsvConnector <csv_connector>
GeneratorConnector <generator_connector>

6 changes: 5 additions & 1 deletion docs/source/api_reference/data_models/inspectors/index.rst
@@ -1,13 +1,17 @@
Inspectors
========================================================

.. toctree::
:maxdepth: 1

Base Class for Inspector <base>

Built-in Inspector
-----------------------------

.. toctree::
:maxdepth: 2

Inspector <base>
DiscreteInspector <discrete>


7 changes: 6 additions & 1 deletion docs/source/api_reference/data_processors/index.rst
@@ -1,13 +1,18 @@
DataProcessor
========================================================

.. toctree::
:maxdepth: 1

Base Class for DataProcessor <base>


Built-in DataProcessor
-----------------------------

.. toctree::
:maxdepth: 2

DataProcessor <base>

Custom DataProcessor Relevant
-----------------------------
1 change: 1 addition & 0 deletions docs/source/api_reference/index.rst
@@ -8,6 +8,7 @@ API Reference

Synthesizer <synthesizer>
Data Connector <data_connectors/index>
Data Models <data_models/index>
Data Loader <data_loader>
Cacher for DataLoader <cachers/index>
Data Processor <data_processors/index>
4 changes: 2 additions & 2 deletions docs/source/api_reference/models/index.rst
@@ -2,9 +2,9 @@ Models
========================================================

.. toctree::
:maxdepth: 2
:maxdepth: 1

SynthesizerModel <base>
Base Class for SynthesizerModel <base>


Built-in ML Models
16 changes: 13 additions & 3 deletions docs/source/conf.py
@@ -17,7 +17,7 @@

# -- Project information -----------------------------------------------------

project = "synthetic-data-generator"
project = "Synthetic Data Generator"
copyright = "2023, hitsz-ids"
author = "hitsz-ids"

@@ -61,13 +61,23 @@
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "sphinx_rtd_theme"
html_theme = "pydata_sphinx_theme"

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]

todo_include_todos = True

html_logo = "_static/sdg_logo.png"
html_theme_options = {
    "github_url": "https://github.com/hitsz-ids/synthetic-data-generator",
    "use_edit_page_button": True,
}

html_context = {
    "github_user": "hitsz-ids",
    "github_repo": "synthetic-data-generator",
    "github_version": "main",
    "doc_path": "docs/source",
}
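
The `html_context` keys above give the theme what it needs to link each page to its source on GitHub when `use_edit_page_button` is enabled. As a hedged illustration only (the helper `edit_page_url` is hypothetical, and the exact URL template belongs to the theme and may differ), the pieces would combine roughly like this:

```python
# Values copied from the conf.py hunk above.
html_context = {
    "github_user": "hitsz-ids",
    "github_repo": "synthetic-data-generator",
    "github_version": "main",
    "doc_path": "docs/source",
}


def edit_page_url(context, page_path):
    """Approximate the GitHub edit link for one documentation page.

    Assumption: the link has the form
    https://github.com/<user>/<repo>/edit/<version>/<doc_path>/<page>.
    """
    base = "https://github.com/{github_user}/{github_repo}/edit/{github_version}".format(
        **context
    )
    # Strip stray slashes so the joined path has exactly one separator.
    return "/".join([base, context["doc_path"].strip("/"), page_path.lstrip("/")])


url = edit_page_url(html_context, "index.rst")
print(url)
```

This shows why `doc_path` has to point at the Sphinx source directory relative to the repository root: a wrong value would yield edit links to files that do not exist.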
33 changes: 31 additions & 2 deletions docs/source/index.rst
@@ -52,10 +52,39 @@ Or install our python package with pip
In order to use the GPU for synthesis, you may need to refer to `Torch's GPU installation guide <https://pytorch.org/get-started/locally/>`_.


Use sdgx for generating synthetic data
Quick demo
====================================================================

.. TODO: The tutorials and feature introduction.
.. code-block:: python

    """
    Example for CTGAN
    """
    from sdgx.data_connectors.csv_connector import CsvConnector
    from sdgx.models.ml.single_table.ctgan import CTGANSynthesizerModel
    from sdgx.synthesizer import Synthesizer
    from sdgx.utils import download_demo_data

    # This will download demo data to ./dataset
    dataset_csv = download_demo_data()

    # Create data connector for csv file
    data_connector = CsvConnector(path=dataset_csv)

    # Initialize synthesizer, use CTGAN model
    synthesizer = Synthesizer(
        model=CTGANSynthesizerModel(epochs=1),  # For quick demo
        data_connector=data_connector,
    )

    # Fit the model
    synthesizer.fit()

    # Sample
    sampled_data = synthesizer.sample(1000)
    print(sampled_data)

You can refer to our user guides for more details.

4 changes: 3 additions & 1 deletion docs/source/user_guides/index.rst
@@ -3,11 +3,13 @@
Guides for users
==================================================

The Guides for users section includes SDG usage for different scenarios.

.. toctree::
:maxdepth: 2

Use CLI directly <cli>
Use sdgx as a library <library>
Use SDG as a library <library>
Synthetic single-table data <single_table>
Synthetic multi-table data <multi_table>
Evaluation synthetic data <evaluation>