Split out metadata creation from data import in the local files handlers #1975

Closed
npatki opened this issue May 1, 2024 · 0 comments · Fixed by #1988
Labels: feature request (Request for a new feature)
npatki commented May 1, 2024

Problem Description

The local file handlers (for CSV and Excel) currently return both the metadata and data in one go.

data, metadata = handler.read(folder_name='project/data')

While this approach is ok, it is not consistent with other parts of our product and may not be future-proof.

Expected behavior

  1. Update the handler.read function (for both CSV and Excel) to just read and return the data. Do not create or return metadata in this function.
  2. Create a new function called create_metadata (for both CSV and Excel). This function should take as input a dictionary of DataFrames. It should output a MultiTableMetadata object.

Below is the updated user journey:

from sdv.io.local import CSVHandler

handler = CSVHandler(sep='\t', encoding='UTF') 
data = handler.read(folder_name='project/data')
metadata = handler.create_metadata(data)
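
To make the proposed split concrete, here is a minimal sketch using only the Python standard library. It is not the SDV implementation: `SketchCSVHandler` is a hypothetical class, lists of row dictionaries stand in for DataFrames, and a plain dictionary stands in for the MultiTableMetadata object. The only point it illustrates is that `read` returns data alone and `create_metadata` works purely from data that has already been loaded.

```python
import csv
import tempfile
from pathlib import Path


class SketchCSVHandler:
    """Hypothetical stand-in for a local CSV handler (not the SDV class)."""

    def __init__(self, sep=',', encoding='utf-8'):
        self.sep = sep
        self.encoding = encoding

    def read(self, folder_name):
        """Read every CSV file in the folder and return only the data."""
        data = {}
        for path in sorted(Path(folder_name).glob('*.csv')):
            with open(path, newline='', encoding=self.encoding) as f:
                # One entry per table, keyed by the file name without extension.
                data[path.stem] = list(csv.DictReader(f, delimiter=self.sep))
        return data

    def create_metadata(self, data):
        """Describe already-loaded tables; no file access happens here."""
        return {
            'tables': {
                name: {'columns': list(rows[0].keys()) if rows else []}
                for name, rows in data.items()
            }
        }


# Same two-step journey as above, run against a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, 'users.csv').write_text('id\tname\n1\tAda\n', encoding='utf-8')
    handler = SketchCSVHandler(sep='\t', encoding='utf-8')
    data = handler.read(folder_name=folder)
    metadata = handler.create_metadata(data)
```

Because `create_metadata` depends only on its `data` argument, it can also be called on tables that were modified after reading, which is part of what makes the two-step design easier to extend later.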

Additional context

In the future, we would like create_metadata to accept flags that control which kinds of inference are performed, but this is out of scope for the current issue.

# EXAMPLE only: out of scope
metadata = handler.create_metadata(
  data,
  infer_sdtypes=True,
  infer_primary_keys=False,
  infer_foreign_keys=True)