read
Functionality
Internally, reading should use the read_excel function from pandas. A few things should be hardcoded by default:
Pandas should not detect an index column from the data
Pandas should not try to infer datetime formats (or cast values to np.datetime64 objects). Any datetime column should be left with dtype 'object'
After reading the data, we should use it to infer a MultiTableMetadata object. (Even if there is only 1 table, we should still create a MultiTableMetadata object.)
Parameters
(required) file_path: A string describing the path of the Excel file to read
sheet_name: A list of strings denoting which sheets in the Excel file to read from
(default) None: Read all the sheets in the file
list(str): Read only the sheets listed
Returns
data: A dictionary mapping each table name to a pandas DataFrame with the data. The table name is the same as the sheet name
metadata: A MultiTableMetadata object that describes the data
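A minimal sketch of the reading behavior described above, assuming pandas with an Excel engine such as openpyxl is installed. The function name `read_excel_tables` is hypothetical, and the metadata inference step is left out. Note that `parse_dates=False` only stops pandas from parsing text as dates; cells stored as Excel dates may still come back as datetimes from the engine, so a real implementation may need an extra conversion step.

```python
import pandas as pd


def read_excel_tables(file_path, sheet_name=None):
    """Read sheets of an Excel file into a dict of DataFrames (hypothetical helper)."""
    if isinstance(sheet_name, str):
        raise TypeError("sheet_name must be None or a list of strings")

    if sheet_name is None:
        # Default: discover every sheet in the workbook.
        with pd.ExcelFile(file_path) as workbook:
            sheet_name = list(workbook.sheet_names)

    data = {}
    for name in sheet_name:
        data[name] = pd.read_excel(
            file_path,
            sheet_name=name,
            index_col=None,     # never detect an index column from the data
            parse_dates=False,  # do not ask pandas to parse datetime columns
        )
    return data
```

The returned dictionary uses the sheet names as table names, matching the `data` return value described above.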
write
Functionality
Internally, writing should use the to_excel function from pandas. A few things should be hardcoded by default:
Do not write the index column
Each table of the synthetic data should be written as a new sheet within the file. The name of the sheet should be the same as the name of the table
If a sheet already exists with the same name, completely overwrite it
Parameters
(required) synthetic_data: A dictionary that maps each table name to a pandas.DataFrame containing data from it
(required) file_name: The name of the Excel file to write
sheet_name_suffix: A string with a suffix to add to each sheet name
(default) None: The name of the table should be the name of the sheet
(str) Append this string as the suffix. E.g. a suffix of "_synthetic" will produce sheets named "TABLENAME_synthetic"
mode: A string signaling which mode of writing to use
(default) 'w': Write sheets to a new file, clearing any existing file that may exist
'a': Append new sheets within the existing file. Note: You cannot append data to existing sheets.
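The writing rules above could be sketched roughly as follows. The names `build_sheet_name` and `write_excel_tables` are hypothetical. The sketch assumes pandas 1.3 or later (for `if_sheet_exists`) and the openpyxl engine for append mode, where the target file must already exist.

```python
import pandas as pd


def build_sheet_name(table_name, sheet_name_suffix=None):
    """Return the sheet name for a table, with the optional suffix appended."""
    if sheet_name_suffix is None:
        return table_name
    return f"{table_name}{sheet_name_suffix}"


def write_excel_tables(synthetic_data, file_name, sheet_name_suffix=None, mode="w"):
    """Write each table to its own sheet (hypothetical helper)."""
    if mode not in ("w", "a"):
        raise ValueError("mode must be 'w' or 'a'")

    writer_kwargs = {"mode": mode}
    if mode == "a":
        # Assumption: append mode relies on openpyxl; a sheet with the same
        # name is replaced as a whole rather than appended to.
        writer_kwargs["engine"] = "openpyxl"
        writer_kwargs["if_sheet_exists"] = "replace"

    with pd.ExcelWriter(file_name, **writer_kwargs) as writer:
        for table_name, table in synthetic_data.items():
            table.to_excel(
                writer,
                sheet_name=build_sheet_name(table_name, sheet_name_suffix),
                index=False,  # do not write the index column
            )
```

In `'w'` mode the file is recreated from scratch, so no sheet can pre-exist; the replace behavior only matters when appending.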
Additional context
We will add a number of local file handlers for different file types. Therefore the implementation of this class should also add a base class.
Optionally, the init, read and write functions can include a subset of the arguments that the corresponding pandas functions use
If a parameter is the same for both the pandas read and write functions (e.g. decimal), then put it in the init.
We can ignore most of these parameters. Only add ones that seem impactful
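One possible shape for the base class, sketched under assumptions: the names `BaseLocalHandler` and `InMemoryHandler` are hypothetical, and `decimal` stands in for any parameter shared by reading and writing. The in-memory subclass exists only to show how a concrete handler would plug in; it does not touch any files.

```python
from abc import ABC, abstractmethod


class BaseLocalHandler(ABC):
    """Hypothetical base class for local file handlers."""

    def __init__(self, decimal="."):
        # Parameters shared by read and write live on the instance.
        self.decimal = decimal

    @abstractmethod
    def read(self, file_path):
        """Return the data read from file_path."""

    @abstractmethod
    def write(self, synthetic_data, file_name, mode="w"):
        """Write synthetic_data to file_name."""


class InMemoryHandler(BaseLocalHandler):
    """Toy subclass for illustration: stores tables in a dict instead of a file."""

    def __init__(self, decimal="."):
        super().__init__(decimal=decimal)
        self._store = {}

    def read(self, file_path):
        return dict(self._store.get(file_path, {}))

    def write(self, synthetic_data, file_name, mode="w"):
        if mode == "w":
            # 'w' clears anything previously stored under this name.
            self._store[file_name] = {}
        self._store.setdefault(file_name, {}).update(synthetic_data)
```

A handler for a specific file type, such as the ExcelHandler, would subclass the base in the same way and keep type-specific arguments on its own read and write methods.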
Problem Description
As a user, I'd like a streamlined way to load my data and metadata from files so that I can get right to using SDV.
Expected behavior
In the sdv.io subpackage, add a folder called local
ExcelHandler
__init__
Parameters