Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add piping format from/to shell copy commands #64

Merged
merged 9 commits into from
Jan 28, 2023
Merged

add piping format from/to shell copy commands #64

merged 9 commits into from
Jan 28, 2023

Conversation

leo-schick
Copy link
Member

See #56

mara_db/shell.py Outdated Show resolved Hide resolved
Copy link
Member

@jankatins jankatins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would try to make this a bit cleaner: Given that you want to use Format from this onwards and the old way should get deprecated, I would:

  • Add a "NativeFormat" which takes no options
  • Make all users error if format and any of the old ones are given
  • If the old ones are given: create a corresponding Format (maybe using some helper in format.py)
  • Handle the new Format way in the actual commandline generation

That way at some point, we can add a warn into the old way and at some point remove it completely...

mara_db/shell.py Outdated Show resolved Hide resolved
mara_db/shell.py Outdated Show resolved Hide resolved
mara_db/shell.py Outdated Show resolved Hide resolved
mara_db/shell.py Outdated Show resolved Hide resolved
mara_db/shell.py Outdated Show resolved Hide resolved
mara_db/format.py Outdated Show resolved Hide resolved
@leo-schick
Copy link
Member Author

@jankatins You are right about what you write. But this might lead into breaking changes.

I was quite cautious with the changes I did to not make any breaking change. That resulted (unfortunately) into ugly code. e.g.

# ...
    elif isinstance(pipe_format, CsvFormat):
        csv_format = True
        if delimiter_char is None:
# ...

The purpose of this was to make it work that when someone uses the old and new logic combined. For example: someone upgrades their mara_db package and uses the copy command and patches manually the copy_to_stdout_command method with some custom logic (e.g. to use https://github.com/kfzteile24/sqpy). I feared that this would break at their side then.

There are already so many copy_command implementations which - for me - are hard to understand since they use often the default parameter (header=None, csv_format=None etc.) which is pretty hard to reingeneer since other databases use other defaults when passing None.
E.g. the default delimiter_char for postgres is \t, for SQL Server it is ,, but for MySQL it is not even defined and would through an ValueError exception when you use it.
A proper implementation for delimiter_char for MySQL would have been that (lets say the default for MySQL is ,) the method raises the ValueError only when the delimiter_char is not ,. That is the way SnowflakeDB is implemented.

I would like to do these changes mentioned above, but a) maybe in the next major release where we officially discontinue or drop the old parameters / or b) we implement unit tests for each database engines - especially for the copy_command method to make sure it does not break somewhere.

Another option would be to put this change from the start into a newer major version which is incompatible with older versions of mara-pipelines. This is something I tried to avoid.

@jankatins
Copy link
Member

jankatins commented Sep 30, 2022

The purpose of this was to make it work that when someone uses the old and new logic combined. For example: someone upgrades their mara_db package and uses the copy command and patches manually the copy_to_stdout_command method with some custom logic (e.g. to use https://github.com/kfzteile24/sqpy). I feared that this would break at their side then.

From my point of view, this looks like a lot of complexity you are introducing for an edge case I would say should be taken care by a changelog entry: "If you do this ..., you have to change it to ...". Which points to a major release, as it is a breaking change.

I know that we used a custom athena integration: create min cli command which spits out data to stdout, create a new mara DB class, create and register the copy commands for this class, done... These will break, because of the additional pipe_format argument in copy_command(...., pipe_format: Format = None). So anyone tinkering with that API already needs to fix their code.

@leo-schick
Copy link
Member Author

hmm... yea, you're right. I will redesign this implementation ...

@leo-schick
Copy link
Member Author

@jankatins I redesigned the solution now. Please reivew.

mara_db/shell.py Outdated Show resolved Hide resolved
@leo-schick leo-schick merged commit 1090d22 into main Jan 28, 2023
@leo-schick leo-schick deleted the issue-56 branch January 28, 2023 12:26
@leo-schick
Copy link
Member Author

leo-schick commented Jan 29, 2023

@jankatins You wrote:

I know that we used a custom athena integration: create min cli command which spits out data to stdout, create a new mara DB class, create and register the copy commands for this class, done...

Why not creating a PR which adds Athena support for the mara_db package? Then we can make sure that the code works even with the upcoming breaking change. Or is there something private in it you don’t wanna share?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants