be flexible in number of feature-function outputs #43

Open
jvdd opened this issue Nov 16, 2021 · 2 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@jvdd
Member

jvdd commented Nov 16, 2021

It is okay to be flexible in the number of feature-function outputs, as long as that number is consistent (a constraint we already impose; e.g., hist4 always returns 4 values).
=> We could lift the burden we put on the user to always pass output_names when a function has more than one output.

If the number of outputs is > 1 and no output_names are given => just append suffixes (e.g., _1, _2, ...) to the function name to form the feature names.

IMO this is the best of both worlds:

  • Users can still add more interpretable output names if they prefer to
  • But they are not obligated to do so :) (this avoids wrapping a lot of functions in FuncWrapper)
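To make the proposed default naming concrete, here is a minimal sketch. It does not use the library's actual API: `resolve_output_names` is a hypothetical helper, and the `hist4` below is only a stand-in that returns 4 values. The suffix scheme (`_1`, `_2`, ...) follows the suggestion above.

```python
from typing import Callable, List, Optional, Sequence


def resolve_output_names(
    func: Callable,
    n_outputs: int,
    output_names: Optional[Sequence[str]] = None,
) -> List[str]:
    """Hypothetical helper: pick feature names for a function's outputs."""
    if output_names is not None:
        # User-supplied names always win, but must match the output count.
        if len(output_names) != n_outputs:
            raise ValueError(
                f"Got {len(output_names)} output_names for {n_outputs} outputs"
            )
        return list(output_names)
    base = getattr(func, "__name__", type(func).__name__)
    if n_outputs == 1:
        return [base]
    # No names given and more than one output: fall back to numbered suffixes.
    return [f"{base}_{i}" for i in range(1, n_outputs + 1)]


def hist4(values):
    """Stand-in feature function: counts of the values in 4 equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / 4 or 1.0  # avoid a zero bin width for constant input
    counts = [0, 0, 0, 0]
    for v in values:
        counts[min(int((v - lo) / width), 3)] += 1
    return tuple(counts)


outputs = hist4([0, 1, 2, 3])
names = resolve_output_names(hist4, len(outputs))
print(names)  # ['hist4_1', 'hist4_2', 'hist4_3', 'hist4_4']
```

Passing `output_names` explicitly still works and takes precedence, so the interpretable-names path stays available.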

A drawback of this is that we lose the deterministic behavior (in number of outputs) that we currently have.

@jvdd jvdd added the enhancement New feature or request label Nov 16, 2021
@jvdd
Member Author

jvdd commented Nov 16, 2021

What do you think @jonasvdd @emield12 ?

@jonasvdd jonasvdd added the question Further information is requested label Nov 16, 2021
@emield12
Contributor

Great suggestion!
I just think it's a low-priority feature, since the added value is not huge.
I don't see wrapping functions with a FuncWrapper and passing output_names as a big burden 🙃.

On the other hand, I don't really understand the drawback you mention. Is this really a drawback? I see two possible ways to end up with a non-deterministic number of output features:

  • The # of output features depends on function parameters. I don't really see this one as a problem: if you serialize your pipeline, those function parameters will also be the same.
  • The # of output features depends on the data. This is an issue and should not be allowed, IMHO. But I think this will throw an error now anyway, no? (Ohh, maybe I see an issue here... If you have dataset A that always results in 3 output features and dataset B that always results in 4, you will not see any issue until you use dataset C, which combines data from A & B... But I consider this a very low-probability edge case; if we provide a clear error in that case, this will be alright.)
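A rough sketch of what such a "clear error" for the data-dependent case could look like. This is an assumption about one possible approach, not how the library currently behaves: `ConsistentOutputs` and `distinct_values` are hypothetical names, and the wrapper simply remembers the output count of the first call and compares later calls against it.

```python
from typing import Callable, Optional, Tuple


class ConsistentOutputs:
    """Hypothetical wrapper: enforce the same number of outputs on every call."""

    def __init__(self, func: Callable):
        self.func = func
        self.n_outputs: Optional[int] = None  # unknown until the first call

    def __call__(self, *args, **kwargs) -> Tuple:
        out = self.func(*args, **kwargs)
        # Normalize a single return value to a 1-tuple.
        out = tuple(out) if isinstance(out, (tuple, list)) else (out,)
        if self.n_outputs is None:
            self.n_outputs = len(out)
        elif len(out) != self.n_outputs:
            name = getattr(self.func, "__name__", repr(self.func))
            raise ValueError(
                f"{name} returned {len(out)} outputs, but earlier calls "
                f"returned {self.n_outputs}; the number of outputs must not "
                "depend on the data."
            )
        return out


def distinct_values(values):
    """Data-dependent stand-in: one output per distinct input value."""
    return tuple(sorted(set(values)))


checked = ConsistentOutputs(distinct_values)
checked([1, 2, 3, 3])  # "dataset A": 3 outputs, fixes the expected count
try:
    checked([1, 2, 3, 4])  # "dataset B": 4 outputs, so this call fails
except ValueError as err:
    print(err)
```

With a check like this, the combined dataset C fails at the first window whose output count differs, instead of silently producing misaligned feature columns.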
