Just getting ramped up with the library, so I am sure I am not going down the recommended path, but one issue my brother has had with analyzing data in R is splitting that data into different subsets to run basic models on.
E.g., given a dataset with "price", "A", "B", "C", "D", "E", run linear regression to predict price using only A, A and B, A and B and C, ... B and E... and so on.
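To make the combinatorics concrete, here is a minimal, library-independent sketch in Python of enumerating every non-empty subset of the predictor columns. It is only an illustration of the idea, not the code from this issue; the helper name `predictor_subsets` is made up for the example.

```python
from itertools import combinations


def predictor_subsets(predictors):
    """Yield every non-empty subset of `predictors` as a tuple, smallest subsets first."""
    for size in range(1, len(predictors) + 1):
        # combinations() keeps the input order and never repeats a subset
        yield from combinations(predictors, size)


subsets = list(predictor_subsets(["A", "B", "C", "D", "E"]))
print(len(subsets))  # 31, i.e. 2**5 - 1 candidate models
print(subsets[:6])   # the five single-column subsets, then ('A', 'B')
```

Each tuple would then be used to select the matching columns (plus "price") before fitting one model per subset.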
This is what I came up with for a first draft of splitting a dataset into all the relevant combinations
The issue is that `cf/categorical`, which I included as part of the test to encode the "stock" field in the example data as a number, returns `nil` if there are no categorical columns in the dataset. It is documented to "Return a dataset containing only the categorical columns.", so I expected it to return an empty dataset if there were no matching columns.
(I didn't really read the docstring first; I just assumed.)
So the working version of this function ended up needing this workaround
Interesting. Makes sense. Another option would be to first do a group-by and then do combo on the keys. Agreed that it could return an empty dataset. Filter may be just as fast in most cases compared with the grouping and concat steps.
Hmm, an empty dataset and nil should work the same. I wonder if it wouldn't be better to update lots of dataset functions so that nil and an empty dataset return the same value. row-count, column-count, columns, etc. should all be safe to call on nil; I have hit that before in a few cases.
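As a rough illustration of that idea (a Python sketch, not the library's actual API), the accessors below treat a missing dataset and an empty one identically. Everything here is hypothetical: datasets are modeled as plain dicts of column name to list of values, and `categorical`, `column_names`, and `row_count` are stand-ins written only for this example.

```python
def categorical(dataset):
    """Stand-in column filter: keeps string-valued columns and returns None
    when nothing matches, mirroring the behavior reported in this issue."""
    matches = {name: col for name, col in dataset.items()
               if col and all(isinstance(v, str) for v in col)}
    return matches or None


def column_names(dataset):
    """None-tolerant accessor: None is treated like an empty dataset."""
    return list(dataset or {})


def row_count(dataset):
    """None-tolerant accessor: None and an empty dataset both have zero rows."""
    if not dataset:
        return 0
    return len(next(iter(dataset.values())))


numeric_only = {"price": [10.0, 12.5], "A": [1, 2]}
with_stock = {"price": [10.0, 12.5], "stock": ["AAPL", "MSFT"]}

print(column_names(categorical(numeric_only)))  # []
print(row_count(categorical(numeric_only)))     # 0
print(column_names(categorical(with_stock)))    # ['stock']
```

With accessors like these, the caller does not need a special case for "no categorical columns".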