Examples for higher dimension data #52
Another possible xarray structure, that may be easier:
|
Hi @mmann1123, thanks for bringing this up. The documentation is definitely missing examples for geospatial data as I myself use xarray for very different kinds of data, so this is something I'm guessing most of the user base would appreciate. In your example, after loading the DataArray like this:
the recommended way is to create a non-dimensional coordinate with the target values. Here's a very simple mock target where pixels with green channel values above 128 are assumed to be forest, the rest water:
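A minimal sketch of such a mock target, using plain xarray and numpy. The array `da`, its shape, and the random pixel values are hypothetical stand-ins for a loaded image; only the `land_use` name and the 128 threshold come from the discussion.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for a loaded image: 4 bands on a 5 x 6 pixel grid.
rng = np.random.default_rng(0)
da = xr.DataArray(
    rng.integers(0, 256, size=(4, 5, 6)),
    dims=("band", "y", "x"),
    coords={
        "band": ["blue", "green", "red", "nir"],
        "y": np.arange(5),
        "x": np.arange(6),
    },
)

# Mock target: green values above 128 count as forest, the rest as water.
green = da.sel(band="green").values
labels = np.where(green > 128, "forest", "water")

# Attach the labels as a non-dimensional coordinate spanning y and x.
da = da.assign_coords(land_use=(("y", "x"), labels))
```

Because `land_use` is a coordinate rather than a dimension, it travels with the array through later reshaping steps.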
Now, for compatibility with sklearn, you need to stack all sample dimensions to get a 2D array:
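A sketch of the stacking step with plain xarray, assuming the pixel dimensions are named `y` and `x` and the feature dimension is named `band` (the array itself is a hypothetical stand-in):

```python
import numpy as np
import xarray as xr

# Hypothetical image: 4 bands on a 5 x 6 pixel grid.
rng = np.random.default_rng(0)
da = xr.DataArray(
    rng.integers(0, 256, size=(4, 5, 6)),
    dims=("band", "y", "x"),
    coords={
        "band": ["blue", "green", "red", "nir"],
        "y": np.arange(5),
        "x": np.arange(6),
    },
)

# Collapse y and x into one "sample" dimension and put samples first,
# giving the (n_samples, n_features) layout that sklearn expects.
X = da.stack(sample=("y", "x")).transpose("sample", "band")
print(X.shape)  # (30, 4)
```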
The
and finally you can fit an estimator:
You can then use the fitted estimator for prediction. Unfortunately, the output of the prediction are the labels from the
Finally, all you have to do is unstack the sample dimension again:
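The unstacking step might look like the sketch below. It uses plain xarray only; `yhat_values` is a placeholder for whatever a fitted estimator's `predict` returns (one label per sample), not real model output.

```python
import numpy as np
import xarray as xr

# Hypothetical image: 4 bands on a 5 x 6 pixel grid.
rng = np.random.default_rng(0)
da = xr.DataArray(
    rng.integers(0, 256, size=(4, 5, 6)),
    dims=("band", "y", "x"),
    coords={
        "band": ["blue", "green", "red", "nir"],
        "y": np.arange(5),
        "x": np.arange(6),
    },
)
X = da.stack(sample=("y", "x")).transpose("sample", "band")

# Placeholder for estimator.predict(X): one integer label per sample.
yhat_values = np.where(X.sel(band="green").values > 128, 1, 0)

# Reuse the sample index of X so the predictions keep their (y, x) positions.
yhat = X.isel(band=0, drop=True).copy(data=yhat_values)

# Unstack the sample dimension to get a 2D map of predicted labels.
land_use_map = yhat.unstack("sample")
```

Copying the data into a slice of `X` is just one way to carry the stacked index over to the predictions.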
|
I would like to include this as an example in the docs pretty much verbatim. However, the |
Adding ability to handle multidimensional sample by prestacking along those dims. PASSES TESTS etc NOTE: I think this is working well for DataArrays but not sure about DataSets since I don't know what they should look like. referencing issue phausamann#52
Hey @phausamann, thanks for the response to this! In the meantime I was futzing with Featurizer and will propose a change (mmann1123:patch-1) to handle multidimensional samples. I'm quite sure it is working for DataArrays, but less sure about Datasets, since I don't know what the output should look like. I have been using geowombat pretty much exclusively for remotely sensed data. I actually tried to use xarray to create a dummy dataset in the same format and failed. I will try to contact Jordan about an example for you to work with. |
@phausamann I just pushed a Stackerizer class to (mmann1123:patch-1) please read the comments. Also, in a related issue, I am finding that I can't use the Stackerizer inside of the pipeline, BUT it works fine if I run it before the pipeline. Just wondering what I'm missing
returns
Let me know if you want me to push Stackerizer to a different branch. |
I think I know what the issue is: you are creating a With estimators wrapped by sklearn-xarray, it is not necessary to assign the target beforehand, the wrapped estimator will automatically take care of this. So instead of:
do this:
The assignment ( Probably another thing to clarify in the documentation. |
I'll look into it more, but if I drop (X) I get:
|
When you define the target like this:
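it works. This tells the Target to reduce multi-dimensional coordinates to the sample dimension, yielding the 1D input that the LabelBinarizer expects. Here's the pipeline I've used: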
|
Ok great. That is super helpful as always. Much appreciated.
…On Fri, Aug 14, 2020 at 4:09 AM Peter Hausamann ***@***.***> wrote:
When you define the target like this:
y = Target(
coord="land_use",
transform_func=LabelBinarizer().fit_transform,
dim="sample",
)
it works.
This tells the Target to reduce multi-dimensional coordinates to the
sample dimension, yielding the 1D input that the LabelBinarizer expects.
It is weird though that you get an error without that, I'll have to
investigate what's going on there.
|
I think that sklearn-x could be a significant contribution to folks working on satellite remote sensing. I am REALLY excited by it!
I am, however, having a hard time figuring out how to apply it to higher-dimensional xarray structures. I am hoping that more examples, ideally using geospatial data, could be provided.
Possible format for example
A typical land cover classification where 'target' must be predicted as f(blue, green, red, nir) across time on a pixel by pixel basis.
rgbn_20160101 = image from satellite stored as array for jan 1 2016
['blue', 'green', 'red', 'nir'] = each image has four bands: the three visible bands and near infrared
['t1', 't2','t3'] = three images are provided across time for the same coordinates
['y', 'x'] = latitude and longitude of each pixel
I would normally flatten these arrays using a multi-index as follows and store them as a pd.DataFrame:
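A minimal sketch of one such flattening, not the code originally posted. It assumes the three images are held in a single DataArray with dimensions `time`, `band`, `y` and `x`; the random values and grid size are placeholders.

```python
import numpy as np
import xarray as xr

# Hypothetical stack of three 4-band images on a 5 x 6 pixel grid.
rng = np.random.default_rng(0)
da = xr.DataArray(
    rng.integers(0, 256, size=(3, 4, 5, 6)),
    dims=("time", "band", "y", "x"),
    coords={
        "time": ["t1", "t2", "t3"],
        "band": ["blue", "green", "red", "nir"],
        "y": np.arange(5),
        "x": np.arange(6),
    },
)

# One variable per band, then flatten: the result has one row per
# (time, y, x) combination in a MultiIndex and one column per band.
df = da.to_dataset(dim="band").to_dataframe()
print(df.shape)  # (90, 4)
```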
Then fit a model Target = f(red, green, blue, nir) on a subset of pixels where I have "target" land use classes, and predict back to all red, green, blue and nir bands in the full data set.
It looks as if sk-xr can handle this, but it isn't clear to me how to structure the xarray to create a panel of longitudinal data (pixel values over time).
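One possible layout for such a panel, sketched with plain xarray: each pixel is a sample and every (time, band) pair is a feature. The dimension names and the random data are assumptions for illustration, and this is not necessarily the structure sklearn-xarray expects.

```python
import numpy as np
import xarray as xr

# Hypothetical stack of three 4-band images on a 5 x 6 pixel grid.
rng = np.random.default_rng(0)
da = xr.DataArray(
    rng.integers(0, 256, size=(3, 4, 5, 6)),
    dims=("time", "band", "y", "x"),
    coords={
        "time": ["t1", "t2", "t3"],
        "band": ["blue", "green", "red", "nir"],
        "y": np.arange(5),
        "x": np.arange(6),
    },
)

# Pixels become samples; every (time, band) combination becomes a feature.
X = da.stack(sample=("y", "x"), feature=("time", "band"))
X = X.transpose("sample", "feature")
print(X.shape)  # (30, 12)
```

Stacking only `y` and `x` instead would keep `time` as a separate dimension, which may suit estimators that treat each date independently.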