-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update "Supporting Additional Objects" vignette #575
Conversation
Few changes: 1. Since sf objects have class = "sfc_GEOMETRYTYPE" "sfc", we can define sfl for "sfc" that will be called if there does not exist and sfl for "sfc_GEOMETRYTYPE". 2. I made the example a bit more relevant for sf objects, namely summarizing the CRS (projection) being used. I do think that sf objects should be used as an included sfl, but that could be because I spend a lot of time working with spatial objects ;). 3. There is mention of skim_by_type.sfc_MULTIPOLYGON, which I don't think needs to be in here unless I misunderstand skimr's code. The "skim_with" factory should already use the base sfl's for numeric, factors, characters, etc. and as you can see the skim_sf() works automatically without the skim_by_type function.
Codecov Report
@@ Coverage Diff @@
## develop #575 +/- ##
========================================
Coverage 96.09% 96.09%
========================================
Files 13 13
Lines 563 563
========================================
Hits 541 541
Misses 22 22 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for doing this Kyle! This is a big improvement.
``` | ||
|
||
Unlike the example of having a new type of data in a column of a simple data | ||
frame in the "Using skimr" vignette, this is a different type of object | ||
with special attributes. | ||
|
||
In this object there is also a column of a class that does not have default | ||
skimmers. By default, skimr falls back to use the sfl for character variables. | ||
skimmers. By default, skimr falls back to use the sfl for character variables. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this whitespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is the only outstanding issue. After that we can merge.
the geometry column has two classes: 1st the specific geometry type (e.g. | ||
`sfc_MULTIPOLYGON` `sfc_LINESTRING`, `sfc_POLYGON`, `sfc_MULTIPOINT`) and 2nd the | ||
general sfc class. Skimr will try to find a sfl() helper function for the classes | ||
in the order they appear in class(.) (see S3 classes for more detail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quote this in backticks: class(.)
This is great! Thanks. |
``` | ||
|
||
Unlike the example of having a new type of data in a column of a simple data | ||
frame in the "Using skimr" vignette, this is a different type of object | ||
with special attributes. | ||
|
||
In this object there is also a column of a class that does not have default | ||
skimmers. By default, skimr falls back to use the sfl for character variables. | ||
skimmers. By default, skimr falls back to use the sfl for character variables. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is the only outstanding issue. After that we can merge.
[*Advanced R*](https://adv-r.hadley.nz/s3.html)). The following example will build | ||
support for `sfc`, which encompasses all `sf` objects: `sfc_MULTIPOLYGON` | ||
`sfc_LINESTRING`, `sfc_POLYGON`, `sfc_MULTIPOINT`. If we want custom skim_with | ||
functions we can write sfl() helper functions for the geometry type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Errp. Sorry, one more function to escape: sfl()
.
``` | ||
|
||
** This seems unnecessary, but maybe I'm misunderstanding something. I think the skim_with function already creates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right on it being unnecessary. Mind removing it?
@elinw Is this ok with you if we drop this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is showing specifically what the new sfl does without the distraction of the rest of the output. Yes it needs to stay for the purposes of readability and pedagogy. Further, as noted, at least with the new (currently development) version of dplyr and other you can't see the handling of the sf column at all and this is the only way to see this.
** This seems unnecessary, but maybe I'm misunderstanding something. I think the skim_with function already creates | |
** This seems unnecessary, but maybe I'm misunderstanding something. I think the skim_with function already creates |
) | ||
) | ||
|
||
### This also seems unnecessary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just fYI I responded below about why it's ot correct to say it is unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember that this is showing how to do this in your own package, where you need to export it and you don't want your users having to recreate it all the time.
There are really two separate issues, one is skimming a data frame or tibble that has some columns that are of non standard types. The other is skimming a different kind of object completely without coercing it to be a data frame and losing crucial information. When I wrote the vignette I went through the whole process multiple times and found that I needed this step. Just to expand, and maybe the text is not clear on this, but it is complex and confusing if you haven't thought through it. To skim a column of a particular type in a data frame you use To skim an object of a class that is not data frame without coercing it to be a data frame and actually a tibble you need to create a It does not help that in simple features they have the same name. |
I think I'm confused by the skim_by_type.class() function. Is the key purpose to: (1) grab information from the object and then (2) coerce to data frame/tibble and skim? Do you have any examples you worked through when writing the vignette? I imagine the problem would be if sf objects, for example, stored the projection not in the column, but in the entire object. Also, the vignette says "Creating a function that is a method of the skim_by_type generic for the data type allows skimming of an entire data frame that contains some columns of that type" which I do not think is matching with your description, so I'll try to rewrite that when I figure out the skim_by_type. |
The whole vignette is one big example. But here's something interesting with the new dplyr v1 |
One other question. Is there any situation in which the different geometry types would have different statistics? I know you can only have one |
Hi, I'm closing this in favor of #603 A few notes. I think that one difference between what we needed in this vignette and what we need now is how that I still don't know if there is the possibility of different functions for different geometries but I'll ignore that right now and just use the general Thanks @kylebutts , we would not have gotten this done without you! |
@kylebutts If there were a basic set of functions for an sfc sfl what would they be? Also could you answer the question about if there are potentially different functions for different sfc types? |
I think the two main functions are I was thinking you may want sf::st_area or sf::st_length for polygons and lines respectively. These would depend on the sfc type, but could be a bit time consuming and not something you probably are curious about when you |
Few changes:
Since sf objects have class = "sfc_GEOMETRYTYPE" "sfc", we can define sfl for "sfc" that will be called if there does not exist and sfl for "sfc_GEOMETRYTYPE".
I made the example a bit more relevant for sf objects, namely summarizing the CRS (projection) being used. I do think that sf objects should be used as an included sfl, but that could be because I spend a lot of time working with spatial objects ;).
There is mention of skim_by_type.sfc_MULTIPOLYGON, which I don't think needs to be in here unless I misunderstand skimr's code. The "skim_with" factory should already use the base sfl's for numeric, factors, characters, etc. and as you can see the skim_sf() works automatically without the skim_by_type function.