Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update "Supporting Additional Objects" vignette #575

Closed
wants to merge 3 commits into from
Closed

Update "Supporting Additional Objects" vignette #575

wants to merge 3 commits into from

Conversation

kylebutts
Copy link
Contributor

Few changes:

  1. Since sf objects have class = "sfc_GEOMETRYTYPE" "sfc", we can define sfl for "sfc" that will be called if there does not exist and sfl for "sfc_GEOMETRYTYPE".

  2. I made the example a bit more relevant for sf objects, namely summarizing the CRS (projection) being used. I do think that sf objects should be used as an included sfl, but that could be because I spend a lot of time working with spatial objects ;).

  3. There is mention of skim_by_type.sfc_MULTIPOLYGON, which I don't think needs to be in here unless I misunderstand skimr's code. The "skim_with" factory should already use the base sfl's for numeric, factors, characters, etc. and as you can see the skim_sf() works automatically without the skim_by_type function.

Few changes:
1. Since sf objects have class = "sfc_GEOMETRYTYPE" "sfc", we can define sfl for "sfc" that will be called if there does not exist and sfl for "sfc_GEOMETRYTYPE".

2. I made the example a bit more relevant for sf objects, namely summarizing the CRS (projection) being used. I do think that sf objects should be used as an included sfl, but that could be because I spend a lot of time working with spatial objects ;). 

3. There is mention of skim_by_type.sfc_MULTIPOLYGON, which I don't think needs to be in here unless I misunderstand skimr's code. The "skim_with" factory should already use the base sfl's for numeric, factors, characters, etc. and as you can see the skim_sf() works automatically without the skim_by_type function.
@kylebutts kylebutts mentioned this pull request Mar 2, 2020
@codecov-io
Copy link

codecov-io commented Mar 2, 2020

Codecov Report

Merging #575 into develop will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop     #575   +/-   ##
========================================
  Coverage    96.09%   96.09%           
========================================
  Files           13       13           
  Lines          563      563           
========================================
  Hits           541      541           
  Misses          22       22

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a3aed1f...1d24787. Read the comment docs.

Copy link
Collaborator

@michaelquinn32 michaelquinn32 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for doing this Kyle! This is a big improvement.

```

Unlike the example of having a new type of data in a column of a simple data
frame in the "Using skimr" vignette, this is a different type of object
with special attributes.

In this object there is also a column of a class that does not have default
skimmers. By default, skimr falls back to use the sfl for character variables.
skimmers. By default, skimr falls back to use the sfl for character variables.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove this whitespace?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is the only outstanding issue. After that we can merge.

the geometry column has two classes: 1st the specific geometry type (e.g.
`sfc_MULTIPOLYGON` `sfc_LINESTRING`, `sfc_POLYGON`, `sfc_MULTIPOINT`) and 2nd the
general sfc class. Skimr will try to find a sfl() helper function for the classes
in the order they appear in class(.) (see S3 classes for more detail
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quote this in backticks: class(.)

vignettes/Supporting_additional_objects.Rmd Outdated Show resolved Hide resolved
vignettes/Supporting_additional_objects.Rmd Outdated Show resolved Hide resolved
@elinw
Copy link
Collaborator

elinw commented Mar 9, 2020

This is great! Thanks.

```

Unlike the example of having a new type of data in a column of a simple data
frame in the "Using skimr" vignette, this is a different type of object
with special attributes.

In this object there is also a column of a class that does not have default
skimmers. By default, skimr falls back to use the sfl for character variables.
skimmers. By default, skimr falls back to use the sfl for character variables.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is the only outstanding issue. After that we can merge.

[*Advanced R*](https://adv-r.hadley.nz/s3.html)). The following example will build
support for `sfc`, which encompasses all `sf` objects: `sfc_MULTIPOLYGON`
`sfc_LINESTRING`, `sfc_POLYGON`, `sfc_MULTIPOINT`. If we want custom skim_with
functions we can write sfl() helper functions for the geometry type.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errp. Sorry, one more function to escape: sfl().

```

** This seems unnecessary, but maybe I'm misunderstanding something. I think the skim_with function already creates
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right on it being unnecessary. Mind removing it?

@elinw Is this ok with you if we drop this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is showing specifically what the new sfl does without the distraction of the rest of the output. Yes it needs to stay for the purposes of readability and pedagogy. Further, as noted, at least with the new (currently development) version of dplyr and other you can't see the handling of the sf column at all and this is the only way to see this.

Suggested change
** This seems unnecessary, but maybe I'm misunderstanding something. I think the skim_with function already creates
** This seems unnecessary, but maybe I'm misunderstanding something. I think the skim_with function already creates

)
)

### This also seems unnecessary
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just fYI I responded below about why it's ot correct to say it is unnecessary.

Copy link
Collaborator

@elinw elinw Apr 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember that this is showing how to do this in your own package, where you need to export it and you don't want your users having to recreate it all the time.

@elinw
Copy link
Collaborator

elinw commented Mar 13, 2020

There are really two separate issues, one is skimming a data frame or tibble that has some columns that are of non standard types. The other is skimming a different kind of object completely without coercing it to be a data frame and losing crucial information. When I wrote the vignette I went through the whole process multiple times and found that I needed this step.

Just to expand, and maybe the text is not clear on this, but it is complex and confusing if you haven't thought through it.

To skim a column of a particular type in a data frame you use skim_with().

To skim an object of a class that is not data frame without coercing it to be a data frame and actually a tibble you need to create a skim_by_type.class() for that kind of object. You would do this because a simple features object that is coerced to a data frame/tibble is going to lose crucial information.

It does not help that in simple features they have the same name.

@kylebutts
Copy link
Contributor Author

I think I'm confused by the skim_by_type.class() function. Is the key purpose to: (1) grab information from the object and then (2) coerce to data frame/tibble and skim? Do you have any examples you worked through when writing the vignette? I imagine the problem would be if sf objects, for example, stored the projection not in the column, but in the entire object.

Also, the vignette says "Creating a function that is a method of the skim_by_type generic for the data type allows skimming of an entire data frame that contains some columns of that type" which I do not think is matching with your description, so I'll try to rewrite that when I figure out the skim_by_type.

@elinw
Copy link
Collaborator

elinw commented Apr 4, 2020

The whole vignette is one big example. But here's something interesting with the new dplyr v1 groups(nc) returns geometry and ungrouping does not seem to help.

@elinw
Copy link
Collaborator

elinw commented Apr 4, 2020

One other question. Is there any situation in which the different geometry types would have different statistics? I know you can only have one sf column per sf object but say I am a package developer and I want to support any kind of geometry that a user throws at me, would I possible have different sfls for them?

@elinw
Copy link
Collaborator

elinw commented Jul 5, 2020

Hi, I'm closing this in favor of #603

A few notes. I think that one difference between what we needed in this vignette and what we need now is how that sf now is handled as a data frame when it first starts being skimmed. Therefore we no longer need to have something for handling a non data.frame object and are not coercing. So what that means is that this vignette is really moving toward being Using skimr in a Package. It's not quite there yet, but I think what I have is better.

I still don't know if there is the possibility of different functions for different geometries but I'll ignore that right now and just use the general sfc.

Thanks @kylebutts , we would not have gotten this done without you!

@elinw elinw closed this Jul 5, 2020
@elinw
Copy link
Collaborator

elinw commented Jul 5, 2020

@kylebutts If there were a basic set of functions for an sfc sfl what would they be? Also could you answer the question about if there are potentially different functions for different sfc types?

@kylebutts kylebutts deleted the develop branch July 6, 2020 16:51
@kylebutts
Copy link
Contributor Author

I think the two main functions are sf::st_crs() which will get the current projection of the sfc column and sf::st_is_valid() which tells you how many rows of the sfc column are valid geometries.

I was thinking you may want sf::st_area or sf::st_length for polygons and lines respectively. These would depend on the sfc type, but could be a bit time consuming and not something you probably are curious about when you skim().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants