Skimming when all values are NA #666

elinw · 2021-07-01T10:38:25Z

Recently I came across a situation where all of the values of some variables were NA. In this case skimr returns the following:

> df <- data.frame("x" = 1:10, "y" = NA   )
> skimr::skim(df)

── Data Summary ────────────────────────
                           Values
Name                       df    
Number of rows             10    
Number of columns          2     
_______________________          
Column type frequency:           
  logical                  1     
  numeric                  1     
________________________         
Group variables            None  

── Variable type: logical ───────────────────────────────────
  skim_variable n_missing complete_rate  mean count
1 y                    10             0   NaN ": " 

── Variable type: numeric ───────────────────────────────────
  skim_variable n_missing complete_rate  mean    sd    p0
1 x                     0             1   5.5  3.03     1
    p25   p50   p75  p100 hist 
1  3.25   5.5  7.75    10 ▇▇▇▇▇
> df <- data.frame("x" = 1:10, "y" = NA_integer_   )
> skimr::skim(df)
── Data Summary ────────────────────────
                           Values
Name                       df    
Number of rows             10    
Number of columns          2     
_______________________          
Column type frequency:           
  numeric                  2     
________________________         
Group variables            None  

── Variable type: numeric ───────────────────────────────────
  skim_variable n_missing complete_rate  mean    sd    p0
1 x                     0             1   5.5  3.03     1
2 y                    10             0 NaN   NA       NA
    p25   p50   p75  p100 hist   
1  3.25   5.5  7.75    10 "▇▇▇▇▇"
2 NA     NA   NA       NA " "    
>

I think the base columns (n_missing, complete_rate) are okay, but we probably should not compute the other statistics.
@michaelquinn32 thoughts?
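
For reference, the odd values in the output above can be reproduced with base R alone, which suggests they come from how the underlying summary functions behave once every value has been dropped as missing. This is only a sketch: the `paste0()` line is an assumption about how a count string might be assembled and is not taken from skimr's source.

```r
y_lgl <- rep(NA, 10)            # logical column with no observed values
y_int <- rep(NA_integer_, 10)   # integer column with no observed values

# Mean of zero remaining values is NaN, as in the mean column above.
m <- mean(y_lgl, na.rm = TRUE)

# sd and quantiles of zero remaining values are NA.
s <- sd(y_int, na.rm = TRUE)
q <- quantile(y_int, na.rm = TRUE)

# table() drops NA by default, so the table is empty; pasting empty names
# and empty counts around a separator leaves only the separator.
counts <- table(y_lgl)
count_string <- paste0(names(counts), ": ", counts, collapse = ", ")

print(m)
print(s)
print(q)
print(count_string)
```

Under that assumption, the count of ": " for the logical column is simply what is left when there is nothing to count.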


elinw · 2021-07-01T13:57:20Z

I guess we could push the count to 0 so that it works like the numeric NA (NA_integer_) case.

michaelquinn32 · 2021-07-07T15:35:28Z

I think the issue is primarily how we handle NAs in some of the summary stats that we include: count and hist. We could probably add some simple checks for whether all the data is NA and, if so, have those stats return NA_character_ too. How does that sound?
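
A minimal sketch of what such a check could look like, written in base R so that it runs without skimr. Both `na_if_all_missing()` and `simple_count()` are hypothetical names introduced here for illustration; neither is an existing skimr function, and `simple_count()` is only a stand-in for a count-style summary.

```r
# Hypothetical helper, not part of skimr: wrap a summary function so that it
# returns a typed NA when every value of x is missing.
na_if_all_missing <- function(f, na_value = NA_character_) {
  function(x, ...) {
    # all() on an empty vector is TRUE, so zero-length input also yields NA.
    if (all(is.na(x))) {
      return(na_value)
    }
    f(x, ...)
  }
}

# Stand-in for a count-style summary (assumed form, for illustration only).
simple_count <- function(x) {
  counts <- sort(table(x), decreasing = TRUE)
  paste0(names(counts), ": ", counts, collapse = ", ")
}

safe_count <- na_if_all_missing(simple_count)

all_missing <- safe_count(rep(NA, 10))
some_present <- safe_count(c(TRUE, TRUE, FALSE, NA))

print(all_missing)
print(some_present)
```

The wrapper leaves the summary function itself untouched, so the same guard could be reused for hist or any other character-valued stat. If this approach were adopted, a guarded function could presumably be supplied per type through `skim_with()` and `sfl()`, though that wiring is not shown or tested here.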
