Effective sample size per row, not total sample size in report_sample #306
Comments
One solution (that does not yet have your nice layout) is to summarize as follows: df.sum <- df %>% |
Thanks for the suggestion @Lakens. On it! |
Reprex of example above:
library(dplyr, warn.conflicts = FALSE)
df.sum <- airquality %>%
  select(Ozone, Solar.R) %>%
  summarise_all(funs(min = min(., na.rm = TRUE),
                     median = median(., na.rm = TRUE),
                     max = max(., na.rm = TRUE),
                     mean = mean(., na.rm = TRUE),
                     sd = sd(., na.rm = TRUE),
                     n = sum(!is.na(.))))
#> Warning: `funs()` was deprecated in dplyr 0.8.0.
#> ℹ Please use a list of either functions or lambdas:
#>
#> # Simple named list: list(mean = mean, median = median)
#>
#> # Auto named with `tibble::lst()`: tibble::lst(mean, median)
#>
#> # Using lambdas list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
df.sum
#>   Ozone_min Solar.R_min Ozone_median Solar.R_median Ozone_max Solar.R_max
#> 1         1           7         31.5            205       168         334
#>   Ozone_mean Solar.R_mean Ozone_sd Solar.R_sd Ozone_n Solar.R_n
#> 1   42.12931     185.9315 32.98788   90.05842     116       146
Created on 2022-12-12 with reprex v2.0.2 |
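The `funs()` deprecation warning in the reprex above can be avoided with `across()` and a named list of lambdas. The following is a minimal sketch, assuming dplyr 1.0.0 or later; it computes the same statistics, although the output columns come out grouped by variable rather than by statistic.

```r
library(dplyr, warn.conflicts = FALSE)

# Same summary as the reprex, written without the deprecated funs().
# across() names the results "<column>_<function>", e.g. Ozone_n.
df.sum <- airquality %>%
  summarise(across(
    c(Ozone, Solar.R),
    list(
      min    = ~ min(.x, na.rm = TRUE),
      median = ~ median(.x, na.rm = TRUE),
      max    = ~ max(.x, na.rm = TRUE),
      mean   = ~ mean(.x, na.rm = TRUE),
      sd     = ~ sd(.x, na.rm = TRUE),
      n      = ~ sum(!is.na(.x))  # effective n: non-missing observations
    )
  ))
df.sum
```

The `n` entries are the per-variable effective sample sizes (116 for Ozone and 146 for Solar.R in `airquality`), which is the information the issue asks to see per row.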
Were you thinking of something like this @Lakens?
devtools::load_all("D:/github/forks/report")
#> ℹ Loading report
report_sample(airquality, effective_n = TRUE)
#> # Descriptive Statistics
#>
#> Variable | Summary | Effective n
#> ------------------------------------------------
#> Mean Ozone (SD) | 42.13 (32.99) | 116
#> Mean Solar.R (SD) | 185.93 (90.06) | 146
#> Mean Wind (SD) | 9.96 (3.52) | 153
#> Mean Temp (SD) | 77.88 (9.47) | 153
#> Mean Month (SD) | 6.99 (1.42) | 153
#> Mean Day (SD) | 15.80 (8.86) | 153
Created on 2022-12-12 with reprex v2.0.2
It's a bit more challenging when using groups, since a given row won't have the same n in every group. This is the current behaviour:
library(report)
report_sample(airquality, group_by = "Month")
#> # Descriptive Statistics
#>
#> Variable | 5 (n=31) | 6 (n=30) | 7 (n=31) | 8 (n=31) | 9 (n=30) | Total (n=153)
#> ------------------------------------------------------------------------------------------------------------------------
#> Mean Ozone (SD) | 23.62 (22.22) | 29.44 (18.21) | 59.12 (31.64) | 59.96 (39.68) | 31.45 (24.14) | 42.13 (32.99)
#> Mean Solar.R (SD) | 181.30 (115.08) | 190.17 (92.88) | 216.48 (80.57) | 171.86 (76.83) | 167.43 (79.12) | 185.93 (90.06)
#> Mean Wind (SD) | 11.62 (3.53) | 10.27 (3.77) | 8.94 (3.04) | 8.79 (3.23) | 10.18 (3.46) | 9.96 (3.52)
#> Mean Temp (SD) | 65.55 (6.85) | 79.10 (6.60) | 83.90 (4.32) | 83.97 (6.59) | 76.90 (8.36) | 77.88 (9.47)
#> Mean Day (SD) | 16.00 (9.09) | 15.50 (8.80) | 16.00 (9.09) | 16.00 (9.09) | 15.50 (8.80) | 15.80 (8.86)
Created on 2022-12-12 with reprex v2.0.2
One possibility would be to double the number of columns by adding an effective n column for each group. Another possibility would be to include that info as a third value in each cell. What do you think would be best? |
Hi, the first table is perfect (I would just call it n, not 'Effective n'). For the second table, adding it as a third value to each cell seems the best approach. If you make it optional (even opt-in), it would not interfere too much with the table if there are no missing values. In my data, there is attrition, so showing that later questions have lower n is important. Thanks for picking this up so quickly! Love the functions! |
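To make the "third value in each cell" idea concrete, here is a rough base R sketch of how the per-group effective n could be counted and appended to each cell. It is only an illustration, not how report implements it: `cell_summary` is a hypothetical helper, and the "mean (SD), n" layout is just one possible format.

```r
# Hypothetical helper: format one cell as "mean (SD), n",
# where n counts only the non-missing observations.
cell_summary <- function(x) {
  n <- sum(!is.na(x))
  sprintf("%.2f (%.2f), %d", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE), n)
}

vars <- c("Ozone", "Solar.R")

# Effective n per variable within each month.
n_by_month <- aggregate(
  airquality[vars],
  by = list(Month = airquality$Month),
  FUN = function(x) sum(!is.na(x))
)
n_by_month

# One formatted cell per variable and month.
cells_by_month <- sapply(
  airquality[vars],
  function(x) tapply(x, airquality$Month, cell_summary)
)
cells_by_month
```

With attrition or other missing data, the counts in `n_by_month` fall below the group sizes shown in the column headers, which is exactly the difference the extra value is meant to expose.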
Ok what about this?
devtools::load_all("D:/github/forks/report")
#> ℹ Loading report
report_sample(airquality)
#> # Descriptive Statistics
#>
#> Variable | Summary
#> ----------------------------------
#> Mean Ozone (SD) | 42.13 (32.99)
#> Mean Solar.R (SD) | 185.93 (90.06)
#> Mean Wind (SD) | 9.96 (3.52)
#> Mean Temp (SD) | 77.88 (9.47)
#> Mean Month (SD) | 6.99 (1.42)
#> Mean Day (SD) | 15.80 (8.86)
report_sample(airquality, n = TRUE)
#> # Descriptive Statistics
#>
#> Variable | Summary
#> ------------------------------------------
#> Mean Ozone (SD, n) | 42.13 (32.99, 116)
#> Mean Solar.R (SD, n) | 185.93 (90.06, 146)
#> Mean Wind (SD, n) | 9.96 (3.52, 153)
#> Mean Temp (SD, n) | 77.88 (9.47, 153)
#> Mean Month (SD, n) | 6.99 (1.42, 153)
#> Mean Day (SD, n) | 15.80 (8.86, 153)
report_sample(airquality, group_by = "Month")
#> # Descriptive Statistics
#>
#> Variable | 5 (n=31) | 6 (n=30) | 7 (n=31) | 8 (n=31) | 9 (n=30) | Total (n=153)
#> ------------------------------------------------------------------------------------------------------------------------
#> Mean Ozone (SD) | 23.62 (22.22) | 29.44 (18.21) | 59.12 (31.64) | 59.96 (39.68) | 31.45 (24.14) | 42.13 (32.99)
#> Mean Solar.R (SD) | 181.30 (115.08) | 190.17 (92.88) | 216.48 (80.57) | 171.86 (76.83) | 167.43 (79.12) | 185.93 (90.06)
#> Mean Wind (SD) | 11.62 (3.53) | 10.27 (3.77) | 8.94 (3.04) | 8.79 (3.23) | 10.18 (3.46) | 9.96 (3.52)
#> Mean Temp (SD) | 65.55 (6.85) | 79.10 (6.60) | 83.90 (4.32) | 83.97 (6.59) | 76.90 (8.36) | 77.88 (9.47)
#> Mean Day (SD) | 16.00 (9.09) | 15.50 (8.80) | 16.00 (9.09) | 16.00 (9.09) | 15.50 (8.80) | 15.80 (8.86)
report_sample(airquality, group_by = "Month", n = TRUE)
#> # Descriptive Statistics
#>
#> Variable | 5 (n=31) | 6 (n=30) | 7 (n=31) | 8 (n=31) | 9 (n=30) | Total (n=153)
#> ----------------------------------------------------------------------------------------------------------------------------------------------------
#> Mean Ozone (SD, n) | 23.62 (22.22, 26) | 29.44 (18.21, 9) | 59.12 (31.64, 26) | 59.96 (39.68, 26) | 31.45 (24.14, 29) | 42.13 (32.99, 116)
#> Mean Solar.R (SD, n) | 181.30 (115.08, 27) | 190.17 (92.88, 30) | 216.48 (80.57, 31) | 171.86 (76.83, 28) | 167.43 (79.12, 30) | 185.93 (90.06, 146)
#> Mean Wind (SD, n) | 11.62 (3.53, 31) | 10.27 (3.77, 30) | 8.94 (3.04, 31) | 8.79 (3.23, 31) | 10.18 (3.46, 30) | 9.96 (3.52, 153)
#> Mean Temp (SD, n) | 65.55 (6.85, 31) | 79.10 (6.60, 30) | 83.90 (4.32, 31) | 83.97 (6.59, 31) | 76.90 (8.36, 30) | 77.88 (9.47, 153)
#> Mean Day (SD, n) | 16.00 (9.09, 31) | 15.50 (8.80, 30) | 16.00 (9.09, 31) | 16.00 (9.09, 31) | 15.50 (8.80, 30) | 15.80 (8.86, 153)
I also realize that there is a legacy
report_sample(airquality, group_by = "Month", total = FALSE)
#> # Descriptive Statistics
#>
#> Variable | 5 (n=31) | 6 (n=30) | 7 (n=31) | 8 (n=31) | 9 (n=30) (n=153)
#> ---------------------------------------------------------------------------------------------------------
#> Mean Ozone (SD) | 23.62 (22.22) | 29.44 (18.21) | 59.12 (31.64) | 59.96 (39.68) | 31.45 (24.14)
#> Mean Solar.R (SD) | 181.30 (115.08) | 190.17 (92.88) | 216.48 (80.57) | 171.86 (76.83) | 167.43 (79.12)
#> Mean Wind (SD) | 11.62 (3.53) | 10.27 (3.77) | 8.94 (3.04) | 8.79 (3.23) | 10.18 (3.46)
#> Mean Temp (SD) | 65.55 (6.85) | 79.10 (6.60) | 83.90 (4.32) | 83.97 (6.59) | 76.90 (8.36)
#> Mean Day (SD) | 16.00 (9.09) | 15.50 (8.80) | 16.00 (9.09) | 16.00 (9.09) | 15.50 (8.80)
Created on 2022-12-12 with reprex v2.0.2 |
Lovely! This is exactly the behavior I would think people find useful! |
devtools::load_all("D:/github/forks/report")
#> ℹ Loading report
report_sample(airquality, n = TRUE)
#> # Descriptive Statistics
#>
#> Variable | Summary
#> ------------------------------------------
#> Mean Ozone (SD), n | 42.13 (32.99), 116
#> Mean Solar.R (SD), n | 185.93 (90.06), 146
#> Mean Wind (SD), n | 9.96 (3.52), 153
#> Mean Temp (SD), n | 77.88 (9.47), 153
#> Mean Month (SD), n | 6.99 (1.42), 153
#> Mean Day (SD), n | 15.80 (8.86), 153
report_sample(airquality, group_by = "Month", n = TRUE)
#> # Descriptive Statistics
#>
#> Variable | 5 (n=31) | 6 (n=30) | 7 (n=31) | 8 (n=31) | 9 (n=30) | Total (n=153)
#> ----------------------------------------------------------------------------------------------------------------------------------------------------
#> Mean Ozone (SD), n | 23.62 (22.22), 26 | 29.44 (18.21), 9 | 59.12 (31.64), 26 | 59.96 (39.68), 26 | 31.45 (24.14), 29 | 42.13 (32.99), 116
#> Mean Solar.R (SD), n | 181.30 (115.08), 27 | 190.17 (92.88), 30 | 216.48 (80.57), 31 | 171.86 (76.83), 28 | 167.43 (79.12), 30 | 185.93 (90.06), 146
#> Mean Wind (SD), n | 11.62 (3.53), 31 | 10.27 (3.77), 30 | 8.94 (3.04), 31 | 8.79 (3.23), 31 | 10.18 (3.46), 30 | 9.96 (3.52), 153
#> Mean Temp (SD), n | 65.55 (6.85), 31 | 79.10 (6.60), 30 | 83.90 (4.32), 31 | 83.97 (6.59), 31 | 76.90 (8.36), 30 | 77.88 (9.47), 153
#> Mean Day (SD), n | 16.00 (9.09), 31 | 15.50 (8.80), 30 | 16.00 (9.09), 31 | 16.00 (9.09), 31 | 15.50 (8.80), 30 | 15.80 (8.86), 153
Besides, the
Created on 2022-12-12 with reprex v2.0.2 |
This looks perfect to me! Amazingly fast response, impressive. And I am confident this will be useful for many. Love the work you are doing on easystats! |
* report_sample: add effective n (closes #306)
* remove pipe in vignette

* report_sample: add effective n (closes #306)
* Addresses #309 part 1: add type and rules to chi2 objects
* Add tests + styler
* remove duplicate author in DESCRIPTION
* Harmonize snapshot testing with OS platform variant.
* styler
* Run tests only on Windows closes #312
* Use devel effectsize
* run only once a week [skip ci]
* Rerun snapshot tests on Windows with latest version of effectsize
* change snapshots variant = .Platform$OS.type to 'windows', styler, lints
Co-authored-by: Indrajeet Patil <patilindrajeet.science@gmail.com>
Instead of adding a total N on top of the table, add a column at the end (after total, or as part of total) reporting the effective n for each row. If there is missing data, it is good to see how many observations underlie the means in the table.