Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace tibbles with data frames to improve performance #1007

Merged
merged 12 commits into from
Sep 27, 2022
Merged

Replace tibbles with data frames to improve performance #1007

merged 12 commits into from
Sep 27, 2022

Conversation

IndrajeetPatil
Copy link
Collaborator

@IndrajeetPatil IndrajeetPatil commented Sep 26, 2022

#759 (comment)

Need to use continuous benchmarking, so not converting this to a draft.

@codecov-commenter
Copy link

codecov-commenter commented Sep 26, 2022

Codecov Report

Merging #1007 (fa98f9c) into main (1f4437b) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head fa98f9c differs from pull request most recent head 94e30f8. Consider uploading reports for the commit 94e30f8 to get more accurate results

@@           Coverage Diff           @@
##             main    #1007   +/-   ##
=======================================
  Coverage   91.14%   91.14%           
=======================================
  Files          46       46           
  Lines        2664     2665    +1     
=======================================
+ Hits         2428     2429    +1     
  Misses        236      236           
Impacted Files Coverage Δ
R/nested-to-tree.R 92.85% <ø> (ø)
R/style-guides.R 99.43% <ø> (ø)
R/stylerignore.R 100.00% <ø> (ø)
R/token-define.R 66.66% <ø> (ø)
R/ui-styling.R 100.00% <ø> (ø)
R/compat-dplyr.R 92.85% <100.00%> (ø)
R/compat-tidyr.R 100.00% <100.00%> (ø)
R/nest.R 100.00% <100.00%> (ø)
R/parse.R 88.09% <100.00%> (ø)
R/token-create.R 96.92% <100.00%> (ø)
... and 3 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@github-actions
Copy link
Contributor

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 7c691c9 is merged into main:

  • ❗🐌cache_applying: 27.1ms -> 31.6ms [+15.61%, +16.95%]
  •   :rocket:cache_recording: 1.26s -> 875ms [-31.19%, -30%]
  •   :rocket:without_cache: 3.33s -> 2.2s [-34.28%, -33.43%]

Further explanation regarding interpretation and methodology can be found in the documentation.

@IndrajeetPatil
Copy link
Collaborator Author

@lorenzwalthert, @krlmlr That's quite the bump in performance when switching to data frames instead of tibbles as our data structure of choice! 😮

@lorenzwalthert
Copy link
Collaborator

Wow yes @IndrajeetPatil and without much code change even! Well done. Now only hurdle is to make it pass on old releases...

@MichaelChirico
Copy link
Contributor

wow, quite impressive speed improvement per LoC change!!

Copy link
Member

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice speedup! Can we encapsulate the choice of data structure in helper functions? We could add e.g. new_styler_df() and styler_df() that use vctrs::new_data_frame() and vctrs::data_frame() under the hood. This means that we could later change the underlying data structure with lesser effort.

Do we still need to import tibble?

@github-actions
Copy link
Contributor

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 63a27d0 is merged into main:

  • ❗🐌cache_applying: 37.9ms -> 41.6ms [+6.86%, +12.7%]
  •   :rocket:cache_recording: 1.91s -> 1.28s [-34.03%, -31.47%]
  •   :rocket:without_cache: 5.24s -> 3.33s [-37.36%, -35.55%]

Further explanation regarding interpretation and methodology can be found in the documentation.

@IndrajeetPatil IndrajeetPatil changed the title Check for performance improvements with data.frame Replace tibbles with data frames to improve performance Sep 27, 2022
@IndrajeetPatil
Copy link
Collaborator Author

Nice speedup! Can we encapsulate the choice of data structure in helper functions? We could add e.g. new_styler_df() and styler_df() that use vctrs::new_data_frame() and vctrs::data_frame() under the hood. This means that we could later change the underlying data structure with lesser effort.

Good idea. Done!

Do we still need to import tibble?

Only for tibble::tribble(), which we use in a few places. But, if we wish to get rid of {tibble} from imports, removing this function's usage should be easy to do. Should I do that?

@krlmlr
Copy link
Member

krlmlr commented Sep 27, 2022

Let's keep the tribble() call for now.

@github-actions
Copy link
Contributor

This is how benchmark results would change (along with a 95% confidence interval in relative change) if a0d6c20 is merged into main:

  • ❗🐌cache_applying: 26.8ms -> 30.8ms [+13.87%, +15.99%]
  •   :rocket:cache_recording: 1.25s -> 824ms [-34.47%, -33.66%]
  •   :rocket:without_cache: 3.32s -> 2.08s [-37.6%, -36.8%]

Further explanation regarding interpretation and methodology can be found in the documentation.

@github-actions
Copy link
Contributor

This is how benchmark results would change (along with a 95% confidence interval in relative change) if fa98f9c is merged into main:

  • ❗🐌cache_applying: 33.8ms -> 38ms [+8.2%, +16.44%]
  •   :rocket:cache_recording: 1.92s -> 1.19s [-41.8%, -34.25%]
  •   :rocket:without_cache: 4.31s -> 2.58s [-40.78%, -39.48%]

Further explanation regarding interpretation and methodology can be found in the documentation.

@lorenzwalthert
Copy link
Collaborator

lorenzwalthert commented Sep 27, 2022

Also, it seems removing {tibble} does not reduce recursive dependencies:

library(magrittr)
deps <- desc::desc_get_deps() %>%
  dplyr::filter(type == 'Imports') %>%
  dplyr::pull(package)

recursive_deps_before <- purrr::map(deps, ~names(renv:::renv_package_dependencies(.x))) %>%
  unlist() %>%
  unique()


deps_without_tibble <- setdiff(deps, 'tibble')

recursive_deps_after <- purrr::map(deps_without_tibble, ~names(renv:::renv_package_dependencies(.x))) %>%
  unlist() %>%
  unique()


waldo::compare(recursive_deps_before, recursive_deps_after)
#> ✔ No differences

Created on 2022-09-27 by the reprex package (v2.0.1)

This is because we use {rematch2} (in one place only, can be worked around probably some how), which in turn depends on {tibble}. That {tibble} dependency was suggested to be removed in r-lib/rematch2#14, where @krlmlr was not all in for the suggested implementation. With the additional development that happened over the last 2 years and more recursive dependencies added to tibble, I think it would be even more beneficial to remove that dependency.

Copy link
Collaborator

@lorenzwalthert lorenzwalthert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job.

@IndrajeetPatil IndrajeetPatil merged commit 1a8bab3 into r-lib:main Sep 27, 2022
@IndrajeetPatil IndrajeetPatil deleted the perf_dataframe branch September 27, 2022 07:23
@IndrajeetPatil
Copy link
Collaborator Author

I think this is a big enough improvement to consider creating a new CRAN release?

IndrajeetPatil added a commit that referenced this pull request Sep 28, 2022
* Get rid of unnecessary `.name_repair` arg

This is generating warnings.

Follow-up on #1007

* make the wrapper even thinner
@IndrajeetPatil
Copy link
Collaborator Author

I think this is a big enough improvement to consider creating a new CRAN release?

Any thoughts, @lorenzwalthert and @krlmlr?

We also need to get rid of NOTEs in checks: https://cran.r-project.org/web/checks/check_results_styler.html
Let's not wait to get an email about this 😬

@lorenzwalthert
Copy link
Collaborator

lorenzwalthert commented Oct 11, 2022

Yes, I agree. Do you want to m make a PR to main similar to #930, plus using fledge? If not, I can do it, but not this week. Once all checks green, I can submit it.

@lorenzwalthert
Copy link
Collaborator

I already bumped the version recently and tried to organise the news items a bit.

@IndrajeetPatil
Copy link
Collaborator Author

If not, I can do it, but not this week. Once all checks green, I can submit it.

@lorenzwalthert I can wait! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants