Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R-package] [docs] add intro vignette (#3946) #4775

Merged
merged 9 commits into from
Nov 18, 2021
Merged

Conversation

jameslamb
Copy link
Collaborator

@jameslamb jameslamb commented Nov 6, 2021

Description from previous PR

Description copied from #3946. See #3946 (comment) for an explanation of why #3946's changes were moved to this PR.

First step towards #1944.

This initial vignette shows the very basic workflow of LightGBM. As such, it would not replace the demo with the corresponding name as that seems rather a guide how to deal with sparse data.

The html looks as in this zip here:

basic_walkthrough.zip

Some notes on R vignettes in general

  • CRAN does not rebuild the vignette html, but it still runs its R code.
  • While it is possible to use a logo, it is non-trivial.
  • Would you stick to the same code style as in the package code? I am asking because LightGBM is the only project that use R code with comma first.
  • The vignette html will be created automatically when building the package. Since the whole building process of LGB is complex, I have no idea if this works out as I hope... still, our colleagues here https://cran.r-project.org/web/packages/xgboost/index.html were successful.

Notes for Reviewers

See #3946 (review) for some relevant information needed to review this PR.

@StrikerRUS
Copy link
Collaborator

RTD builds are enabled for this branch: https://readthedocs.org/projects/lightgbm/builds/15200979/.

@jameslamb
Copy link
Collaborator Author

@jameslamb
Copy link
Collaborator Author

jameslamb commented Nov 7, 2021

/gha run r-solaris

Workflow Solaris CRAN check has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/1430378069

solaris-x86-patched: https://builder.r-hub.io/status/lightgbm_3.3.1.99.tar.gz-08c15b9804194123b4d05bd2d9a260f4
solaris-x86-patched-ods: https://builder.r-hub.io/status/lightgbm_3.3.1.99.tar.gz-975f33015aa84dadb9e033a0dbe5394f
Reports also have been sent to LightGBM public e-mail: https://yopmail.com?lightgbm_rhub_checks
Status: success ✔️.

@jameslamb
Copy link
Collaborator Author

jameslamb commented Nov 7, 2021

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/1430378410

Status: success ✔️.

@StrikerRUS
Copy link
Collaborator

Yeah, seems vignettes were built correctly, that's great!

However, I have one concern. That build with only one tiny vignette took ~620 sec compared to less than 500 sec for master branch. Our hard limit on RTD is 900 sec. I'm afraid that with growing number of vignettes we'll reach that limit quite soon. I wonder is there any workaround to rebuild only vignettes with any changes compared to the previous versions? Or the only way to stay within the limits is to write very simple examples and always cut down training rounds?

@jameslamb jameslamb changed the title WIP: [R-package] [docs] add intro vignette (#3946) [R-package] [docs] add intro vignette (#3946) Nov 7, 2021
@jameslamb
Copy link
Collaborator Author

that build with only one tiny vignette took ~620 sec compared to less than 500 sec for master branch.

I think that the particular vignette added in this PR probably added very little to that timing, and the most of the increase is attributable to the fact that LightGBM has to now be compiled twice (once each on these two lines).

LightGBM/docs/conf.py

Lines 283 to 284 in aafedd8

sh build-cran-package.sh || exit -1
R CMD INSTALL --with-keep.source lightgbm_*.tar.gz || exit -1

If I'm right about that, then it means that adding additional vignettes won't add a lot of time to those builds. I'll test that theory by pushing a test commit here that introduces a copy of basic_walkthrough.Rmd, with a different name.

Other strategies we could use if the amount of time to build vignettes becomes problematic in the future:

  • mark all code in vignettes as eval = FALSE (https://yihui.org/knitr/options/#code-evaluation) so that now R code in vignettes actually runs (and then change CI processes to force them to run, so they're tested)
  • try to limit all vignettes to very small num_iterations, num_leaves, datasets size, etc.

@jameslamb
Copy link
Collaborator Author

jameslamb commented Nov 7, 2021

I just pushed 40fb2e2 which adds 10 identical copies of basic_walthrough.Rmd.

https://lightgbm.readthedocs.io/en/dev-vignettes/R/articles/index.html

image

The RTD build took 655s (https://readthedocs.org/projects/lightgbm/builds/15206531/), only about 30s longer than the build with a single vignette (https://readthedocs.org/projects/lightgbm/builds/15203195/).

So I think this is strong evidence of what I mentioned in #4775 (comment), that most of the added build time in RTD builds from this PR is from re-compiling LightGBM.

As more vignettes are added in #1944 we should consider the strategies for limiting runtime that I mentioned in #4775 (comment), but I think there's a good amount of room to add new vignettes without risking hitting that RTD 900-second limit.

@StrikerRUS
Copy link
Collaborator

@jameslamb Thanks for your investigation! I agree we may add some number of new vignettes without any worries.

@jameslamb
Copy link
Collaborator Author

jameslamb commented Nov 13, 2021

/gha run r-solaris

Workflow Solaris CRAN check has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/1457574729

solaris-x86-patched: https://builder.r-hub.io/status/lightgbm_3.3.1.99.tar.gz-b9d6b89a01a1479c99c16fa742ac7604
solaris-x86-patched-ods: https://builder.r-hub.io/status/lightgbm_3.3.1.99.tar.gz-41fa02ae335346949d1e3c060c168c19
Reports also have been sent to LightGBM public e-mail: https://yopmail.com?lightgbm_rhub_checks
Status: success ✔️.

@jameslamb
Copy link
Collaborator Author

jameslamb commented Nov 13, 2021

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/1457575555

Status: success ✔️.

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really happy this PR allows to start writing cool vignettes! 🎉
The only disappointing thing is that tar trick.

Please check some my minor suggestions below:

.ci/test_r_package_windows.ps1 Outdated Show resolved Hide resolved
.github/workflows/static_analysis.yml Outdated Show resolved Hide resolved
.vsts-ci.yml Outdated Show resolved Hide resolved
R-package/README.md Outdated Show resolved Hide resolved
R-package/README.md Outdated Show resolved Hide resolved
R-package/vignettes/basic_walkthrough.Rmd Outdated Show resolved Hide resolved
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
@jameslamb
Copy link
Collaborator Author

the only disappointing thing is that tar trick

agree 😢

Following https://www.r-project.org/bugs.html, I've request access to R's issue tracker. If that's approved, I'll open a bug report (and maybe even attempt a patch), based on the information in the third item mentioned at #3946 (review).

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jameslamb
Copy link
Collaborator Author

Thanks @StrikerRUS !

I just updated to latest master, just to do one more round of tests. If those don't show any issues, I'll merge this.

@jameslamb
Copy link
Collaborator Author

jameslamb commented Nov 17, 2021

/gha run r-solaris

Workflow Solaris CRAN check has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/1469388575

solaris-x86-patched: https://builder.r-hub.io/status/lightgbm_3.3.1.99.tar.gz-c943c7d3a0a846faa86af99f52d0b1c2
solaris-x86-patched-ods: https://builder.r-hub.io/status/lightgbm_3.3.1.99.tar.gz-aa58a079673641279b9cf48847ed7506
Reports also have been sent to LightGBM public e-mail: https://yopmail.com?lightgbm_rhub_checks
Status: success ✔️.

@jameslamb
Copy link
Collaborator Author

jameslamb commented Nov 17, 2021

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/1469388869

Status: success ✔️.

@jameslamb jameslamb merged commit 5fa887b into master Nov 18, 2021
@jameslamb jameslamb deleted the dev-vignettes branch November 18, 2021 04:15
@jameslamb
Copy link
Collaborator Author

Thanks for this awesome contribution @mayer79 ! I'm sorry it took so long to get this (#3946) merged.

@mayer79
Copy link
Contributor

mayer79 commented Nov 19, 2021

Awesome @jameslamb ! We can soon start adding more vignettes. Should I open an issue with a list of possible topics?
Should we start listing topics in the original issue #1944?

Regarding naming: On CRAN, vignettes are sorted by the file name of the corresponding .rmd file, not by the vignette title or its %\VignetteIndexEntry{[...]} in the yaml header.

An example from our data.table colleagues:

Thus, we can try to use names like "v1_basic_classification.rmd" etc.

@jameslamb
Copy link
Collaborator Author

thanks for the great information @mayer79 ! I just responded over at #1944 (comment). Let's move the conversation there for greater visibility.

@StrikerRUS StrikerRUS mentioned this pull request Jan 6, 2022
13 tasks
@jameslamb jameslamb mentioned this pull request Oct 7, 2022
40 tasks
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants