Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caching sql chunks throwing error when output.var not used #1842

Closed
3 tasks done
mfherman opened this issue May 1, 2020 · 8 comments · Fixed by #2249
Closed
3 tasks done

Caching sql chunks throwing error when output.var not used #1842

mfherman opened this issue May 1, 2020 · 8 comments · Fixed by #2249
Assignees
Labels
bug Bugs

Comments

@mfherman
Copy link

mfherman commented May 1, 2020

I'm having trouble knitting an Rmd when I set cache=TRUE for sql code chunks.

There is some discussion of a related error #1544 and rstudio/rmarkdown#914, but those examples seem to be when output.var is set in the sql chunk and the bug was fixed a couple years ago.

The following Rmd will knit the first time, but the second time when it tries to read from the cache, I get this error:

Quitting from lines 12-13 (test-sql-cache-rmd.Rmd) 
Error in fun(environment()) : invalid first argument
Calls: <Anonymous> ... call_block -> <Anonymous> -> lazyLoad -> lazyLoadDBexec -> fun
Execution halted

When cache=FALSE, the error goes away.

---
title: "sql chunk"
output: html_document
---

```{r setup, include=FALSE}
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
knitr::opts_chunk$set(echo = TRUE, connection = "con")
```

```{sql select_1, cache=TRUE}
SELECT 1
```

This issue was previously posed on RStudio Community.

> xfun::session_info("knitr")
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134), RStudio 1.3.878

Locale:
  LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
  LC_NUMERIC=C                           LC_TIME=English_United States.1252    

Package version:
  evaluate_0.14   glue_1.4.0      graphics_3.6.1  grDevices_3.6.1 highr_0.8       knitr_1.28.5    magrittr_1.5    markdown_1.1    methods_3.6.1  
  mime_0.9        stats_3.6.1     stringi_1.4.6   stringr_1.4.0   tools_3.6.1     utils_3.6.1     xfun_0.13       yaml_2.2.1     

By filing an issue to this repo, I promise that

  • I have fully read the issue guide at https://yihui.org/issue/.
  • I have provided the necessary information about my issue.
    • If I'm asking a question, I have already asked it on Stack Overflow or RStudio Community, waited for at least 24 hours, and included a link to my question there.
    • If I'm filing a bug report, I have included a minimal, self-contained, and reproducible example, and have also included xfun::session_info('knitr'). I have upgraded all my packages to their latest versions (e.g., R, RStudio, and R packages), and also tried the development version: remotes::install_github('yihui/knitr').
    • If I have posted the same issue elsewhere, I have also mentioned it in this issue.
  • I have learned the Github Markdown syntax, and formatted my issue correctly.

I understand that my issue may be closed if I don't fulfill my promises.

@solarchemist
Copy link

Wow, this was a tricky bug.

I think I can add to it. This behaviour is present even with eval=FALSE. I had some chunks demoing SQL queries on my blog (using SQL chunks mostly for the sake of pretty code highlighting), and every blogdown run except the first would fail with:

Error in fun(environment()) : invalid first argument
Calls: local ... call_block -> <Anonymous> -> lazyLoad -> lazyLoadDBexec -> fun
Execution halted
Error in render_page(f) : 
  Failed to render 'content/post/2018-07-24-self-hosted-git-server/2018-07-24-gitea.Rmd'

When it failed, I would change it to a generic code chunk, and then it would run ok once before failing again the subsequent run. Anyway, thanks for explaining this weird behaviour!

This was my chunk options and code:

```{r global_options, echo=FALSE, results='hide', message=FALSE}
opts_chunk$set(
   eval    = TRUE,
   cache   = TRUE,
   message = FALSE,
   warning = FALSE)
```

```{sql eval=FALSE}
GRANT ALL ON gitea.* TO 'gitea'@'localhost';
FLUSH PRIVILEGES;
```
> xfun::session_info("knitr")
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS, RStudio 1.3.959

Locale:
  LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8       
  LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
  LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
  LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

Package version:
  evaluate_0.14   glue_1.4.1      graphics_3.6.2  grDevices_3.6.2 highr_0.8      
  knitr_1.28      magrittr_1.5    markdown_1.1    methods_3.6.2   mime_0.9       
  stats_3.6.2     stringi_1.4.6   stringr_1.4.0   tools_3.6.2     utils_3.6.2    
  xfun_0.15       yaml_2.2.1

@cderv cderv added the bug Bugs label Jan 29, 2021
@eitsupi
Copy link
Sponsor

eitsupi commented Apr 11, 2023

I ran into this bug and noticed that I can work around it by switching engines from sql to glue_sql.
This is rather odd since glue_sql calls sql engine internally, but does this give you any clue as to how to solve this problem?

The real code can be found here.
(Note that I use glue_sql even in code blocks without string glueing because I ran into this issue)
https://github.com/eitsupi/querying-with-prql/blob/59a687418bbfcb15e4a23a16a65247756d6338fd/docs/indexing.qmd#L1-L74

@cderv
Copy link
Collaborator

cderv commented Apr 11, 2023

Thanks. I still don't know why we have an issue but this will be providing a base for comparison. For reminder, the issue happens in lazyload feature when restoring from the cache.

Using cache.lazy = FALSE in chunk option will solve the issue. We could make this options be FALSE by default with sql engine. I don't know why it does work differently in both situation yet for the proper fix

@cderv
Copy link
Collaborator

cderv commented Apr 11, 2023

Oh I know now !

knitr/R/block.R

Lines 160 to 163 in 039b6cf

if (options$cache.rebuild || !cache.exists) block_cache(options, output, switch(
options$engine,
'stan' = options$output.var, 'sql' = options$output.var, character(0)
))

We are specifically storing output$output.var when using sql engine as objects for our block_cache() function, which will be NULL when not defined, whereas for other engine we pass character(0) - which is what happens when glue_sql is used.

It seems the lazyLoad() will have issue restoring objects created with this NULL value. We use kind of really internal stuff for this lazy load feature calling getFromNamespace('makeLazyLoadDB', 'tools') so maybe we should do anything if NULL is passed ?

@yihui I send a PR for a small fix, but we can discuss there if we need to handle this more deeply

@eitsupi
Copy link
Sponsor

eitsupi commented Apr 12, 2023

Wow, thanks for looking into this!

Perhaps there will be no need to give stan and sql special treatment there?
In other words, would it be great if wrappers for existing engines like glue_sql for sql handled cache in exactly the same way as original engines?

@cderv
Copy link
Collaborator

cderv commented Apr 12, 2023

In other words, would it be great if wrappers for existing engines like glue_sql for sql handled cache in exactly the same way as original engines?

Not sure to understand. You mean eng_glue_sql should pass the information it should behave like the sql engine ?

I think this could already be done by using options$engine <<- 'sql' so that the information is passed to knitr further processing.

Not sure if this is a recommended pattern to do this, but this would work.

To a more generic cache mechanism for engine, I believe it require refactor and this is a bigger project started to be tackled in #1505

This would indeed by clearer, and allow wrapper engine to call a cache_engine function probably.

@eitsupi
Copy link
Sponsor

eitsupi commented Apr 12, 2023

To a more generic cache mechanism for engine, I believe it require refactor and this is a bigger project started to be tackled in #1505

This would indeed by clearer, and allow wrapper engine to call a cache_engine function probably.

Sounds good, thank you for working on that.

@github-actions
Copy link

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue by following the issue guide (https://yihui.org/issue/), and link to this old issue if necessary.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 18, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Bugs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants