Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parameters passed using dbConnect( config = list() ) are ignored #83

Open
SimonCoulombe opened this issue Feb 23, 2024 · 9 comments
Open
Milestone

Comments

@SimonCoulombe
Copy link

SimonCoulombe commented Feb 23, 2024

Hi,
I saw some code (here) that used config=list() to pass parameters to a duckdb connection.
I tried it, but for me, they are ignored as illustrated below where the memory limit is 432GB instead of the requested 1GB.

Passing parameters using dbExecute(con, "PRAGMA threads=1; PRAGMA memory_limit='1GB';") appears to work.

Maybe config= list() is not a thing, just wanted to make sure:

library(dplyr)
library(dbplyr)
library(duckdb)

duckdb_path <- "/devroot/sandbox/tmp/duckdb.duckdb"

con <- dbConnect(duckdb::duckdb(dbdir = duckdb_path), config=list("memory_limit"="1GB", "threads" = "1") )
dbGetQuery(con, "select current_setting('memory_limit')")  #                         432.6GB
dbGetQuery(con, "select current_setting('threads')")  # 64
dbExecute(con, "PRAGMA threads=1; PRAGMA memory_limit='1GB';")
dbGetQuery(con, "select current_setting('memory_limit')")    # 1GB
dbGetQuery(con, "select current_setting('threads')")   # 1


 Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.0.2 (2020-06-22)
 os       OpenShift Enterprise
 system   x86_64, linux-gnu
 ui       RStudio
 language (EN)
 collate  en_CA.UTF-8
 ctype    en_CA.UTF-8
 tz       America/Toronto
 date     2024-02-23
 rstudio  2023.06.0+421.pro1 Mountain Hydrangea (server)
 pandoc   NA

─ Packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 ! package     * version date (UTC) lib source
 P cli           3.6.2   2023-12-11 [?] RSPM (R 4.0.5)
 P DBI         * 1.2.1   2024-01-12 [?] RSPM (R 4.0.5)
 P dbplyr      * 2.4.0   2023-10-26 [?] RSPM (R 4.0.5)
 P dplyr       * 1.1.4   2023-11-17 [?] RSPM (R 4.0.5)
 P duckdb      * 0.9.2-1 2023-11-28 [?] RSPM (R 4.0.5)
 P fansi         1.0.4   2023-01-22 [?] CRAN (R 4.0.2)
 P generics      0.1.3   2022-07-05 [?] RSPM (R 4.0.5)
 P glue          1.6.2   2022-02-24 [?] RSPM (R 4.0.5)
 P lifecycle     1.0.3   2022-10-07 [?] RSPM (R 4.0.5)
   magrittr      2.0.3   2022-03-30 [1] RSPM (R 4.0.5)
 P pillar        1.9.0   2023-03-22 [?] RSPM (R 4.0.5)
 P pkgconfig     2.0.3   2019-09-22 [?] RSPM (R 4.0.3)
   R6            2.5.1   2021-08-19 [1] RSPM (R 4.0.5)
   renv          1.0.3   2023-09-19 [1] RSPM (R 4.0.2)
 P rlang         1.1.3   2024-01-10 [?] RSPM (R 4.0.5)
 P rstudioapi    0.15.0  2023-07-07 [?] RSPM (R 4.0.5)
 P sessioninfo   1.2.2   2021-12-06 [?] RSPM (R 4.0.5)
 P tibble        3.2.1   2023-03-20 [?] RSPM (R 4.0.5)
 P tidyselect    1.2.0   2022-10-10 [?] RSPM (R 4.0.5)
 P utf8          1.2.3   2023-01-31 [?] RSPM (R 4.0.5)
 P vctrs         0.6.5   2023-12-01 [?] RSPM (R 4.0.5)

[1] xxxxxxx /renv/library/R-4.0/x86_64-pc-linux-gnu
 [2] /opt/R/4.0.2/lib/R/library

 P ── Loaded and on-disk path mismatch.
@nbc
Copy link

nbc commented Feb 24, 2024

Parameters must be passed to duckdb OR to dbConnect, not the two :

> library(duckdb)
> 
> con <- dbConnect(duckdb(dbdir = "tpch.db"),
+                  config = list("memory_limit" = "3G")
+                 )
> dbGetQuery(con, "select current_setting('memory_limit')")
  current_setting('memory_limit')
1                          26.6GB
> 
> con <- dbConnect(duckdb(
+   dbdir = "tpch.db",
+   config = list("memory_limit" = "3G")
+   ),
+ )
> dbGetQuery(con, "select current_setting('memory_limit')")
  current_setting('memory_limit')
1                           3.0GB
> 
> con <- dbConnect(duckdb(),
+                  dbdir = "tpch.db",
+                  config = list("memory_limit" = "3G")
+ )
> dbGetQuery(con, "select current_setting('memory_limit')")
  current_setting('memory_limit')
1                           3.0GB

@krlmlr
Copy link
Collaborator

krlmlr commented Feb 24, 2024

Thanks. It's a bit messy right now, the config passed to dbConnect() will only be honored when this function creates a new duckdb server; otherwise it will be silently ignored.

I need to understand the scope at which this configuration applies to be able to propose something better.

@krlmlr
Copy link
Collaborator

krlmlr commented Feb 24, 2024

For now, it's safest to set config in duckdb::duckdb() .

@krlmlr krlmlr added this to the 0.10.0 milestone Feb 24, 2024
@krlmlr
Copy link
Collaborator

krlmlr commented Mar 3, 2024

@Tmonster: This is a follow-up to #73 (comment) . At the level of the C++ wrapping code, I don't understand the purpose of the DBWrapper struct. Why do both DBWrapper and ConnWrapper exist, is this still necessary today? Or could the logic of rapi_startup() be moved to rapi_connect() ?

CC @hannes.

@krlmlr
Copy link
Collaborator

krlmlr commented Mar 24, 2024

This looks much better in #124, I'll merge it today, binaries will be available on https://duckdb.r-universe.dev/duckdb# soon. Can you confirm?

@krlmlr
Copy link
Collaborator

krlmlr commented Mar 24, 2024

Spoke too soon. This will remain a "feature": config and read_only are only valid when instantiating the database object, which can be during the initial duckdb() call, or during dbConnect() if the dbdir argument is different. This is because config and read_only are defined for the database object, which can host multiple connection objects; each DuckDB file can have at most one database object associated with it.

The cleanest way forward would be to deprecate dbConnect(dbdir, config, read_only) and allow these only in the duckdb() call: #126.

@nicki-dese
Copy link

hi @krlmlr - as requested in #56, I ran gc() before reconnecting to the connection with read_only = F, and that fixed the issue.

@krlmlr
Copy link
Collaborator

krlmlr commented Apr 30, 2024

Thanks for the heads-up. I'd consider using gc() to achieve this behavior "off-label use". To truly fix this, please instantiate two driver objects with duckdb() and duckdb(read_only = TRUE), respectively. Now, one of those calls might not work until you call gc(), but it might as well work right away -- not sure.

@nicki-dese
Copy link

I'm not sure how likely this scenario will be - unless you decide half way through a session, 'oh - I do need to change that table!' - but it's still nice to know there's a reason for the behaviour and a work around.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants