Possibly a double-counting bug in addTableIntersectCount #671

OskarGauffin · 2024-06-20T12:44:49Z

Hi,

Found this in the exercises of the Oxford RWE summer school. It looks a bit like a bug to me, but I may be wrong.

It's the exercise where we're supposed to find the average number of prescriptions using PatientProfiles.

//Oskar

```r
library(CodelistGenerator)
library(CDMConnector)
library(duckdb)
library(PatientProfiles)
library(dplyr)

con <- dbConnect(duckdb(), eunomia_dir())
cdm <- cdmFromCon(con = con, cdmSchema = "main", writeSchema = "main")

# any_sinusitis is the union of the other three concept sets
cdm <- generateConceptCohortSet(
  cdm = cdm,
  name = "sinusitis",
  conceptSet = list(
    "bacterial_sinusitis" = 4294548,
    "viral_sinusitis" = 40481087,
    "chronic_sinusitis" = 257012,
    "any_sinusitis" = c(4294548, 40481087, 257012)
  ),
  limit = "all",
  end = 0
)
```

Solution:

```r
cdm$sinusitis |>
  addTableIntersectCount(
    tableName = "drug_exposure",
    window = c(-Inf, Inf),
    targetEndDate = NULL,
    nameStyle = "number_prescriptions"
  ) |>
  filter(cohort_definition_id == 2) |> # Filter on cohort after intersection
  summarise(mean_prescription = mean(number_prescriptions))
```

This gives a mean of 50.

```r
cdm$sinusitis |>
  filter(cohort_definition_id == 2) |> # Filter on cohort before intersection
  addTableIntersectCount(
    tableName = "drug_exposure",
    window = c(-Inf, Inf),
    targetEndDate = NULL,
    nameStyle = "number_prescriptions"
  ) |>
  summarise(mean_prescription = mean(number_prescriptions))
```

This gives 25.

Which one is correct?

To check, look at the number of prescriptions for subject_id = 806. This subject belongs to all four sinusitis cohorts:

```r
cdm$sinusitis |> filter(subject_id == 806) |> distinct(subject_id, cohort_definition_id)
```

and there are 21 records in the drug_exposure table for person_id = 806:

```r
cdm$drug_exposure |> filter(person_id == 806) |> count()
```

Filtering on the cohort before the intersection:

```r
cdm$sinusitis |>
  filter(subject_id == 806) |>
  filter(cohort_definition_id == 2) |> # Filter on cohort before intersection
  addTableIntersectCount(
    tableName = "drug_exposure",
    window = list(c(-Inf, Inf)),
    nameStyle = "number_prescriptions"
  ) |>
  pull("number_prescriptions") |>
  mean()
```

gives 21, which is correct.

Filtering on the cohort after the intersection:

```r
cdm$sinusitis |>
  filter(subject_id == 806) |>
  addTableIntersectCount(
    tableName = "drug_exposure",
    window = list(c(-Inf, Inf)),
    nameStyle = "number_prescriptions"
  ) |>
  filter(cohort_definition_id == 2) |> # Filter on cohort after intersection
  pull("number_prescriptions") |>
  mean()
```

gives 42, double the correct answer. I find it a bit surprising, and possibly a bug, that the count is doubled when the cohort is not filtered before the intersection.
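The exact doubling (50 vs 25, 42 vs 21) is consistent with intersections being counted per (subject_id, cohort_start_date, cohort_end_date) without first removing rows that two cohorts share. Because any_sinusitis is the union of the other three concept sets, each viral sinusitis record would be expected to reappear with the same dates in cohort 4. The base R sketch below reproduces that pattern on toy data. It is a hypothetical illustration, not the PatientProfiles implementation, and the actual fix in #672 may work differently; the helper `intersect_count()`, its `dedupe` argument, and the toy dates are invented for the example.

```r
# Hypothetical illustration in base R: NOT the PatientProfiles source code.
# Toy data mimicking subject 806: one viral sinusitis record (cohort 2) that
# also appears, with identical dates, in the any_sinusitis cohort (4).
cohort <- data.frame(
  cohort_definition_id = c(2L, 4L),
  subject_id = c(806L, 806L),
  cohort_start_date = as.Date(c("2010-01-01", "2010-01-01")),
  cohort_end_date = as.Date(c("2010-01-01", "2010-01-01"))
)
# 21 drug records for person 806; dates are omitted because
# window = c(-Inf, Inf) counts every record regardless of date
drug_exposure <- data.frame(person_id = rep(806L, 21))

key <- c("subject_id", "cohort_start_date", "cohort_end_date")

# Hypothetical helper: join cohort rows to drug records on the person, then
# tally joined rows per (subject, start, end) key and map back to each row.
intersect_count <- function(cohort, drugs, dedupe) {
  rows <- cohort[key]
  # Without dedupe, a key shared by two cohorts is joined (and counted) twice
  if (dedupe) rows <- unique(rows)
  joined <- merge(rows, drugs, by.x = "subject_id", by.y = "person_id")
  n <- table(do.call(paste, joined[key]))
  out <- as.integer(n[do.call(paste, cohort[key])])
  out[is.na(out)] <- 0L # cohort rows with no drug records get 0
  cohort$number_prescriptions <- out
  cohort
}

buggy <- intersect_count(cohort, drug_exposure, dedupe = FALSE)
fixed <- intersect_count(cohort, drug_exposure, dedupe = TRUE)
buggy$number_prescriptions[buggy$cohort_definition_id == 2] # 42
fixed$number_prescriptions[fixed$cohort_definition_id == 2] # 21
```

Filtering `cohort` to cohort 2 before calling the non-deduplicating variant also returns 21, mirroring the report: once the shared row is gone, there is nothing to double count.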

catalamarti · 2024-06-27T12:46:54Z

Thanks for reporting, @OskarGauffin, and thanks to @ilovemane for fixing it.

catalamarti assigned ilovemane Jun 25, 2024

ilovemane mentioned this issue Jun 26, 2024

fixes addCategories and addIntersect Flag #672

Merged

ilovemane linked a pull request Jun 26, 2024 that will close this issue

catalamarti closed this as completed in #672 Jun 27, 2024
