Possibly a double-counting bug in addTableIntersectCount #671

Closed
OskarGauffin opened this issue Jun 20, 2024 · 1 comment · Fixed by #672
OskarGauffin commented Jun 20, 2024

Hi,

I found this in the exercises of the Oxford RWE summer school. It looks like a bug to me, but I may be wrong.

It's the exercise where we're supposed to find the average number of prescriptions using PatientProfiles.

//Oskar

`###############################
library(CodelistGenerator)
library(CDMConnector)
library(duckdb)
library(PatientProfiles)
library(dplyr)

con <- dbConnect(duckdb(), eunomia_dir())
cdm <- cdmFromCon(con = con, cdmSchema = "main", writeSchema = "main")

cdm <- generateConceptCohortSet(
  cdm = cdm,
  name = "sinusitis",
  conceptSet = list(
    "bacterial_sinusitis" = 4294548,
    "viral_sinusitis" = 40481087,
    "chronic_sinusitis" = 257012,
    "any_sinusitis" = c(4294548, 40481087, 257012)
  ),
  limit = "all",
  end = 0
)

##############

solution:

cdm$sinusitis |>
  addTableIntersectCount(
    tableName = "drug_exposure",
    window = c(-Inf, Inf),
    targetEndDate = NULL,
    nameStyle = "number_prescriptions"
  ) |>
  filter(cohort_definition_id == 2) |> # Filter on cohort after intersection.
  summarise(mean_prescription = mean(number_prescriptions))

gives you 50.

cdm$sinusitis |>
  filter(cohort_definition_id == 2) |> # Filter on cohort before intersection.
  addTableIntersectCount(
    tableName = "drug_exposure",
    window = c(-Inf, Inf),
    targetEndDate = NULL,
    nameStyle = "number_prescriptions"
  ) |>
  summarise(mean_prescription = mean(number_prescriptions))

gives you 25.

Which one is correct?

Check number of prescriptions for subject_id = 806

This subject belongs in all four sinusitis cohorts:

cdm$sinusitis |> filter(subject_id == 806) |> distinct(subject_id, cohort_definition_id)

And there are 21 records in the drug_exposure table for person_id = 806.

cdm$drug_exposure |> filter(person_id == 806) |> count()

##################################################

cdm$sinusitis |> filter(subject_id == 806) |>
filter(cohort_definition_id == 2) |> ################ FILTER on cohort before intersection
addTableIntersectCount(
tableName = "drug_exposure",
window = list(c(-Inf, Inf)),
nameStyle = "number_prescriptions"
) |>
pull("number_prescriptions") |>
mean()

21 - correct.

##################

cdm$sinusitis |>
filter(subject_id == 806) |>
addTableIntersectCount(
tableName = "drug_exposure",
window = list(c(-Inf, Inf)),
nameStyle = "number_prescriptions"
) |>
filter(cohort_definition_id == 2) |> ################ FILTER on cohort after intersection
pull("number_prescriptions") |>
mean()

42. Double the correct answer. I find it surprising (and possibly a bug) that the count is doubled when the cohort filter is applied after the intersection rather than before.

`
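
One mechanism that could produce this kind of doubling can be sketched with plain dplyr on toy data. This is only an assumption about the cause, not a description of how PatientProfiles computes the count or of what the fix in #672 changed. All table contents below are made up, and `naive_counts` and `safe_counts` are hypothetical names. The idea: if two cohort rows for the same subject share the same `cohort_start_date` and the count is grouped by subject and date only, every matching record is counted once per cohort row.

```r
library(dplyr)

# Toy cohort table (made-up values): the same subject is in cohorts 2 and 4
# with an identical cohort_start_date.
cohort <- tibble(
  cohort_definition_id = c(2L, 4L),
  subject_id = c(806L, 806L),
  cohort_start_date = as.Date(c("2000-01-01", "2000-01-01"))
)

# Toy drug table (made-up values): three records for that person.
drug_exposure <- tibble(
  person_id = c(806L, 806L, 806L),
  drug_exposure_start_date = as.Date(c("1999-05-01", "2001-02-01", "2003-07-01"))
)

# Naive count: join all cohort rows to the drug records and group by
# subject and date only. Each drug record appears once per cohort row.
# The `relationship` argument needs dplyr >= 1.1.0.
naive_counts <- cohort |>
  inner_join(
    drug_exposure,
    by = c("subject_id" = "person_id"),
    relationship = "many-to-many"
  ) |>
  group_by(subject_id, cohort_start_date) |>
  summarise(number_prescriptions = n(), .groups = "drop")

# Deduplicating the join keys first avoids the inflation.
safe_counts <- cohort |>
  distinct(subject_id, cohort_start_date) |>
  inner_join(drug_exposure, by = c("subject_id" = "person_id")) |>
  group_by(subject_id, cohort_start_date) |>
  summarise(number_prescriptions = n(), .groups = "drop")

naive_counts$number_prescriptions # 6: three records counted twice
safe_counts$number_prescriptions  # 3: one count per record
```

Filtering to a single cohort before counting leaves one row per subject and date, which would explain why the filter-first version returns the expected number in this toy setting.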

@catalamarti
Collaborator

Thanks for reporting, @OskarGauffin, and thanks to @ilovemane for fixing it.
