cohen.d.formula(..., paired=T) gives different results depending on data's sorting order #48

Closed
carstenschwede opened this issue Feb 16, 2020 · 2 comments


carstenschwede commented Feb 16, 2020

This is related to #27 but applies to paired data. With paired=T, effsize::cohen.d.formula expects the rows to be sorted by group first (all observations of one group, then all observations of the other, with subjects in the same order in both). Although this is a standard convention for the non-formula interface, it is usually not expected with the formula interface (e.g. t.test(..., paired=T) and other Cohen's d implementations do not require the data to be ordered like this).

Code that made me discover this behaviour (see bottom of post for working example):

    dfA <- aggregate(value~subject+group, data=df, FUN=mean)
    d1 <- effsize::cohen.d(value~group, data=dfA, paired=T)
    #d estimate: -0.6141064 (medium)

    dfB <- aggregate(value~group+subject, data=df, FUN=mean)
    d2 <- effsize::cohen.d(value~group, data=dfB, paired=T)
    #d estimate: -0.1663497 (negligible)
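The difference between dfA and dfB comes from row order: aggregate() returns one row per combination of the grouping variables, with the first grouping variable varying fastest. So `value ~ subject + group` yields all rows of one group followed by all rows of the other, while `value ~ group + subject` interleaves the groups subject by subject. The small data set below is made up purely to show the two layouts:

```r
# Made-up toy data: 3 subjects, each measured "before" and "after"
toy <- data.frame(
  subject = rep(1:3, times = 2),
  group   = rep(c("before", "after"), each = 3),
  value   = c(10, 20, 30, 9, 19, 29)
)

# subject varies fastest -> one contiguous block per group
# (group column: "after" three times, then "before" three times)
a <- aggregate(value ~ subject + group, data = toy, FUN = mean)
print(a)

# group varies fastest -> groups alternate row by row
# (group column: "after", "before", "after", "before", ...)
b <- aggregate(value ~ group + subject, data = toy, FUN = mean)
print(b)
```

Only the first layout keeps each group's values in one contiguous block, which is the arrangement the paired code path appears to rely on.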

The relevant code section in CohenD.R:

if( paired ){
    [...]
    s.dif = sd(diff(d,lag=n1))
    [...]
}
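To see why this line depends on row order: diff(d, lag = n1) returns d[i + n1] - d[i], so it only forms per-subject differences when the first n1 values belong to one group, the next n1 to the other, and subjects appear in the same order in both blocks. The sketch assumes, as the excerpt suggests, that d is the value vector in the data's row order and n1 is the size of the first group; the toy numbers are made up for illustration:

```r
# Made-up toy data: 3 subjects, each with a true after - before difference of -1
before <- c(10, 20, 30)
after  <- before - 1
n1 <- length(before)   # assumed: number of observations in the first group

# Layout 1: sorted by group (all "before" values, then all "after" values)
d_grouped <- c(before, after)
diff(d_grouped, lag = n1)      # d[i + n1] - d[i] pairs each subject with itself
#> [1] -1 -1 -1

# Layout 2: sorted by subject (before1, after1, before2, after2, ...)
d_subject <- as.vector(rbind(before, after))
diff(d_subject, lag = n1)      # now pairs values from different subjects
#> [1]  9 21  9

sd(diff(d_grouped, lag = n1))  # 0: every subject has the same difference
sd(diff(d_subject, lag = n1))  # > 0: spread caused purely by mismatched pairs
```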

Working example showing the issue:

# Set up some dummy data
set.seed(1234)
numSubjects <- 20
subject <- 1:numSubjects
before <- runif(numSubjects)
after <- before - 0.35*runif(length(before))
df <- data.frame(
  subject=rep(subject,2),
  group=c(
    rep("before",numSubjects),
    rep("after",numSubjects)
  ),
  value=c(before, after)
)

##################################

valueBefore <- df[df$group == "before",]$value
valueAfter <- df[df$group == "after",]$value

# Original row order (all "before" rows, then all "after" rows): d1 == d2
d1 <- effsize::cohen.d(valueAfter, valueBefore, paired=T)
d2 <- effsize::cohen.d(value~group, data=df, paired=T)

##################################

# Sorted by subject
df <- df[order(df$subject),]
valueBefore <- df[df$group == "before",]$value
valueAfter <- df[df$group == "after",]$value

# d1, d2 and d3 are the same: d estimate: -0.5307307 (medium)
# but d4 differs: -0.145407 (negligible)

d3 <- effsize::cohen.d(valueAfter, valueBefore, paired=T)
d4 <- effsize::cohen.d(value~group, data=df, paired=T)

##################################

# Sorted by group
df <- df[order(df$group),]
valueBefore <- df[df$group == "before",]$value
valueAfter <- df[df$group == "after",]$value

# d1, d2, d3, d5 and d6 are all the same (although still incorrect due to #49)
d5 <- effsize::cohen.d(valueAfter, valueBefore, paired=T)
d6 <- effsize::cohen.d(value~group, data=df, paired=T)

##################################
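A user-side workaround consistent with the d5/d6 result is to sort rows by group and then by subject before using the formula interface. The sketch below checks, in base R only, that this reordering restores the same lagged differences for an arbitrarily shuffled copy of the data. It assumes every subject appears exactly once in each group, and lagged_sd is a hypothetical helper that mirrors the sd(diff(d, lag = n1)) line quoted from CohenD.R, not an effsize function:

```r
set.seed(1)
n <- 5
before <- runif(n)
after  <- before - 0.35 * runif(n)
df <- data.frame(
  subject = rep(1:n, 2),
  group   = rep(c("before", "after"), each = n),
  value   = c(before, after)
)

# Hypothetical helper mirroring the quoted line from CohenD.R
lagged_sd <- function(d, n1) sd(diff(d, lag = n1))

shuffled <- df[sample(nrow(df)), ]                          # arbitrary row order
fixed    <- shuffled[order(shuffled$group, shuffled$subject), ]  # group, then subject

lagged_sd(df$value, n)     # original group-sorted data
lagged_sd(fixed$value, n)  # same value after reordering the shuffled copy
```

The reordered data lists "after" before "before" (alphabetical order), which only flips the sign of each difference and leaves the standard deviation unchanged.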


carstenschwede commented Feb 17, 2020

In comparison, neither lsr::cohensD nor rstatix::cohens_d is affected by sorting order; however, lsr::cohensD issues a sensible warning:

calculating paired samples Cohen's d using formula input. Results will be incorrect if cases do not appear in the same order for both levels of the grouping factor
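When the data carry a subject ID, the condition described in this warning can be checked directly. A minimal sketch; same_case_order is a hypothetical helper, not part of lsr or effsize:

```r
# Hypothetical helper: TRUE if subject IDs appear in the same order
# within every level of the grouping factor
same_case_order <- function(subject, group) {
  ids <- split(subject, factor(group))   # one vector of IDs per level, row order kept
  all(vapply(ids, function(x) identical(x, ids[[1]]), logical(1)))
}

# Group-sorted rows: both groups list subjects 1, 2, 3
same_case_order(rep(1:3, 2), rep(c("before", "after"), each = 3))

# Subject-sorted rows: within each group the order is still 1, 2, 3
same_case_order(rep(1:3, each = 2), rep(c("before", "after"), 3))

# "after" rows list subjects 3, 1, 2: positional pairing would be wrong
same_case_order(c(1, 2, 3, 3, 1, 2), rep(c("before", "after"), each = 3))
```

Subject-sorted data passes this check, consistent with lsr::cohensD being unaffected by that reordering; the lag-based code in effsize additionally needs each group's rows to be contiguous.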

Both also report different effect sizes for paired tests altogether; see #49.

@mtorchiano (Owner)

Thank you for your report.

I am fixing the issue of dependency on order.

I am also adding a warning similar to the one issued by `lsr::cohensD`.
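One order-independent alternative, when a subject ID column is available, is to pair observations by ID rather than by row position. The sketch below is an assumption of how that could look, not necessarily what the referenced commit does; paired_diffs and its arguments are hypothetical. Note that the formula value ~ group carries no subject ID, so a function that receives only that formula cannot do this matching by itself, which is presumably why lsr warns instead:

```r
# Hypothetical helper (not effsize's API): per-subject paired differences,
# matched by subject ID so that row order does not matter
paired_diffs <- function(data, id = "subject", group = "group", value = "value") {
  lv <- levels(factor(data[[group]]))
  stopifnot(length(lv) == 2)
  a <- data[data[[group]] == lv[1], c(id, value)]
  b <- data[data[[group]] == lv[2], c(id, value)]
  m <- merge(a, b, by = id, suffixes = c(".1", ".2"))  # sorted by subject ID
  m[[paste0(value, ".2")]] - m[[paste0(value, ".1")]]  # level 2 minus level 1
}

set.seed(42)
n <- 6
before <- runif(n)
after  <- before - 0.35 * runif(n)
df <- data.frame(
  subject = rep(1:n, 2),
  group   = rep(c("before", "after"), each = n),
  value   = c(before, after)
)
shuffled <- df[sample(nrow(df)), ]

# Same differences regardless of row order ("before" minus "after" here)
identical(paired_diffs(df), paired_diffs(shuffled))
sd(paired_diffs(shuffled))
```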

mtorchiano added a commit that referenced this issue Apr 9, 2020