cohen.d.formula(..., paired=T) gives different results depending on data's sorting order #48

carstenschwede · 2020-02-16T23:54:10Z

This is related to #27 but applies to paired data. In the paired=T case, effsize::cohen.d.formula expects the data to be arranged by group first. Although this is a standard convention in non-formula use cases, it is usually not expected in the formula interface (e.g. neither t.test(..., paired=T) nor other Cohen's d implementations expect the data to be ordered like this).

Code that made me discover this behaviour (see bottom of post for working example):

    dfA <- aggregate(value~subject+group, data=df, FUN=mean)
    d1 <- effsize::cohen.d(value~group, data=dfA, paired=T)
    #d estimate: -0.6141064 (medium)

    dfB <- aggregate(value~group+subject, data=df, FUN=mean)
    d2 <- effsize::cohen.d(value~group, data=dfB, paired=T)
    #d estimate: -0.1663497 (negligible)

Relevant code section in CohenD.R is here:

    if( paired ){
        [...]
        s.dif = sd(diff(d,lag=n1))
        [...]
    }
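To illustrate why that line is order-sensitive: `diff(x, lag = n1)` subtracts element i from element i + n1, so it only yields per-subject differences when the first n1 values all belong to one group and the remaining values belong to the other, in matching subject order. A minimal base-R sketch with made-up numbers (the vectors and variable names are illustrative assumptions, not taken from effsize):

```r
# Illustrative data: 3 subjects, each measured before and after.
before <- c(10, 20, 30)
after  <- c(11, 22, 33)
n1 <- length(before)

# Layout A: arranged by group (all "before" values, then all "after" values).
by_group <- c(before, after)
diff_grouped <- diff(by_group, lag = n1)        # after[i] - before[i] per subject

# Layout B: arranged by subject (before/after interleaved).
by_subject <- as.vector(rbind(before, after))   # 10, 11, 20, 22, 30, 33
diff_interleaved <- diff(by_subject, lag = n1)  # subtracts values of different subjects

print(diff_grouped)      # the true per-subject differences
print(diff_interleaved)  # not per-subject differences
```

With the by-subject layout the lagged differences no longer correspond to any subject, which would be consistent with the different estimates reported above.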

Working example showing the issue:

    #Setup some dummy data
    set.seed(1234)
    numSubjects <- 20
    subject <- 1:numSubjects
    before <- runif(numSubjects)
    after <- before - 0.35*runif(length(before))
    df <- data.frame(
      subject=rep(subject,2),
      group=c(
        rep("before",numSubjects),
        rep("after",numSubjects)
      ),
      value=c(before, after)
    )

    ##################################

    valueBefore <- df[df$group == "before",]$value
    valueAfter <- df[df$group == "after",]$value

    # d1 == d2
    d1 <- effsize::cohen.d(valueAfter, valueBefore, paired=T)
    d2 <- effsize::cohen.d(value~group, data=df, paired=T)

    ##################################

    # Sorted by subjectId
    df <- df[order(df$subject),]
    valueBefore <- df[df$group == "before",]$value
    valueAfter <- df[df$group == "after",]$value

    # d1==d2==d3 is the same: d estimate: -0.5307307 (medium)
    # but d4 is not: -0.145407 (negligible)

    d3 <- effsize::cohen.d(valueAfter, valueBefore, paired=T)
    d4 <- effsize::cohen.d(value~group, data=df, paired=T)

    ##################################

    # Sorted by group
    df <- df[order(df$group),]
    valueBefore <- df[df$group == "before",]$value
    valueAfter <- df[df$group == "after",]$value

    # d1 == d2 == d3 == d5 == d6 are the same (although still false due to #49)
    d5 <- effsize::cohen.d(valueAfter, valueBefore, paired=T)
    d6 <- effsize::cohen.d(value~group, data=df, paired=T)

    ##################################


carstenschwede · 2020-02-17T01:27:26Z

By comparison, neither lsr::cohensD nor rstatix::cohens_d is affected by the sorting order; lsr::cohensD, however, gives a reasonable warning:

calculating paired samples Cohen's d using formula input. Results will be incorrect if cases do not appear in the same order for both levels of the grouping factor

They also report different effect sizes for paired tests altogether; please see #49.
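One way to make the pairing independent of row order is to match observations on an explicit subject identifier before taking differences. The following is only a sketch in base R: `paired_diffs` is a hypothetical helper, not a function from effsize, lsr, or rstatix, and the small data frame is invented for illustration.

```r
# Hypothetical helper (not part of effsize, lsr, or rstatix): computes
# per-subject differences by matching rows on a subject id rather than
# relying on row order. `levels` names the two groups: first minus second.
paired_diffs <- function(data, value, group, subject, levels) {
  a <- data[data[[group]] == levels[1], ]
  b <- data[data[[group]] == levels[2], ]
  # Align the second group's rows to the first group's subject order.
  idx <- match(a[[subject]], b[[subject]])
  stopifnot(nrow(a) == nrow(b), !anyNA(idx))
  a[[value]] - b[[value]][idx]
}

# Invented data: 4 subjects, arranged by group.
df <- data.frame(
  subject = rep(1:4, 2),
  group   = rep(c("before", "after"), each = 4),
  value   = c(5, 6, 7, 8, 4, 4, 5, 5)
)

d_grouped <- paired_diffs(df, "value", "group", "subject", c("after", "before"))

# Same rows in an arbitrary fixed order.
shuffled   <- df[c(8, 1, 6, 3, 2, 7, 4, 5), ]
d_shuffled <- paired_diffs(shuffled, "value", "group", "subject", c("after", "before"))

# The differences come out in a different order, but their mean and
# standard deviation do not depend on how the rows were arranged.
print(c(mean(d_grouped), mean(d_shuffled)))
print(c(sd(d_grouped), sd(d_shuffled)))
```

The mean and standard deviation of the differences are the order-sensitive ingredients of a paired effect size; which formula should then be applied to them is a separate question (see #49).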

mtorchiano · 2020-04-09T13:38:22Z

Thank you for your report.

I am fixing the issue of dependency on order.

I am also adding a warning similar to the one reported by `lsr::cohensD`.


carstenschwede mentioned this issue Feb 17, 2020

Effect size is wrong for paired data #49

Closed

mtorchiano added a commit that referenced this issue Apr 9, 2020

Fixed a bug related to order of data and added a subject parameter for paired cohen.d Fixes issue #48

d83e313

mtorchiano closed this as completed Apr 9, 2020
