Goals of StatCheck? #97

steveharoz · 2024-06-08T04:12:14Z

I'm worried that some goals of StatCheck are in contradiction with each other. Here is my understanding of the goals:

1. Find statistical tests in text (from a string to a data frame)
2. Check for statistical consistency (is the reported p-value within range of the computed p-value?)
3. Check for incomplete reporting (look for p-values without the rest of the details)
4. Check for APA-compliant formatting (is the report formatted correctly?)

While I understand the value of checking for APA formatting, it currently seems to be impeding finding and checking as many tests as possible. Maybe it's worth extracting tests in the broadest set of formats possible and then checking for APA formatting in an optional second pass, rather than not reporting on noncompliant tests at all?
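
A rough sketch of that two-pass idea, in Python for illustration only (statcheck is an R package, and none of these names come from it). The broad pattern is an assumption covering just t-tests with round or square brackets and comma or semicolon separators; the strict pattern is a simplified stand-in for the APA format.

```python
import re

# Pass 1 (broad): tolerate square brackets, semicolons, and loose spacing.
BROAD_T = re.compile(
    r"\bt\s*[\(\[]\s*(?P<df>\d+(?:\.\d+)?)\s*[\)\]]"
    r"\s*=\s*(?P<value>-?\d*\.?\d+)"
    r"\s*[,;]\s*p\s*(?P<cmp>[<>=])\s*(?P<p>\d?\.\d+)"
)

# Pass 2 (strict): a simplified APA-style t-test report.
STRICT_APA_T = re.compile(r"t\(\d+(?:\.\d+)?\) = -?\d*\.?\d+, p [<>=] \.\d+")


def extract_t_tests(text: str, check_apa: bool = False) -> list[dict]:
    """Extract t-tests broadly; optionally flag APA compliance afterwards."""
    results = []
    for match in BROAD_T.finditer(text):
        row = {
            "raw": match.group(0),
            "df": float(match.group("df")),
            "value": float(match.group("value")),
            "comparison": match.group("cmp"),
            "p": float(match.group("p")),
        }
        if check_apa:
            # The second pass only annotates; it never drops a result.
            row["apa_compliant"] = STRICT_APA_T.fullmatch(row["raw"]) is not None
        results.append(row)
    return results


if __name__ == "__main__":
    sample = "We found t(28) = 2.20, p = .04 and t[15] = 1.10; p = .29."
    for row in extract_t_tests(sample, check_apa=True):
        print(row["raw"], row["apa_compliant"])
```

The point of the design is that formatting problems become a column in the output instead of a reason to skip the test.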

@MicheleNuijten Thoughts?

MicheleNuijten · 2024-08-19T11:19:46Z

Sorry for the late reply, I missed that there were new issues opened.

You're right that, ideally, statcheck would find everything, regardless of APA formatting. There are two reasons it currently sticks to APA only:

1. It creates very clear inclusion/exclusion criteria for which stats are and aren't retrieved. This is mainly useful for research purposes, when you intend to scrape a large batch of papers.
2. A risk with allowing more types of formatting is that statcheck will make more mistakes retrieving and recalculating the stats. It might misread certain results and/or wrongly classify the type of test. I've always opted to err on the side of conservatism, because flagging potential errors in the work of others remains a sensitive topic.

The step where statcheck looks for p-values alone can give people some indication of what percentage of stats were retrieved. If a paper has an APA factor of .60, you can be quite sure that several tests were not picked up, and, vice versa, an APA factor of >.80 can increase your confidence that you checked most of the reported stats.
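
As an illustration of how such a ratio could be computed, here is a minimal Python sketch. It assumes the APA factor is the number of fully extracted APA-style results divided by the number of p-values found anywhere in the text; the two regular expressions are simplified stand-ins (t- and F-tests only) and are not statcheck's own patterns.

```python
import re
from typing import Optional

# Any p-value, with or without an accompanying test statistic.
P_VALUE = re.compile(r"\bp\s*[<>=]\s*\d?\.\d+")

# Simplified APA-style results: t(df) or F(df1, df2), then the p-value.
APA_RESULT = re.compile(
    r"\b(?:t\(\d+(?:\.\d+)?\)|F\(\d+, \d+\)) = -?\d*\.?\d+, p [<>=] \.\d+"
)


def apa_factor(text: str) -> Optional[float]:
    """Share of detected p-values that belong to a complete APA-style result.

    Returns None when the text contains no p-values at all.
    """
    n_p_values = len(P_VALUE.findall(text))
    if n_p_values == 0:
        return None
    n_apa_results = len(APA_RESULT.findall(text))
    return n_apa_results / n_p_values


if __name__ == "__main__":
    sample = (
        "t(28) = 2.20, p = .04. The effect was significant (p < .001). "
        "F(1, 30) = 4.50, p = .04; also p = .20 and p = .03."
    )
    print(apa_factor(sample))
```

In the sample text, two of the five p-values come with a complete test, so most of the reported stats would go unchecked.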

MicheleNuijten · 2024-08-19T11:21:29Z

Following up: the new version of statcheck will include an option to also look for non-APA formatting (see the branch feature-non-apa-new). I've opted for a modular approach that combines all kinds of small deviations from APA (square brackets, subscripts, semicolons, etc.). So far, it's been a bit of a pain to make sure all combinations and deviations work well and don't break any existing code. Work in progress!

MicheleNuijten closed this as completed Aug 19, 2024
