Goals of StatCheck? #97

Closed
steveharoz opened this issue Jun 8, 2024 · 2 comments

Comments

@steveharoz

steveharoz commented Jun 8, 2024

I'm worried that some of StatCheck's goals conflict with each other. Here is my understanding of the goals:

  1. Find statistical tests in text (from string to a data frame)
  2. Check for statistical consistency (check whether the reported p-value falls within the range of the recomputed p-value)
  3. Check for incomplete reporting (look for p-values reported without the rest of the details)
  4. Check for APA-compliant formatting (is the report formatted correctly?)
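To make goals 1 and 2 concrete, here is a minimal Python sketch (not statcheck's own code, which is written in R) that turns z tests found in text into rows and checks each reported p-value against the recomputed two-tailed p-value. The regex, the function names and the rounding tolerance are assumptions for illustration; z tests are used only because their p-values need nothing beyond the standard library.

```python
import math
import re

# Hypothetical pattern for z tests reported as "z = 2.10, p = .036".
# Only exact ("=") p-values are handled; "<" and ">" are omitted for brevity.
Z_TEST = re.compile(r"z\s*=\s*(-?\d*\.?\d+),\s*p\s*=\s*(\d*\.\d+)")


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def decimals(number: str) -> int:
    """Number of digits after the decimal point in a reported value."""
    return len(number.split(".")[1]) if "." in number else 0


def check_z_tests(text: str) -> list:
    rows = []
    for m in Z_TEST.finditer(text):
        z_str, p_str = m.group(1), m.group(2)
        z, p_reported = float(z_str), float(p_str)
        # The reported z was rounded, so the true |z| lies within half a unit
        # of its last digit; compute the p-values at both edges of that interval.
        z_half = 0.5 * 10 ** -decimals(z_str)
        p_min = two_tailed_p(abs(z) + z_half)  # larger |z| -> smaller p
        p_max = two_tailed_p(max(abs(z) - z_half, 0.0))
        # The reported p was rounded too, so widen the interval accordingly.
        p_half = 0.5 * 10 ** -decimals(p_str)
        rows.append({
            "raw": m.group(0),  # goal 1: one row per extracted test
            "z": z,
            "p_reported": p_reported,
            "p_computed": two_tailed_p(z),
            # goal 2: is the reported p compatible with the recomputed range?
            "consistent": p_min - p_half <= p_reported <= p_max + p_half,
        })
    return rows


for row in check_z_tests("An effect, z = 2.10, p = .036, and another, z = 2.10, p = .01."):
    print(row["raw"], row["consistent"])
```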

While I understand the value of checking for APA formatting, it currently seems to be getting in the way of finding and checking as many tests as possible. Maybe it's worth extracting tests in the broadest possible set of formats and then checking APA formatting in an optional second pass, rather than not reporting noncompliant tests at all?
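The two-pass idea could look roughly like this hypothetical Python sketch: a lenient pattern extracts results in several formats, and an optional second pass flags whether each one is APA-compliant instead of discarding it. Both patterns and both function names are assumptions, and they only cover a small subset of t and F tests.

```python
import re

# Hypothetical lenient pattern for t and F tests: df in parentheses or
# square brackets, and a comma or semicolon before the p-value.
LENIENT = re.compile(
    r"(?P<test>[tF])\s*[(\[](?P<df>[\d.,\s]+)[)\]]\s*=\s*(?P<stat>-?\d*\.?\d+)"
    r"\s*[,;]\s*p\s*(?P<comp>[<>=])\s*(?P<p>\d*\.\d+)"
)

# Hypothetical stand-in for "APA-compliant": parentheses and a comma only.
STRICT = re.compile(r"[tF]\s*\([\d.,\s]+\)\s*=\s*-?\d*\.?\d+,\s*p\s*[<>=]\s*\d*\.\d+")


def extract(text: str) -> list:
    """Pass 1: extract every result in any supported format."""
    return [dict(m.groupdict(), raw=m.group(0)) for m in LENIENT.finditer(text)]


def annotate_apa(results: list) -> list:
    """Pass 2 (optional): flag formatting instead of dropping results."""
    for r in results:
        r["apa_compliant"] = STRICT.fullmatch(r["raw"]) is not None
    return results


for r in annotate_apa(extract("t(28) = 2.20, p = .036 and t[28] = 2.20; p = .036")):
    print(r["raw"], r["apa_compliant"])
```

Keeping the second pass separate means a user who only cares about consistency can skip it, while a user who wants the strict APA inclusion criteria can filter on the flag.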

@MicheleNuijten Thoughts?

@MicheleNuijten
Owner

Sorry for the late reply, I missed that there were new issues opened.

You're right that, ideally, statcheck would find everything, regardless of APA formatting. There are two reasons it currently sticks to APA only:

  1. it creates very clear inclusion/exclusion criteria for which stats are and aren't retrieved. This is mainly useful for research purposes, when you intend to scrape a large batch of papers.
  2. a risk of allowing more types of formatting is that statcheck will make more mistakes when retrieving and recalculating the stats. It might misread certain results and/or misclassify the type of test. I've always opted to err on the side of conservatism, because flagging potential errors in the work of others remains a sensitive topic.

The step where statcheck looks for p-values alone can give people some indication of what percentage of stats was retrieved. If a paper has an APA factor of .60, you can be quite sure that several tests were not picked up; conversely, an APA factor above .80 can increase your confidence that you checked most of the reported stats.
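Assuming the APA factor is the number of fully reported, APA-formatted results divided by the number of p-values detected (our reading of how statcheck summarizes this, not verified against its source), it could be sketched as follows. The detectors are hypothetical and far narrower than statcheck's.

```python
import re
from typing import Optional

# Hypothetical detectors; statcheck's own rules cover many more test types.
P_VALUE = re.compile(r"\bp\s*[<>=]\s*\d*\.\d+")
APA_RESULT = re.compile(r"\b[tF]\s*\([\d.,\s]+\)\s*=\s*-?\d*\.?\d+,\s*p\s*[<>=]\s*\d*\.\d+")


def apa_factor(text: str) -> Optional[float]:
    """Share of detected p-values that belong to a fully reported APA result."""
    n_p = len(P_VALUE.findall(text))
    if n_p == 0:
        return None  # no p-values at all, so the ratio is undefined
    return len(APA_RESULT.findall(text)) / n_p


print(apa_factor("t(28) = 2.20, p = .036; the second effect was also significant, p < .01."))
```

Here one complete APA result plus one bare p-value gives a factor of .50, signalling that half of the detected p-values could not be checked.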

@MicheleNuijten
Owner

Following up: the new version of statcheck will include an option to also look for non-APA formatting (see the branch feature-non-apa-new). I've opted for a modular approach that combines all kinds of small deviations from APA (square brackets, subscripts, semicolons, etc.). So far, it's been a bit of a pain to make sure all combinations of deviations work well and don't break any existing code. Work in progress!
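The modular approach might be sketched like this in Python: each deviation from APA is an independent switch, and the extraction pattern is assembled from whichever switches are on. This is not the code on the feature-non-apa-new branch; the `Deviations` class, its fields and `build_pattern` are hypothetical, and "subscripts" is modelled only as a single flattened character after the test letter (as in `F1`).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Deviations:
    # Hypothetical switches, one per small deviation from APA.
    square_brackets: bool = False  # "t[28]" as well as "t(28)"
    subscripts: bool = False       # flattened subscript, e.g. "F1(1, 28)"
    semicolons: bool = False       # "; p = ..." as well as ", p = ..."


def build_pattern(dev: Deviations) -> re.Pattern:
    """Assemble a t/F extraction pattern from the enabled deviations."""
    test = r"[tF]\w?" if dev.subscripts else r"[tF]"
    if dev.square_brackets:
        open_, close = r"[(\[]", r"[)\]]"
    else:
        open_, close = r"\(", r"\)"
    sep = r"[,;]" if dev.semicolons else ","
    # With every switch off, this reduces to a strict APA-style pattern.
    return re.compile(
        test + r"\s*" + open_ + r"[\d.,\s]+" + close
        + r"\s*=\s*-?\d*\.?\d+\s*" + sep + r"\s*p\s*[<>=]\s*\d*\.\d+"
    )


strict = build_pattern(Deviations())
loose = build_pattern(Deviations(square_brackets=True, subscripts=True, semicolons=True))
print(bool(strict.search("F1[1, 28] = 5.10; p = .03")))
print(bool(loose.search("F1[1, 28] = 5.10; p = .03")))
```

With three independent switches there are already 2³ = 8 patterns to test (e.g. by looping over `itertools.product([False, True], repeat=3)`), which illustrates why checking every combination against existing behaviour takes effort.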
