
Additional datapoints: domain and inclusion criteria #94

Open

nwagu opened this issue Mar 21, 2023 · 13 comments

@nwagu commented Mar 21, 2023

This may not be strictly necessary for active learning, but it makes the data more meaningful and accessible on its own. In a structured format, it can be read with scripts without needing to go back to the source of the data.

I would very much prefer the inclusion criteria to be a list of criteria, all phrased as boolean questions. This is important for a project I am working on.

And domain: the general field of the research, so researchers can be selective.

@J535D165 (Member)

Thanks @nwagu, great suggestions. We are working on a brand new dataset collection.

> And domain: the general field of the research, so researchers can be selective.

We will add this in the next release of the dataset!

> This may not be strictly necessary for active learning, but it makes the data more meaningful and accessible on its own. In a structured format, it can be read with scripts without needing to go back to the source of the data.
>
> I would very much prefer the inclusion criteria to be a list of criteria, all phrased as boolean questions. This is important for a project I am working on.

I agree, this could be very interesting indeed; I have thought about it as well. Collecting these criteria can be a tremendous amount of work. In the upcoming version of the dataset collection, we will ensure that every dataset has the systematic review publication attached to it. That makes it easier to search for these variables and hopefully compile a list of inclusion criteria as well.

Can you give an example of the format you would like to see?

@nwagu (Author) commented Mar 21, 2023

Okay. I am looking forward to the new dataset and am available to contribute to it.

The criteria can be adapted from the paper, for example:

  1. Does the passage discuss a digital game for human players? (Stated in the paper as "The paper discusses a digital game that is designed for human players.")

  2. Does the passage address realism or fidelity in the context of a digital game or digital games in general? (Stated as "The paper addresses realism in the context of that digital game or digital games in general")

So included papers can be assumed to have abstracts that answer yes to all the questions.
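
For example, as a rough machine-readable sketch (my own idea of the format, not taken from the paper):

criteria = [
    "Does the passage discuss a digital game for human players?",
    "Does the passage address realism or fidelity in the context of a digital game or digital games in general?",
]

def is_included(answers):
    # One boolean answer per criterion, in the same order as the list above.
    return all(answers)

print(is_included([True, True]))   # True -> the paper is included
print(is_included([True, False]))  # False -> the paper is excluded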

I feel it would be easy (though manual) to get the questions from HCI review papers; I am not sure about other fields.

@J535D165 (Member)

Thanks for the example! I discussed this with the team, and this example helps. They think it is definitely interesting, and we will look into it.

Question: do you have a reference to the study you are referring to? It looks like a nice addition to our work (if the data is available as well).

@J535D165 (Member)

I took a paper from our new collection (https://jcircadianrhythms.com/articles/10.5334/jcr.183) and asked ChatGPT to extract the questions in the form you requested.

Hi, this is a part of an academic paper on a systematic review:

"Screening was performed in EROS (Early Review Organising Software; Institute of Clinical Effectiveness and Health Policy, Buenos Aires, Argentina) by two independent reviewers. Discrepancies were resolved by discussion among the reviewers. For our mapping review [17], we excluded (1) studies on other techniques than microdialysis (e.g. biosensors and microdialysis precursors such as push-pull perfusion), (2) studies measuring other substances than Hist and the amino acids Asn, Asp, GABA, Glu, Gln, Gly, Pro and Tau, (3) retro-dialysis studies, (4) microdialysis studies that did not report baseline values without the specified molecules in the perfusion fluid, (5) extra-cerebral microdialysis studies, (6) human and in vitro studies, and (7) papers not containing primary study data. Within publications, experiments using techniques other than microdialysis were also ignored (e.g. in [22]), as well as data on amino acids other than those we searched for (e.g. in [23]).

During screening, tags were added by KJ and CL to all studies on circadian rhythms, sleep and sleep deprivation. Tagged studies were subsequently screened for inclusion in this review based on the following criterion (besides being included in our mapping): studies measuring one or more of the molecules of interest during (1) naturally occurring sleep stages that were validated with polysomnographic measurements and/or (2) during sleep deprivation. Studies on e.g. carbachol-induced REM-sleep were thus excluded (e.g. in [23]).

To ensure capturing all relevant studies, we searched for studies with the terms “sleep”, “REM”, “rest”, “fatig”, and “somn” in the title within the studies included in our mapping review."

Can you help me to make a list of the inclusion criteria in the form of boolean questions?

The answer:

Sure! Here is a list of the inclusion criteria in the form of boolean questions:

Was the study conducted using microdialysis technique?
Did the study measure Hist and at least one of the following amino acids: Asn, Asp, GABA, Glu, Gln, Gly, Pro, or Tau?
Was the study a retro-dialysis study?
Did the study report baseline values without the specified molecules in the perfusion fluid?
Was the study an extra-cerebral microdialysis study?
Was the study a human or in vitro study?
Did the paper contain primary study data?
Did the study measure one or more of the molecules of interest during naturally occurring sleep stages that were validated with polysomnographic measurements?
Did the study measure one or more of the molecules of interest during sleep deprivation?
Did the study include any of the following terms in the title: "sleep", "REM", "rest", "fatig", or "somn"?
Was the study not using techniques other than microdialysis, or did it not report data on amino acids other than those we searched for?
I hope this helps! Let me know if you have any further questions.

So this looks quite good and interesting to add.

I also asked to make it into a JSON object, which is interesting:

{
  "criteria": [
    {
      "question": "Was the study conducted using microdialysis technique?",
      "exclusion": false
    },
    {
      "question": "Did the study measure Hist and at least one of the following amino acids: Asn, Asp, GABA, Glu, Gln, Gly, Pro, or Tau?",
      "exclusion": false
    },
    {
      "question": "Was the study a retro-dialysis study?",
      "exclusion": true
    },
    {
      "question": "Did the study report baseline values without the specified molecules in the perfusion fluid?",
      "exclusion": true
    },
    {
      "question": "Was the study an extra-cerebral microdialysis study?",
      "exclusion": true
    },
    {
      "question": "Was the study a human or in vitro study?",
      "exclusion": true
    },
    {
      "question": "Did the paper contain primary study data?",
      "exclusion": true
    },
    {
      "question": "Did the study measure one or more of the molecules of interest during naturally occurring sleep stages that were validated with polysomnographic measurements?",
      "exclusion": false
    },
    {
      "question": "Did the study measure one or more of the molecules of interest during sleep deprivation?",
      "exclusion": false
    },
    {
      "question": "Did the study include any of the following terms in the title: '*sleep*', '*REM*', '*rest*', '*fatig*', or '*somn*'",
      "exclusion": false
    },
    {
      "question": "Was the study not using techniques other than microdialysis, or did it not report data on amino acids other than those we searched for?",
      "exclusion": true
    }
  ]
}
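
A quick sketch of how a script could consume this JSON (the answer_question helper is hypothetical, and the exclusion flags are taken at face value):

import json

# Load the criteria above, assuming they are stored in criteria.json.
with open("criteria.json") as f:
    criteria = json.load(f)["criteria"]

def is_included(abstract, answer_question):
    # answer_question(question, abstract) -> True/False (hypothetical).
    for criterion in criteria:
        answer = answer_question(criterion["question"], abstract)
        # The paper stays included only if inclusion questions are answered
        # "yes" and exclusion questions are answered "no".
        if answer == criterion["exclusion"]:
            return False
    return True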

I think your idea is very nice, and we might want to include this for all datasets or as an extension in the future.

@nwagu (Author) commented Mar 25, 2023

ChatGPT to the rescue! I like the JSON format too.

My example was from https://dl.acm.org/doi/pdf/10.1145/3491102.3501875. The data they have provided is only the included papers, in a PDF.

@rohitgarud

I think ChatGPT is not the only option; a T5 model could be fine-tuned for this particular task.

@TimRepke

We would also very much appreciate this addition. Would it help to crowd-source this? If a bunch of people do one or two reviews each, it is a manageable effort. I'd volunteer to do two!

This would need some instructions though, so that the result contains what you'd like, in the form you think is best.

@mcallaghan

Please do not use ChatGPT or T5 or any other machine learning approach to create the inclusion criteria!

The SYNERGY dataset serves an important purpose in that we can use it to test how well machine learning techniques work. It is therefore important that the data it contains (including the inclusion criteria) is high quality and reliable.

The example given above looks OK at first glance, but read the last one:

    {
      "question": "Was the study not using techniques other than microdialysis, or did it not report data on amino acids other than those we searched for?",
      "exclusion": true
    }

If a study was using only microdialysis, then it was "not using techniques other than microdialysis", so it would pass this question and be excluded! Further, the original criteria included studies measuring one or more of the molecules of interest during (1) naturally occurring sleep stages that were validated with polysomnographic measurements and/or (2) during sleep deprivation. Yet the ChatGPT version splits these into two separate criteria, turning the and/or into a simple and. It also mixes up inclusion criteria with the search strategy.

I think there are a couple of tasks involved here. The first is simply to locate the text in the paper. The second is to turn this into a structured form. I actually think just the first part would already be extremely helpful (and a task for humans, not ChatGPT). It may or may not be a good idea to use ChatGPT to turn this text into a structured format. The first example shows that it may be more difficult than it looks, but if we want to use a model for this task, we should first create a hand-labelled dataset to be able to assess how well the model performs it. So in any case we would need to create, by hand, an ideal structured version of the inclusion criteria.

I think 26 papers is a small enough number that this would be feasible (at least for a sample of cases), and I don't think this number of papers justifies fine-tuning a model.

I'd be happy to volunteer to extract inclusion criteria for 19% of your systematic reviews.

@Rensvandeschoot (Member)

@mcallaghan Thanks for your response!

I have copied and pasted the text containing the inclusion criteria for all SYNERGY papers into a table. The next step is to create a standardized format for presenting the criteria. Do you have ideas on how to present these?

@Rensvandeschoot (Member)

I observed three types of criteria:

1) Meta Criteria
These are often also part of the search query but are checked once again during the screening phase. They are general requirements applicable to all studies being considered for inclusion, often used to ensure consistency in language, accessibility, and document type.

Examples I came across:

  • Works can be in English, German, Dutch, French, Spanish, or Portuguese; or non-English studies are excluded.
  • Full-text access required; articles with only abstract access are excluded.
  • Include journal papers and conference proceedings; exclude reviews, editorials, case reports, letters, abstract-only publications, books, book chapters, and non-peer-reviewed sources.
  • Restrictions on the publication date range of the included studies.

2) Generic Criteria
These are broad inclusion criteria applicable to many studies regardless of the specific research question or topic. They help define the general characteristics of the included studies.

Examples:

  • Include prospective or retrospective studies, randomized controlled trials (RCTs), non-randomized controlled trials, and comparative observational studies.
  • Exclude case reports, cross-sectional studies, and non-controlled studies.
  • Include human subjects, with specific populations such as adults with certain conditions (e.g., cancer, knee osteoarthritis); exclude pediatric populations under a specific age, non-human organisms, and certain conditions such as late-life MDD.
  • Include specified drugs or therapies (e.g., DPen, trientine for WD). Comparators can include placebo, no treatment, or different treatments, excluding the same drug in different doses.

3) Specific Criteria
These criteria are unique to the particular topic or question under investigation.

@mcallaghan commented Aug 6, 2024

Thanks very much for this!

I think a useful first step would be to have all the inclusion criteria, exactly as they are written, in text form. For experiments using LLMs for screening, it would be interesting to see whether providing the criteria as written helps identify studies better.

I think it could also be interesting to list each of these, perhaps in a form similar to that suggested above, although I would consider a simplification:

{
  "inclusion_criteria": [
    "Study includes human subjects",
    "Study is written in English",
    "..."
  ],
  "exclusion_criteria": [
    "Study is a human/in vitro study",
    "Study is a case study"
  ]
}
This is, at least to my mind, easier to understand. I think it is also crucial to pay attention to double negatives. One could consider writing the exclusion criteria in negative form ("Study must not be a case study"), but I think the positive form is slightly easier.

Written like this, one could think of looping through each of the inclusion criteria and each of the exclusion criteria, asking an LLM to provide an answer for each. Then you could come to a decision based on the combination of these answers (see the sketch after the example below).

One final thing is the boolean logic. I am assuming that a study is included if ALL of the inclusion criteria apply, and a study is excluded if ANY of the exclusion criteria apply. However, this might not always be the case. If not, one might want to make the logic explicit:

{
  "inclusion_criteria": (
    "Study includes human subjects" &
    (
      "Study is written in English" |
      "Study is written in Spanish"
    )
  ),
  "exclusion_criteria": (
    "Study is a human/in vitro study" |
    "Study is a case study"
  )
}
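
As a minimal sketch of this loop-and-combine approach under the ALL/ANY assumption (ask_llm is a hypothetical helper standing in for whatever per-criterion prompt you use):

def screen(abstract, inclusion_criteria, exclusion_criteria, ask_llm):
    # ask_llm(criterion, abstract) -> True/False (hypothetical helper).
    inclusion_answers = [ask_llm(c, abstract) for c in inclusion_criteria]
    exclusion_answers = [ask_llm(c, abstract) for c in exclusion_criteria]
    # Included only if ALL inclusion criteria hold and NO exclusion criterion does.
    return all(inclusion_answers) and not any(exclusion_answers)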

@mcallaghan

It is also interesting that we have criteria like "Full-text access required".

Automated approaches will never be able to replicate this unless we knew exactly which journals the reviewers had access to, and which papers had open full text available (on ResearchGate, requested through ResearchGate, on Google Scholar, etc.).

It makes me wonder whether these criteria are all applied at the title and abstract level. If some distinction is made, then it may be wise to separate title-and-abstract inclusion criteria from full-text inclusion criteria.

@mpbron commented Aug 26, 2024

It would be nice to make the criteria more easily accessible for more datasets! My suggestion would be to use YAML over JSON.

> Written like this, one could think of looping through each of the inclusion criteria and each of the exclusion criteria, asking an LLM to provide an answer for each. Then you could come to a decision based on the combination of these answers.

In our recent paper (to be presented at IAL@ECML-PKDD 2024), we did exactly this!
In our experiments, the criteria were formatted as follows:

criteria:
  - a: |
      Is the study a longitudinal/prospective study with at least three time-point assessments measuring posttraumatic stress disorder (PTSD)?
      Answer with YES if it is a longitudinal/prospective study with at least three time-point assessments measuring PTSD.
      Answer with NO if the study is not longitudinal or prospective or does not measure PTSD.
      Answer with UNKNOWN if the study is longitudinal but the number of time-point assessments is not mentioned.
  - b: |
      Does the study assess PTSD symptoms as a continuous variable using an eligible PTSD scale?
      Here are some eligible PTSD scales (answer with YES if the scale is in this list) that measure PTSD as a continuous variable:
      * Clinician Administered PTSD Scale (CAPS)
      * PTSD Checklist (PCL)
      [...]

We transformed the screening protocol into a more verbose version to make it more suitable for the LLM. The idea behind this formatting is as follows: we use YAML for its better multiline support compared to JSON. Some criteria contain long lists of eligible methods/scales; entering those in JSON is not fun.

Next, every criterion has an identifier (in our work, $a, b, c, d$), but this can be any string.
In the case study of our paper, a study can be included if $a \wedge b \wedge c \wedge d$. For now, the calculation of this decision function was hard-coded in our Python code, but specifying the decision function (for example, in the YAML file) could also be a future option. The logic should also account for unknowns, as not every criterion can be decided based on the title and abstract. We used ternary logic (True, False, Unknown) for this in our paper.
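
For illustration, a minimal sketch of such a ternary conjunction (the answers below are made up; this is not the code from our paper):

def ternary_and(values):
    # Kleene AND over True/False/None, with None standing in for UNKNOWN.
    if any(v is False for v in values):
        return False  # one definite NO excludes the study
    if all(v is True for v in values):
        return True   # every criterion is a definite YES
    return None       # no NO, but at least one UNKNOWN remains

# Hypothetical answers for criteria a-d; b cannot be decided from the abstract.
answers = {"a": True, "b": None, "c": True, "d": True}
print(ternary_and(answers.values()))  # None -> defer to full-text screening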

To summarize, I would propose the following format (using the example of @mcallaghan):

criteria:
  - human:  "Study includes human subjects"
  - english: "Study is written in English"
  - spanish: "Study is written in Spanish"
  - invitro: "Study is a human/in vitro study"
  - casestudy: "Study is a case study"
decision_function:
  - main: "inclusion AND NOT exclusion"
  - inclusion: "human AND (english OR spanish)"
  - exclusion: "invitro OR casestudy"

Encoding the decision function fully in YAML may also be an option.
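
A minimal sketch of how this format could be evaluated, assuming PyYAML and plain boolean answers (hypothetical values; ternary logic would need a slightly richer evaluator):

import yaml

spec = yaml.safe_load("""
criteria:
  - human: "Study includes human subjects"
  - english: "Study is written in English"
  - spanish: "Study is written in Spanish"
  - invitro: "Study is a human/in vitro study"
  - casestudy: "Study is a case study"
decision_function:
  - main: "inclusion AND NOT exclusion"
  - inclusion: "human AND (english OR spanish)"
  - exclusion: "invitro OR casestudy"
""")

# Hypothetical per-criterion answers (e.g. collected from an LLM).
env = {"human": True, "english": False, "spanish": True,
       "invitro": False, "casestudy": False}

rules = {k: v for d in spec["decision_function"] for k, v in d.items()}
# Evaluate named rules bottom-up so "inclusion"/"exclusion" are defined
# before "main" refers to them.
for name, expr in reversed(list(rules.items())):
    py = expr.replace("AND", "and").replace("OR", "or").replace("NOT", "not")
    env[name] = eval(py, {"__builtins__": {}}, env)

print(env["main"])  # True -> the study is included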
