Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update HACCP_term regex to required FOODON, add multivalue example #802

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

mslarae13
Copy link
Contributor

@mslarae13 mslarae13 commented Jun 14, 2024

Address syntax match to examples
Update regexs for MIxS

Based on the description for HACCP this requires the FOODON ontology.
description: Hazard Analysis Critical Control Points (HACCP) food safety terms;
This field accepts terms listed under HACCP guide food safety term (http://purl.obolibrary.org/obo/FOODON_03530221)

While this doesn't perform any validation to check if what's been entered is really in FOODON, it does some string check.

@mslarae13 mslarae13 linked an issue Jun 14, 2024 that may be closed by this pull request
keywords:
- food
- term
slot_uri: MIXS:0001215
multivalued: true
range: string
pattern: ^([^\s-]{1,2}|[^\s-]+.+[^\s-]+) \[[a-zA-Z]{2,}:[a-zA-Z0-9]\d+\]$
pattern: ^.+\s*\[FOODON:\d+\](;\s*\[FOODON:\d+\])*$
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex requires FOODON ontology. Is this what we want?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Please use the pattern ^(\S[^\r\n]*) [FOODON:\d{7,8}]$ instead of ^.+\s*\[FOODON:\d+\](;\s*\[FOODON:\d+\])*$ or see my notes on dynamic enumerations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intentionally remove the white-space between the label and the term id? I don't think that's consistent with other ontology term patterns in MIxS

Copy link
Contributor Author

@mslarae13 mslarae13 Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turbomam
For the white space, do you mean if it should be "lead poisoning [FOODON:03530243]" vs "lead poisoning[FOODON:03530243]"

So, the white space is supposed to be there? I thought I had it set to be valid with or without it... does it matter? If so, I'll make sure I correct it. Just tell me which is correct.

Looking at the submission schema the white space should be there. So I can make that update to the regex.

Copy link
Contributor Author

@mslarae13 mslarae13 Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From your comment here : #802 (comment)

^(\S[^\r\n]*) [FOODON:\d{7,8}]$

I f we want to use pattern-only validation, I suggest we go with that.

That regex ^(\S[^\r\n]*) [FOODON:\d{7,8}]$
is showing me that "lead poisoning [FOODON:03530243]" is invalid... :(

... are you sure that's right?? Or am I missing something about the formatting of the value for "lead poisoning [FOODON:03530243]" ?

I think it needs to be ^.+\s*\[FOODON:\d{7,8}]$

(https://www.ebi.ac.uk/ols4/ontologies/foodon/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FFOODON_03530221)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2024-07-09 at 4 53 27 PM Screenshot 2024-07-09 at 4 53 40 PM

@mslarae13 mslarae13 marked this pull request as ready for review June 28, 2024 17:48
@mslarae13
Copy link
Contributor Author

I didn't include an example. I am not at all familiar with the FoodAnimalAndAnimalFeed extension. Before I committed time to getting familair and making an example, I wanted to check that this was a good change.

@turbomam
Copy link
Member

turbomam commented Jul 9, 2024

Thanks @mslarae13. This is good progress. We can refine it a little:

First of all, how long are the numeric portions of FOODON URIs?

I used ChatGPT 4 to help me with that SAPRQL query

7 or 8, after subtracting the 38 characters in the base portion or the URIs, "http://purl.obolibrary.org/obo/FOODON_"

Next I asked ChatGPT 4

I want to write a regular expression for a FOODON label followed by one white-space and then a FOODON CURIe. The CURIes should be enclosed in square brackets. They start with "FOODON:" and are followed by 7 or 8 digits. The label must start with a non-white-space character but can have any number of any characters after that, as long as they aren't carriage returns, line feeds, etc.

after a little testing with regexr, we came up with

^(\S[^\r\n]*) [FOODON:\d{7,8}]$

I f we want to use pattern-only validation, I suggest we go with that.

@turbomam
Copy link
Member

turbomam commented Jul 9, 2024

That doesn't check that the label and id portion match, etc., and it doesn't limit the choices to sub-classes of haccp guide food safety term

A better LinkML validation strategy for this might be a dynamic enumeration. They are expressed with logic, but can be expanded to an enumeration with explicit permissible values. A limitation right now is that be that the permissible values won't include the label and the id won't be enclosed in square brackets. But I would like to use this case to motivate improvements to LinkML dynamic enumerations in support of MIxS.

@turbomam
Copy link
Member

turbomam commented Jul 9, 2024

The vskit command from the Ontology Access Kit can be used like this

vskit expand -s schema.yaml -o schema_expanded.yaml

to expand this

enums:
  HaccpTerm:
    reachable_from:
      source_ontology: bioregistry:foodon
      source_nodes:
      - FOODON:03530221   ## haccp guide food safety term
      is_direct: false
      relationship_types:
      - rdfs:subClassOf

into this

enums:
  HaccpTerm:
    reachable_from:
      source_ontology: bioregistry:foodon
      source_nodes:
      - FOODON:03530221   ## haccp guide food safety term
      is_direct: false
      relationship_types:
      - rdfs:subClassOf
    permissible_values:
      FOODON:03530231:
        text: FOODON:03530231
        meaning: FOODON:03530231
        title: hazard 3
      FOODON:03530244:
        text: FOODON:03530244
        meaning: FOODON:03530244
        title: sodium tripolyphosphate
      FOODON:03530237:
        text: FOODON:03530237
        meaning: FOODON:03530237
        title: hazard 9

@turbomam
Copy link
Member

turbomam commented Jul 9, 2024

If using this mechanism sounds promising to you, and you want the OAK code to be modified to emit "sodium tripolyphosphate [FOODON:03530244]" instead of "FOODON:03530244", please up-vote this

keywords:
- food
- term
slot_uri: MIXS:0001215
multivalued: true
range: string
pattern: ^([^\s-]{1,2}|[^\s-]+.+[^\s-]+) \[[a-zA-Z]{2,}:[a-zA-Z0-9]\d+\]$
pattern: ^.+\s*\[FOODON:\d+\](;\s*\[FOODON:\d+\])*$
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Please use the pattern ^(\S[^\r\n]*) [FOODON:\d{7,8}]$ instead of ^.+\s*\[FOODON:\d+\](;\s*\[FOODON:\d+\])*$ or see my notes on dynamic enumerations.

@only1chunts
Copy link
Member

I agree that the change is suitable. As for the actual patturn being used, I bow to @turbomam's greater expertise on that!
The additional idea of using some sort of automated expansion thingy sounds like a good idea to me, so I have thumbs-up'd that ticket in the ontology access toolkit repo.

Copy link
Member

@only1chunts only1chunts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changes seem reasonable to me, and I trust the combination of @turbomam and @mslarae13 to get the patturns correct (I dont have the expertise to know whats right).

@mslarae13 mslarae13 self-assigned this Sep 3, 2024
@mslarae13 mslarae13 added the 1-TermUpdate Update suggestion for existing term, including bugs. Issues from "cig-bug" label moved here. label Sep 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1-TermUpdate Update suggestion for existing term, including bugs. Issues from "cig-bug" label moved here.
Projects
Status: Discuss
Development

Successfully merging this pull request may close these issues.

bad examples
3 participants