Proposed schema changes #20

Open
jpolka opened this issue May 10, 2018 · 13 comments

Comments

@jpolka
Member

jpolka commented May 10, 2018

Here are some suggested changes/additions!

  • Would it be possible to add some of the headers that are present in schema.yml - for example, OPEN PEER REVIEW - to the YAML files themselves? Furthermore, condensing some of the white space (i.e., between data fields that share the same reference url) might reduce the need to scroll so much. Each desc could then start with the name of the variable for disambiguation
  • "Peer review policy url (valid url)" -> "Open peer review policy url (valid url)"
  • Add url for preprint-citation
  • In the later part of the files, preprint urls follow the data fields, but in the early part, they precede them. Suggest standardizing these
  • preprint url fields descriptions could be standardized as well
  • would be good to add "unknown" to most fields to signify that this information can't be found. If users don't look for info, they can leave the field null. But if they do look and can't find, they could use unknown. The absence of public information is also important...
  • From Journals associated with more than one policy #25: split journals into two fields like journals and journals-subpolicies
@dhimmel
Member

dhimmel commented May 11, 2018

@jpolka can you make a pull request with your edits to the schema? Do not merge it, but once I have the commit I can take it from there. This will be the first time we are changing the schema after there are already some annotations. Here's the computational workflow I'm envisioning:

  1. Update the blank template data structure for each policy with existing annotations
  2. Apply the data fields taken from RoMEO

This means that any changes to the RoMEO fields will get overwritten.
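
A minimal sketch of the two steps above, assuming each policy is loaded as a plain dict. The function name and the sample values are hypothetical; only the key names come from this thread.

```python
from copy import deepcopy

def rebuild_policy(template, annotations, romeo_fields):
    """Rebuild one policy record in two passes (hypothetical helper)."""
    record = deepcopy(template)
    # Step 1: carry over existing annotations for keys the new template still has.
    for key, value in annotations.items():
        if key in record:
            record[key] = value
    # Step 2: apply RoMEO-derived fields last, so they overwrite any manual edits.
    record.update(romeo_fields)
    return record

# Illustrative data only; the values are made up.
template = {"publisher": None, "policy-heading": None, "preprint-citation": None}
annotations = {"publisher": "Hand-edited name", "preprint-citation": "yes"}
romeo = {"publisher": "Example Publisher"}
rebuilt = rebuild_policy(template, annotations, romeo)
```

Because the RoMEO fields are applied last, the hand-edited publisher value is lost, which is the overwrite behaviour described above.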

would be good to add "unknown" to most fields to signify that this information can't be found

Sounds reasonable to me! not specified would be an alternative, although perhaps unknown is a bit more general?
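
To make the null vs. unknown distinction concrete, here is a small sketch of how downstream code might read a field. It assumes a null YAML value loads as `None`; the function name and the returned labels are hypothetical.

```python
def lookup_status(value):
    """Classify a field value by how much curation effort it reflects."""
    if value is None:
        # Nobody has looked for this information yet.
        return "not yet checked"
    if value == "unknown":
        # Someone looked and could not find public information.
        return "checked, not publicly available"
    return "found"

statuses = [lookup_status(v) for v in (None, "unknown", "yes")]
```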

Would it be possible to add some of the headers that are present in schema.yml - for example, OPEN PEER REVIEW - to the YAML files themselves? Furthermore, condensing some of the white space (i.e., between data fields that share the same reference url) might reduce the need to scroll so much.

I will see what's possible here given the treacherous documentation and attitude of ruamel.yaml.

Each desc could then start with the name of the variable for disambiguation

If you did this, then we wouldn't have to worry about section headers, and I could easily remove the newlines (which took me a very long time to add, but if they are more damaging than helpful, we should remove them).

@dhimmel
Member

dhimmel commented May 11, 2018

@jpolka IIRC you also mentioned the idea of adding a general comments field somewhere?

@jpolka
Member Author

jpolka commented May 11, 2018

If you did this, then we wouldn't have to worry about section headers, and I could easily remove the newlines (which took me a very long time to add, but if they are more damaging than helpful, we should remove them).

Can we use \n for new lines in descriptions?

@jpolka IIRC you also mentioned the idea of adding a general comments field somewhere?

Yes, I will add this too!

jpolka2 added a commit to jpolka2/policies-database that referenced this issue May 11, 2018
dhimmel pushed a commit to dhimmel/policies-database that referenced this issue May 11, 2018
dhimmel pushed a commit to dhimmel/policies-database that referenced this issue May 11, 2018
@jpolka
Member Author

jpolka commented May 16, 2018

As per the discussion on Gitter with @cameronblandford, it would be good to add the following changes:

(Please also suggest different tags for the field, @tonyR-H ?)
Can enum be used in a str inside a seq, @dhimmel ?

field:
  type: seq
  desc: "Field - add all that apply. Add a new line for each. Start each line with two spaces, a dash, and one space, like this: '  - ' (biology/chemistry/physics/computer science/medicine/math/social science/humanities)"
  sequence:
    - type: str
      enum: [biology, chemistry, physics, computer science, medicine, math, social science, humanities] # Not sure this will work - http://www.kuwata-lab.com/kwalify/ruby/users-guide.01.html#schema

visible:
  type: str
  desc: "Used by moderators to determine whether record is shown in frontend (yes/no) - do not edit"
  enum: ["yes", "no"]
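
For illustration, the check that an enum on a str inside a seq is meant to express can be sketched in plain Python. This is not Kwalify's implementation, just the intended rule; the function and constant names are hypothetical, and the allowed values are the ones listed in the desc above.

```python
# Allowed values, copied from the description of the proposed field.
ALLOWED_FIELDS = {
    "biology", "chemistry", "physics", "computer science",
    "medicine", "math", "social science", "humanities",
}

def invalid_field_entries(entries):
    """Return the entries that are not strings from the allowed set."""
    return [
        entry for entry in entries
        if not (isinstance(entry, str) and entry in ALLOWED_FIELDS)
    ]

ok = invalid_field_entries(["biology", "math"])
bad = invalid_field_entries(["biology", "astrology", 42])
```

An empty result means every line the curator added is one of the permitted tags.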

@dhimmel
Member

dhimmel commented May 17, 2018

you could make it so any policy or journal can have multiple subjects and make sure people are liberal with their tagging

Regarding the field (discipline / subject) field, I'm not sure it's a good idea to add this as a manual field. Curating each policy with fields is a lot of work. Instead, I'd suggest that we look up subjects based on ISSNs of the journals covered by each policy. I think we could do this perhaps via Crossref or Scopus.
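
A rough sketch of the ISSN-based idea, assuming we have already obtained an ISSN-to-subjects table from some source such as Crossref or Scopus. The table, the ISSNs, and the function name below are placeholders, not real data or a real API.

```python
def subjects_for_policy(issns, subjects_by_issn):
    """Collect the distinct subjects of all journals covered by one policy."""
    subjects = set()
    for issn in issns:
        # Journals missing from the lookup table contribute no subjects.
        subjects.update(subjects_by_issn.get(issn, []))
    return sorted(subjects)

# Placeholder lookup table standing in for a Crossref/Scopus export.
subjects_by_issn = {
    "0000-0001": ["biology", "medicine"],
    "0000-0002": ["biology", "chemistry"],
}
policy_subjects = subjects_for_policy(
    ["0000-0001", "0000-0002", "0000-0003"], subjects_by_issn
)
```

A policy covering several journals ends up with the union of their subjects, so interdisciplinary policies get multiple tags without manual curation.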

would adding an optional contributors field help incentivize people / hold them accountable you think?

The website should link to the source YAML file on GitHub which has a blame view. We could also calculate contribution summaries based on the git history for each YAML file.
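
As a sketch of the contribution summary, assume we capture the output of `git log --format=%an -- <path>` for one YAML file, which prints one author name per commit. The function name and the sample log are hypothetical.

```python
from collections import Counter

def contribution_summary(log_output):
    """Count commits per author from one-author-name-per-line log output."""
    authors = [line.strip() for line in log_output.splitlines() if line.strip()]
    return Counter(authors)

# Made-up sample of what the git command might print for a single file.
sample_log = "jpolka\ndhimmel\njpolka\n"
summary = contribution_summary(sample_log)
```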

how about some sort of visible or available-on-website or filled-out field that admins can flip, so it can be up to moderators' discretion when something becomes available in the database?

Currently, at http://transpose.surge.sh I don't see actual data (I think it's showing mock data, BTW awesome design @cameronblandford ). An alternative to a visible field would be to only display fields that are not null. Under the current proposal would all journals default to "not visible" and would have to be toggled to visible? I'm not against the idea, just that I'd need a little more guidance before implementing it.

@cameronblandford

@dhimmel surge.sh only hosts static sites, so I haven't been able to provide updates to that deployment for any of the more recent changes. I'll put up something today. Only displaying fields that aren't null is already implemented in the dynamic version 👍 Thinking about it more now, I think it would be even better to display every policy regardless of how much has been filled out, and then have a link to the relevant file in the repo for each policy somewhere on the policy detail page, in a sort of "Contribute to this policy's collected information here" way.

@cameronblandford

cameronblandford commented May 17, 2018

That way users can easily see where and how they can contribute, and contribute to areas that interest them.

I, a user, check out the transpose website to see if the journal I'm submitting to, "ExampleJournal", is listed.

It is, but there's no information available in the connected policy. But I see a button asking if I'd like to add more information about that policy to the database, along with tips of where to find all the relevant information.

Instead of feeling lost, I now have a sense of direction re: how to go forward in my quest for journal information and will (more likely than before at least) be incentivized to contribute to the repo.

@jpolka
Member Author

jpolka commented May 18, 2018

This discussion sounds good!

Re fields - while I think a tag system would permit more flexibility (especially for interdisciplinary journals) than a system of categories, I completely agree with your point about using existing data where possible! It does seem like category-name from Crossref, which is apparently derived from Scopus, would be the way to go.

BTW I just noticed these little typos we might want to fix next time around...

# Publisher name from SHERAP/RoMEO. Do not edit
publisher: 

# Policy heading from SHERAP/RoMEO. Do not edit
policy-heading: 

@jpolka
Member Author

jpolka commented May 31, 2018

From the editathon -

  • Need “before review” for which version of preprint is ok to post
  • Need a "conditional" or "other" option for peer review identities revealed/published and reports published (see Update romeo_600.yml #32)

Also:
# Are there separate fields for technical & impact evaluation? (yes/no)

Should be

# Are there separate fields for technical & impact evaluation? If no information on impact is requested, select yes. (yes/no)

@dhimmel
Member

dhimmel commented May 31, 2018

@jpolka can you make a PR that modifies the schema according to the editathon points? Then I can take it from there.

@jpolka
Member Author

jpolka commented Jun 1, 2018

@dhimmel, absolutely, will do. However I am now also finding myself wanting a way to track:

Are [X] published? [mandatory/optional/conditional/no]

  • If optional or conditional, who decides? [some combination of: author/reviewer/editor]
  • If optional or conditional, when is the decision made? [at submission/after process is complete/other]
  • If optional or conditional, any other details? [free text]
  • Are they assigned a DOI?
  • DOI Type [Crossref peer review / other]
  • DOI ontology [for this object/ one DOI for a bundle of objects]
  • Are they open access? [no/yes/for some articles]
  • What license? [free text]

Where X is:

  • Peer review reports
  • Editorial decision letters
  • Author responses to reviewers
  • Previous versions of the manuscript
  • Reviewer identities

I think @tonyR-H has some other wishes as well.

@dhimmel
Member

dhimmel commented Jun 5, 2018

If optional or conditional, who decides?

I don't see a kwalify option for conditional or dependent fields. Therefore, I don't think we can make the schema only allow setting "who decides" if the field is set to optional or conditional. However, we could always ignore the contents of "who decides" in those cases at a later stage. We can still have comments letting users know to fill in the field if optional or conditional is set.
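
Ignoring the dependent field at a later stage could look roughly like this. The key names `published` and `who-decides` are hypothetical stand-ins for whatever the schema ends up using.

```python
# Answers for which the follow-up "who decides" question is meaningful.
NEEDS_DETAILS = {"optional", "conditional"}

def normalize_publication_answer(answer):
    """Blank out "who-decides" unless publication is optional or conditional."""
    cleaned = dict(answer)  # copy so the curated record is left untouched
    if cleaned.get("published") not in NEEDS_DETAILS:
        cleaned["who-decides"] = None
    return cleaned

kept = normalize_publication_answer(
    {"published": "optional", "who-decides": ["author", "editor"]}
)
dropped = normalize_publication_answer(
    {"published": "mandatory", "who-decides": ["author"]}
)
```

This keeps the schema simple and moves the dependency into a post-processing step.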

The following three categories make sense with the above fields:

  • Peer review reports
  • Editorial decision letters
  • Author responses to reviewers

I am not sure what "Previous versions of the manuscript" means in this context and I don't think all of the fields above apply to "Reviewer identities".

Other comments: what is "DOI ontology"? What do you mean by "DOI Type"? Is this for the DOI registrar agency, e.g. Crossref, Datacite, ...?

@jpolka
Member Author

jpolka commented Jun 5, 2018

Thanks @dhimmel - I mean whether all the review files are kept together in one DOI or whether each reviewer report/author response/decision letter has its own DOI. "DOI type" refers to the Crossref peer review DOI type.

Good point about just ignoring input for "who decides" when it is not applicable. I think this is a reasonable approach regardless of whether we switch platforms to something that permits conditional responses, as discussed in #36.
