adding to_datacite method #596

Merged
merged 35 commits into from
May 19, 2021
Commits
1056afb
adding to_cite
djarecka Mar 27, 2021
f3adbd8
using the first contributor as a creator
djarecka Mar 29, 2021
a2d1595
fixing contributors and creators
djarecka Mar 29, 2021
1d0d38a
updating to_datacite
djarecka Apr 2, 2021
dd291ea
removing notes that were added by mistake
djarecka Apr 2, 2021
6c2c358
removing doi, url creation - this should be already in the metadata
djarecka Apr 2, 2021
8d5d0c3
merging
djarecka Apr 2, 2021
16e76ac
fixing a few items; allowing creators to be contributors
djarecka Apr 2, 2021
6930295
adding dandi identifier to relatedIdentifiers
djarecka Apr 2, 2021
7abd530
updates to to_datacite: fixing contribution type (choosing only from …
djarecka Apr 12, 2021
11d7c73
adding doi to PublishedDandisetMeta
djarecka Apr 13, 2021
c13ca95
using meta as an argument in to_datacite
djarecka Apr 13, 2021
e68c83d
adding tests for creating datacite for 04 and 08 datasets
djarecka Apr 13, 2021
3dace98
linting
djarecka Apr 13, 2021
872fc0f
using DATACITE_DEV_PASSWORD secrets to upload doi to datacite
djarecka Apr 13, 2021
8a99f60
resolving conflicts
djarecka Apr 26, 2021
b37cc7b
Merge pull request #595 from djarecka/datacite_tmp
djarecka Apr 26, 2021
d406d60
updating datacite tests, and data files with metadata
djarecka Apr 26, 2021
f642015
fix of buggy conflicts resolve
djarecka Apr 26, 2021
18d6bee
Merge branch 'datacite_tmp' into datacite_upstr_tmp
djarecka Apr 26, 2021
852d3b9
rearranging tests
djarecka Apr 27, 2021
21d99af
adding additional tests
djarecka Apr 27, 2021
61944a2
adding random int as dandi_it to avoid posting the same doi in parallel
djarecka Apr 27, 2021
c2f2c7d
simplifying the get and delete requests
djarecka Apr 27, 2021
4d9e815
adding random int to dand_id to avoid posting the same doi in parallel
djarecka Apr 27, 2021
062d000
fixing PublishedDandisetMeta inheritance
djarecka Apr 27, 2021
1f0e867
using api.dandiarchive.org to get dandisets metadata
djarecka Apr 28, 2021
2d4b82d
linting
djarecka Apr 28, 2021
5960a25
adding tests
djarecka Apr 28, 2021
955054f
small updates after the review
djarecka May 13, 2021
5b7b019
Merge branch 'master' into datacite_upstr_tmp
djarecka May 17, 2021
f462b93
fixing tests: adding assetsSummary and updating dandi_id format
djarecka May 17, 2021
cb56b8f
changing test_datacite so it uses exemplary metadata written in json …
djarecka May 17, 2021
c9818e0
fixing tests - adding some random number to doi to be able to run tes…
djarecka May 18, 2021
2ae1c02
Make exception more specific and informative
yarikoptic May 19, 2021
1 change: 1 addition & 0 deletions .github/workflows/test.yml
@@ -11,6 +11,7 @@ jobs:
    runs-on: ${{ matrix.os }}
    env:
      NO_ET: 1
      DATACITE_DEV_PASSWORD: ${{ secrets.DATACITE_DEV_PASSWORD }}

    strategy:
      fail-fast: false
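The workflow change exposes `DATACITE_DEV_PASSWORD` to the test job. Below is a minimal sketch of how a test module might read it, assuming the DataCite tests should be skipped when the secret is absent. On pull requests from forks GitHub does not pass repository secrets, so the variable arrives as an empty string. The helper names are hypothetical and are not taken from this PR.

```python
import os


def datacite_password(env=None):
    """Return the DataCite dev password, or None when the secret is unset or empty."""
    env = os.environ if env is None else env
    value = env.get("DATACITE_DEV_PASSWORD", "")
    return value or None


def should_skip_datacite_tests(env=None):
    """Hypothetical guard: True when there are no credentials to talk to DataCite."""
    return datacite_password(env) is None


if __name__ == "__main__":
    if should_skip_datacite_tests():
        print("DATACITE_DEV_PASSWORD not set; DataCite tests would be skipped")
    else:
        print("DataCite credentials available")
```

A guard like this could feed a `pytest.mark.skipif` condition so that contributors without access to the secret still get a green run.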
183 changes: 182 additions & 1 deletion dandi/models.py
@@ -116,6 +116,188 @@ def diff_models(model1, model2):
print(f"{field} is different")


DATACITE_CONTRTYPE = {
    "ContactPerson",
    "DataCollector",
    "DataCurator",
    "DataManager",
    "Distributor",
    "Editor",
    "HostingInstitution",
    "Producer",
    "ProjectLeader",
    "ProjectManager",
    "ProjectMember",
    "RegistrationAgency",
    "RegistrationAuthority",
    "RelatedPerson",
    "Researcher",
    "ResearchGroup",
    "RightsHolder",
    "Sponsor",
    "Supervisor",
    "WorkPackageLeader",
    "Other",
}


DATACITE_IDENTYPE = {
    "ARK",
    "arXiv",
    "bibcode",
    "DOI",
    "EAN13",
    "EISSN",
    "Handle",
    "IGSN",
    "ISBN",
    "ISSN",
    "ISTC",
    "LISSN",
    "LSID",
    "PMID",
    "PURL",
    "UPC",
    "URL",
    "URN",
    "w3id",
}
DATACITE_MAP = dict([(el.lower(), el) for el in DATACITE_IDENTYPE])


def to_datacite(meta):
    dandiset_id = meta.identifier.split(":")[1]
    doi = meta.doi

    attributes = {}
    attributes["identifiers"] = [
        # TODO: the first element is ignored, not sure how to fix it...
        {"identifier": f"https://doi.org/{doi}", "identifierType": "DOI"},
        {
            "identifier": f"https://identifiers.org/DANDI:{dandiset_id}/{meta.version}",
            "identifierType": "URL",
        },
    ]

    attributes["doi"] = doi
    attributes["titles"] = [{"title": meta.name}]
    attributes["descriptions"] = [
        {"description": meta.description, "descriptionType": "Abstract"}
    ]
    attributes["publisher"] = "DANDI Archive"
    attributes["publicationYear"] = str(meta.datePublished.year)
    # not sure about it; dandi-api had "resourceTypeGeneral": "NWB"
    attributes["types"] = {"resourceType": "NWB", "resourceTypeGeneral": "Dataset"}
    # meta also has the attribute url, but it is often empty
    attributes["url"] = meta.url
    # assuming that all licenses are from SPDX?
    attributes["rightsList"] = [
        {
            "schemeURI": "https://spdx.org/licenses/",
            "rightsIdentifierScheme": "SPDX",
            "rightsIdentifier": el.name,
        }
        for el in meta.license
    ]
    attributes["schemaVersion"] = "http://datacite.org/schema/kernel-4"

    contributors = []
    creators = []
    for contr_el in meta.contributor:
        if RoleType("dandi:Sponsor") in contr_el.roleName:
            # no info about "funderIdentifierType", "awardUri", "awardTitle"
            dict_fund = {"funderName": contr_el.name}
            if contr_el.identifier:
                dict_fund["funderIdentifier"] = contr_el.identifier
            if contr_el.awardNumber:
                dict_fund["awardNumber"] = contr_el.awardNumber
            attributes.setdefault("fundingReferences", []).append(dict_fund)
            # if no more roles, it shouldn't be added to creators or contributors
            contr_el.roleName.remove(RoleType("dandi:Sponsor"))
            if not contr_el.roleName:
                continue

        contr_dict = {
            "name": contr_el.name,
            "contributorName": contr_el.name,
            "schemeURI": "orcid.org",
        }
        if isinstance(contr_el, Person):
            contr_dict["nameType"] = "Personal"
            if len(contr_el.name.split(",")) == 2:
                contr_dict["familyName"], contr_dict["givenName"] = contr_el.name.split(
                    ","
                )
            if getattr(contr_el, "affiliation"):
                contr_dict["affiliation"] = [
                    {"name": el.name} for el in contr_el.affiliation
                ]
            else:
                contr_dict["affiliation"] = []
        elif isinstance(contr_el, Organization):
            contr_dict["nameType"] = "Organizational"

        if RoleType("dandi:Author") in getattr(contr_el, "roleName"):
            create_dict = deepcopy(contr_dict)
            create_dict["creatorName"] = create_dict.pop("contributorName")
            creators.append(create_dict)
            contr_el.roleName.remove(RoleType("dandi:Author"))
            # if no more roles, it shouldn't be added to contributors
            if not contr_el.roleName:
                continue

        if getattr(contr_el, "roleName"):
            contr_all = [
                el.name for el in contr_el.roleName if el.name in DATACITE_CONTRTYPE
            ]
            if contr_all:
                contr_dict["contributorType"] = contr_all[0]
            else:
                contr_dict["contributorType"] = "Other"
            contributors.append(contr_dict)

    # if there are no creators, the first contributor is also treated as the creator
    if not creators and contributors:
        creators = [deepcopy(contributors[0])]
        creators[0]["creatorName"] = creators[0].pop("contributorName")
        creators[0].pop("contributorType")

    attributes["contributors"] = contributors
    attributes["creators"] = creators

    if getattr(meta, "relatedResource"):
        attributes["relatedIdentifiers"] = []
        for rel_el in meta.relatedResource:
            ident = rel_el.identifier.split(":")
            if len(ident) == 2:
                ident_tp, ident_nr = ident
                if ident_tp.lower() in DATACITE_MAP:
                    ident_tp = DATACITE_MAP[ident_tp.lower()]
                else:
                    raise ValueError(
                        f"identifier has to be from the list: {DATACITE_IDENTYPE}, "
                        f"but {ident_tp} provided"
                    )
            else:
                raise ValueError(
                    "identifier is expected to be type:number,"
                    f" got {rel_el.identifier}"
                )
            rel_dict = {
                "relatedIdentifier": ident_nr,
                # in theory it should be from the specific list that contains e.g. DOI, arXiv, ...
                "relatedIdentifierType": ident_tp,
                "relationType": rel_el.relation.name,
            }
            attributes["relatedIdentifiers"].append(rel_dict)

    if getattr(meta, "keywords"):
        attributes["subjects"] = [{"subject": el} for el in meta.keywords]

    datacite_dict = {"data": {"id": doi, "type": "dois", "attributes": attributes}}
    return datacite_dict


def _sanitize(o):
    if isinstance(o, dict):
        return {_sanitize(k): _sanitize(v) for k, v in o.items()}
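The `relatedResource` branch of `to_datacite` splits each identifier on `:` and maps the prefix onto DataCite's identifier types regardless of case. Below is a standalone sketch of that step, assuming a reduced copy of `DATACITE_IDENTYPE`. The helper `split_related_identifier` is hypothetical and is not part of this PR.

```python
# Reduced copy of DATACITE_IDENTYPE from the diff, kept short for illustration.
IDENT_TYPES = {"arXiv", "DOI", "PMID", "URL"}
IDENT_MAP = {name.lower(): name for name in IDENT_TYPES}


def split_related_identifier(identifier):
    """Hypothetical helper: return (DataCite type, value) for a 'type:value' string."""
    parts = identifier.split(":")
    if len(parts) != 2:
        raise ValueError(
            f"identifier is expected to be type:number, got {identifier}"
        )
    ident_type, value = parts
    try:
        # Case-insensitive lookup: "doi" and "DOI" both map to "DOI".
        return IDENT_MAP[ident_type.lower()], value
    except KeyError:
        raise ValueError(f"unsupported identifier type: {ident_type}") from None


print(split_related_identifier("doi:10.1101/2020.02.03.929158"))
# ('DOI', '10.1101/2020.02.03.929158')
```

Because the string is split on every colon, a value that itself contains one (for example `URL:https://osf.io/hv7ja/`) yields three parts and is rejected with the "type:number" error.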
@@ -977,7 +1159,6 @@ class Publishable(DandiBaseModel):
class PublishedDandisetMeta(DandisetMeta, Publishable):
    version: str = Field(readOnly=True, nskey="schema")
    doi: str = Field(
        None,
        title="DOI",
        readOnly=True,
        regex=r"^10\.[A-Za-z0-9.\/-]+",
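The `doi` field on `PublishedDandisetMeta` is constrained by a regex. Below is a small sketch of what that pattern accepts, using plain `re` with `match` semantics. This is an assumption about how the model applies the pattern, and `looks_like_doi` is a hypothetical helper.

```python
import re

# Pattern copied from the `doi` field in the diff.
DOI_PATTERN = re.compile(r"^10\.[A-Za-z0-9.\/-]+")


def looks_like_doi(value):
    """Hypothetical helper: True when `value` starts with a DOI-shaped prefix."""
    return DOI_PATTERN.match(value) is not None


print(looks_like_doi("10.1101/2020.02.03.929158"))      # True
print(looks_like_doi("doi:10.1101/2020.02.03.929158"))  # False
print(looks_like_doi("10.1101/abc def"))                # True
```

The pattern is anchored only at the start, so under `match` semantics trailing characters outside the character class are not rejected, as the last call shows.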
1 change: 1 addition & 0 deletions dandi/tests/data/metadata/meta_000004.json
@@ -0,0 +1 @@
{"id": "DANDI:000004/draft", "name": "A NWB-based dataset and processing pipeline of human single-neuron activity during a declarative memory task", "about": [{"name": "Medial Temporal Lobe", "schemaKey": "GenericType"}], "access": [{"status": "dandi:Open"}], "license": ["spdx:CC-BY-4.0"], "keywords": ["cognitive neuroscience", "data standardization", "decision making", "declarative memory", "neurophysiology", "neurosurgery", "NWB", "open source", "single-neurons"], "identifier": "DANDI:000004", "repository": "https://dandiarchive.org/", "contributor": [{"name": "Chandravadia, Nand", "email": "nandc10@gmail.com", "roleName": ["dandi:Author", "dandi:ContactPerson", "dandi:DataCurator", "dandi:DataManager", "dandi:FormalAnalysis", "dandi:Investigation", "dandi:Maintainer", "dandi:Methodology", "dandi:ProjectLeader", "dandi:ProjectManager", "dandi:ProjectMember", "dandi:Researcher", "dandi:Software", "dandi:Validation", "dandi:Visualization"], "schemaKey": "Person", "identifier": "0000-0003-0161-4007", "affiliation": [{"name": "Department of Neurosurgery, Cedars-Sinai Medical Center, Los Angeles, CA, USA", "schemaKey": "Organization", "includeInCitation": false}], "includeInCitation": true}, {"name": "Liang, Dehua", "email": "liang134@mail.chapman.edu", "roleName": ["dandi:Author", "dandi:Methodology", "dandi:ProjectMember", "dandi:Software", "dandi:Validation"], "schemaKey": "Person", "affiliation": [{"name": "Institute for Interdisciplinary Brain and Behavioral Sciences, Crean College of Health and Behavioral Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, USA", "schemaKey": "Organization", "includeInCitation": false}], "includeInCitation": true}, {"name": "Schjetnan, Andrea Gomez Palacio", "email": "Andrea.Schjetan@uhnresearch.ca", "roleName": ["dandi:Author", "dandi:DataCollector", "dandi:ProjectMember", "dandi:Validation"], "schemaKey": "Person", "identifier": "0000-0002-4319-7689", "affiliation": [{"name": "Krembil Brain 
Institute, Toronto Western Hospital, Toronto, Canada", "schemaKey": "Organization", "includeInCitation": false}], "includeInCitation": true}, {"name": "Carlson, April", "email": "april.carlson@tufts.edu", "roleName": ["dandi:Author", "dandi:DataCurator", "dandi:ProjectMember", "dandi:Validation"], "schemaKey": "Person", "identifier": "0000-0002-9207-7069", "affiliation": [{"name": "Department of Neurosurgery, Cedars-Sinai Medical Center, Los Angeles, CA, USA", "schemaKey": "Organization", "includeInCitation": false}], "includeInCitation": true}, {"name": "Faraut, Mailys", "email": "mailyscm.faraut@gmail.com", "roleName": ["dandi:Author", "dandi:DataCollector", "dandi:ProjectMember", "dandi:Validation"], "schemaKey": "Person", "affiliation": [{"name": "Department of Neurosurgery, Cedars-Sinai Medical Center, Los Angeles, CA, USA", "schemaKey": "Organization", "includeInCitation": false}], "includeInCitation": true}, {"name": "Chung, Jeffrey M.", "email": "Jeffrey.Chung@cshs.org", "roleName": ["dandi:Author", "dandi:ProjectMember", "dandi:Validation"], "schemaKey": "Person", "affiliation": [{"name": "Department of Neurology, Cedars-Sinai Medical Center, Los Angeles, CA, USA", "schemaKey": "Organization", "includeInCitation": false}], "includeInCitation": true}, {"name": "Reed, Chrystal M.", "email": "Chrystal.Reed@csmc.edu", "roleName": ["dandi:Author", "dandi:ProjectMember", "dandi:Validation"], "schemaKey": "Person", "affiliation": [{"name": "Department of Neurology, Cedars-Sinai Medical Center, Los Angeles, CA, USA", "schemaKey": "Organization", "includeInCitation": false}], "includeInCitation": true}, {"name": "Dichter, Ben", "email": "ben.dichter@gmail.com", "roleName": ["dandi:Author", "dandi:Software", "dandi:ProjectMember", "dandi:Validation"], "schemaKey": "Person", "affiliation": [{"name": "Biological Systems & Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA", "schemaKey": "Organization", "includeInCitation": false}, {"name": 
"Department of Neurosurgery, Stanford University, Stanford, CA, USA", "schemaKey": "Organization", "includeInCitation": false}], "includeInCitation": true}, {"name": "Maoz, Uri", "email": "maoz.uri@gmail.com", "roleName": ["dandi:Author", "dandi:Conceptualization", "dandi:ProjectMember", "dandi:Validation"], "schemaKey": "Person", "affiliation": [{"name": "Institute for Interdisciplinary Brain and Behavioral Sciences, Crean College of Health and Behavioral Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, USA", "schemaKey": "Organization", "includeInCitation": false}, {"name": "Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA", "schemaKey": "Organization", "includeInCitation": false}], "includeInCitation": true}, {"name": "Kalia, Suneil K.", "email": "suneil.kalia@uhn.ca", "roleName": ["dandi:Author", "dandi:ProjectMember", "dandi:Validation"], "schemaKey": "Person", "affiliation": [{"name": "Division of Neurosurgery, Department of Surgery, University of Toronto, Toronto, Canada", "schemaKey": "Organization", "includeInCitation": false}, {"name": "Krembil Brain Institute, Toronto Western Hospital, Toronto, Canada", "schemaKey": "Organization", "includeInCitation": false}], "includeInCitation": true}, {"name": "Valiante, Taufik A.", "email": "Taufik.Valiante@uhn.ca", "roleName": ["dandi:Author", "dandi:ProjectMember", "dandi:Validation"], "schemaKey": "Person", "affiliation": [{"name": "Krembil Brain Institute, Toronto Western Hospital, Toronto, Canada", "schemaKey": "Organization", "includeInCitation": false}, {"name": "Division of Neurosurgery, Department of Surgery, University of Toronto, Toronto, Canada", "schemaKey": "Organization", "includeInCitation": false}], "includeInCitation": true}, {"name": "Mamelak, Adam N.", "email": "Adam.Mamelak@cshs.org", "roleName": ["dandi:Author", "dandi:ProjectMember", "dandi:Validation"], "schemaKey": "Person", "affiliation": [{"name": 
"Department of Neurosurgery, Cedars-Sinai Medical Center, Los Angeles, CA, USA", "schemaKey": "Organization", "includeInCitation": false}], "includeInCitation": true}, {"name": "Rutishauser, Ueli", "email": "Ueli.Rutishauser@cshs.org", "roleName": ["dandi:Author", "dandi:Conceptualization", "dandi:FundingAcquisition", "dandi:ProjectMember", "dandi:Resources", "dandi:Software", "dandi:Supervision", "dandi:Validation"], "schemaKey": "Person", "identifier": "0000-0002-9207-7069", "affiliation": [{"name": "Department of Neurosurgery, Cedars-Sinai Medical Center, Los Angeles, CA, USA", "schemaKey": "Organization", "includeInCitation": false}, {"name": "Department of Neurology, Cedars-Sinai Medical Center, Los Angeles, CA, USA", "schemaKey": "Organization", "includeInCitation": false}, {"name": "Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA", "schemaKey": "Organization", "includeInCitation": false}, {"name": "Computational and Neural Systems Program, California Institute of Technology, Pasadena, CA, USA", "schemaKey": "Organization", "includeInCitation": false}, {"name": "Center for Neural Science and Medicine, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, USA", "schemaKey": "Organization", "includeInCitation": false}], "includeInCitation": true}, {"name": "Stroke, National Institute of Neurological Disorders and", "roleName": ["dandi:Sponsor"], "schemaKey": "Organization", "awardNumber": "U01NS103792", "includeInCitation": false}, {"name": "Foundation, National Science", "roleName": ["dandi:Sponsor"], "schemaKey": "Organization", "awardNumber": "1554105", "includeInCitation": false}, {"name": "Health, National Institute of Mental", "roleName": ["dandi:Sponsor"], "schemaKey": "Organization", "awardNumber": "R01MH110831", "includeInCitation": false}, {"name": "Neuroscience, McKnight Endowment for", "roleName": ["dandi:Sponsor"], "schemaKey": "Organization", "includeInCitation": 
false}, {"name": "Foundation, NARSAD Young Investigator grant from the Brain & Behavior Research", "roleName": ["dandi:Sponsor"], "schemaKey": "Organization", "includeInCitation": false}, {"name": "Foundation, Kavli", "roleName": ["dandi:Sponsor"], "schemaKey": "Organization", "includeInCitation": false}, {"name": "initiative, BRAIN", "roleName": ["dandi:Sponsor"], "schemaKey": "Organization", "awardNumber": "U19NS104590", "includeInCitation": false}], "description": "A challenge for data sharing in systems neuroscience is the multitude of different data formats used. Neurodata Without Borders: Neurophysiology 2.0 (NWB:N) has emerged as a standardized data format for the storage of cellular-level data together with meta-data, stimulus information, and behavior. A key next step to facilitate NWB:N adoption is to provide easy to use processing pipelines to import/export data from/to NWB:N. Here, we present a NWB-formatted dataset of 1863 single neurons recorded from the medial temporal lobes of 59 human subjects undergoing intracranial monitoring while they performed a recognition memory task. We provide code to analyze and export/import stimuli, behavior, and electrophysiological recordings to/from NWB in both MATLAB and Python. The data files are NWB:N compliant, which affords interoperability between programming languages and operating systems. 
This combined data and code release is a case study for how to utilize NWB:N for human single-neuron recordings and enables easy re-use of this hard-to-obtain data for both teaching and research on the mechanisms of human memory.", "schemaVersion": "0.3.0", "relatedResource": [{"url": "https://osf.io/hv7ja/", "name": "A NWB-based Dataset and Processing Pipeline of Human Single-Neuron Activity During a Declarative Memory Task", "relation": "dandi:IsDerivedFrom", "identifier": "DOI:10.17605/OSF.IO/HV7JA", "repository": "Open Science Framework"}, {"url": "https://www.nature.com/articles/s41597-020-0415-9", "relation": "dandi:IsDescribedBy", "identifier": "DOI:10.1038/s41597-020-0415-9"}]}
1 change: 1 addition & 0 deletions dandi/tests/data/metadata/meta_000008.json
@@ -0,0 +1 @@
{"id": "DANDI:000008/draft", "name": "Phenotypic variation within and across transcriptomic cell types in mouse motor cortex", "access": [{"status": "dandi:Open"}], "license": ["spdx:CC-BY-4.0"], "identifier": "DANDI:000008", "repository": "https://dandiarchive.org/", "contributor": [{"name": "Scala, Federico", "roleName": ["dandi:DataCollector", "dandi:Author", "dandi:ContactPerson"], "schemaKey": "Person", "includeInCitation": true}, {"name": "Kobak, Dmitry", "roleName": ["dandi:Author"], "schemaKey": "Person", "identifier": "0000-0002-5639-7209", "includeInCitation": true}, {"name": "Bernabucci, Matteo", "roleName": ["dandi:Author"], "schemaKey": "Person", "identifier": "0000-0003-4458-117X", "includeInCitation": true}, {"name": "Bernaerts, Yves", "roleName": ["dandi:Author"], "schemaKey": "Person", "includeInCitation": true}, {"name": "Cadwell, Cathryn Rene", "roleName": ["dandi:Author"], "schemaKey": "Person", "identifier": "0000-0003-1963-8285", "includeInCitation": true}, {"name": "Castro, Jesus Ramon", "roleName": ["dandi:Author"], "schemaKey": "Person", "includeInCitation": true}, {"name": "Hartmanis, Leonard", "roleName": ["dandi:Author"], "schemaKey": "Person", "identifier": "0000-0002-4922-8781", "includeInCitation": true}, {"name": "Jiang, Xiaolong", "roleName": ["dandi:Author"], "schemaKey": "Person", "identifier": "0000-0001-8066-1383", "includeInCitation": true}, {"name": "Laturnus, Sophie", "roleName": ["dandi:Author"], "schemaKey": "Person", "identifier": "0000-0001-9532-788X", "includeInCitation": true}, {"name": "Miranda, Elanine", "roleName": ["dandi:Author"], "schemaKey": "Person", "includeInCitation": true}, {"name": "Mulherkar, Shalaka", "roleName": ["dandi:Author"], "schemaKey": "Person", "identifier": "0000-0001-8736-527X", "includeInCitation": true}, {"name": "Tan, Zheng Huan", "roleName": ["dandi:Author"], "schemaKey": "Person", "includeInCitation": true}, {"name": "Yao, Zizhen", "roleName": ["dandi:Author"], "schemaKey": "Person", 
"identifier": "0000-0002-9361-5607", "includeInCitation": true}, {"name": "Zeng, Hongkui", "roleName": ["dandi:Author"], "schemaKey": "Person", "identifier": "0000-0002-0326-5878", "includeInCitation": true}, {"name": "Sandberg, Rickard", "roleName": ["dandi:Author"], "schemaKey": "Person", "identifier": "0000-0001-6473-1740", "includeInCitation": true}, {"name": "Berens, Philipp", "roleName": ["dandi:Author"], "schemaKey": "Person", "identifier": "0000-0002-0199-4727", "includeInCitation": true}, {"name": "Tolias, Andreas Savas", "roleName": ["dandi:Author", "dandi:ContactPerson"], "schemaKey": "Person", "identifier": "0000-0002-4305-6376", "includeInCitation": true}], "description": "Data from the Tolias Lab shared in the BICCN project", "schemaVersion": "0.3.0", "relatedResource": [{"url": "https://www.biorxiv.org/content/10.1101/2020.02.03.929158v1.full", "relation": "dandi:IsDescribedBy", "identifier": "doi:10.1101/2020.02.03.929158"}]}
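The two metadata files exercise the role handling in `to_datacite`: `dandi:Sponsor` entries become funding references, `dandi:Author` entries become creators, and any remaining roles decide the DataCite `contributorType`. Below is a simplified sketch on plain dicts, assuming role strings carry the `dandi:` prefix as in the JSON. `classify_roles` and the reduced type set are hypothetical, and unlike the function in the diff this version does not mutate its input.

```python
# Reduced copy of DATACITE_CONTRTYPE from the diff, for illustration only.
CONTRIBUTOR_TYPES = {"ContactPerson", "DataCollector", "DataCurator", "ProjectMember"}


def classify_roles(contributor):
    """Hypothetical helper mirroring the role branches of to_datacite."""
    roles = [role.split(":", 1)[-1] for role in contributor.get("roleName", [])]
    # Sponsor and Author are handled separately; what is left picks the type.
    remaining = [role for role in roles if role not in ("Sponsor", "Author")]
    contributor_type = None
    if remaining:
        known = [role for role in remaining if role in CONTRIBUTOR_TYPES]
        contributor_type = known[0] if known else "Other"
    return {
        "funder": "Sponsor" in roles,
        "creator": "Author" in roles,
        "contributorType": contributor_type,
    }


scala = {
    "name": "Scala, Federico",
    "roleName": ["dandi:DataCollector", "dandi:Author", "dandi:ContactPerson"],
}
print(classify_roles(scala))
# {'funder': False, 'creator': True, 'contributorType': 'DataCollector'}
```

An entry whose only role is `dandi:Author` or `dandi:Sponsor` ends up with `contributorType` set to `None` here, which corresponds to the `continue` branches in the diff that keep such entries out of `contributors`.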