Connection with and contribution to the MaRDA metadata extractors registry #207
In Germany, there is also a pretty huge activity in that direction from the FAIRmat consortium, where I am also trying to promote connections. Indeed, the motivation behind splitting RosettaSciIO out of HyperSpy was to create exactly such synergies, without other projects needing to rely on the much larger parent project. Personally, I think the current developers are focused on the interface to HyperSpy, where there is still plenty of work to do, and don't have much capacity at the moment to implement interfaces to specific other projects. However, we would welcome any such links or contributions, and could of course help in case something needs to be adapted on the
Thank you @ml-evs for getting in touch. As @jlaehne mentioned, there are limited resources in the HyperSpy community, because development is driven by each contributor's research needs, and as a community we make sure that the work is done in a way that works well, is future-proof, and is useful for the wider community. I am mentioning this explicitly because, unlike most other projects, we don't have resources (HyperSpy doesn't have and never had dedicated funding) that we can allocate to achieve a specific aim or task. Instead, we are people working together of our own free will because we think this work is useful and needed! To expand a bit on what @jlaehne already said, I will try to summarise the situation on metadata handling in the HyperSpy community, in the hope that it helps understanding:
@ml-evs, I have been through some of the available documentation and I couldn't figure out what happens to the metadata once it has been extracted. How are end users expected to use it?
Thanks for such a fruitful discussion! We (the MaRDA Extractors WG) are of course aware of the FAIRmat folks, and they're aware of us, as you can see from Markus Scheidgen's contributions to the discussions in the repo. My mentor is also one of the task leaders in FAIRmat Area 3, and both of us (@ml-evs and I) are involved in planning a workshop in Berlin (madices.github.io) on related issues. It's also understandable that there's a healthy degree of skepticism, as it's easy to over-promise. As for the metadata discussion, it's a difficult topic, and we've spent the best part of a year on it, reaching a similar conclusion: kicking most of the tough bits down the road. My own parsers (as part of To answer the last two questions:
Currently, we have a proof of concept for the first two, and at least a mechanism for requesting metadata or data from the extractor for the third one, but we won't be able to get further without momentum, examples, and consensus, which is why we're having this discussion.
Well, we cannot guarantee the former and cannot promise help with the latter, but getting your code into a maintained list of "you can extract these files using these codes" cannot hurt discoverability.
@ml-evs, thanks for starting this discussion. I must admit that I had planned to follow this through a bit more, but as @ericpre pointed out, we are rather limited in development time. I'm currently trying to graduate, which has narrowed my focus lately.
I've looked through some of the .yaml files you have created for MaRDA (for example, the Renishaw one), and it makes me think it would be a good idea to extend our own .yaml files, as that might help us organize our file readers by subject, data type, etc., and would help with interoperability. For example, things that we should probably add:
Maybe this is a good place to start. In my opinion, there isn't a good reason not to have more information in the .yaml file, and I'd rather have it defined in one place than split across different repositories. It's not a huge ask to get people to add that information as well.
All sounds good to me! I'm happy to prepare an example of what one given rosetta extractor would look like in our format (though this will now have to wait until the new year).
Just wanted to chime in with the rendered (https://marda-registry.fly.dev/filetypes/renishaw-wdf) and API (https://marda-registry.fly.dev/api/v0.3.0/filetypes/renishaw-wdf) versions of this yaml file, so you get an idea of the registry connection. Hope to follow up on this connection soon. Have a good winter break, everyone!
Hi RosettaSciIO devs, just wanted to make a connection with a MaRDA working group we've been running this year that focuses on the interoperability of metadata extraction in materials science and chemistry. We have developed a proof-of-concept registry, schema and API for describing extractor codes, their target file types and their automatic execution, with the aim of enhancing the discoverability of existing initiatives in our field and promoting best practices for scientific ETL.
I came across RosettaSciIO a little while ago and was really happy that our designs look similar and compatible; would there be any interest in contributing the various RosettaSciIO file formats and extractor definitions to our registry? I think this could be readily scripted from your existing yaml definitions.
I won't go into too much more detail (there's info at all the links above if you are interested). I'd be very happy to speak to any of the developers about this in the new year; otherwise I'll try to do the work myself (with the benefit of enhanced discoverability of RosettaSciIO and the various formats you support). If you'd like to hear more, we will be wrapping up the WG with a presentation in January (see marda-alliance/metadata_extractors#21 for details).
cc @PeterKraus and @CSSFrancis (who I think I spoke to over email about this a while ago)