Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal, add new top level environment field to manifest #163

Open
tlambert03 opened this issue May 11, 2022 · 14 comments
Open

Proposal, add new top level environment field to manifest #163

tlambert03 opened this issue May 11, 2022 · 14 comments

Comments

@tlambert03
Copy link
Collaborator

Had a very fruitful chat with @ctrueden about how to address #136 and #149 and have a new proposal for how to move forward.

Summary of challenge:
Some plugins will need to agree/communicate about shared resources, such as singletons (think Java virtual machine or QApp). Without communication/agreement, one plugin has the ability to accidentally break another plugin by being the first to "claim" or initialize the shared resource (for example: the first to call jpype.startJVM gets to set the classpaths).

Proposal:

  1. A new top level environment field in the manifest that can be populated with subfields (such as jvm_classpath)
  2. A community-agreed "blessed" napari plugin could query the npe2 plugin manager, gather up the statically collected dependencies, and initialize the shared resource. For example napari-scyjava would be a very thin package that just queries the classpaths gathered by npe2 and initializes the JVM accordingly with scyjava
  3. We document this well, and encourage all packages that need a JVM in napari to import/use napari-scyjava, rather than directly calling jpype.startJVM

Example manifest

name: my-plugin
contributions:
   ...
environment:
   jvm_classpath:
      - 'ome:formats-gpl:6.7.0'
      - 'io.scif:scifio:0.43.1'

Potential problems with this proposal
Assuming we stick with the lazy activation pattern (meaning a plugin is only imported when a user requests functionality from that plugin), this still means that a "rogue" plugin could be activated/called first, breaking all the other plugins. This will most likely pop up as a bug report either to napari or to a "well-behaving" plugin like napari-imagej.

We could "fix" this possibility by having napari-scyjava use an "on-startup" activation event to make sure that that package is poised. (This might involve actually jpype.startJVM ... with potential unintended side-effects ... but maybe not.). If we enable on-startup activation though, then we need to implement more tooling to i) allow users to opt-in to that activation and ii) make clear what is breaking if an exception occurs.

@ctrueden suggests we start with the optimistic approach: just document the preferred pattern well, and encourage people to use scyjava – moving to more aggressive fixes only if bugs start popping up.

Alternate proposals
#149 proposed adding a properties field that would allow arbitrary key/value fields that could follow various conventions. The benefit there is that communities could iterate without any involvement from npe2 (since keys/values are not validate). The problem though is that keys and values are not validated 😆 ... so it has the potential for abuse and is harder to impose correctness. By providing an explicit jvm_classpath field in the environment section here, we do take on some domain awareness... but we don't expect this to happen terribly often (maybe again for R, or julia, or something like that...)

Feedback requested
CC @ctrueden, @gselzer, @nclack, @Carreau, @jni, @sofroniewn

@sofroniewn
Copy link
Contributor

A community-agreed "blessed" napari plugin could query the npe2 plugin manager, gather up the statically collected dependencies, and initialize the shared resource.

At a high level I really like this idea. I think its a good way to move the community towards standardization and cross compatibility, while also allowing development to occur outside of this approach if there is strong need

@ctrueden suggests we start with the optimistic approach: just document the preferred pattern well, and encourage people to use scyjava – moving to more aggressive fixes only if bugs start popping up.

I'm also ok with the optimistic approach. If something breaks because people aren't using napari-scyjava we can direct them there.

I wouldn't require, but would also be happy if this plugin style of plugin were to live in napari GitHub org, so at napari/napari-scyjava and the napari community took on a maintenance responsibility of it alongside some folks in the community that had the need for such a package. Having the package in the napari GitHub org could encourage others to use it. We could ship it by default in the bundle etc.

A new top level environment field in the manifest that can be populated with subfields (such as jvm_classpath)

Something that wasn't clear to me is if they are using napari-scyjava why are they providing jvm_classpath to us in npe2 instead of just to that plugin? would it make sense to include napari-scyjava in the manifest, not just the requirements?

Some plugins will need to agree/communicate about shared resources, such as singletons (think Java virtual machine or QApp). Without communication/agreement, one plugin has the ability to accidentally break another plugin by being the first

I'm curious @tlambert03 @nclack if you have ideas about other environments we might want?

@tlambert03
Copy link
Collaborator Author

Something that wasn't clear to me is if they are using napari-scyjava why are they providing jvm_classpath to us in npe2 instead of just to that plugin? would it make sense to include napari-scyjava in the manifest, not just the requirements?

my understanding is that if plugin A and plugin B both need to provide a classpath added to the JVM when it starts, then if plugin A just started the JVM via napari-scyjava, then scyjava needs some way to know that plugin B also existed, and include their required paths as well on startup...
We could say "deal with this entirely in scyjava": so, scyjava could have it's own entry_point and search the environment for other packages. The upside is that npe2 is entirely left out of it, the downside there is that they have to reinvent the entry_point searching logic, and having this stuff listed in the npe2 manifest might help us in the longer run to create reproducible environments? (really don't know there... it's a good question... @ctrueden?)

@gselzer
Copy link

gselzer commented May 11, 2022

my understanding is that if plugin A and plugin B both need to provide a classpath added to the JVM when it starts, then if plugin A just started the JVM via napari-scyjava, then scyjava needs some way to know that plugin B also existed, and include their required paths as well on startup...

Yup, this is right.

By and large, I think this is a nice plan. I have a couple of questions.

  1. Do we need an actual plugin, or just a library? If napari is going to endorse scyjava as the "way to start the JVM", do we actually need a plugin and an on-startup activation event (side note, this would be new functionality, yeah?) Or can we just document that if you want to run Java code within napari, you start up the JVM with napari_scyjava.start_jvm()?

  2. Should the project be napari-scyjava, or something broader? Leaning on scyjava will bring in some SciJava JARs; will others who want to use the JVM be okay with that? Probably?

@tlambert03
Copy link
Collaborator Author

tlambert03 commented May 11, 2022

Do we need an actual plugin, or just a library?

just a library for now. you're correct, it would only need to be a plugin if it required an activation event.

on-startup activation event... side note, this would be new functionality, yeah?

correct, there is currently no option for activating "on startup"

Should the project be napari-scyjava, or something broader?

I leave that to you guys to decide. From our perspective, we'll gather some static metadata and hand it to whoever asks for it. You (the java community) can agree on what that thing is.

@gselzer
Copy link

gselzer commented May 11, 2022

I leave that to you guys to decide. From our perspective, we'll gather some static metadata and hand it to whoever asks for it. You (the java community) can agree on what that thing is.

I'm sure @ctrueden and I will just use scyjava. Just wanted to see if you napari guys had an opinion!

@ctrueden
Copy link

ctrueden commented May 12, 2022

Leaning on scyjava will bring in some SciJava JARs

It does not:

>>> import scyjava
>>> scyjava.start_jvm()
>>> System = scyjava.jimport('java.lang.System')
>>> System.getProperty('java.class.path')
''

The scyjava project is (ironically) not intended to be "SciJava-specific". Maybe we should rename it.

Should the project be napari-scyjava, or something broader?

Why not call it napari-jvm or napari-java to communicate the intent and scope?

I wouldn't require, but would also be happy if this plugin style of plugin were to live in napari GitHub org, so at napari/napari-scyjava and the napari community took on a maintenance responsibility of it alongside some folks in the community that had the need for such a package. Having the package in the napari GitHub org could encourage others to use it. We could ship it by default in the bundle etc.

👍 👍

We could say "deal with this entirely in scyjava": so, scyjava could have it's own entry_point and search the environment for other packages. The upside is that npe2 is entirely left out of it, the downside there is that they have to reinvent the entry_point searching logic

Hmm, I hadn't thought about that... but then, later we want napari-scyjava (or napari-java or whatever we call it) to provide an option in napari for setting the path to the JVM, so that was going to be another reason for napari-java to exist. So then, if we're going to need a napari-java plugin anyway, why not also leverage the npe2 manifest infrastructure, rather than me needing to implement some other parallel discovery mechanism in scyjava itself?

@ctrueden
Copy link

ctrueden commented May 12, 2022

Here is a summary I wrote up after today's discussion:

How we want to do Java-based plugins from napari in general such that they can work together, register Maven artifacts for a common classpath, etc.

  • On the napari side, add a toplevel key environment to napari.yml schema, with a subkey java_artifacts or similar for enumerating Maven coordinates. (What this issue is about)
  • Make a new component napari-java or some such, which is a thin package depending on napari and scyjava, with a function like prepare_environment() that asks napari/npe2 for all the environment['java_artifacts'] items, and adds them all as endpoints to scyjava.
  • Each Java-enabled plugin e.g. napari-imagej will call napari_java.prepare_environment() immediately on import, so that artifacts needed by all plugins are mixed into the Java environment before any of them start the JVM.
  • We also talked about what to do if a napari plugin tries to use JPype directly without going through napari-java, but we decided YAGNI for now, can potentially solve later by monkey patching JPype, but better to just document best practices for JVM-dependent plugins.
  • napari-java will also have a configuration option, once napari's subsystem for options is in place, for configuring the JVM location. And other needed Java-oriented options (e.g. startup flags for memory) would go there as well. Whereas config options specific to napari-imagej like "location of Fiji.app" go as config options there.

@gselzer
Copy link

gselzer commented May 12, 2022

Leaning on scyjava will bring in some SciJava JARs

It does not:

>>> import scyjava
>>> scyjava.start_jvm()
>>> System = scyjava.jimport('java.lang.System')
>>> System.getProperty('java.class.path')
''

Oh, good. I just assumed that it was leaning on scyjava because some of the utility functions import VersionUtils (such as this one). If we don't require scijava, though, should we not be careful with these imports?

The scyjava project is (ironically) not intended to be "SciJava-specific". Maybe we should rename it.

Can we do that? It's already post 1.0...

Why not call it napari-jvm or napari-java to communicate the intent and scope?

I like napari-java. Using jgo makes napari-jvm a little too narrow for my tastes.

but then, later we want napari-scyjava (or napari-java or whatever we call it) to provide an option in napari for setting the path to the JVM, so that was going to be another reason for napari-java to exist. So then, if we're going to need a napari-java plugin anyway, why not also leverage the npe2 manifest infrastructure, rather than me needing to implement some other parallel discovery mechanism in scyjava itself?

Hmm, I don't think you need a plugin to do this. It was my impression that @tlambert03 planned support for somehting like this without the need for a plugin. But I might be misremembering...

@Carreau
Copy link
Collaborator

Carreau commented May 12, 2022

My main question will be again:

  • why in napari/npe2 ?

Talley respond to some concern in #163 (comment) above,

But the problem I see is that if a library want to opt-in to say exposing a java call path, they have to opt-in in npe2 manifest and potentially have npe2 as a dependency.

So really I think that adding this in manifest is removing a lot of potential flexibility, and does actually cut standardisation possibilities.

And having this stuff listed in the npe2 manifest might help us in the longer run to create reproducible environments

I would be much more incline to bake some understanding of entry_points in npe2, so that you could use npe2 to list values in entry_points than in manifest file. For example a "please re-expose those entry points" key/values pair.

For me this is a similar problem than folks asking for package requirements of notebook files to be put in notebooks. It's great if Jupyter support it and auto-install packages, but that forces conda, pip, and many other tools to bake knowledge of Jupyter in them.

@ctrueden
Copy link

@Carreau I'm willing to bake in a discovery mechanism to scyjava, but am not a Python software ecosystem expert, so would need some advice on how best to make scyjava discover all requested Java classpath elements. E.g. the scyjava library, before starting Java, could walk through everything under site-packages from sys.path, looking for something like .../site-packages/<package-name>/scyjava.properties and aggregating them itself.

My questions about this approach would be:

  1. Is there a more standard way of recording and discovering metadata like this in the Python world? Something like Java's java.util.ServiceLoader. Or should I just roll my own here?

  2. What about common configuration like setting the Java home, and setting JVM startup properties like max heap allocation (-Xmx...)? Would you still support a common napari-java plugin for exposing this common configuration? For now I can put it napari-imagej but it's not an ImageJ/Fiji-specific thing.

@ctrueden
Copy link

ctrueden commented May 12, 2022

And a third question:

  1. What about if the user wants to manually include additional Java libraries, e.g. ImageJ/Fiji plugins to be made available on the classpath for napari-imagej? This would be a useful extensibility point. A static discovery mechanism would by design not provide this—rather, we would need a common napari-java plugin, through which all Java-enabled napari plugins funnel, that not only aggregates all the declaratively specified needed Java artifacts (whether by npe2 or scyjava.properties), but also those specified in as a napari configuration property. I don't see a way to achieve this without having a napari-java plugin, do you?

Edit: I guess if scyjava.properties supports giving a callback function, then scyjava can import modules with such callbacks and execute them before spinning up Java. Does that seem like a good plan to you?

@tlambert03
Copy link
Collaborator Author

Is there a more standard way of recording and discovering metadata like this in the Python world?

That would be https://packaging.python.org/en/latest/specifications/core-metadata/

Which can be queried at runtime with importlib.metadata https://docs.python.org/3/library/importlib.metadata.html

entry_points are typically used to point to a module or something, and they are often used as a plugin discovery mechanism. It's what we use in npe2 to find all of the napari.yaml manifests in the environment. You could conceivably use that mechanism to gather this info (though it's really designed to point to a python namespace, not so much a generic string)

@ctrueden
Copy link

@tlambert03 Awesome! That sounds very appropriate, and I love that it's part of the standard library. I'll try to dig into it soon.

@tlambert03
Copy link
Collaborator Author

tlambert03 commented May 12, 2022

Kinda wonder whether Requires-External could be "abused" for this?

Each entry contains a string describing some dependency in the system that the distribution is to be used. This field is intended to serve as a hint to downstream project maintainers, and has no semantics which are meaningful to the distutils distribution.

The format of a requirement string is a name of an external dependency, optionally followed by a version declaration within parentheses

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants