Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove optional dependencies #5

Closed
jjhelmus opened this issue Jul 19, 2016 · 39 comments · Fixed by #37
Closed

Remove optional dependencies #5

jjhelmus opened this issue Jul 19, 2016 · 39 comments · Fixed by #37

Comments

@jjhelmus
Copy link

I noticed that a number of optional dependencies (scipy, dask, etc) are being listed in the requirement section of this recipe, can these be removed? Does conda-forge have a recommendation on including optional dependencies in a recipe?

@ocefpaf
Copy link
Member

ocefpaf commented Jul 20, 2016

I got a request from AOS users to do that and ensure that everything "just works." But I understand that there is an extra burden on the network to download all those packages.

Unfortunately conda does not have the concept of optional dependencies, so we need to find a balance here. Which ones are a problem for you? I would like to keep dask and pynio. The former I believe you will take advantage too, the latter won't affect you if you are not on PY27.

@jjhelmus
Copy link
Author

As long as these dependencies were intentionally added I have no issue with them being there. Just wanted to make sure this was not included accidentally since they do differ from the requirements of the package in defaults.

@ocefpaf
Copy link
Member

ocefpaf commented Jul 20, 2016

Sometimes I do worry that the extra dependencies might cause the solver to prefer an older version of xarray. We saw that behavior with other packages in the past... Not sure if I should just mimic defaults and tell people to add the optional dependencies in their env files...

@ocefpaf ocefpaf mentioned this issue Jul 23, 2016
@rth
Copy link
Member

rth commented Sep 19, 2017

It is a bit confusing, particularly since setup.py of xarray doesn't require e.g. dask but the xarray conda-forge package does...

I got a request from AOS users to do that and ensure that everything "just works."

I do understand that. The other side, in production projects a too large number of uncontrolled dependencies can be reason enough not to use a package.. For instance in an environment with numpy, scipy, and pandas installed,

conda create -c conda-forge -n test-env numpy scipy pandas python=3.6
source activate test-env

xarray from the defaults conda channel pulls 0 dependencies (beyound itself), whereas xarray from conda-forge pulls 44 (list below). One might install it with --no-deps (and hopefully there is the defaults channel) but I guess there is still a middle ground. I mean with kerberos, libssl2, hdf4, hdf5 and libnetcdf as dependencies for an array manipulation library I do start to wonder.

Interestingly, matplotlib, while it can be used for plotting xarrays, doesn't get pulled, wheras bokeh does..

Maybe at least it could worthwhile to depend on the recently created dask-core package instead of the dask metapackage? So that bokeh etc. wouldn't be pulled when one installs xarray..

Sorry for the rant, xarray / conda-forge are great... I can do a PR to help..

Full list of packages below

    asn1crypto:       0.22.0-py36_0 conda-forge
    bkcharts:         0.2-py36_0    conda-forge
    bokeh:            0.12.7-py36_0            
    bottleneck:       1.2.1-py36_1  conda-forge
    cffi:             1.10.0-py36_0 conda-forge
    chardet:          3.0.4-py36_0  conda-forge
    click:            6.7-py36_0    conda-forge
    cloudpickle:      0.4.0-py36_0  conda-forge
    cryptography:     2.0.3-py36_0  conda-forge
    curl:             7.54.1-0      conda-forge
    dask:             0.15.2-py36_0 conda-forge
    dask-core:        0.15.2-py36_0 conda-forge
    distributed:      1.18.3-py36_0 conda-forge
    h5netcdf:         0.4.2-py_0    conda-forge
    h5py:             2.7.1-py36_1  conda-forge
    hdf4:             4.2.12-0      conda-forge
    hdf5:             1.8.18-0      conda-forge
    heapdict:         1.0.0-py36_0  conda-forge
    idna:             2.5-py36_0    conda-forge
    jinja2:           2.9.6-py36_0  conda-forge
    jpeg:             9b-0          conda-forge
    krb5:             1.14.2-0      conda-forge
    libffi:           3.2.1-3       conda-forge
    libnetcdf:        4.4.1.1-5     conda-forge
    libssh2:          1.8.0-1       conda-forge
    locket:           0.2.0-py36_1  conda-forge
    markupsafe:       1.0-py36_0    conda-forge
    msgpack-python:   0.4.8-py36_0  conda-forge
    netcdf4:          1.2.9-py36_1  conda-forge
    partd:            0.3.8-py36_0  conda-forge
    psutil:           5.2.2-py36_0  conda-forge
    pycparser:        2.18-py36_0   conda-forge
    pyopenssl:        17.2.0-py36_0 conda-forge
    pysocks:          1.6.7-py36_0  conda-forge
    pyyaml:           3.12-py36_1   conda-forge
    requests:         2.18.4-py36_1 conda-forge
    sortedcontainers: 1.5.7-py36_0  conda-forge
    tblib:            1.3.2-py36_0  conda-forge
    toolz:            0.8.2-py36_0  conda-forge
    tornado:          4.5.2-py36_0  conda-forge
    urllib3:          1.22-py36_0   conda-forge
    xarray:           0.9.6-py36_0  conda-forge
    yaml:             0.1.6-0       conda-forge
    zict:             0.1.2-py36_0  conda-forge

@ocefpaf
Copy link
Member

ocefpaf commented Sep 19, 2017

@rth I guess that you can still use defaults version, or even pip to avoid the extra download. The whole goal of conda-forge creation was to break the ties with the "standard" and give users an easier way to get what they need.

I can say for sure that 99% of the users of the conda-forge xarray package wants dask and netcdf4 when installing xarray, and probably 80% wants bottleneck and h5netcdf, but these latter two are pretty light weight. The real heavy-weight there is netcdf4, which pulls hdf5, hdf4`, etc.

However, I'll defer to the rest of the @conda-forge/xarray team here, but I am 👎 to remove dask or netcdf4 b/c the Met/Ocean community relies on this package to do what they expect.

@rth
Copy link
Member

rth commented Sep 19, 2017

I guess that you can still use defaults version, or even pip to avoid the extra download. The whole goal of conda-forge creation was to break the ties with the "standard" and give users an easier way to get what they need.
I can say for sure that 99% of the users of the conda-forge xarray package wants dask and netcdf4 when installing xarray

@ocefpaf Thanks for the detailed explanations! Fair enough, from the perspective of users who install xarray directly I understand.

My issue is with how this fits into the wider ecosystem. Say one works on some scientific package that has ~10 dependencies in a requirement.txt file. One of them is xarray. These can certainly be installed with pip, but because not all have pre-built wheels, and because one doesn't necessarily want to deal with compilation issues, it's getting more and more convenient to install them with conda (this is also due to the gaining popularity of conda forge; thanks for working on it!). Some of the packages are in both the defaults and conda-forge channel, some are only in the latter. The natural thing to do in this case would be to make the installation instructions,

conda install -c conda-forge -f requirements.txt

but then this would pull all the dependencies mentioned above due to xarray. This is problematic for several reasons,

  • contributors won't have a clear understanding what packages are actually used by the project (and what could impact what)
  • as you mentioned more downloads (slower CI, more disk space used, larger Docker images etc).
  • potential long-term maintenance issues: e.g. if I pin the latest version xarray==0.9.6 now, will this still install in a year or two, or would some of those 44 dependencies break.

To fix it I can certainly play with the conda channel priorities (good luck making sure this survives on user's PCs), or split the install in two etc. neither of which is very satisfactory. Another issue is that there is no way of knowing that the conda-forge version of xarray is designed for Met/Ocean community (short of talking to the maintainers) or to what extent this may change in the future.

For my personal use of xarray, this is not an actual issue at the moment. I'm also aware that there might not be much that could be done about this due to the way conda packaging works, so it's a more a general comment. Sorry for the long response.

@ocefpaf
Copy link
Member

ocefpaf commented Sep 19, 2017

Another issue is that there is no way of knowing that the conda-forge version of xarray is designed for Met/Ocean community (short of talking to the maintainers) or to what extent this may change in the future.

Indeed that is not in the docs 😄 but conda-forge was born from the merge of the UK MetOffice and the IOOS conda channels, so it is deeply rooted in the Met/Ocean community 😉

or to what extent this may change in the future.

The future is an unknown. Since the creation of conda-forge we have many other communities joined us, like bioconda and their r-packages.

However, like I said before, even though I am 👎 I do defer this to the team here in case they want to change the current policy.

With that said conda is bringing the concept of optional packages soon we can wait for that. Another alternative would be a xarray-core package and make this feedstock a meta-package that depends on it (dask just did that btw). I don't really like the meta-package solution b/c that would be creating a new package name that does not exist on PyPI.

Sorry for the long response.

No problem. I welcome this kind of discussion, otherwise we won't evolve.

@jhamman
Copy link
Member

jhamman commented Sep 20, 2017

I wonder if we should/could setup a few metapackages (xarray, xarray[full], xarray[min]) to appease everyone. Im not sure exactly how conda handles this but I see it around so it could be an option here.

@jjhelmus
Copy link
Author

Would it make sense to create an xarray-core package that includes xarray and a minimum set of packages which xarray requires but none of the optional packages? Then the xarray package could become a meta-package which requires xarray-core as well as the other "recommended" packages.

This is the layout that the dask-core and dask packages are using.

@ocefpaf
Copy link
Member

ocefpaf commented Sep 20, 2017

@jjhelmus that is what I suggested above. But like I mentioned I am not a big fan of 'expanding the namespace'.

@shoyer
Copy link
Contributor

shoyer commented Sep 20, 2017 via email

@rth
Copy link
Member

rth commented Sep 21, 2017

@ocefpaf Thanks for the interesting discussion!

conda-forge was born from the merge of the UK MetOffice and the IOOS conda channels, so it is deeply rooted in the Met/Ocean community 😉

Aww, I had no idea. Makes sense now )

I'm still a little surprised that conda doesn't have a notion of "optional
dependencies" like pip/setuptools.

Well, there is a PR conda/conda#4982 merged in conda 4.4, but I'm still not able to find good documentation about it. As far as I understood it a different approach than the square bracket notation on pypi conda/conda#3299 (comment) . For instance for dask that didn't seem to work conda-forge/dask-feedstock#22 (comment) hence the reason for going with a meta-package. I agree that cluttering the namespace with multiple names for the same package might not be the solution. Particularly since it would mean that xarray from defaults would correspond to xarray-core from conda-forge which is still confusing. Maybe it's better to wait for a better support of optional dependencies...

@ocefpaf
Copy link
Member

ocefpaf commented Sep 21, 2017

I agree that cluttering the namespace with multiple names for the same package might not be the solution. Particularly since it would mean that xarray from defaults would correspond to xarray-core from conda-forge which is still confusing.

👍 to that! @rth I'd rather break the current setup than going for meta-packages.

Maybe it's better to wait for a better support of optional dependencies...

If that is OK for you I'd rather wait as well.

Note that I am defending my community here, but if people can make a strong argument that removing the dependencies would be better for a larger community I will comply (with some complaining along the way 😉).

@rth
Copy link
Member

rth commented Oct 18, 2017

I can say for sure that 99% of the users of the conda-forge xarray package wants dask and netcdf4 when installing xarray,

For future reference, assuming that h5netcdf is (or will be one day) stable enough, feature complete (with repect to xarray needs) and that xarray will automatically switch to the available netcdf backend, it might be worth considering using it as dependency instead of netcdf. h5netcdf conda package only depends on h5py and hdf5 while the netcdf package will pull curl, hdf4, hdf5, jpeg, krb5, libnetcdf, libssh2 and netcdf4.

@SylvainCorlay
Copy link
Member

We came across this with @jasongrout and we would be in favor of dropping optional dependencies.

Dask and netcdf are pretty heavy dependencies.

For a project like ipyleaflet that depends (loosely) on xarray, having to pull all of these seems a bit heavy handed.

@jhamman
Copy link
Member

jhamman commented Aug 3, 2018

I'm leaning towards an xarray-core package and maintaining xarray as a meta-package.

@SylvainCorlay
Copy link
Member

Wouldn't that be weird to depart from the pure python package naming?

@ocefpaf
Copy link
Member

ocefpaf commented Aug 3, 2018

Wouldn't that be weird to depart from the pure python package naming?

That is one of the reasons I don't really like that solution, the xarray-core name would exist only on the conda realm.

B/c of that I'm more in favor of shipping xarray with the minimum requirements, until conda and let users install the rest manually, at least can support optional dependencies.

@SylvainCorlay
Copy link
Member

If you have constraints on the versions of dask and netcdf that play well with xarray, you can use run_constrained to specify them.

@SylvainCorlay
Copy link
Member

Otherwise, I would lean towards dropping completely unneeded dependencies, and let users install them via another meta package or simply in their environment.

@ocefpaf
Copy link
Member

ocefpaf commented Aug 3, 2018

BTW @SylvainCorlay I looked into the velocity plugin in ipyleaflet and I'm not sure xarray is a good choice for that. It would be more general to take u, and v directly from an array like object. Requiring such a high level object like xarray there does not look like a good pattern IMO.

(But that is an issue for ipyleaflet and not this thread. Sorry for digressing.)

xref.: jupyter-widgets/ipyleaflet#212

@SylvainCorlay
Copy link
Member

Yep! Thanks for looking into it!

Although that is a bit orthogonal to xarray, which is a data structure, depending on dask, a distributed computing framework.

@jasongrout
Copy link

Although that is a bit orthogonal to xarray, which is a data structure, depending on dask, a distributed computing framework.

Actually, the arguments at the ipyleaflet discussion probably apply here as well. Meta-issue: do you include dependencies because you think that lots of users will use it, or because it's actually needed? Same question for both repos :).

@SylvainCorlay
Copy link
Member

SylvainCorlay commented Aug 3, 2018

Actually, the arguments at the ipyleaflet discussion probably apply here as well. Meta-issue: do you include dependencies because you think that lots of users will use it, or because it's actually needed? Same question for both repos :).

Well, ipyleaflet actually uses xarray. There is a question about whether we should use it. (Not the same question.)

@SylvainCorlay
Copy link
Member

And the question was triggered by that thing pulling too many dependencies.

@jasongrout
Copy link

Well, ipyleaflet actually uses xarray. There is a question about whether we should use it. Not the same question.

@ocefpaf's point is that it's only using xarray because it's convenient (i.e., lots of users will use it that way) - and that xarray is not actually needed - that's the connection I saw. But digression aside, both specific questions have good points.

@jhamman jhamman reopened this Aug 3, 2018
@jhamman
Copy link
Member

jhamman commented Aug 3, 2018

Let's keep thinking about this. I want @shoyer to weigh in as well.

@ocefpaf
Copy link
Member

ocefpaf commented Aug 3, 2018

@jhamman note that everything I said here is my personal opinion and not a conda-forge policy. The actual policy is to have the @conda-forge/xarray team decide on what is best for their package.

@jhamman
Copy link
Member

jhamman commented Aug 3, 2018

@ocefpaf. Thanks. Remember, you're on the @conda-forge/xarray team too!

@shoyer
Copy link
Contributor

shoyer commented Aug 3, 2018

My general opinion is that it's best to restrict explicit dependencies to strict requirements. This retains maximum flexibility, at a slight cost of convenience. Optional dependencies can be put in new metapackages, e.g., "xarray[all]" or however you spell that with conda.

@jhamman
Copy link
Member

jhamman commented Aug 3, 2018

Okay, I expect this will cause some pain downstream but I'm okay with this plan. Should we time this with the upcoming minor release (xarray v0.11)?

@ocefpaf
Copy link
Member

ocefpaf commented Aug 3, 2018

however you spell that with conda.

That is the problem, you don't 😄 conda does not have that concept and a meta-package is a bad solution IMO b/c it means two packages: xarray-core with the minimum deps, and xarray with the extra deps. IMO that sucks b/c it deviates from the original name at PyPI. One can do the other way around and have xarray with the minimum deps and xarray-extras with the optional, same problem IMO.

I expect this will cause some pain downstream but I'm okay with this plan.

It is not too bad actually. People will have to install netcdf4, dask, etc manually but those are usually already listed explicitly in most of the scientists envs I see out there.

PS: @SylvainCorlay note that this will break the example in ipyleaflet though! Ironically you are using a remote netcdf file there, so netcdf4 and opendap capabilities will be needed for the example to run anyway!

@ocefpaf
Copy link
Member

ocefpaf commented Aug 3, 2018

I expect this will cause some pain downstream but I'm okay with this plan.

Also, as far as I remember xarray has some nice error messages instead of the regular ImportError, so users will know what to do immediately.

@SylvainCorlay
Copy link
Member

PS: @SylvainCorlay note that this will break the example in ipyleaflet though! Ironically you are using a remote netcdf file there, so netcdf4 and opendap capabilities will be needed for the example to run anyway!

Yes, the binder will need netcdf indeed, although probably not the ipyleaflet package.

@shoyer
Copy link
Contributor

shoyer commented Aug 3, 2018

IMO b/c it means two packages: xarray-core with the minimum deps, and xarray with the extra deps. IMO that sucks b/c it deviates from the original name at PyPI.

I would agree here.

One can do the other way around and have xarray with the minimum deps and xarray-extras with the optional, same problem IMO.

Why is xarray-extras problematic? Presumably it would be a simple package that depends on xarray, dask, netCDF4, etc. without providing any packages itself. This makes it easy to install the entire xarray stack, e.g., with conda install -c conda-forge xarray-extras

@ocefpaf
Copy link
Member

ocefpaf commented Aug 3, 2018

Why is xarray-extras problematic?

B/c it breaks the 1-1 correspondence with PyPI and sounds more like a forkish behavior from the conda community. (We are always criticized for that.).

I'd rather have a single xarray package, with minimum deps, and make pressure on the conda devs to add the optional packages functionality. (Ping @kalefranz 😉)

@jhamman jhamman mentioned this issue Aug 3, 2018
4 tasks
@shoyer
Copy link
Contributor

shoyer commented Aug 4, 2018

Why is xarray-extras problematic?

B/c it breaks the 1-1 correspondence with PyPI and sounds more like a forkish behavior from the conda community. (We are always criticized for that.).

I think xarray-extras only becomes problematic if people start declaring dependencies on it, e.g., in other conda packages. But for end user / application code it's fine.

Ideally, of course, something like xarray[extras] would work. That's consistent with how pip/PyPI/setuptools handle optional dependencies (as specified in setup.py).

@ocefpaf
Copy link
Member

ocefpaf commented Aug 4, 2018

I think xarray-extras only becomes problematic if people start declaring dependencies on it, e.g., in other conda packages. But for end user / application code it's fine.

I agree with you, the rest of the Python community that complains to me at events when I talk about conda/conda-forge does not.

Ideally, of course, something like xarray[extras] would work. That's consistent with how pip/PyPI/setuptools handle optional dependencies (as specified in setup.py).

Yes, that is the dream. I know it is on their road-map but not sure when this feature would land on the conda world. (Wink, wink at @kalefranz again.)

@shoyer shoyer changed the title Remove optional depdencies Remove optional dependencies Aug 5, 2018
@hmaarrfk
Copy link

I think it is possible to have:

  1. The easy installation that the giant xarray package provides. This seems to be a big plus for a few more casual users that I've talked to.
  2. The single recipe to maintain that packagers currently have.
  3. Multiple compatible installation options for those that are looking to slim down their packages.

The square bracket naming convention is unfortunately impossible with conda it seems due to fancy version specifications, although maybe we can use a . (dot) to indicate "options" as opposed to exclusion with the -gpu and -cpu variants. A dot might seem more pythonic 😕

The 3 points can be achieved with a combination of the run-constrained and outputs specification.

A sample recipe can be found here for dask.

Xref: Dask issue: conda-forge/dask-feedstock#46

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants