Maintenance: update mybinder environment.yml dependency versions #62

Closed
GenevieveBuckley opened this issue Feb 27, 2019 · 10 comments

Comments

@GenevieveBuckley
Contributor

I noticed that the mybinder environment.yml file pins dask to version 0.20, but the latest dask release is now up to 1.1.2. It's probably time to update or unpin some of these dependencies. Should we do that?

Currently:
https://github.com/dask/dask-examples/blob/master/binder/environment.yml

channels:
  - conda-forge
dependencies:
  - python=3
  - bokeh=0.13
  - dask=0.20
  - dask-ml=0.10.0
  - distributed=1.24
  - jupyterlab=0.35.1
  - nodejs=8.9
  - numpy
  - pandas
  - pyarrow==0.10.0
  - scikit-learn=0.20
  - matplotlib
  - nbserverproxy
  - nomkl
  - h5py
  - xarray
  - bottleneck
  - py-xgboost
  - pip:
    - graphviz
    - dask_xgboost
    - seaborn
    - mimesis
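
To see at a glance which entries are pinned and which float, here is a minimal sketch that splits the conda dependency specs listed above into the two groups. The list is copied by hand from the file quoted in this comment; `split_pin` is a hypothetical helper written for illustration and is not part of conda or dask.

```python
# Conda dependency entries copied from the environment.yml quoted above
# (the pip: sub-list is left out because none of its entries carry a version).
CONDA_DEPS = [
    "python=3", "bokeh=0.13", "dask=0.20", "dask-ml=0.10.0",
    "distributed=1.24", "jupyterlab=0.35.1", "nodejs=8.9", "numpy",
    "pandas", "pyarrow==0.10.0", "scikit-learn=0.20", "matplotlib",
    "nbserverproxy", "nomkl", "h5py", "xarray", "bottleneck", "py-xgboost",
]


def split_pin(spec):
    """Return (name, version); version is None when the spec is unpinned.

    Hypothetical helper: handles only the simple `name=version` and
    `name==version` forms that appear in this file.
    """
    name, sep, version = spec.strip().partition("=")
    return name, (version.lstrip("=") if sep else None)


parsed = [split_pin(spec) for spec in CONDA_DEPS]
pinned = {name: version for name, version in parsed if version is not None}
unpinned = [name for name, version in parsed if version is None]

for name, version in pinned.items():
    print(f"pinned:   {name} -> {version}")
for name in unpinned:
    print(f"unpinned: {name}")
```

Run against the list above, this reports nine pinned and nine unpinned conda packages, which is the set any update or unpin would need to review.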
@mrocklin
Member

mrocklin commented Feb 27, 2019 via email

@GenevieveBuckley
Contributor Author

Sure, I can do that.

Do you prefer to have things unpinned if possible, or just increment to a newer release? We'll probably run into this problem again (outdated dependencies) before too long if we keep things pinned, so I'd prefer not to. But I'd understand if you feel pull requests aren't frequent enough on this repo to trigger tests and alert us of any future issues.

@mrocklin
Member

mrocklin commented Feb 27, 2019 via email

@GenevieveBuckley
Contributor Author

To update: I think it's better to put this issue on hold until the next dask-ml release is available.

I think this because upgrading dask versions has unearthed a few bugs, and the new release should contain some of the fixes we need.

Error 1

AttributeError: module 'dask' has no attribute 'sharedict', seen in the dask-examples/machine-learning/incremental.ipynb notebook. This is addressed by the merged PR dask/dask-ml#455.

Error 2

ValueError: high is out of bounds for int32, seen in the dask-examples/machine-learning.ipynb notebook. I think the open pull request dask/dask-ml#462 is likely to resolve this.
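
For context, this message has the shape of the error raised when a random-integer upper bound does not fit in a 32-bit signed integer. The exact call in the notebook is not shown in this thread, so the following is only a pure-Python sketch of that kind of bounds check. `check_randint_high` is a hypothetical helper, not a NumPy or dask-ml function, and it assumes an exclusive upper bound.

```python
# Range of a 32-bit signed integer.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def check_randint_high(high, low=0):
    """Validate sampling bounds for an int32 result (hypothetical helper).

    Assumes `high` is exclusive, so the largest value that could be drawn
    is `high - 1`.
    """
    if low < INT32_MIN:
        raise ValueError("low is out of bounds for int32")
    if high - 1 > INT32_MAX:
        raise ValueError("high is out of bounds for int32")
    return True


# An exclusive bound of 2**31 is still representable ...
check_randint_high(2**31)

# ... but 2**32 is not, and triggers the same message as in the notebook.
try:
    check_randint_high(2**32)
except ValueError as exc:
    print(f"ValueError: {exc}")
```

Under this assumption, the error depends on the integer width in use rather than on the data, which would explain why it can surface only after a dependency or platform change.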

Error 3

I found that dask dataframe .std() produces an error when there are NaNs in the output. Seen in the dask-examples/dataframes.ipynb notebook.
I raised an issue for it here: dask/dask#4534, and the associated pull request is here: dask/dask#4535
We can sidestep it entirely by replacing the dask dataframe std() example with mean() instead. I've changed that in the example notebook, so we don't have to wait for this bugfix to be released.

Other potential issues ahead

I also want to update the version of bokeh. Currently it's at 0.13 and the latest is >= 1.0.0 (a stable version 1.1.0 seems pretty close to release). I anticipate this will mean we'll have to update the examples using bokeh in the notebooks, as there are quite a few changes here.

CI build logs

My branch is here: https://github.com/GenevieveBuckley/dask-examples/tree/update-binder-env
You can take a look at the CI build logs here: https://travis-ci.com/GenevieveBuckley/dask-examples/branches

@mrocklin
Member

mrocklin commented Feb 28, 2019 via email

@mrocklin
Member

Also, thanks for driving this and unearthing these bugs @GenevieveBuckley ! It's a great help.

@TomAugspurger
Member

TomAugspurger commented Feb 28, 2019 via email

@TomAugspurger
Member

TomAugspurger commented Feb 28, 2019 via email

@mrocklin
Member

> I've changed that in the example notebook, so we don't have to wait for this bugfix to be released.

If possible I'd like to revert this change. I'd like for us to avoid having the examples work around bugs. I'm more than happy to prioritize bugfixes that block this. (I think that the underlying issue has now been fixed, and a dask/dask release is imminent).

@GenevieveBuckley
Contributor Author

> > I've changed that in the example notebook, so we don't have to wait for this bugfix to be released.
>
> If possible I'd like to revert this change. I'd like for us to avoid having the examples work around bugs. I'm more than happy to prioritize bugfixes that block this. (I think that the underlying issue has now been fixed, and a dask/dask release is imminent).

Totally fine, I can revert that change. I was not expecting everyone to be so quick with bugfixes & new releases!
