module 'dask' has no attribute 'sharedict' #454

Closed
akter-pi opened this issue Feb 12, 2019 · 7 comments

Comments

@akter-pi

akter-pi commented Feb 12, 2019

Hi, I am trying to use `Incremental` from dask_ml. My code, shown below, is simple and follows the existing example, but the call to `inc.fit` fails with "AttributeError: module 'dask' has no attribute 'sharedict'". I am using dask 1.1.1 and dask_ml 0.11.0.

```python
from dask.distributed import Client
client = Client()

import dask

import dask.dataframe as dd
import dask.array as da

df = dd.read_csv('BreastCancer.csv')
X = df.drop(['Id','Class','Bare.nuclei'],axis=1)
y = df['Class']=='benign'

from dask_ml.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X.to_dask_array(True), y.to_dask_array(True))

X_train, X_test, y_train, y_test = dask.persist(X_train, X_test, y_train, y_test)
classes = da.unique(y_train).compute()

from sklearn.linear_model import SGDClassifier
est = SGDClassifier(loss='log', penalty='l2', tol=1e-3)

from dask_ml.wrappers import Incremental
inc = Incremental(est, scoring='accuracy')

inc.fit(X_train, y_train, classes=classes)
```

The error is:

```
AttributeError Traceback (most recent call last)
in
25 inc = Incremental(est, scoring='accuracy')
26
---> 27 inc.fit(X_train, y_train, classes=classes)

/mnt/d/PI.X/py3.5_venv/lib/python3.5/site-packages/dask_ml/wrappers.py in fit(self, X, y, **fit_kwargs)
462 def fit(self, X, y=None, **fit_kwargs):
463 estimator = sklearn.base.clone(self.estimator)
--> 464 self._fit_for_estimator(estimator, X, y, **fit_kwargs)
465 return self
466

/mnt/d/PI.X/py3.5_venv/lib/python3.5/site-packages/dask_ml/wrappers.py in _fit_f(self, estimator, X, y, **fit_kwargs)
453 random_state=self.random_state,
454 shuffle_blocks=self.shuffle_blocks,
--> 455 **fit_kwargs
456 )
457

/mnt/d/PI.X/py3.5_venv/lib/python3.5/site-packages/dask_ml/_partial.py in fit(model, x, y, compute, shuffle_blocks, random_state, **kwargs)
195 )
196
--> 197 new_dsk = dask.sharedict.merge((name, dsk), x.dask, getattr(y, "dask", {}))
198 value = Delayed((name, nblocks - 1), new_dsk)
199

AttributeError: module 'dask' has no attribute 'sharedict'
```
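
For context, here is a minimal sketch of the general Python behaviour that is consistent with this error: a submodule only becomes an attribute of its package once something has imported it. The package name `demo_pkg` and its contents are hypothetical stand-ins built in a temporary directory, not dask code, and whether dask 1.1.1 no longer imports `sharedict` at package import time is an assumption here.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package `demo_pkg` (hypothetical name) whose __init__
# does NOT import its `sharedict` submodule.
tmp = tempfile.mkdtemp()
pkg_dir = Path(tmp) / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")  # empty: imports no submodules
(pkg_dir / "sharedict.py").write_text(
    "def merge(*dicts):\n"
    "    return {k: v for d in dicts for k, v in d.items()}\n"
)
sys.path.insert(0, tmp)
importlib.invalidate_caches()

import demo_pkg

# The submodule has not been imported yet, so attribute access would fail.
before = hasattr(demo_pkg, "sharedict")

# An explicit import loads the submodule and binds it on the package.
import demo_pkg.sharedict

after = hasattr(demo_pkg, "sharedict")
print(before, after)  # False True
```

If this is what is happening, code that writes `dask.sharedict.merge(...)` after only `import dask` relies on some other import having loaded the submodule first.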

@akter-pi
Author

If I import dask.sharedict explicitly, that error goes away, but a different one appears:


ValueError Traceback (most recent call last)
in
25 inc = Incremental(est, scoring='accuracy')
26
---> 27 inc.fit(X_train, y_train, classes=classes)

/mnt/d/PI.X/py3.5_venv/lib/python3.5/site-packages/dask_ml/wrappers.py in fit(self, X, y, **fit_kwargs)
462 def fit(self, X, y=None, **fit_kwargs):
463 estimator = sklearn.base.clone(self.estimator)
--> 464 self._fit_for_estimator(estimator, X, y, **fit_kwargs)
465 return self
466

/mnt/d/PI.X/py3.5_venv/lib/python3.5/site-packages/dask_ml/wrappers.py in _fit_f(self, estimator, X, y, **fit_kwargs)
453 random_state=self.random_state,
454 shuffle_blocks=self.shuffle_blocks,
--> 455 **fit_kwargs
456 )
457

/mnt/d/PI.X/py3.5_venv/lib/python3.5/site-packages/dask_ml/_partial.py in fit(model, x, y, compute, shuffle_blocks, random_state, **kwargs)
199
200 if compute:
--> 201 return value.compute()
202 else:
203 return value

/mnt/d/PI.X/py3.5_venv/lib/python3.5/site-packages/dask/base.py in compute(self, **kwargs)
154 dask.base.compute
155 """
--> 156 (result,) = compute(self, traverse=False, **kwargs)
157 return result
158

/mnt/d/PI.X/py3.5_venv/lib/python3.5/site-packages/dask/base.py in compute(*args, **kwargs)
393 get=kwargs.pop('get', None))
394
--> 395 dsk = collections_to_dsk(collections, optimize_graph, **kwargs)
396 keys = [x.dask_keys() for x in collections]
397 postcomputes = [x.dask_postcompute() for x in collections]

/mnt/d/PI.X/py3.5_venv/lib/python3.5/site-packages/dask/base.py in collections_to_dsk(collections, optimize_graph, **kwargs)
185 groups = groupby(optimization_function, collections)
186 groups = {opt: _extract_graph_and_keys(val)
--> 187 for opt, val in groups.items()}
188
189 for opt in optimizations:

/mnt/d/PI.X/py3.5_venv/lib/python3.5/site-packages/dask/base.py in (.0)
185 groups = groupby(optimization_function, collections)
186 groups = {opt: _extract_graph_and_keys(val)
--> 187 for opt, val in groups.items()}
188
189 for opt in optimizations:

/mnt/d/PI.X/py3.5_venv/lib/python3.5/site-packages/dask/base.py in _extract_graph_and_keys(vals)
210 graph = HighLevelGraph.merge(*graphs)
211 else:
--> 212 graph = merge(*graphs)
213
214 return graph, keys

/mnt/d/PI.X/py3.5_venv/lib/python3.5/site-packages/toolz/dicttoolz.py in merge(*dicts, **kwargs)
37 rv = factory()
38 for d in dicts:
---> 39 rv.update(d)
40 return rv
41

ValueError: dictionary update sequence element #0 has length 43; 2 is required
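
As a rough illustration of how a message of this shape can arise in plain Python (not a diagnosis of what dask actually hands to `toolz.merge` here), `dict.update` raises it when it receives an iterable whose first element is not a key/value pair. The 43-character key below is a made-up stand-in chosen only to match the length in the message.

```python
# Hypothetical graph key: a 43-character name plus a block index.
graph_key = ("x" * 43, 0)

result = {}
message = None
try:
    # dict.update expects a mapping or an iterable of (key, value) pairs.
    # A bare tuple is iterated element by element, and its first element
    # (a 43-character string) is not a 2-item pair.
    result.update(graph_key)
except ValueError as exc:
    message = str(exc)

print(message)
```

So the error suggests that something other than a dict (or a sequence of pairs) reached the final `rv.update(d)` call.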

@jrbourbeau
Member

Thanks for opening up this issue @akter-pi! Can you provide a minimal example? http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

You might also try the latest development version of dask_ml, which includes compatibility updates for Dask's deprecation of ShareDict in favor of HighLevelGraph (see #439 for reference).
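
A minimal sketch of the kind of compatibility shim this implies: try the newer module first and fall back to the older one. The helper name `first_available` is hypothetical, and the example uses standard-library module names so it runs without dask installed; it is not the code from #439.

```python
import importlib


def first_available(module_names):
    """Return the first module in `module_names` that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError; try the next name.
            continue
    raise ImportError("none of %r could be imported" % (list(module_names),))


# Stand-in names: the first does not exist, so the fallback is used.
mod = first_available(["no_such_module_for_demo", "json"])
print(mod.__name__)  # json
```

With dask installed, the same call could presumably be made with `["dask.highlevelgraph", "dask.sharedict"]`, so that whichever graph module the installed version provides is picked up.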

@TomAugspurger
Member

Hmm I wonder if I forgot to release after fixing e211338...

I'll put a new version up on PyPI and conda-forge quick.

@TomAugspurger
Member

@akter-pi does it work for you on master?

@stsievert
Member

stsievert commented Feb 13, 2019

It doesn't work for me on master.

```python
from sklearn.linear_model import SGDClassifier
from dask_ml.wrappers import Incremental
import dask.array as da
# dask_ml's make_classification returns dask arrays and accepts `chunks`
from dask_ml.datasets import make_classification
import dask_ml
print(dask_ml.__version__)  # 0.11.1.dev59+gf5d3b1c


if __name__ == "__main__":
    est = SGDClassifier()

    inc = Incremental(est)
    n, d = 100, 5
    # 100 samples split into 10 chunks of 10 rows each
    X, y = make_classification(n_samples=n, n_features=d, chunks=n // 10)

    inc.fit(X, y, classes=da.unique(y))
```

Traceback:

In [32]: %run test.py
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
~/Developer/stsievert/dask-ml/dask_ml/_partial.py in fit(model, x, y, compute, shuffle_blocks, random_state, **kwargs)
    201     try:
--> 202         from dask.highlevelgraph import HighLevelGraph
    203

ModuleNotFoundError: No module named 'dask.highlevelgraph'

During handling of the above exception, another exception occurred:

AssertionError Traceback (most recent call last)
~/Developer/stsievert/dask-ml/test.py in ()
14 X, y = make_classification(n_samples=n, n_features=d, chunks=n // 10)
15
---> 16 inc.fit(X, y, classes=da.unique(y))
17 inc.score(X, y)

~/Developer/stsievert/dask-ml/dask_ml/wrappers.py in fit(self, X, y, **fit_kwargs)
483 def fit(self, X, y=None, **fit_kwargs):
484 estimator = sklearn.base.clone(self.estimator)
--> 485 self._fit_for_estimator(estimator, X, y, **fit_kwargs)
486 return self
487

~/Developer/stsievert/dask-ml/dask_ml/wrappers.py in _fit_for_estimator(self, estimator, X, y, **fit_kwargs)
474 random_state=self.random_state,
475 shuffle_blocks=self.shuffle_blocks,
--> 476 **fit_kwargs
477 )
478

~/Developer/stsievert/dask-ml/dask_ml/_partial.py in fit(model, x, y, compute, shuffle_blocks, random_state, **kwargs)
206 from dask import sharedict
207
--> 208 new_dsk = sharedict.merge(graphs.values)
209
210 value = Delayed((name, nblocks - 1), new_dsk)

~/anaconda3/lib/python3.6/site-packages/dask/sharedict.py in merge(*dicts)
97 result.update_with_key(d, key=key)
98 else:
---> 99 result.update_with_key(d)
100 return result

~/anaconda3/lib/python3.6/site-packages/dask/sharedict.py in update_with_key(self, arg, key)
62 key = id(arg)
63
---> 64 assert isinstance(arg, dict)
65 if arg:
66 self.dicts[key] = arg

AssertionError:

In [33]:

@TomAugspurger
Member

Ah. I suppose it works with dask master? Or at least the latest released version of dask?

@jrbourbeau
Member

jrbourbeau commented Feb 13, 2019

I can confirm that @stsievert's example (thanks for posting it!) works with dask-ml master + dask master and with dask-ml master + the latest dask release (1.1.1), but breaks with dask 1.0.0. I think that's because we're passing the dictionary's bound `values` method to sharedict.merge instead of the values themselves:

new_dsk = sharedict.merge(graphs.values)
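
A small self-contained sketch of the difference, using a hypothetical stand-in `merge` that, like the one in the traceback, accepts `*dicts` and asserts that each argument is a dict. Whether unpacking `*graphs.values()` is the exact fix that lands in dask-ml is an assumption.

```python
def merge(*dicts):
    """Stand-in merge (not dask's): require dict arguments and combine them."""
    result = {}
    for d in dicts:
        assert isinstance(d, dict)
        result.update(d)
    return result


graphs = {"x": {"a": 1}, "y": {"b": 2}}

# Passing the bound method itself: the single argument is not a dict.
try:
    merge(graphs.values)
    failed = False
except AssertionError:
    failed = True

# Calling the method and unpacking its result passes each dict separately.
merged = merge(*graphs.values())
print(failed, merged)
```

The first call fails the `isinstance` check because `graphs.values` is a method object, which matches the bare `AssertionError` seen with dask 1.0.0.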
