[DYNOTEARS] TypeError: Index must be integers #86

LukaJakovljevic · 2021-01-05T12:35:07Z

Description

Hi, I have a problem when running DYNOTEARS on a dataframe.
It seems the method does not recognise that df.index is of integer type.

Steps to Reproduce

Run the second cell from the example in "I have a question about Dynotears #74 (comment)" (the cell that calls dynotears).
The same error occurs when applying from_numpy_dynamic or from_pandas_dynamic to other dataframes whose indexes are integers in increasing order.

Expected Result

dynotears runs on the dataframe without raising an error.

Actual Result

TypeError: Index must be integers

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-82-5ce17a03d23b> in <module>
      1 from causalnex.structure.dynotears import from_pandas_dynamic
----> 2 g_learnt = from_pandas_dynamic(df,1,lambda_w=.1,lambda_a=.1,w_threshold=.1)
      3 g_learnt

~\anaconda3\envs\test_env\lib\site-packages\causalnex\structure\dynotears.py in from_pandas_dynamic(time_series, p, lambda_w, lambda_a, max_iter, h_tol, w_threshold, tabu_edges, tabu_parent_nodes, tabu_child_nodes)
     98     time_series = [time_series] if not isinstance(time_series, list) else time_series
     99 
--> 100     X, Xlags = DynamicDataTransformer(p=p).fit_transform(time_series, return_df=False)
    101 
    102     col_idx = {c: i for i, c in enumerate(time_series[0].columns)}

~\anaconda3\envs\test_env\lib\site-packages\sklearn\base.py in fit_transform(self, X, y, **fit_params)
    569         if y is None:
    570             # fit method of arity 1 (unsupervised transformation)
--> 571             return self.fit(X, **fit_params).transform(X)
    572         else:
    573             # fit method of arity 2 (supervised transformation)

~\anaconda3\envs\test_env\lib\site-packages\causalnex\structure\transformers.py in fit(self, time_series, return_df)
     88         """
     89         time_series = time_series if isinstance(time_series, list) else [time_series]
---> 90         self._check_input_from_pandas(time_series)
     91         self.columns = list(time_series[0].columns)
     92         self.return_df = return_df

~\anaconda3\envs\test_env\lib\site-packages\causalnex\structure\transformers.py in _check_input_from_pandas(self, time_series)
    203 
    204             if t.index.dtype != int:
--> 205                 raise TypeError("Index must be integers")
    206 
    207             if self.columns is not None:

TypeError: Index must be integers

Your Environment

CausalNex version used: 0.9.0
Python version used: 3.8.5
Operating system and version: Windows 10 Pro, x64


GabrielAzevedoFerreiraQB · 2021-01-06T10:03:21Z

Hi Luka,
That is strange: I tested the code and it worked on my side.

Could you show the steps you are following, please?

A few (hopefully helpful) notes:

The index of the dataframe you provide is quite important in the from_pandas_dynamic function: it represents the cadence of the time series in your data.

For example, row 0 represents time stamp 0, i.e. all the features observed at moment 0, or x_0. Row 1 represents x_1, and so on. Ideally we have a time series x_0, x_1, x_2, ... with occasional disruption points where we don't have data for certain time stamps (e.g. x_0, x_1, x_2, x_5, x_6, x_7, ...). Your index represents this time series.

This means a couple of things:

If the index of the df is, for example, (0, 1, 3, 5), it means that you don't know what happens at time stamp 2, x_2 (the information is missing, and from_pandas_dynamic has a way of dealing with that). If you have (0, 2, 4, 6, ...), it means that you never have two consecutive events (you have x_0 but not x_1, ...), and the resulting network will be very different from the one you get with (0, 1, 2, 3, ...) as the index.
If your index is not made of integers, there is no way for dynotears to work out which events are consecutive and which are not. This, then, raises an error.
Finally, if the index values are integers but not in order (for example, 0, 1, 2, 4, 3, 5), we throw an error for safety, since it is more natural to store a time series in increasing order of events. This guards against the case where the user does not pay attention to the index.
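
To make the three cases above concrete, here is a minimal pure-Python sketch of how an integer index determines which rows count as consecutive for a lag of 1. It is only an illustration of the idea, not the CausalNex implementation: the helper name lagged_pairs and the ValueError message are hypothetical, and only the TypeError message is taken from the traceback in this issue.

```python
from typing import List, Sequence, Tuple


def lagged_pairs(index: Sequence[int]) -> List[Tuple[int, int]]:
    """Return the (t - 1, t) pairs for which both time stamps are present.

    Hypothetical helper for illustration only (lag p = 1).
    """
    for t in index:
        # bool is a subclass of int in Python, so reject it explicitly
        if not isinstance(t, int) or isinstance(t, bool):
            raise TypeError("Index must be integers")

    # Every time stamp must be strictly larger than the previous one
    if any(b <= a for a, b in zip(index, index[1:])):
        raise ValueError("Index must be in increasing order")

    present = set(index)
    # A row at time t is usable only if the row at time t - 1 also exists
    return [(t - 1, t) for t in index if t - 1 in present]


print(lagged_pairs([0, 1, 3, 5]))   # [(0, 1)]: x_2 and x_4 are missing
print(lagged_pairs([0, 2, 4, 6]))   # []: no two consecutive events
print(lagged_pairs([0, 1, 2, 3]))   # [(0, 1), (1, 2), (2, 3)]
```

With the index (0, 1, 2, 4, 3, 5) the sketch raises the ValueError, and with a non-integer index such as (0.0, 1.0) it raises the TypeError.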

LukaJakovljevic · 2021-01-06T15:19:44Z

Hi @GabrielAzevedoFerreiraQB,

Thank you for the fast answer and explanation.

I executed exactly the cells in your example, and this is what I (and a few people I recently asked to install the library) get in cell [4]:

I believe this is because df.index.dtype returns int64. This may have something to do with numpy.
Indeed, if you evaluate df.index.dtype == int after cell [3], you get False.
In causalnex\structure\transformers.py, line 204 compares the dtype to int, and that is where the error comes from.

When I change that line to if t.index.dtype != 'int64', everything works.

Maybe you can change that part in the code, to allow index to also be of type 'int64'?
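
For illustration, a check along these lines would accept int64 as well as other integer widths. This is only a sketch, not the fix that was eventually made in CausalNex: the helper name has_integer_index is hypothetical, and it assumes a plain NumPy-backed index (not a pandas nullable extension dtype). np.issubdtype and np.integer are standard NumPy.

```python
import numpy as np
import pandas as pd


def has_integer_index(df: pd.DataFrame) -> bool:
    """Return True if the index has any NumPy integer dtype (int32, int64, ...).

    Hypothetical helper; assumes df.index.dtype is a NumPy dtype.
    """
    return bool(np.issubdtype(df.index.dtype, np.integer))


# An explicitly int64 index, like the one reported in this issue
df_int = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0]},
    index=np.array([0, 1, 2], dtype=np.int64),
)
# A float index, which should still be rejected
df_float = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=[0.5, 1.5, 2.5])

print(df_int.index.dtype)           # int64
print(has_integer_index(df_int))    # True
print(has_integer_index(df_float))  # False
```

Unlike comparing against 'int64' only, this does not depend on which integer width the platform or the pandas version picks for the index.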

Thanks,
Luka

GabrielAzevedoFerreiraQB · 2021-01-07T01:10:31Z

That makes sense! We will make a change to allow int64 and possibly "other types of int".
Thanks a lot for finding this bug!

If you do not want to change the source code, for now I suggest trying df.index = df.index.astype(int) or
using the versions of pandas and numpy below.

GabrielAzevedoFerreiraQB · 2021-01-07T01:14:45Z

On that note, would you be able to share which pandas and numpy versions you are using?

LukaJakovljevic · 2021-01-07T11:58:03Z

You're welcome!

Here are the versions below:

P.S. I had already tried that and similar commands to change the df.index type to plain int.
The default int is int64, as you can see. I didn't find a way to make it int with these versions, which is why I ended up changing the code to accept this type; after that, everything worked.

Let me know if I can help with some further info

oentaryorj · 2021-08-11T12:52:01Z

More robust integer type checking has been implemented in this commit and will be available in the next CausalNex release.

oentaryorj added the "bug" label (Something isn't working) Jul 26, 2021

oentaryorj closed this as completed Aug 11, 2021

oentaryorj self-assigned this Sep 7, 2021

qbphilip mentioned this issue Nov 10, 2021: Release/0.11.0 #141 (merged)
