2+ dimensional tensors of timestamps crash pd.Series.repr() #151

Closed
frreiss opened this issue Nov 23, 2020 · 2 comments · Fixed by #172

frreiss commented Nov 23, 2020

Code to reproduce:

import text_extensions_for_pandas as tp
import pandas as pd
import numpy as np

times = pd.date_range('2018-01-01', periods=5, freq='H').to_numpy()
times_repeated = np.tile(times, (3, 1))
times_array = tp.TensorArray(times_repeated)
times_series = pd.Series(times_array)
print(repr(times_series))

Expected result: Display a 3x5 matrix of timestamps

Actual result: Crash from inside a Pandas routine that should only be called for 1-D arrays of timestamps. Stack trace follows.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-b6b4c1b73ef2> in <module>
      7 times_array = tp.TensorArray(times_repeated)
      8 times_series = pd.Series(times_array)
----> 9 repr(times_series)

~/pd/cn-update/env/lib/python3.7/site-packages/pandas/core/series.py in __repr__(self)
   1331             min_rows=min_rows,
   1332             max_rows=max_rows,
-> 1333             length=show_dimensions,
   1334         )
   1335         result = buf.getvalue()

~/pd/cn-update/env/lib/python3.7/site-packages/pandas/core/series.py in to_string(self, buf, na_rep, float_format, header, index, length, dtype, name, max_rows, min_rows)
   1396             max_rows=max_rows,
   1397         )
-> 1398         result = formatter.to_string()
   1399 
   1400         # catch contract violations

~/pd/cn-update/env/lib/python3.7/site-packages/pandas/io/formats/format.py in to_string(self)
    356 
    357         fmt_index, have_header = self._get_formatted_index()
--> 358         fmt_values = self._get_formatted_values()
    359 
    360         if self.truncate_v:

~/pd/cn-update/env/lib/python3.7/site-packages/pandas/io/formats/format.py in _get_formatted_values(self)
    345             None,
    346             float_format=self.float_format,
--> 347             na_rep=self.na_rep,
    348         )
    349 

~/pd/cn-update/env/lib/python3.7/site-packages/pandas/io/formats/format.py in format_array(values, formatter, float_format, na_rep, digits, space, justify, decimal, leading_space, quoting)
   1177     )
   1178 
-> 1179     return fmt_obj.get_result()
   1180 
   1181 

~/pd/cn-update/env/lib/python3.7/site-packages/pandas/io/formats/format.py in get_result(self)
   1208 
   1209     def get_result(self) -> List[str]:
-> 1210         fmt_values = self._format_strings()
   1211         return _make_fixed_width(fmt_values, self.justify)
   1212 

~/pd/cn-update/env/lib/python3.7/site-packages/pandas/io/formats/format.py in _format_strings(self)
   1498             space=self.space,
   1499             justify=self.justify,
-> 1500             leading_space=self.leading_space,
   1501         )
   1502         return fmt_values

~/pd/cn-update/env/lib/python3.7/site-packages/pandas/io/formats/format.py in format_array(values, formatter, float_format, na_rep, digits, space, justify, decimal, leading_space, quoting)
   1177     )
   1178 
-> 1179     return fmt_obj.get_result()
   1180 
   1181 

~/pd/cn-update/env/lib/python3.7/site-packages/pandas/io/formats/format.py in get_result(self)
   1208 
   1209     def get_result(self) -> List[str]:
-> 1210         fmt_values = self._format_strings()
   1211         return _make_fixed_width(fmt_values, self.justify)
   1212 

~/pd/cn-update/env/lib/python3.7/site-packages/pandas/io/formats/format.py in _format_strings(self)
   1468 
   1469         if self.formatter is not None and callable(self.formatter):
-> 1470             return [self.formatter(x) for x in values]
   1471 
   1472         fmt_values = format_array_from_datetime(

~/pd/cn-update/env/lib/python3.7/site-packages/pandas/io/formats/format.py in <listcomp>(.0)
   1468 
   1469         if self.formatter is not None and callable(self.formatter):
-> 1470             return [self.formatter(x) for x in values]
   1471 
   1472         fmt_values = format_array_from_datetime(

~/pd/cn-update/env/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py in __iter__(self)
    565             end_i = min((i + 1) * chunksize, length)
    566             converted = ints_to_pydatetime(
--> 567                 data[start_i:end_i], tz=self.tz, freq=self.freq, box="timestamp"
    568             )
    569             for v in converted:

pandas/_libs/tslibs/vectorized.pyx in pandas._libs.tslibs.vectorized.ints_to_pydatetime()

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

frreiss added the bug label on Nov 23, 2020

@BryanCutler

The issue here is that the Series uses the Pandas Datetime64Formatter, which converts the values to a DatetimeIndex, and a DatetimeIndex expects 1-dimensional data. I will look some more for a possible workaround or upstream fix.
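
For illustration only, here is a minimal sketch of one way to avoid the 1-D-only path: format the timestamps with NumPy, which accepts arrays of any shape, and emit one display string per outermost element. The helper name `format_datetime_tensor_rows` and its `unit` parameter are hypothetical; this is neither the code in text_extensions_for_pandas nor the upstream Pandas fix.

```python
import numpy as np


def format_datetime_tensor_rows(values, unit="s"):
    """Hypothetical helper: one display string per outermost element
    of a datetime64 array with 2 or more dimensions."""
    values = np.asarray(values, dtype="datetime64[ns]")
    if values.ndim < 2:
        raise ValueError("expected an array with 2 or more dimensions")
    # np.datetime_as_string works elementwise on arrays of any shape,
    # so it never goes through a 1-D-only conversion.
    strings = np.datetime_as_string(values, unit=unit)
    # Flatten each outermost element and join its timestamps into one string.
    return ["[" + ", ".join(row.ravel()) + "]" for row in strings]


# Same shape as the repro above: 5 hourly timestamps tiled into 3 rows.
times = np.arange("2018-01-01T00", "2018-01-01T05", dtype="datetime64[h]")
times_repeated = np.tile(times, (3, 1))
rows = format_datetime_tensor_rows(times_repeated)
for row in rows:
    print(row)
```

Each row of the 3x5 input becomes a single string, so a Series repr built from these strings would show three entries, one per row.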

@BryanCutler

Submitted a fix upstream to Pandas at pandas-dev/pandas#38391
