Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN/MAINT: Clean and annotate stata reader and writers #31072

Merged
merged 2 commits into from
Jan 26, 2020

Conversation

bashtage
Copy link
Contributor

  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bashtage. Nice! generally lgtm.

We have a Label alias in pandas._typing which I think you could use in several places.

also Union[float, int] is equivalent to just float for typing purposes. My preference is to do this, but others may disagree.

pandas/io/stata.py Outdated Show resolved Hide resolved
pandas/io/stata.py Outdated Show resolved Hide resolved
pandas/io/stata.py Show resolved Hide resolved
pandas/io/stata.py Show resolved Hide resolved
pandas/io/stata.py Outdated Show resolved Hide resolved
pandas/io/stata.py Outdated Show resolved Hide resolved
pandas/io/stata.py Outdated Show resolved Hide resolved
pandas/io/stata.py Outdated Show resolved Hide resolved
pandas/io/stata.py Outdated Show resolved Hide resolved
pandas/io/stata.py Outdated Show resolved Hide resolved
@simonjayhawkins simonjayhawkins added IO Stata read_stata, to_stata Typing type annotations, mypy/pyright type checking labels Jan 16, 2020
@simonjayhawkins simonjayhawkins added this to the 1.1 milestone Jan 16, 2020
@pep8speaks
Copy link

pep8speaks commented Jan 16, 2020

Hello @bashtage! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-01-24 10:35:54 UTC

pandas/io/stata.py Outdated Show resolved Hide resolved
@bashtage
Copy link
Contributor Author

Thanks @simonjayhawkins . I've hit a roadblock with the errors below. Is there anything that can be done about them?

pandas/io/stata.py:244: error: Unsupported operand types for * ("int" and "Series")
pandas/io/stata.py:244: error: Unsupported operand types for + ("int" and "Series")
pandas/io/stata.py:289: error: Incompatible return value type (got "datetime", expected "Series")
pandas/io/stata.py:289: error: Unsupported operand types for + ("datetime" and "Series")
pandas/io/stata.py:320: error: Unsupported operand types for // ("Series" and "int")
pandas/io/stata.py:321: error: Incompatible types in assignment (expression has type "int", variable has type "Series")
pandas/io/stata.py:321: error: Unsupported operand types for % ("Series" and "int")
pandas/io/stata.py:322: error: Argument 1 to "convert_year_days_safe" has incompatible type "int"; expected "Series"
pandas/io/stata.py:324: error: Unsupported operand types for // ("Series" and "int")
pandas/io/stata.py:325: error: Unsupported operand types for % ("Series" and "int")
pandas/io/stata.py:326: error: Argument 1 to "convert_year_month_safe" has incompatible type "int"; expected "Series"
pandas/io/stata.py:326: error: Argument 2 to "convert_year_month_safe" has incompatible type "int"; expected "Series"
pandas/io/stata.py:328: error: Unsupported operand types for // ("Series" and "int")
pandas/io/stata.py:329: error: Incompatible types in assignment (expression has type "int", variable has type "Series")
pandas/io/stata.py:329: error: Unsupported operand types for % ("Series" and "int")
pandas/io/stata.py:330: error: Argument 1 to "convert_year_month_safe" has incompatible type "int"; expected "Series"
pandas/io/stata.py:332: error: Unsupported operand types for // ("Series" and "int")
pandas/io/stata.py:333: error: Unsupported operand types for % ("Series" and "int")
pandas/io/stata.py:334: error: Argument 1 to "convert_year_month_safe" has incompatible type "int"; expected "Series"
pandas/io/stata.py:334: error: Argument 2 to "convert_year_month_safe" has incompatible type "int"; expected "Series"
pandas/io/stata.py:336: error: Incompatible types in assignment (expression has type "Series", variable has type "int")
pandas/io/stata.py:338: error: Argument 1 to "convert_year_month_safe" has incompatible type "int"; expected "Series"
pandas/io/stata.py:370: error: Unsupported left operand type for - ("Series")
pandas/io/stata.py:371: error: "bool" has no attribute "values"
pandas/io/stata.py:374: error: "DatetimeIndex" has no attribute "year"
pandas/io/stata.py:374: error: "DatetimeIndex" has no attribute "month"

pandas/io/stata.py Outdated Show resolved Hide resolved
@bashtage bashtage force-pushed the stata-maint branch 2 times, most recently from 3780cbe to 2b288df Compare January 16, 2020 16:09
@simonjayhawkins
Copy link
Member

Thanks @simonjayhawkins . I've hit a roadblock with the errors below. Is there anything that can be done about them?

I'll pull this locally and have a look in a bit.

if version == 114 and convert_strl is not None:
raise ValueError("strl is not supported in format 114")

# TODO: There must be a better way?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a # type: ignore would be reasonable here. just add the mypy error (error: Name 'statawriter' already defined (possibly by an import)) as a comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an alternate since they all inherit from statawriter would be to declare something like writer: StataWriter before the if...else blocks and assign to that within blocks

No strong preference either way. The current method is wordier but on the plus side highlights the differences in arguments between each, which isn't bad either

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've ignored for now and reverted this. I can always go back to the wordier version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think typing writer as StataWriter doesn't work since the different flavors accept different arguments, which triggers a warning. I should fix that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...but fixing it would change the API and so I better not.

pandas/io/stata.py Outdated Show resolved Hide resolved
pandas/io/stata.py Show resolved Hide resolved
pandas/io/stata.py Show resolved Hide resolved
pandas/io/stata.py Show resolved Hide resolved
@bashtage
Copy link
Contributor Author

Any idea how the final block can be typed? Or should I just remove the typing information from this function?

@jreback
Copy link
Contributor

jreback commented Jan 20, 2020

Any idea how the final block can be typed? Or should I just remove the typing information from this function?

cc @simonjayhawkins @WillAyd

@WillAyd
Copy link
Member

WillAyd commented Jan 20, 2020

I think the mypy issues are a result of the arithmetic ops for pandas objects being defined at runtime and not statically. I opened #31160 to track

As far as this PR is concerned I would remove annotations from those functions or type: ignore failing lines, depending on which is easier for you

@bashtage
Copy link
Contributor Author

All green now. I requested a rereview to see if this can get across the line.

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bashtage lgtm

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm @jreback

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. needs a rebase and can merge on green.

Ensure all pytest raises are not bare
@bashtage bashtage force-pushed the stata-maint branch 2 times, most recently from cbe52f5 to d6f2278 Compare January 24, 2020 09:50
Add type annotations in Stata
General clean up
Fix one small bug using in rather than ==
Use method to write data
@bashtage
Copy link
Contributor Author

@jreback Green. I squashed to two distinct commits, one for the typing and one for the final bare pytest raises.

@jreback jreback merged commit b54aaf7 into pandas-dev:master Jan 26, 2020
@jreback
Copy link
Contributor

jreback commented Jan 26, 2020

thanks @bashtage very nice!

@jreback
Copy link
Contributor

jreback commented Jan 26, 2020

@jreback Green. I squashed to two distinct commits, one for the typing and one for the final bare pytest raises.

great, though when we merge it gets squashed to one :-> (but history is preserved on this PR)

keechongtan added a commit to keechongtan/pandas that referenced this pull request Jan 27, 2020
…ndexing-1row-df

* upstream/master: (194 commits)
  DOC Remove Python 2 specific comments from documentation (pandas-dev#31198)
  Follow up PR: pandas-dev#28097 Simplify branch statement (pandas-dev#29243)
  BUG: DatetimeIndex.snap incorrectly setting freq (pandas-dev#31188)
  Move DataFrame.info() to live with similar functions (pandas-dev#31317)
  ENH: accept a dictionary in plot colors (pandas-dev#31071)
  PERF: add shortcut to Timestamp constructor (pandas-dev#30676)
  CLN/MAINT: Clean and annotate stata reader and writers (pandas-dev#31072)
  REF: define _get_slice_axis in correct classes (pandas-dev#31304)
  BUG: DataFrame.floordiv(ser, axis=0) not matching column-wise bheavior (pandas-dev#31271)
  PERF: optimize is_scalar, is_iterator (pandas-dev#31294)
  BUG: Series rolling count ignores min_periods (pandas-dev#30923)
  xfail sparse warning; closes pandas-dev#31310 (pandas-dev#31311)
  REF: DatetimeIndex.get_value wrap DTI.get_loc (pandas-dev#31314)
  CLN: internals.managers (pandas-dev#31316)
  PERF: avoid copies if possible in fill_binop (pandas-dev#31300)
  Add test for multiindex json (pandas-dev#31307)
  BUG: passing TDA and wrong freq to TimedeltaIndex (pandas-dev#31268)
  BUG: inconsistency between PeriodIndex.get_value vs get_loc (pandas-dev#31172)
  CLN: remove _set_subtyp (pandas-dev#31301)
  CI: Updated version of macos image (pandas-dev#31292)
  ...
@bashtage bashtage deleted the stata-maint branch July 28, 2020 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Stata read_stata, to_stata Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants