Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unable to build Pandas with xlc on z/OS #35829

Merged
merged 18 commits into from
May 26, 2021

Conversation

pitmanst
Copy link
Contributor

@pitmanst pitmanst commented Aug 20, 2020

This is an attempt at being able to build Pandas using xlc on z/OS. There's two issues being fixed here:

  1. Raise the default standard that being used with xlc++ (-qlanglvl) only when compiling c++ code, and fix other macros that otherwise cause errors when compiling with that option. I put the change directly above where it's used, but an alternate spot that it could go is directly within the declarations of each cpp file + it's respective macros. But since this would need to be added to any additional c++ code that's added, and all extra options are the same, I left it here.

  2. Fix the compilation errors that occur - it's currently missing two functions that aren't in the std:: namespace. The first is that signbit is declared as a macro and not a function, so we cannot redeclare the function without changing the macro. The next, isnan, is declared as both a macro and a function. According to the xlc docs, to use the function you can simply undefine isnan, which is done here, and then it's also placed in the std namespace. To use the macro form we could take the same approach as signbit.

If we don't want to touch the std namespace, an alternative could be to swap the declarations (for z/OS only) within pandas/_libs/window/aggregations.pyx to not use std for those two functions.

@pitmanst
Copy link
Contributor Author

The test failures from the CI build seem to the same as on master.

@alimcmaster1 alimcmaster1 added the Build Library building on various platforms label Aug 20, 2020
@jbrockmendel
Copy link
Member

How common is z/OS? Is it worthwhile/feasible to test it in the CI?

@pitmanst
Copy link
Contributor Author

We've been seeing lots of interest from users in being able to run Pandas on z/OS, and xlc (which this PR fixes) is the most common compiler found on that platform. For CI - on z/OS we're unable to use miniconda/anaconda right now, so at least in it's current form this is not feasible yet - but it's something that we're looking into getting fixed.

@PeterFandelAtRocket
Copy link

Hi Steve, I am with Rocket Software. We have a z/OS Miniconda in beta now. Should be released in a couple weeks. We also have ported Python and pandas to z/OS. If the pandas community is interested in formally adopting z/OS as a platform we would be happy to offer our sources to assist. Which python are you using on z/OS? IBM recently released their own Python port. You can reach me directly at pfandel@rocketsoftware.com. -Peter

@TomAugspurger
Copy link
Contributor

@PeterFandelAtRocket see #33971 for some discussion. "Officially" supporting a platform would need (binary) NumPy packages and freely available CI systems ( ideally through TravisCI or azure-pipelines since that's what we're using), and a champion to handle issues as they come in.

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Oct 11, 2020
@TomAugspurger
Copy link
Contributor

@dsaxton I didn't follow #36336 closely, but were we able to implement "only mark as stale if it's awaiting review"? The last action here was a commit, so ideally we wouldn't get a stale label.

@dsaxton
Copy link
Member

dsaxton commented Oct 12, 2020

@dsaxton I didn't follow #36336 closely, but were we able to implement "only mark as stale if it's awaiting review"? The last action here was a commit, so ideally we wouldn't get a stale label.

There are some labels that can be added to make it ignore the PR; right now those are "Needs Review", "Blocked", and "Needs Discussion'". I'm not sure if there's a way of setting conditions based on who made the last action.

@pitmanst
Copy link
Contributor Author

Is there anything that I could do/provide to help get a reviewer to take a look at this PR?

@jbrockmendel
Copy link
Member

@pitmanst the content here looks pretty benign; the limiting factor is our inability to include this in the CI. If we had a good strategy to make that feasible, this would move much more quickly. Absent that, @TomAugspurger is there a plan B?

@WillAyd
Copy link
Member

WillAyd commented Oct 23, 2020

Is there a way to automatically test support for the OS? Perhaps a docker container that can be set up in CI? #30641 may be a good template to work off of

@arw2019 arw2019 added Needs Discussion Requires discussion from core team before further action CI Continuous Integration and removed Stale labels Nov 4, 2020
@jreback
Copy link
Contributor

jreback commented Dec 29, 2020

@pitmanst happy to take this if you can merge master, move the note to 1.3, and confirm that this works. note we won't have official support, but can mention this as experimental.

@simonjayhawkins
Copy link
Member

@pitmanst can you merge master to resolve conflicts

@simonjayhawkins
Copy link
Member

@pitmanst can you merge master to resolve conflicts

Thanks @pitmanst for updating

@pitmanst happy to take this if you can merge master, move the note to 1.3, and confirm that this works. note we won't have official support, but can mention this as experimental.

Is this now ready for review?

@pitmanst
Copy link
Contributor Author

Thanks for taking it into consideration. The failures that we see on the checks seem to be unrelated to this change. It should be ready for review.

@pitmanst
Copy link
Contributor Author

I've updated this again, and looks like all checks are passing now. Would someone be able to review this?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comment, ping on green.

@@ -903,6 +903,9 @@ Styler

Other
^^^^^
- Bug in :class:`Index` constructor sometimes silently ignorning a a specified ``dtype`` (:issue:`38879`)
-
=======
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated entry, pls revert

@@ -913,6 +916,7 @@ Other
- Bug in :func:`pandas.testing.assert_series_equal`, :func:`pandas.testing.assert_frame_equal`, :func:`pandas.testing.assert_index_equal` and :func:`pandas.testing.assert_extension_array_equal` incorrectly raising when an attribute has an unrecognized NA type (:issue:`39461`)
- Bug in :meth:`DataFrame.equals`, :meth:`Series.equals`, :meth:`Index.equals` with object-dtype containing ``np.datetime64("NaT")`` or ``np.timedelta64("NaT")`` (:issue:`39650`)
- Bug in :func:`pandas.util.show_versions` where console JSON output was not proper JSON (:issue:`39701`)
- Fixed Pandas not being able to compile on z/OS when using xlc (:issue:`35826`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a link to xlc

@pitmanst
Copy link
Contributor Author

pitmanst commented Apr 30, 2021

Thanks for the review @jreback. I've addressed the comments and the checks are green.

@jreback jreback added this to the 1.3 milestone Apr 30, 2021
@lithomas1 lithomas1 removed the Needs Discussion Requires discussion from core team before further action label May 8, 2021
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pitmanst. Added some ideas to make things more readable and be mindful of the already huge size of our setup.py.

setup.py Outdated
comp_macros = data.get("macros", macros)
undef_macros = []

if is_platform_zos():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why we need this function, why not simply if sys.platform == "zos":?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had added it to be consistent with how the other platforms handled it. I've minimized the change as suggested.

setup.py Outdated
undef_macros = []

if is_platform_zos():
language = data.get("language", None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None is already the default. Also, since you aren't using the language more than in the if, I'd simply write if data.get('language') == 'c++': and make this more compact.

This file is huge, so would prefer to not make this unnecessarily longer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

setup.py Outdated
if is_platform_zos():
language = data.get("language", None)
if language == "c++":
compiler = os.environ.get("CXX", "/bin/xlc++")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we do assume that if CXX is not in env, the compiler is xlc++?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On z/OS, xlc/xlc++ is the default compiler that is provided, so most users will have it.

setup.py Outdated
compiler = os.environ.get("CXX", "/bin/xlc++")
compiler_name = os.path.basename(compiler)

if (compiler_name == "xlc") or (compiler_name == "xlc++"):
Copy link
Member

@datapythonista datapythonista May 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels that all these if could be just a single condition:

if sys.platform == "zos" and data.get('language') == 'c++' and os.path.basename(os.environ.get('CXX')) in ('xlc', 'xlc++'):

Does it make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup - I've made this change. Thanks

setup.py Outdated
define_macros=data.get("macros", macros),
extra_compile_args=extra_compile_args,
define_macros=comp_macros,
extra_compile_args=extra_comp_args,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is extra_compile_args being used elsewhere now? Not easy to see in the diff, but just in case you can simply append to extra_compile_args and not make a copy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra_compile_args isn't used elsewhere. I've removed the copy and just appended directly to it now.

@@ -913,6 +913,7 @@ Other
- Bug in :func:`pandas.testing.assert_series_equal`, :func:`pandas.testing.assert_frame_equal`, :func:`pandas.testing.assert_index_equal` and :func:`pandas.testing.assert_extension_array_equal` incorrectly raising when an attribute has an unrecognized NA type (:issue:`39461`)
- Bug in :meth:`DataFrame.equals`, :meth:`Series.equals`, :meth:`Index.equals` with object-dtype containing ``np.datetime64("NaT")`` or ``np.timedelta64("NaT")`` (:issue:`39650`)
- Bug in :func:`pandas.util.show_versions` where console JSON output was not proper JSON (:issue:`39701`)
- Fixed Pandas not being able to compile on z/OS when using `xlc <https://www.ibm.com/products/xl-cpp-compiler-zos>`_ (:issue:`35826`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small personal preference, but I'd rephrase as "Let pandas compile with xlc in z/OS" or similar. I don't think it was broken, and now it's fixed, since that was never supported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've swapped up the wording for this similar to as you've put it

@pep8speaks
Copy link

pep8speaks commented May 13, 2021

Hello @pitmanst! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-05-13 19:22:56 UTC

@datapythonista
Copy link
Member

Thanks @pitmanst, great changes, look much clearer and compact now IMO. Can you run black setup.py to see if it solves the with the if line being too long? Otherwise you'll have to splt it in multiple lines manually (and then also run black). Looks good to me after that.

I guess there is not CI (external to pandas) that can test this before we merge it, right?

@pitmanst
Copy link
Contributor Author

Thanks for the reviews @datapythonista! It didn't look like black pandas fixed it, so I just manually did it. There is no CI for this currently - I've been testing the changes myself.

@pitmanst
Copy link
Contributor Author

For CI - is there any documentation anywhere on what's required (env, any packages, how to connect) to add a machine to the CI (i.e. a z/OS machine)? I see that it's running on Azure, but any extra docs would be helpful.

@datapythonista
Copy link
Member

Thanks for the fix @pitmanst. I think in Python is preferred to use brackets instead os backslashes for multiline, Like:

foo = (long_var
       + next_line_var)

foo = long_var + \
    next_line_var

And also, I think it's more readable to have a condition per line.

No big deal, but if we need to iterate further, feel free to check and see if that makes the code more readable.

For the CI, probably more than documentation you can have a look at ci/setup_env.sh.

@pitmanst
Copy link
Contributor Author

Thanks for the pointers on CI - I'll take a look. As for the change - I've applied the suggested change.

@jreback jreback merged commit 036c930 into pandas-dev:master May 26, 2021
@jreback
Copy link
Contributor

jreback commented May 26, 2021

thanks @pitmanst

TLouf pushed a commit to TLouf/pandas that referenced this pull request Jun 1, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build Library building on various platforms CI Continuous Integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Unable to build Pandas on z/OS when using xlc