Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add in support for NULL_LOGICAL_AND and NULL_LOGICAL_OR binops #10016

Merged
merged 10 commits into from
Jan 20, 2022

Conversation

revans2
Copy link
Contributor

@revans2 revans2 commented Jan 11, 2022

These already exist as a part of the AST. Spark's AND/OR implementations follow these requirements and to be able to re-implement it using existing CUDF functionality ended up being very expensive. We found that this one change could cut almost 13% off the total run time on TPC-DS query 28. AND/OR are common enough in all queries we expect this to have a major performance impact generally.

We tried to use the AST version instead, but depending on the hardware used the overhead of AST does not pay for itself when the input/intermediate outputs are boolean columns. It appears to be because the amount of memory transfers saved is relatively small in most boolean cases and on large GPUs like the a100 the intermediate results might even fit entirely in the L2 cache.

@revans2 revans2 added feature request New feature or request 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. Java Affects Java cuDF API. Spark Functionality that helps Spark RAPIDS non-breaking Non-breaking change labels Jan 11, 2022
@revans2 revans2 self-assigned this Jan 11, 2022
@revans2 revans2 requested review from a team as code owners January 11, 2022 15:29
@github-actions github-actions bot added the CMake CMake build issue label Jan 11, 2022
Copy link
Contributor

@jbrennan333 jbrennan333 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one minor question

@jbrennan333
Copy link
Contributor

I am not seeing src/binaryop/compiled/NullLogicalOr.cu nor src/binaryop/compiled/NullLogicalAnd.cu in the PR?

Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add convenience functions in BinaryOperable to make this a bit cleaner to call from the Java API?

@vyasr
Copy link
Contributor

vyasr commented Jan 11, 2022

Will this PR finish the remainder of #8980?

@revans2 revans2 linked an issue Jan 11, 2022 that may be closed by this pull request
@revans2
Copy link
Contributor Author

revans2 commented Jan 11, 2022

Will this PR finish the remainder of #8980?

yes I just linked the two

@codecov
Copy link

codecov bot commented Jan 11, 2022

Codecov Report

Merging #10016 (b3ff013) into branch-22.02 (967a333) will decrease coverage by 0.08%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff                @@
##           branch-22.02   #10016      +/-   ##
================================================
- Coverage         10.49%   10.40%   -0.09%     
================================================
  Files               119      119              
  Lines             20305    20557     +252     
================================================
+ Hits               2130     2139       +9     
- Misses            18175    18418     +243     
Impacted Files Coverage Δ
python/custreamz/custreamz/kafka.py 29.16% <0.00%> (-0.63%) ⬇️
python/dask_cudf/dask_cudf/sorting.py 92.66% <0.00%> (-0.25%) ⬇️
python/dask_cudf/dask_cudf/core.py 70.85% <0.00%> (-0.17%) ⬇️
python/cudf/cudf/__init__.py 0.00% <0.00%> (ø)
python/cudf/cudf/api/types.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/index.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/parquet.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/dtypes.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/scalar.py 0.00% <0.00%> (ø)
... and 32 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8fd7dd2...b3ff013. Read the comment docs.

cpp/include/cudf/binaryop.hpp Outdated Show resolved Hide resolved
cpp/tests/binaryop/binop-compiled-test.cpp Outdated Show resolved Hide resolved
cpp/src/binaryop/compiled/operation.cuh Outdated Show resolved Hide resolved
@revans2
Copy link
Contributor Author

revans2 commented Jan 13, 2022

I posted a small performance improvement that replaced the conditionals with boolean operations. This showed an 11% performance boost on operations that do not have nulls in them. I have no idea why it didn't help the operations that do have nulls in them. It might be nice to look at the AST at some point to see if we can make an improvement there too, but that is separate.

@revans2
Copy link
Contributor Author

revans2 commented Jan 14, 2022

rerun tests

@revans2
Copy link
Contributor Author

revans2 commented Jan 18, 2022

rerun tests

Copy link
Contributor

@jbrennan333 jbrennan333 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 this looks good to me.

@revans2
Copy link
Contributor Author

revans2 commented Jan 19, 2022

rerun tests

@revans2
Copy link
Contributor Author

revans2 commented Jan 19, 2022

Something is going on with the build where the python tests are failing and I have no idea what it could be. Some of them have errors like

11:53:37 ________________ ERROR collecting cudf/tests/test_api_types.py _________________
11:53:37 ImportError while importing test module '/workspace/python/cudf/cudf/tests/test_api_types.py'.
11:53:37 Hint: make sure your test modules/packages have valid Python names.
11:53:37 Traceback:
11:53:37 /opt/conda/envs/rapids/lib/python3.8/importlib/__init__.py:127: in import_module
11:53:37     return _bootstrap._gcd_import(name[level:], package, level)
11:53:37 cudf/tests/test_api_types.py:8: in <module>
11:53:37     import cudf
11:53:37 /opt/conda/envs/rapids/lib/python3.8/site-packages/cudf/__init__.py:5: in <module>
11:53:37     validate_setup()
11:53:37 /opt/conda/envs/rapids/lib/python3.8/site-packages/cudf/utils/gpu_utils.py:18: in validate_setup
11:53:37     from rmm._cuda.gpu import (
11:53:37 E   ImportError: cannot import name 'cudaDeviceAttr' from 'rmm._cuda.gpu' (/opt/conda/envs/rapids/lib/python3.8/site-packages/rmm/_cuda/gpu.py)
11:53:37 ________________ ERROR collecting cudf/tests/test_apply_rows.py ________________

They all appear to indicate that I am having problems importing 'cudaDeviceAttr', but that is something cuda specific and I didn't touch anything python, or RMM, etc. Any help debugging this would really be appreciated.

@bdice
Copy link
Contributor

bdice commented Jan 19, 2022

@revans2 CI will fail until #10008 is merged.

@vyasr
Copy link
Contributor

vyasr commented Jan 20, 2022

rerun tests

@revans2
Copy link
Contributor Author

revans2 commented Jan 20, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit d5f1aed into rapidsai:branch-22.02 Jan 20, 2022
@revans2 revans2 deleted the null_aware_and_or branch January 20, 2022 12:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CMake CMake build issue feature request New feature or request Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Null-tolerant logical AND and OR operations
8 participants