Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added streams to JSON reader and writer api #14313

Merged
merged 16 commits into from
Oct 30, 2023

Conversation

shrshi
Copy link
Contributor

@shrshi shrshi commented Oct 23, 2023

Description

This PR contributes to #13744.

  • Added stream parameters to public APIs
    cudf::io::read_json
    cudf::io::write_json
  • Added stream gtests
  • Added copy constructor to internal JSON struct that was breaking for non-default streams

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@copy-pr-bot
Copy link

copy-pr-bot bot commented Oct 23, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. CMake CMake build issue labels Oct 23, 2023
@shrshi shrshi added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Oct 23, 2023
@shrshi
Copy link
Contributor Author

shrshi commented Oct 23, 2023

/ok to test

@shrshi
Copy link
Contributor Author

shrshi commented Oct 23, 2023

/ok to test

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I have a couple of change requests, then this should be good to go.

cpp/include/cudf/io/json.hpp Outdated Show resolved Hide resolved
cpp/tests/streams/io/functions_test.cpp Outdated Show resolved Hide resolved
@shrshi shrshi requested a review from bdice October 23, 2023 22:03
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@bdice
Copy link
Contributor

bdice commented Oct 23, 2023

/ok to test

@bdice
Copy link
Contributor

bdice commented Oct 24, 2023

@shrshi Looks like there's a test failure (link to logs).

 C++ exception with description "cudf_identify_stream_usage found unexpected stream!" thrown in the test body.

This indicates that there might be an API that is using the default stream rather than a passed-in user stream.

@vuule
Copy link
Contributor

vuule commented Oct 24, 2023

Can you please also remove unused #include <cudf/utilities/default_stream.hpp> from JSON code? Looks like a missed a few in #13903
IMO it fits in the PR as a part of "don't use the default stream in JSON".

@shrshi
Copy link
Contributor Author

shrshi commented Oct 24, 2023

/ok to test

@karthikeyann karthikeyann self-requested a review October 25, 2023 03:25
@shrshi
Copy link
Contributor Author

shrshi commented Oct 25, 2023

/ok to test

@shrshi
Copy link
Contributor Author

shrshi commented Oct 25, 2023

Thank you for the feedback, @vuule and @bdice. I'm marking this PR as ready for review now.

@shrshi shrshi marked this pull request as ready for review October 25, 2023 16:07
@shrshi shrshi requested a review from a team as a code owner October 25, 2023 16:07
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job! Was the copy constructor of column_to_strings_fn the missing piece for the earlier errors?

@shrshi
Copy link
Contributor Author

shrshi commented Oct 25, 2023

Yes, the error was that the implicitly generated copy constructor of column_to_strings_fn was calling the constructor of the string_scalar member with the default stream.

@shrshi
Copy link
Contributor Author

shrshi commented Oct 25, 2023

/ok to test

@shrshi
Copy link
Contributor Author

shrshi commented Oct 25, 2023

/ok to test

Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests can be cleaned up to exclude the default-able parameters.
Looks good otherwise!

Comment on lines +570 to +572
cudf::io::write_json(options_builder.build(),
cudf::test::get_default_stream(),
rmm::mr::get_current_device_resource());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can these just be

Suggested change
cudf::io::write_json(options_builder.build(),
cudf::test::get_default_stream(),
rmm::mr::get_current_device_resource());
cudf::io::write_json(options_builder.build());

We have the same values for the defaults. Not sure why mr was specified in the first place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now I see the difference, cudf::get_default_stream vs cudf::test::get_default_stream
does this matter for these tests? I don't see other PRs adding the stream parameter to all tests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the stream parameter can be left as is for now. For the moment, cudf::test::get_default_stream is only necessary for the stream tests. That may change in the future, but probably isn't worth planning for right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! I've cleaned up the rmm::mr default parameter in both the calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand what "as is" is in @vyasr 's comment, but it looks like we might be adding the stream parameter to all API calls in the future so we might as do it now.

FWIW, the mr parameter can be removed from write_json calls in this file as well. I'm not blocking the PR on that, though.

@shrshi shrshi requested a review from vuule October 26, 2023 20:06
@shrshi
Copy link
Contributor Author

shrshi commented Oct 26, 2023

/ok to test

Comment on lines 507 to 519
column_to_strings_fn(column_to_strings_fn const& other)
: options_{other.options_},
stream_{other.stream_},
mr_{other.mr_},
narep(other.narep, other.stream_),
struct_value_separator(other.struct_value_separator, other.stream_),
struct_row_begin_wrap(other.struct_row_begin_wrap, other.stream_),
struct_row_end_wrap(other.struct_row_end_wrap, other.stream_),
list_value_separator(other.list_value_separator, other.stream_),
list_row_begin_wrap(other.list_row_begin_wrap, other.stream_),
list_row_end_wrap(other.list_row_end_wrap, other.stream_)
{
}
Copy link
Contributor

@karthikeyann karthikeyann Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

column_to_strings_fn is not meant to be copied.
I just added the patch to fix the copy constructor from being invoked. (please apply pre-commit hooks for clang-format, after applying this patch)

Copy link
Contributor

@karthikeyann karthikeyann Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff --git a/cpp/src/io/json/write_json.cu b/cpp/src/io/json/write_json.cu
index c98e93aec4..98b35fc943 100644
--- a/cpp/src/io/json/write_json.cu
+++ b/cpp/src/io/json/write_json.cu
@@ -633,17 +633,17 @@ struct column_to_strings_fn {
 
     auto child_string_with_null = [&]() {
       if (child_view.type().id() == type_id::STRUCT) {
-        return (*this).template operator()<cudf::struct_view>(
+        return this->template operator()<cudf::struct_view>(
           child_view,
           children_names.size() > child_index ? children_names[child_index].children
                                               : std::vector<column_name_info>{});
       } else if (child_view.type().id() == type_id::LIST) {
-        return (*this).template operator()<cudf::list_view>(child_view,
+        return this->template operator()<cudf::list_view>(child_view,
                                                             children_names.size() > child_index
                                                               ? children_names[child_index].children
                                                               : std::vector<column_name_info>{});
       } else {
-        return cudf::type_dispatcher(child_view.type(), *this, child_view);
+        return cudf::type_dispatcher<cudf::id_to_type_impl, column_to_strings_fn const&>(child_view.type(), *this, child_view);
       }
     };
     auto new_offsets = cudf::lists::detail::get_normalized_offsets(
@@ -706,17 +706,17 @@ struct column_to_strings_fn {
                      auto const& current_col = thrust::get<1>(i_current_col);
                      // Struct needs children's column names
                      if (current_col.type().id() == type_id::STRUCT) {
-                       return (*this).template operator()<cudf::struct_view>(
+                       return this->template operator()<cudf::struct_view>(
                          current_col,
                          children_names.size() > i ? children_names[i].children
                                                    : std::vector<column_name_info>{});
                      } else if (current_col.type().id() == type_id::LIST) {
-                       return (*this).template operator()<cudf::list_view>(
+                       return this->template operator()<cudf::list_view>(
                          current_col,
                          children_names.size() > i ? children_names[i].children
                                                    : std::vector<column_name_info>{});
                      } else {
-                       return cudf::type_dispatcher(current_col.type(), *this, current_col);
+                       return cudf::type_dispatcher<cudf::id_to_type_impl, column_to_strings_fn const&>(current_col.type(), *this, current_col);
                      }
                    });
 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
column_to_strings_fn(column_to_strings_fn const& other)
: options_{other.options_},
stream_{other.stream_},
mr_{other.mr_},
narep(other.narep, other.stream_),
struct_value_separator(other.struct_value_separator, other.stream_),
struct_row_begin_wrap(other.struct_row_begin_wrap, other.stream_),
struct_row_end_wrap(other.struct_row_end_wrap, other.stream_),
list_value_separator(other.list_value_separator, other.stream_),
list_row_begin_wrap(other.list_row_begin_wrap, other.stream_),
list_row_end_wrap(other.list_row_end_wrap, other.stream_)
{
}
column_to_strings_fn(column_to_strings_fn const& other) = delete;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly other constructors and assignment operators can be deleted too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @karthikeyann. The patch has been applied.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL you can get the type dispatcher to take the functor by reference.
Thanks @karthikeyann!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice job on this, @karthikeyann. 👏

@shrshi
Copy link
Contributor Author

shrshi commented Oct 27, 2023

/ok to test

Copy link
Contributor

@karthikeyann karthikeyann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@shrshi
Copy link
Contributor Author

shrshi commented Oct 30, 2023

/merge

@rapids-bot rapids-bot bot merged commit abc0d41 into rapidsai:branch-23.12 Oct 30, 2023
56 of 57 checks passed
@karthikeyann
Copy link
Contributor

does codecov/patch check not stop the merge ?!

@bdice
Copy link
Contributor

bdice commented Oct 31, 2023

@karthikeyann Codecov is purely informational. It tends to have a lot of noise and false positives / false negatives. We are hoping that switching to a new branching strategy (like always merging to dev) will someday fix these issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CMake build issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants