GH-40942: [Java] Implement C Data Interface for StringView #41967

Merged
merged 36 commits into from
Jun 21, 2024

vibhatha
Collaborator

@vibhatha vibhatha commented Jun 4, 2024

Rationale for this change

Utf8View and BinaryView support was recently added to Java. These types also need C Data Interface support so they can be exchanged with other systems.

What changes are included in this PR?

  • Adding core functionality for C Data interface for Utf8View and BinaryView
  • Adding RoundtripTest
  • Adding StreamingTest

Are these changes tested?

Yes, with new tests.

Are there any user-facing changes?

No

@vibhatha vibhatha force-pushed the stringview-c-data-interface branch 2 times, most recently from 0039b56 to 9d6670b Compare June 6, 2024 08:05
@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jun 6, 2024
@vibhatha vibhatha marked this pull request as ready for review June 6, 2024 08:19
@vibhatha vibhatha requested a review from lidavidm as a code owner June 6, 2024 08:19
@vibhatha vibhatha changed the title GH-40942: [Java] Implement C Data Interface for StringView [WIP] GH-40942: [Java] Implement C Data Interface for StringView Jun 6, 2024
Member

@lidavidm lidavidm left a comment

Can we enable the integration tests?

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jun 7, 2024
@vibhatha
Collaborator Author

vibhatha commented Jun 7, 2024

Can we enable the integration tests?

Let me take a look.

@vibhatha vibhatha force-pushed the stringview-c-data-interface branch from c48353f to abd244b Compare June 7, 2024 07:58
@vibhatha
Collaborator Author

vibhatha commented Jun 7, 2024

@lidavidm I updated the PR.

@vibhatha vibhatha requested a review from lidavidm June 7, 2024 07:59
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 7, 2024
@vibhatha
Collaborator Author

vibhatha commented Jun 7, 2024

@lidavidm It seems we also have to append an empty data buffer? [1]

[1]: https://github.com/apache/arrow/actions/runs/9414397121/job/25933218677?pr=41967#step:6:8512

return Status::Invalid("Expected at least 3 buffers for imported type ",

Is this correct according to the spec?

case Utf8View:
  return "vu";
case BinaryView:
  return "VZ";

AFAIK the BinaryView format string is lowercase "vz"

Collaborator Author

You're right.
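
For reference, the Arrow C Data Interface specification uses the format string "vu" for utf-8 view and "vz" for binary view, both lowercase. A minimal sketch of that mapping follows; the class `ViewFormatStrings` and its `ViewType` enum are hypothetical names made up for illustration and are not part of Arrow Java.

```java
/** Hypothetical helper (not Arrow Java API) mapping view types to C Data Interface format strings. */
final class ViewFormatStrings {
  /** Illustrative stand-in for the two variable-width view types discussed in this PR. */
  enum ViewType { UTF8_VIEW, BINARY_VIEW }

  /** Returns the lowercase format string defined by the C Data Interface spec. */
  static String formatFor(ViewType type) {
    switch (type) {
      case UTF8_VIEW:
        return "vu"; // utf-8 view
      case BINARY_VIEW:
        return "vz"; // binary view (lowercase, not "VZ")
      default:
        throw new IllegalArgumentException("Unsupported type: " + type);
    }
  }

  public static void main(String[] args) {
    for (ViewType t : ViewType.values()) {
      System.out.println(t + " -> " + formatFor(t));
    }
  }
}
```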

@urvishdesai urvishdesai left a comment

Added some comments regarding the missing buffer at the end that stores variadic buffer lengths.

@vibhatha vibhatha force-pushed the stringview-c-data-interface branch from 8dd7d7d to d1f70d7 Compare June 9, 2024 01:21
@@ -331,7 +360,7 @@ public List<ArrowBuf> visit(ArrowType.Duration type) {
 }

 @Override
-public List<ArrowBuf> visit(ListView type) {
+public List<ArrowBuf> visit(ArrowType.ListView type) {
Collaborator Author

Minor change for consistency; this fixes a typo introduced in a previous PR.

@vibhatha
Collaborator Author

vibhatha commented Jun 9, 2024

@lidavidm I enabled the CIs; please take a look. The logic was also updated: the initial version didn't include the variadic sizes buffer.
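
To make the layout under discussion concrete: in the C Data Interface, a view array exports a validity buffer, a views buffer, zero or more variadic data buffers, and one trailing buffer holding each variadic buffer's length as a 64-bit integer. That is why at least 3 buffers are expected even with no variadic data. The sketch below uses plain `java.nio.ByteBuffer` rather than Arrow's `ArrowBuf`; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch of the exported buffer layout for Utf8View/BinaryView arrays. */
final class ViewArrayLayout {
  /** validity + views + N variadic data buffers + 1 trailing sizes buffer. */
  static int exportedBufferCount(int variadicBufferCount) {
    return 2 + variadicBufferCount + 1;
  }

  /** Packs each variadic data buffer's length as a 64-bit integer in native byte order. */
  static ByteBuffer buildSizesBuffer(long[] variadicLengths) {
    ByteBuffer sizes = ByteBuffer.allocate(Long.BYTES * variadicLengths.length)
        .order(ByteOrder.nativeOrder());
    for (long length : variadicLengths) {
      sizes.putLong(length);
    }
    sizes.flip(); // make the written bytes readable from position 0
    return sizes;
  }

  public static void main(String[] args) {
    long[] lengths = {10L, 20L};
    System.out.println("buffers exported: " + exportedBufferCount(lengths.length));
    System.out.println("sizes buffer bytes: " + buildSizesBuffer(lengths).remaining());
  }
}
```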

@vibhatha
Collaborator Author

vibhatha commented Jun 9, 2024

Added some comments regarding the missing buffer at the end that stores variadic buffer lengths.

@urvishdesai I updated the PR. Could you please take a look?

@@ -210,9 +210,38 @@ public List<ArrowBuf> visit(ArrowType.Utf8 type) {
    }
  }

  private List<ArrowBuf> visitVariableWidthView(ArrowType type) {
    final int viewBufferIndex = 1;
    try (ArrowBuf view = importFixedBytes(type, viewBufferIndex, BaseVariableWidthViewVector.ELEMENT_SIZE)) {
Member

Why is view being treated specially here? Shouldn't it be the size buffer?

Collaborator Author

What do you mean by "treated specially"? Sorry, I didn't follow.

Member

Why is the view buffer the first one we import? The one we care about is the sizes buffer.

Collaborator Author

Ah, nothing special there; I can change the order: validity buffer, view buffer, sizes buffer, then variadic buffers.
Is there any issue with doing it this way?

Member

In the sense that...you need the sizes buffer to do anything, so import that first, then discard it at the end? Why are we treating the view buffer specially when it's not special here?

Collaborator Author

I updated it; is this okay?
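
The ordering suggested in this thread can be sketched as follows: read the trailing sizes buffer first, use it to bound each variadic data buffer, and leave the sizes buffer out of the result. This is an illustration on plain `java.nio.ByteBuffer` values, not the merged implementation; `ViewImportOrderSketch` and `importVariadicBuffers` are hypothetical names, and the fixed indices assume the validity, views, variadic data, sizes layout.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: import the sizes buffer first, then size the variadic buffers from it. */
final class ViewImportOrderSketch {
  static List<ByteBuffer> importVariadicBuffers(ByteBuffer[] exported) {
    if (exported.length < 3) {
      throw new IllegalArgumentException("Expected at least 3 buffers, got " + exported.length);
    }
    final int firstVariadicIndex = 2;            // after validity (0) and views (1)
    final int sizesIndex = exported.length - 1;  // sizes buffer is always last

    // Step 1: read the sizes buffer, since every later step depends on it.
    // duplicate() resets the byte order, so it is set again explicitly.
    ByteBuffer sizes = exported[sizesIndex].duplicate().order(ByteOrder.nativeOrder());

    // Step 2: bound each variadic data buffer by its recorded length.
    List<ByteBuffer> variadic = new ArrayList<>();
    for (int i = 0; i < sizesIndex - firstVariadicIndex; i++) {
      long length = sizes.getLong(i * Long.BYTES);
      ByteBuffer data = exported[firstVariadicIndex + i].duplicate();
      data.limit(Math.toIntExact(length));
      variadic.add(data);
    }

    // Step 3: the sizes buffer is not returned; it was only needed during import.
    return variadic;
  }

  public static void main(String[] args) {
    ByteBuffer sizes = ByteBuffer.allocate(16).order(ByteOrder.nativeOrder());
    sizes.putLong(0, 3L);
    sizes.putLong(8, 5L);
    ByteBuffer[] exported = {
      ByteBuffer.allocate(1), ByteBuffer.allocate(32),
      ByteBuffer.allocate(8), ByteBuffer.allocate(8), sizes
    };
    for (ByteBuffer b : importVariadicBuffers(exported)) {
      System.out.println("variadic buffer bytes: " + b.remaining());
    }
  }
}
```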

final int variadicBufferReadOffset = 2;
try (ArrowBuf variadicSizeBufferPrime = importBuffer(type, variadicSizeBufferIndex,
    variadicSizeBufferCapacity)) {
  variadicSizeBufferPrime.getReferenceManager().retain();
Member

And shouldn't we discard the variadic size buffer at the end, since we don't need it anymore?

Collaborator Author

Hmm, probably yes, let me check.

Collaborator Author

Mmm... I thought the try-with-resources would be enough?
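
The question here turns on how try-with-resources interacts with reference counting. With a generic reference-counted resource, `close()` at the end of the block only drops one reference, so an extra `retain()` inside the block keeps the resource alive afterwards. The sketch below models this with a hypothetical `RefCountedBuffer` class; it is not Arrow's `ArrowBuf` or `ReferenceManager`, and whether the snippet above behaves the same way is an assumption that would need checking against Arrow's allocator.

```java
/** Hypothetical reference-counted resource (a stand-in, not Arrow's ReferenceManager). */
final class RefCountedBuffer implements AutoCloseable {
  private int count = 1; // the creator holds the first reference

  void retain() {
    count++;
  }

  @Override
  public void close() {
    if (count <= 0) {
      throw new IllegalStateException("already released");
    }
    count--; // drops exactly one reference
  }

  int refCount() {
    return count;
  }

  boolean isReleased() {
    return count == 0;
  }
}

final class TryWithResourcesSketch {
  /** Without retain(): the implicit close() releases the only reference. */
  static RefCountedBuffer importWithoutRetain() {
    try (RefCountedBuffer buf = new RefCountedBuffer()) {
      return buf; // close() still runs before the caller receives the value
    }
  }

  /** With retain(): close() drops one of two references, so one remains. */
  static RefCountedBuffer importWithRetain() {
    try (RefCountedBuffer buf = new RefCountedBuffer()) {
      buf.retain();
      return buf;
    }
  }

  public static void main(String[] args) {
    System.out.println("released without retain: " + importWithoutRetain().isReleased());
    System.out.println("released with retain: " + importWithRetain().isReleased());
  }
}
```

In this model, a buffer that is only needed during import can rely on try-with-resources alone, while a retained one needs a matching release later.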

@github-actions github-actions bot added the awaiting changes Awaiting changes label Jun 19, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 19, 2024
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jun 19, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 19, 2024
@vibhatha vibhatha requested a review from lidavidm June 20, 2024 00:13
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jun 20, 2024
@vibhatha vibhatha requested a review from lidavidm June 20, 2024 08:49
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 20, 2024
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Jun 21, 2024
@lidavidm lidavidm merged commit 3711657 into apache:main Jun 21, 2024
17 checks passed
@lidavidm lidavidm removed the awaiting merge Awaiting merge label Jun 21, 2024

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 3711657.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 52 possible false positives for unstable benchmarks that are known to sometimes produce them.

4 participants