-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-40942: [Java] Implement C Data Interface for StringView #41967
Conversation
0039b56
to
9d6670b
Compare
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthViewVector.java
Outdated
Show resolved
Hide resolved
java/c/src/main/java/org/apache/arrow/c/BufferImportTypeVisitor.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we enable the integration tests?
java/c/src/main/java/org/apache/arrow/c/BufferImportTypeVisitor.java
Outdated
Show resolved
Hide resolved
java/c/src/main/java/org/apache/arrow/c/BufferImportTypeVisitor.java
Outdated
Show resolved
Hide resolved
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthViewVector.java
Outdated
Show resolved
Hide resolved
java/c/src/main/java/org/apache/arrow/c/BufferImportTypeVisitor.java
Outdated
Show resolved
Hide resolved
Let me take a look. |
c48353f
to
abd244b
Compare
@lidavidm I updated the PR. |
@lidavidm seems like we have to append an empty data buffer too?[1]: https://github.com/apache/arrow/actions/runs/9414397121/job/25933218677?pr=41967#step:6:8512 [1]. arrow/cpp/src/arrow/c/bridge.cc Line 1742 in a045770
Is this correct according to the spec? |
case Utf8View: | ||
return "vu"; | ||
case BinaryView: | ||
return "VZ"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK the BinaryView format string is lowercase "vz"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments regarding the missing buffer at the end that stores variadic buffer lengths.
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthViewVector.java
Outdated
Show resolved
Hide resolved
java/c/src/main/java/org/apache/arrow/c/BufferImportTypeVisitor.java
Outdated
Show resolved
Hide resolved
8dd7d7d
to
d1f70d7
Compare
@@ -331,7 +360,7 @@ public List<ArrowBuf> visit(ArrowType.Duration type) { | |||
} | |||
|
|||
@Override | |||
public List<ArrowBuf> visit(ListView type) { | |||
public List<ArrowBuf> visit(ArrowType.ListView type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor change to keep the consistency which was a typo made in a previous PR.
@lidavidm I enabled CIs please take a look. Also logic was updated. The initial version didn't have the variadic size buffer. |
@urvishdesai I updated the PR. Could you please take a look? |
@@ -210,9 +210,38 @@ public List<ArrowBuf> visit(ArrowType.Utf8 type) { | |||
} | |||
} | |||
|
|||
private List<ArrowBuf> visitVariableWidthView(ArrowType type) { | |||
final int viewBufferIndex = 1; | |||
try (ArrowBuf view = importFixedBytes(type, viewBufferIndex, BaseVariableWidthViewVector.ELEMENT_SIZE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is view being treated specially here? Shouldn't it be the size buffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Treated specially means? Sorry I didn't follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the view buffer the first one we import? The one we care about is the sizes buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah nothing special there, I can change the order. Validity buffer, view buffer, sizes buffer then variadic buffer.
Any issue doing it this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the sense that...you need the sizes buffer to do anything, so import that first, then discard it at the end? Why are we treating the view buffer specially when it's not special here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated, is it okay?
final int variadicBufferReadOffset = 2; | ||
try (ArrowBuf variadicSizeBufferPrime = importBuffer(type, variadicSizeBufferIndex, | ||
variadicSizeBufferCapacity)) { | ||
variadicSizeBufferPrime.getReferenceManager().retain(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And shouldn't we discard the variadic size buffer at the end, since we don't need it anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, probably yes, let me check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm... I thought the try-with-resource would be enough?
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthViewVector.java
Outdated
Show resolved
Hide resolved
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthViewVector.java
Outdated
Show resolved
Hide resolved
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthViewVector.java
Outdated
Show resolved
Hide resolved
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthViewVector.java
Outdated
Show resolved
Hide resolved
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthViewVector.java
Outdated
Show resolved
Hide resolved
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthViewVector.java
Outdated
Show resolved
Hide resolved
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthViewVector.java
Outdated
Show resolved
Hide resolved
java/vector/src/main/java/org/apache/arrow/vector/ipc/JsonFileReader.java
Outdated
Show resolved
Hide resolved
java/vector/src/main/java/org/apache/arrow/vector/ipc/JsonFileReader.java
Outdated
Show resolved
Hide resolved
java/vector/src/main/java/org/apache/arrow/vector/ipc/JsonFileReader.java
Outdated
Show resolved
Hide resolved
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 3711657. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 52 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
Recent inclusion of
Utf8View
andBinaryView
support to Java also requires adding C Data interface for integrating it with other systems.What changes are included in this PR?
Utf8View
andBinaryView
RoundtripTest
StreamingTest
Are these changes tested?
Yes, with new tests.
Are there any user-facing changes?
No