GH-41656: [MATLAB] Add C Data Interface format import/export functionality for arrow.array.Array #41737

Merged on May 20, 2024 (17 commits)

@kevingurney (Member) commented on May 20, 2024

Rationale for this change

Now that #41653 and #41654 have been addressed, we should add MATLAB APIs for importing/exporting arrow.array.Array objects using the C Data Interface format.

This pull request adds two such APIs to arrow.array.Array: an export method and a static import method.

Example

>> expected = arrow.array([1, 2, 3]) 

expected = 

  Float64Array with 3 elements and 0 null values:

    1 | 2 | 3

>> cArray = arrow.c.Array()

cArray = 

  Array with properties:

    Address: 140341875084944

>> cSchema = arrow.c.Schema()

cSchema = 

  Schema with properties:

    Address: 140341880022320

% Export the Array to C Data Interface Format
>> expected.export(cArray.Address, cSchema.Address)

% Import the Array from C Data Interface Format
>> actual = arrow.array.Array.import(cArray, cSchema)

actual = 

  Float64Array with 3 elements and 0 null values:

    1 | 2 | 3

% The Array is the same after round-tripping to C Data Interface format
>> isequal(actual, expected)

ans =

  logical

   1

What changes are included in this PR?

  1. Added new arrow.array.Array.export(cArrowArrayAddress, cArrowSchemaAddress) method for exporting Array objects to C Data Interface format.
  2. Added new static arrow.array.Array.import(cArray, cSchema) method for importing Arrays from C Data Interface format.
  3. Added new internal arrow.c.internal.ArrayImporter class for importing Array objects from C Data Interface format.
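
The export and import methods listed above can be combined into a single round-trip step. The following is a minimal sketch, assuming a MATLAB session with the Arrow MATLAB interface on the path; `roundTripArray` is a hypothetical helper name used only for illustration and is not part of this PR. It uses only the calls shown in the Example section.

```matlab
% Round-trip an Array through the C Data Interface format.
expected = arrow.array([1, 2, 3]);
actual = roundTripArray(expected);

% isequal returns logical 1 (true) when the round trip preserves the Array.
disp(isequal(actual, expected))

function actual = roundTripArray(expected)
% roundTripArray  Hypothetical helper (not part of this PR).
% Exports an arrow.array.Array to freshly allocated C Data Interface
% structs and imports it back as a new arrow.array.Array.

    % Allocate the C structs that receive the exported data.
    cArray = arrow.c.Array();
    cSchema = arrow.c.Schema();

    % export takes the raw addresses of the C structs.
    expected.export(cArray.Address, cSchema.Address);

    % import takes the arrow.c.Array and arrow.c.Schema objects themselves.
    actual = arrow.array.Array.import(cArray, cSchema);
end
```

Note the asymmetry in the signatures: export receives the numeric `Address` values, while import receives the `arrow.c.Array` and `arrow.c.Schema` objects.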

Are these changes tested?

Yes.

  1. Added new test file matlab/test/arrow/c/tRoundTrip.m with basic round-trip tests for importing/exporting Array objects using the C Data Interface format.
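
For readers unfamiliar with this style of test, a round-trip test might look roughly like the sketch below. This is an illustration only, not the contents of matlab/test/arrow/c/tRoundTrip.m; the class name `tRoundTripSketch` and the test method name are hypothetical, and the sketch assumes the Arrow MATLAB interface is on the path.

```matlab
classdef tRoundTripSketch < matlab.unittest.TestCase
    % Hypothetical sketch of a round-trip test. It is not the test file
    % added by this PR.

    methods (Test)
        function Float64ArrayRoundTrip(testCase)
            % Build the Array that will be exported.
            expected = arrow.array([1, 2, 3]);

            % Allocate the C Data Interface structs.
            cArray = arrow.c.Array();
            cSchema = arrow.c.Schema();

            % Export to, then import from, the C Data Interface format.
            expected.export(cArray.Address, cSchema.Address);
            actual = arrow.array.Array.import(cArray, cSchema);

            % The imported Array should equal the original.
            testCase.verifyTrue(isequal(actual, expected));
        end
    end
end
```

Saved as tRoundTripSketch.m, the sketch could be run with `runtests("tRoundTripSketch")`.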

Are there any user-facing changes?

Yes.

  1. The arrow.array.Array class has two new user-facing APIs: arrow.array.Array.export(cArrowArrayAddress, cArrowSchemaAddress) and arrow.array.Array.import(cArray, cSchema). They export and import Array objects using the C Data Interface format.

Future Directions

  1. Add integration tests for sharing data between MATLAB/mlarrow and Python/pyarrow running in the same process using the MATLAB interface to Python.
  2. Add support for exporting/importing arrow.tabular.RecordBatch objects using the C Data Interface format.
  3. Add support for the Arrow C stream interface format.

Notes

  1. Thanks @sgilmore10 for your help with this pull request!

@sgilmore10 (Member) left a comment:

Just a few comments. But looks good to me!

@kou (Member) left a comment:

+1

matlab/src/cpp/arrow/matlab/array/proxy/array.cc (outdated, resolved)
matlab/src/cpp/arrow/matlab/array/proxy/array.cc (outdated, resolved)
matlab/src/cpp/arrow/matlab/c/proxy/array_importer.cc (outdated, resolved)
matlab/src/cpp/arrow/matlab/c/proxy/array_importer.cc (outdated, resolved)
matlab/src/cpp/arrow/matlab/c/proxy/array_importer.cc (outdated, resolved)
@github-actions bot added the "awaiting merge" label and removed the "awaiting committer review" label on May 20, 2024
@github-actions bot added the "awaiting changes" label and removed the "awaiting merge" label on May 20, 2024
kevingurney and others added 5 commits May 20, 2024 11:32
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
@github-actions bot added the "awaiting review" and "awaiting change review" labels and removed the "awaiting changes" label on May 20, 2024
@github-actions bot added the "awaiting review" label and removed the "awaiting review" and "awaiting change review" labels on May 20, 2024
@kevingurney (Member, Author) commented:

+1

@kevingurney merged commit 5809daf into apache:main on May 20, 2024 (9 checks passed)
@kevingurney deleted the GH-41656 branch on May 20, 2024 17:53
@kevingurney removed the "awaiting review" label on May 20, 2024

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 5809daf.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.

vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
…nctionality for `arrow.array.Array` (apache#41737)

* GitHub Issue: apache#41656

Lead-authored-by: Kevin Gurney <kgurney@mathworks.com>
Co-authored-by: Kevin Gurney <kevin.p.gurney@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>