Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[c++] Geometry Dataframe #3212

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

XanthosXanthopoulos
Copy link
Collaborator

This PR adds the new SOMAGeometryDataFrame object. This new object requires spatial axes to be determined and uses them internally to index geometry objects stored as a WKB encoded attribute. To differentiate between binary attributes and WKB binary attributes the Arrow schema requires a dtype metadata key with the value WKB.

Copy link

codecov bot commented Oct 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.97%. Comparing base (99ba3a3) to head (ead7104).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3212      +/-   ##
==========================================
+ Coverage   83.86%   83.97%   +0.10%     
==========================================
  Files          51       51              
  Lines        5505     5505              
==========================================
+ Hits         4617     4623       +6     
+ Misses        888      882       -6     
Flag Coverage Δ
python 83.97% <ø> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
python_api 83.97% <ø> (+0.10%) ⬆️
libtiledbsoma ∅ <ø> (∅)

@XanthosXanthopoulos XanthosXanthopoulos requested review from nguyenv, jp-dark and johnkerl and removed request for nguyenv and jp-dark October 21, 2024 21:11
@XanthosXanthopoulos XanthosXanthopoulos marked this pull request as ready for review October 21, 2024 21:12
Copy link
Collaborator

@jp-dark jp-dark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That parts of this I understand look good to me, but I'm not familiar enough with the arrow adapter code to have a good sense of what impact the change might have there.

if (type_metadata.compare("WKB") == 0) {
type = TILEDB_GEOM_WKB;
} else {
throw std::runtime_error(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use fmt::format with {} as elsewhere in this file

return {attr, enmr};
}

std::pair<Dimension, bool> ArrowAdapter::tiledb_dimension_from_arrow_schema(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd prefer you move the body of this dimension function above the body of the corresponding attribute funtion

* @return std::pair<Dimension, bool> The TileDB dimension with a boolean
* flag indicating whether or not the dimension uses `current domain`.
*/
static std::pair<Dimension, bool> tiledb_dimension_from_arrow_schema(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd prefer you move the declaration of this dimension function above the corresponding attribute function

static void create(
std::string_view uri,
std::unique_ptr<ArrowSchema> schema,
ArrowTable index_columns,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see no reason not to use const references here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agrre on const reference here but will require refactoring all other classes which tiledb_schema_from_arrow_schema. Should be done to a follow-up PR maybe?

Copy link
Member

@johnkerl johnkerl Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't need to be done everywhere -- it can be done in places you add

we already have const references in some places not others -- you can have this PR be part of the work that doesn't need to be redone later

it can also be done all at once in a follow-on, of course

* @param mode read or write
* @param ctx SOMAContext
* @param column_names A list of column names to use as user-defined index
* columns (e.g., ``['cell_type', 'tissue_type']``). All named columns must
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these example column names legit for geometry dataframes?

ctx->tiledb_ctx(),
std::move(schema),
ArrowTable(
std::move(index_columns.first), std::move(index_columns.second)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe you can avoid this by having this constructor take the ArrowTable parameters as const references

libtiledbsoma/test/unit_soma_geometry_dataframe.cc Outdated Show resolved Hide resolved
libtiledbsoma/test/unit_soma_geometry_dataframe.cc Outdated Show resolved Hide resolved
SOMAGeometryDataFrame::create(
uri,
std::move(schema),
ArrowTable(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use const references to avoid the moves

REQUIRE(soma_object->type() == "SOMAGeometryDataFrame");
soma_object->close();
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like we could use some tests which write & read data -- is this planned for a future PR?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is #3222 which adds tests for read/write.

Comment on lines 983 to 1018
if (type_metadata.compare("WKB") != 0) {
throw std::runtime_error(
std::string(
"Unkwown type metadata for `soma_geometry`. "
"Expected 'WKB', found ") +
std::string(type_metadata.data()));
}

FilterList filter_list = ArrowAdapter::_create_dim_filter_list(
child->name, platform_config, soma_type, ctx);
for (int64_t j = 0;
j < spatial_column_info.second->n_children;
++j) {
auto min_dim = tiledb_dimension_from_arrow_schema(
ctx,
spatial_column_info.second->children[j],
spatial_column_info.first->children[j],
soma_type,
type_metadata,
"tiledb__internal__",
"__min",
platform_config);

auto max_dim = tiledb_dimension_from_arrow_schema(
ctx,
spatial_column_info.second->children[j],
spatial_column_info.first->children[j],
soma_type,
type_metadata,
"tiledb__internal__",
"__max",
platform_config);

dims.insert({min_dim.first.name(), min_dim.first});
dims.insert({max_dim.first.name(), max_dim.first});

use_current_domain &= min_dim.second;
use_current_domain &= max_dim.second;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For readability, I think this can be pulled out into a separate _set_spatial_dimension (or better name) helper function.

}
if (strcmp(child->name, index_column_schema->children[i]->name) ==
0) {
if (strcmp(child->name, "soma_geometry") == 0 &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should const std::string some of these in utils/common.h like "soma_geometry", "tiledb__internal__", "__min", and "__max"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants