Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add split and join expressions #2064

Closed
wants to merge 5 commits into from

Conversation

ZeLonewolf
Copy link
Contributor

@ZeLonewolf ZeLonewolf commented Jan 13, 2023

This PR adds two expressions, split and joint. This helps to simplify style logic, for example osm-americana/openstreetmap-americana#680:

  • split takes a string and separates it into an array, split by a delimiter character. So ["split", "/", "a/b/c"] results in ["a","b","c"].
  • join combines the elements of an array together with a delimiter, so ["join", ";", ["literal", ["a", "b", "c"]]] results in a;b;c.

Launch Checklist

  • Confirm your changes do not include backports from Mapbox projects (unless with compliant license) - if you are not sure about this, please ask!
  • Briefly describe the changes in this PR.
  • Link to related issues.
  • Write tests for all new functionality.
  • Document any changes to public APIs.
  • Add an entry to CHANGELOG.md under the ## main section.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 13, 2023

Bundle size report:

Size Change: +27 B
Total Size Before: 207 kB
Total Size After: 207 kB

Output file Before After Change
maplibre-gl.js 198 kB 198 kB +27 B
maplibre-gl.css 9.1 kB 9.1 kB 0 B
ℹ️ View Details
Source file Before After Change
src/util/evented.ts 525 B 3.97 kB +3.45 kB
node_modules/gl-matrix/esm/vec4.js 392 B 543 B +151 B
src/render/uniform_binding.ts 572 B 647 B +75 B
src/style/create_style_layer.ts 174 B 244 B +70 B
src/data/bucket/pattern_attributes.ts 111 B 172 B +61 B
src/symbol/symbol_size.ts 579 B 637 B +58 B
src/style-spec/expression/definitions/interpolate.ts 1.22 kB 1.27 kB +43 B
src/util/performance.ts 732 B 766 B +34 B
src/style-spec/expression/definitions/index.ts 1.63 kB 1.66 kB +32 B
src/util/actor.ts 846 B 871 B +25 B
src/data/bucket/fill_attributes.ts 92 B 109 B +17 B
src/util/ajax.ts 2.09 kB 2.11 kB +12 B
node_modules/pbf/index.js 2.82 kB 2.83 kB +9 B
src/style/style_layer/raster_style_layer.ts 66 B 75 B +9 B
node_modules/csscolorparser/csscolorparser.js 2.05 kB 2.06 kB +7 B
node_modules/@mapbox/whoots-js/index.mjs 273 B 280 B +7 B
src/style/style_layer/fill_extrusion_style_layer.ts 925 B 931 B +6 B
src/style-spec/util/interpolate.ts 230 B 235 B +5 B
src/style/style_layer/circle_style_layer_properties.g.ts 226 B 231 B +5 B
src/style/evaluation_parameters.ts 386 B 390 B +4 B
src/util/intersection_tests.ts 929 B 933 B +4 B
src/data/bucket/fill_bucket.ts 1.09 kB 1.09 kB +4 B
src/style/format_section_override.ts 353 B 357 B +4 B
node_modules/@mapbox/point-geometry/index.js 630 B 633 B +3 B
src/style-spec/expression/types/collator.ts 204 B 207 B +3 B
src/util/is_char_in_unicode_block.ts 878 B 881 B +3 B
src/style/style_layer/circle_style_layer.ts 531 B 534 B +3 B
src/util/color_ramp.ts 432 B 435 B +3 B
src/util/config.ts 83 B 85 B +2 B
src/style-spec/util/color.ts 316 B 318 B +2 B
src/style-spec/expression/definitions/let.ts 434 B 436 B +2 B
src/style-spec/expression/definitions/index_of.ts 489 B 491 B +2 B
src/data/bucket/heatmap_bucket.ts 82 B 84 B +2 B
src/util/webp_supported.ts 374 B 375 B +1 B
src/style-spec/expression/scope.ts 194 B 195 B +1 B
src/style-spec/expression/definitions/literal.ts 311 B 312 B +1 B
src/style-spec/expression/definitions/assertion.ts 624 B 625 B +1 B
src/style-spec/expression/evaluation_context.ts 310 B 311 B +1 B
src/style-spec/expression/definitions/case.ts 459 B 460 B +1 B
src/style-spec/validate/validate.ts 433 B 434 B +1 B
src/style-spec/validate/validate_image.ts 62 B 63 B +1 B
src/util/transferable_grid_index.ts 1.02 kB 1.02 kB +1 B
src/data/bucket/circle_attributes.ts 92 B 93 B +1 B
src/shaders/encode_attribute.ts 81 B 82 B +1 B
node_modules/gl-matrix/esm/common.js 181 B 182 B +1 B
src/style/style_layer/line_style_layer_properties.g.ts 275 B 276 B +1 B
src/symbol/transform_text.ts 201 B 202 B +1 B
src/style/style_layer/symbol_style_layer.ts 1.16 kB 1.17 kB +1 B
src/style/style_layer/fill_style_layer.ts 275 B 276 B +1 B
src/util/dictionary_coder.ts 227 B 228 B +1 B
src/style-spec/validate/validate_constants.ts 113 B 112 B -1 B
src/style-spec/expression/types/formatted.ts 264 B 263 B -1 B
src/style-spec/util/padding.ts 258 B 257 B -1 B
src/style-spec/expression/runtime_error.ts 111 B 110 B -1 B
src/style-spec/expression/definitions/coalesce.ts 444 B 443 B -1 B
src/style-spec/expression/definitions/at.ts 377 B 376 B -1 B
src/style-spec/expression/definitions/length.ts 332 B 331 B -1 B
src/util/web_worker_transfer.ts 1.03 kB 1.03 kB -1 B
src/style/style_layer.ts 1.21 kB 1.21 kB -1 B
node_modules/murmurhash-js/index.js 66 B 65 B -1 B
src/data/feature_position_map.ts 608 B 607 B -1 B
src/data/extent.ts 32 B 31 B -1 B
src/style/query_utils.ts 485 B 484 B -1 B
src/data/bucket/symbol_attributes.ts 763 B 762 B -1 B
node_modules/ieee754/index.js 567 B 566 B -1 B
src/symbol/shaping.ts 3.63 kB 3.63 kB -1 B
src/geo/lng_lat.ts 586 B 585 B -1 B
src/geo/lng_lat_bounds.ts 623 B 622 B -1 B
src/style-spec/expression/compound_expression.ts 734 B 732 B -2 B
src/style-spec/expression/definitions/match.ts 751 B 749 B -2 B
src/style/style_layer/fill_style_layer_properties.g.ts 197 B 195 B -2 B
src/data/dem_data.ts 834 B 832 B -2 B
src/style-spec/expression/types.ts 517 B 514 B -3 B
src/util/browser.ts 450 B 446 B -4 B
src/data/bucket/circle_bucket.ts 971 B 967 B -4 B
src/style/parse_glyph_pbf.ts 370 B 366 B -4 B
src/render/image_atlas.ts 838 B 834 B -4 B
src/symbol/quads.ts 1.88 kB 1.88 kB -4 B
src/style/style_layer/symbol_style_layer_properties.g.ts 655 B 651 B -4 B
src/source/tile_id.ts 1.33 kB 1.32 kB -4 B
src/style/properties.ts 1.91 kB 1.9 kB -5 B
src/style/style_layer/hillshade_style_layer_properties.g.ts 163 B 157 B -6 B
node_modules/earcut/src/earcut.js 2.82 kB 2.81 kB -6 B
src/style-spec/expression/definitions/step.ts 643 B 636 B -7 B
src/util/script_detection.ts 1.65 kB 1.64 kB -7 B
src/data/feature_index.ts 1.74 kB 1.74 kB -7 B
node_modules/murmurhash-js/murmurhash3_gc.js 399 B 384 B -15 B
src/style/style_layer/line_style_layer.ts 831 B 814 B -17 B
node_modules/gl-matrix/esm/mat4.js 2.8 kB 2.79 kB -18 B
src/style/style_layer/background_style_layer.ts 84 B 63 B -21 B
src/symbol/symbol_layout.ts 3.83 kB 3.8 kB -28 B
node_modules/murmurhash-js/murmurhash2_gc.js 279 B 244 B -35 B
src/data/array_types.g.ts 2.9 kB 2.86 kB -46 B
src/style-spec/util/color_spaces.ts 804 B 756 B -48 B
src/source/rtl_text_plugin.ts 968 B 918 B -50 B
src/util/tile_request_cache.ts 936 B 867 B -69 B
src/util/util.ts 2.58 kB 2.42 kB -162 B
src/style-spec/error/validation_error.ts 3.56 kB 123 B -3.44 kB

@ZeLonewolf ZeLonewolf marked this pull request as ready for review January 13, 2023 03:00
@HarelM
Copy link
Collaborator

HarelM commented Jan 13, 2023

Why are we implementing string manipulation operations in the style spec rather than creating this data up front?

@ZeLonewolf
Copy link
Contributor Author

The vector data is not necessarily within the style developer's control, especially if working against a standard schema such as OpenMapTiles. Additionally, since mvt tile features are organized in key-value pairs rather than key-array pairs, it's often necessary to pack information into a feature value. We do A LOT of string manipulation in Americana to deal with language and fallback language support, and it's a big reason why our style.json is nearly 1MB.

Beyond the discussion of where a particular piece of processing should go in the stack, I implemented this functionality because I thought it belonged as part of the library's general-purpose string-processing capability. It's one of the unimplemented features listed in mapbox/mapbox-gl-js#6484 and since I'd just figured out how to work with expressions, I thought I'd pitch in. I hope as I become more familiar with the code base that I might pitch in on other functionality. If my energy is misdirected, please let me know.

@HarelM
Copy link
Collaborator

HarelM commented Jan 13, 2023

I see your point. Although I don't think this will help reduce the size of the style.json.
My concern is more about the linked feature you sent, i.e. I prefer to have a discussion/issue where we define what we plan to do and then do it.
Creating a language inside the style spec (or keep expanding it) is not a great direction from my point of view as you'll be complicating the style instead of organizing the data properly.
The argument about not controlling the data is very weak, if your product owner wants to see something on the map and it's not there in terms of data you'll need to add it, I don't think you can really separate the data and the styling this way.

With string manipulation you can start adding a lot of functions and those needs to be supported in both web and native, and maintained, this isn't cheap (start with, contains, char at, ends with, trim, replace, is empty, sub string, and these are just from the top of my head).

So, I would advise to start a discussion around which of those are absolutely necessary/high priority/commonly used and then go about implementing them.

@zekefarwell
Copy link

zekefarwell commented Jan 30, 2023

Creating a language inside the style spec (or keep expanding it) is not a great direction from my point of view as you'll be complicating the style instead of organizing the data properly.

I agree in principle. I had expected that the sensible way to handle semicolon separated values from OSM tags would be to parse the values into an array at tile generation time so that the vector tile rendering library has a nice data structure to work with. Unfortunately, the Mapbox Vector Tile specification does not include arrays so this would not be possible (barring a new vector tile specification). This being the case, some way of handling stringified pseudo arrays is needed. For example the MVT spec suggests an array might be encoded like this:

property: "[\"value_one\",\"value_two\",\"value_three\"]"

A raw OSM tag with multiple values would be formatted like this:

property: "value_one;value_two;value_three"

I'm sure there are other possible solutions to this problem than general purpose string manipulation functions, but it definitely seems like a clear need given the lack of true arrays in the MVT specification.

@wipfli
Copy link
Contributor

wipfli commented Jan 30, 2023

I think an expression to split strings is an interesting feature, but also I would probably use it when I do some more logic like language switching for which I would need some JavaScript. I guess this is what people call run-time styling or so.

We do A LOT of string manipulation in Americana to deal with language and fallback language support

Can you share an example?

@ZeLonewolf
Copy link
Contributor Author

We do A LOT of string manipulation in Americana to deal with language and fallback language support

Can you share an example?

Yes. We intend to create a working demonstration to demonstrate the utility of split/join functionality, with a methodology as discussed in osm-americana/openstreetmap-americana#763. My plan is to use osm-americana/openstreetmap-americana#747 for style size metrics and as-yet-undetermined mechanisms for client performance profiling. Unfortunately, the main performance profiler is presently broken (#2122) but we'll cross that bridge as we get there.

@1ec5
Copy link
Contributor

1ec5 commented Feb 5, 2023

With string manipulation you can start adding a lot of functions and those needs to be supported in both web and native, and maintained, this isn't cheap (start with, contains, char at, ends with, trim, replace, is empty, sub string, and these are just from the top of my head).

Other than replace and trim, the style specification already implements all of the functions you’ve named, or at least the building blocks to easily implement them by composing expressions.

We do A LOT of string manipulation in Americana to deal with language and fallback language support

Can you share an example?

The most relevant example is that the style replaces each semicolon in a list with a more presentable delimiter: osm-americana/openstreetmap-americana#666. Theoretically, a tileset could come with this modification already in place. However, there would be serious tradeoffs in doing so, because these are largely stylistic modifications that are not inherent to the data.

For one thing, we decided that the appropriate delimiter is a newline when the symbol is point-placed but a bullet character (•) when the symbol is line-placed. This is a purely stylistic choice, apart from the need to work around mapbox/mapbox-gl-js#8575. A different style could validly choose to separate the names by dashes or slashes according to the designer’s taste, but the designer should not need to generate their own global tileset to make this modification.

The style specification’s expression language supports neither string replacement (#2059) nor splitting and joining (which are building blocks for string replacement), so we implemented our own factory function for recursively generating a string replacement expression. Recursing more than a few levels deep crashes Firefox due to the sheer level of nesting in the style JSON: osm-americana/openstreetmap-americana#680.

This Rube Goldberg contraption soon gave way to another. In osm-americana/openstreetmap-americana#670, we collapsed space padding after the semicolon as a compromise with the OSM community. We also deduplicated list items that happened to match the main name in the user’s preferred language (another stylistic choice). This required us to parse the property value as a list so we could compare each item to the main name. So we tossed out the replacement function and implemented a custom tokenizer to make the logic more straightforward.

It surely would’ve been easier to write a tokenizer in a more complete programming language on the server side. However, the user’s preferred language comes from a user preference, which we apply using runtime styling. It would be infeasible to regenerate a new tileset in response to the user selecting a different interface language.

In osm-americana/openstreetmap-americana@b83ad88...1ec5-gljs-expr-split-join-680, I prototyped what a less hacky replacement for the tokenizer would look like in Americana, measuring the overall style JSON’s size along the way. The existing tokenizer implementation doesn’t dominate the style JSON by any means, but any savings allows us to focus on things that are more relevant to the style and also makes it easier for others to develop sophisticated styles:

  1. osm-americana/openstreetmap-americana@b83ad88: 945,999 bytes
  2. Adding a replace-all operator: 912,159 bytes (but able to replace an unlimited number of semicolons)
  3. Adding split and join operators: 918,361 bytes (but more robust)
  4. Allowing concat to concatenate arrays: 917,587 bytes
  5. Adding a filter expression that takes an array and a filter as input: 911,699 bytes

By the end of this experiment, what began as a browser stress test of linguistic gymnastics turned into a rather mundane string operation. After all, that was the advantage of expressions over the legacy style function syntax.

@1ec5
Copy link
Contributor

1ec5 commented Feb 5, 2023

By the way, I think split and join, in conjunction with functional operators like filter and map, would solve 90% of the need for a declarative language for Turing-complete functions (mapbox/mapbox-gl-js#7010) but without all that complexity.

@HarelM HarelM mentioned this pull request Feb 7, 2023
6 tasks
@ZeLonewolf
Copy link
Contributor Author

Closing per #2059 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants