Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] add cluster mapreduce feature to style spec #7004

Closed
wants to merge 2 commits into from

Conversation

redbmk
Copy link

@redbmk redbmk commented Jul 23, 2018

Launch Checklist

  • briefly describe the changes in this PR
  • write tests for all new functionality
  • document any changes to public APIs
  • post benchmark scores
  • manually test the debug page
  • tagged @mapbox/studio and/or @mapbox/maps-design if this PR includes style spec changes

Notes

This should fix #2412. It works from the testing I've done so far, and I changed the cluster test to use this feature.

This is still a WIP because I just threw this together without really discussing the implementation (e.g. does it make sense to use point and cluster variables, or is there another way to do this?). Also, I still need to do some benchmarking and finish documenting the new features (I've added some documentation to the style spec, but I'd imagine there's more to do than that). Any help would be much appreciated!

I'm not sure how to tag @mapbox/studio and @mapbox/maps-design - those seem relevant since this affects the style spec, but github doesn't autocomplete when I type those, and they don't appear as links when I check Preview. Does anybody know how to tag them like the PR template suggests?

@mourner It might also be good to add access to zoom at some point, which would involve supercluster changes (essentially just also passing in zoom to the _accumulate function).

@anandthakker I wanted to add some global variables for Infinity and -Infinity in case you want to do something like min, max without setting the initial value to 0 (e.g. what if you want the max, but all the numbers are negative?). I couldn't figure out how to do this though. I tried setting the globals to include them like in the following snippet, but then when I tried to access them with ["get", "infinity"] or ["get", "-infinity"], I got null. Do you know the right way to support those?

const globals = {
  infinity: Infinity,
  "-infinity": -Infinity
};

Maybe we could add them as properties, and then have the actual properties of the point be exposed in point (currently that's how it is in the reduce function, but not in map). That would give us variables point (with the point's properties), infinity, and -infinity (and on reduce we would also have cluster).

New Feature Description

This adds a new field to a GeoJSON source called clusterMapReduce, which will allow you to use mapbox expressions to pass data into supercluster's initial, map, and reduce options. The usage would look like this:

"sources": {
  "geojson": {
    "type": "geojson",
    "data": "local://data/places.geojson",
    "cluster": true,
    "clusterRadius": 25,
    "clusterMapReduce": {
      "initial": {
        "max": 1,
        "sum": 0
      },
      "map": {
        "max": ["get", "scalerank"],
        "sum": ["get", "scalerank"]
      },
      "reduce": {
        "max": ["max",
          ["get", "max", ["get", "cluster"]],
          ["get", "max", ["get", "point"]]
        ],
        "sum": ["+",
          ["get", "sum", ["get", "cluster"]],
          ["get", "sum", ["get", "point"]]
        ]
      }
    }
  }
}

In order to use supercluster's mapreduce, the important property is reduce. By default, initial will be an empty object ({}), and map will return the point's properties. So, for example if you have price as a property on a point, and you want the clusters to have a property price that is actually the min of those, you could simply do:

"clusterMapReduce": {
  "reduce": {
    "price": ["min",
      ["get", "price", ["get", "cluster"]],
      ["get", "max", ["get", "point"]]
    ]
  }
}

In reduce, the accumulated cluster is exposed as the variable cluster, and the current point is exposed as point.

In map, the properties are exposed as as regular properties, so you don't need to nest the get (i.e. you can do ["get, "value"] instead of ["get", "value", ["get", "point"]]).

In initial, you don't have access to any feature, but expressions are still supported.

@redbmk
Copy link
Author

redbmk commented Jul 23, 2018

I fixed the flow tests, but I'm having trouble with the unit tests. I now get stuff like not ok clusterMapReduce.reduce.* stray key. As far as I can tell, I'm implementing this the same way as the various sources, where they allow arbitrary keys by adding "*" key with "type": "*".

@mourner
Copy link
Member

mourner commented Jul 23, 2018

This is really awesome — thank you very much for taking on this! I like that the implementation is relatively short — not as scary as I thought it would be. I'll review this in more detail this week.

@anandthakker
Copy link
Contributor

Amazing @redbmk!

This prompted me to start thinking about how the design of the reduce expressions should look, and I filed #7010. I think that along with the the "{}" / "[]" operators proposed by @jfirebaugh in #6155 (comment) will set us up for a nice syntax for specifying cluster map / reduce expressions without introducing specialized semantics for "get". E.g., something like:

"initial": [ "{}", "max", 1, "sum", 0 ],
"map": [ "{}",
    "max", ["get", "scalerank"],
    "sum", ["get", "scalerank"]
],
"reduce": [ "fn", "cluster", "point",
    "{}",
    "max", ["max",
        ["get", "max", ["var", "cluster"]],
        ["get", "max", ["var", "point"]]
    ]
    "sum": ["+",
        ["get", "sum", ["var", "cluster"]],
        ["get", "sum", ["var", "point"]]
    ]
]

@redbmk
Copy link
Author

redbmk commented Jul 23, 2018

@anandthakker I like it - that would be pretty cool and actually simplify the cluster code and the style spec. Would there be a way in the style spec to specify that initial, map, and reduce need to return an object?

Technically initial and map are functions also, but I think leaving map as a property expression keeps things simple instead of having to do ["fn", "point", ["{}", "max", ["get", "scalerank", ["var", "point"]]]. We could probably even have the reduce function be a property expression as well, with the feature being point, but then you would still need to reference ["var", "cluster"].

@lonnibesancon
Copy link

lonnibesancon commented Sep 19, 2018

Any update on that @mourner and @redbmk ? I am currently trying to have data-driven paint attributes for my clusters. I've seen that using supercluster was an option since I want to use other features that just point_counts, but the only example out there using supercluster is broken and out-of-date. Would very much like to see how one could achieve that either by using the MR work shown here or with a separate use of supercluster.

See the related Stack Overflow question:

@sheerun
Copy link

sheerun commented Nov 7, 2018

Awesome, we need such kind of feature to show minimum and maximum prices of cluster

@mourner
Copy link
Member

mourner commented Jan 9, 2019

Continued in #7584.

@mourner mourner closed this Jan 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support property aggregation on clustered features
5 participants