Add geo-centroid and weighted average aggregations documentation #7613

Merged
merged 26 commits into from
Jul 9, 2024
Changes from 23 commits
a033264
Add geo-centroid and weighted avaerage aggregations documentation
vagimeli Jul 2, 2024
61353ee
Add geocentroid content and examples
vagimeli Jul 3, 2024
144092a
Add weighted average content and examples
vagimeli Jul 3, 2024
a0eaf29
Merge branch 'main' into metric-aggs-content-gap
vagimeli Jul 3, 2024
32ddef9
Merge branch 'main' into metric-aggs-content-gap
vagimeli Jul 8, 2024
ff10997
Update _aggregations/metric/geocentroid.md
vagimeli Jul 9, 2024
0309c56
Update _aggregations/metric/geocentroid.md
vagimeli Jul 9, 2024
1c3e335
Update _aggregations/metric/geocentroid.md
vagimeli Jul 9, 2024
ec76d2d
Update _aggregations/metric/geocentroid.md
vagimeli Jul 9, 2024
cb8efc0
Update _aggregations/metric/geocentroid.md
vagimeli Jul 9, 2024
e159226
Update _aggregations/metric/geocentroid.md
vagimeli Jul 9, 2024
28796e0
Update _aggregations/metric/weighted-avg.md
vagimeli Jul 9, 2024
f5e19e3
Update _aggregations/metric/geocentroid.md
vagimeli Jul 9, 2024
fd99759
Update _aggregations/metric/weighted-avg.md
vagimeli Jul 9, 2024
46b2c9b
Update _aggregations/metric/weighted-avg.md
vagimeli Jul 9, 2024
4d91453
Update _aggregations/metric/weighted-avg.md
vagimeli Jul 9, 2024
9e18581
Update _aggregations/metric/weighted-avg.md
vagimeli Jul 9, 2024
d86f529
Update _aggregations/metric/geocentroid.md
vagimeli Jul 9, 2024
66d981c
Update _aggregations/metric/weighted-avg.md
vagimeli Jul 9, 2024
d1ed725
Update _aggregations/metric/weighted-avg.md
vagimeli Jul 9, 2024
7972376
Update _aggregations/metric/weighted-avg.md
vagimeli Jul 9, 2024
2efeb6d
Update _aggregations/metric/weighted-avg.md
vagimeli Jul 9, 2024
b5f46ad
Merge branch 'main' into metric-aggs-content-gap
vagimeli Jul 9, 2024
037cedb
Update _aggregations/metric/geocentroid.md
vagimeli Jul 9, 2024
643b2aa
Update _aggregations/metric/geocentroid.md
vagimeli Jul 9, 2024
1287811
Merge branch 'main' into metric-aggs-content-gap
vagimeli Jul 9, 2024
256 changes: 256 additions & 0 deletions _aggregations/metric/geocentroid.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,256 @@
---
layout: default
title: Geocentroid
parent: Metric aggregations
grand_parent: Aggregations
nav_order: 45
---

# Geocentroid

Check failure on line 9 in _aggregations/metric/geocentroid.md

GitHub Actions / vale

[vale] _aggregations/metric/geocentroid.md#L9

[OpenSearch.Spelling] Error: Geocentroid. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.

The OpenSearch `geo_centroid` aggregation calculates the geographic center, or focal point, of a set of spatial data points. This metric aggregation operates on `geo_point` fields and returns the centroid location as a latitude-longitude pair.

## Using the aggregation
Collaborator Author

@vagimeli vagimeli Jul 3, 2024

Technical reviewer: Please confirm this example is relevant to an OpenSearch user. I tested the example using Dev Tools. If another example is more appropriate, please replace the draft example with your example. Thank you.

Member

LGTM


Follow these steps to use the `geo_centroid` aggregation:

**1. Create an index with a `geo_point` field**

First, you need to create an index with a `geo_point` field type. This field stores the geographic coordinates you want to analyze. For example, to create an index called `restaurants` with a `location` field of type `geo_point`, use the following request:

```json
PUT /restaurants
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "location": {
        "type": "geo_point"
      }
    }
  }
}
```
{% include copy-curl.html %}

**2. Index documents with spatial data**

Next, index the documents containing the spatial data points you want to analyze. Each document must include the `geo_point` field with its latitude-longitude coordinates. In the string format used here, latitude comes first (`"lat, lon"`). For example, index your documents using the following request:

```json
POST /restaurants/_bulk?refresh
{"index": {"_id": 1}}
{"name": "Cafe Delish", "location": "40.7128, -74.0059"}
{"index": {"_id": 2}}
{"name": "Tasty Bites", "location": "51.5074, -0.1278"}
{"index": {"_id": 3}}
{"name": "Sushi Palace", "location": "48.8566, 2.3522"}
{"index": {"_id": 4}}
{"name": "Burger Joint", "location": "34.0522, -118.2437"}
```
{% include copy-curl.html %}
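
If you generate the bulk request from application data rather than typing it by hand, the body can be assembled programmatically. The following is a minimal sketch using only the Python standard library. The `build_bulk_body` helper is a hypothetical name introduced for illustration, and sending the body to the cluster is left out:

```python
import json

# Same sample documents as in the bulk request above.
restaurants = [
    {"name": "Cafe Delish", "location": "40.7128, -74.0059"},
    {"name": "Tasty Bites", "location": "51.5074, -0.1278"},
    {"name": "Sushi Palace", "location": "48.8566, 2.3522"},
    {"name": "Burger Joint", "location": "34.0522, -118.2437"},
]

def build_bulk_body(docs):
    """Hypothetical helper: emit one action line and one source line per document."""
    lines = []
    for doc_id, doc in enumerate(docs, start=1):
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk body is newline-delimited JSON and must end with a newline.
    return "\n".join(lines) + "\n"

bulk_body = build_bulk_body(restaurants)
print(bulk_body)
```

The resulting string corresponds to the body of the `POST /restaurants/_bulk?refresh` request shown above.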

**3. Run the `geo_centroid` aggregation**

To calculate the centroid location across all documents, run a search with the `geo_centroid` aggregation on the `geo_point` field. For example, use the following request:

```json
GET /restaurants/_search
{
  "size": 0,
  "aggs": {
    "centroid": {
      "geo_centroid": {
        "field": "location"
      }
    }
  }
}
```
{% include copy-curl.html %}

The response includes a `centroid` object whose `location` property contains the `lat` and `lon` of the centroid of all indexed data points, along with a `count` of the points used in the calculation, as shown in the following example:

```json
"aggregations": {
  "centroid": {
    "location": {
      "lat": 43.78224998130463,
      "lon": -47.506300045643
    },
    "count": 4
  }
}
{% include copy-curl.html %}
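
As a sanity check, the sample response above agrees with the plain arithmetic mean of the four latitudes and the four longitudes. The following sketch reproduces that calculation. It assumes the centroid is an unweighted mean of the coordinates, which matches this example but is not a statement about how OpenSearch computes it internally. The `parse_geo_point` and `centroid` helpers are hypothetical names:

```python
# Coordinates copied from the bulk request, in "lat, lon" string form.
points = [
    "40.7128, -74.0059",
    "51.5074, -0.1278",
    "48.8566, 2.3522",
    "34.0522, -118.2437",
]

def parse_geo_point(text):
    """Hypothetical helper: split a "lat, lon" string into two floats."""
    lat, lon = (float(part) for part in text.split(","))
    return lat, lon

def centroid(coords):
    """Assumption: the centroid is the arithmetic mean of latitudes and longitudes."""
    coords = list(coords)
    n = len(coords)
    return {
        "location": {
            "lat": sum(lat for lat, _ in coords) / n,
            "lon": sum(lon for _, lon in coords) / n,
        },
        "count": n,
    }

result = centroid(parse_geo_point(p) for p in points)
print(result)
```

The computed values match the response to roughly seven decimal places. The remaining difference presumably comes from how coordinates are stored in the index.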

**4. Nest under other aggregations (optional)**

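You can also nest the `geo_centroid` aggregation under bucket aggregations, such as `terms`, to calculate the centroid for subsets of your data. The following request finds the centroid location for each city. It assumes that each document also contains a `city` field with a `keyword` subfield, which the sample `restaurants` documents above do not include: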

```json
GET /restaurants/_search
{
  "size": 0,
  "aggs": {
    "cities": {
      "terms": {
        "field": "city.keyword"
      },
      "aggs": {
        "centroid": {
          "geo_centroid": {
            "field": "location"
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

This returns a centroid location for each city bucket, allowing you to analyze the geographic center of data points in different cities.
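
The per-bucket behavior can be pictured as a group-by followed by one centroid per group. The sketch below illustrates this under several assumptions: the `city` values and the `Example Diner` entry are hypothetical and not part of the sample index, the `centroids_by` helper is an invented name, and the centroid is taken to be the arithmetic mean of the coordinates:

```python
from collections import defaultdict

# Hypothetical documents: the `city` values and the "Example Diner" entry
# are illustrative assumptions, not part of the sample index.
docs = [
    {"name": "Cafe Delish", "city": "New York", "location": (40.7128, -74.0059)},
    {"name": "Example Diner", "city": "New York", "location": (40.7306, -73.9866)},
    {"name": "Tasty Bites", "city": "London", "location": (51.5074, -0.1278)},
    {"name": "Sushi Palace", "city": "Paris", "location": (48.8566, 2.3522)},
    {"name": "Burger Joint", "city": "Los Angeles", "location": (34.0522, -118.2437)},
]

def centroids_by(documents, key):
    """Group documents by `key`, then average the coordinates in each group."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc[key]].append(doc["location"])
    result = {}
    for bucket, points in groups.items():
        n = len(points)
        result[bucket] = {
            "doc_count": n,
            "lat": sum(lat for lat, _ in points) / n,
            "lon": sum(lon for _, lon in points) / n,
        }
    return result

city_centroids = centroids_by(docs, "city")
for city, value in city_centroids.items():
    print(city, value)
```

A bucket containing a single document simply returns that document's coordinates. Only buckets with several points produce a location that differs from every individual point.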

## Using `geo_centroid` with the `geohash_grid` aggregation
Collaborator

Either "a" or "the" should precede geohash_grid.

vagimeli marked this conversation as resolved.

The `geohash_grid` aggregation partitions geospatial data into buckets based on geohash prefixes.

When a document contains multiple geopoint values in a field, the `geohash_grid` aggregation assigns the document to multiple buckets, even if one or more of its geopoints are outside the bucket boundaries. This behavior is different from how individual geopoints are treated, where only those within the bucket boundaries are considered.

When you nest the `geo_centroid` aggregation under the `geohash_grid` aggregation, each centroid is calculated using all geopoints in a bucket, including those that may be outside the bucket boundaries. This can result in centroid locations that fall outside the geographic area represented by the bucket.

#### Example

In this example, the `geohash_grid` aggregation with a `precision` of `3` creates buckets based on geohash prefixes of length `3`. Because each document has multiple geopoints, a document may be assigned to multiple buckets, even if some of its geopoints fall outside the bucket boundaries.

The `geo_centroid` subaggregation calculates the centroid for each bucket using all geopoints assigned to that bucket, including those outside the bucket boundaries. This means that the resulting centroid locations may not necessarily lie within the geographic area represented by the corresponding geohash bucket.

First, create an index and index documents containing multiple geopoints:

```json
PUT /locations
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "coordinates": {
        "type": "geo_point"
      }
    }
  }
}

POST /locations/_bulk?refresh
{"index": {"_id": 1}}
{"name": "Point A", "coordinates": ["40.7128, -74.0059", "51.5074, -0.1278"]}
{"index": {"_id": 2}}
{"name": "Point B", "coordinates": ["48.8566, 2.3522", "34.0522, -118.2437"]}
```

Then, run the `geohash_grid` aggregation with a `geo_centroid` subaggregation:
Collaborator

There are articles missing here.

vagimeli marked this conversation as resolved.

```json
GET /locations/_search
{
  "size": 0,
  "aggs": {
    "grid": {
      "geohash_grid": {
        "field": "coordinates",
        "precision": 3
      },
      "aggs": {
        "centroid": {
          "geo_centroid": {
            "field": "coordinates"
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

<details markdown="block">
  <summary>
    Response
  </summary>
  {: .text-delta}

```json
{
  "took": 26,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "grid": {
      "buckets": [
        {
          "key": "u09",
          "doc_count": 1,
          "centroid": {
            "location": {
              "lat": 41.45439997315407,
              "lon": -57.945750039070845
            },
            "count": 2
          }
        },
        {
          "key": "gcp",
          "doc_count": 1,
          "centroid": {
            "location": {
              "lat": 46.11009998945519,
              "lon": -37.06685005221516
            },
            "count": 2
          }
        },
        {
          "key": "dr5",
          "doc_count": 1,
          "centroid": {
            "location": {
              "lat": 46.11009998945519,
              "lon": -37.06685005221516
            },
            "count": 2
          }
        },
        {
          "key": "9q5",
          "doc_count": 1,
          "centroid": {
            "location": {
              "lat": 41.45439997315407,
              "lon": -57.945750039070845
            },
            "count": 2
          }
        }
      ]
    }
  }
}
```
{% include copy-curl.html %}

</details>
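
The response can be reproduced offline to see why two buckets share each centroid. The following sketch encodes every point with the standard geohash algorithm at a precision of `3`, assigns each document to the bucket of each of its points, and then averages all points of the documents in a bucket. The grouping and averaging logic models the behavior described in this section and is an assumption, not OpenSearch's actual implementation:

```python
from collections import defaultdict

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision):
    """Standard geohash: interleave longitude and latitude bits, 5 bits per character."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bit_count, index = 0, 0
    use_lon = True  # the first bit describes longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        index <<= 1
        if value >= mid:
            index |= 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[index])
            bit_count, index = 0, 0
    return "".join(chars)

# Sample documents from the bulk request, as (lat, lon) tuples.
docs = {
    "Point A": [(40.7128, -74.0059), (51.5074, -0.1278)],
    "Point B": [(48.8566, 2.3522), (34.0522, -118.2437)],
}

# Assumption: a document joins the bucket of each of its points, and the
# subaggregation then sees all of that document's points.
buckets = defaultdict(list)
for name, points in docs.items():
    for key in {geohash(lat, lon, 3) for lat, lon in points}:
        buckets[key].extend(points)

centroids = {}
for key, points in buckets.items():
    n = len(points)
    centroids[key] = {
        "lat": sum(lat for lat, _ in points) / n,
        "lon": sum(lon for _, lon in points) / n,
        "count": n,
    }

for key, value in sorted(centroids.items()):
    print(key, value)
```

Each bucket reports a `count` of `2` because both points of the matching document contribute, which places every centroid in the Atlantic Ocean, far outside its own geohash cell.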