BKD backed polygon intersection is slow #50531

Closed
blkbltjns opened this issue Dec 30, 2019 · 7 comments
Labels
:Analytics/Geo (Indexing, search aggregations of geo points and shapes), Team:Analytics (Meta label for analytical engine team: ESQL/Aggs/Geo)

Comments

@blkbltjns
blkbltjns commented Dec 30, 2019

Elasticsearch version 7.5.0
Kibana 7.5.0 plugin installed
Windows Server 2016 Datacenter

After moving from 6.3.2 to 7.5.0, spatial intersections on a geo_shape field in an index with 12 million polygons are significantly slower than they were before. This index is about 20GB. All these polygons represent land parcels and commonly touch and/or slightly overlap neighboring shapes. I am doing the intersection with a bounding box roughly the size of the southern United States.

The first intersection query (see end of this post for example) I do after a fresh ES 7.5.0 server restart takes 2+ minutes. On a fresh ES 6.3.2 server restart this exact same query against the same data (using quadtree geo_shape) takes 800ms. I notice that there is very heavy disk read activity during the 7.5.0 query. This does not happen during the 6.3.2 query. Note that this is only happening with polygon geo_shapes; point geo_shapes do not have this problem from what I can tell.

After this first query (with a hot cache?), the situation improves somewhat but the 7.5.0 query still takes over a second to run while the 6.3.2 query takes 100ms.

Interestingly, restarting only ES, as opposed to restarting the whole server, does not result in the 2+ minute query. I believe the difference between 2+ minutes and about 1 second comes from the Windows disk page cache, which is cleared by a full server restart but not by an ES restart.

If it matters, here is my query (this is /_count but I get similar results with /_search):

POST myindex/_count
{"query":{"bool":{"filter":{"bool":{"must":[{"geo_shape":{"geography":{"shape":{"type":"polygon","coordinates":[[[-118.74701654704592,38.554294590584455],[-118.74701654704592,22.052177425063828],[-76.559516547045916,22.052177425063828],[-76.559516547045916,38.554294590584455],[-118.74701654704592,38.554294590584455]]]},"relation":"intersects"}}}]}}}}}
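To reproduce the cold versus warm timings described above, the request can be scripted and timed. The following is a minimal Python sketch, not the tooling actually used for the numbers in this report: the helper names (`bbox_polygon`, `count_body`, `timed_count`) are hypothetical, and the base URL passed to `timed_count` (for example `http://localhost:9200`) is an assumption about the local setup.

```python
import json
import time
import urllib.request


def bbox_polygon(west, south, east, north):
    """Return a closed polygon ring ([lon, lat] order) covering a bounding box."""
    ring = [
        [west, north],
        [west, south],
        [east, south],
        [east, north],
        [west, north],  # repeat the first vertex to close the ring
    ]
    return {"type": "polygon", "coordinates": [ring]}


def count_body(field, shape, relation="intersects"):
    """Build the same bool/filter/bool/must structure used in the query above."""
    clause = {"geo_shape": {field: {"shape": shape, "relation": relation}}}
    return {"query": {"bool": {"filter": {"bool": {"must": [clause]}}}}}


def timed_count(base_url, index, body):
    """POST the body to /<index>/_count and return (count, elapsed seconds).

    base_url is an assumption about where the node listens.
    """
    request = urllib.request.Request(
        f"{base_url}/{index}/_count",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        count = json.load(response)["count"]
    return count, time.perf_counter() - start
```

Calling `timed_count` once right after a restart and then a second time with the same body would give the cold and warm figures side by side.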

@cbuescher added the :Analytics/Geo label on Jan 2, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Geo)

@iverase
Contributor

iverase commented Jan 2, 2020

Thanks @blkbltjns for sharing the numbers. I am curious to know what performance you would get if you performed the same query using an envelope instead of a polygon. Would it be possible for you to run the following query and share the results?

POST myindex/_count
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "geo_shape": {
                "geography": {
                  "shape": {
                    "type": "envelope",
                    "coordinates" : [ 
                      [-118.74701654704592, 38.554294590584455],
                      [-76.55951654704592, 22.052177425063828] 
                    ]
                  },
                  "relation": "intersects"
                }
              }
            }
          ]
        }
      }
    }
  }
}
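An envelope lists its two corners in upper-left, lower-right order, that is `[[min_lon, max_lat], [max_lon, min_lat]]`, which is easy to get backwards when converting from a bounding box. As a small Python sketch (the helper name `bbox_envelope` is hypothetical), the shape above could be derived like this:

```python
def bbox_envelope(west, south, east, north):
    """Return an envelope shape: [[min_lon, max_lat], [max_lon, min_lat]]."""
    if south > north:
        raise ValueError("south must not exceed north")
    # Upper-left corner first, then lower-right corner.
    return {"type": "envelope", "coordinates": [[west, north], [east, south]]}
```

The returned dictionary can be used as the value of `"shape"` in the query body.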

@blkbltjns
Author

@iverase

Running the envelope query has the same result in terms of performance. The initial query takes 2+ minutes (with very heavy disk read activity) and subsequent runs of the same query take a little over a second.

@rjernst added the Team:Analytics label on May 4, 2020
@iverase
Contributor

iverase commented Jun 29, 2020

Hi @blkbltjns,

It has been a while since you originally opened the issue, but I am wondering whether you were able to understand the issue with the disk page cache. There were some improvements in recent versions of ES regarding BKD backed geo shapes, so they might have helped you.

@blkbltjns
Author

Hi @iverase,

No change in performance here. We still see query timeouts on the initial spatial intersection queries, and subsequent queries take around a second to complete.

@iverase
Contributor

iverase commented Jul 7, 2020

Are you still using 7.5.0?

In 7.6.0 we changed the way we open this index (#49272), so I would expect an upgrade to help. In the upcoming 7.9.0, there are several improvements to this index as well.

Could you share the output of hot threads while running this query? I would like to see where we are spending most of the time.
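For reference, hot threads are exposed by the nodes hot threads API (`GET /_nodes/hot_threads`), which returns plain text. A minimal Python sketch for capturing that output while the slow query runs in another session could look like the following; the function names are hypothetical and the base URL (for example `http://localhost:9200`) is an assumption about the local setup.

```python
import urllib.parse
import urllib.request


def hot_threads_url(base_url, threads=3, interval="500ms", thread_type="cpu"):
    """Build the URL for the nodes hot threads API with common parameters."""
    query = urllib.parse.urlencode(
        {"threads": threads, "interval": interval, "type": thread_type}
    )
    return f"{base_url}/_nodes/hot_threads?{query}"


def fetch_hot_threads(url):
    """Fetch the hot threads report; the API responds with plain text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")
```

Starting the `_count` request first and then calling `fetch_hot_threads(hot_threads_url("http://localhost:9200"))` a few times during the slow first run should show where the time is going.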

@wchaparro
Member

Closing, not enough information to proceed.

6 participants