Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

investigate multi-band performance #226

Open
DanielJDufour opened this issue Dec 9, 2023 · 9 comments
Open

investigate multi-band performance #226

DanielJDufour opened this issue Dec 9, 2023 · 9 comments

Comments

@DanielJDufour
Copy link
Member

Describe the bug
Performance seems to be better when you split each band into a separate file; rather than combing them all into one file. This is counter-intuitive, so will need investigation

To Reproduce
Steps to reproduce the behavior:
I've written a test, which I'll share

Expected behavior
Multiband case is faster than separate band case.

Screenshots
none

Desktop (please complete the following information):

  • OS: [e.g. iOS] MacOS
  • Browser [e.g. chrome, safari] Chrome
  • Version [e.g. 22] idk

Additional context
Add any other context about the problem here.

rought benchmarks
combined multiband with geoblaze.load: 5,885ms
combined multiband with geoblaze.parse: 16,683ms

separated bands with geoblaze.load: 3,138ms (1250 + 1121 + 1064)
separated bands with geoblaze.parse: 6,921ms (2364 + 2230 + 2245)

@DanielJDufour
Copy link
Member Author

Welp, we can rule out georaster initliazation as the issue. I ran the same performance tests with just the initial parseGeoraster step (excluding the statistical calculations) and the results were as expected.

Singleband raster analysis took 287 ms
Multiband raster analysis took 21 ms

Parsing all the single band rasters will naturally take longer because we'll have to issue a GET request for each of the 20 or so files' metadata whereas the multiband scenario will require just one fetch of the metadata (usually first 1024 bytes).

However, and most importantly, this initial metadata parsing contributes very little to the overall length of time it takes to run stats on the multiband raster ( 0.021 seconds vs. 23 seconds).

That being said, the initial metadata parsing does take up more than a full half of the time for the single band raster analysis (~287ms out of ~506ms). Meaning that, in theory, running stats on a multiband raster, could be faster someday once the main issue is fixed.

@DanielJDufour
Copy link
Member Author

Further investigation reveals the bulk of the extra time is spent calling georaster.getValues. It takes on average 4ms for the single-band rasters, but takes 24,143 ms for the multiband raster. In other words, 99.9% of the time (24143ms of the total 24165ms) is spent fetching the data with getValues.

@DanielJDufour
Copy link
Member Author

The plot thickens. I ran a test on geotiff.js readRasters method, which is what georaster uses and it doesn't seem like there's any critical issue with geotiff.js.

Singleband raster analysis took 186 ms
Multiband raster analysis took 928 ms

I'm still curious why it takes more times to read a multiband raster than all rasters individually. In theory, reading the multiband should be faster for reasons expressed above. But it's a very small factor in the abnormally large geoblaze time.

@DanielJDufour
Copy link
Member Author

oof, so when I copy the getValues function directly from the georaster source code, it runs very fast.
HOWEVER, when I run georaster.getValues from the compiled georaster code it's SUPER slow.
My current hypothesis is that the webpack building process is adding in a backwards compatability polyfill that is massively slowing things down. I'll keep digging!

@DanielJDufour
Copy link
Member Author

well I ran the compiled version of the georaster code and it's fast, so I'm at a bit of a loss. It might be an issue with the way the georaster build compiled geotiff.js or another dependency. I'll look into it more.

@DanielJDufour
Copy link
Member Author

Okay. I can confirm that the issue is with the way GeoRaster compiled geotiff.js. GeoRaster has a private property called _geotiff that gives you access to a geotiff.js instance of your raster. It's technically not the same geotiff.js instance as the one initially created to read the georaster-relevant metadata bc that's off the main thread.

    const georaster = await parseGeoRaster(url);
    const tiff = georaster._geotiff;
    await tiff.readRasters({...})

@DanielJDufour
Copy link
Member Author

Fixing the georaster build process or getting the next major release out the door is going to be a large lift. However, here's the quick hacky fix for georasters that are loaded via url:

import { fromUrl } from "geotiff";

async function fix(georaster) {
  if (typeof georaster._data === "string" && georaster._geotiff) {
    georaster._geotiff = await fromUrl(georaster._data);
  }
}

const georaster = await parseGeoRaster(multiUrl);
await fix(georaster);
await geoblaze.sum(georaster, geometry);

@avmey
Copy link

avmey commented Mar 22, 2024

I'm having trouble with this workaround, particularly using geotiff and ESM.

Error [ERR_REQUIRE_ESM]: require() of ES Module /workspaces/test-proj/node_modules/quick-lru/index.js from /workspaces/test-proj/node_modules/geotiff/dist-node/source/blockedsource.js not supported.

I found this thread you and Tim posted in, but was curious if these are still the most up to date solutions.

@aLemonFox
Copy link

aLemonFox commented Apr 8, 2024

Hi, is there any news on this? I have a cog of around ~8MB and it easily takes 30+ seconds to identify a single point from multiple bands (including fetching).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants