Read WKW as DatasetArray, no more Memory Mapping #7528

Merged 33 commits on Jan 29, 2024

@fm3 fm3 commented Jan 9, 2024

  • Removed the dependency webknossos-wrap.
  • Copied some of its code into the webknossos codebase (WKWFile, WKWHeader, ResourceBox).
  • Renamed some of its properties to match the shard/chunk terminology (previously cube/block)
  • Added some methods to the WKWHeader class so that it implements the DatasetHeader interface
  • WKWArray now reads WKW data and implements the DatasetArray interface
    • note that AxisOrder is always CXYZ
  • Volume annotation downloads/uploads still use some of the original codepaths from WKWFile and WKWHeader, so those are not removed (however, that works without memory mapping)
  • Also renamed size to shape where applicable (size looked odd and was easy to confuse with flat byte lengths)
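
To illustrate the shard/chunk terminology used above, here is a minimal sketch of how a voxel position could be mapped to a shard, a chunk inside that shard, and an offset inside that chunk. The function name `locate_voxel`, the `ChunkAddress` type, and the default shapes (32 voxels per chunk edge, 32 chunks per shard edge) are assumptions for illustration only; they are not taken from the webknossos code, and the real values come from the WKW header.

```python
from typing import NamedTuple, Tuple

Vec3 = Tuple[int, int, int]


class ChunkAddress(NamedTuple):
    shard: Vec3           # position of the shard, in units of shards
    chunk_in_shard: Vec3  # position of the chunk inside that shard, in units of chunks
    voxel_in_chunk: Vec3  # voxel offset inside that chunk


def locate_voxel(
    voxel: Vec3,
    chunk_shape: Vec3 = (32, 32, 32),       # hypothetical example value
    chunks_per_shard: Vec3 = (32, 32, 32),  # hypothetical example value
) -> ChunkAddress:
    """Split a global voxel position into shard, chunk and in-chunk offsets."""
    shard, chunk_in_shard, voxel_in_chunk = [], [], []
    for v, c, n in zip(voxel, chunk_shape, chunks_per_shard):
        global_chunk = v // c                    # chunk index counted over the whole dataset
        shard.append(global_chunk // n)          # shard that contains this chunk
        chunk_in_shard.append(global_chunk % n)  # chunk index relative to the shard origin
        voxel_in_chunk.append(v % c)             # remaining offset inside the chunk
    return ChunkAddress(tuple(shard), tuple(chunk_in_shard), tuple(voxel_in_chunk))


if __name__ == "__main__":
    print(locate_voxel((1030, 40, 0)))
```

With these example shapes, one shard covers 1024 voxels per dimension, so x = 1030 falls into the second shard along x. How chunks are ordered on disk inside a shard is a separate question that this sketch does not cover.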

URL of deployed dev instance (used for testing):

Steps to test:

  • Open WKW dataset:
    • compressed
    • uncompressed
    • multiple shards
    • segmentation
    • different dtypes including uint24, float
  • Open other dataset (e.g. zarr)
  • Should all load nicely
  • Create volume annotation, download + reupload, should show correct content
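
For the uint24 case in the checklist above, a rough sketch of what reading one voxel from a decoded chunk might look like. It assumes that uint24 means three uint8 channels per voxel (as for RGB color), that the chunk shape is given in CXYZ order, and that the channel index varies fastest in the flat buffer, followed by x, y and z. The function name `rgb_at` and the memory layout are assumptions for illustration, not a description of the actual WKWArray implementation.

```python
from typing import Sequence, Tuple


def rgb_at(chunk_bytes: bytes, shape_cxyz: Sequence[int], x: int, y: int, z: int) -> Tuple[int, ...]:
    """Return all channel values of one voxel from a flat chunk buffer.

    Assumed layout: channel varies fastest, then x, then y, then z.
    """
    c_len, x_len, y_len, z_len = shape_cxyz
    if not (0 <= x < x_len and 0 <= y < y_len and 0 <= z < z_len):
        raise IndexError("voxel position outside of chunk")
    # Flat index of the first channel of the requested voxel.
    base = ((z * y_len + y) * x_len + x) * c_len
    return tuple(chunk_bytes[base:base + c_len])


if __name__ == "__main__":
    # A tiny 2x2x1 chunk with 3 channels, filled with the bytes 0..11.
    demo = bytes(range(12))
    print(rgb_at(demo, (3, 2, 2, 1), 1, 0, 0))
```

If the real buffer uses a different axis layout, only the `base` computation needs to change.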

TODOs:

  • import webknossos-wrap code into wk directly
  • DatasetArray
    • header parsing
      • CompressorImpl
        • use streaming lz4 decompression, throw away expectedUncompressedSizeBytes?
      • DataType
      • Channels/uint24
      • DatasetShape
    • sharding
    • test color data compressed and uncompressed
    • test segmentation data
    • test with bigger datasets/multiple shards
    • roughly check perf
  • how to integrate other usages? (e.g. volume annotation upload/download)
  • remove unused code
  • do we need to handle the case of missing header.wkw files?
  • follow up issues for shard handle caching, ResourceBox, other cleanup

Issues:


@fm3 fm3 self-assigned this Jan 9, 2024
@fm3 fm3 mentioned this pull request Jan 18, 2024
@fm3 fm3 changed the title WIP: Read WKW as DatasetArray Read WKW as DatasetArray Jan 18, 2024
@fm3 fm3 marked this pull request as ready for review January 18, 2024 15:37
@fm3 fm3 changed the title Read WKW as DatasetArray Read WKW as DatasetArray, no more Memory Mapping Jan 18, 2024
@fm3 fm3 requested a review from frcroth January 18, 2024 19:55

@frcroth frcroth left a comment

Did you also do some performance comparisons?

MIGRATIONS.unreleased.md (outdated, resolved)
@normanrz

Benchmark results on Hetzner cloud CCX53 (32 cores, 128G RAM)

Benchmarking with wrk (32 threads, 500 connections) and a custom Lua script

NEW COLOR WKW
Running 1m test @ http://localhost:9001
  32 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    25.18ms   69.42ms   2.03s    98.12%
    Req/Sec   818.53    160.40     3.33k    78.99%
  1558572 requests in 1.00m, 285.96GB read
  Non-2xx or 3xx responses: 1
Requests/sec:  25941.35
Transfer/sec:      4.76GB

NEW COLOR WKW UNCOMPRESSED
Running 1m test @ http://localhost:9001
  32 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    19.56ms   32.82ms   1.06s    97.64%
    Req/Sec     0.89k   178.75     3.28k    81.92%
  1681436 requests in 1.00m, 301.25GB read
  Non-2xx or 3xx responses: 2
Requests/sec:  27990.22
Transfer/sec:      5.01GB

NEW SEGMENTATION WKW
Running 1m test @ http://localhost:9001
  32 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    47.64ms  117.42ms   2.85s    98.63%
    Req/Sec   409.30     85.51     1.48k    82.30%
  777735 requests in 1.00m, 569.93GB read
Requests/sec:  12940.61
Transfer/sec:      9.48GB

---

OLD COLOR WKW
Running 1m test @ http://localhost:9001
  32 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    72.34ms  206.46ms   4.49s    97.65%
    Req/Sec   337.55     75.59   780.00     77.81%
  630421 requests in 1.00m, 115.67GB read
Requests/sec:  10490.76
Transfer/sec:      1.92GB

OLD COLOR WKW UNCOMPRESSED
CRASH

OLD SEGMENTATION WKW
Running 1m test @ http://localhost:9001
  32 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    63.89ms  196.38ms   3.23s    98.18%
    Req/Sec   379.16     80.62     0.89k    81.07%
  699823 requests in 1.00m, 512.83GB read
Requests/sec:  11644.98
Transfer/sec:      8.53GB

---

COLOR ZARR3
Running 1m test @ http://localhost:9001
  32 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    23.15ms   37.57ms   1.07s    96.25%
    Req/Sec   811.37    211.67     3.58k    80.27%
  1533392 requests in 1.00m, 281.34GB read
  Non-2xx or 3xx responses: 1
Requests/sec:  25522.56
Transfer/sec:      4.68GB

COLOR ZARR3 UNCOMPRESSED
Running 1m test @ http://localhost:9001
  32 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    44.61ms  187.18ms   2.82s    97.32%
    Req/Sec     0.87k   228.64     3.96k    81.43%
  1659328 requests in 1.00m, 274.68GB read
Requests/sec:  27610.02
Transfer/sec:      4.57GB

SEGMENTATION ZARR3
Running 1m test @ http://localhost:9001
  32 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    59.39ms  211.02ms   4.82s    98.17%
    Req/Sec   420.26    105.71     1.18k    78.27%
  797359 requests in 1.00m, 584.31GB read
  Non-2xx or 3xx responses: 1
Requests/sec:  13267.61
Transfer/sec:      9.72GB

WKW is now faster than before: roughly 2.5x for compressed color data (25.9k vs. 10.5k requests/sec) and roughly 1.1x for segmentation data (12.9k vs. 11.6k requests/sec). Zarr3 and WKW are basically on par.
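
The ratios can be recomputed from the Requests/sec lines of the wrk runs above. A small sketch (the dictionary keys and the `ratio` helper are just labels made up for this snippet; the numbers are copied from the benchmark output):

```python
# Requests/sec values copied from the wrk output in this comment.
REQUESTS_PER_SEC = {
    "new color wkw": 25941.35,
    "old color wkw": 10490.76,
    "new segmentation wkw": 12940.61,
    "old segmentation wkw": 11644.98,
    "color zarr3": 25522.56,
    "segmentation zarr3": 13267.61,
}


def ratio(numerator: str, denominator: str) -> float:
    """Throughput of one run relative to another (values above 1 mean faster)."""
    return REQUESTS_PER_SEC[numerator] / REQUESTS_PER_SEC[denominator]


if __name__ == "__main__":
    comparisons = [
        ("new color wkw", "old color wkw"),
        ("new segmentation wkw", "old segmentation wkw"),
        ("new color wkw", "color zarr3"),
        ("new segmentation wkw", "segmentation zarr3"),
    ]
    for a, b in comparisons:
        print(f"{a} vs. {b}: {ratio(a, b):.2f}x")
```

The uncompressed color run has no old baseline to compare against, since the old implementation crashed in that test.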

Compare with #7245 (comment)

@fm3 fm3 enabled auto-merge (squash) January 29, 2024 09:55
@fm3 fm3 merged commit f7df25f into master Jan 29, 2024
2 checks passed
@fm3 fm3 deleted the wkw-dataset-array branch January 29, 2024 10:12