Color equalizer OpenCL implementation #17372

Merged
merged 6 commits into from
Aug 27, 2024


jenshannoschwalm
Collaborator

  1. Better visualization for the effect masking buttons
  2. Output shows no significant difference from the CPU path here
  3. Possibly found a bug in the laplacian code while working on this
  4. Added support for 2-channel OpenCL images, including Gaussian blurring

Currently I only have a fairly low-powered notebook with shared Intel 620 OpenCL graphics, so I can't say anything about performance gains vs. the CPU.

There is some trickier new stuff here, so I would appreciate:

  1. Performance testing. If you don't have another way, you can compare darktable --bench-module colorequal with darktable --disable-opencl --bench-module colorequal
  2. Testing on AMD hardware. The code for 2-channel images seems to be correct, but we have had issues with data not being fully initialized ...
  3. What about ARM silicon?

@jenshannoschwalm added the labels "scope: image processing" (correcting pixels), "scope: performance" (doing everything the same but faster), and "OpenCL" (Related to darktable OpenCL code) on Aug 26, 2024
@jenshannoschwalm added this to the 5.0 milestone on Aug 26, 2024
@jenshannoschwalm
Collaborator Author

Force-pushed, a) fixing a possible sqrt of a negative value and b) using the new dt_opencl_duplicate_image().
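
As a general illustration of point a): a quantity that is mathematically non-negative can come out slightly below zero through floating-point rounding, and in C or OpenCL C the square root of a negative input is NaN. The sketch below shows the usual clamp-before-sqrt guard in Python. The name `safe_sqrt` is hypothetical, and this is not the code from this PR, only an example of the technique.

```python
import math


def safe_sqrt(x: float) -> float:
    """Square root that treats a negative input as zero (hypothetical helper)."""
    # Clamp first: math.sqrt raises ValueError on a negative input,
    # whereas sqrt in C/OpenCL C would return NaN and propagate it downstream.
    return math.sqrt(max(x, 0.0))


if __name__ == "__main__":
    print(safe_sqrt(4.0))     # 2.0
    print(safe_sqrt(-1e-12))  # 0.0
```

Clamping to zero is only appropriate when a negative value can be nothing but rounding residue; a genuinely negative input would point to a bug elsewhere.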

@TurboGit
Member

Using CPU:

     7,4435 [dev_pixelpipe] took 0,022 secs (0,070 CPU) [full] processed `colorequal' on CPU, blended on CPU
     7,7700 [dev_pixelpipe] took 0,209 secs (1,463 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    16,5930 [dev_pixelpipe] took 0,067 secs (0,429 CPU) [full] processed `colorequal' on CPU, blended on CPU
    16,6650 [dev_pixelpipe] took 0,055 secs (0,534 CPU) [full] processed `colorequal' on CPU, blended on CPU
    16,7874 [dev_pixelpipe] took 0,256 secs (2,036 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    17,1099 [dev_pixelpipe] took 0,268 secs (1,671 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    17,2947 [dev_pixelpipe] took 0,059 secs (0,207 CPU) [full] processed `colorequal' on CPU, blended on CPU
    17,3776 [dev_pixelpipe] took 0,065 secs (0,382 CPU) [full] processed `colorequal' on CPU, blended on CPU
    17,4757 [dev_pixelpipe] took 0,073 secs (0,562 CPU) [full] processed `colorequal' on CPU, blended on CPU
    17,5027 [dev_pixelpipe] took 0,267 secs (1,905 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    17,5484 [dev_pixelpipe] took 0,055 secs (0,714 CPU) [full] processed `colorequal' on CPU, blended on CPU
    17,8271 [dev_pixelpipe] took 0,254 secs (1,874 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    18,8292 [dev_pixelpipe] took 0,061 secs (0,381 CPU) [full] processed `colorequal' on CPU, blended on CPU
    18,8941 [dev_pixelpipe] took 0,047 secs (0,433 CPU) [full] processed `colorequal' on CPU, blended on CPU
    19,0067 [dev_pixelpipe] took 0,094 secs (0,588 CPU) [full] processed `colorequal' on CPU, blended on CPU
    19,0314 [dev_pixelpipe] took 0,264 secs (1,956 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    19,3126 [dev_pixelpipe] took 0,223 secs (1,628 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    19,5327 [dev_pixelpipe] took 0,084 secs (0,541 CPU) [full] processed `colorequal' on CPU, blended on CPU
    19,6270 [dev_pixelpipe] took 0,068 secs (0,690 CPU) [full] processed `colorequal' on CPU, blended on CPU
    19,7146 [dev_pixelpipe] took 0,076 secs (0,732 CPU) [full] processed `colorequal' on CPU, blended on CPU
    19,7176 [dev_pixelpipe] took 0,266 secs (2,300 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    19,9999 [dev_pixelpipe] took 0,228 secs (1,710 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    20,9772 [dev_pixelpipe] took 0,062 secs (0,499 CPU) [full] processed `colorequal' on CPU, blended on CPU
    21,0595 [dev_pixelpipe] took 0,064 secs (0,417 CPU) [full] processed `colorequal' on CPU, blended on CPU
    21,1340 [dev_pixelpipe] took 0,227 secs (1,897 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    21,1460 [dev_pixelpipe] took 0,075 secs (0,710 CPU) [full] processed `colorequal' on CPU, blended on CPU
    21,2065 [dev_pixelpipe] took 0,050 secs (0,505 CPU) [full] processed `colorequal' on CPU, blended on CPU
    21,4383 [dev_pixelpipe] took 0,226 secs (1,460 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    21,8097 [dev_pixelpipe] took 0,105 secs (0,674 CPU) [full] processed `colorequal' on CPU, blended on CPU
    21,9187 [dev_pixelpipe] took 0,083 secs (0,587 CPU) [full] processed `colorequal' on CPU, blended on CPU
    22,0484 [dev_pixelpipe] took 0,324 secs (2,362 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    22,3254 [dev_pixelpipe] took 0,224 secs (1,576 CPU) [preview] processed `colorequal' on CPU, blended on CPU
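
To compare runs like the one above, the per-pipe wall times can be averaged with a small script. This is a minimal sketch, assuming the log format shown here (comma decimal separator, pipe name in square brackets); `summarize` and `SAMPLE` are names made up for this example, and the sample holds only the first three lines of the log above.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches e.g. "took 0,022 secs (0,070 CPU) [full] processed `colorequal'"
LINE_RE = re.compile(
    r"took (\d+[.,]\d+) secs \((\d+[.,]\d+) CPU\) \[(\w+)\] processed [`'](\w+)'"
)


def to_float(text):
    """Convert a number that may use a comma as decimal separator."""
    return float(text.replace(",", "."))


def summarize(log_text, module="colorequal"):
    """Return {pipe: (run count, mean wall time in seconds)} for one module."""
    wall = defaultdict(list)
    for match in LINE_RE.finditer(log_text):
        secs, _cpu, pipe, name = match.groups()
        if name == module:
            wall[pipe].append(to_float(secs))
    return {pipe: (len(times), mean(times)) for pipe, times in wall.items()}


SAMPLE = """\
     7,4435 [dev_pixelpipe] took 0,022 secs (0,070 CPU) [full] processed `colorequal' on CPU, blended on CPU
     7,7700 [dev_pixelpipe] took 0,209 secs (1,463 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    16,5930 [dev_pixelpipe] took 0,067 secs (0,429 CPU) [full] processed `colorequal' on CPU, blended on CPU
"""

if __name__ == "__main__":
    for pipe, (count, avg) in summarize(SAMPLE).items():
        print(f"{pipe}: {count} runs, mean {avg:.4f} s")
```

Feeding it the complete CPU log and a matching GPU log would give directly comparable means for the full and preview pipes.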

Using the GPU, dt crashes for me:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f799230f266 in __GI_ppoll (fds=0x55f7430932a0, nfds=4, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42

warning: 42	../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory
warning: Currently logging to /tmp/darktable_bt_JLIZS2.txt.  Turn the logging off and on to make the new setting effective.
#0  0x00007f799230f266 in __GI_ppoll (fds=0x55f7430932a0, nfds=4, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
#1  0x00007f7991fd4aec in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007f7991fd546f in g_main_loop_run () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007f7991806c8d in gtk_main () at /lib/x86_64-linux-gnu/libgtk-3.so.0
#4  0x00007f799267ae99 in dt_gui_gtk_run (gui=<optimized out>) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/gui/gtk.c:1492
#5  0x000055f70665c15f in main (argc=<optimized out>, argv=<optimized out>) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/main.c:122

=========

  Id   Target Id                                           Frame 
* 1    Thread 0x7f79898e6f40 (LWP 398688) "darktable"      0x00007f799230f266 in __GI_ppoll (fds=0x55f7430932a0, nfds=4, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
  2    Thread 0x7f7851a006c0 (LWP 398819) "worker res 0"   0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  3    Thread 0x7f78524006c0 (LWP 398818) "worker res 0"   0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  4    Thread 0x7f7852e006c0 (LWP 398817) "worker res 0"   0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  5    Thread 0x7f78538006c0 (LWP 398816) "worker res 0"   0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  6    Thread 0x7f78542006c0 (LWP 398815) "worker res 0"   0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  7    Thread 0x7f7854c006c0 (LWP 398814) "worker res 0"   0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  8    Thread 0x7f78556006c0 (LWP 398813) "worker res 0"   0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  9    Thread 0x7f78560006c0 (LWP 398812) "worker res 0"   0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  10   Thread 0x7f7856a006c0 (LWP 398811) "worker res 0"   0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  11   Thread 0x7f78574006c0 (LWP 398810) "worker res 0"   0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  12   Thread 0x7f7857e006c0 (LWP 398809) "worker res 0"   0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  13   Thread 0x7f785d0006c0 (LWP 398808) "worker res 0"   0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  14   Thread 0x7f7864c006c0 (LWP 398807) "worker res 0"   0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  15   Thread 0x7f79456006c0 (LWP 398806) "worker res 0"   0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  16   Thread 0x7f7866a006c0 (LWP 398805) "worker res 0"   0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  17   Thread 0x7f78656006c0 (LWP 398804) "pool-darktable" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  18   Thread 0x7f78660006c0 (LWP 398785) "pool-darktable" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  19   Thread 0x7f78674006c0 (LWP 398759) "lua thread"     0x00007f799230f266 in __GI_ppoll (fds=0x7f7978000b70, nfds=1, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
  20   Thread 0x7f7926a006c0 (LWP 398743) "darktable"      0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  21   Thread 0x7f79274006c0 (LWP 398742) "darktable"      0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  22   Thread 0x7f7927e006c0 (LWP 398741) "darktable"      0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  23   Thread 0x7f792ca006c0 (LWP 398740) "darktable"      0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  24   Thread 0x7f792d4006c0 (LWP 398739) "darktable"      0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  25   Thread 0x7f792de006c0 (LWP 398738) "darktable"      0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  26   Thread 0x7f7938c006c0 (LWP 398737) "darktable"      0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  27   Thread 0x7f79396006c0 (LWP 398736) "darktable"      0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  28   Thread 0x7f793a0006c0 (LWP 398735) "darktable"      0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  29   Thread 0x7f793aa006c0 (LWP 398734) "darktable"      0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  30   Thread 0x7f793b4006c0 (LWP 398733) "darktable"      0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  31   Thread 0x7f793be006c0 (LWP 398732) "darktable"      0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  32   Thread 0x7f7944c006c0 (LWP 398731) "darktable"      0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  33   Thread 0x7f79460006c0 (LWP 398730) "darktable"      0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  34   Thread 0x7f7983e006c0 (LWP 398729) "darktable"      0x00007f79920eae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  35   Thread 0x7f7946a006c0 (LWP 398716) "darktable"      syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  36   Thread 0x7f79474006c0 (LWP 398715) "worker 6"       0x00007f799229e22e in __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7f79487143e8) at ./nptl/futex-internal.c:57
  37   Thread 0x7f7947e006c0 (LWP 398714) "worker 6"       0x00007f799229e22e in __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7f7948713718) at ./nptl/futex-internal.c:57
  38   Thread 0x7f7950c006c0 (LWP 398713) "worker 6"       0x00007f799229e22e in __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7f7948713718) at ./nptl/futex-internal.c:57
  39   Thread 0x7f79516006c0 (LWP 398712) "worker 6"       0x00007f799229e22e in __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7f7948713718) at ./nptl/futex-internal.c:57
  40   Thread 0x7f79520006c0 (LWP 398711) "worker 6"       0x00007f799229e22e in __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7f7948713718) at ./nptl/futex-internal.c:57
  41   Thread 0x7f795cc006c0 (LWP 398710) "worker 6"       0x00007f799229e22e in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x7f795cbf1660, op=393, expected=0, futex_word=0x7f79480079a0) at ./nptl/futex-internal.c:57
  42   Thread 0x7f795d6006c0 (LWP 398709) "cuda-EvtHandlr" 0x00007f799230ed2f in __GI___poll (fds=0x7f7928000c20, nfds=11, timeout=100) at ../sysdeps/unix/sysv/linux/poll.c:29
  43   Thread 0x7f79696006c0 (LWP 398705) "gphoto_update"  0x00007f79922e8a65 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7f79695f1610, rem=0x7f79695f1620) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
  44   Thread 0x7f796a0006c0 (LWP 398704) "worker res 2"   0x00007f799229e22e in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55f73ede4bdc) at ./nptl/futex-internal.c:57
  45   Thread 0x7f796aa006c0 (LWP 398703) "worker res 1"   0x00007f799229e22e in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55f73ede4bdc) at ./nptl/futex-internal.c:57
  46   Thread 0x7f796b4006c0 (LWP 398702) "worker res 0"   0x00007f7992305da7 in __GI___wait4 (pid=pid@entry=398820, stat_loc=stat_loc@entry=0x0, options=options@entry=0, usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
  47   Thread 0x7f796be006c0 (LWP 398701) "kicker"         0x00007f79922e8a65 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7f796bdf1680, rem=rem@entry=0x7f796bdf1680) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
  48   Thread 0x7f7974c006c0 (LWP 398700) "worker 6"       0x00007f799229e22e in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55f73ede4bdc) at ./nptl/futex-internal.c:57
  49   Thread 0x7f79756006c0 (LWP 398699) "worker 5"       0x00007f799229e22e in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55f73ede4bdc) at ./nptl/futex-internal.c:57
  50   Thread 0x7f79760006c0 (LWP 398698) "worker 4"       0x00007f79922e8a65 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7f7975ff15d0, rem=0x7f7975ff15e0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
  51   Thread 0x7f7976a006c0 (LWP 398697) "worker 3"       0x00007f799229e22e in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55f73ede4bdc) at ./nptl/futex-internal.c:57
  52   Thread 0x7f79774006c0 (LWP 398696) "thumbs_update"  0x00007f79922e8a65 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7f79773f0570, rem=0x7f79773f0580) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
  53   Thread 0x7f7977e006c0 (LWP 398695) "worker 1"       0x00007f799229e22e in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55f73ede4bdc) at ./nptl/futex-internal.c:57
  54   Thread 0x7f7980c006c0 (LWP 398694) "worker 0"       0x00007f799229e22e in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55f73ede4bdc) at ./nptl/futex-internal.c:57
  55   Thread 0x7f79834006c0 (LWP 398692) "gdbus"          0x00007f799230f266 in __GI_ppoll (fds=0x7f78f8000c70, nfds=3, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
  56   Thread 0x7f7988a006c0 (LWP 398690) "gmain"          0x00007f799230f266 in __GI_ppoll (fds=0x55f73edc3aa0, nfds=2, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
  57   Thread 0x7f79894006c0 (LWP 398689) "pool-spawner"   syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38

=========

Thread 57 (Thread 0x7f79894006c0 (LWP 398689) "pool-spawner"):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007f7992032794 in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007f7991f9c3fb in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007f79920044b2 in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#4  0x00007f7992004321 in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x00007f79922a1732 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140159970444992, 1647949358541220487, -57472, 2, 140730027837344, 140159962054656, -1716536336305968505, -1716551635038983545}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#6  0x00007f799231c2b8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Thread 56 (Thread 0x7f7988a006c0 (LWP 398690) "gmain"):
#0  0x00007f799230f266 in __GI_ppoll (fds=0x55f73edc3aa0, nfds=2, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
        sc_ret = -514
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
        tval = {tv_sec = 1, tv_nsec = -57472}
#1  0x00007f7991fd4aec in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007f7991fd5180 in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007f7991fd51d1 in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#4  0x00007f7992004321 in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x00007f79922a1732 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140159959959232, 1647949358541220487, -57472, 2, 140730027837152, 140159951568896, -1716539909718758777, -1716551635038983545}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#6  0x00007f799231c2b8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Thread 55 (Thread 0x7f79834006c0 (LWP 398692) "gdbus"):
#0  0x00007f799230f266 in __GI_ppoll (fds=0x7f78f8000c70, nfds=3, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
        sc_ret = -514
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
        tval = {tv_sec = 94520399973192, tv_nsec = -57472}
#1  0x00007f7991fd4aec in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007f7991fd546f in g_main_loop_run () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007f799137ed1a in ??? () at /lib/x86_64-linux-gnu/libgio-2.0.so.0
#4  0x00007f7992004321 in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x00007f79922a1732 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140159869781696, 1647949358541220487, -57472, 11, 140730027837648, 140159861391360, -1716514346073412985, -1716551635038983545}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#6  0x00007f799231c2b8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Thread 54 (Thread 0x7f7980c006c0 (LWP 398694) "worker 0"):
#0  0x00007f799229e22e in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55f73ede4bdc) at ./nptl/futex-internal.c:57
        sc_cancel_oldtype = 0
        __arg6 = <optimized out>
        __arg3 = <optimized out>
        _a5 = <optimized out>
        _a2 = <optimized out>
        sc_ret = <optimized out>
        __arg4 = <optimized out>
        __arg1 = <optimized out>
        _a6 = <optimized out>
        _a3 = <optimized out>
        resultvar = <optimized out>
        __arg5 = <optimized out>
        __arg2 = <optimized out>
        _a4 = <optimized out>
        _a1 = <optimized out>
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x55f73ede4bdc, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
        err = <optimized out>
        clockbit = 256
        op = 393
#2  0x00007f799229e2ab in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x55f73ede4bdc, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007f79922a0990 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55f73ede4b60, cond=0x55f73ede4bb0) at ./nptl/pthread_cond_wait.c:503
        spin = 0
        buffer = {__routine = 0x7f79922a0700 <__condvar_cleanup_waiting>, __arg = 0x7f7980bf1630, __canceltype = 1065353216, __prev = 0x0}
        cbuffer = {wseq = 615, cond = 0x55f73ede4bb0, mutex = 0x55f73ede4b60, private = 0}
        err = <optimized out>
        g = 1
        flags = <optimized out>
        g1_start = <optimized out>
        signals = <optimized out>
        result = 0
        wseq = 615
        seq = 307
        private = 0
        maxspin = <optimized out>
        err = <optimized out>
        result = <optimized out>
        wseq = <optimized out>
        g = <optimized out>
        seq = <optimized out>
        flags = <optimized out>
        private = <optimized out>
        signals = <optimized out>
        done = <optimized out>
        g1_start = <optimized out>
        spin = <optimized out>
        buffer = {__routine = <optimized out>, __arg = <optimized out>, __canceltype = <optimized out>, __prev = <optimized out>}
        cbuffer = {wseq = <optimized out>, cond = <optimized out>, mutex = <optimized out>, private = <optimized out>}
        s = <optimized out>
#4  ___pthread_cond_wait (cond=cond@entry=0x55f73ede4bb0, mutex=mutex@entry=0x55f73ede4b60) at ./nptl/pthread_cond_wait.c:618
#5  0x00007f7992598abe in dt_pthread_cond_wait (cond=0x55f73ede4bb0, mutex=0x55f73ede4b60) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/common/dtpthread.h:329
#6  _control_work (ptr=<optimized out>) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/control/jobs.c:586
        params = <optimized out>
        control = 0x55f73ede2520
        name = "worker 0\000\000\000\000\000\000\000"
#7  0x00007f79922a1732 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140159827838656, 1647949358541220487, -57472, 11, 140730027837856, 140159819448320, -1716522042654807417, -1716551635038983545}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#8  0x00007f799231c2b8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Thread 53 (Thread 0x7f7977e006c0 (LWP 398695) "worker 1"):
#0  0x00007f799229e22e in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55f73ede4bdc) at ./nptl/futex-internal.c:57
        sc_cancel_oldtype = 0
        __arg6 = <optimized out>
        __arg3 = <optimized out>
        _a5 = <optimized out>
        _a2 = <optimized out>
        sc_ret = <optimized out>
        __arg4 = <optimized out>
        __arg1 = <optimized out>
        _a6 = <optimized out>
        _a3 = <optimized out>
        resultvar = <optimized out>
        __arg5 = <optimized out>
        __arg2 = <optimized out>
        _a4 = <optimized out>
        _a1 = <optimized out>
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x55f73ede4bdc, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
        err = <optimized out>
        clockbit = 256
        op = 393
#2  0x00007f799229e2ab in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x55f73ede4bdc, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007f79922a0990 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55f73ede4b60, cond=0x55f73ede4bb0) at ./nptl/pthread_cond_wait.c:503
        spin = 0
        buffer = {__routine = 0x7f79922a0700 <__condvar_cleanup_waiting>, __arg = 0x7f7977df1630, __canceltype = 1065353216, __prev = 0x0}
        cbuffer = {wseq = 619, cond = 0x55f73ede4bb0, mutex = 0x55f73ede4b60, private = 0}
        err = <optimized out>
        g = 1
        flags = <optimized out>
        g1_start = <optimized out>
        signals = <optimized out>
        result = 0
        wseq = 619
        seq = 309
        private = 0
        maxspin = <optimized out>
        err = <optimized out>
        result = <optimized out>
        wseq = <optimized out>
        g = <optimized out>
        seq = <optimized out>
        flags = <optimized out>
        private = <optimized out>
        signals = <optimized out>
        done = <optimized out>
        g1_start = <optimized out>
        spin = <optimized out>
        buffer = {__routine = <optimized out>, __arg = <optimized out>, __canceltype = <optimized out>, __prev = <optimized out>}
        cbuffer = {wseq = <optimized out>, cond = <optimized out>, mutex = <optimized out>, private = <optimized out>}
        s = <optimized out>
#4  ___pthread_cond_wait (cond=cond@entry=0x55f73ede4bb0, mutex=mutex@entry=0x55f73ede4b60) at ./nptl/pthread_cond_wait.c:618
#5  0x00007f7992598abe in dt_pthread_cond_wait (cond=0x55f73ede4bb0, mutex=0x55f73ede4b60) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/common/dtpthread.h:329
#6  _control_work (ptr=<optimized out>) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/control/jobs.c:586
        params = <optimized out>
        control = 0x55f73ede2520
        name = "worker 1\000\000\000\000\000\000\000"
#7  0x00007f79922a1732 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140159678940864, 1647949358541220487, -57472, 11, 140730027837856, 140159670550528, -1716893402707088761, -1716551635038983545}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#8  0x00007f799231c2b8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Thread 52 (Thread 0x7f79774006c0 (LWP 398696) "thumbs_update"):
#0  0x00007f79922e8a65 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7f79773f0570, rem=0x7f79773f0580) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
        r = <optimized out>
#1  0x00007f79922f3613 in __GI___nanosleep (req=<optimized out>, rem=<optimized out>) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
        ret = <optimized out>
#2  0x00007f7992005897 in g_usleep () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007f7992596f2d in dt_update_thumbs_thread (p=<optimized out>) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/control/crawler.c:1005
        i = <optimized out>
        bt = <optimized out>
        dwriting = <optimized out>
        updated = 0
#4  0x00007f79924ed931 in _backthumbs_job_run (job=<optimized out>) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/common/darktable.c:677
#5  0x00007f7992598103 in _control_job_execute (job=job@entry=0x55f7430930c0) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/control/jobs.c:314
#6  0x00007f7992598ad8 in _control_run_job (control=<optimized out>) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/control/jobs.c:333
        job = 0x55f7430930c0
        job = <optimized out>
#7  _control_work (ptr=<optimized out>) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/control/jobs.c:582
        params = <optimized out>
        control = 0x55f73ede2520
        name = "worker 2\000\000\000\000\000\000\000"
#8  0x00007f79922a1732 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140159668455104, 1647949358541220487, -57472, 11, 140730027837856, 140159660064768, -1716892578073367929, -1716551635038983545}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#9  0x00007f799231c2b8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Thread 51 (Thread 0x7f7976a006c0 (LWP 398697) "worker 3"):
#0  0x00007f799229e22e in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55f73ede4bdc) at ./nptl/futex-internal.c:57
        sc_cancel_oldtype = 0
        __arg6 = <optimized out>
        __arg3 = <optimized out>
        _a5 = <optimized out>
        _a2 = <optimized out>
        sc_ret = <optimized out>
        __arg4 = <optimized out>
        __arg1 = <optimized out>
        _a6 = <optimized out>
        _a3 = <optimized out>
        resultvar = <optimized out>
        __arg5 = <optimized out>
        __arg2 = <optimized out>
        _a4 = <optimized out>
        _a1 = <optimized out>
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x55f73ede4bdc, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
        err = <optimized out>
        clockbit = 256
        op = 393
#2  0x00007f799229e2ab in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x55f73ede4bdc, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007f79922a0990 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55f73ede4b60, cond=0x55f73ede4bb0) at ./nptl/pthread_cond_wait.c:503
        spin = 0
        buffer = {__routine = 0x7f79922a0700 <__condvar_cleanup_waiting>, __arg = 0x7f79769f1630, __canceltype = 1065353216, __prev = 0x0}
        cbuffer = {wseq = 607, cond = 0x55f73ede4bb0, mutex = 0x55f73ede4b60, private = 0}
        err = <optimized out>
        g = 1
        flags = <optimized out>
        g1_start = <optimized out>
        signals = <optimized out>
        result = 0
        wseq = 607
        seq = 303
        private = 0
        maxspin = <optimized out>
        err = <optimized out>
        result = <optimized out>
        wseq = <optimized out>
        g = <optimized out>
        seq = <optimized out>
        flags = <optimized out>
        private = <optimized out>
        signals = <optimized out>
        done = <optimized out>
        g1_start = <optimized out>
        spin = <optimized out>
        buffer = {__routine = <optimized out>, __arg = <optimized out>, __canceltype = <optimized out>, __prev = <optimized out>}
        cbuffer = {wseq = <optimized out>, cond = <optimized out>, mutex = <optimized out>, private = <optimized out>}
        s = <optimized out>
#4  ___pthread_cond_wait (cond=cond@entry=0x55f73ede4bb0, mutex=mutex@entry=0x55f73ede4b60) at ./nptl/pthread_cond_wait.c:618
#5  0x00007f7992598abe in dt_pthread_cond_wait (cond=0x55f73ede4bb0, mutex=0x55f73ede4b60) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/common/dtpthread.h:329
#6  _control_work (ptr=<optimized out>) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/control/jobs.c:586
        params = <optimized out>
        control = 0x55f73ede2520
        name = "worker 3\000\000\000\000\000\000\000"
#7  0x00007f79922a1732 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140159657969344, 1647949358541220487, -57472, 11, 140730027837856, 140159649579008, -1716896151486158201, -1716551635038983545}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#8  0x00007f799231c2b8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Thread 50 (Thread 0x7f79760006c0 (LWP 398698) "worker 4"):
#0  0x00007f79922e8a65 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7f7975ff15d0, rem=0x7f7975ff15e0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
        r = <optimized out>
#1  0x00007f79922f3613 in __GI___nanosleep (req=<optimized out>, rem=<optimized out>) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
        ret = <optimized out>
#2  0x00007f7992005897 in g_usleep () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007f79925a14d9 in _control_write_sidecars_job_run (job=<optimized out>) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/control/jobs/sidecar_jobs.c:95
        new_imgs = 0x0
        imgs = <optimized out>
        enqueued = Python Exception <class 'gdb.error'>: There is no member named keys.
0x7f7954000c60
#4  0x00007f7992598103 in _control_job_execute (job=job@entry=0x55f742e1bf70) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/control/jobs.c:314
#5  0x00007f7992598ad8 in _control_run_job (control=<optimized out>) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/control/jobs.c:333
        job = 0x55f742e1bf70
        job = <optimized out>
#6  _control_work (ptr=<optimized out>) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/control/jobs.c:582
        params = <optimized out>
        control = 0x55f73ede2520
        name = "worker 4\000\000\000\000\000\000\000"
#7  0x00007f79922a1732 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140159647483584, 1647949358541220487, -57472, 11, 140730027837856, 140159639093248, -1716897525875692921, -1716551635038983545}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#8  0x00007f799231c2b8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Thread 49 (Thread 0x7f79756006c0 (LWP 398699) "worker 5"):
#0  0x00007f799229e22e in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55f73ede4bdc) at ./nptl/futex-internal.c:57
        sc_cancel_oldtype = 0
        __arg6 = <optimized out>
        __arg3 = <optimized out>
        _a5 = <optimized out>
        _a2 = <optimized out>
        sc_ret = <optimized out>
        __arg4 = <optimized out>
        __arg1 = <optimized out>
        _a6 = <optimized out>
        _a3 = <optimized out>
        resultvar = <optimized out>
        __arg5 = <optimized out>
        __arg2 = <optimized out>
        _a4 = <optimized out>
        _a1 = <optimized out>
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x55f73ede4bdc, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
        err = <optimized out>
        clockbit = 256
        op = 393
#2  0x00007f799229e2ab in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x55f73ede4bdc, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007f79922a0990 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55f73ede4b60, cond=0x55f73ede4bb0) at ./nptl/pthread_cond_wait.c:503
        spin = 0
        buffer = {__routine = 0x7f79922a0700 <__condvar_cleanup_waiting>, __arg = 0x7f79755f1630, __canceltype = 107344000, __prev = 0x0}
        cbuffer = {wseq = 613, cond = 0x55f73ede4bb0, mutex = 0x55f73ede4b60, private = 0}
        err = <optimized out>
        g = 1
        flags = <optimized out>
        g1_start = <optimized out>
        signals = <optimized out>
        result = 0
        wseq = 613
        seq = 306
        private = 0
        maxspin = <optimized out>
        err = <optimized out>
        result = <optimized out>
        wseq = <optimized out>
        g = <optimized out>
        seq = <optimized out>
        flags = <optimized out>
        private = <optimized out>
        signals = <optimized out>
        done = <optimized out>
        g1_start = <optimized out>
        spin = <optimized out>
        buffer = {__routine = <optimized out>, __arg = <optimized out>, __canceltype = <optimized out>, __prev = <optimized out>}
        cbuffer = {wseq = <optimized out>, cond = <optimized out>, mutex = <optimized out>, private = <optimized out>}
        s = <optimized out>
#4  ___pthread_cond_wait (cond=cond@entry=0x55f73ede4bb0, mutex=mutex@entry=0x55f73ede4b60) at ./nptl/pthread_cond_wait.c:618
#5  0x00007f7992598abe in dt_pthread_cond_wait (cond=0x55f73ede4bb0, mutex=0x55f73ede4b60) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/common/dtpthread.h:329
#6  _control_work (ptr=<optimized out>) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/control/jobs.c:586
        params = <optimized out>
        control = 0x55f73ede2520
        name = "worker 5\000\000\000\000\000\000\000"
#7  0x00007f79922a1732 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140159636997824, 1647949358541220487, -57472, 11, 140730027837856, 140159628607488, -1716896701241972089, -1716551635038983545}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#8  0x00007f799231c2b8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Thread 48 (Thread 0x7f7974c006c0 (LWP 398700) "worker 6"):
#0  0x00007f799229e22e in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55f73ede4bdc) at ./nptl/futex-internal.c:57
        sc_cancel_oldtype = 0
        __arg6 = <optimized out>
        __arg3 = <optimized out>
        _a5 = <optimized out>
        _a2 = <optimized out>
        sc_ret = <optimized out>
        __arg4 = <optimized out>
        __arg1 = <optimized out>
        _a6 = <optimized out>
        _a3 = <optimized out>
        resultvar = <optimized out>
        __arg5 = <optimized out>
        __arg2 = <optimized out>
        _a4 = <optimized out>
        _a1 = <optimized out>
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x55f73ede4bdc, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
        err = <optimized out>
        clockbit = 256
        op = 393
#2  0x00007f799229e2ab in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x55f73ede4bdc, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007f79922a0990 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55f73ede4b60, cond=0x55f73ede4bb0) at ./nptl/pthread_cond_wait.c:503
        spin = 0
        buffer = {__routine = 0x7f79922a0700 <__condvar_cleanup_waiting>, __arg = 0x7f7974bf1630, __canceltype = 1065353216, __prev = 0x0}
        cbuffer = {wseq = 609, cond = 0x55f73ede4bb0, mutex = 0x55f73ede4b60, private = 0}
        err = <optimized out>
        g = 1
        flags = <optimized out>
        g1_start = <optimized out>
        signals = <optimized out>
        result = 0
        wseq = 609
        seq = 304
        private = 0
        maxspin = <optimized out>
        err = <optimized out>
        result = <optimized out>
        wseq = <optimized out>
        g = <optimized out>
        seq = <optimized out>
        flags = <optimized out>
        private = <optimized out>
        signals = <optimized out>
        done = <optimized out>
        g1_start = <optimized out>
        spin = <optimized out>
        buffer = {__routine = <optimized out>, __arg = <optimized out>, __canceltype = <optimized out>, __prev = <optimized out>}
        cbuffer = {wseq = <optimized out>, cond = <optimized out>, mutex = <optimized out>, private = <optimized out>}
        s = <optimized out>
#4  ___pthread_cond_wait (cond=cond@entry=0x55f73ede4bb0, mutex=mutex@entry=0x55f73ede4b60) at ./nptl/pthread_cond_wait.c:618
#5  0x00007f7992598abe in dt_pthread_cond_wait (cond=0x55f73ede4bb0, mutex=0x55f73ede4b60) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/common/dtpthread.h:329
#6  _control_work (ptr=<optimized out>) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/control/jobs.c:586
        params = <optimized out>
        control = 0x55f73ede2520
        name = "worker 6\000\000\000\000\000\000\000"
#7  0x00007f79922a1732 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140159626512064, 1647949358541220487, -57472, 11, 140730027837856, 140159618121728, -1716900274654762361, -1716551635038983545}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#8  0x00007f799231c2b8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Thread 47 (Thread 0x7f796be006c0 (LWP 398701) "kicker"):
#0  0x00007f79922e8a65 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7f796bdf1680, rem=rem@entry=0x7f796bdf1680) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
        r = <optimized out>
#1  0x00007f79922f3613 in __GI___nanosleep (req=req@entry=0x7f796bdf1680, rem=rem@entry=0x7f796bdf1680) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
        ret = <optimized out>
#2  0x00007f79923043ca in __sleep (seconds=0, seconds@entry=2) at ../sysdeps/posix/sleep.c:55
        save_errno = 0
        ts = {tv_sec = 1, tv_nsec = 488868487}
#3  0x00007f79925977ea in _control_worker_kicker (ptr=0x55f73ede2520) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/control/jobs.c:558
        control = 0x55f73ede2520
#4  0x00007f79922a1732 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140159477614272, 1647949358541220487, -57472, 11, 140730027837856, 140159469223936, -1716884606614066553, -1716551635038983545}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#5  0x00007f799231c2b8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Thread 46 (Thread 0x7f796b4006c0 (LWP 398702) "worker res 0"):
#0  0x00007f7992305da7 in __GI___wait4 (pid=pid@entry=398820, stat_loc=stat_loc@entry=0x0, options=options@entry=0, usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
        sc_ret = -512
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00007f7992305eb7 in __GI___waitpid (pid=pid@entry=398820, stat_loc=stat_loc@entry=0x0, options=options@entry=0) at ./posix/waitpid.c:38
#2  0x00007f7992584ce8 in _dt_sigsegv_handler (param=11) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/common/system_signal_handling.c:101
        pid = 398820
        name_used = 0x7f79400a7c30 "/tmp/darktable_bt_JLIZS2.txt"
        fout = <optimized out>
        delete_file = 0
        datadir = "/opt/darktable/share/darktable", '\000' <repeats 4065 times>
        pid_arg = 0x7f79400a4eb0 "398688"
        comm_arg = 0x7f79400afdd0 "/opt/darktable/share/darktable/gdb_commands"
        logenable = 0x7f79400b0d60 "set logging enabled on"
        setlogfile = 0x7f794004eb90 "set logging file /tmp/darktable_bt_JLIZS2.txt"
#3  0x00007f79922545d0 in <signal handler called> () at /lib/x86_64-linux-gnu/libc.so.6
#4  0x0000000000000000 in ??? ()
#5  0x00007f7980c34836 in _prefilter_chromaticity_cl (roi=0x7f796b3e8960, devid=0, gd=0x55f741af3500, UV=<optimized out>, saturation=<optimized out>, weight=<optimized out>, csigma=<optimized out>, epsilon=<optimized out>, sat_shift=<optimized out>) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/iop/colorequal.c:1359
        err = <optimized out>
        sigma = <optimized out>
        ds_UV = 0x7f7940088bb0
        a = 0x0
        b_full = 0x7f79400a5d50
        width = 236
        covariance = 0x0
        covariance_tmp = 0x0
        b = 0x0
        a_full = 0x7f79400a2850
        UV_tmp = 0x7f79400a1fb0
        scaling = <optimized out>
        gsigma = <optimized out>
        ds_height = 354
        ds_width = 236
        height = 354
        resized = <optimized out>
        err = <optimized out>
        sigma = <optimized out>
        width = <optimized out>
        height = <optimized out>
        scaling = <optimized out>
        gsigma = <optimized out>
        ds_height = <optimized out>
        ds_width = <optimized out>
        resized = <optimized out>
        ds_UV = <optimized out>
        covariance = <optimized out>
        covariance_tmp = <optimized out>
        a = <optimized out>
        b = <optimized out>
        a_full = <optimized out>
        b_full = <optimized out>
        UV_tmp = <optimized out>
        error = <optimized out>
        ch = <optimized out>
        ch = <optimized out>
#6  process_cl (self=<optimized out>, piece=<optimized out>, dev_in=<optimized out>, dev_out=<optimized out>, roi_in=<optimized out>, roi_out=0x7f796b3e8960) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/iop/colorequal.c:1624
        d = 0x7f79400209c0
        gd = 0x55f741af3500
        work_profile = <optimized out>
        err = <optimized out>
        devid = <optimized out>
        owidth = 236
        oheight = 354
        g = <optimized out>
        fullpipe = <optimized out>
        mask_mode = 0
        guiding = 1
        run_fast = <optimized out>
        white = 1.21629012
        sat_shift = 0.0600000024
        max_brightness_shift = <optimized out>
        corr_max_brightness_shift = <optimized out>
        bright_shift = 0.073105
        gradient_amp = 0.0170933846
        UV = 0x7f7940088bb0
        corrections = 0x7f7940089360
        b_corrections = 0x7f7940089b40
        L = 0x7f794008a320
        scharr = 0x7f794008a320
        saturation = 0x7f794008ab00
        dev_tmp = 0x7f7940088400
        input_matrix = {{0.65512532, 0.1382166, 0.157010317, 0}, {0.27726227, 0.6788975, 0.043841105, 0}, {0.00141842337, 0.0491540544, 1.03821886, 0}, {4.32468045e-35, 3.08383753e-41, -6.77852855e-28, 4.57285728e-41}}
        output_matrix = {{1.6629343, -0.321330518, -0.237917423, 0}, {-0.681079388, 1.60909951, 0.035052143, 0}, {0.0299735144, -0.0757431611, 0.961853564, 0}, {2.3032221e+26, 4.57285728e-41, 2.30322505e+26, 4.57285728e-41}}
        input_matrix_cl = 0x7f7940062e60
        output_matrix_cl = 0x7f794007b5e0
        gamut_LUT = 0x7f7940064560
        LUT_saturation = 0x7f7940075060
        LUT_hue = 0x7f794006c050
        LUT_brightness = 0x7f7940086f10
        weight = 0x7f7940087c60
#7  0x00007f7992627941 in _dev_pixelpipe_process_rec (pipe=pipe@entry=0x55f73f0396c0, dev=dev@entry=0x55f7418618f0, output=output@entry=0x7f796b3e9ae8, cl_mem_output=cl_mem_output@entry=0x7f796b3e9af0, out_format=out_format@entry=0x7f796b3e9af8, roi_out=roi_out@entry=0x7f796b3e8960, modules=0x7f79400039e0 = {...}, pieces=0x7f7940020990 = {...}, pos=51) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/develop/pixelpipe_hb.c:1880
        pfm_dump = <optimized out>
        err = <optimized out>
        cst_from = <optimized out>
        cst_to = <optimized out>
        cst_out = <optimized out>
        valid_input_on_gpu_only = 1
        possible_cl = <optimized out>
        success_opencl = <optimized out>
        input_cst_cl = IOP_CS_RGB
        required_factor_cl = <optimized out>
        m_bpp = <optimized out>
        fits_on_device = <optimized out>
        roi_in = {x = 0, y = 0, width = 236, height = 354, scale = 0.061097689}
        module_name = "colorequal", '\000' <repeats 245 times>
        input = 0x7f792deb9040
        cl_mem_input = 0x7f794008fe50
        module = <optimized out>
        piece = 0x7f79400206c0
        old_pipetype = <optimized out>
        gui_module = <optimized out>
        bpp = <optimized out>
        bufsize = <optimized out>
        hash = <optimized out>
        gamma_preview = <optimized out>
        cache_available = 0
        _input_format = {channels = 4, datatype = TYPE_FLOAT, filters = 2492765332, xtrans = {"\000\000\000\000\000", "\000\000\000\000\000", "\000\000\000\000\000", "\000\000\000\000\000", "\000\000\000\000\000", "\000\000\000\000\000"}, rawprepare = {raw_black_level = 2048, raw_white_point = 15094}, temperature = {enabled = 1, coeffs = {2.25128031, 1, 1.41811609, nan(0x400000)}}, processed_maximum = {3.65576553, 1.62386072, 2.30282307, nan(0x400000)}, cst = 2}
        input_format = 0x55f741839740
        in_bpp = 16
        out_bpp = <optimized out>
        important = <optimized out>
        important_cl = 0
        start = {clock = 434071687.37182099, user = 7.0927920000000002}
        pixelpipe_flow = PIXELPIPE_FLOW_HISTOGRAM_NONE
        tiling = {factor = 7.5, factor_cl = 7.5, maxbuf = 1, maxbuf_cl = 1, overhead = 38528, overlap = 17, xalign = 1, yalign = 1}
        work_profile = 0x7f794004bd00
        __FUNCTION__ = "_dev_pixelpipe_process_rec"
        histogram_log = "\a\000\000\000\000\000\000\000xj\001", '\000' <repeats 13 times>, "\210E\a\000\000\000\000"
#8  0x00007f7992625a8b in _dev_pixelpipe_process_rec (pipe=pipe@entry=0x55f73f0396c0, dev=dev@entry=0x55f7418618f0, output=output@entry=0x7f796b3e9ae8, cl_mem_output=cl_mem_output@entry=0x7f796b3e9af0, out_format=out_format@entry=0x7f796b3e9af8, roi_out=roi_out@entry=0x7f796b3e8cf0, modules=0x7f7940003a00 = {...}, pieces=0x7f79400226a0 = {...}, pos=52) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/develop/pixelpipe_hb.c:1360
        roi_in = {x = 0, y = 0, width = 236, height = 354, scale = 0.061097689}
        module_name = '\000' <repeats 255 times>
        input = 0x0
        cl_mem_input = 0x0
        module = <optimized out>
        piece = 0x7f7940022400
        old_pipetype = <optimized out>
        gui_module = <optimized out>
        bpp = <optimized out>
        bufsize = <optimized out>
        hash = <optimized out>
        gamma_preview = <optimized out>
        cache_available = <optimized out>
        _input_format = {channels = 0, datatype = TYPE_UNKNOWN, filters = 0, xtrans = {"\000\000\000\000\000", "\000\000\000\000\000", "\000\000\000\000\000", "\000\000\000\000\000", "\000\000\000\000\000", "\000\000\000\000\000"}, rawprepare = {raw_black_level = 0, raw_white_point = 0}, temperature = {enabled = 0, coeffs = {0, 0, 0, 0}}, processed_maximum = {0, 0, 0, 0}, cst = 0}
        input_format = 0x0
        in_bpp = <optimized out>
        out_bpp = <optimized out>
        important = <optimized out>
        important_cl = <optimized out>
        start = {clock = 0, user = 0}
        pixelpipe_flow = PIXELPIPE_FLOW_NONE
        tiling = {factor = 0, factor_cl = 0, maxbuf = 0, maxbuf_cl = 0, overhead = 0, overlap = 0, xalign = 0, yalign = 0}
        work_profile = <optimized out>
        __FUNCTION__ = "_dev_pixelpipe_process_rec"
        histogram_log = '\000' <repeats 31 times>
#9  0x00007f7992625a8b in _dev_pixelpipe_process_rec (pipe=pipe@entry=0x55f73f0396c0, dev=dev@entry=0x55f7418618f0, output=output@entry=0x7f796b3e9ae8, cl_mem_output=cl_mem_output@entry=0x7f796b3e9af0, out_format=out_format@entry=0x7f796b3e9af8, roi_out=roi_out@entry=0x7f796b3e9080, modules=0x7f7940003a20 = {...}, pieces=0x7f7940023610 = {...}, pos=53) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/develop/pixelpipe_hb.c:1360
        roi_in = {x = 0, y = 0, width = 236, height = 354, scale = 0.061097689}
        module_name = '\000' <repeats 255 times>
        input = 0x0
        cl_mem_input = 0x0
        module = <optimized out>
        piece = 0x7f7940022f80
        old_pipetype = <optimized out>
        gui_module = <optimized out>
        bpp = <optimized out>
        bufsize = <optimized out>
        hash = <optimized out>
        gamma_preview = <optimized out>
        cache_available = <optimized out>
        _input_format = {channels = 0, datatype = TYPE_UNKNOWN, filters = 0, xtrans = {"\000\000\000\000\000", "\000\000\000\000\000", "\000\000\000\000\000", "\000\000\000\000\000", "\000\000\000\000\000", "\000\000\000\000\000"}, rawprepare = {raw_black_level = 0, raw_white_point = 0}, temperature = {enabled = 0, coeffs = {0, 0, 0, 0}}, processed_maximum = {0, 0, 0, 0}, cst = 0}
        input_format = 0x0
        in_bpp = <optimized out>
        out_bpp = <optimized out>
        important = <optimized out>
        important_cl = <optimized out>
        start = {clock = 0, user = 0}
        pixelpipe_flow = PIXELPIPE_FLOW_NONE
        tiling = {factor = 0, factor_cl = 0, maxbuf = 0, maxbuf_cl = 0, overhead = 0, overlap = 0, xalign = 0, yalign = 0}
        work_profile = <optimized out>
        __FUNCTION__ = "_dev_pixelpipe_process_rec"
        histogram_log = '\000' <repeats 31 times>
#10 0x00007f7992625a8b in _dev_pixelpipe_process_rec (pipe=pipe@entry=0x55f73f0396c0, dev=dev@entry=0x55f7418618f0, output=output@entry=0x7f796b3e9ae8, cl_mem_output=cl_mem_output@entry=0x7f796b3e9af0, out_format=out_format@entry=0x7f796b3e9af8, roi_out=roi_out@entry=0x7f796b3e9410, modules=0x7f7940003a40 = {...}, pieces=0x7f7940023a80 = {...}, pos=54) at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/develop/pixelpipe_hb.c:1360
        roi_in = {x = 0, y = 0, width = 236, height = 354, scale = 0.061097689}
        module_name = '\000' <repeats 255 times>
        input = 0x0
        cl_mem_input = 0x0
        module = <optimized out>
        piece = 0x7f7940023630
        old_pipetype = <optimized out>
        gui_module = <optimized out>
        bpp = <optimized out>
        bufsize = <optimized out>
        hash = <optimized out>
        gamma_preview = <optimized out>
        cache_available = <optimized out>
        _input_format = {channels = 0, datatype = TYPE_UNKNOWN, filters = 0, xtrans = {"\000\000\000\000\000", "\000\000\000\000\000", "\000\000\000\000\000", "\000\000\000\000\000", "\000\000\000\000\000", "\000\000\000\000\000"}, rawprepare = {raw_black_level = 0, raw_white_point = 0}, temperature = {enabled = 0, coeffs = {0, 0, 0, 0}}, processed_maximum = {0, 0, 0, 0}, cst = 0}
        input_format = 0x0
        in_bpp = <optimized out>
        out_bpp = <optimized out>
        important = <optimized out>
        important_cl = <optimized out>
        start = {clock = 0, user = 0}
        pixelpipe_flow = PIXELPIPE_FLOW_NONE
        tiling = {factor = 0, factor_cl = 0, maxbuf = 0, maxbuf_cl = 0, overhead = 0, overlap = 0, xalign = 0, yalign = 0}
        work_profile = <optimized out>
        __FUNCTION__ = "_dev_pixelpipe_process_rec"
        histogram_log = '\000' <repeats 31 times>

...

@TurboGit self-requested a review August 26, 2024 13:43
A prerequisite for efficient color equalizer OpenCL support.
New OpenCL kernel functions for 2-channel cl_mem images and support in the gaussian API.
1. lookup_gamut might be used elsewhere, so it has been moved to colorspace.h.
2. kernel_interpolate_bilinear can be used with 1-4 channels; its last parameter has been
   changed for correctness and callers have been modified accordingly.
3. In laplacian there was one call of (2) that missed a kernel parameter.
   So far the result was likely correct ...
   Checked the code there and switched to the modern dt_opencl_enqueue_kernel_2d_args() usage.
4. Bumped the CL kernel version, enforcing a recompilation.
5. Introduced dt_opencl_duplicate_image(const int devid, const cl_mem src).
1. Independent of chosen white level.
2. Generally darker so better viewing.
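
To illustrate what "usable with 1-4 channels" means for the bilinear interpolation mentioned in the commit notes above, here is a minimal CPU-side sketch in plain C. It is not the darktable kernel: the function name `resample_bilinear`, its parameter order, and the pixel-centre mapping are assumptions made for this example. The point is only that the channel count changes the pixel stride and the innermost loop, nothing else.

```c
#include <assert.h>

/* Hypothetical CPU reference, not darktable code: bilinear resampling of an
 * interleaved float image whose pixels carry `ch` channels (1 to 4). */
void resample_bilinear(const float *in, int win, int hin,
                       float *out, int wout, int hout, int ch)
{
  assert(ch >= 1 && ch <= 4);
  assert(win > 0 && hin > 0 && wout > 0 && hout > 0);

  const float sx = (float)win / (float)wout;
  const float sy = (float)hin / (float)hout;

  for(int y = 0; y < hout; y++)
  {
    /* Map the output pixel centre into input coordinates and clamp. */
    float fy = ((float)y + 0.5f) * sy - 0.5f;
    if(fy < 0.0f) fy = 0.0f;
    if(fy > (float)(hin - 1)) fy = (float)(hin - 1);
    const int y0 = (int)fy;
    const int y1 = (y0 + 1 < hin) ? y0 + 1 : hin - 1;
    const float wy = fy - (float)y0;

    for(int x = 0; x < wout; x++)
    {
      float fx = ((float)x + 0.5f) * sx - 0.5f;
      if(fx < 0.0f) fx = 0.0f;
      if(fx > (float)(win - 1)) fx = (float)(win - 1);
      const int x0 = (int)fx;
      const int x1 = (x0 + 1 < win) ? x0 + 1 : win - 1;
      const float wx = fx - (float)x0;

      /* The channel count only affects the stride and this inner loop. */
      for(int c = 0; c < ch; c++)
      {
        const float top = in[(y0 * win + x0) * ch + c] * (1.0f - wx)
                        + in[(y0 * win + x1) * ch + c] * wx;
        const float bot = in[(y1 * win + x0) * ch + c] * (1.0f - wx)
                        + in[(y1 * win + x1) * ch + c] * wx;
        out[(y * wout + x) * ch + c] = top * (1.0f - wy) + bot * wy;
      }
    }
  }
}
```

A reference like this can be handy for checking a 2-channel device result against a known-good host result on a small buffer.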
@jenshannoschwalm
Collaborator Author

Force-pushed some updates. There were race conditions with released cl images that mostly caused no problem here on Intel (Intel seems to release late, so data might still be valid for some time; this has been observed elsewhere before). But I could reproduce something like what @TurboGit reported, so maybe another round.

If it still crashes, we would have to investigate interpolate_bilinear and gaussian on 2-channel images.

@piratenpanda
Contributor

When I see "[dev_pixelpipe] took 0,060 secs (0,388 CPU) [full] processed `colorequal' on GPU, blended on GPU", it's running correctly on the OpenCL path, right? So far no issues on ROCm 6.2.0 with my 6700 XT.

Added kernels plus some missing OpenCL colorspace conversion inline functions.
After testing, output should be identical to that of the CPU code, except for minimal differences
due to some use of native OpenCL functions for performance (as we do elsewhere while converting
pixel colorspace).
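
One way to check "identical except for minimal differences" is to compare the CPU and GPU output buffers element by element against a tolerance. The sketch below is an assumption about how such a check could look, not code from this PR; the names `max_abs_diff` and `outputs_match` are hypothetical, and a sensible tolerance depends on the data range and has to be chosen by the tester.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Largest absolute per-element difference between two float buffers.
 * Returns FLT_MAX as soon as a difference is NaN, so a NaN never passes. */
float max_abs_diff(const float *a, const float *b, size_t n)
{
  assert(a && b);
  float worst = 0.0f;
  for(size_t i = 0; i < n; i++)
  {
    float d = a[i] - b[i];
    if(d != d) return FLT_MAX; /* NaN is the only value not equal to itself */
    if(d < 0.0f) d = -d;
    if(d > worst) worst = d;
  }
  return worst;
}

/* Returns 1 if both buffers agree within `tol` everywhere, else 0. */
int outputs_match(const float *cpu, const float *gpu, size_t n, float tol)
{
  return max_abs_diff(cpu, gpu, n) <= tol;
}
```

Reporting the maximum difference rather than only pass/fail makes it easier to tell harmless rounding from a real divergence between the two code paths.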
@jenshannoschwalm
Collaborator Author

When I see "[dev_pixelpipe] took 0,060 secs (0,388 CPU) [full] processed `colorequal' on GPU, blended on GPU", it's running correctly on the OpenCL path, right?

Exactly.
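
For anyone who wants to check this over a whole log rather than by eye, a small parser for the pixelpipe timing lines could look like the sketch below. It assumes the line layout seen in the logs in this thread (decimal comma, pipe name in brackets, module name between a backtick and an apostrophe); the struct and function names are hypothetical and not part of darktable.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef struct pipe_timing_t
{
  double secs;     /* wall-clock seconds from "took X secs" */
  char pipe[16];   /* e.g. "full", "preview", "thumbnail" */
  char module[32]; /* e.g. "colorequal" */
  int on_gpu;      /* 1 if "processed ... on GPU", else 0 */
} pipe_timing_t;

/* Copy the text in [start, end) into dst; returns 0 if it does not fit. */
static int copy_span(char *dst, size_t dst_size, const char *start, const char *end)
{
  const size_t len = (size_t)(end - start);
  if(len >= dst_size) return 0;
  memcpy(dst, start, len);
  dst[len] = '\0';
  return 1;
}

/* Returns 1 and fills *t if `line` is a pixelpipe timing line, else 0. */
int parse_pixelpipe_line(const char *line, pipe_timing_t *t)
{
  assert(line && t);
  static const char took[] = "[dev_pixelpipe] took ";
  const char *p = strstr(line, took);
  if(!p) return 0;
  p += sizeof(took) - 1;

  /* The logs use a decimal comma; normalise it to '.' before strtod(),
   * which expects '.' in the default "C" locale. */
  char num[32];
  size_t n = 0;
  while(n + 1 < sizeof(num)
        && (isdigit((unsigned char)p[n]) || p[n] == ',' || p[n] == '.'))
  {
    num[n] = (p[n] == ',') ? '.' : p[n];
    n++;
  }
  if(n == 0) return 0;
  num[n] = '\0';
  t->secs = strtod(num, NULL);

  /* Pipe name: the first [...] after the timing part. */
  const char *open = strchr(p, '[');
  const char *close = open ? strchr(open, ']') : NULL;
  if(!close || !copy_span(t->pipe, sizeof(t->pipe), open + 1, close)) return 0;

  /* Module name sits between a backtick and an apostrophe. */
  const char *tick = strchr(close, '`');
  const char *quote = tick ? strchr(tick, '\'') : NULL;
  if(!quote || !copy_span(t->module, sizeof(t->module), tick + 1, quote)) return 0;

  t->on_gpu = strncmp(quote, "' on GPU", 8) == 0;
  return 1;
}
```

Feeding every log line through this and averaging `secs` per pipe, separately for `on_gpu` 0 and 1, gives a quick CPU versus OpenCL comparison.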

@jenshannoschwalm
Collaborator Author

Just force-pushed some reduction in OpenCL memory consumption through simple reordering, plus the removal of one kernel call, which might help performance.

@jenshannoschwalm
Collaborator Author

When I see "[dev_pixelpipe] took 0,060 secs (0,388 CPU) [full] processed `colorequal' on GPU, blended on GPU", it's running correctly on the OpenCL path, right? So far no issues on ROCm 6.2.0 with my 6700 XT.

BTW, if OpenCL runs fine I would be very interested in the performance on your system.

@TurboGit
Member

Testing this new version: good news, no crash on my side. The performance is far better than on CPU:

     3,3180 [dev_pixelpipe] took 0,002 secs (0,000 CPU) [thumbnail] processed `colorequal' on GPU, blended on GPU
     6,0645 [dev_pixelpipe] took 0,009 secs (0,006 CPU) [full] processed `colorequal' on GPU, blended on GPU
     6,2554 [dev_pixelpipe] took 0,029 secs (0,021 CPU) [preview] processed `colorequal' on GPU, blended on GPU
    12,9126 [dev_pixelpipe] took 0,032 secs (0,056 CPU) [preview] processed `colorequal' on GPU, blended on GPU
    12,9671 [dev_pixelpipe] took 0,009 secs (0,013 CPU) [full] processed `colorequal' on GPU, blended on GPU
    13,0091 [dev_pixelpipe] took 0,031 secs (0,046 CPU) [preview] processed `colorequal' on GPU, blended on GPU
    13,0824 [dev_pixelpipe] took 0,009 secs (0,008 CPU) [full] processed `colorequal' on GPU, blended on GPU
    17,3971 [dev_pixelpipe] took 0,032 secs (0,052 CPU) [preview] processed `colorequal' on GPU, blended on GPU
    17,4960 [dev_pixelpipe] took 0,033 secs (0,054 CPU) [preview] processed `colorequal' on GPU, blended on GPU
    17,5484 [dev_pixelpipe] took 0,008 secs (0,008 CPU) [full] processed `colorequal' on GPU, blended on GPU
    17,5860 [dev_pixelpipe] took 0,032 secs (0,046 CPU) [preview] processed `colorequal' on GPU, blended on GPU
    17,6765 [dev_pixelpipe] took 0,030 secs (0,042 CPU) [preview] processed `colorequal' on GPU, blended on GPU
    17,7250 [dev_pixelpipe] took 0,008 secs (0,004 CPU) [full] processed `colorequal' on GPU, blended on GPU
    19,6490 [dev_pixelpipe] took 0,032 secs (0,058 CPU) [preview] processed `colorequal' on GPU, blended on GPU
    19,6998 [dev_pixelpipe] took 0,008 secs (0,016 CPU) [full] processed `colorequal' on GPU, blended on GPU
    19,7418 [dev_pixelpipe] took 0,032 secs (0,031 CPU) [preview] processed `colorequal' on GPU, blended on GPU

I'll review and do more testing tomorrow. Thanks for the hard work @jenshannoschwalm, I know I keep saying that... but that's your fault you keep doing great stuff for darktable :)

@piratenpanda
Contributor

BTW, if OpenCL runs fine I would be very interested in the performance on your system.

Speed improvements are massive. Sliders feel so much smoother. CPU: Ryzen 9 7950X.

    14,2434 [dev_pixelpipe] took 0,012 secs (0,012 CPU) [preview] processed `colorequal' on GPU, blended on GPU
    14,8415 [dev_pixelpipe] took 0,013 secs (0,015 CPU) [preview] processed `colorequal' on GPU, blended on GPU
    14,8877 [dev_pixelpipe] took 0,012 secs (0,011 CPU) [preview] processed `colorequal' on GPU, blended on GPU
    14,9322 [dev_pixelpipe] took 0,012 secs (0,011 CPU) [preview] processed `colorequal' on GPU, blended on GPU
    14,9767 [dev_pixelpipe] took 0,012 secs (0,008 CPU) [preview] processed `colorequal' on GPU, blended on GPU
    15,0256 [dev_pixelpipe] took 0,015 secs (0,016 CPU) [full] processed `colorequal' on GPU, blended on GPU
    15,5846 [dev_pixelpipe] took 0,016 secs (0,013 CPU) [full] processed `colorequal' on GPU, blended on GPU
    15,6254 [dev_pixelpipe] took 0,018 secs (0,015 CPU) [full] processed `colorequal' on GPU, blended on GPU
    15,6637 [dev_pixelpipe] took 0,016 secs (0,016 CPU) [full] processed `colorequal' on GPU, blended on GPU
    15,7029 [dev_pixelpipe] took 0,016 secs (0,012 CPU) [full] processed `colorequal' on GPU, blended on GPU
    15,7414 [dev_pixelpipe] took 0,016 secs (0,011 CPU) [full] processed `colorequal' on GPU, blended on GPU
    15,7789 [dev_pixelpipe] took 0,015 secs (0,000 CPU) [full] processed `colorequal' on GPU, blended on GPU
    15,8140 [dev_pixelpipe] took 0,012 secs (0,009 CPU) [preview] processed `colorequal' on GPU, blended on GPU
    16,2811 [dev_pixelpipe] took 0,012 secs (0,008 CPU) [preview] processed `colorequal' on GPU, blended on GPU
    16,3237 [dev_pixelpipe] took 0,012 secs (0,017 CPU) [preview] processed `colorequal' on GPU, blended on GPU
    16,3654 [dev_pixelpipe] took 0,012 secs (0,008 CPU) [preview] processed `colorequal' on GPU, blended on GPU
    16,4063 [dev_pixelpipe] took 0,011 secs (0,015 CPU) [preview] processed `colorequal' on GPU, blended on GPU
    16,4734 [dev_pixelpipe] took 0,015 secs (0,008 CPU) [full] processed `colorequal' on GPU, blended on GPU

vs.

     8,5553 [dev_pixelpipe] took 0,209 secs (2,876 CPU) [full] processed `colorequal' on CPU, blended on CPU
     9,0641 [dev_pixelpipe] took 0,165 secs (2,072 CPU) [preview] processed `colorequal' on CPU, blended on CPU
     9,1105 [dev_pixelpipe] took 0,210 secs (2,931 CPU) [full] processed `colorequal' on CPU, blended on CPU
     9,2915 [dev_pixelpipe] took 0,152 secs (2,217 CPU) [preview] processed `colorequal' on CPU, blended on CPU
     9,3404 [dev_pixelpipe] took 0,200 secs (3,127 CPU) [full] processed `colorequal' on CPU, blended on CPU
     9,8920 [dev_pixelpipe] took 0,167 secs (2,087 CPU) [preview] processed `colorequal' on CPU, blended on CPU
     9,9360 [dev_pixelpipe] took 0,210 secs (2,935 CPU) [full] processed `colorequal' on CPU, blended on CPU
    10,1256 [dev_pixelpipe] took 0,163 secs (1,951 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    10,1657 [dev_pixelpipe] took 0,203 secs (2,778 CPU) [full] processed `colorequal' on CPU, blended on CPU
    10,5758 [dev_pixelpipe] took 0,167 secs (1,866 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    10,6248 [dev_pixelpipe] took 0,216 secs (2,916 CPU) [full] processed `colorequal' on CPU, blended on CPU
    10,8120 [dev_pixelpipe] took 0,160 secs (2,101 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    10,8585 [dev_pixelpipe] took 0,206 secs (3,126 CPU) [full] processed `colorequal' on CPU, blended on CPU
    11,4122 [dev_pixelpipe] took 0,161 secs (2,256 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    11,4714 [dev_pixelpipe] took 0,220 secs (3,418 CPU) [full] processed `colorequal' on CPU, blended on CPU
    11,6476 [dev_pixelpipe] took 0,152 secs (1,860 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    11,6953 [dev_pixelpipe] took 0,199 secs (2,873 CPU) [full] processed `colorequal' on CPU, blended on CPU
    11,8809 [dev_pixelpipe] took 0,160 secs (2,308 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    11,9228 [dev_pixelpipe] took 0,201 secs (3,146 CPU) [full] processed `colorequal' on CPU, blended on CPU
    12,1549 [dev_pixelpipe] took 0,161 secs (2,204 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    12,2005 [dev_pixelpipe] took 0,206 secs (3,114 CPU) [full] processed `colorequal' on CPU, blended on CPU
    12,3968 [dev_pixelpipe] took 0,170 secs (2,283 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    12,4259 [dev_pixelpipe] took 0,198 secs (2,823 CPU) [full] processed `colorequal' on CPU, blended on CPU
    12,6158 [dev_pixelpipe] took 0,155 secs (2,032 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    12,6659 [dev_pixelpipe] took 0,204 secs (3,191 CPU) [full] processed `colorequal' on CPU, blended on CPU
    13,0108 [dev_pixelpipe] took 0,165 secs (2,039 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    13,0601 [dev_pixelpipe] took 0,214 secs (3,003 CPU) [full] processed `colorequal' on CPU, blended on CPU
    13,2393 [dev_pixelpipe] took 0,152 secs (2,058 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    13,2887 [dev_pixelpipe] took 0,201 secs (3,344 CPU) [full] processed `colorequal' on CPU, blended on CPU
    13,7150 [dev_pixelpipe] took 0,171 secs (2,367 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    13,7476 [dev_pixelpipe] took 0,203 secs (2,987 CPU) [full] processed `colorequal' on CPU, blended on CPU
    13,9471 [dev_pixelpipe] took 0,168 secs (2,119 CPU) [preview] processed `colorequal' on CPU, blended on CPU
    13,9795 [dev_pixelpipe] took 0,200 secs (2,794 CPU) [full] processed `colorequal' on CPU, blended on CPU
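
For anyone who wants to summarize such logs instead of eyeballing them, here is a minimal sketch that averages the wall-clock time per device and pipe. It assumes the line layout shown above (comma as decimal separator); the regular expression and the helper names `to_float` and `mean_wall_times` are made up for illustration and are not part of darktable.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches the "[dev_pixelpipe] took ..." lines as they appear in the logs above.
LINE_RE = re.compile(
    r"took\s+(?P<wall>\d+,\d+)\s+secs\s+\((?P<cpu>\d+,\d+)\s+CPU\)\s+"
    r"\[(?P<pipe>\w+)\]\s+processed\s+`(?P<module>\w+)'\s+on\s+(?P<device>CPU|GPU)"
)


def to_float(text):
    """Convert a number printed with a decimal comma, e.g. '0,209'."""
    return float(text.replace(",", "."))


def mean_wall_times(log_text):
    """Return {(device, pipe): mean wall-clock seconds} for all matching lines."""
    buckets = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            key = (match["device"], match["pipe"])
            buckets[key].append(to_float(match["wall"]))
    return {key: mean(values) for key, values in buckets.items()}


# Two lines copied from the logs above, one per device.
sample = """\
    15,0256 [dev_pixelpipe] took 0,015 secs (0,016 CPU) [full] processed `colorequal' on GPU, blended on GPU
     8,5553 [dev_pixelpipe] took 0,209 secs (2,876 CPU) [full] processed `colorequal' on CPU, blended on CPU
"""
times = mean_wall_times(sample)
print(times)
```

Feeding it the complete GPU and CPU dumps gives one mean per `(device, pipe)` pair, which makes the full vs. preview comparison easier to read.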

@MStraeten
Collaborator

MStraeten commented Aug 26, 2024

Apple M1 Max:

202,3265 [bench module GPU]   [full] `     colorequal' takes  0,20390s,   5,03mpix,   24,682pix/us
90,8250 [bench module plain] [full] `     colorequal' takes  0,49542s,   5,03mpix,   10,158pix/us
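
As a quick sanity check on those two lines, the sketch below recomputes the throughput and the GPU vs. plain speedup from the printed times. It assumes that pix/us is simply megapixels divided by seconds; the function name is illustrative, and small deviations from the printed values are expected because the megapixel count is rounded to 5,03.

```python
def throughput_pix_per_us(megapixels, seconds):
    """Pixels per microsecond: (mpix * 1e6 pixels) / (seconds * 1e6 us)."""
    return megapixels / seconds


# Values copied from the two bench lines above.
MPIX = 5.03
GPU_SECONDS = 0.20390
PLAIN_SECONDS = 0.49542

gpu_rate = throughput_pix_per_us(MPIX, GPU_SECONDS)
plain_rate = throughput_pix_per_us(MPIX, PLAIN_SECONDS)
speedup = PLAIN_SECONDS / GPU_SECONDS

print(f"GPU:   {gpu_rate:.3f} pix/us")
print(f"plain: {plain_rate:.3f} pix/us")
print(f"speedup: {speedup:.2f}x")
```

So on this machine the OpenCL path is a bit less than 2.5 times faster than the plain path for the full pipe.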

@jenshannoschwalm
Collaborator Author

So good news: it seems to run fine on all supported platforms, with some performance gains (except for me, currently on an Intel 620 for the week).

Member

@TurboGit TurboGit left a comment

Thanks, all good to me!

@TurboGit TurboGit merged commit 28f3414 into darktable-org:master Aug 27, 2024
6 checks passed
@TurboGit
Member

It works and gives good results, but there seems to be quite a large difference between the CPU and GPU versions:

$ ./run.sh --op colorequal
Test 0158-coloreq-no-guided-filter
      Image mire1.cr2
      CPU & GPU version differ by 997389 pixels
      CPU vs. GPU report :
      ----------------------------------
      Max dE                   : 6.98332
      Avg dE                   : 0.21767
      Std dE                   : 0.37147
      ----------------------------------
      Pixels below avg + 0 std : 64.63 %
      Pixels below avg + 1 std : 87.49 %
      Pixels below avg + 3 std : 98.48 %
      Pixels below avg + 6 std : 99.74 %
      Pixels below avg + 9 std : 99.93 %
      ----------------------------------
      Pixels above tolerance   : 0.31 %
 
      Expected CPU vs. current CPU report :
      ----------------------------------
      Max dE                   : 1.37983
      Avg dE                   : 0.00848
      Std dE                   : 0.06640
      ----------------------------------
      Pixels below avg + 0 std : 98.05 %
      Pixels below avg + 1 std : 98.08 %
      Pixels below avg + 3 std : 98.19 %
      Pixels below avg + 6 std : 99.02 %
      Pixels below avg + 9 std : 99.69 %
      ----------------------------------
      Pixels above tolerance   : 0.00 %
 
  OK

Test 0159-coloreq-guided-filter
      Image mire1.cr2
      CPU & GPU version differ by 2.73934e+06 pixels
      CPU vs. GPU report :
      ----------------------------------
      Max dE                   : 12.86431
      Avg dE                   : 1.18455
      Std dE                   : 0.90806
      ----------------------------------
      Pixels below avg + 0 std : 67.59 %
      Pixels below avg + 1 std : 88.97 %
      Pixels below avg + 3 std : 97.73 %
      Pixels below avg + 6 std : 99.80 %
      Pixels below avg + 9 std : 99.98 %
      ----------------------------------
      Pixels above tolerance   : 9.09 %
 
      Expected CPU vs. current CPU report :
      ----------------------------------
      Max dE                   : 0.93404
      Avg dE                   : 0.00004
      Std dE                   : 0.00455
      ----------------------------------
      Pixels below avg + 0 std : 99.99 %
      Pixels below avg + 1 std : 99.99 %
      Pixels below avg + 3 std : 99.99 %
      Pixels below avg + 6 std : 99.99 %
      Pixels below avg + 9 std : 99.99 %
      ----------------------------------
      Pixels above tolerance   : 0.00 %
 
  OK

Total test 2
Errors     0

For other tests, the CPU and GPU versions differ by about 30000 pixels. So maybe some calculation is not fully equivalent? @jenshannoschwalm: Any idea?
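
To make the report columns easier to interpret, here is a minimal sketch of how such statistics could be computed from a list of per-pixel dE values. This is an assumption about what the columns mean, not the test suite's actual code: whether it uses the population or sample standard deviation, and what tolerance it applies, is not stated in this thread, so `de_report` and its `tolerance` parameter are hypothetical.

```python
from statistics import mean, pstdev


def de_report(delta_e, tolerance):
    """Summarize per-pixel dE values in the style of the report above.

    delta_e   -- iterable of non-negative per-pixel color differences
    tolerance -- hypothetical dE threshold for "Pixels above tolerance"
    """
    values = list(delta_e)
    count = len(values)
    avg = mean(values)
    std = pstdev(values)  # assumption: population standard deviation

    # Percentage of pixels whose dE is at most avg + k * std.
    below = {
        k: 100.0 * sum(v <= avg + k * std for v in values) / count
        for k in (0, 1, 3, 6, 9)
    }
    above_tolerance = 100.0 * sum(v > tolerance for v in values) / count

    return {
        "max": max(values),
        "avg": avg,
        "std": std,
        "below": below,
        "above_tolerance": above_tolerance,
    }


# Tiny made-up input, only to show the shape of the result.
report = de_report([0.0, 0.0, 0.0, 4.0], tolerance=2.0)
print(report)
```

Read this way, a high "Max dE" with a low "Pixels above tolerance" points to a few outlier pixels, while a raised "Avg dE" as in test 0159 points to a difference spread over much of the image.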

@jenshannoschwalm
Collaborator Author

  1. the native variant might be a problem...
  2. there are new OpenCL color transformations; I might have done some stupid things there
  3. it would be interesting to know whether the differences are there without the guiding.

@TurboGit
Member

For 3, my report above shows that the non-guided filter test already has almost 1e6 differing pixels. The guided filter test has roughly 3x as many. I would say that we need to fix the non-guided part first, and then we will have figures for the guided filter part.

@jenshannoschwalm jenshannoschwalm deleted the ce_opencl branch August 27, 2024 19:38
@jenshannoschwalm
Collaborator Author

jenshannoschwalm commented Aug 28, 2024

I checked the whole cl source again and can't find an obvious fundamental problem.

The whole UCS color space handling in OpenCL is full of native functions. The other module I am aware of that uses it is colorbalancergb. Do we have a test case with that module in UCS space? EDIT: there is 93.

@TurboGit
Member

We have a test for color balance RGB in UCS mode, and the diff CPU vs GPU is ok (~32000 pixels):

$ ./run.sh 0093-colorbalancergb-ucs
Test 0093-colorbalancergb-ucs
      Image mire1.cr2
      CPU & GPU version differ by 32479 pixels
      CPU vs. GPU report :
      ----------------------------------
      Max dE                   : 1.64225
      Avg dE                   : 0.00498
      Std dE                   : 0.04929
      ----------------------------------
      Pixels below avg + 0 std : 98.84 %
      Pixels below avg + 1 std : 98.85 %
      Pixels below avg + 3 std : 98.90 %
      Pixels below avg + 6 std : 99.08 %
      Pixels below avg + 9 std : 99.52 %
      ----------------------------------
      Pixels above tolerance   : 0.00 %
 
      Expected CPU vs. current CPU report :
      ----------------------------------
      Max dE                   : 1.13108
      Avg dE                   : 0.00041
      Std dE                   : 0.01273
      ----------------------------------
      Pixels below avg + 0 std : 99.87 %
      Pixels below avg + 1 std : 99.87 %
      Pixels below avg + 3 std : 99.87 %
      Pixels below avg + 6 std : 99.87 %
      Pixels below avg + 9 std : 99.88 %
      ----------------------------------
      Pixels above tolerance   : 0.00 %
 
  OK

Total test 1

@TurboGit
Member

The CPU vs GPU difference images are the following:

0158-coloreq-no-guided-filter/diff-cl.png:
image

0159-coloreq-guided-filter/diff-cl.png:
image
