
Integrate libtuv thread pool to eliminate thread creation overhead #93

Merged (2 commits, Feb 12, 2019)

Conversation

@marktwtn (Collaborator) commented Jan 30, 2019

To reduce the overhead of repeatedly creating and destroying threads,
we integrate the libtuv thread pool as a git submodule.
The pthread-related functions and data types are replaced with the
corresponding libtuv ones.
The compilation of the libtuv library is handled in mk/submodule.mk.
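
For readers unfamiliar with the libuv-style API that libtuv mirrors, a
minimal sketch of the replacement pattern follows. The names pow_task_t
and spawn_workers are hypothetical, not dcurl's actual symbols; only
uv_queue_work() and its callback signatures are the real interface.

    #include <uv.h>

    /* Hypothetical per-task context; dcurl's real structures differ. */
    typedef struct {
        uv_work_t req;
        int thread_id;
    } pow_task_t;

    /* Runs on a pooled worker thread. This is the body that used to be
     * a pthread start routine. */
    static void work_cb(uv_work_t *req)
    {
        pow_task_t *task = (pow_task_t *) req->data;
        (void) task; /* ... perform one PoW worker's share ... */
    }

    /* Runs on the event-loop thread (during uv_run) once the work is done. */
    static void after_work_cb(uv_work_t *req, int status)
    {
        (void) req;
        (void) status;
    }

    static void spawn_workers(uv_loop_t *loop, pow_task_t *tasks, int n)
    {
        for (int i = 0; i < n; i++) {
            tasks[i].thread_id = i;
            tasks[i].req.data = &tasks[i];
            /* Replaces pthread_create(): the job is handed to an already
             * running pool thread instead of a freshly created one. */
            uv_queue_work(loop, &tasks[i].req, work_cb, after_work_cb);
        }
    }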

Experiment:
Call clock_gettime() right before and after the functions that acquire a
thread: pthread_create() (without thread pool) and uv_queue_work() (with
thread pool).
test-multi-pow.py is used as the test case since it initializes and
destroys dcurl only once and performs the PoW multiple times, which is
what IRI does.
The result below shows the time to acquire each thread; each PoW
execution uses 7 threads.
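
The instrumentation is straightforward; a minimal sketch follows
(measure_spawn and elapsed_sec are illustrative names, not the actual
test code):

    #include <stdio.h>
    #include <time.h>

    static double elapsed_sec(const struct timespec *a, const struct timespec *b)
    {
        return (b->tv_sec - a->tv_sec) + (b->tv_nsec - a->tv_nsec) / 1e9;
    }

    static void measure_spawn(int thread_id)
    {
        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);
        /* pthread_create(...) or uv_queue_work(...) goes here */
        clock_gettime(CLOCK_MONOTONIC, &end);
        printf("thread%d: %.9f\n", thread_id, elapsed_sec(&start, &end));
    }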

Experiment result (unit: second):
Without thread pool
thread0: 0.000028384
thread1: 0.000025127
thread2: 0.000024748
thread3: 0.000023925
thread4: 0.000024126
thread5: 0.000025328
thread6: 0.000052900
thread0: 0.000049344
thread1: 0.000039575
thread2: 0.000036720
thread3: 0.000036249
thread4: 0.000034606
thread5: 0.000034676
thread6: 0.000033444

With thread pool
thread0: 0.000124327
thread1: 0.000002084
thread2: 0.000001052
thread3: 0.000000150
thread4: 0.000000121
thread5: 0.000000080
thread6: 0.000000090
thread0: 0.000000291
thread1: 0.000000080
thread2: 0.000000050
thread3: 0.000000050
thread4: 0.000000050
thread5: 0.000000060
thread6: 0.000000050

The first acquisition of a thread from the thread pool takes longer
since that call also preallocates and initializes the pool's threads.

Close #58.

@jserv (Member) commented Jan 31, 2019

The commit messages shall contain the relevant experiment results for reference purposes.

README.md review thread (outdated, resolved)
Makefile review thread (resolved)
@jserv (Member) commented Jan 31, 2019

For performance measurement, you should take C++11 threads, affinity, and hyperthreading into consideration. Thread affinity might make a dramatic difference.

src/pow_c.c review thread (outdated, resolved)
@jserv changed the title from "Integrate libtuv thread pool into dcurl" to "Integrate libtuv thread pool to eliminate thread creation overhead" on Jan 31, 2019
@jserv (Member) commented Jan 31, 2019

How about the C and OpenCL implementations?

@marktwtn (Collaborator, Author) commented Feb 11, 2019

The commit messages shall contain the relevant experiment results for reference purposes.

The experiment results have been added to the git commit message and to the first comment of this pull request.


For performance measurement, you should take C++11 threads, affinity, and hyperthreading into consideration. Thread affinity might make a dramatic difference.

I think this can be opened as another issue?


How about the C and OpenCL implementations?

The C implementation does include the libtuv thread pool.
The OpenCL implementation seems to have nothing to do with threads?

@jserv (Member) commented Feb 11, 2019

For performance measurement, you should take C++11 threads, affinity, and hyperthreading into consideration. Thread affinity might make a dramatic difference.

I think this can be opened as another issue?

Yes, please do. The skeleton implementation looks like the following:

#define _GNU_SOURCE /* for cpu_set_t, CPU_ZERO/CPU_SET and sched_setaffinity() */
#include <sched.h>
#include <sys/resource.h>
#include <unistd.h>

static int num_processors;

int main(int argc, char *argv[]) {
    ...
    num_processors = sysconf(_SC_NPROCESSORS_CONF);
    ...
}

/* Fall back to the default time-sharing scheduler. */
static inline void drop_policy(void) {
    struct sched_param param = { .sched_priority = 0 };
    sched_setscheduler(0, SCHED_OTHER, &param);
}

/* Pin the calling thread to the given CPU. */
static inline void affine_to_cpu(int id, int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    sched_setaffinity(0, sizeof(set), &set);
}

static void *worker_thread(void *userdata) {
    int thread_id = ((thread_info *) userdata)->id;
    ...
    /* Raise the worker's priority when running as root, then drop back
     * to the default scheduling policy. No need for this to be an
     * error if either call fails.
     */
    if (!geteuid())
        setpriority(PRIO_PROCESS, 0, -14);
    drop_policy();

    /* CPU affinity only makes sense if the number of threads is a multiple
     * of the number of CPUs.
     */
    affine_to_cpu(thread_id, thread_id % num_processors);
    ...
}
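
A hypothetical driver for this skeleton might look as follows;
thread_info and start_workers are illustrative names completing the
elided parts, not existing dcurl code:

    #include <pthread.h>

    typedef struct {
        int id;
    } thread_info;

    /* Spawn num_threads workers and wait for them to finish. */
    int start_workers(int num_threads) {
        pthread_t threads[num_threads];
        thread_info info[num_threads];
        for (int i = 0; i < num_threads; i++) {
            info[i].id = i;
            if (pthread_create(&threads[i], NULL, worker_thread, &info[i]))
                return -1;
        }
        for (int i = 0; i < num_threads; i++)
            pthread_join(threads[i], NULL);
        return 0;
    }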

mk/submodule.mk review thread (outdated, resolved)
@jserv (Member) commented Feb 11, 2019

The OpenCL implementation seems to have nothing to do with threads?

(off-topic) Is pthread_mutex_lock necessary in the OpenCL backend? Can we simply synchronize at a barrier?

@jserv (Member) commented Feb 11, 2019

Experiment results shall come with a listing of the hardware configuration for reference purposes.

@wusyong added this to the sprint-201902 milestone on Feb 11, 2019
@jserv requested a review from ajblane on February 11, 2019
@jserv (Member) commented Feb 11, 2019

Since this pull request dramatically changes the flow of execution, there should be a dedicated note in the docs/ directory briefing the fundamental design. Something like docs/threading-model.md would be nice, where we can discuss the thread pool, SMP affinity, load balancing, etc.

@jserv (Member) commented Feb 11, 2019

Rebasing is required due to the recent document re-organization.

mk/submodule.mk review thread (outdated, resolved)

Commit message:

To reduce the overhead of repeatedly creating and destroying threads,
we integrate the libtuv thread pool as a git submodule.
The pthread-related functions and data types are replaced with the
corresponding libtuv ones.
The compilation of the libtuv library is handled in mk/submodule.mk.

Experiment:
Call clock_gettime() right before and after the functions that acquire a
thread: pthread_create() (without thread pool) and uv_queue_work() (with
thread pool).
test-multi-pow.py is used as the test case since it initializes and
destroys dcurl only once and performs the PoW multiple times, which is
what IRI does.
The result shows the time to acquire each thread; each PoW execution
uses 7 threads.

Hardware information:
architecture - x86_64
CPU          - AMD Ryzen 5 2400G (4 cores/8 threads)

Experiment result (unit: second):
Without thread pool
thread0: 0.000028384
thread1: 0.000025127
thread2: 0.000024748
thread3: 0.000023925
thread4: 0.000024126
thread5: 0.000025328
thread6: 0.000052900
thread0: 0.000049344
thread1: 0.000039575
thread2: 0.000036720
thread3: 0.000036249
thread4: 0.000034606
thread5: 0.000034676
thread6: 0.000033444

With thread pool
thread0: 0.000124327
thread1: 0.000002084
thread2: 0.000001052
thread3: 0.000000150
thread4: 0.000000121
thread5: 0.000000080
thread6: 0.000000090
thread0: 0.000000291
thread1: 0.000000080
thread2: 0.000000050
thread3: 0.000000050
thread4: 0.000000050
thread5: 0.000000060
thread6: 0.000000050

The first acquisition of a thread from the thread pool takes longer
since that call also preallocates and initializes the pool's threads.

Close DLTcollab#58.
@marktwtn force-pushed the libtuv-thread-pool-integration branch from 97613e3 to 728aa2a on February 12, 2019
@@ -0,0 +1,21 @@
# Copy from the Makefile of libtuv to support different platforms
UNAME_M := $(shell uname -m)
UNAME_S := $(shell uname -s)
Review comment (Member):

Add a FIXME to mention the limitation of the supported operating system listing.

@marktwtn (Collaborator, Author) replied:

Umm... I'm not sure what you expect to see.
Do you mean listing the operating systems that dcurl supports but libtuv does not, or vice versa?

@marktwtn (Collaborator, Author) commented:

Experiment results shall come with a listing of the hardware configuration for reference purposes.

I have added the architecture and CPU information to the commit message.


Since this pull request dramatically changes the flow of execution, there should be a dedicated note in the docs/ directory briefing the fundamental design. Something like docs/threading-model.md would be nice, where we can discuss the thread pool, SMP affinity, load balancing, etc.

I will record it as a TO-DO item.


Rebasing is required due to the recent document re-organization.

This has been done without any problem.

@marktwtn (Collaborator, Author) commented:

The OpenCL implementation seems to have nothing to do with threads?

(off-topic) Is pthread_mutex_lock necessary in the OpenCL backend? Can we simply synchronize at a barrier?

It is necessary.
It ensures that we select the right GPU device when more than one GPU is present.
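
To illustrate the point, here is a minimal sketch of the kind of
critical section involved; select_device, device_lock, and next_device
are hypothetical names, not dcurl's actual OpenCL backend code:

    #include <pthread.h>

    static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;
    static int next_device;

    /* Serialize device selection so that concurrent PoW requests do not
     * race on the shared device index. */
    static int select_device(int num_devices)
    {
        pthread_mutex_lock(&device_lock);
        int id = next_device;
        next_device = (next_device + 1) % num_devices;
        pthread_mutex_unlock(&device_lock);
        return id;
    }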

@jserv merged commit c5147ab into DLTcollab:dev on Feb 12, 2019
@marktwtn deleted the libtuv-thread-pool-integration branch on February 14, 2019