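#998 tackles the aio param section, but we still have no user guide for the new "offload_optimizer" and "offload_param" sections. Other than device, nvme_path and pin_memory, which are pretty obvious, the rest have super-terse descriptions, and a user will have no idea how to configure them. Let's write a guide on how these values should be chosen.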
I copied the descriptions and defaults that already exist and tried to ask the right questions, so if you could answer those I think that would be a great start.
Thank you!
Optimizer
buffer_count: default 4: Number of buffers in buffer pool for optimizer state offloading to NVMe. This should be at least the number of states maintained per parameter by the optimizer. For example, the Adam optimizer has 4 states (parameter, gradient, momentum, and variance).
Q: why "at least" - is it more efficient to have it bigger?
Q: what's the impact on memory footprint (CPU/NVMe)?
fast_init: default false. Enable fast optimizer initialization when offloading to NVMe.
Q: why is it false by default?
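For reference, the optimizer settings above could be written out as the Python dict below. This is only a sketch: it uses the defaults quoted in this issue and assumes the section sits under "zero_optimization" with stage 3. The nvme_path value is a placeholder, and check_buffer_count is a hypothetical helper that is not part of DeepSpeed.

```python
# Sketch of a config dict with the "offload_optimizer" defaults quoted above.
# Assumption: "/local_nvme" is a placeholder path, not a recommended value.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "nvme",
            "nvme_path": "/local_nvme",
            "pin_memory": False,
            "buffer_count": 4,
            "fast_init": False,
        },
    }
}

# States kept per parameter by Adam, as listed in the description above.
ADAM_STATES = ("parameter", "gradient", "momentum", "variance")


def check_buffer_count(offload_cfg, optimizer_states):
    """Hypothetical helper: apply the 'at least the number of states' rule.

    Returns how many buffers are configured beyond the minimum.
    """
    needed = len(optimizer_states)
    if offload_cfg["buffer_count"] < needed:
        raise ValueError(
            f"buffer_count={offload_cfg['buffer_count']} is below the "
            f"{needed} states kept per parameter"
        )
    return offload_cfg["buffer_count"] - needed


spare = check_buffer_count(
    ds_config["zero_optimization"]["offload_optimizer"], ADAM_STATES
)
print(f"spare optimizer buffers: {spare}")
```

With the default of 4 and Adam's 4 states there are no spare buffers, which is exactly why the "at least" question matters: whether extra buffers buy any overlap or throughput is what the guide would need to answer.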
Param
buffer_count: default 5: Number of buffers in buffer pool for parameter offloading to NVMe.
Q: why 5, what are the correlations to other params?
buffer_size: default 1e8: Size of buffers in buffer pool for parameter offloading to NVMe.
Q: how do we get to this number, and how does it correlate with other config params?
Q: what's the impact on memory footprint (CPU/NVMe)?
max_in_cpu: default 1e9: Number of parameter elements to maintain in CPU memory when offloading to NVMe is enabled
Q: how do we get to this number, and how does it correlate with other config params?
Q: what's the impact on memory footprint (CPU/NVMe)?
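To make the memory-footprint questions concrete, here is a back-of-the-envelope sketch for the param defaults above. It rests on assumptions this issue does not confirm: that buffer_size, like max_in_cpu, counts elements; that each element is a 2-byte fp16 value; and that the buffer pool and the max_in_cpu allowance simply add up. The nvme_path value is a placeholder and estimate_param_cpu_bytes is a hypothetical helper, not a DeepSpeed function.

```python
# "offload_param" defaults as quoted in this issue.
offload_param = {
    "device": "nvme",
    "nvme_path": "/local_nvme",  # placeholder path
    "pin_memory": False,
    "buffer_count": 5,
    "buffer_size": int(1e8),
    "max_in_cpu": int(1e9),
}

BYTES_PER_ELEMENT = 2  # assumption: fp16 parameters


def estimate_param_cpu_bytes(cfg, bytes_per_element=BYTES_PER_ELEMENT):
    """Hypothetical helper: rough CPU memory implied by the settings.

    Treats the buffer pool and the max_in_cpu allowance as additive,
    which may overcount if the real implementation shares memory.
    """
    pool = cfg["buffer_count"] * cfg["buffer_size"] * bytes_per_element
    resident = cfg["max_in_cpu"] * bytes_per_element
    return {"buffer_pool": pool, "max_in_cpu": resident, "total": pool + resident}


estimate = estimate_param_cpu_bytes(offload_param)
for name, nbytes in estimate.items():
    print(f"{name}: {nbytes / 1e9:.1f} GB")
```

Under these assumptions the defaults come to roughly 1 GB for the buffer pool and 2 GB for max_in_cpu. If the estimate is in the right ballpark, it would also explain why raising these values aggressively can exhaust host memory.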
I am also looking for guidelines for setting those params. They do seem to have a meaningful impact on performance on my server setup, but setting those values too high kills the whole system.
Same issue here. Even reading the paper doesn't help at all. Is there any documentation yet explaining what these params do?