Support multithreading in the CPU mode of Solver::Solve #79
The code does not have to be changed. Thanks to Michael Rutter, a multi-threaded OpenBLAS package is available for all versions of Ubuntu since Precise (12.04).
Benchmark results are shown in the related issue #16.
ComputeUpdateValue() is not a bottleneck when training large networks. Most of the time is spent in ForwardBackward(), that is, in the individual layers, so parallelizing ComputeUpdateValue() will not gain much.
Within the ForwardBackward() computation, the convolutional layers take most of the time (see #83), so parallelizing the loops there will be the most effective.
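To illustrate what "parallelizing the loops" in a convolutional layer could look like, here is a minimal sketch using a naive direct convolution (stride 1, no padding, NCHW layout). This is a swapped-in technique for illustration only: Caffe's own convolution lowers the problem to im2col plus a GEMM call, which is exactly where a multi-threaded BLAS already helps. The function name `Conv2D` and its parameters are hypothetical. Compile with `-fopenmp`; without it the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical naive direct 2-D convolution: stride 1, no padding, NCHW layout.
// in:     N x C x H x W
// weight: O x C x K x K
// out:    N x O x (H-K+1) x (W-K+1)
std::vector<float> Conv2D(const std::vector<float>& in, int N, int C, int H, int W,
                          const std::vector<float>& weight, int O, int K) {
  const int OH = H - K + 1, OW = W - K + 1;
  std::vector<float> out(static_cast<std::size_t>(N) * O * OH * OW, 0.f);
  // Each (n, o) pair writes its own disjoint output plane, so the two outer
  // loops can be collapsed and split across threads without any locking.
#pragma omp parallel for collapse(2)
  for (int n = 0; n < N; ++n)
    for (int o = 0; o < O; ++o)
      for (int y = 0; y < OH; ++y)
        for (int x = 0; x < OW; ++x) {
          float acc = 0.f;
          for (int c = 0; c < C; ++c)
            for (int ky = 0; ky < K; ++ky)
              for (int kx = 0; kx < K; ++kx)
                acc += in[((static_cast<std::size_t>(n) * C + c) * H + y + ky) * W + x + kx] *
                       weight[((static_cast<std::size_t>(o) * C + c) * K + ky) * K + kx];
          out[((static_cast<std::size_t>(n) * O + o) * OH + y) * OW + x] = acc;
        }
  return out;
}
```

Parallelizing over images and output channels, rather than over the innermost kernel loops, keeps each thread's work large and avoids write conflicts on `acc` and `out`.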
In each iteration of Solver::Solve, there are four opportunities to accelerate the computation.
The first opportunity is the most complex one, since Net::ForwardBackward invokes the Forward and Backward passes of every layer in the net.
Dtype loss = net_->ForwardBackward(bottom_vec);
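Because each layer consumes the previous layer's output, the layers themselves must run one after another; the parallelism has to live inside each layer's Forward and Backward. The following minimal sketch shows that structure with a hypothetical `ReLULayer` on flat buffers and a toy loss (the sum of the final activations). None of these names are Caffe's actual classes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified layer: element-wise ReLU on a flat buffer.
struct ReLULayer {
  void Forward(const std::vector<float>& bottom, std::vector<float>& top) const {
    top.resize(bottom.size());
    const long n = static_cast<long>(bottom.size());
    // Elements are independent, so the loop is split across threads.
#pragma omp parallel for
    for (long i = 0; i < n; ++i) top[i] = bottom[i] > 0.f ? bottom[i] : 0.f;
  }
  void Backward(const std::vector<float>& top_diff, const std::vector<float>& bottom,
                std::vector<float>& bottom_diff) const {
    bottom_diff.resize(bottom.size());
    const long n = static_cast<long>(bottom.size());
#pragma omp parallel for
    for (long i = 0; i < n; ++i) bottom_diff[i] = bottom[i] > 0.f ? top_diff[i] : 0.f;
  }
};

// Layers run sequentially (forward in order, backward in reverse);
// only the work inside each layer is parallel.
float ForwardBackward(const std::vector<ReLULayer>& layers, const std::vector<float>& input,
                      std::vector<float>& input_diff) {
  std::vector<std::vector<float>> acts(layers.size() + 1);
  acts[0] = input;
  for (std::size_t l = 0; l < layers.size(); ++l) layers[l].Forward(acts[l], acts[l + 1]);

  // Toy loss: sum of the final activations, so d(loss)/d(top) is 1 everywhere.
  float loss = 0.f;
  const long m = static_cast<long>(acts.back().size());
#pragma omp parallel for reduction(+ : loss)
  for (long i = 0; i < m; ++i) loss += acts.back()[i];

  std::vector<float> diff(acts.back().size(), 1.f);
  for (std::size_t l = layers.size(); l-- > 0;) {
    std::vector<float> bottom_diff;
    layers[l].Backward(diff, acts[l], bottom_diff);
    diff.swap(bottom_diff);
  }
  input_diff = diff;
  return loss;
}
```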
The second opportunity is more straightforward: because the computation for each param_id is independent, a single OpenMP directive is enough to parallelize it.
ComputeUpdateValue();
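A minimal sketch of that directive, assuming plain SGD with momentum (history = momentum * history + local_rate * diff, then diff = history) and omitting weight decay. The `Param` struct and its `lr_mult` field are hypothetical simplifications, not Caffe's Blob class.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified parameter blob (not Caffe's Blob).
struct Param {
  std::vector<float> data, diff, history;
  float lr_mult = 1.f;  // per-parameter learning-rate multiplier (assumed)
};

// SGD-with-momentum update value, computed independently per param_id.
void ComputeUpdateValue(std::vector<Param>& params, float base_lr, float momentum) {
  const long num_params = static_cast<long>(params.size());
  // Each iteration touches only params[param_id], so there are no data races.
  // schedule(dynamic) because blob sizes differ widely (e.g. weights vs. biases).
#pragma omp parallel for schedule(dynamic)
  for (long param_id = 0; param_id < num_params; ++param_id) {
    Param& p = params[param_id];
    const float local_rate = base_lr * p.lr_mult;
    for (std::size_t i = 0; i < p.diff.size(); ++i) {
      p.history[i] = momentum * p.history[i] + local_rate * p.diff[i];
      p.diff[i] = p.history[i];  // the update value is stored back into diff
    }
  }
}
```

With only a handful of parameter blobs per net, this outer loop offers limited parallelism, which is consistent with the earlier remark that ComputeUpdateValue() is not where the time goes.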
The third opportunity needs only one extra trick: distinguishing between CPU and GPU mode.
net_->Update();
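One way to express that distinction is OpenMP's `if` clause, which only spawns threads when its condition holds. In the minimal sketch below, the `Mode` enum, `Param` struct and `NetUpdate` function are hypothetical; each step mirrors a Blob::Update-style `data -= diff`. To keep the sketch runnable, the "GPU" path also computes on the host; the point is only that in GPU mode the host loop stays single-threaded, since there each iteration would mostly just launch device kernels.

```cpp
#include <cstddef>
#include <vector>

enum class Mode { CPU, GPU };  // hypothetical stand-in for the global Caffe mode

struct Param {  // hypothetical, simplified parameter blob
  std::vector<float> data, diff;
};

// Blob::Update-style step: data = data - diff.
void UpdateParam(Param& p) {
  for (std::size_t i = 0; i < p.data.size(); ++i) p.data[i] -= p.diff[i];
}

void NetUpdate(std::vector<Param>& params, Mode mode) {
  const long n = static_cast<long>(params.size());
  // The if clause creates a thread team only in CPU mode; in GPU mode the
  // loop runs on the calling thread alone.
#pragma omp parallel for if (mode == Mode::CPU)
  for (long i = 0; i < n; ++i) UpdateParam(params[i]);
}
```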
The last one is a plain old, OpenMP-friendly nested for loop.
Test();
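A minimal sketch of that nested loop, assuming each test iteration's outputs are available as flat float vectors (the `Outputs` alias and both functions are hypothetical). If the scores are accumulated through a single running `idx++` counter, as older Caffe versions appear to do, the loop cannot be split as written; precomputing per-output offsets makes the outer iterations independent. The forward passes themselves are kept sequential here, presumably because the test data layer walks through the test set in order.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical: one test iteration's outputs, one flat vector per output blob.
using Outputs = std::vector<std::vector<float>>;

// Adds one iteration's outputs into the flat running sum test_score.
void AccumulateScores(const Outputs& result, std::vector<float>& test_score) {
  // Per-blob offsets replace a running idx++ counter, so iterations over j
  // write to disjoint ranges of test_score and can run in parallel.
  std::vector<std::size_t> offset(result.size() + 1, 0);
  for (std::size_t j = 0; j < result.size(); ++j) offset[j + 1] = offset[j] + result[j].size();
  if (test_score.empty()) test_score.assign(offset.back(), 0.f);
  const long num_blobs = static_cast<long>(result.size());
#pragma omp parallel for
  for (long j = 0; j < num_blobs; ++j)
    for (std::size_t k = 0; k < result[j].size(); ++k)
      test_score[offset[j] + k] += result[j][k];
}

// Averages the scores over all test iterations.
std::vector<float> MeanTestScores(const std::vector<Outputs>& iterations) {
  std::vector<float> score;
  for (const Outputs& r : iterations) AccumulateScores(r, score);  // sequential
  for (float& s : score) s /= static_cast<float>(iterations.size());
  return score;
}
```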