[Optimization] Using multiple workers #10
One proposal (that is admittedly difficult) would be to parallelize the reading of the videos.
Hi @erip,

Your idea seems excellent, and it could indeed offer significant speed-ups. Multi-processing had never crossed my mind, so I overlooked its potential benefits. I now believe that multi-processing is a better option than multi-threading, although I do not have deep knowledge of the technicalities of either, and my reasons are based on blog posts and videos that I have read/watched. I'll therefore have to study the caveats of integrating this functionality a bit more.

As for the implementation, I am hopeful that it will not be too complicated, thanks to Python's built-in `multiprocessing` and `concurrent.futures` modules, which provide a reasonably simple API.

Also, thanks for your suggestion. 😃 Once I figure out how to implement it, I'll post some tests here demonstrating the speed-ups that we may (or may not) achieve.

TODO
@MrinalJain17, you might take some inspiration from this question. Caveat: it's both my question and my answer. 😅
@erip, I've found a slightly different way to modify the reader:

```python
from multiprocessing import Pool

import numpy as np
from tqdm import tqdm

from mydia import Videos

path = ["./sample_video/bigbuckbunny.mp4" for i in range(5)]
reader = Videos()


def read(path, workers=1, chunksize=1):
    list_of_videos = []
    with Pool(processes=workers) as pool:
        with tqdm(total=len(path), unit="videos") as pbar:
            for result in pool.imap(reader._read_video, path, chunksize=chunksize):
                list_of_videos.append(result)
                pbar.update()
        pool.close()  # no more work will be submitted
        pool.join()   # wait for the workers to exit
    video_tensor = np.vstack(list_of_videos)
    return video_tensor


video = read(path, workers=4)
```

This method is working, but it needs to be tweaked to get the expected speed-up.

Also, I went through your question on StackOverflow - the suggestion there looks useful.

There was one thing I was curious about, though. The randomly selected frames depend on the total number of frames in the video: for instance, two videos with different total frame counts will produce different sets of frame indices, so the amount of work per video can vary.

Before implementing this functionality, I'll have to test the performance gain that this would provide.
I have the benefit of knowing that my videos are all the same length, all have the same frame rate, and I want to extract the same number of frames from all of them.
@erip, given below are the results showing the performance gain/loss.

**20 videos**

| No. of processes | Time (in seconds) |
| --- | --- |
| 1 | 4.30 |
| 2 | 2.53 |
| 4 | 1.74 |
| 8 | 1.39 |
| 16 | 1.34 |
**100 videos, `target_size=(224, 224)`, `num_frames=24`**

No multiprocessing: 20.8 seconds

| No. of processes | Time (in seconds) |
| --- | --- |
| 1 | 21.30 |
| 2 | 12.40 |
| 4 | 8.01 |
| 8 | 5.87 |
| 16 | 5.45 |
**500 videos, `target_size=(224, 224)`, `num_frames=24`, `to_gray=True`**

No multiprocessing: 107 seconds

| No. of processes | Time (in seconds) |
| --- | --- |
| 1 | 98 |
| 2 | 56.9 |
| 4 | 34.4 |
| 8 | 27 |
| 16 | 25.3 |
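Benchmarks of this shape can be reproduced with a small harness along these lines. This is a sketch with a CPU-bound `work` function standing in for an actual video read, and `timed_read` is a hypothetical helper, not a mydia API:

```python
import time
from multiprocessing import Pool


def work(x):
    # Stand-in for reading and processing one video.
    return sum(i * i for i in range(10_000 + x))


def timed_read(n_items, workers):
    # Time a full pass over n_items using a pool of `workers` processes.
    start = time.perf_counter()
    with Pool(processes=workers) as pool:
        results = list(pool.imap(work, range(n_items)))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    for w in (1, 2, 4):
        elapsed, _ = timed_read(20, w)
        print(f"{w} workers: {elapsed:.2f}s")
```

Note that wall-clock numbers from such a harness include pool start-up cost, which is why small workloads can show little or no gain.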
With 16 processes we are providing roughly 16x more compute, so one might naively expect a ~16x speed-up - which is not achievable in practice. The observed gain is about 4x. One reason is that when no multi-processing is used, the videos are read with list comprehensions, which are well optimized and account for a significant part of the single-process performance; with multi-processing we fall back to a plain `for` loop and lose some of that benefit.
Also, the communication between processes adds overhead of its own, which is evident from the diminishing returns at 8 and 16 processes.
Once this is implemented, it will still guarantee repeatability, since the usage of `random_state` has been fixed.
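To make the repeatability point concrete, here is a minimal sketch (not mydia's actual code) of how a fixed `random_state` makes frame selection deterministic, no matter which worker process handles a given video:

```python
import numpy as np


def select_frames(total_frames, num_frames, random_state=17):
    # A fresh RandomState seeded per call makes the selection deterministic
    # regardless of process, run order, or how many workers are used.
    rng = np.random.RandomState(random_state)
    idx = rng.choice(total_frames, size=num_frames, replace=False)
    return np.sort(idx)


a = select_frames(300, 24)
b = select_frames(300, 24)
assert (a == b).all()  # identical selection on every call
```

Seeding a single global RNG would not be enough here, because worker processes do not share RNG state; deriving the state per video (as above) sidesteps that entirely.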
That's an awesome speedup! 👍 Really nicely done, @MrinalJain17
Support for multiple workers has been implemented in version 2.2.0. See the release notes for more details. There will always be scope for optimization, and therefore I have opened a new issue (#17) for other innovative ideas to speed up the process. Appreciate your contribution, @erip. 👍 😄
Figure out techniques to optimize the video reading and processing time.