
[Optimization] Using multiple workers #10

Closed
MrinalJain17 opened this issue Jul 26, 2018 · 8 comments
Labels
enhancement New feature or request

Comments

@MrinalJain17
Owner

Figure out techniques to optimize the video reading and processing time.

@MrinalJain17 MrinalJain17 added enhancement New feature or request help wanted Extra attention is needed low priority labels Jul 26, 2018
@MrinalJain17 MrinalJain17 added long-term Feature to be implemented/improved eventually by making small (but significant) changes. and removed low priority labels Sep 20, 2018
@erip

erip commented Oct 8, 2018

One proposal (that is admittedly difficult) would be to parallelize Videos.read across workers. The trick is to ensure that the repeatability afforded by the random_state is guaranteed irrespective of the number of workers. Does this seem possible, @MrinalJain17?

@MrinalJain17
Owner Author

MrinalJain17 commented Oct 8, 2018

Hi @erip

Your idea sounds excellent, and it could indeed offer significant speed-ups. I had never considered the multi-processing paradigm here, and therefore overlooked its potential benefits.

Now, I believe that multi-processing is a better option than multi-threading because:

  1. The task here is CPU-bound, so Python's GIL would prevent multi-threading from providing any major performance boost.
  2. Multi-threaded code is quite complex to maintain and prone to bugs.

That said, I do not have much knowledge of the technicalities of multi-processing and multi-threading, and my reasons are based on some blog posts and videos I have read/watched. I'll therefore have to study the caveats of integrating this functionality a bit more.

Coming to the implementation, I am hopeful that it will not be too complicated, thanks to Python's built-in multiprocessing and concurrent.futures modules, which provide a reasonably simple API.

Also, random_state is only used when the frame-selection mode is "random", and the indices of the required frames are produced beforehand, just as in any other mode. It should therefore not be affected by the number of workers.

Thanks for your suggestion. 😃 Once I figure out the way to implement it, I'll post some tests here demonstrating the speed-ups that we may (or may not) achieve.

TODO

  • Support for parallelizing Videos.read()

@erip

erip commented Oct 8, 2018

@MrinalJain17, you might take some inspiration from this question. Caveat: it's both my question and my answer. 😅

@MrinalJain17
Owner Author

@erip , I've found a slightly different way to modify the Videos.read() function to support multiple workers.

from multiprocessing import Pool
import numpy as np
from tqdm import tqdm
from mydia import Videos

path = ["./sample_video/bigbuckbunny.mp4" for _ in range(5)]
reader = Videos()

def read(path, workers=1, chunksize=1):
    list_of_videos = []
    with Pool(processes=workers) as pool:
        with tqdm(total=len(path), unit="videos") as pbar:
            # imap yields results in the order of `path`, keeping the output repeatable
            for result in pool.imap(reader._read_video, path, chunksize=chunksize):
                list_of_videos.append(result)
                pbar.update()
    # The pool is terminated automatically when the `with` block exits,
    # so no explicit pool.join() is needed here
    video_tensor = np.vstack(list_of_videos)

    return video_tensor

video = read(path, workers=4)

This method works, but it needs to be tweaked to get the expected speed-up.

Also, I went through your question on StackOverflow - the suggestion to use r = np.random.RandomState() followed by r.choice(...) is the correct way to seed the generator, and this will be fixed.

However, there was something I was curious about. In the question, you mention: "Given n videos and a random seed, r, how can I ensure that the extracted frames for each video is the same regardless of the number of workers?"

The randomly selected frames depend on the total number of frames in the video. For instance, say we have 2 videos - vid_1 with 100 frames and vid_2 with 50 frames. Selecting 20 frames at random is equivalent to np.random.choice(total_frames, 20, replace=False), so the frame indices would differ between the two videos. However, rereading the same video will give the same result because of random_state.
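To illustrate with a toy helper (hypothetical, mirroring the behaviour described above rather than mydia's internals):

```python
import numpy as np

def frame_indices(total_frames, num_frames, random_state=17):
    # A fresh, seeded generator per video keeps the selection repeatable
    r = np.random.RandomState(random_state)
    return r.choice(total_frames, num_frames, replace=False)

# vid_1 (100 frames) and vid_2 (50 frames) get different index sets...
idx_1 = frame_indices(100, 20)
idx_2 = frame_indices(50, 20)

# ...but rereading the same video reproduces its indices exactly
assert np.array_equal(idx_1, frame_indices(100, 20))
assert np.array_equal(idx_2, frame_indices(50, 20))
```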

Before implementing this functionality, I'll have to test the performance gain that this would provide.

@erip

erip commented Oct 9, 2018

I have the benefit of knowing that my videos are all the same length, all have the same frame rate, and I want to extract the same number of frames from all of them.

@MrinalJain17
Owner Author

@erip , given below are the results showing the performance gain/loss.

Note

  1. The same video was used multiple times.
  2. The values of target_size and num_frames were chosen arbitrarily and have no particular significance.
  3. The code was executed on a c5.4xlarge Amazon EC2 instance (Ubuntu 18.04, Python 3.6).

20 videos, target_size=(224, 224), num_frames=24

No multiprocessing: 4.02 seconds

| No. of processes | Time (in seconds) |
| --- | --- |
| 1 | 4.30 |
| 2 | 2.53 |
| 4 | 1.74 |
| 8 | 1.39 |
| 16 | 1.34 |

100 videos, target_size=(224, 224), num_frames=24

No multiprocessing: 20.8 seconds

| No. of processes | Time (in seconds) |
| --- | --- |
| 1 | 21.30 |
| 2 | 12.40 |
| 4 | 8.01 |
| 8 | 5.87 |
| 16 | 5.45 |

500 videos, target_size=(224, 224), num_frames=24, to_gray=True

No multiprocessing: 107 seconds

| No. of processes | Time (in seconds) |
| --- | --- |
| 1 | 98 |
| 2 | 56.9 |
| 4 | 34.4 |
| 8 | 27 |
| 16 | 25.3 |
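For reference, the speed-ups implied by the 500-video run can be computed directly from the table:

```python
# Timings from the 500-video benchmark above
baseline = 107  # seconds, without multiprocessing
timings = {1: 98, 2: 56.9, 4: 34.4, 8: 27, 16: 25.3}

for workers, seconds in timings.items():
    # Speed-up relative to the non-multiprocessing baseline
    print(f"{workers:>2} processes: {baseline / seconds:.1f}x speed-up")
```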

With 16 processes we are effectively providing 16x more compute power, so one might naively expect a ~16x speed-up - but that is not achievable. The observed gain is ~4x. This is because, when no multi-processing is used, the videos are read internally using list comprehensions, which are highly optimized and responsible for a significant performance boost. When using multi-processing, we fall back to a for loop and lose some of that performance.

Also, the communication between processes adds some overhead, which is evident in the cases where 8 and 16 processes are used.

Once this is implemented, it will guarantee repeatability since the usage of random_state has been fixed.

@erip

erip commented Oct 11, 2018

That's an awesome speedup! 👍 Really nicely done, @MrinalJain17

@MrinalJain17 MrinalJain17 changed the title Optimization in reading videos Optimization in reading videos: Using multiple workers Oct 14, 2018
@MrinalJain17 MrinalJain17 removed help wanted Extra attention is needed long-term Feature to be implemented/improved eventually by making small (but significant) changes. labels Oct 14, 2018
@MrinalJain17
Owner Author

Support for multiple workers has been implemented in version 2.2.0. See release notes for more details.

There will always be scope for optimization, and therefore I have opened a new issue (#17) for other ideas to speed up the process.

Appreciate your contribution, @erip. 👍 😄

@MrinalJain17 MrinalJain17 changed the title Optimization in reading videos: Using multiple workers [Optimization] Using multiple workers Oct 14, 2018