
[Optimization] Using multiple workers #10

Closed
MrinalJain17 opened this issue Jul 26, 2018 · 8 comments
Labels
enhancement New feature or request

Comments

@MrinalJain17
Owner

Figure out techniques to optimize the video reading and processing time.

@MrinalJain17 MrinalJain17 added enhancement New feature or request help wanted Extra attention is needed low priority labels Jul 26, 2018
@MrinalJain17 MrinalJain17 added long-term Feature to be implemented/improved eventually by making small (but significant) changes. and removed low priority labels Sep 20, 2018
@erip

erip commented Oct 8, 2018

One proposal (that is admittedly difficult) would be to parallelize Videos.read across workers. The trick is to ensure that the repeatability afforded by the random_state is guaranteed irrespective of the number of workers. Does this seem possible, @MrinalJain17?

@MrinalJain17
Owner Author

MrinalJain17 commented Oct 8, 2018

Hi @erip

Your idea sounds excellent, and it could indeed offer significant speed-ups. I had never considered the multi-processing paradigm here, and therefore overlooked its potential benefits.

Now, I believe that multi-processing is a better option than multi-threading because:

  1. The task here is CPU-bound, so Python's GIL would prevent multi-threading from providing any major performance boost.
  2. Multi-threaded code is quite complex to maintain and prone to bugs.

That said, I do not have much knowledge of the technicalities of multi-processing and multi-threading, and my reasons are based on some blog posts and videos I have read/watched. I'll therefore have to study the caveats of integrating this functionality a bit more.

Coming to the implementation, I am hopeful that it will not be too complicated, thanks to Python's built-in multiprocessing and concurrent.futures modules, which provide a reasonably simple API.

Also, random_state is only used when the frame-selection mode is "random", and the indices of the required frames are produced beforehand, just as in any other mode. It should therefore not be affected by the number of workers.

Thanks for your suggestion. 😃 Once I figure out the way to implement it, I'll post some tests here demonstrating the speed-ups that we may (or may not) achieve.

TODO

  • Support for parallelizing Videos.read()

@erip

erip commented Oct 8, 2018

@MrinalJain17, you might take some inspiration from this question. Caveat: it's both my question and my answer. 😅

@MrinalJain17
Owner Author

@erip , I've found a slightly different way to modify the Videos.read() function to support multiple workers.

from multiprocessing import Pool
import numpy as np
from tqdm import tqdm
from mydia import Videos

path = ["./sample_video/bigbuckbunny.mp4" for _ in range(5)]
reader = Videos()

def read(path, workers=1, chunksize=1):
    list_of_videos = []
    with Pool(processes=workers) as pool:
        with tqdm(total=len(path), unit="videos") as pbar:
            # imap yields results in the order of `path`, keeping the output repeatable
            for result in pool.imap(reader._read_video, path, chunksize=chunksize):
                list_of_videos.append(result)
                pbar.update()
    # The pool is terminated automatically when the `with` block exits,
    # so no explicit pool.join() is needed here
    video_tensor = np.vstack(list_of_videos)

    return video_tensor

video = read(path, workers=4)

This method works, but it needs to be tweaked to get the expected speed-up.

Also, I went through your question on StackOverflow - the suggestion to use r = np.random.RandomState() followed by r.choice(...) is the correct way to seed the generator, and this will be fixed.

However, there was something I was curious about. In the question, you mention: "Given n videos and a random seed, r, how can I ensure that the extracted frames for each video is the same regardless of the number of workers?"

The randomly selected frames depend on the total number of frames in the video. For instance, say we have 2 videos - vid_1 with 100 frames and vid_2 with 50 frames. Selecting 20 frames at random is equivalent to np.random.choice(total_frames, 20, replace=False), so the frame indices would differ between the two videos. However, rereading the same video will give the same result because of random_state.
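To illustrate with a toy helper (hypothetical, mirroring the behaviour described above rather than mydia's internals):

```python
import numpy as np

def frame_indices(total_frames, num_frames, random_state=17):
    # A fresh, seeded generator per video keeps the selection repeatable
    r = np.random.RandomState(random_state)
    return r.choice(total_frames, num_frames, replace=False)

# vid_1 (100 frames) and vid_2 (50 frames) get different index sets...
idx_1 = frame_indices(100, 20)
idx_2 = frame_indices(50, 20)

# ...but rereading the same video reproduces its indices exactly
assert np.array_equal(idx_1, frame_indices(100, 20))
assert np.array_equal(idx_2, frame_indices(50, 20))
```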

Before implementing this functionality, I'll have to test the performance gain that this would provide.

@erip

erip commented Oct 9, 2018

I have the benefit of knowing that my videos are all the same length, all have the same frame rate, and I want to extract the same number of frames from all of them.

@MrinalJain17
Owner Author

@erip , given below are the results showing the performance gain/loss.

Note

  1. The same video was used multiple times.
  2. The values of target_size and num_frames were chosen arbitrarily and have no particular significance.
  3. The code was executed on a c5.4xlarge Amazon EC2 instance (Ubuntu 18.04, Python 3.6).

20 videos, target_size=(224, 224), num_frames=24

No multiprocessing: 4.02 seconds

| No. of processes | Time (in seconds) |
| --- | --- |
| 1 | 4.30 |
| 2 | 2.53 |
| 4 | 1.74 |
| 8 | 1.39 |
| 16 | 1.34 |

100 videos, target_size=(224, 224), num_frames=24

No multiprocessing: 20.8 seconds

| No. of processes | Time (in seconds) |
| --- | --- |
| 1 | 21.30 |
| 2 | 12.40 |
| 4 | 8.01 |
| 8 | 5.87 |
| 16 | 5.45 |

500 videos, target_size=(224, 224), num_frames=24, to_gray=True

No multiprocessing: 107 seconds

| No. of processes | Time (in seconds) |
| --- | --- |
| 1 | 98 |
| 2 | 56.9 |
| 4 | 34.4 |
| 8 | 27 |
| 16 | 25.3 |
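For reference, the speed-ups implied by the 500-video run can be computed directly from the table:

```python
# Timings from the 500-video benchmark above
baseline = 107  # seconds, without multiprocessing
timings = {1: 98, 2: 56.9, 4: 34.4, 8: 27, 16: 25.3}

for workers, seconds in timings.items():
    # Speed-up relative to the non-multiprocessing baseline
    print(f"{workers:>2} processes: {baseline / seconds:.1f}x speed-up")
```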

With 16 processes we are effectively providing 16x more compute power, so one might naively expect a ~16x speed-up - but that is not achievable. The observed gain is ~4x. This is because, when no multi-processing is used, the videos are read internally using list comprehensions, which are highly optimized and responsible for a significant performance boost. When using multi-processing, we fall back to a for loop and lose some of that performance.

Also, the communication between processes adds some overhead, which is evident in the cases where 8 and 16 processes are used.

Once this is implemented, it will guarantee repeatability since the usage of random_state has been fixed.

@erip

erip commented Oct 11, 2018

That's an awesome speedup! 👍 Really nicely done, @MrinalJain17

@MrinalJain17 MrinalJain17 changed the title Optimization in reading videos Optimization in reading videos: Using multiple workers Oct 14, 2018
@MrinalJain17 MrinalJain17 removed help wanted Extra attention is needed long-term Feature to be implemented/improved eventually by making small (but significant) changes. labels Oct 14, 2018
@MrinalJain17
Owner Author

Support for multiple workers has been implemented in version 2.2.0. See release notes for more details.

There will always be scope for optimization, and therefore I have opened a new issue (#17) for other ideas to speed up the process.

Appreciate your contribution, @erip. 👍 😄

@MrinalJain17 MrinalJain17 changed the title Optimization in reading videos: Using multiple workers [Optimization] Using multiple workers Oct 14, 2018