Add timer tool to Profiler #40386

zhangting2020 · 2022-03-09T15:37:01Z

PR types

Function optimization

PR changes

Others

Describe

Add timer tool to Profiler

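Implementation notes

Managing timing events: events are kept on a stack because timing may be nested. For example, eval may be called during train; when eval timing ends, control returns to train, so once the currently recorded event finishes we need to return to the enclosing event.
Automatically excluding eval time from training iterations: in the current model zoo, when eval runs inside train, the start_time of the next train step is reset after eval finishes. The timer tool therefore needs to exclude eval time automatically. Two mechanisms currently ensure this:
- Reset start_time when the event switches:
  - Applies when eval is nested inside train and both use the profiler's timing. Because both events are managed, the switch between them reveals that the timed event has changed.
  - When the eval event ends and control switches back to train, start_time is reset, which removes the eval time.
- At the start of each iter, check whether the current event should be timed; if not, pause timing until the state is restored.
  - Applies when eval is nested inside train but eval does not use the profiler's timing (we cannot control user behavior), so train is the only timed event. Even after eval starts, the timed event has not switched, so the first mechanism does not apply.
  - Check: the function below compares the dataset of the current reader with the dataset of the current event's reader to detect a switch between train and eval. If a switch occurred, timing for train is paused. Once execution returns to train, the reader matches the current event's reader again and timing resumes.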

    def check_if_need_record(self, reader):
        if self.current_event is None:
            return
        if self.current_event.need_record:
            # set reader for the current event at the first iter
            if self.current_event.reader is None:
                self.current_event.reader = reader
            elif self.current_event.reader.__dict__[
                    '_dataset'] != reader.__dict__['_dataset']:
                # a new task was entered without calling begin() to record it.
                # we pause the timer until the new task ends, so that
                # the cost of the new task is not added to the current event.
                # e.g. starting evaluation inside the training task
                self.current_event.need_record = False
        else:
            # when the new task exits, continue timing for the current event.
            if self.current_event.reader.__dict__[
                    '_dataset'] == reader.__dict__['_dataset']:
                self.current_event.need_record = True
                self.hooks['timer_hook'].start_time = timeit.default_timer()
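
The first mechanism above (events on a stack, with start_time reset when control switches back) can be sketched as follows. This is only an illustration under stated assumptions, not the code in this PR: `TimedEvent`, `EventStack`, and the injected `clock` argument are hypothetical names, and a fake clock is used so the effect is easy to follow.

```python
import timeit


class TimedEvent:
    """Hypothetical stand-in for a timed event such as 'train' or 'eval'."""

    def __init__(self, name):
        self.name = name
        self.start_time = None
        self.batch_costs = []


class EventStack:
    """Keeps nested events on a stack; the top of the stack is being timed."""

    def __init__(self, clock=timeit.default_timer):
        self.clock = clock
        self.events = []

    @property
    def current_event(self):
        return self.events[-1] if self.events else None

    def begin(self, name):
        event = TimedEvent(name)
        event.start_time = self.clock()
        self.events.append(event)
        return event

    def step(self):
        # Record one iteration of the current event and restart its timer.
        event = self.current_event
        now = self.clock()
        event.batch_costs.append(now - event.start_time)
        event.start_time = now

    def end(self):
        finished = self.events.pop()
        # Switching back to the outer event: reset its start_time so the
        # time spent in the nested event is not charged to it.
        if self.current_event is not None:
            self.current_event.start_time = self.clock()
        return finished


# Deterministic fake clock: each call returns the next reading.
ticks = iter([0.0, 1.0, 1.0, 6.0, 6.0, 7.0])
stack = EventStack(clock=lambda: next(ticks))

stack.begin("train")      # t = 0.0
stack.step()              # t = 1.0, train step costs 1.0
stack.begin("eval")       # t = 1.0
stack.step()              # t = 6.0, eval step costs 5.0
eval_event = stack.end()  # t = 6.0, train.start_time reset to 6.0
stack.step()              # t = 7.0, train step costs 1.0 rather than 6.0
train_costs = stack.current_event.batch_costs
print(train_costs)        # [1.0, 1.0]
```

Without the reset in `end()`, the second train step would be measured from t = 1.0 and would absorb the 5 seconds spent in eval.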

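Timer error: note that the timing code does not use Python's time module as the timer; it uses timeit (timeit.default_timer()), which automatically selects a high-precision timer for each platform.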

The standard time.time() function provides sub-second precision, though that precision varies by platform.
On Linux and Mac the precision is +- 1 microsecond, or 0.001 milliseconds.
Python on Windows has a precision of +- 16 milliseconds because of clock implementation problems caused by process interrupts.
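
To see what the two clocks report on a given machine, a small check like the one below can help. It uses only the standard library; the printed resolutions depend on the platform, so no expected output is shown.

```python
import time
import timeit

# Since Python 3.3, timeit.default_timer is an alias of time.perf_counter.
uses_perf_counter = timeit.default_timer is time.perf_counter

# Compare what each clock reports about itself on the current platform.
for name in ("time", "perf_counter"):
    info = time.get_clock_info(name)
    print(f"{name}: resolution={info.resolution}, monotonic={info.monotonic}")

# Typical usage: the difference of two readings is the elapsed time in seconds.
start = timeit.default_timer()
sum(range(10000))
elapsed = timeit.default_timer() - start
print(f"elapsed: {elapsed:.6f} s")
```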

Results

profiler.step(num_samples=None): speed is reported in steps/s

Iter 0:  reader_cost: 0.76443 s batch_cost: 0.97205 s ips: 1.029 steps/s
Iter 10:  reader_cost: 0.00013 s batch_cost: 0.00431 s ips: 232.195 steps/s
============================================Perf Summary============================================
Reader_ratio: 2.655%
Time unit: s, IPS unit: steps/s
|                 |       avg       |       max       |       min       |
|   reader_cost   |     0.00010     |     0.00012     |     0.00007     |
|    batch_cost   |     0.00392     |     0.00419     |     0.00315     |
|       ips       |    255.41523    |    317.35774    |    238.74872    |

profiler.step(num_samples=N): speed is reported in samples/s by default

============================================Perf Summary============================================
Reader Ratio: 2.587%
Time Unit: s, IPS Unit: samples/s
|                 |       avg       |       max       |       min       |
|   reader_cost   |     0.00008     |     0.00010     |     0.00005     |
|    batch_cost   |     0.00305     |     0.00314     |     0.00248     |
|       ips       |    1441.33261   |    1615.92509   |    1432.74847   |

profiler.step(num_samples=N) with profiler.step_info(unit='images'): speed is reported in the user-specified sample unit, here images/s

Iter 0:  reader_cost: 0.61120 s batch_cost: 0.72221 s ips: 5.539 images/s
Iter 10:  reader_cost: 0.00014 s batch_cost: 0.00439 s ips: 910.991 images/s
============================================Perf Summary============================================
Reader Ratio: 2.598%
Time Unit: s, IPS Unit: images/s
|                 |       avg       |       max       |       min       |
|   reader_cost   |     0.00011     |     0.00013     |     0.00008     |
|    batch_cost   |     0.00405     |     0.00431     |     0.00331     |
|       ips       |    1086.70453   |    1206.85083   |    927.93254    |
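
How the summary fields relate to each other can be sketched as below. This rests on assumptions rather than on the PR's code: ips is taken to be the number of samples per iteration (or 1 step when num_samples is None) divided by batch_cost, and the reader ratio is taken to be average reader_cost over average batch_cost. `summarize` is a hypothetical helper and the input costs are made-up numbers.

```python
def summarize(reader_costs, batch_costs, num_samples=None):
    """Hypothetical helper: avg/max/min statistics from per-iteration costs."""
    # With num_samples=None each iteration counts as one step.
    per_iter = 1 if num_samples is None else num_samples
    ips = [per_iter / cost for cost in batch_costs]

    def stats(values):
        return {
            "avg": sum(values) / len(values),
            "max": max(values),
            "min": min(values),
        }

    reader, batch = stats(reader_costs), stats(batch_costs)
    return {
        # Share of each batch spent waiting on the reader, in percent.
        "reader_ratio": 100.0 * reader["avg"] / batch["avg"],
        "reader_cost": reader,
        "batch_cost": batch,
        "ips": stats(ips),
    }


summary = summarize([0.001, 0.001], [0.004, 0.005], num_samples=4)
print("Reader Ratio: {:.3f}%".format(summary["reader_ratio"]))
print("ips avg/max/min:", summary["ips"])
```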

doc

paddle-bot-old · 2022-03-09T15:37:32Z

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

Xreki · 2022-03-10T05:00:59Z

python/paddle/profiler/profiler.py

@@ -371,11 +383,14 @@ def stop(self):
            if self.on_trace_ready:
                self.on_trace_ready(self)

-    def step(self):
+    def step(self, num_samples=None):


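Should it be possible to specify the unit here? Or should it be a parameter of step_info?

This has been changed so that the sample unit is specified through the unit parameter of step_info. For example, with unit='images' the throughput is reported in images/s.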

paddle-bot-old · 2022-03-19T02:36:47Z

Sorry to inform you that 27f7dc2's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

rainyfly

LGTM

rainyfly · 2022-03-24T03:02:07Z

The times in this list are averages. I think the field names could make that clearer; otherwise they read like totals.

zhangting2020 · 2022-03-24T04:48:27Z

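The times in this list are averages. I think the field names could make that clearer; otherwise they read like totals.

After discussing this with @Xreki: the step_info API documentation already states that the statistics are averages over the iterations between two calls, so no extra avg field will be added.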

Xreki · 2022-03-26T08:57:23Z

python/paddle/fluid/dataloader/dataloader_iter.py

@@ -256,6 +257,8 @@ def __next__(self):
            event_type=profiler.TracerEventType.Dataloader)
        trace_event.begin()
        try:
+            benchmark().check_if_need_record(self)
+            benchmark().before_reader()


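Would it be better to call check_if_need_record automatically inside before_reader?

check_if_need_record takes a reader argument, so its parameter list differs from that of the basic hook interfaces. To keep those interfaces consistent, this function is kept separate.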

Xreki · 2022-03-26T09:00:46Z

python/paddle/profiler/profiler.py

@@ -253,6 +254,8 @@ class Profiler:
            which means profiling range [start_batch, end_batch).
        on_trace_ready (callable): callable object, takes the Profiler object as parameter, which provides a way for users to do post-processing.
            This callable object will be called when ``scheduler`` returns ``ProfilerState.RECORD_AND_RETURN``.
+        timer_only (bool): If it is True, the cost of Dataloader and every step of the model will be count without profiling. Otherwise, the model will
+            be timed and profiled.


Please document the default value.

Xreki · 2022-03-26T09:01:54Z

python/paddle/profiler/profiler.py

+                    p.step(num_samples=BATCH_SIZE)
+                    if i % 10 == 0:
+                        step_info = p.step_info(unit='images')
+                        print("Iter {}: {}".format(i, step_info))


Show the output in a comment.

Xreki · 2022-03-26T09:02:47Z

python/paddle/profiler/profiler.py

+                simple_net = SimpleNet()
+                opt = paddle.optimizer.SGD(learning_rate=1e-3,
+                                           parameters=simple_net.parameters())
+                BATCH_SIZE = 4


This naming style is not common in Python, is it? batch_size?

This is a constant name, and Python code style generally uses uppercase for constants.

Xreki · 2022-03-26T09:18:01Z

python/paddle/profiler/timer.py

+            return None
+
+
+class Hook:


Should Python classes all inherit from object?
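
For context on this question: in Python 3 a class declared without bases already inherits from object, so the two spellings below are equivalent. The class names are hypothetical and the check is only a quick illustration.

```python
class Hook:
    """Declared without an explicit base class."""


class ExplicitHook(object):
    """Declared with object as an explicit base class."""


implicit_bases = Hook.__bases__
explicit_bases = ExplicitHook.__bases__
print(implicit_bases == explicit_bases == (object,))
```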

Xreki · 2022-03-26T11:49:10Z

python/paddle/profiler/timer.py

+        self.print_stats('batch_cost', summary['batch_summary'])
+        self.print_stats('ips', summary['ips_summary'])
+
+    def print_stats(self, item, message_dict):


Internal functions can take a _ prefix.

Xreki · 2022-03-26T11:50:55Z

python/paddle/profiler/timer.py

+        self.reader = None
+        self.need_record = True
+        self.speed_mode = 'samples/s'
+        self.speed_unit = 'samples/s'


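Why are both speed_mode and speed_unit needed?

speed_mode is the mode in which speed is expressed: it determines whether performance is reported as throughput (samples/s) or as steps/s, and it depends on whether num_samples is set.

speed_unit is the unit of the speed. It depends on the sample unit the user passes to step_info, and it only takes effect when speed_mode is samples/s, where it produces throughput units such as images/s or words/s.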
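
The split described in this reply can be sketched as a small function. `resolve_speed_unit` is a hypothetical name used for illustration, not a function from timer.py; the behavior follows the description here (the user-supplied unit applies only in samples/s mode).

```python
def resolve_speed_unit(num_samples=None, unit=None):
    """Hypothetical helper mirroring the speed_mode / speed_unit split."""
    # speed_mode depends only on whether num_samples was passed to step().
    speed_mode = "steps/s" if num_samples is None else "samples/s"
    # speed_unit starts out equal to the mode; a user-supplied unit is
    # applied only in samples/s mode.
    speed_unit = speed_mode
    if speed_mode == "samples/s" and unit is not None:
        speed_unit = unit + "/s"
    return speed_mode, speed_unit


print(resolve_speed_unit())                          # ('steps/s', 'steps/s')
print(resolve_speed_unit(num_samples=4))             # ('samples/s', 'samples/s')
print(resolve_speed_unit(num_samples=4, unit="images"))  # ('samples/s', 'images/s')
```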

Xreki · 2022-03-26T11:56:29Z

python/paddle/profiler/timer.py

+            return
+        reader_cost = timeit.default_timer() - self.start_reader
+        benchmark.current_event.record_reader(reader_cost)
+        if benchmark.current_event.total_iters >= benchmark.current_event.skip_iter:


Would this logic fit better inside Event?

Xreki · 2022-03-26T11:56:39Z

python/paddle/profiler/timer.py

+            return
+        batch_cost = timeit.default_timer() - self.start_time
+        benchmark.current_event.record_batch(batch_cost, benchmark.num_samples)
+        if benchmark.current_event.total_iters >= benchmark.current_event.skip_iter:


Would this logic fit better inside Event?

Xreki · 2022-03-26T11:58:55Z

python/paddle/profiler/profiler.py

+                    #train()
+                    prof.step()
+                    if iter % 10 == 0:
+                        print(prof.step_info())


Show the printed output in a comment.

Xreki · 2022-03-28T09:44:58Z

python/paddle/profiler/profiler.py

+                # printed when the "step_info" is called at 10 iteration intervals.
+                # The values you get may be different from the following.
+                # Iter 0:  reader_cost: 0.51946 s batch_cost: 0.66077 s ips: 6.054 images/s
+                # Iter 10:  reader_cost: 0.00014 s batch_cost: 0.00441 s ips: 907.009 images/s


Put the per-iter log after L370 and the summary log after L371, to show which function produces which log.

Xreki · 2022-03-28T09:50:12Z

python/paddle/profiler/profiler.py

+                        print("Iter {}: {}".format(iter, prof.step_info()))
+                prof.stop()
+
+                # The example does not call the DataLoader, so there is no "reader_cost".


Xreki

LGTM

TCChenlong · 2022-03-28T11:50:39Z

python/paddle/profiler/profiler.py

+
+        Returns:
+            string: A string representing the statistic.
+        Examples:


Please add a blank line before Examples

TCChenlong · 2022-03-28T11:51:26Z

python/paddle/profiler/profiler.py

+
+                import paddle
+                import paddle.profiler as profiler
+                import numpy as np


Please use Paddle's API to create the Tensor instead of numpy

TCChenlong

LGTM
TODO: Fix docs

Xreki

LGTM

TCChenlong

LGTM for API docs

Xreki reviewed Mar 10, 2022

zhangting2020 force-pushed the benchmark branch from 27f7dc2 to e5367af Compare March 23, 2022 09:29

zhangting2020 requested a review from rainyfly March 23, 2022 10:34

rainyfly previously approved these changes Mar 24, 2022

zhangting2020 dismissed rainyfly’s stale review via 1b01098 March 24, 2022 04:08

zhangting2020 force-pushed the benchmark branch from 1b01098 to 3c45e1c Compare March 24, 2022 11:48

zhangting2020 force-pushed the benchmark branch from 275ba57 to 51088e9 Compare March 25, 2022 03:42

Xreki reviewed Mar 26, 2022

zhangting2020 added 11 commits March 28, 2022 09:12

add benchmark tools

f83d4e5

add event

15834d0

add timer tools to profiler

cee3909

remove utils.benchmark

2646a3a

add timer_only arg for Profiler and unittests

ec58a87

polish doc

742d853

add unit args for step_info

ac0fcd5

reader is changed when an epoch end, so fix the check_if_need_record

1e9b0ff

add code examples

3a2f814

fix code example

4830f9b

fix code format error

57aafa9

zhangting2020 added 2 commits March 28, 2022 09:15

fix conflict

ba6c6e6

fix get_summary

f6fc880

zhangting2020 force-pushed the benchmark branch 3 times, most recently from d254a09 to 1ce7507 Compare March 28, 2022 09:26

Xreki reviewed Mar 28, 2022

polish the code

99661bd

zhangting2020 force-pushed the benchmark branch from 1ce7507 to 99661bd Compare March 28, 2022 10:29

zhangting2020 requested review from TCChenlong and lanxianghit March 28, 2022 10:38

Xreki previously approved these changes Mar 28, 2022

TCChenlong reviewed Mar 28, 2022

TCChenlong previously approved these changes Mar 28, 2022

zhangting2020 dismissed stale reviews from TCChenlong and Xreki via f0f1cdd March 28, 2022 12:25

zhangting2020 force-pushed the benchmark branch from f0f1cdd to 3fb9373 Compare March 28, 2022 12:27

zhangting2020 force-pushed the benchmark branch from 3fb9373 to 5cc1bcd Compare March 28, 2022 15:40

fix doc and code format

81ac3a6

zhangting2020 force-pushed the benchmark branch from 5cc1bcd to 81ac3a6 Compare March 29, 2022 01:50

Xreki approved these changes Mar 29, 2022

TCChenlong approved these changes Mar 29, 2022

lanxianghit approved these changes Mar 29, 2022

zhangting2020 merged commit 83efeea into PaddlePaddle:develop Mar 30, 2022
