Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timer tool to Profiler #40386

Merged
merged 15 commits into from
Mar 30, 2022
Merged

Conversation

zhangting2020
Copy link
Contributor

@zhangting2020 zhangting2020 commented Mar 9, 2022

PR types

Function optimization

PR changes

Others

Describe

Add timer tool to Profiler

实现说明

  • 计时事件的管理:使用stack,因为可能会出现嵌套计时的需求,比如在train的过程中调用eval,eval计时结束,又回到train,所以需要在当前记录的事件结束后,回到上一层事件
  • 自动从训练的迭代中,去除eval过程的时间:考虑到当前模型库中,如果train的过程中,加入了eval,是在eval结束后,重置train的下一个step的start_time的。因此计时工具需要自动去除掉eval的时间,目前有2种机制可以保障:
    • event切换时,重置start_time:
      • 适用于train的过程中,嵌套了eval,同时都用了profiler的计时功能。此类场景下,由于2个event都被管理,是能够通过event的切换,知道计时事件发生了改变。
      • 在eval事件结束时,切回train的时候,重置start_time,也就去除了eval的时间
    • 每个iter开始时,判断当前事件是否需要计时,如果不需要计时,就暂停计时,直到状态恢复。
      • 适用于train的过程中嵌套了eval,但是eval不采用profiler计时(因为我们无法控制用户行为),此时计时事件只有train。即便进入了eval,由于计时的event并没有发生切换,第一种机制会失效。
      • 检查机制:通过以下函数,检查当前的reader,和当前事件的reader的dataset是否相同,来确认是否发生了train和eval的切换。如果发生了,则暂停train的计时。直到重新回到train事件,reader则会与当前事件的reader一致,则继续计时。
    def check_if_need_record(self, reader):
        if self.current_event is None:
            return
        if self.current_event.need_record:
            # set reader for the current event at the first iter
            if self.current_event.reader is None:
                self.current_event.reader = reader
            elif self.current_event.reader.__dict__[
                    '_dataset'] != reader.__dict__['_dataset']:
                # enter a new task but not calling beign() to record it.
                # we pause the timer until the end of new task, so that 
                # the cost of new task is not added to the current event.
                # eg. start evaluation in the traing task
                self.current_event.need_record = False
        else:
            # when the new task exits, continue timing for the current event.
            if self.current_event.reader.__dict__[
                    '_dataset'] == reader.__dict__['_dataset']:
                self.current_event.need_record = True
                self.hooks['timer_hook'].start_time = timeit.default_timer()
  • 计时器的误差:需要注意的是,计时功能中并没有使用python的timer作为计时器,而是采用timerit,能自动根据不同平台选用高精度的计时器。
The standard time. time() function provides sub-second precision, though that precision varies by platform.
For Linux and Mac precision is +- 1 microsecond or 0.001 milliseconds. 
Python on Windows uses +- 16 milliseconds precision due to clock implementation problems due to process interrupts.

结果展示

  • profier.step(num_samples=None),使用step/s表示速度
Iter 0:  reader_cost: 0.76443 s batch_cost: 0.97205 s ips: 1.029 steps/s
Iter 10:  reader_cost: 0.00013 s batch_cost: 0.00431 s ips: 232.195 steps/s
============================================Perf Summary============================================
Reader_ratio: 2.655%
Time unit: s, IPS unit: steps/s
|                 |       avg       |       max       |       min       |
|   reader_cost   |     0.00010     |     0.00012     |     0.00007     |
|    batch_cost   |     0.00392     |     0.00419     |     0.00315     |
|       ips       |    255.41523    |    317.35774    |    238.74872    |
  • profiler.step(num_samples=N),默认使用samples/s表示速度
============================================Perf Summary============================================
Reader Ratio: 2.587%
Time Unit: s, IPS Unit: samples/s
|                 |       avg       |       max       |       min       |
|   reader_cost   |     0.00008     |     0.00010     |     0.00005     |
|    batch_cost   |     0.00305     |     0.00314     |     0.00248     |
|       ips       |    1441.33261   |    1615.92509   |    1432.74847   |
  • profiler.step(num_samples=N), profiler.step_info(unit='images'),使用用户指定的输入样本单位,即images/s表示速度
Iter 0:  reader_cost: 0.61120 s batch_cost: 0.72221 s ips: 5.539 images/s
Iter 10:  reader_cost: 0.00014 s batch_cost: 0.00439 s ips: 910.991 images/s
============================================Perf Summary============================================
Reader Ratio: 2.598%
Time Unit: s, IPS Unit: images/s
|                 |       avg       |       max       |       min       |
|   reader_cost   |     0.00011     |     0.00013     |     0.00008     |
|    batch_cost   |     0.00405     |     0.00431     |     0.00331     |
|       ips       |    1086.70453   |    1206.85083   |    927.93254    |

doc

screencapture-10-136-157-23-8090-documentation-docs-en-api-paddle-profiler-Profiler-en-html-2022-03-29-11_29_00

@paddle-bot-old
Copy link

paddle-bot-old bot commented Mar 9, 2022

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@@ -371,11 +383,14 @@ def stop(self):
if self.on_trace_ready:
self.on_trace_ready(self)

def step(self):
def step(self, num_samples=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是否需要支持允许指定单位?或者作为step_info的参数?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

目前修改为在step_info中,通过unit参数指定样本单位,比如unit='images',吞吐量则表示为images/s

@paddle-bot-old
Copy link

Sorry to inform you that 27f7dc2's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

rainyfly
rainyfly previously approved these changes Mar 24, 2022
Copy link
Contributor

@rainyfly rainyfly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rainyfly
Copy link
Contributor

image

这个列表清单的时间是 average的,感觉用的字段可以表述清楚些,不然会以为是total的感觉

@zhangting2020
Copy link
Contributor Author

image

这个列表清单的时间是 average的,感觉用的字段可以表述清楚些,不然会以为是total的感觉

@Xreki 讨论了下,这块step_indo,在API中已经说明了统计的是2次调用之间经过的迭代的均值,所以不再额外增加avg字段了。

@@ -256,6 +257,8 @@ def __next__(self):
event_type=profiler.TracerEventType.Dataloader)
trace_event.begin()
try:
benchmark().check_if_need_record(self)
benchmark().before_reader()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check_if_need_record是否在before_reader中自动调用比较好?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

因为check_if_need_record要接受一个reader参数,和hook的几个基本接口的参数列表不一样。这块为了保持接口的一致性,这个函数单独抽出来

@@ -253,6 +254,8 @@ class Profiler:
which means profiling range [start_batch, end_batch).
on_trace_ready (callable): callable object, takes the Profiler object as parameter, which provides a way for users to do post-processing.
This callable object will be called when ``scheduler`` returns ``ProfilerState.RECORD_AND_RETURN``.
timer_only (bool): If it is True, the cost of Dataloader and every step of the model will be count without profiling. Otherwise, the model will
be timed and profiled.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

写下默认值。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

p.step(num_samples=BATCH_SIZE)
if i % 10 == 0:
step_info = p.step_info(unit='images')
print("Iter {}: {}".format(i, step_info))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

用注释的方式写下输出。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

simple_net = SimpleNet()
opt = paddle.optimizer.SGD(learning_rate=1e-3,
parameters=simple_net.parameters())
BATCH_SIZE = 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python里面不怎么用这种命名风格吧?batch_size

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

因为这里是一个常量名,python代码风格一般是用大写的

return None


class Hook:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python类型最好都继承object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

self.print_stats('batch_cost', summary['batch_summary'])
self.print_stats('ips', summary['ips_summary'])

def print_stats(self, item, message_dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

内部函数可以加_前缀

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

self.reader = None
self.need_record = True
self.speed_mode = 'samples/s'
self.speed_unit = 'samples/s'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为啥要设置speed_modespeed_unit两个?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

speed_mode是速度的表示模式,用来确定是采用吞吐量samples/s, 还是steps/s给出性能,它取决于num_samples的设置。

speed_unit是速度的单位,取决于用户在step_info调用时自定义的样本单位,这个值仅仅在speed_mode为samples/s时生效,用于产生images/s, words/s等不同的吞吐量单位

return
reader_cost = timeit.default_timer() - self.start_reader
benchmark.current_event.record_reader(reader_cost)
if benchmark.current_event.total_iters >= benchmark.current_event.skip_iter:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个逻辑会不会放在Event里面比较好?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return
batch_cost = timeit.default_timer() - self.start_time
benchmark.current_event.record_batch(batch_cost, benchmark.num_samples)
if benchmark.current_event.total_iters >= benchmark.current_event.skip_iter:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个逻辑会不会放在Event里面比较好?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

#train()
prof.step()
if iter % 10 == 0:
print(prof.step_info())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

以注释的方式,写一下打印出来的内容

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@zhangting2020 zhangting2020 force-pushed the benchmark branch 3 times, most recently from d254a09 to 1ce7507 Compare March 28, 2022 09:26
# printed when the "step_info" is called at 10 iteration intervals.
# The values you get may be different from the following.
# Iter 0: reader_cost: 0.51946 s batch_cost: 0.66077 s ips: 6.054 images/s
# Iter 10: reader_cost: 0.00014 s batch_cost: 0.00441 s ips: 907.009 images/s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

每个iter的log放在L370之后,总的log放在L371之后,表明哪个函数产生哪样的log。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

print("Iter {}: {}".format(iter, prof.step_info()))
prof.stop()

# The example does not call the DataLoader, so there is no "reader_cost".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同上。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Xreki
Xreki previously approved these changes Mar 28, 2022
Copy link
Contributor

@Xreki Xreki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


Returns:
string: A string representing the statistic.
Examples:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a black line before Examples

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


import paddle
import paddle.profiler as profiler
import numpy as np
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Paddle's API create Tensor instead of numpy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

TCChenlong
TCChenlong previously approved these changes Mar 28, 2022
Copy link
Contributor

@TCChenlong TCChenlong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
TODO:Fix docs

Copy link
Contributor

@Xreki Xreki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@TCChenlong TCChenlong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for API docs

@zhangting2020 zhangting2020 merged commit 83efeea into PaddlePaddle:develop Mar 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants