This repository has been archived by the owner on Nov 28, 2020. It is now read-only.

Special Meeting - PGO deep dive #91

Closed
mhdawson opened this issue Jan 12, 2017 · 66 comments

Comments

@mhdawson
Member

mhdawson commented Jan 12, 2017

Who

@nodejs/benchmarking

When

Wed Feb 15, 1PM PST

Where

link for participants: https://hangouts.google.com/hangouts/_/ytl/NH_6tyz8abYim69liT4bERGkALuos4J4U8Wl8QJDTBw=?eid=100598160817214911030

For those who just want to watch: http://youtu.be/pC0Boxl2EPA

youtube admin page: https://www.youtube.com/my_live_events?filter=scheduled

Agenda

Deep dive presentation from Kunal on PGO (profile-guided optimization). He has done some investigation and will give us an update on what PGO is, how it works, and the results he has seen so far. He is looking at it based on the initial interest shown in nodejs/node#1409.

@mhdawson
Member Author

@bnoordhuis @rvagg you might be interested in this based on your past interest in issue 1409.

@bnoordhuis
Member

I don't really have anything to contribute. The big blocker was (and presumably is) realistic benchmarks.

@mhdawson
Member Author

@nodejs/benchmarking we need more people to fill in the doodle with their availability.

@mhdawson
Member Author

@williamkapke can you add this to the general Node.js calendar?

@williamkapke
Contributor

Added! 19:00 GMT on 2017-02-02

@mhdawson
Member Author

mhdawson commented Feb 2, 2017

Meeting is live, just waiting for participants to join.

@gareth-ellis
Member

Trying to join....

@mhdawson
Member Author

mhdawson commented Feb 2, 2017

@kunalspathak are you going to join?

@mhdawson
Member Author

mhdawson commented Feb 2, 2017

Our presenter could not make it, so we need to reschedule.

@kunalspathak could you send out a new doodle to pick a new time?

@kunalspathak
Member

@mhdawson, sorry about that. For some reason, I didn't see any updates from this thread in my mailbox, so I didn't realize the meeting was scheduled for today. I will schedule a meeting sometime next week.

@kunalspathak
Member

@mhdawson
Member Author

mhdawson commented Feb 2, 2017

Did you turn on time zone support? I can't tell whether the times are in my local time zone (EST) or something else.

@kunalspathak
Member

kunalspathak commented Feb 2, 2017

I didn't update the time zone when I opened my account. I did it now and reconfigured the dates. Can you see the right time zone now?

@mhdawson
Member Author

mhdawson commented Feb 3, 2017

I think you have to enable it as part of creating the doodle, and I don't think you can do it after the fact.

The first time I see is 11 AM on Thursday but no info on timezone.

@kunalspathak
Member

Can you try this?

http://doodle.com/poll/ukkknw3rb5d644xs

@gareth-ellis
Member

I can see time zone on that :)

@mhdawson
Member Author

mhdawson commented Feb 7, 2017

That worked for me as well. Filled in my availability.

@kunalspathak
Member

@nodejs/benchmarking, we need more people to participate.

@ThePrimeagen
Member

ThePrimeagen commented Feb 8, 2017 via email

@gareth-ellis
Member

Have replied (again). Seems my first response was lost

@kunalspathak
Member

@michaelbpaulson, it looks like the best time we can have this is 2/10 at 11 AM PST. Is it possible for you to join then?

@ThePrimeagen
Member

ThePrimeagen commented Feb 9, 2017 via email

@kunalspathak
Member

@mhdawson, I have closed the poll and scheduled the meeting for Feb 10th at 11 AM PST. Can you set up the hangout page and live streaming?

@mhdawson
Member Author

mhdawson commented Feb 9, 2017

Updated hangout to reschedule for Feb 10 11AM. We should be good to go. If there are any problems starting the event/recording I'll post in this issue as we start the meeting.

@kunalspathak
Member

Updated hangout to reschedule for Feb 10 11AM

I see in the issue comment above that it shows Thursday Feb 9, 11AM EST. Maybe you want to correct the date/time?

@sathvikl
Contributor

@kunalspathak I have a conflict at that time.

Below are the PGO instructions I tried on Ubuntu 14.10 (gcc (Ubuntu 4.9.1-16ubuntu6) 4.9.1) and
Ubuntu 16.04 (gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0).

To build Node with PGO instrumentation counters, apply this patch:

diff --git a/common.gypi b/common.gypi
index 811a7b3..3747b8d 100644
--- a/common.gypi 
+++ b/common.gypi 
@@ -253,8 +253,9 @@ 
              'ldflags': [ '-mx32' ], 
            }], 
            [ 'target_arch=="x64"', { 
-            'cflags': [ '-m64' ], 
-            'ldflags': [ '-m64' ], 
+            'cflags': [ '-m64','-fprofile-generate'], 
+            'ldflags': [ '-m64', '-lgcov' ], 
+            'libraries': ['-lgcov'] 
            }], 
            [ 'target_arch=="ppc" and OS!="aix"', { 
              'cflags': [ '-m32' ], 

This will create the following directories:
node/out/Release/obj
node/out/Release/obj.host
node/out/Release/obj.target
After the instrumented binary has run the training workload, obj.target will contain the .gcda profile files alongside the .o files.

To use the profile data and build a binary optimized for the workload, apply the patch below:

diff --git a/common.gypi b/common.gypi
index 811a7b3..3747b8d 100644
--- a/common.gypi 
+++ b/common.gypi 
@@ -253,8 +253,9 @@ 
              'ldflags': [ '-mx32' ], 
            }], 
            [ 'target_arch=="x64"', { 
-            'cflags': [ '-m64' ], 
-            'ldflags': [ '-m64' ], 
+            'cflags': [ '-m64','-fprofile-use', '-fprofile-correction',  '-Wno-error=coverage-mismatch'], 
+            'ldflags': [ '-m64', '-lgcov' ], 
+            'libraries': ['-lgcov'] 
            }], 
            [ 'target_arch=="ppc" and OS!="aix"', { 
              'cflags': [ '-m32' ], 

-fprofile-correction is needed because Node is multithreaded, so the collected counters can be inconsistent. -Wno-error=coverage-mismatch is currently needed because code relocations take place in V8.
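
The two patches differ only in the x64 compiler flags, so the generate/use cycle can be summarized in a small helper. This is a minimal sketch, not part of Node's build system: `pgo_flags` and `BASE_FLAGS` are hypothetical names, and the flag values are simply copied from the two patches above.

```python
# Hypothetical helper (not part of Node's build): returns the x64 flag sets
# that the two common.gypi patches above put in place.
BASE_FLAGS = ['-m64']


def pgo_flags(phase):
    """Return cflags/ldflags/libraries for one PGO phase.

    phase: 'generate' builds the instrumented binary,
           'use' rebuilds with the collected .gcda profiles.
    """
    if phase == 'generate':
        extra_cflags = ['-fprofile-generate']
    elif phase == 'use':
        extra_cflags = ['-fprofile-use', '-fprofile-correction',
                        '-Wno-error=coverage-mismatch']
    else:
        raise ValueError("phase must be 'generate' or 'use'")
    return {
        'cflags': BASE_FLAGS + extra_cflags,
        'ldflags': BASE_FLAGS + ['-lgcov'],
        'libraries': ['-lgcov'],
    }


if __name__ == '__main__':
    # Print both flag sets so they can be compared with the patches.
    for phase in ('generate', 'use'):
        print(phase, pgo_flags(phase))
```

The intended order is: build with the 'generate' flags, run the training workload with that binary, then rebuild with the 'use' flags.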

@kunalspathak
Member

Thanks @sathvikl for the instructions. I appreciate it.

@mhdawson
Member Author

Hangouts is up and running, just waiting for presenter and others to join.

@mhdawson
Member Author

Trying to see if there is a way to get a hold of @kunalspathak

@gareth-ellis
Member

Which time zone is this supposed to be, EST or PST? I understood it to be PST, so in another 3 hours' time?

@CurryKitten
Contributor

@mhdawson Unfortunately, I can't make those times in the new doodle, so I'll watch later on YouTube

@kunalspathak
Member

@mhdawson , can you schedule the hangout meeting for Feb 15th, 1 PM PST ?

@mhdawson
Member Author

Ok meeting and info above updated to Feb 15th 1PM PST

@kunalspathak
Member

@mhdawson , the title again says 1 PM EST. It should be 1 PM PST. 😃

@gareth-ellis
Member

Updated it

@mhdawson
Member Author

@gareth-ellis thanks.

@mhdawson
Member Author

@mhdawson
Member Author

New link for those that want to watch http://youtu.be/pC0Boxl2EPA

Old link was not giving option to record.

@mhdawson
Member Author

Just waiting for Kunal and then we'll get started.

@kunalspathak
Member

I am getting an error when trying to access the hangout link.

@mhdawson
Member Author

We posted a new link; are you using that? Gareth is in.

@kunalspathak
Member

@mhdawson
Member Author

My only other suggestion is to try a different browser, as people seem to have intermittent problems getting in.

@mhdawson
Member Author

Looks like the right link to me.

@kunalspathak
Member

Errors on Edge, Chrome and Firefox.

@kunalspathak
Member

error

@mhdawson
Member Author

Do you have a different machine you can try? I know that has fixed it for some people.

@kunalspathak
Member

Let me try.

@gareth-ellis
Member

That looks like the sort of error I've had in the past. I am however using chrome now (same machine that I previously had issue with)

@mhdawson
Member Author

Ok starting up the meeting now.

@mhdawson
Member Author

mhdawson commented Mar 1, 2017

Meeting was held, closing.

@mhdawson mhdawson closed this as completed Mar 1, 2017
@kunalspathak
Member

One of the follow-ups from this meeting was to evaluate the impact of PGO on the core benchmarks when only a minimal training set is used. So I ran an experiment using TechEmpower and acme-air as the training set and compared the performance gain we get with PGO. Here are the results. To summarize, I do see a similar perf win in the core benchmarks.

@sathvikl
Contributor

sathvikl commented Mar 8, 2017

@kunalspathak Thanks for posting.

Which compiler/version/OS was this on?

When you do PGO, are you seeing coverage errors? I'm working on addressing that.

Are you running the PGO-trained binary, trained/profiled with TechEmpower and acme-air, on all these benchmarks? If you train the PGO build with array.js (as an example) and run the array.js benchmarks, what difference in improvement would you observe?

@kunalspathak
Member

Which compiler/version/OS was this on?

I should have mentioned this. This is on Windows, VS2015, based on nodejs/node@b26a469

When you do PGO, are you seeing coverage errors? I'm working on addressing that.

No, I didn't see any. What type of coverage errors do you mean?

Are you running the PGO-trained binary, trained/profiled with TechEmpower and acme-air, on all these benchmarks?

The binaries were trained using TechEmpower and acme-air. Once I had produced the binaries, I ran a no-PGO vs. PGO comparison on the core benchmarks.

If you train the PGO build with array.js (as an example) and run the array.js benchmarks, what difference in improvement would you observe?

I didn't try this specifically, but I am sure it would show improvements.

@sathvikl
Contributor

sathvikl commented Mar 8, 2017

With GCC 5.x I was getting coverage errors, meaning the code was relocated, so I needed the -Wno-error=coverage-mismatch flag. I saw a Bugzilla ticket where the same issue was fixed for Mozilla's engine.

It would be great if PGO-trained binaries could give an overall 10% improvement for all frameworks.

I meant to explain that micro-benchmarks run on binaries trained with those same micro-benchmarks would probably show an even higher improvement. So it would be good to know how much performance is lost by using a binary trained with acme-air and TechEmpower instead.

Either way, if there is no performance regression on any framework, then the release binaries of Node.js could possibly be built with PGO.

Does VS 2015 give a summary report of what changes were made as a result of the profile information? If you have it, can you please share it?

@kunalspathak
Member

Does VS 2015 give a summary report of what changes were done as a result of Profile information?

It does give a summary of the number of functions optimized, etc., which I don't find useful, except that I feel good if more functions were optimized 😄

@gareth-ellis was also curious in the PGO meeting whether we can get the actual names of the functions that were optimized. I will follow up with the VC++ team and get back to you on it.

  0 of 0 ( 0.0%) original invalid call sites were matched.
  0 new call sites were added.
  240 of 112751 (  0.21%) profiled functions will be compiled for speed, and the rest of the functions will be compiled for size
  544530 of 1575974 inline instances were from dead/cold paths
  112749 of 112751 functions (100.0%) were optimized using profile data, and the rest of the functions were optimized without using profile data
  7861923896240 of 7861923896240 instructions (100.0%) were optimized using profile data, and the rest of the instructions were optimized without using profile data
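
The percentages in this summary can be recomputed from the raw counts, which also gives a figure for the one line that has no percentage printed. This is a minimal sketch; the counts are copied from the output above, and the variable and function names are only illustrative.

```python
# Counts copied from the linker summary above.
FUNCS_COMPILED_FOR_SPEED = 240
FUNCS_PROFILED = 112751
FUNCS_OPTIMIZED_WITH_PROFILE = 112749
INLINES_FROM_COLD_PATHS = 544530
INLINE_INSTANCES = 1575974


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


speed_pct = percent(FUNCS_COMPILED_FOR_SPEED, FUNCS_PROFILED)
profile_opt_pct = percent(FUNCS_OPTIMIZED_WITH_PROFILE, FUNCS_PROFILED)
cold_inline_pct = percent(INLINES_FROM_COLD_PATHS, INLINE_INSTANCES)

if __name__ == '__main__':
    print('compiled for speed:           %.2f%%' % speed_pct)
    print('optimized using profile data: %.1f%%' % profile_opt_pct)
    print('inlines from dead/cold paths: %.2f%%' % cold_inline_pct)
```

With these counts, about 0.21% of profiled functions are compiled for speed, and about 34.6% of inline instances come from dead/cold paths.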

@kunalspathak
Member

Alright, regarding the summary report, we can obtain the following information:

  • List of functions inlined. However, the list is gigantic. It is output as part of the build process, and when I dumped it into a file, the file was approx. 550 MB, so I don't think it will be super useful.
  • Summary of functions that were profiled and how hot they were during profiling.
  • Summary of functions that were not optimized using profile data and reason.
  • Names of functions that were never executed while recording the scenario.

@sathvikl
Contributor

sathvikl commented Mar 9, 2017

From uarch (microarchitecture) analysis, I found that the number of instructions executed dropped by approximately 9-10%, and the PGO speedup was around 10%.
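
These two figures are roughly what one would expect from each other. This is a minimal sketch, assuming (the thread does not state this) that instructions per cycle and clock frequency stay the same, so run time scales with the instruction count; `speedup_from_instruction_reduction` is a hypothetical name.

```python
def speedup_from_instruction_reduction(reduction):
    """Fractional throughput gain when the instruction count drops.

    reduction: fraction of instructions removed (0.10 means 10% fewer).
    Assumes unchanged IPC and clock, so time is proportional to the
    number of instructions executed.
    """
    if not 0 <= reduction < 1:
        raise ValueError('reduction must be in [0, 1)')
    # New time is (1 - reduction) of the old time; speedup is old/new - 1.
    return 1.0 / (1.0 - reduction) - 1.0


if __name__ == '__main__':
    for reduction in (0.09, 0.10):
        gain = speedup_from_instruction_reduction(reduction)
        print('%.0f%% fewer instructions -> %.1f%% faster'
              % (reduction * 100, gain * 100))
```

Under that assumption, a 9-10% drop in instructions corresponds to a gain of roughly 9.9% to 11.1%, which is in line with the observed speedup of around 10%.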

I like that VS can give us this info; I was not able to extract it easily from GCC. GCC just prints, on a per-function basis, which optimizations were done as raw counts, and the same info can be obtained for the overall compilation.
Which options need to be enabled in VS to get this information?

It looks like 0.34% of functions were inlined on dead/cold paths, so I was thinking that not inlining them might have helped, but 0.34%, or whatever the contribution from these functions is, seems low.

If you could share the exact functions through OneDrive, that would be very helpful.

@kunalspathak
Member

  • List of functions inlined. However, the list is gigantic. It is output as part of the build process, and when I dumped it into a file, the file was approx. 550 MB, so I don't think it will be super useful.

Passing the linker flag /d2:-inlinelog gives that information.

  • Summary of functions that were profiled and how hot they were during profiling.

pgomgr /summary <binary_name_prefix>.pgd

  • Summary of functions that were not optimized using profile data and reason.

Set the environment variable with set PGU_LOG=1, and it will dump <binary_name_suffix>.pgu.log next to the binary.

  • Names of functions that were never executed while recording the scenario.

pgomgr /detail /summary <binary_name_prefix>.pgd
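
The report commands above can be collected in one place. This is a minimal sketch that only builds the command lines and does not run anything: `pgo_report_commands`, `PGU_LOG_ENV`, and the file name `node.pgd` are hypothetical, while the pgomgr switches and the PGU_LOG variable are the ones quoted in this comment.

```python
def pgo_report_commands(pgd_path):
    """Return the pgomgr command lines quoted above for one .pgd file."""
    return {
        'profiled functions and how hot they were':
            ['pgomgr', '/summary', pgd_path],
        'functions never executed during training':
            ['pgomgr', '/detail', '/summary', pgd_path],
    }


# Environment setting that makes the build dump the .pgu.log file.
PGU_LOG_ENV = {'PGU_LOG': '1'}

if __name__ == '__main__':
    # 'node.pgd' is a placeholder for <binary_name_prefix>.pgd.
    for purpose, argv in pgo_report_commands('node.pgd').items():
        print(purpose + ': ' + ' '.join(argv))
```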

I have placed all the files here.


9 participants