Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise checking the node types allowed for each rule #6232

Merged

Conversation

carlos-granados
Copy link
Contributor

Modifies the PHP parser node traverser so that it can get a list of the visitors that apply to each node. When the visitors are Rector rules, we can compute which rules apply to each node class and cache this. By doing this we improve the performance of Rector by 20-25% according to my tests

To implement this optimisation we need to patch the PHP parser. In order to make this PR easy to review and test, I have added the patch to the code. If you agree that this code is fine I will create another PR in the rectorphp/vendor-patches repository with the patch and modify this PR to link to that patch instead of a local one

@TomasVotruba
Copy link
Member

Thanks for proposal! This would be better to have directly in php-parser, as all projects would benefit.

Could you share time measurement on your machine with before/after results?

@carlos-granados
Copy link
Contributor Author

A couple of time measurements in my machine:

  • Running rector on the rector-src project. Before ~35s After ~27s
  • Running rector on our company project. Before ~48s After ~35s

Regarding adding the code change to php-parser, I thought about this possibility but the fact is that for most cases this change would slow down the parser (not much) as it introduces an additional function call for each node traversed, so not sure it is appropriate. I can always create the PR and see what they think. And we can always add a patch ourselves and if the PR is accepted, remove the patch. What do you think?

@TomasVotruba
Copy link
Member

Thanks 👍

Could you share full time result, so we can see memory/user/sys times as well? It's criteria we use to decide.

E.g.

time vendor/bin/rector

vendor/bin/rector  0.64s user 0.28s system 44% cpu 2.108 total

@carlos-granados
Copy link
Contributor Author

Timings:

Running Rector on the rector-src project:
Before ./bin/rector --clear-cache 220.79s user 12.11s system 657% cpu 35.440 total
After ./bin/rector --clear-cache 168.53s user 11.80s system 638% cpu 28.236 total

Running Rector on our company project:
Before ../rector-src/bin/rector --clear-cache 323.44s user 7.41s system 687% cpu 48.130 total
After ../rector-src/bin/rector --clear-cache 236.92s user 6.64s system 678% cpu 35.903 total

I am running this on a MacBook Pro with an M1 chip and 16gb of RAM. Php is 8.2.18

@samsonasik
Copy link
Member

@carlos-granados I tested locally with only 8gb ram on m1 mac, and can see faster as well 👍

Before

time bin/rector process --clear-cache
 2258/2258 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓] 100%

 [OK] Rector is done!                                                                                                   

bin/rector process --clear-cache  234.69s user 11.00s system 711% cpu 34.540 total

After

time bin/rector process --clear-cache
 2258/2258 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓] 100%

 [OK] Rector is done!                                                                                                   
                                                                                                                        
bin/rector process --clear-cache  173.57s user 10.82s system 709% cpu 25.982 total

@carlos-granados
Copy link
Contributor Author

I added some tests for the getVisitorsForNode() function. I used the PrivatesAccessor tool to access the protected function, please let me know if this is OK

@carlos-granados
Copy link
Contributor Author

@samsonasik thanks! Should I go ahead and create a PR in rectorphp/vendor-patches with the patch?

@samsonasik
Copy link
Member

I will let @TomasVotruba decide on this first

@carlos-granados
Copy link
Contributor Author

I created the PR to add the patch to the vendor-patches repo rectorphp/vendor-patches#8. Please let me know once it has been merged so that I can modify this PR to use that patch instead of the local one

@TomasVotruba
Copy link
Member

@carlos-granados Patch merged 👍

@carlos-granados
Copy link
Contributor Author

@TomasVotruba I updated the PR with the patch from vendor-patches

@TomasVotruba
Copy link
Member

I thin this is ready to go.

Last but not least, I like to ask you @staabm if you have time. Could you run Blackfire before/after this PR? Just to see change in performnace/calls.

@staabm
Copy link
Contributor

staabm commented Aug 22, 2024

I am on vacation until start of next week

@carlos-granados
Copy link
Contributor Author

@staabm no hurry, enjoy your holidays!

@staabm
Copy link
Contributor

staabm commented Aug 27, 2024

Started some testing but got interrupted. Will continue later today and report back

Copy link
Contributor

@staabm staabm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm this is running consistently faster between 15% and 20% - tested across several codebases on windows 11 and latest macos

great job.

@staabm
Copy link
Contributor

staabm commented Aug 27, 2024

btw: since rector-src seem to have dropped the memory-peak information in the debug output some time ago I was not able to verify whether this PR introduces memory leak or consumes more ram

assert($visitor instanceof RectorInterface);
foreach ($visitor->getNodeTypes() as $nodeType) {
if (is_a($nodeClass, $nodeType, true)) {
$this->visitorsPerNodeClass[$nodeClass][] = $visitor;
Copy link
Contributor

@staabm staabm Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code assumes that every rules public function getNodeTypes() does only return a non-conditional static array.

something like (artificial example)

    /**
     * @return array<class-string<Node>>
     */
    public function getNodeTypes(): array
    {
        if (phpversion() >= "8") {
           return [New_::class];
        }
        return [ClassLike::class];
    }

would break because of the caching.

I don't think its a meaningful BC break, just want to point it out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That example in particular would not be a problem because phpversion() returns a single value each time Rector is run. The list of visitors per node type is not cached across runs. But I understand what you mean. In any case I can't think of any case where this function would not return a non-conditional static array

@TomasVotruba
Copy link
Member

@staabm Thank you for confirmation 👍

Let's give this a try. We need to test downgrade itself + downgraded version before release, so there is a chance this might get reverted.

Let's roll :)

@TomasVotruba TomasVotruba merged commit 8c94690 into rectorphp:main Sep 4, 2024
36 checks passed
@carlos-granados
Copy link
Contributor Author

@TomasVotruba thanks, let me know if I can help with the testing of the downgraded version in any way

@TomasVotruba
Copy link
Member

@carlos-granados The version is now published. Could you try rector/rector:dev-main and measure time/memory before after?

@staabm
Copy link
Contributor

staabm commented Sep 4, 2024

btw: I checked the PHPStan codebase today whether there is a similar problem/opportunity.

I think the problem is solved differently in there

it seems to work in a way which does not need patching nikic/php-parser and does not rely on inheritance, which decouples the inner working of PHPStan and php-parser internals.

maybe that something which can be taken as inspiration to solve this same problem in a more elegant way

@carlos-granados
Copy link
Contributor Author

Before this improvement:
vendor/bin/rector process --dry-run --clear-cache 488.44s user 12.92s system 540% cpu 1:32.76 total
After this improvement:
vendor/bin/rector process --dry-run --clear-cache 347.52s user 11.58s system 487% cpu 1:13.72 total

This is when running Rector on my company's code base, about 1K files and about 100K lines of code

@carlos-granados
Copy link
Contributor Author

carlos-granados commented Sep 4, 2024

btw: I checked the PHPStan codebase today whether there is a similar problem/opportunity.

I think the problem is solved differently in there

it seems to work in a way which does not need patching nikic/php-parser and does not rely on inheritance, which decouples the inner working of PHPStan and php-parser internals.

maybe that something which can be taken as inspiration to solve this same problem in a more elegant way

@staabm PHPStan is able to use this because they don't use the node traverser from the php parser, they use their own traverser and in this way they are able to filter the rules that need to be applied to each node in a way which is similar to what I have added.

@samsonasik
Copy link
Member

samsonasik commented Sep 4, 2024

I tested on a project with 4869 files, it has 1 minutes 30 seconds faster 👍

Before v1.2.4

vendor/bin/rector process --clear-cache --dry-run 1629.64s user 49.34s system 651% cpu 4:17.87 total

After dev-main

vendor/bin/rector process --clear-cache --dry-run 1211.25s user 39.69s system 744% cpu 2:47.94 total

@TomasVotruba
Copy link
Member

TomasVotruba commented Sep 9, 2024

Numbers on one bigger project 🥳

Before:

vendor/bin/rector -n --clear-cache  4735.58s user 111.60s system 983% cpu 8:13.08 total

After:

vendor/bin/rector -n --clear-cache  2887.89s user 73.34s system 1197% cpu 4:07.38 total

@carlos-granados carlos-granados deleted the optimise-node-type-check branch September 10, 2024 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants