Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce generated IL code size #65

Merged
merged 10 commits into from
Aug 12, 2022
Merged

Conversation

MrJul
Copy link
Contributor

@MrJul MrJul commented Aug 3, 2022

I was poking around the IL generated from XAML files by Avalonia to understand how the IL compiler works, and noticed some redundant instructions. So here's a pull request trying to optimize those a bit :) The goal is to reduce the final IL size, and if possible improve the JIT compiling times, which are high.

Notes

Avalonia.Themes.Fluent was the main target of these optimizations since it has many, relatively heavy, XAML files.

This PR contains several independent commits and I think it's easier to review commit per commit. If needed, I can split it into several PRs.

LangVersion is set to latest to use switch expressions and improved pattern matching. If that's not acceptable, I can rewrite those with older C# constructs.

Hopefully no breaking changes in public, non-IL specific types (IXamlPropertySetter, etc.). However there are new overloads in IXamlILEmitter, which is technically breaking, but is very unlikely to be implemented by types other than those already in XamlX.

Commits

Fixed x:Null incorrectly being assignable to a value type for Cecil

Not an optimization, but a fix for a bug I stumbled upon later when optimizing the property setters. For Cecil, x:Null was always incorrectly assignable to everything, even value types. It now has the same logic as S.R.Emit. A failing test has been added, which now passes.

Using optimized ldloc/stloc/ldarg/ldc.i4 opcodes when possible

A low hanging fruit: use the short encoding forms of ldloc/stloc/ldc.i4 when possible. This reduces most ldloc/stloc from 5 bytes to 1. SRE already does this automatically internally, but Cecil doesn't.

Removed unnecessary casts for dynamic property assignments

PropertyAssignmentEmitter was adding an unneeded cast right after a value was checked to either be statically assignable by downcast, or after a runtime isinst call, which pushes the correct the type on the stack on success.

Removed unnecessary casts for adders

Casts were present on base interface calls for adders, which was very noticeable for Avalonia's Style.Resources which is typed as IResourceDictionary, but whose Add method is on IDictionary<TKey,TValue>. This removes those as they're not necessary. (Even IAddChild logic, which is right after, didn't add those casts.)

NamespaceInfoProvider: generated CreateNamespaceInfo method to avoid duplicate code

This commits generates a simple helper function to reduce highly duplicated code when generating the namespaces alias mappings.

Cached dynamic property setters

The biggest change in this PR. When PropertyAssignmentEmitter has to do runtime dispatch because there are multiple setters that can't be resolved at compile time, it now generates a single reusable method for those setters, per XAML file. For Avalonia.Themes.Fluent, all <SolidColorBrush x:Key="xxx" Color="{StaticResource yyy}" /> in a single file calls an unique method instead of generating the same branches that check if the static resource returned an UnsetValueType, IBinding or Color. The result is 8 different methods in the assembly instead of 183 times the branches.

This commit requires that IXamlPropertySetter correctly implements Equals and GetHashCode. If an implementation doesn't, the generated code is still correc: the generated method simply won't be reused (so no IL size gain). Ideally IXamlPropertySetter should inherit from IEquatable<IXamlPropertySetter> but that would be a breaking change. If this PR is accepted, I have a follow up one on Avalonia for its custom setter implementations.

GetHashCode is implemented on PropertySetterBinderParameters despite this class being mutable, but it doesn't matter in practice: binding parameters are never changed once assigned in existing code (and doing so would probably break things).

While writing this, I noticed that AllowRuntimeNull wasn't really working. Each generated branch is checking for the value to be of a specific type, so null won't ever match even when this property is true. This is fixed.

Property setters can emit their own arguments for better IL generation

IXamlPropertySetter.Emit is called with arguments already being on the stack. Most custom setters need to push extra arguments before, and thus resort to stloc all existing arguments, push what they need, then ldloc the arguments. This is really noticeable on adders. A new IXamlILOptimizedEmitablePropertySetter interface has been added that setters can optionally implement, which must emit the arguments itself.

If this PR is accepted, I have a follow up one on Avalonia that implements this interface on the custom setters.

Lazy NamespaceInfoProvider.XmlNamespaces

This commit doesn't decrease IL size, but avoids the work of creating the XML namespace dictionary on initialization. The XmlNamespaces property has a lazy implementation instead, which will be jitted only when needed.

This also helps to reduce runtime memory allocations a little bit: in Avalonia those namespaces are only used in reflection bindings using either FindAncestor or having an attached property in their path.

The lazy initialization is lock free: it's very likely to be used on a single thread, if it isn't the race isn't problematic since the dictionary creation has no side effect, and the returned dictionary is read-only and never mutated.

Additionally, the returned dictionary uses an initial capacity and fixed size arrays, to avoid allocating unused memory.

Results

JIT

With Avalonia.Themes.Fluent stripped of its fonts (to keep only code), here are the results of this PR on my PC:

Before After Diff
File Size 833 KB 574 KB -31.1%
Run (JIT) 293 ms 254 ms -13.3%

The Run (JIT) column is the time spent in AvaloniaLoader.Load(this) in an empty App that has only <FluentTheme /> in its styles. Average of 10 runs in Release mode, with the .NET 6.0.7 runtime.

Overall, some noticeable improvements in both file size and JIT time spent.

R2R

Being curious, I also tested ReadyToRun (win-x64):

Before After Diff
File Size R2R 2501 KB 1939 KB -22.5%
Run (R2R) 186 ms 186 ms 0.0%

As expected the PR doesn't improve the run time since the JIT is barely involved, but the final assembly still has a nice size reduction.

@kekekeks
Copy link
Owner

kekekeks commented Aug 3, 2022

A low hanging fruit: use the short encoding forms of ldloc/stloc/ldc.i4 when possible. This reduces most ldloc/stloc from 5 bytes to 1. SRE already does this automatically internally, but Cecil doesn't.

I think Cecil has a separate Optimize method for doing that automatically, but we aren't using it for some reason

@grokys
Copy link
Collaborator

grokys commented Aug 3, 2022

I don't have a lot to add as a review of this PR as it's not my area, but I would like to give you a 👏 for the best PR description I think I've ever seen from a new contributor ;)

"DynamicSetters_" + context.Configuration.IdentifierGenerator.GenerateIdentifierPart(),
context.Configuration.WellKnownTypes.Object);
cache = new DynamicSettersCache(settersType);
context.SetItem(cache);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since DynamicSettersCache actually holds an emitted type that could and should be reused when compiling other XAML files, I think we should modify the context to allow for cross-context item sharing.

e. g. SetItem would become public void SetItem<T>(T item, bool reusable = false) and context would use a shared dictionary if it gets one in constructor.

That would only work for Cecil backend since with SRE type would be finalized by the next line, but that would still remove some code duplication where we care about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Currently even in the same XAML file the cache will be different per sub type (happens on deferred content, XamlClosure type generation), which indeed isn't ideal.

But even with the context change, the context currently only exposes a way to define a new sub type, not access the current type or the defining module which would be necessary to define a new top-level type containing all the setters. Otherwise there will be access problems since the nested type XamlClosure_X+XamlClosure_Y might try to call a method on XamlClosure_A+DynamicSetters generated earlier (nested closure types are private).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that Avalonia has lazy resources, the setters are almost never reused :( Is it ok if I tackle this problem in another PR after this one? Are breaking changes to the emit context acceptable?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since XamlX in its current state is supposed to be shipped as a git submodule (we don't have nuget packages), there is room for breaking API changes, yes.

@kekekeks
Copy link
Owner

kekekeks commented Aug 3, 2022

Note, that due to the amount of changes I could only merge the PR after a corresponding PR in Avalonia repo passes the unit tests

@MrJul
Copy link
Contributor Author

MrJul commented Aug 3, 2022

A low hanging fruit: use the short encoding forms of ldloc/stloc/ldc.i4 when possible. This reduces most ldloc/stloc from 5 bytes to 1. SRE already does this automatically internally, but Cecil doesn't.

I think Cecil has a separate Optimize method for doing that automatically, but we aren't using it for some reason

I wasn't aware about that method. Should it be used instead and the first commit removed?

I don't have a lot to add as a review of this PR as it's not my area, but I would like to give you a 👏 for the best PR description I think I've ever seen from a new contributor ;)

Thank you! :)

Note, that due to the amount of changes I could only merge the PR after a corresponding PR in Avalonia repo passes the unit tests

That's completely understandable, I'll add a PR soon. Will the CI be able to update the submodule since it will refer to a non-merged PR commit?

@kekekeks
Copy link
Owner

kekekeks commented Aug 3, 2022

Should it be used instead and the first commit removed?

We definitely should use it since it optimizes branching ops to short versions too.

Will the CI be able to update the submodule since it will refer to a non-merged PR commit?

Commits in github forks are globally visible, so CI can pick submodule commit from any fork, yes

@MrJul
Copy link
Contributor Author

MrJul commented Aug 4, 2022

I just pushed a new commit after doing more testing.

The EmitConvert call in was actually still needed for one specific case: T to Nullable<T>. It now uses the current type on the stack (after isinst) instead of the original value type, so extra casts aren't emitted in all other cases.

I also added new unit tests that check the correctness of dynamic setters for value types, nullable value types and reference types.

@MrJul
Copy link
Contributor Author

MrJul commented Aug 5, 2022

New numbers for this PR with the empty app now that the (awesome) lazy resources PR has been merged in Avalonia:

Before After Diff
File Size 1032 KB 815 KB -21.0%
Run (JIT) 78 ms 57 ms -27.0%

However, with the lazy resources, setters aren't reused since each resource has its own closure class and the cache works per class, as mentioned earlier. I'd like to keep this PR code as-is if that's fine with the maintainers, and improve the setter cache part in another PR.

As a side note, the fluent theme XAML files now compile to more than 1700 classes to the assembly, which might also need a change (even though I haven't measured the impact of having that many types versus one big type with many methods).

@grokys
Copy link
Collaborator

grokys commented Aug 5, 2022

I'd like to keep this PR code as-is if that's fine with the maintainers, and improve the setter cache part in another PR.

Sounds good to me - Setters are still used as part of styles (which won't be affected by lazy resources).

As a side note, the fluent theme XAML files now compile to more than 1700 classes to the assembly, which might also need a change

Yep, I'm pretty sure there's a lot of optimization we can do. We should check how much overhead this requires.

@kekekeks kekekeks merged commit 79ab88b into kekekeks:master Aug 12, 2022
@MrJul MrJul deleted the size-optimization branch August 15, 2022 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants