duplicate some ops to enable more fusion opportunities and reduce memory footprint #433
Conversation
// Modern NN networks are usually composed of multiple similar layers. Thus the
// above patterns are very common especially when we enable shape constraint ir
// optimization (if enabled, we will do shape propagation eagerly, and may
// further enable cross layer CSE, which in turn increases the change of the
change -> chance?
Done, thanks.
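
For context, a minimal sketch of the scalar-bcast duplication this PR performs, assuming mhlo-style ops; the value names and shapes are hypothetical, not taken from the PR:

```mlir
// Before: one cheap broadcast feeds two would-be fusions. Fusing it into
// either consumer would still leave the other reading a materialized
// buffer, so the broadcast result has to be written to memory.
%s  = "mhlo.constant"() {value = dense<1.0> : tensor<f32>} : () -> tensor<f32>
%b  = "mhlo.broadcast_in_dim"(%s) {broadcast_dimensions = dense<> : tensor<0xi64>}
        : (tensor<f32>) -> tensor<1024xf32>
%r0 = "mhlo.add"(%a0, %b) : (tensor<1024xf32>, tensor<1024xf32>) -> tensor<1024xf32>
%r1 = "mhlo.multiply"(%a1, %b) : (tensor<1024xf32>, tensor<1024xf32>) -> tensor<1024xf32>

// After duplication: each consumer owns a private copy of the broadcast,
// so each fusion can absorb it and the intermediate tensor never hits
// memory. Recomputing a scalar broadcast is cheaper than materializing it.
%b0 = "mhlo.broadcast_in_dim"(%s) {broadcast_dimensions = dense<> : tensor<0xi64>}
        : (tensor<f32>) -> tensor<1024xf32>
%r0 = "mhlo.add"(%a0, %b0) : (tensor<1024xf32>, tensor<1024xf32>) -> tensor<1024xf32>
%b1 = "mhlo.broadcast_in_dim"(%s) {broadcast_dimensions = dense<> : tensor<0xi64>}
        : (tensor<f32>) -> tensor<1024xf32>
%r1 = "mhlo.multiply"(%a1, %b1) : (tensor<1024xf32>, tensor<1024xf32>) -> tensor<1024xf32>
```

This is exactly the situation the comment above describes: cross-layer CSE merges identical broadcasts from similar layers into one shared op, and this pass splits them back apart where that unlocks fusion.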
let options = [
  Option<"gpu_enabled_", "gpu-enabled", "bool",
         /*default=*/"true", "whether gpu is available.">,
  Option<"fusion_strategy_", "fusion-strategy", "std::string",
Do we actually need the `fusion-strategy` option? If it is always `base`, we can remove it in this PR and add this option back later if it's required.
I prefer to leave the option here even though we do not actually use it yet; it's better if we can make use of such a config later. The current implementation is just a conservative strategy. Furthermore, we only duplicate the scalar-bcast pattern in this PR, while eventually we will need a general "duplicate fusion" pass like XLA's.
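
For reference, a sketch of how the kept option might read with its default spelled out; the `"base"` default follows from the review comment above, while the description string is an assumption:

```tablegen
let options = [
  Option<"gpu_enabled_", "gpu-enabled", "bool",
         /*default=*/"true", "whether gpu is available.">,
  Option<"fusion_strategy_", "fusion-strategy", "std::string",
         /*default=*/"\"base\"",
         "duplication strategy; only \"base\" (scalar-bcast duplication) "
         "is implemented so far.">
];
```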
BTW, do you have any benchmark data for this pass on a model?
In my test case, it reduces latency by around 1.5ms (e2e is ~6.5ms). I haven't tested this feature on other models, so I do not enable it by default (it is guarded by the shape-constraint-ir flag). I'll evaluate it on more models.
No description provided.