
duplicate some ops to enable more fusion opportunities and reduce memory footprint #433

Merged 4 commits into alibaba:main from features/wenyi_shape_constraint on Jun 30, 2022

Conversation

@wyzero wyzero (Collaborator) commented Jun 30, 2022

No description provided.

// Modern NN networks are usually composed of multiple similar layers. Thus the
// above patterns are very common, especially when we enable shape constraint IR
// optimization (if enabled, we will do shape propagation eagerly, and may
// further enable cross layer CSE, which in turn increases the change of the
Collaborator
change -> chance?

Collaborator Author

Done, thanks.
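(For context: the duplication discussed in this hunk roughly amounts to giving each consumer of a cheap producer, e.g. a scalar constant plus broadcast, its own private copy so that every consumer's fusion can absorb it. The sketch below uses generic MLIR APIs; the helper name duplicateForEachExtraUse and the single-result restriction are illustrative assumptions, not code from this PR.)

// Illustrative sketch only: clone a single-result producer once per extra
// use so that each consumer owns a private copy its fusion group can absorb.
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "mlir/IR/Builders.h"
#include "mlir/IR/Operation.h"

static void duplicateForEachExtraUse(mlir::Operation *op) {
  if (op->getNumResults() != 1) return;
  mlir::Value result = op->getResult(0);

  // Snapshot the uses first; rewiring them below mutates the use list.
  llvm::SmallVector<mlir::OpOperand *> uses;
  for (mlir::OpOperand &use : result.getUses()) uses.push_back(&use);
  if (uses.size() < 2) return;

  // Clones are placed right after the original so that the original's
  // operands still dominate them. The first consumer keeps the original;
  // every other consumer is rewired to its own clone.
  mlir::OpBuilder builder(op->getContext());
  builder.setInsertionPointAfter(op);
  for (mlir::OpOperand *use : llvm::drop_begin(uses)) {
    mlir::Operation *copy = builder.clone(*op);
    use->set(copy->getResult(0));
  }
}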

let options = [
Option<"gpu_enabled_", "gpu-enabled", "bool",
/*default=*/"true", "whether gpu is available.">,
Option<"fusion_strategy_", "fusion-strategy", "std::string",
Collaborator

Do we actually need the fusion-strategy option? If it is always base, we could remove it in this PR and add it back later when it's required.

Collaborator Author

I prefer to leave the option here even though we do not actually use it yet; it would be better if we made use of such a config. The current implementation is just a conservative strategy. Furthermore, we only duplicate the scalar-bcast pattern in this PR, while eventually we will need a general "duplicate fusion" pass like XLA's.
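(For reference, TableGen Option entries like the ones above expand to option members on the generated pass class; a rough hand-written equivalent is sketched below. The pass class name, the "base" default inferred from the review comment, and the placeholder description string are assumptions for illustration, not the PR's actual generated code.)

// Hand-written approximation of what the Option declarations above generate;
// mlir-tblgen normally emits these members on the pass base class.
#include <string>

#include "llvm/Support/CommandLine.h"
#include "mlir/Pass/Pass.h"

namespace {
struct DuplicateComputationsSketchPass
    : public mlir::PassWrapper<DuplicateComputationsSketchPass,
                               mlir::OperationPass<>> {
  Option<bool> gpu_enabled_{*this, "gpu-enabled",
                            llvm::cl::desc("whether gpu is available."),
                            llvm::cl::init(true)};
  // Description elided in the reviewed snippet; "base" default inferred
  // from the review discussion.
  Option<std::string> fusion_strategy_{*this, "fusion-strategy",
                                       llvm::cl::desc("fusion strategy"),
                                       llvm::cl::init("base")};

  llvm::StringRef getArgument() const override {
    return "disc-duplicate-computations-sketch";  // hypothetical pass name
  }

  void runOnOperation() override {
    // Only the conservative "base" strategy is implemented for now; other
    // values are reserved for a more general duplicate-fusion strategy.
    if (fusion_strategy_ != "base") return;
    // ... duplication rewrite would run here ...
  }
};
}  // namespace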

@Yancey1989 (Collaborator)

BTW, is there any benchmark data on a model for this pass?

@wyzero wyzero (Collaborator Author) commented Jun 30, 2022

BTW, is there any benchmark data on a model for this pass?

In my test case, it reduces latency by around 1.5ms (e2e is ~6.5ms). I haven't tested this feature on other models, so I do not enable it by default (it is guarded by the shape-constraint-ir flag). I'll evaluate shape-constraint-ir on more models next month, and then decide whether this feature is ready to be enabled by default.

@wyzero wyzero merged commit 21c3113 into alibaba:main Jun 30, 2022
@wyzero wyzero deleted the features/wenyi_shape_constraint branch June 30, 2022 06:58