
Specialize array cloning for Copy types #90755

Merged
merged 2 commits into from
Nov 11, 2021


scottmcm
Member

This is because, after PR #86041, the optimizer no longer load-merges (combines the per-element loads into one wide load) at the LLVM IR level, which might be part of the perf loss. (I'll run perf and see if this makes a difference.)

I also added a codegen test so this hopefully won't regress in the future. It passes on stable and with my change here, but not on the 2021-11-09 nightly.
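The idea can be sketched in stable Rust. For a Copy element type, cloning an array needs no per-element work: the whole array can be copied as one value, just like *x in demo_copy. The helper names clone_elementwise and clone_copy are hypothetical, and since the standard library picks between the two paths with specialization (an unstable feature), this sketch only shows both paths side by side. std::array::from_fn assumes Rust 1.63 or later.

```rust
use std::array;

/// Element-wise path that works for any T: Clone (hypothetical helper):
/// build a fresh array by cloning each element in turn.
fn clone_elementwise<T: Clone, const N: usize>(a: &[T; N]) -> [T; N] {
    array::from_fn(|i| a[i].clone())
}

/// Path available when T: Copy (hypothetical helper): copy the whole
/// array as a single value, the same operation as `*x` in demo_copy.
fn clone_copy<T: Copy, const N: usize>(a: &[T; N]) -> [T; N] {
    *a
}

fn main() {
    let x: [u8; 3] = [1, 2, 3];
    // For Copy element types both paths must yield the same value; the
    // specialization only changes how the compiler is asked to build it.
    assert_eq!(clone_elementwise(&x), clone_copy(&x));

    // Non-Copy types such as String can only take the element-wise path.
    let s = [String::from("a"), String::from("b")];
    assert_eq!(clone_elementwise(&s), s);
}
```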

Example on current nightly: https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=1f52d46fb8fc3ca3ac9f097390085ffa

```rust
type T = u8;
const N: usize = 3;

pub fn demo_clone(x: &[T; N]) -> [T; N] {
    x.clone()
}

pub fn demo_copy(x: &[T; N]) -> [T; N] {
    *x
}
```
```llvm
; playground::demo_clone
; Function Attrs: mustprogress nofree nosync nounwind nonlazybind uwtable willreturn
define i24 @_ZN10playground10demo_clone17h98a4f11453d1a753E([3 x i8]* noalias nocapture readonly align 1 dereferenceable(3) %x) unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
start:
  %0 = getelementptr [3 x i8], [3 x i8]* %x, i64 0, i64 0
  %1 = getelementptr inbounds [3 x i8], [3 x i8]* %x, i64 0, i64 1
  %.val.i.i.i.i.i.i.i.i.i = load i8, i8* %0, align 1, !alias.scope !2, !noalias !9
  %2 = getelementptr inbounds [3 x i8], [3 x i8]* %x, i64 0, i64 2
  %.val.i.i.i.i.i.1.i.i.i.i = load i8, i8* %1, align 1, !alias.scope !2, !noalias !20
  %.val.i.i.i.i.i.2.i.i.i.i = load i8, i8* %2, align 1, !alias.scope !2, !noalias !23
  %array.sroa.6.0.insert.ext.i.i.i.i = zext i8 %.val.i.i.i.i.i.2.i.i.i.i to i32
  %array.sroa.6.0.insert.shift.i.i.i.i = shl nuw nsw i32 %array.sroa.6.0.insert.ext.i.i.i.i, 16
  %array.sroa.5.0.insert.ext.i.i.i.i = zext i8 %.val.i.i.i.i.i.1.i.i.i.i to i32
  %array.sroa.5.0.insert.shift.i.i.i.i = shl nuw nsw i32 %array.sroa.5.0.insert.ext.i.i.i.i, 8
  %array.sroa.0.0.insert.ext.i.i.i.i = zext i8 %.val.i.i.i.i.i.i.i.i.i to i32
  %array.sroa.5.0.insert.insert.i.i.i.i = or i32 %array.sroa.5.0.insert.shift.i.i.i.i, %array.sroa.0.0.insert.ext.i.i.i.i
  %array.sroa.0.0.insert.insert.i.i.i.i = or i32 %array.sroa.5.0.insert.insert.i.i.i.i, %array.sroa.6.0.insert.shift.i.i.i.i
  %.sroa.4.0.extract.trunc.i.i.i.i = trunc i32 %array.sroa.0.0.insert.insert.i.i.i.i to i24
  ret i24 %.sroa.4.0.extract.trunc.i.i.i.i
}

; playground::demo_copy
; Function Attrs: mustprogress nofree norecurse nosync nounwind nonlazybind readonly uwtable willreturn
define i24 @_ZN10playground9demo_copy17h7817453f9291d746E([3 x i8]* noalias nocapture readonly align 1 dereferenceable(3) %x) unnamed_addr #1 {
start:
  %.sroa.0.0..sroa_cast = bitcast [3 x i8]* %x to i24*
  %.sroa.0.0.copyload = load i24, i24* %.sroa.0.0..sroa_cast, align 1
  ret i24 %.sroa.0.0.copyload
}
```
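The difference between the two IR listings can be restated as plain arithmetic. demo_clone loads the three bytes separately and rebuilds the i24 with zext, shl and or, while demo_copy does one i24 load. On a little-endian target (assumed here, as on the playground's x86_64) both compute the same integer, which is exactly the pattern a load-merging optimization is meant to collapse. This sketch mirrors each side in Rust, widening to u32 because Rust has no 24-bit integer; the function names are hypothetical.

```rust
/// Mirrors demo_clone's IR: three separate byte loads, each zero-extended
/// (zext), shifted into position (shl), and OR-ed together.
fn pack_bytewise(x: &[u8; 3]) -> u32 {
    let b0 = u32::from(x[0]);
    let b1 = u32::from(x[1]) << 8;
    let b2 = u32::from(x[2]) << 16;
    (b1 | b0) | b2
}

/// Mirrors demo_copy's IR: one wide little-endian read of all three bytes,
/// padded here to 32 bits with a zero high byte.
fn pack_merged(x: &[u8; 3]) -> u32 {
    u32::from_le_bytes([x[0], x[1], x[2], 0])
}

fn main() {
    // Both forms agree on every input, so an optimizer that load-merges
    // could emit the single wide load for the byte-wise version too.
    for x in [[0x01, 0x02, 0x03], [0xff, 0x00, 0x7f], [0x00, 0x00, 0x00]] {
        assert_eq!(pack_bytewise(&x), pack_merged(&x));
    }
    println!("{:#08x}", pack_bytewise(&[0x01, 0x02, 0x03])); // prints 0x030201
}
```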

@rust-highfive
Collaborator

r? @m-ou-se

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 10, 2021
@scottmcm
Member Author

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 10, 2021
@bors
Contributor

bors commented Nov 10, 2021

⌛ Trying commit cc7d801 with merge 87df9f91778c7252dc2e7eddbe858af73d6d444c...

@bors
Contributor

bors commented Nov 10, 2021

☀️ Try build successful - checks-actions
Build commit: 87df9f91778c7252dc2e7eddbe858af73d6d444c (87df9f91778c7252dc2e7eddbe858af73d6d444c)

@rust-timer
Collaborator

Queued 87df9f91778c7252dc2e7eddbe858af73d6d444c with parent 8b09ba6, future comparison URL.

@rust-timer
Collaborator

Finished benchmarking commit (87df9f91778c7252dc2e7eddbe858af73d6d444c): comparison url.

Summary: This change led to moderate relevant mixed results 🤷 in compiler performance.

  • Moderate improvement in instruction counts (up to -1.1% on full builds of cranelift-codegen)
  • Small regression in instruction counts (up to 0.7% on incr-unchanged builds of wf-projection-stress-65510)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Nov 10, 2021
@scottmcm
Member Author

Hmm, this improvement in full-opt cranelift-codegen recovers what was lost in #86041 (comment), but overall it's mixed. Dunno how people might feel about that.

@jackh726
Member

Maybe try adding #[inline] annotations?

@scottmcm
Member Author

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 10, 2021
@bors
Contributor

bors commented Nov 10, 2021

⌛ Trying commit 5b115fc with merge 619d3f7524949f70494dc855c8252f8bd77376d2...

@bors
Contributor

bors commented Nov 10, 2021

☀️ Try build successful - checks-actions
Build commit: 619d3f7524949f70494dc855c8252f8bd77376d2 (619d3f7524949f70494dc855c8252f8bd77376d2)

@rust-timer
Collaborator

Queued 619d3f7524949f70494dc855c8252f8bd77376d2 with parent 68ca579, future comparison URL.

@rust-timer
Collaborator

Finished benchmarking commit (619d3f7524949f70494dc855c8252f8bd77376d2): comparison url.

Summary: This change led to moderate relevant mixed results 🤷 in compiler performance.

  • Moderate improvement in instruction counts (up to -1.0% on full builds of cranelift-codegen)
  • Small regression in instruction counts (up to 0.7% on incr-unchanged builds of wf-projection-stress-65510)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 10, 2021
@jackh726
Member

Okay, so there's no difference with or without the inline annotations.

r=me, whether you remove those or not

@scottmcm
Member Author

The method on impl Clone has the #[inline]s, so I might as well leave them on the things it calls in this PR.
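The layering described here, an #[inline] trait method forwarding to #[inline] helpers, looks roughly like the following. Arr and copy_array are hypothetical stand-ins, since code outside core cannot re-implement Clone for [T; N] itself. Because the helper is generic, it is instantiated in whichever crate uses it and is usually inlinable there even without the attribute; that is one plausible reason the perf runs showed no difference either way.

```rust
/// Hypothetical stand-in for [T; N]: only core can implement Clone for
/// the array type itself.
pub struct Arr<T, const N: usize>(pub [T; N]);

impl<T: Copy, const N: usize> Clone for Arr<T, N> {
    // The outer trait method is marked #[inline] ...
    #[inline]
    fn clone(&self) -> Self {
        Arr(copy_array(&self.0))
    }
}

// ... and so is the helper it forwards to. Being generic, this helper is
// instantiated in the calling crate anyway, which typically makes it
// available for inlining there even without the attribute.
#[inline]
fn copy_array<T: Copy, const N: usize>(a: &[T; N]) -> [T; N] {
    *a
}

fn main() {
    let a = Arr([10u16, 20, 30]);
    let b = a.clone();
    assert_eq!(a.0, b.0);
}
```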

@bors r=jackh726

@bors
Contributor

bors commented Nov 10, 2021

📌 Commit 5b115fc has been approved by jackh726

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 10, 2021
@bors
Contributor

bors commented Nov 11, 2021

⌛ Testing commit 5b115fc with merge 62efba8...

@bors
Contributor

bors commented Nov 11, 2021

☀️ Test successful - checks-actions
Approved by: jackh726
Pushing 62efba8 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Nov 11, 2021
@bors bors merged commit 62efba8 into rust-lang:master Nov 11, 2021
@rustbot rustbot added this to the 1.58.0 milestone Nov 11, 2021
@jackh726
Member

Targeted perf fix with mostly wins and a few small regressions.

@jackh726 jackh726 added the perf-regression-triaged The performance regression has been triaged. label Nov 11, 2021
@rust-timer
Collaborator

Finished benchmarking commit (62efba8): comparison url.

Summary: This change led to small relevant mixed results 🤷 in compiler performance.

  • Small improvement in instruction counts (up to -0.9% on full builds of cranelift-codegen)
  • Small regression in instruction counts (up to 0.7% on incr-unchanged builds of wf-projection-stress-65510)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression

@scottmcm scottmcm deleted the spec-array-clone branch November 13, 2021 23:49
Labels
merged-by-bors: This PR was explicitly merged by bors.
perf-regression: Performance regression.
perf-regression-triaged: The performance regression has been triaged.
S-waiting-on-bors: Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
8 participants