
BTreeSet symmetric_difference & union optimized #65226

Merged: 1 commit into rust-lang:master on Oct 19, 2019

Conversation

@ssomers (Contributor) commented Oct 8, 2019

No scalability changes, but:

  • Grew the cmp_opt function (shared by symmetric_difference & union) into a MergeIter, with less memory overhead than the current pairs of Peekable iterators, speeding things up ~20% on my machine (not so clear on Travis, though; I actually switched the comparison off there because it wasn't consistent even on identical code). This is mainly meant to improve readability by sharing code, though it ends up using more lines of code. Extending and reusing the MergeIter in btree_map might be better, but I'm not sure that's possible or desirable. This MergeIter probably pretends to be more generic than it is, yet doesn't declare itself an iterator because there's no need: it only exists to help construct the genuine iterators SymmetricDifference & Union. (A rough sketch of the idea follows below the list.)
  • Compact the code of #64820 (BTreeSet intersection, is_subset & difference optimizations) by moving if/else into match guards.
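
For readers who have not opened the diff, here is a rough, hypothetical sketch of the idea with made-up names; it is not the PR's actual code. A single cached slot remembers the element of whichever input ran ahead, instead of wrapping both inputs in Peekable.

use std::cmp::Ordering;

// Illustrative sketch only: one `peeked` slot instead of two Peekable wrappers.
enum Peeked<T> {
    A(T),
    B(T),
}

struct MergeIter<I: Iterator> {
    a: I,
    b: I,
    peeked: Option<Peeked<I::Item>>,
}

impl<I: Iterator> MergeIter<I>
where
    I::Item: Ord + Copy,
{
    fn new(a: I, b: I) -> Self {
        MergeIter { a, b, peeked: None }
    }

    // Report which side(s) hold the next element in merge order: both sides only
    // when the elements are equal. The element of the side that ran ahead is
    // cached in `peeked` and reported on a later call instead.
    fn nexts(&mut self) -> (Option<I::Item>, Option<I::Item>) {
        let mut a_next = match self.peeked.take() {
            Some(Peeked::A(item)) => Some(item),
            other => {
                self.peeked = other;
                self.a.next()
            }
        };
        let mut b_next = match self.peeked.take() {
            Some(Peeked::B(item)) => Some(item),
            other => {
                self.peeked = other;
                self.b.next()
            }
        };
        if let (Some(a), Some(b)) = (a_next, b_next) {
            match a.cmp(&b) {
                // The greater element has not been consumed yet: cache it and
                // withhold it until the other side catches up.
                Ordering::Less => self.peeked = b_next.take().map(Peeked::B),
                Ordering::Greater => self.peeked = a_next.take().map(Peeked::A),
                Ordering::Equal => {}
            }
        }
        (a_next, b_next)
    }
}

fn main() {
    let a = [1, 3, 5];
    let b = [2, 3, 4];
    let mut merged = MergeIter::new(a.iter(), b.iter());
    // Union-style consumption: whichever side(s) produced the element, yield it once.
    let mut out = Vec::new();
    loop {
        match merged.nexts() {
            (None, None) => break,
            (x, y) => out.push(*x.or(y).unwrap()),
        }
    }
    assert_eq!(out, vec![1, 2, 3, 4, 5]);
}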

r? @bluss

@rust-highfive added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties.) on Oct 8, 2019
@ssomers changed the title from "BTreeSet difference, is_subset, symmetric_difference & union optimized" to "BTreeSet symmetric_difference & union optimized" on Oct 10, 2019
@ssomers (author) commented Oct 10, 2019

The results of the relevant benches in my separate repository (using cargo bench --features merge) on my win64 Intel machine:

running 56 tests
test symmdiff_neg_vs_pos::_100_neg_vs_100_pos        ... bench:         844 ns/iter (+/- 42)
test symmdiff_neg_vs_pos::_100_neg_vs_100_pos_future ... bench:         539 ns/iter (+/- 9)
test symmdiff_neg_vs_pos::_100_neg_vs_10k_pos        ... bench:      35,264 ns/iter (+/- 1,644)
test symmdiff_neg_vs_pos::_100_neg_vs_10k_pos_future ... bench:      29,127 ns/iter (+/- 991)
test symmdiff_neg_vs_pos::_100_pos_vs_100_neg        ... bench:         848 ns/iter (+/- 7)
test symmdiff_neg_vs_pos::_100_pos_vs_100_neg_future ... bench:         507 ns/iter (+/- 14)
test symmdiff_neg_vs_pos::_100_pos_vs_10k_neg        ... bench:      41,068 ns/iter (+/- 1,083)
test symmdiff_neg_vs_pos::_100_pos_vs_10k_neg_future ... bench:      25,238 ns/iter (+/- 729)
test symmdiff_neg_vs_pos::_10k_neg_vs_100_pos        ... bench:      47,444 ns/iter (+/- 1,238)
test symmdiff_neg_vs_pos::_10k_neg_vs_100_pos_future ... bench:      25,106 ns/iter (+/- 737)
test symmdiff_neg_vs_pos::_10k_neg_vs_10k_pos        ... bench:      82,661 ns/iter (+/- 2,157)
test symmdiff_neg_vs_pos::_10k_neg_vs_10k_pos_future ... bench:      52,809 ns/iter (+/- 4,683)
test symmdiff_neg_vs_pos::_10k_pos_vs_100_neg        ... bench:      42,240 ns/iter (+/- 983)
test symmdiff_neg_vs_pos::_10k_pos_vs_100_neg_future ... bench:      23,718 ns/iter (+/- 629)
test symmdiff_neg_vs_pos::_10k_pos_vs_10k_neg        ... bench:      83,755 ns/iter (+/- 2,881)
test symmdiff_neg_vs_pos::_10k_pos_vs_10k_neg_future ... bench:      48,882 ns/iter (+/- 1,496)
test symmdiff_random_100::vs_100                     ... bench:         911 ns/iter (+/- 25)
test symmdiff_random_100::vs_100_future              ... bench:         557 ns/iter (+/- 10)
test symmdiff_random_100::vs_10k                     ... bench:      52,070 ns/iter (+/- 1,448)
test symmdiff_random_100::vs_10k_future              ... bench:      34,138 ns/iter (+/- 1,107)
test symmdiff_random_100::vs_1600                    ... bench:       9,070 ns/iter (+/- 241)
test symmdiff_random_100::vs_1600_future             ... bench:       5,310 ns/iter (+/- 174)
test symmdiff_random_10k::vs_10k                     ... bench:     202,292 ns/iter (+/- 6,135)
test symmdiff_random_10k::vs_10k_future              ... bench:     144,226 ns/iter (+/- 2,972)
test symmdiff_subsets::_100_vs_10k                   ... bench:      39,869 ns/iter (+/- 1,057)
test symmdiff_subsets::_100_vs_10k_future            ... bench:      25,936 ns/iter (+/- 1,838)
test symmdiff_subsets::_10_vs_100                    ... bench:         418 ns/iter (+/- 1)
test symmdiff_subsets::_10_vs_100_future             ... bench:         306 ns/iter (+/- 4)
test union_neg_vs_pos::_100_neg_vs_100_pos           ... bench:         800 ns/iter (+/- 26)
test union_neg_vs_pos::_100_neg_vs_100_pos_future    ... bench:         589 ns/iter (+/- 17)
test union_neg_vs_pos::_100_neg_vs_10k_pos           ... bench:      33,128 ns/iter (+/- 1,346)
test union_neg_vs_pos::_100_neg_vs_10k_pos_future    ... bench:      28,824 ns/iter (+/- 1,331)
test union_neg_vs_pos::_100_pos_vs_100_neg           ... bench:         818 ns/iter (+/- 37)
test union_neg_vs_pos::_100_pos_vs_100_neg_future    ... bench:         552 ns/iter (+/- 14)
test union_neg_vs_pos::_100_pos_vs_10k_neg           ... bench:      41,000 ns/iter (+/- 1,086)
test union_neg_vs_pos::_100_pos_vs_10k_neg_future    ... bench:      27,068 ns/iter (+/- 734)
test union_neg_vs_pos::_10k_neg_vs_100_pos           ... bench:      45,292 ns/iter (+/- 1,493)
test union_neg_vs_pos::_10k_neg_vs_100_pos_future    ... bench:      28,736 ns/iter (+/- 750)
test union_neg_vs_pos::_10k_neg_vs_10k_pos           ... bench:      78,098 ns/iter (+/- 2,049)
test union_neg_vs_pos::_10k_neg_vs_10k_pos_future    ... bench:      57,880 ns/iter (+/- 1,530)
test union_neg_vs_pos::_10k_pos_vs_100_neg           ... bench:      41,698 ns/iter (+/- 382)
test union_neg_vs_pos::_10k_pos_vs_100_neg_future    ... bench:      24,999 ns/iter (+/- 651)
test union_neg_vs_pos::_10k_pos_vs_10k_neg           ... bench:      81,957 ns/iter (+/- 2,914)
test union_neg_vs_pos::_10k_pos_vs_10k_neg_future    ... bench:      52,773 ns/iter (+/- 3,054)
test union_random_100::vs_100                        ... bench:         869 ns/iter (+/- 2)
test union_random_100::vs_100_future                 ... bench:         617 ns/iter (+/- 11)
test union_random_100::vs_10k                        ... bench:      49,279 ns/iter (+/- 2,365)
test union_random_100::vs_10k_future                 ... bench:      39,878 ns/iter (+/- 1,571)
test union_random_100::vs_1600                       ... bench:       8,640 ns/iter (+/- 230)
test union_random_100::vs_1600_future                ... bench:       6,443 ns/iter (+/- 190)
test union_random_10k::vs_10k                        ... bench:     196,242 ns/iter (+/- 10,827)
test union_random_10k::vs_10k_future                 ... bench:     150,521 ns/iter (+/- 4,301)
test union_subsets::_100_vs_10k                      ... bench:      35,721 ns/iter (+/- 1,508)
test union_subsets::_100_vs_10k_future               ... bench:      30,411 ns/iter (+/- 1,165)
test union_subsets::_10_vs_100                       ... bench:         399 ns/iter (+/- 16)
test union_subsets::_10_vs_100_future                ... bench:         346 ns/iter (+/- 9)

@ssomers (author) commented Oct 10, 2019

Oh sh... it seems that after many experiments I lost track of size_hint. It is correct at the start, but can underestimate by 1 once iteration has started. Fix coming up...
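
Purely as a hedged illustration in terms of the hypothetical sketch above (not the PR's actual fix): once an element has moved into the peeked slot, neither inner iterator counts it any more, so any bound derived from the inner size_hints has to add it back. A union-style hint added to the sketch's impl block could look like this:

    // Illustrative only: the element parked in `peeked` is no longer counted by
    // `self.a` or `self.b`, so it must be added back to the upper bound.
    fn size_hint(&self) -> (usize, Option<usize>) {
        let (a_lo, a_hi) = self.a.size_hint();
        let (b_lo, b_hi) = self.b.size_hint();
        let pending = self.peeked.is_some() as usize;
        // Lower bound: a union yields at least as many elements as the larger
        // remaining side (loose but valid).
        let lo = std::cmp::max(a_lo, b_lo);
        // Upper bound: everything from both sides, plus the cached element.
        let hi = match (a_hi, b_hi) {
            (Some(a), Some(b)) => a.checked_add(b).and_then(|ab| ab.checked_add(pending)),
            _ => None,
        };
        (lo, hi)
    }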

Comment on lines 143 to 147
let mut a_next = match self.peeked {
Some(MergeIterPeeked::A(next)) => next,
_ => self.a.next(),
};
let mut b_next = match self.peeked {
Some(MergeIterPeeked::B(next)) => next,
_ => self.b.next(),
};
@ssomers (author) commented Oct 10, 2019

By the way, if I define both (a_next, b_next) with a single match expression (which looks more elegant to me), the performance gain halves.
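
For reference, this is roughly what that single-match alternative would look like, using the names from the quoted snippet (illustrative only, not code from the PR):

// Same effect as the two matches above, written as one match over the cached
// slot; reportedly loses about half of the measured gain.
let (a_next, b_next) = match self.peeked {
    Some(MergeIterPeeked::A(next)) => (next, self.b.next()),
    Some(MergeIterPeeked::B(next)) => (self.a.next(), next),
    None => (self.a.next(), self.b.next()),
};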

A member commented:

That's unfortunate; it feels like that kind of detail, that the tuple match has inferior codegen, has been there for a long time (since early Rust 1.x).

@ssomers (author) commented Oct 10, 2019

Since I claim this fancy MergeIter is so much better, why doesn't intersection use it too?

  • Intersection doesn't need any peeking: whenever next() returns, both iterators point to old news or to nothing.
  • Intersection can stop as soon as one iterator peters out (as it has since day 1).

MergeIter could be taught the latter trick too, but even the former point on its own makes a difference. This is what changes if the stitch strategy of intersection uses a MergeIter that bails out as soon as it can:

test intersect_random_100::vs_100                     ... bench:         545 ns/iter (+/- 29)
test intersect_random_100::vs_100_future              ... bench:         662 ns/iter (+/- 32)
test intersect_random_10k::vs_10k                     ... bench:     146,793 ns/iter (+/- 1,923)
test intersect_random_10k::vs_10k_future              ... bench:     161,731 ns/iter (+/- 3,515)
test intersection_subsets::_10_vs_100                 ... bench:         213 ns/iter (+/- 3)
test intersection_subsets::_10_vs_100_future          ... bench:         247 ns/iter (+/- 2)

In addition, the current implementation of intersection is pretty simple, so there is no readability gain by complicating MergeIter instead.

Difference has a tougher implementation. This is what changes if the stitch strategy of difference uses a MergeIter that bails out as soon as it can:

test difference_random_100::vs_100                     ... bench:         686 ns/iter (+/- 9)
test difference_random_100::vs_100_future              ... bench:         870 ns/iter (+/- 3)
test difference_random_10k::vs_10k                     ... bench:     166,044 ns/iter (+/- 6,178)
test difference_random_10k::vs_10k_future              ... bench:     174,894 ns/iter (+/- 10,889)
test difference_subsets::_10_vs_100                    ... bench:         196 ns/iter (+/- 3)
test difference_subsets::_10_vs_100_future             ... bench:         212 ns/iter (+/- 5)

And this is what changes if difference avoids Peekable without relying on MergeIter:

test difference_random_100::vs_100                     ... bench:         676 ns/iter (+/- 18)
test difference_random_100::vs_100_future              ... bench:         636 ns/iter (+/- 11)
test difference_random_10k::vs_10k                     ... bench:     164,192 ns/iter (+/- 4,217)
test difference_random_10k::vs_10k_future              ... bench:     159,405 ns/iter (+/- 4,327)
test difference_subsets::_10_vs_100                    ... bench:         201 ns/iter (+/- 9)
test difference_subsets::_10_vs_100_future             ... bench:         179 ns/iter (+/- 6)

@bluss (Member) commented Oct 13, 2019

It would be cool to see the improvements as bench diffs, like https://github.com/BurntSushi/cargo-benchcmp style. (Just if you think it's interesting! I think it has prefix/suffix filters?)

@ssomers (author) commented Oct 15, 2019

With the latest benching code, building the benches in two flavours, running the same benches with both, and then comparing "old" (nightly) with "new" (this PR):

cargo bench --features merge symmdiff >bench_symmdiff_alone.txt
cargo bench --features merge,diff,intersect symmdiff >bench_symmdiff_crowd.txt
cargo benchcmp symmdiff_old symmdiff_new bench_symmdiff_crowd.txt
 name                         symmdiff_old ns/iter  symmdiff_new ns/iter  diff ns/iter  diff %  speedup
 ::parted_100_neg_vs_100_pos  800                   558                           -242  -30.25%   x 1.43
 ::parted_100_neg_vs_10k_pos  36,280                26,173                     -10,107  -27.86%   x 1.39
 ::parted_100_pos_vs_100_neg  839                   561                           -278  -33.13%   x 1.50
 ::parted_100_pos_vs_10k_neg  43,766                26,994                     -16,772  -38.32%   x 1.62
 ::parted_10k_neg_vs_100_pos  43,648                26,983                     -16,665  -38.18%   x 1.62
 ::parted_10k_neg_vs_10k_pos  79,831                52,857                     -26,974  -33.79%   x 1.51
 ::parted_10k_pos_vs_100_neg  39,621                28,233                     -11,388  -28.74%   x 1.40
 ::parted_10k_pos_vs_10k_neg  82,888                55,308                     -27,580  -33.27%   x 1.50
 ::random_100_vs_100          995                   575                           -420  -42.21%   x 1.73
 ::random_100_vs_10k          55,121                36,346                     -18,775  -34.06%   x 1.52
 ::random_100_vs_1600         9,640                 5,902                       -3,738  -38.78%   x 1.63
 ::random_10k_vs_10k          201,682               151,113                    -50,569  -25.07%   x 1.33
 ::random_10k_vs_160k         1,093,440             820,760                   -272,680  -24.94%   x 1.33
 ::subset_100_vs_10k          39,484                26,208                     -13,276  -33.62%   x 1.51
 ::subset_10_vs_100           411                   297                           -114  -27.74%   x 1.38

With the _alone measurement instead, the improvement is even better. But if we compare old with "newer" instead (i.e. #65375), as reported there, the _crowd measurement says the change is a total dud, and only the _alone measurement justifies doing it.

By the way, across 5 samples, the measurements for each flavour stay within 5% of each other, and only a few differ by more than 3%. But the same measurement for "difference" instead of "symmdiff" sometimes exhibits a ~10% boost across all benches, in both flavours.

@bluss (Member) commented Oct 15, 2019

@ssomers If i follow you, this PR has better results than #65375?

@ssomers (author) commented Oct 15, 2019

@bluss it depends... this PR is always better than the status quo, but #65375 lands somewhere between the status quo and noticeably better than the best measurement of this PR, depending on how you measure.

E.g. comparing new (this PR) and newer (#65375) with the benches in isolation, threshold 5%:

 name                       symmdiff_new:: ns/iter  symmdiff_newer:: ns/iter  diff ns/iter  diff %  speedup
 parted_100_neg_vs_100_pos  529                     461                                -68  -12.85%   x 1.15
 parted_100_neg_vs_10k_pos  25,686                  19,565                          -6,121  -23.83%   x 1.31
 parted_100_pos_vs_100_neg  496                     441                                -55  -11.09%   x 1.12
 parted_10k_neg_vs_10k_pos  49,665                  44,083                          -5,582  -11.24%   x 1.13
 parted_10k_pos_vs_100_neg  23,126                  19,712                          -3,414  -14.76%   x 1.17
 parted_10k_pos_vs_10k_neg  49,149                  44,636                          -4,513   -9.18%   x 1.10
 random_100_vs_1600         5,305                   4,332                             -973  -18.34%   x 1.22
 subset_100_vs_10k          24,365                  22,470                          -1,895   -7.78%   x 1.08
 subset_10_vs_100           285                     257                                -28   -9.82%   x 1.11

@ssomers (author) commented Oct 16, 2019

Some more experiments confirm the latest findings, and show that some of the ideas in #65375 only hold if you exclude other benches from the build. In particular, my claim that the enum variants A(Option<I::Item>) in this PR perform better than A(I::Item), or in other words that MergeIter is magically better at caching None than Iter is at returning None, only holds under those conditions. So a wee bit simpler version is coming up without those Options, and with the review findings from #65375 folded in.

So whether #65375 is better remains an open question, but my opinion is now the same as just before opening this PR: there's no real evidence that Peeking is better. The only thing still confirmed is that replacing Peekable with Peeking boosts difference by a few percent (0 to 5%), which is not really worth the trouble, I guess.
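
As a hedged, definition-only sketch of the two cached-element shapes being compared (names are illustrative, not the PR's):

// Earlier revision: the cache holds exactly what Iterator::next returned, None included.
enum PeekedWrappingOption<I: Iterator> {
    A(Option<I::Item>),
    B(Option<I::Item>),
}

// Simpler revision: only a real item is ever cached; an empty cache is expressed
// by the surrounding Option<Peeked...> being None instead.
enum PeekedBareItem<I: Iterator> {
    A(I::Item),
    B(I::Item),
}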

f.debug_tuple("Intersection").field(&answer).finish()
}
}
f.debug_tuple("Intersection").field(&self.inner).finish()
A member commented:

(I'm not proposing any change.) I can't see or remember why these don't use derive(Debug), anyone know? It looks like the implementation, including the trait bounds, matches what the derive would produce.

@ssomers (author) commented Oct 16, 2019

I don't think it matches: the old code omits the enum variant names. But I think those are pretty handy. And I don't think brevity is a good counterargument, given that fields such as the iterators list the whole set still awaiting iteration (at least for small sets; I haven't tried big ones).

@ssomers (author) commented Oct 16, 2019

Oh but there's one thing the old code also left out: the set reference itself, for the search variants. That's just something I figured would be better, because it never changes throughout iteration. I don't think it was ever discussed.

Since there is no code to limit the number of elements shown for an iterator, and since there's no such code for showing a set either, debugging with large sets does become more painful. Or, more accurately, the Search alternative inadvertently helped debugging by hiding the large set, and that brief period of relief is now over.

But for debugging small sets, it becomes easier, since you can see everything involved.

@ssomers (author) commented:

Wait, now I see another angle to the question. My guess is that a manual impl is the only way to get that 1.17 version showing up in the documentation.

.field(&self.a)
.field(&self.b)
.finish()
f.debug_tuple("SymmetricDifference").field(&self.0).finish()
@bluss (Member) commented Oct 16, 2019

I asked on the internals forum whether we could prefer the style f.debug_tuple(stringify!(SymmetricDifference)) when we hard-code the struct/enum name. I still prefer that style, but on the forum the suggestion got no interest.
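
As a tiny, self-contained illustration of that style (a hypothetical Example type, not code from the PR):

use std::fmt;

struct Example(u32);

impl fmt::Debug for Example {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // stringify!(Example) expands to the string "Example" at compile time,
        // so the name is written as an identifier rather than a string literal.
        f.debug_tuple(stringify!(Example)).field(&self.0).finish()
    }
}

fn main() {
    assert_eq!(format!("{:?}", Example(7)), "Example(7)");
}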

@ssomers (author) commented:

I'd prefer that style too, but not passionately. I haven't even seen that stringify! in the wild since reading about it.

@ssomers (author) commented:

I'd be more convinced if it could be f.debug_tuple(stringify!(Self)), to prevent copy-and-paste errors.

A contributor commented:

That would just print "Self".

@bluss (Member) commented Oct 17, 2019

stringify! has no smarts, but at least it looks like an ident, and a refactoring tool could potentially handle renames.

@ssomers (author) commented:

... and even though renaming seems far-fetched for a stable struct like SymmetricDifference, it's still useful to guide others when they base their work on the source code.

But consistency is also a virtue, and Iter uses a literal string too, so should we stringify that otherwise unrelated code? Come to think of it, why doesn't debug_set have a name? And is there a way to find posts containing debug_set on internals.rust-lang.org (hint: it's not through the search facility, and not by replacing the underscore with %5F in the URL)?

A member commented:

It has been decided that it is not necessary to have type information in the Debug formats, but structs usually have their name (for clarity I guess). The nameless set probably begins with the nameless Vec formatting, which comes naturally from delegating to the slice's formatting, which is also nameless.

Keep it consistent if you want. We shouldn't go out to change unrelated code in this PR, and changing this to use stringify in the whole crate is probably needlessly contentious.

struct MergeIterInner<I>
where
I: Iterator,
I::Item: Copy,
@bluss (Member) commented Oct 16, 2019

The where style in the new code does not match the existing code in the file.

I'd prefer to keep the existing style, we keep the same file consistent.

@ssomers (author) commented:

I normally rustfmt everything, and there are only 2 places where set.rs doesn't follow suit (but that's because a lot of code is commented out in my working copy).

Not sure what the existing style is, though. They're all on functions instead of types, and while most have one line per clause, BTreeSet::range has all clauses on one line and split_off squashes them even further. map.rs shows them on structs, but is in doubt about the number of clauses per line.

So it seems we always indent where once (or twice, but not thrice, for nested functions) and put the first clause right after it.
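
For illustration, here are two made-up signatures contrasting the styles being discussed (not code from the PR):

// rustfmt's default puts `where` unindented with one clause per line ...
fn rustfmt_style<I>(_iter: I)
where
    I: Iterator,
    I::Item: Ord,
{
}

// ... while the style already used in the file indents `where` once and keeps
// the first clause on the same line.
fn in_file_style<I>(_iter: I)
    where I: Iterator,
          I::Item: Ord,
{
}

fn main() {
    rustfmt_style([1, 2, 3].iter());
    in_file_style([1, 2, 3].iter());
}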

@bluss (Member) commented Oct 16, 2019

Ok, so now I abandon the other PR and we focus on this one, that sounds good?

match self.0.nexts() {
(None, None) => return None,
(Some(item), None) => return Some(item),
(None, Some(item)) => return Some(item),
@bluss (Member) commented Oct 16, 2019

This can be combined into one branch (Some(item), None) | (None, Some(item)) =>

And I guess it could be in the style of (elt @ Some(_), None) | (None, elt @ Some(_)) => return elt. Same thing, I guess it can't possibly change codegen(?)

@ssomers (author) commented Oct 16, 2019

I like the @ thing. I don't think combining here helps readability because the repeated code is tiny and aligned, but I can live with it.

You raise an interesting naming debate: I recently got to https://internals.rust-lang.org/t/the-is-not-empty-method-as-more-clearly-alternative-for-is-empty/10612/91 and took for granted the statement that we should be talking about items and not elements. Now I see elements in the comments too. I was going to object to calling an optional element an element, but never mind: the match says it's really an element/item.

I'm also going to benchmark if a_next.and(b_next).is_none() { return a_next.or(b_next); }. Not a beauty, but it is short.

@bluss (Member) commented Oct 16, 2019

Option::xor is new. I'm mildly excited because it looks like it fits here. Oh, it doesn't fit exactly... let down :)

I didn't intentionally change item to elt. elt is just what my fingers produce without thinking. Using item is of course fine. (I learned elt from the rust community I believe.)

@ssomers (author) commented Oct 16, 2019

Not sure how xor fits here? I used it instead of or below, but it doesn't serve any purpose there.

let (a_next, b_next) = self.0.nexts();
if a_next.and(b_next).is_none() {
    return a_next.xor(b_next)
}

Now hold on to your seat (or be wary of benchmarks)... this xor thrashes performance by up to 20%. Just turn that xor into or and performance is back to what it is with the 4-line match above.

Another disappointment:

match self.0.nexts() {
    (None, None) => return None,
    (next @ Some(_), None) | (None, next @ Some(_)) => return next,
    (Some(_), Some(_)) => (),
}

is about 6% worse, and more so with the "parted_" benches, where there are no (Some(), Some()) matches. I simplified it further to (a surprise in itself):

match self.0.nexts() {
    (Some(_), Some(_)) => (),
    (next, None) | (None, next) => return next,
}

and that improves it to just about the best level, except for the "parted_" benches.

@ssomers (author) commented:

Oh and earlier I had tried some alternatives to MergeIter::next returning a pair of items:

  • (Option<&Item>, usize) where second is the number of sets the item (if any) was in: seemed okay
  • (Option<&Item>, bool) where second is basically "the if clause of SymmetricDifference": ugly, even if I'd turn the bool into an enum
  • move that bool into MergeIter, or just the next function itself: ugly

and none seemed to perform better or worse.

@ssomers (author) commented Oct 17, 2019

I saw another angle to that xor: pass a lambda to the next function that says what to do with a result present on both sides: Option::or for Union, Option::xor for SymmetricDifference. As a consequence, the possible loop moves inside MergeIterInner. It looks nice, actually, but performance suffers (according to what we now trust as benchmarking), for both SymmetricDifference and Union.
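
As a hedged sketch of that discarded closure-based variant, reusing the nexts() helper from the illustrative sketch earlier in this thread (hypothetical, not what was merged):

    // The merge loop moves into the helper; the closure decides what an equal
    // pair contributes: Option::or keeps it (union), Option::xor drops it
    // (symmetric difference). E.g. merged.next_combined(Option::or) for a union.
    fn next_combined<F>(&mut self, combine: F) -> Option<I::Item>
    where
        F: Fn(Option<I::Item>, Option<I::Item>) -> Option<I::Item>,
    {
        loop {
            match self.nexts() {
                (None, None) => return None,
                (a_next, b_next) => match combine(a_next, b_next) {
                    Some(item) => return Some(item),
                    // Only an equal pair combined with Option::xor lands here:
                    // skip the duplicate and keep merging.
                    None => continue,
                },
            }
        }
    }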

@bluss (Member) commented Oct 16, 2019

Looks good to me, I only had style questions

@ssomers (author) commented Oct 16, 2019

Ok, so now I abandon the other PR and we focus on this one, that sounds good?

Yeah, that's basically what I've done, but I'll still be trying to find hints or answers as to whether and why the benches fooled us.

@bluss (Member) commented Oct 17, 2019

Cool, you've done a lot of exploration. This is r=me when you think it is ready

@ssomers (author) commented Oct 17, 2019

Whitespace around where fixed, Union::next shortened, size_hint comments edited. Apart from perhaps the debug_tuple stringification, that's everything covered, I think.

Current benchmark comparisons, with a 5% threshold: no change for difference & intersection; symmetric_difference & union say:

 name                       symmdiff_old:: ns/iter  symmdiff_new:: ns/iter  diff ns/iter   diff %  speedup
 parted_100_neg_vs_100_pos  847                     606                             -241  -28.45%   x 1.40
 parted_100_neg_vs_10k_pos  38,270                  31,852                        -6,418  -16.77%   x 1.20
 parted_100_pos_vs_100_neg  896                     599                             -297  -33.15%   x 1.50
 parted_100_pos_vs_10k_neg  44,218                  28,996                       -15,222  -34.42%   x 1.52
 parted_10k_neg_vs_100_pos  45,628                  27,675                       -17,953  -39.35%   x 1.65
 parted_10k_neg_vs_10k_pos  83,578                  59,588                       -23,990  -28.70%   x 1.40
 parted_10k_pos_vs_100_neg  42,773                  29,604                       -13,169  -30.79%   x 1.44
 parted_10k_pos_vs_10k_neg  86,666                  58,811                       -27,855  -32.14%   x 1.47
 random_100_vs_100          939                     587                             -352  -37.49%   x 1.60
 random_100_vs_10k          57,084                  39,194                       -17,890  -31.34%   x 1.46
 random_100_vs_1600         9,864                   6,105                         -3,759  -38.11%   x 1.62
 random_10k_vs_10k          206,880                 153,195                      -53,685  -25.95%   x 1.35
 random_10k_vs_160k         1,139,840               822,350                     -317,490  -27.85%   x 1.39
 subset_100_vs_10k          40,483                  30,309                       -10,174  -25.13%   x 1.34
 subset_10_vs_100           423                     343                              -80  -18.91%   x 1.23

 name                       union_old:: ns/iter  union_new:: ns/iter  diff ns/iter   diff %  speedup
 parted_100_neg_vs_100_pos  791                  670                          -121  -15.30%   x 1.18
 parted_100_pos_vs_100_neg  780                  695                           -85  -10.90%   x 1.12
 parted_10k_neg_vs_100_pos  40,680               29,719                    -10,961  -26.94%   x 1.37
 parted_10k_neg_vs_10k_pos  76,771               65,485                    -11,286  -14.70%   x 1.17
 parted_10k_pos_vs_100_neg  39,534               29,350                    -10,184  -25.76%   x 1.35
 parted_10k_pos_vs_10k_neg  77,690               67,567                    -10,123  -13.03%   x 1.15
 random_100_vs_100          984                  639                          -345  -35.06%   x 1.54
 random_100_vs_10k          49,362               43,915                     -5,447  -11.03%   x 1.12
 random_100_vs_1600         8,802                7,237                      -1,565  -17.78%   x 1.22
 random_10k_vs_10k          197,955              152,490                   -45,465  -22.97%   x 1.30
 random_10k_vs_160k         1,020,540            878,320                  -142,220  -13.94%   x 1.16
 subset_100_vs_10k          36,520               33,817                     -2,703   -7.40%   x 1.08

r=bluss

@bluss (Member) commented Oct 19, 2019

Thanks! @bors r+ rollup

@bors commented Oct 19, 2019

📌 Commit 5697432 has been approved by bluss

@bors added the S-waiting-on-bors label (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.) and removed the S-waiting-on-review label on Oct 19, 2019
Centril added two commits to Centril/rust that referenced this pull request on Oct 19, 2019, each repeating the PR title and description.
bors added a commit that referenced this pull request Oct 19, 2019
Rollup of 5 pull requests

Successful merges:

 - #64007 (Add check for overlapping ranges to unreachable patterns lint)
 - #65192 (Use structured suggestion for restricting bounds)
 - #65226 (BTreeSet symmetric_difference & union optimized)
 - #65448 (rustc_codegen_ssa: remove some unnecessary Box special-casing.)
 - #65505 (Rc: value -> allocation)

Failed merges:

r? @ghost
@bors merged commit 5697432 into rust-lang:master on Oct 19, 2019