Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OrderSet #46

Merged
merged 12 commits into from
Nov 25, 2017
Merged

Add OrderSet #46

merged 12 commits into from
Nov 25, 2017

Conversation

cuviper
Copy link
Member

@cuviper cuviper commented Nov 21, 2017

An OrderSet<T> wraps an OrderMap<T, ()>, and also adds set-like
functionality: union, intersection, etc.

The type and all of its supporting iterators are defined in the public
set module, and OrderSet is re-exported at the crate root. There is
also a new orderset! macro for construction.

Fixes #8.

@bluss bluss self-assigned this Nov 21, 2017
Copy link
Member

@bluss bluss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working through this!

src/macros.rs Outdated
($($value:expr,)+) => { orderset!($($value),+) };
($($value:expr),*) => {
{
let _cap = ordermap!(@count $($value),*);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd copy the count part of the macro to this place. It's the only way to support #[macro_use(orderset)] which I'd like to do whenever possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/serde.rs Outdated
fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
where A: SeqAccess<'de>
{
let mut values = OrderSet::with_capacity_and_hasher(seq.size_hint().unwrap_or(0), Default::default());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S::default() is the neater style but I see it's just copying the existing code, so that's understandable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated both cases.

src/set.rs Outdated
/// insertion and removal calls on the set. The order does not depend on the
/// values or the hash function at all.
///
/// All iterators traverse the set in *the order*.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a comment on the order in the set operation iterators

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/set.rs Outdated
}

/// Return an iterator over the values that are in `self` but not `other`.
pub fn difference<'a, S2>(&'a self, other: &'a OrderSet<T, S2>) -> Difference<'a, T, S2>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a comment on the order, not sure if there's anything interesting we can say. If we prefer to not guarantee anything that's ok. Difference, in particular, seems like a simple example of guaranteeing the order -- same as self, whenever an element appears.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to guarantee order on all of them. The tests I wrote do make sure of this, since they compare the iterators directly against range iterators.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've documented them all.

src/set.rs Outdated
fn collect<C>(self) -> C
where C: FromIterator<Self::Item>
{
self.iter.map(|(x, ())| x).collect()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this to reach its goal -- delegate down to the sliceiterator.map(f).collect() -- we need to not go through the OrderMap iterator here, right? The reason we delegate down to the libcore iterators is to get their TrustedLen etc optimizations for collect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other iterator methods should be free of this problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't really understand why you were overriding collect() at all, but it makes sense to leverage the unstable traits this way. I can make this use self.iter.iter.map(...).collect() -- although once we're digging into private fields, maybe it would be better for set to just create and use the underlying vec/slice iterator directly, avoiding the doubled map(..) in the other methods.

I was mostly trying to avoid having the set use private features of the map, but maybe that's not so helpful. I was somewhat planning for the possibility that you might rather publish this as a separate orderset crate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to ship in the same crate, they are closely related.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@cuviper
Copy link
Member Author

cuviper commented Nov 22, 2017

I just moved iterator_methods! to a common place too, so that can be reused by the set iterators.

@bluss bluss merged commit 52251bb into indexmap-rs:master Nov 25, 2017
@bluss
Copy link
Member

bluss commented Nov 25, 2017

Thanks a lot!

@lucab
Copy link
Contributor

lucab commented Dec 1, 2017

I'd like to point out that the behavior of the OrderSet::insert() here is actually an "upsert": if the element already exist, its relative order in the set will be updated. I'm not sure if this is intentional design.

If possible, I'd like to:

  • update insert to avoid changing order on re-insertion, similarly to other languages/libraries (see top note for Java LinkedHashSet)
  • have a dedicate upsert (or similar name)
  • explicitly document those two different behaviors

Please note that I'm not arguing which of the two designs is better, as I have (different) use cases for both of them.

EDIT: I misread the code and made a false statement at first, striked out.

@cuviper
Copy link
Member Author

cuviper commented Dec 1, 2017

@lucab

I'd like to point out that the behavior of the OrderSet::insert() here is actually an "upsert": if the element already exist, its relative order in the set will be updated. I'm not sure if this is intentional design.

That's not my intention, and doesn't seem to be the case:

    let mut set: OrderSet<_> = (0..10).collect();
    println!("{:?}", set);
    set.insert(3);
    println!("{:?}", set);

That doesn't perturb the order AFAICS.

@lucab
Copy link
Contributor

lucab commented Dec 1, 2017

@cuviper you are right, my bad. I misread OrderMap::insert() implementation and didn't check with a running example. So minor remaining questions related to my comment above:

  • is this guaranteed/documented somewhere?
  • can we have an explicit OrderSet::upsert() for cases where I need to update the order on re-insertion?

For both cases I can happily provide docs/tests/code, depending on the answers.

@cuviper
Copy link
Member Author

cuviper commented Dec 1, 2017

  • IMO this is fine to guarantee. I don't think this aspect of update behavior is documented on either the map or the set, but it's actually the easier way to implement it anyway.
  • An upsert (which I would expand as either update_insert or insert_update) has the same basic problem as remove -- what do you do with any following elements? Should it behave like swap_remove and move the last element into the hole? Should it do an O(n) shift like Add ordered_remove with O(n) perf. #15?

@cuviper
Copy link
Member Author

cuviper commented Dec 1, 2017

(Huh, I didn't realize upsert was already a common term in databases. Maybe that's not too bad then.)

@lucab
Copy link
Contributor

lucab commented Dec 1, 2017

Yes, name suggestion comes from https://wiki.postgresql.org/wiki/UPSERT. I'm not attached to it, but at the same time it already exists for this exact behavior.

Regarding removal-then-insert, as a user I'd pretty much want the behavior of #15 even if it could be very expensive performance-wise. Pointing that out in doc will probably give me (as a library user) a clear hint that I should try a different design instead first, and fallback to upsert only as a last resort.

@bluss
Copy link
Member

bluss commented Dec 1, 2017

@lucab yes, if you want strict insertion order, then this is not the right data structure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants