Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HashSet::get_or_insert_with_mut #396

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

yyny
Copy link

@yyny yyny commented Feb 14, 2023

This function avoids unnecessairy work when the value to be stored can take ownership of the lookup key, such as when interning. It makes a stronger guarantee than HashSet::get_or_insert_with, since the reference given to f is guaranteed to be unique, and thus safe to mutate.

The functionality of this function can already be achieved by (ab)using UnsafeCell, but that relies on the implementation detail that the &Q given to f is unique.

This function avoids unnecessairy work when the value to be stored can
take ownership of the lookup key, such as when interning.
It makes a stronger guarantee than `HashSet::get_or_insert_with`, since
the reference given to `f` is guaranteed to be unique, and thus safe to
mutate.

The functionality of this function can already be achieved by (ab)using
`UnsafeCell`, but that relies on the implementation detail that the `&Q`
given to `f` is unique.
@JustForFun88
Copy link
Contributor

JustForFun88 commented Feb 15, 2023

To be honest, the implementation of this, as well as the get_or_insert_with method, raises questions. These two methods make it quite easy to break the invariant of set that all elements are repeated in set only once. Let's say this test passes:

#[test]
fn some_invalid_insert() {
	let mut set = HashSet::new();
	set.insert(1);
	set.get_or_insert_with(&2, |_| 1);
	set.get_or_insert_with(&3, |_| 1);
	assert_eq!(vec![&1, &1, &1], set.iter().collect::<Vec<_>>());
}

Can you at least document it?

@yyny
Copy link
Author

yyny commented Feb 16, 2023

Oh wow, nice catch! In hindsight, I would have not have expected this behavior from a (safe) function called get_or_insert_with, had I not had experience with hash-map internals already. I would have expected something like:

// This code doesn't borrow-check :(

if let Some(result) = self.get(value) {
    result
} else {
    self.insert_unique_unchecked(f(value))
}

On that note, we also probably want a get_or_compute that does this, especially since the "obvious" way to do it as shown above doesn't actually compile.

As for get_or_insert_with and get_or_insert_with_mut, It might be warranted to either

  1. When inserting, check that the &Q and the T are equivalent.
  2. When inserting, check that the &Q and the T are equivalent, and that their hashes are the same.
  3. When inserting, find and insert into the entry for T instead of inserting into the entry for &Q.
  4. Rename these functions to _unchecked.
  5. Mark these functions unsafe.

My two cents:

1, 2) These would be the safest option, but it seems overly cautious. There are already other safe functions like insert_unique_unchecked that also have unspecified behavior when invariants are not met.

3) A separate get_or_compute could be added for this

4) This would be a lot more self-documenting, and there is precedent for this, but:

4, 5) These are hard ones... get_or_insert_with_mut is a generalization of get_or_insert_with, which is a generalization of get_or_insert_owned, which is a (more performant) generalization of get_or_insert/get_or_compute. Any argument for marking get_or_insert_with as unsafe or _unchecked could equally apply to get_or_insert_owned.

It could be argued that get_or_insert_with is "too general" because it can be abused, but get_or_insert_owned could be abused in the same way by providing a weird ToOwned implementation.

I will just add a comment for now mentioning the expected invariants.

@JustForFun88
Copy link
Contributor

Oh wow, nice catch! In hindsight, I would have not have expected this behavior from a (safe) function called get_or_insert_with, had I not had experience with hash-map internals already. I would have expected something like:

// This code doesn't borrow-check :(

if let Some(result) = self.get(value) {
    result
} else {
    self.insert_unique_unchecked(f(value))
}

#400. But we have to wait for @Amanieu to approve it.

@Amanieu
Copy link
Member

Amanieu commented Feb 22, 2023

get_or_insert_with_mut still has the issue that you can abuse it to change the key to something with a different hash, which is fundamentally the same issue as before.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants