Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Developers can use a built in PriorityQueue type #43957

Closed
4 tasks done
Tracked by #44314
eiriktsarpalis opened this issue Oct 28, 2020 · 64 comments
Closed
4 tasks done
Tracked by #44314

Developers can use a built in PriorityQueue type #43957

eiriktsarpalis opened this issue Oct 28, 2020 · 64 comments
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Collections Bottom Up Work Not part of a theme, epic, or user story Cost:S Work that requires one engineer up to 1 week Priority:3 Work that is nice to have Team:Libraries User Story A single user-facing feature. Can be grouped under an epic.
Milestone

Comments

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Oct 28, 2020

Background and Motivation

Adding a priority queue has been a feature long requested by the community, however so far we've fallen short of finalizing a design, primarily due to a lack of consensus on certain high-level features: namely, the support for priority updates and any API or performance considerations this feature entails.

To better identify what a popular .NET priority queue should look like, I have been going through .NET codebases for common usage patterns as well as running benchmarks against various prototypes. The key takeaway is that 90% of the use cases examined do not require priority updates. It also turns out that implementations without update support can be 2-3x faster compared to ones that do support it. Please refer to the original issue for more details on my investigations.

This issue presents a finalized priority queue API proposal, informed by the above findings. The goal is to introduce a PriorityQueue class that has a simple design, caters to the majority of our users' requirements and is as efficient as possible.

A prototype implementing the proposal can be found here.

Design Decisions

  • Class is named PriorityQueue instead of Heap.
  • Implements a min priority queue (element with lowest priority dequeued first).
  • Priority ordinals are passed as separate values (rather than being encapsulated by the element).
  • Uses an array-backed quaternary heap.
  • No support for priority updates.
  • Not a thread-safe collection.
  • Not a stable queue (elements enqueued with equal priority not guaranteed to be dequeued in the same order).
  • The type does not implement IEnumerable or ICollection. Unlike Queue<T>, PriorityQueue cannot efficiently enumerate elements by ascending priority. We therefore expose the enumerable as a separate UnorderedItems property, to more effectively communicate this behaviour.

Implementation Checklist (Definition of Done)

Proposed API

namespace System.Collections.Generic
{
    public class PriorityQueue<TElement, TPriority> // NB does not implement IEnumerable or ICollection
    {
        /// <summary>
        ///   Creates an empty PriorityQueue instance.
        /// </summary>
        public PriorityQueue();

         /// <summary>
        ///   Creates a PriorityQueue instance with specified initial capacity in its backing array.
        /// </summary>
        public PriorityQueue(int initialCapacity);

        /// <summary>
        ///   Creates a PriorityQueue instance with specified priority comparer.
        /// </summary>
        public PriorityQueue(IComparer<TPriority>? comparer);
        public PriorityQueue(int initialCapacity, IComparer<TPriority>? comparer);

        /// <summary>
        ///   Creates a PriorityQueue populated with the specified values and priorities.
        /// </summary>
        public PriorityQueue(IEnumerable<(TElement Element, TPriority Priority)> values);
        public PriorityQueue(IEnumerable<(TElement Element, TPriority Priority)> values, IComparer<TPriority>? comparer);

        /// <summary>
        ///   Gets the current element count in the queue.
        /// </summary>
        public int Count { get; }

        /// <summary>
        ///   Gets the priority comparer of the queue.
        /// </summary>
        public IComparer<TPriority> Comparer { get; }

        /// <summary>
        ///   Enqueues the element with specified priority.
        /// </summary>
        public void Enqueue(TElement element, TPriority priority);

        /// <summary>
        ///   Gets the element with minimal priority, if it exists.
        /// </summary>
        /// <exception cref="InvalidOperationException">The queue is empty.</exception>
        public TElement Peek();

        /// <summary>
        ///    Dequeues the element with minimal priority, if it exists.
        /// </summary>
        /// <exception cref="InvalidOperationException">The queue is empty.</exception>
        public TElement Dequeue();

        /// <summary>
        ///    Try-variants of Dequeue and Peek methods.
        /// </summary>
        public bool TryDequeue([MaybeNullWhen(false)] out TElement element, [MaybeNullWhen(false)] out TPriority priority);
        public bool TryPeek([MaybeNullWhen(false)] out TElement element, [MaybeNullWhen(false)] out TPriority priority);

        /// <summary>
        ///   Combined enqueue/dequeue operation, generally more efficient than sequential Enqueue/Dequeue calls.
        /// </summary>
        public TElement EnqueueDequeue(TElement element, TPriority priority);

        /// <summary>
        ///   Enqueues a sequence of element/priority pairs to the queue.
        /// </summary>
        public void EnqueueRange(IEnumerable<(TElement Element, TPriority Priority)> values);

        /// <summary>
        ///   Enqueues a sequence of elements with provided priority.
        /// </summary>
        public void EnqueueRange(IEnumerable<TElement> values, TPriority priority);

        /// <summary>
        ///   Removes all objects from the PriorityQueue.
        /// </summary>
        public void Clear();

        /// <summary>
        ///   Ensures that the PriorityQueue can hold the specified capacity and resizes its underlying buffer if necessary.
        /// </summary>
        public void EnsureCapacity(int capacity);

        /// <summary>
        ///   Sets capacity to the actual number of elements in the queue, if that is less than 90 percent of current capacity.
        /// </summary>
        public void TrimExcess();

        /// <summary>
        ///   Gets a collection that enumerates the elements of the queue.
        /// </summary>
        public UnorderedItemsCollection UnorderedItems { get; }
        public class UnorderedItemsCollection : IReadOnlyCollection<(TElement Element, TPriority Priority)>, ICollection
        {
            public struct Enumerator : IEnumerator<(TElement TElement, TPriority Priority)>, IEnumerator { }
            public Enumerator GetEnumerator();
        }
    }
}

Usage Examples

var pq = new PriorityQueue<string, int>();

pq.Enqueue("John", 1940);
pq.Enqueue("Paul", 1942);
pq.Enqueue("George", 1943);
pq.Enqueue("Ringo", 1940);

Assert.Equal("John", pq.Dequeue());
Assert.Equal("Ringo", pq.Dequeue());
Assert.Equal("Paul", pq.Dequeue());
Assert.Equal("George", pq.Dequeue());

Alternative Designs

We recognize the need for heaps that support efficient priority updates, so we will consider introducing a separate specialized class that addresses this requirement at a later stage. Please refer to the original issue for a more in-depth analysis on the alternatives.

Open Questions

  • Should we use KeyValuePair<TPriority, TElement> instead of (TPriority Priority, TElement Element)?
    • We will use tuple types instead of KeyValuePair.
  • Should enumeration of elements be sorted by priority?
    • No. We will expose the enumerable as separate property whose name emphasizes that it is unsorted.
  • Should we include a bool Contains(TElement element); method?
    • We should consider adding this. Will be O(n) operation. Another issue is taking custom element equality into consideration.
  • Should we include a void Remove(TElement element); method?
    • No. Can be ambiguous operation in the presence of duplicate elements.
@eiriktsarpalis eiriktsarpalis added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Collections labels Oct 28, 2020
@eiriktsarpalis eiriktsarpalis added this to the 6.0.0 milestone Oct 28, 2020
@eiriktsarpalis eiriktsarpalis self-assigned this Oct 28, 2020
@ghost
Copy link

ghost commented Oct 28, 2020

Tagging subscribers to this area: @eiriktsarpalis, @jeffhandley
See info in area-owners.md if you want to be subscribed.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Oct 28, 2020
@eiriktsarpalis eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Oct 28, 2020
@MihaZupan
Copy link
Member

MihaZupan commented Oct 28, 2020

Priority ordinals are passed as separate values (rather than being encapsulated by the element)

How come? Is it common to use a heap without having a custom node type? I'm just surprised by this one as I've always had the node perform comparisons.

@stephentoub
Copy link
Member

public class PriorityQueue<TElement, TPriority> : IReadOnlyCollection<(TElement Element, TPriority Priority)>

Are there any constraints on the type parameters? e.g. where TPriority : notnull?

public bool TryDequeue(out TElement element, out TPriority priority);
public bool TryPeek(out TElement element, out TPriority priority);

These should have [MaybeNullWhen(false)] on both arguments.

EnqueueDequeue

Can you share an example where this is used? And it's more efficient because, if it's based on a heap, it only has to sift once instead of twice?

: IReadOnlyCollection<(TElement Element, TPriority Priority)>

Not that LINQ has some optimizations based on the non-generic ICollection.Count. I think this would the (or one of the) first prominent collection types that didn't implement it. Queue<T>, for example, implements both IReadOnlyCollection<T> and the non-generic ICollection. Did you consider implementing it? Or as part of this are we going to revisit IReadOnlyCollection<T> specialization in LINQ? Or... maybe it doesn't matter?

Should enumeration of elements be sorted by priority?

Can that be done without O(N) additional space?

@eiriktsarpalis
Copy link
Member Author

eiriktsarpalis commented Oct 28, 2020

Priority ordinals are passed as separate values (rather than being encapsulated by the element)

How come? Is it common to use a heap without having a custom node type? I'm just surprised by this one as I've always had the node perform comparisons.

I think it is pretty common actually. When investigating our codebases I found many PQ implementations passing priority as a separate parameter. However even for queues ordering on the elements themselves, it was very common for consumers to populate the type parameter with a tuple containing an element and a priority, then use a custom comparer that projected to the priority. To me this seems like we might be forcing additional boilerplate just to encapsulate the priority.

I think the real issue here is performance, since you end up copying more stuff. In practice however I found the perf impact to be negligible, but it could still be a problem if your ordinal is a large struct.

public class PriorityQueue<TElement, TPriority> : IReadOnlyCollection<(TElement Element, TPriority Priority)>

Are there any constraints on the type parameters? e.g. where TPriority : notnull?

Probably not. Comparisons are done using the provided IComparer<TPriority> instance, and the default comparers seem perfectly capable of handling null parameters.

EnqueueDequeue

Can you share an example where this is used? And it's more efficient because, if it's based on a heap, it only has to sift once instead of twice?

Correct. It's also O(1) if the enqueued operation is less or equal to the min element and is always guaranteed to succeed. The standard term for that operation is push-pop and is available in the python heap implementation.

: IReadOnlyCollection<(TElement Element, TPriority Priority)>

Not that LINQ has some optimizations based on the non-generic ICollection.Count. I think this would the (or one of the) first prominent collection types that didn't implement it. Queue, for example, implements both IReadOnlyCollection and the non-generic ICollection. Did you consider implementing it? Or as part of this are we going to revisit IReadOnlyCollection specialization in LINQ? Or... maybe it doesn't matter?

I was considering ICollection<T> but it was not clear to me how to best implement the Contains() and Remove() methods. ICollection is probably fine though, will update the API proposal.

Should enumeration of elements be sorted by priority?

Can that be done without O(N) additional space?

Not to my knowledge. I thought I'd raise this point though, since we had customer feedback stating that unsorted enumeration felt counterintuitive (given how Queue<T> behaves).

@stephentoub
Copy link
Member

The standard term for that operation is push-pop and is available in the python heap implementation.

Can you share an example of an algorithm that uses it?

Can that be done without O(N) additional space?

Not to my knowledge. I thought I'd raise this point though, since we had customer feedback stating that unsorted enumeration felt counterintuitive (given how Queue behaves).

I think a struct-based Enumerator allocating O(N) space would also be counterintuitive ;-)

@eiriktsarpalis
Copy link
Member Author

eiriktsarpalis commented Oct 29, 2020

The standard term for that operation is push-pop and is available in the python heap implementation.

Can you share an example of an algorithm that uses it?

In python it is commonly used in algorithms that need to keep the heap within a given capacity, for example when calculating k closest points.

Transcribed to C# it might look as follows:

public static IEnumerable<TValue> KLargest<TValue, TSize>(TValue[] inputs, int k, Func<TValue, TSize> calculateSize)
{
    var pq = new PriorityQueue<TValue, TSize>(inputs.Select(inp => (inp, calculateSize(inp))).Take(k));

    for (int i = k; i < inputs.Length; i++)
    {
        TValue value = inputs[i];
        pq.EnqueueDequeue(value, calculateSize(value));
    }

    return pq.Select(x => x.Element);
}

@stephentoub
Copy link
Member

I see. Such use doesn't actually care about the result of the dequeue, just that the min is replaced.

@eiriktsarpalis
Copy link
Member Author

Correct. Happy to discuss a better method name, I simply adapted pushpop to our terminology.

@stephentoub
Copy link
Member

Happy to discuss a better method name

Seems ok

@ericsampson
Copy link

Do you envision a future ConcurrentPriorityQueue ?

@stephentoub
Copy link
Member

stephentoub commented Oct 30, 2020

Do you envision a future ConcurrentPriorityQueue ?

Not at present. We'd need significant evidence that there was both a) a strong need for it and b) a significantly better implementation than just lock'ing around each operation (and not just for machines with a bazillion cores).

@GSPP
Copy link

GSPP commented Oct 31, 2020

Sets capacity to the actual number of elements in the queue, if that is less than 90 percent of current capacity.

  1. Should we not allow the user to pick that policy? They can easily perform this check themselves with a percentage of their liking. With this design, it is not possible to use a percentage above 90%.

  2. Would sorted enumeration be more efficient than the user sorting after enumeration? If yet it might be worth it to provide a EnumerateOrdered method.

  3. A ToArray method seems useful.

@eiriktsarpalis
Copy link
Member Author

eiriktsarpalis commented Nov 2, 2020

Should we not allow the user to pick that policy? They can easily perform this check themselves with a percentage of their liking. With this design, it is not possible to use a percentage above 90%.

That's literally replicating the logic of the Queue.TrimExcess method. I'm not sure why the 90% threshold was chosen there.

Would sorted enumeration be more efficient than the user sorting after enumeration? If yet it might be worth it to provide a EnumerateOrdered method.

Maybe, but users might assume it's a cheap operation. I'd rather they explicitly pay the price by sorting the elements.

A ToArray method seems useful.

Sounds reasonable, will update the proposal.

On second thought, it is not common for key/value pair collections to have a ToArray() method.

@ericsampson
Copy link

If your TElement is a class/struct that contains its priority as a field, is there a suggested pattern to avoid storing the priority twice? (is it dumb/slow to have TPriority = TElement and then use a custom comparer to project the priority field?)

@eiriktsarpalis
Copy link
Member Author

If your TElement is a class/struct that contains its priority as a field, is there a suggested pattern to avoid storing the priority twice? (is it dumb/slow to have TPriority = TElement and then use a custom comparer to project the priority field?)

Good question. You wouldn't be able to avoid storing the priority twice. However, in my benchmarks I found that this duplication has negligible performance impact (I presume though that the story might be different if the priority type is a large struct).

I recently ran an investigation of codebases that employ PriorityQueues, and found the following to be true:

  • Most elements do not encapsulate priorities.
  • Priorities are typically small, numeric values.

The design proposed here therefore is a trade-off between duplication (for the cases where elements do contain their priorities) and API simplicity (not requiring users to write wrapper types and custom comparers).

@Enderlook
Copy link

Can I ask why not having both types?
I mean, one as PriorityQueue<TElement, TPriority> and other as PriorityQueue<T>, where T is both the element and the priority.
Both implementations would be quite similar, so it wouldn't be much effort to have both.

@eiriktsarpalis
Copy link
Member Author

Can I ask why not having both types?

I don't think the benefits outweigh the effort of designing and maintaining two almost identical types.

@GSPP
Copy link

GSPP commented Nov 3, 2020

Is it considered acceptable and "a good idea" to put value tuples in the public surface area? ((TElement Element, TPriority Priority)). An alternative could be a custom struct. It would be a pretty simple DTO type.

@eiriktsarpalis
Copy link
Member Author

Is it considered acceptable and "a good idea" to put value tuples in the public surface area?

There is precedent, e.g. the Enumerable.Zip method. Alternatively we could just use KeyValuePair<TPriority, TElement>, but I felt that "Key" might not be a name users necessarily associate with priority (even though it is sometimes referred to as that in certain heap implementations).

@eiriktsarpalis eiriktsarpalis added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Nov 4, 2020
@bartonjs
Copy link
Member

Not a stable queue (elements enqueued with equal priority not guaranteed to be dequeued in the same order).

I don't think you can call it a queue then. Even in your sample usage you asserted that the item inserted at a given priority (or birth year, whatever) first came out first.

@eiriktsarpalis
Copy link
Member Author

I don't think you can call it a queue then.

Does the name 'queue' necessarily imply FIFO? There is precedent in the priority queues defined in the Java and C++ standard libraries which are also not FIFO.

pgolebiowski pushed a commit to pgolebiowski/dotnet-runtime-fork that referenced this issue Dec 12, 2020
This commit adds a new data structure, priority queue.

Fixes dotnet#43957
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Dec 12, 2020
@pgolebiowski
Copy link
Contributor

pgolebiowski commented Dec 12, 2020

Hi @eiriktsarpalis! I've created the scaffolding for product code and tests. It would be really great if you could have a look and sanity check this scaffolding before I fill it with implementation and test cases 😊

@jeffhandley
Copy link
Member

@pgolebiowski FYI - I'm going to add @eiriktsarpalis and myself to the assignees list so that we can see this on our team's project board.

@jeffhandley jeffhandley added the Priority:3 Work that is nice to have label Jan 14, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 23, 2021
@eiriktsarpalis
Copy link
Member Author

Reopening to track outstanding documentation work.

@sharpninja
Copy link

I don't understand why you would not preserve order of insertion for items at the same queue level. This basically eliminate use cases where you could dispatch events based on priority from this structure.

@eiriktsarpalis
Copy link
Member Author

@sharpninja this is a property common with most heap implementations. You could still guarantee stability using the technique detailed here.

@Gakk
Copy link

Gakk commented Mar 22, 2021

My point is that the term 'queue' doesn't seem to carry strong connotations of FIFO-like behaviour. For example, the javadoc for queues contains a rather loose interpretation:

Queues typically, but do not necessarily, order elements in a FIFO (first-in-first-out) manner. Among the exceptions are priority queues, which order elements according to a supplied comparator, or the elements' natural ordering, and LIFO queues (or stacks) which order the elements LIFO (last-in-first-out).

Bottom line, I don't think we would be able to guarantee this without some performance hit. At the same time, it is fairly trivial for users to obtain stability if they need it. [@eiriktsarpalis]

The way I read the javadoc you quoted, even Java believes queue usually is FIFO - with the exception of priority queue as they can pick a priority element in the middle of the queue. The other exception is a LIFO, and that also has an stable order.

I believe naming this class PriorityQueue is wrong as long as it is not a stable queue for elements with equal priority. Even if the name queue does not have to reflect a stable queue, it is bad naming since a lot of developers will misinterpret and believe it to be a FIFO for elements of equal priority.

The first blogpost I read of this clearly believes this is a stable queue, as it as example uses a line in the bank, and that some customers have priority. It is expected to handle other customers based on order (FIFO/LIFO) within each priority.

Please either make PriorityQueue stable, or consider renaming to PriorityBag, as this already is an established term in .NET.

@eiriktsarpalis
Copy link
Member Author

eiriktsarpalis commented Mar 22, 2021

The way I read the javadoc you quoted, even Java believes queue usually is FIFO - with the exception of priority queue as they can pick a priority element in the middle of the queue. The other exception is a LIFO, and that also has an stable order.

Emphasis on usually. The example is there to illustrate the fact that the terms 'queue' and 'FIFO', while synonyms, are not entirely identical.

The first blogpost I read of this clearly believes this is a stable queue, as it as example uses a line in the bank, and that some customers have priority. It is expected to handle other customers based on order (FIFO/LIFO) within each priority.

Please either make PriorityQueue stable, or consider renaming to PriorityBag, as this already is an established term in .NET.

Heap-backed priority queues are not stable (which is not to say it is impossible to build a priority queue that is stable). This is a fairly well-understood property of most standard priority queue implementations, although it still tends to surprise people:

It is fairly evident that established usage of the term 'priority queue' does not imply stability. A quick search on the term "stable priority queue" should further highlight this fact.

@Gakk
Copy link

Gakk commented Mar 22, 2021

Thanks for the well documented answer 👍

This is a fairly well-understood property of most standard priority queue implementations, although it still tends to surprise people.

I would argue that the other languages then have failed with naming. I understand that when C#'s implementation of PriorityQueue is equivalent to the others, a similar naming should apply, and therefore i withdraw my earlier proposal for PriorityBag.

Another thought is that the other implementations might be older, from a time where most developers knew the difference between heap and stack. Your references to StackOverflow shows that it is a common misunderstanding, and now on introduction is the perfect time to try to avoid this also for .NET.

I hope at least that the documentation is very clear about this, maybe add in the first line of the description (summary that shows as tooltip in a lot of editors) that the constructor creates an unstable priority queue. This might tease the curiosity enough that they read what is the difference between stable and unstable queue. A name of UnstablePriorityQueue would also trigger this, and should avoid most misunderstandings 😇

@Enderlook
Copy link

Enderlook commented Mar 22, 2021

Don't want to be picky, but a name as UnstablePriorityQueue for a class may cause misunderstanding between beginners.

The word "Unstable" looks as if the priority queue were beta and may still have bugs, or as if the feature is unsafe, like methods that starts with Dangerous[...] or Unsafe[...].

@ericsampson
Copy link

Naming always tends to be bikesheddy, but it is interesting to consider "is it more important to be consistent with other languages, or internally within .NET?" because they each have pros and cons.
I also suggested PriorityBag a few months ago on the original feature discussion, because AKAIC it really is natural within the existing .NET data structure class naming. So that probably shows my bias :)

But everyone has a different bias/viewpoint, and at the end of the day what's far more important is that the functionality exists :)
And hopefully that a followup UpdateablePriorityQueue is realistic :)

Cheers

@eiriktsarpalis
Copy link
Member Author

I also suggested PriorityBag a few months ago on the original feature discussion, because AKAIC it really is natural within the existing .NET data structure class naming.

"Bag" implies lack of order, from the ConcurrentBag docs:

Represents a thread-safe, unordered collection of objects.

This is nothing like heaps, where elements are ordered up to priority.

@ericsampson
Copy link

Duh me 🤦‍♂️. I might have been thinking of OrderedBag, but that's from a NuGet package and not built in.

I don't want to waste any more time on naming discussions, I was just musing on that same discussion I've seen other cases, matching .NET conventions vs industry recognizable names.

Back to actual useful discussion :)

@theodorzoulias
Copy link
Contributor

And hopefully that a followup UpdateablePriorityQueue is realistic :)

I wonder whether allowing the PriorityQueue to be updateable through a boolean property like SupportsUpdating, in the spirit of the BackgroundWorker.WorkerSupportsCancellation, would be a better option than exposing two priority queue types.

@FiniteReality
Copy link

And hopefully that a followup UpdateablePriorityQueue is realistic :)

I wonder whether allowing the PriorityQueue to be updateable through a boolean property like SupportsUpdating, in the spirit of the BackgroundWorker.WorkerSupportsCancellation, would be a better option than exposing two priority queue types.

I don't think it would be - I think the most likely result for that would be a split implementation (akin to Channels) internally, or far more complicated to support both situations.

@eiriktsarpalis
Copy link
Member Author

And hopefully that a followup UpdateablePriorityQueue is realistic :)

Please indicate your support for the feature by contributing to the conversation in #44871. The popularity of certain issues influences our priorities (much how the popularity of #14032 convinced us to build this PriorityQueue). There's also open design questions that we need your input in order to resolve.

@ghost ghost locked as resolved and limited conversation to collaborators May 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Collections Bottom Up Work Not part of a theme, epic, or user story Cost:S Work that requires one engineer up to 1 week Priority:3 Work that is nice to have Team:Libraries User Story A single user-facing feature. Can be grouped under an epic.
Projects
None yet
Development

No branches or pull requests