Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bypass naive ordering when ordering by key (natural leveldb order) #22

Closed
wants to merge 2 commits into from

Conversation

raulk
Copy link
Member

@raulk raulk commented Jan 8, 2019

Equivalent of ipfs/go-ds-badger#43 for leveldb, but simplified, as leveldb iterators don't allow specifying reverse order.

Needed for correctness in libp2p/go-libp2p-peerstore#47

@ghost ghost assigned raulk Jan 8, 2019
@ghost ghost added the status/in-progress In progress label Jan 8, 2019
datastore.go Outdated
// leveldb iterators return entries sorted lexicographically by key, so we can bypass the naive ordering
// for OrderByKey. Since OrderByKey.Sort() has as struct receiver, the value stored in q.Orders can be a
// pointer or a struct, so we need to check for both for correctness.
if _, ok := o.(dsq.OrderByKey); ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

switch o.(type) {
case dsq.OrderByKey, *dsq.OrderByKey:
  continue
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed here and in ipfs/go-ds-badger#43 ;-)

switch o.(type) {
case dsq.OrderByKey, *dsq.OrderByKey:
continue
}
qr = dsq.NaiveOrder(qr, o)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we need OrderByKey to override other orders (this currently just skips these). Given that keys are unique and have a total order, I believe we should:

  1. Start at the end.
  2. Work backwards until we find the last OrderByKey.
  3. Apply any remaining orders.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, in theory, optimize OrderByKeyDescending as well by taking advantage of the same total ordering. That is, stop at the last OrderByKeyDescending (if we reach one before we reach an OrderByKey). Unlike OrderByKey, we'll have to actually apply the OrderByKeyDescending order.

@raulk
Copy link
Member Author

raulk commented Jan 9, 2019

@Stebalien – I'll take a look, basically to honour the interface of Query.Orders which does state in a comment that sorts should be applied sequentially.

However, I honestly fail to find a use case where someone would want to consume an iterator fully to then sort it in memory, before returning results to the caller, unless it's for very small slices, and even then you're probably better off handling this in client code and modeling the data to avoid such needs.

Facilitating this in the public API with no warnings is a formula for horrendous performance served in a silver platter, if you ask me.

@raulk
Copy link
Member Author

raulk commented Jan 9, 2019

BTW – are you aware of any consumers that use the ordering feature at all? An off-the-cuff search on go-ipfs returned nothing.

@raulk
Copy link
Member Author

raulk commented Jan 9, 2019

Also, Order is a public interface which means that clients could supply custom Orders to queries, hence we have to think beyond OrderByKey* and OrderByValue* if we want to fully support this.

@raulk
Copy link
Member Author

raulk commented Jan 9, 2019

@Stebalien – I get that OrderByKey* should be a terminating sort criteria.

However, as I look at the ordering code and the combinatorics, the more I think it doesn't work as intended. My assumption is that multiple sorting criteria should work like in SQL, first order by X, then order by Y (which only takes effect on records with equal X).

Unless I'm interpreting wrong, a sequence of ordering criteria, applied via naive order (behaviour as of today) will end up resorting the results, instead of applying a nested sorting.

As a user, I would expect the following behaviour:

  • OrderByValue, OrderByKey => order records by value, and for records with equal value, order them by key.
  • OrderByKey, OrderByValue => orders by key and ignores ordering by value (in databases where key is unique).

(same heuristic applies for Descending variants) I'm omitting other combinations, like repeating operators and such. The current code allows that.

However, what I think happens currently is:

  • OrderByValue, OrderByKey => incurs in the cost of ordering by value, only to resort everything by key.
  • OrderByKey, OrderByValue => viceversa.

Am I reading it wrong?

P.S.: apologies for rambling.

@Stebalien
Copy link
Member

Now I see what you mean. Yeah, that's what I'd also expect but that's definitely not what this does.

Can you think of an interface you'd be happy with? At this point, we can probably make breaking changes as the interface is fundamentally broken.

@raulk
Copy link
Member Author

raulk commented Jan 24, 2019

Superseded by #23 following interface change.

@raulk raulk closed this Jan 24, 2019
@ghost ghost removed the status/in-progress In progress label Jan 24, 2019
@raulk raulk deleted the feat/native-order branch January 24, 2019 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants