Make query sharding deterministic #707

Merged: 5 commits into main from make-querysharding-deterministic, Jan 12, 2022

Contributor

@colega colega commented Jan 7, 2022

What this PR does:

Query sharding executed the sharded queries concurrently and appended their results in no particular order. Unfortunately, floating-point addition is not associative, so the order in which partial results are added can change the final value.

Given float numbers a = 0.03298, b = 0.09894, the sum a+a+b differs from a+b+a.
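To see the effect in isolation, here is a minimal sketch that adds the same values in different orders. The helper `sumInOrder` is a hypothetical name used only for this illustration. It prints the comparison for the values quoted above without asserting the outcome, and adds the familiar 0.1, 0.2, 0.3 case, where the two orders are known to produce different float64 results.

```go
package main

import "fmt"

// sumInOrder adds the values strictly left to right, so the rounding
// applied at each step depends on the order of the arguments.
// (Hypothetical helper, not part of the PR.)
func sumInOrder(values ...float64) float64 {
	var total float64
	for _, v := range values {
		total += v
	}
	return total
}

func main() {
	// Values quoted in the PR description. Variables are used on purpose:
	// Go evaluates constant expressions exactly at compile time, which
	// would hide the rounding differences.
	a, b := 0.03298, 0.09894
	fmt.Println("a+a+b == a+b+a:", sumInOrder(a, a, b) == sumInOrder(a, b, a))

	// A well-known case where the order changes the float64 result.
	x, y, z := 0.1, 0.2, 0.3
	fmt.Println("x+y+z:", sumInOrder(x, y, z))
	fmt.Println("z+y+x:", sumInOrder(z, y, x))
}
```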

We can't fix floating-point arithmetic, but we can at least make the result deterministic, so that surprising query results are easier to debug.
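The general idea can be sketched as follows: each concurrent job writes its result into a slot identified by its own index, and the results are combined in index order only after all jobs have finished. This is an illustrative sketch using only the standard library, not the code from this PR; `sumShards` and the shard layout are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// sumShards computes one partial sum per shard concurrently. Each goroutine
// writes only to partials[i], so completion order does not matter. The
// partial sums are then added in index order, which makes the final
// floating-point result reproducible across runs.
// (Hypothetical helper, not the Mimir implementation.)
func sumShards(shards [][]float64) float64 {
	partials := make([]float64, len(shards))

	var wg sync.WaitGroup
	for i, shard := range shards {
		wg.Add(1)
		go func(i int, shard []float64) {
			defer wg.Done()
			for _, v := range shard {
				partials[i] += v
			}
		}(i, shard)
	}
	wg.Wait()

	// Deterministic merge: always shard 0, then shard 1, and so on.
	var total float64
	for _, p := range partials {
		total += p
	}
	return total
}

func main() {
	shards := [][]float64{{0.1}, {0.2}, {0.3}}
	fmt.Println(sumShards(shards))
}
```

Appending each result to a shared slice as soon as its goroutine finishes would instead make the merge order depend on scheduling, which is the behaviour described above.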

Which issue(s) this PR fixes:

None

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
@@ -156,6 +149,14 @@ func (q *shardedQuerier) Close() error {
return nil
}

func createJobIndexes(l int) []interface{} {
Contributor

@replay replay Jan 7, 2022

in dskit there's already concurrency.CreateJobsFromStrings which is almost exactly the same as this, but with strings. Would it maybe make sense to also add concurrency.CreateJobsFromInts there as well? (until we finally get generics :)

Contributor Author

I considered that, but since generics are just around the corner, I preferred to keep this here until I see at least one more usage for it.

Also, IMO it would be easier to add a concurrency.ForEachJobID like this:

// ForEachJobID runs the provided jobFunc for each job ID in `[0, jobs)`.
// The execution breaks on first error encountered.
func ForEachJobID(ctx context.Context, jobs int, concurrency int, jobFunc func(ctx context.Context, job int) error) error

And then just index input[job] inside the function instead of having to type-assert the interface or play with generics:

concurrency.ForEachJobID(ctx, len(input), someConcurrency, func(ctx context.Context, idx int) error {
    return process(input[idx])
})

I just checked and it seems that it would fit 100% of the concurrency.ForEach usages removing all the type assertions.

If we like that I can open a PR in dskit.

Contributor

I just checked and it seems that it would fit 100% of the concurrency.ForEach usages removing all the type assertions.

If this is true, then I think this solution would be clearly better.

Contributor Author

Let's see what people think: grafana/dskit#113

Contributor Author

That got merged. I'll update mimir once this PR is merged.

Contributor

@replay replay left a comment

Nice work, I only added comments of minor importance

colega and others added 3 commits January 10, 2022 10:29
Co-authored-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Contributor

@cyriltovena cyriltovena left a comment

LGTM

Member

@jesusvazquez jesusvazquez left a comment

LGTM!

@pracucci pracucci merged commit 739c7a4 into main Jan 12, 2022
@pracucci pracucci deleted the make-querysharding-deterministic branch January 12, 2022 10:06