
Upgraded to t-digest 3.3. #3634

Merged (8 commits, Jun 24, 2022)

Conversation


@dblock dblock commented Jun 20, 2022

Signed-off-by: dblock dblock@dblock.org

Description

The upgrade to t-digest 3.3 fixes a number of bugs in calculating percentiles.

Looking at sample output, the old version 3.2 was interpolating data (see #3634 (comment) for an explanation) and producing different (wrong) results, especially for small sample sizes. For example, given the input [1, 51, 101, 151], the 25th, 50th and 75th percentiles changed from [26, 76, 126] to [51, 101, 151] with this upgrade.
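To illustrate where the old numbers came from, here is a rough sketch in Python (not t-digest's actual code): treating each of the four values as a singleton centroid whose q-mass is centered at (i + 0.5)/n, and linearly interpolating between centroid means, reproduces the 3.2-era answers.

```python
def interpolated_quantile(means, q):
    """Sketch of 3.2-style behavior: linear interpolation between
    singleton centroids whose q-mass centers sit at (i + 0.5) / n."""
    n = len(means)
    centers = [(i + 0.5) / n for i in range(n)]
    if q <= centers[0]:
        return means[0]
    if q >= centers[-1]:
        return means[-1]
    for i in range(n - 1):
        lo, hi = centers[i], centers[i + 1]
        if lo <= q <= hi:
            t = (q - lo) / (hi - lo)
            return means[i] + t * (means[i + 1] - means[i])

data = [1, 51, 101, 151]
print([interpolated_quantile(data, q) for q in (0.25, 0.5, 0.75)])
# old-style interpolation yields [26.0, 76.0, 126.0]
```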

The tests in this PR have been adjusted to reflect the new expected percentiles. I added a 2.x mixed-cluster test and made all the other tests select a 3.x node to preserve a trail of this change. I also corrected the assumption that the number of centroids equals the number of data points; the number of centroids is actually <= the number of data points.

Because results change significantly, I think this is a 3.x change and should not be back-ported, but I am open to other considerations.

There are many changes between t-digest 3.2 and 3.3, see tdunning/t-digest#194.

Issues Resolved

Closes #1756.

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dblock dblock requested review from a team and reta as code owners June 20, 2022 15:30
@opensearch-ci-bot

❌   Gradle Check failure 2a5132460f22fcf4bc4a0f6f1e267467ffb5d9c4
Log 6149

Reports 6149

Signed-off-by: dblock <dblock@dblock.org>
@opensearch-ci-bot

❌   Gradle Check failure 89125fc
Log 6150

Reports 6150


dblock commented Jun 20, 2022

Ok, doesn't look so simple...

org.opensearch.search.aggregations.metrics.InternalTDigestPercentilesRanksTests > testEqualsAndHashcode FAILED
    java.lang.AssertionError: expected:<64> but was:<81>
        at __randomizedtesting.SeedInfo.seed([DF508FEF9F0E8839:AE5FF72250E9C116]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:647)
        at org.junit.Assert.assertEquals(Assert.java:633)
        at org.opensearch.search.aggregations.metrics.InternalTDigestPercentilesRanksTests.createTestInstance(InternalTDigestPercentilesRanksTests.java:56)
        at org.opensearch.search.aggregations.metrics.InternalTDigestPercentilesRanksTests.createTestInstance(InternalTDigestPercentilesRanksTests.java:42)

tdunning/t-digest#171 (comment)

Signed-off-by: dblock <dblock@dblock.org>
@opensearch-ci-bot

❌   Gradle Check failure 3c20434
Log 6152

Reports 6152


dblock commented Jun 20, 2022

java.lang.AssertionError: aggregations.percentiles_int.values.25\.0 didn't match expected value:
        aggregations.percentiles_int.values.25\.0: expected Double [26.0] but was Double [51.0]
            at org.opensearch.test.rest.yaml.section.MatchAssertion.doAssert(MatchAssertion.java:115)
            at org.opensearch.test.rest.yaml.section.Assertion.execute(Assertion.java:89)
            at org.opensearch.test.rest.yaml.OpenSearchClientYamlSuiteTestCase.executeSection(OpenSearchClientYamlSuiteTestCase.java:447)

Signed-off-by: dblock <dblock@dblock.org>
@dblock dblock marked this pull request as draft June 21, 2022 17:50

dblock commented Jun 21, 2022

@tdunning Would you mind assisting here a bit please?

The upgrade from 3.2 to 3.3 produces different percentiles in some scenarios given very simple data. Neither the old data nor the new data is "correct", but that is expected I imagine given that we use t-digest. You can see raw data in https://github.com/opensearch-project/OpenSearch/blob/37651e9b5fe914a99f0abe0a36e10bd46d958691/rest-api-spec/src/main/resources/rest-api-spec/test/search.aggregation/180_percentiles_tdigest_metric.yml and the diff in this PR for how those changed. The data is just 4 values: 1, 51, 101 and 151, Google Sheets results below.

| Data | Percentile | Result |
|------|------------|--------|
| 1    | 1          | 2.5    |
| 51   | 5          | 8.5    |
| 101  | 25         | 38.5   |
| 151  | 50         | 76     |
|      | 75         | 113.5  |
|      | 95         | 143.5  |
|      | 99         | 149.5  |
|      | 100        | 151    |

I didn't expect such a big difference in a dot release. At the very least I'd like to understand whether this is expected, and whether this is going to have to be released as a major breaking change for OpenSearch users.
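For what it's worth, the spreadsheet numbers above are consistent with the widely used "type 7" linear-interpolation rule (the default in R); a minimal sketch, assuming Google Sheets uses that rule:

```python
import math

def quantile_type7(sorted_data, p):
    """Type-7 quantile: linear interpolation between order statistics
    at fractional position h = (n - 1) * p."""
    h = (len(sorted_data) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_data) - 1)
    return sorted_data[lo] + (h - lo) * (sorted_data[hi] - sorted_data[lo])

data = [1, 51, 101, 151]
for p in (0.01, 0.05, 0.25, 0.50, 0.75, 0.95, 0.99, 1.00):
    print(p, quantile_type7(data, p))
# matches the table: 2.5, 8.5, 38.5, 76.0, 113.5, 143.5, 149.5, 151.0
```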

More different results with smaller tests: bb9e8f2.

@opensearch-ci-bot

❌   Gradle Check failure 37651e9
Log 6185

Reports 6185

@opensearch-ci-bot

❌   Gradle Check failure 6b03882
Log 6192

Reports 6192

Signed-off-by: dblock <dblock@dblock.org>
@opensearch-ci-bot

❌   Gradle Check failure bb9e8f2
Log 6193

Reports 6193

Signed-off-by: dblock <dblock@dblock.org>
@opensearch-ci-bot

❌   Gradle Check failure 78e4c08
Log 6195

Reports 6195


tdunning commented Jun 21, 2022 via email

@tdunning

Sorry... I missed your very fine explanation.

I understand now that you were doing a regression test against previous behavior and were surprised at a change in this behavior.

The fact is, however, this old behavior was a bug. That bug was fixed.

If we look at the quantile curve for your data, we see this:

[image: quantile step function for the data]

The circles indicate an open boundary and the filled dots indicate a closed one. Because we retain all of the data, we can't in good faith interpolate. The only question is whether the quantile at exactly 0.25 should be 1 or 51. In t-digest, I settled on the lower value. The old code was interpolating and was just wrong.
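As an illustration of the boundary question (a sketch, not t-digest's implementation), here is an empirical step-function quantile with both conventions at an exact boundary; "lower" yields [1, 51, 101], while "upper" yields the [51, 101, 151] observed in this PR:

```python
def step_quantile(sorted_data, q, at_boundary="lower"):
    """Empirical inverse CDF as a step function over the retained samples.
    When q * n lands exactly on a boundary between two samples, return
    the lower or the upper neighbor depending on the chosen convention."""
    n = len(sorted_data)
    pos = q * n
    i = int(pos)
    if pos == i and 0 < i < n:  # q falls exactly on a step boundary
        return sorted_data[i - 1] if at_boundary == "lower" else sorted_data[i]
    return sorted_data[min(i, n - 1)]

data = [1, 51, 101, 151]
print([step_quantile(data, q, "lower") for q in (0.25, 0.5, 0.75)])  # [1, 51, 101]
print([step_quantile(data, q, "upper") for q in (0.25, 0.5, 0.75)])  # [51, 101, 151]
```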


@tdunning

In case you are curious, a similar issue arises with the cdf function. There, the graph for your data looks like this:

[image: empirical CDF for the data]

Here, what I have chosen is to use the midpoint when you ask for the CDF at exactly a sample point. This gets a bit fancier when there are multiple samples at exactly the same point. In general, I take the CDF to be the

[image: CDF formula]
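The midpoint rule described above can be sketched as follows (an illustration, not the library's code):

```python
def empirical_cdf(data, x):
    """CDF of the retained samples, using the midpoint convention at
    exact sample points: ties contribute half their mass."""
    n = len(data)
    below = sum(1 for v in data if v < x)
    ties = sum(1 for v in data if v == x)
    return (below + 0.5 * ties) / n

data = [1, 51, 101, 151]
print(empirical_cdf(data, 51))   # 0.375: one sample below plus half the tie at 51
print(empirical_cdf(data, 60))   # 0.5: two samples strictly below
```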


dblock commented Jun 22, 2022

@tdunning Thank you! This is super clear.

@opensearch-ci-bot

❌   Gradle Check failure 633595636b789bec83313b4ee346d3d114e6c3f5
Log 6219

Reports 6219

Signed-off-by: dblock <dblock@dblock.org>
@opensearch-ci-bot

✅   Gradle Check success 83be47b
Log 6227

Reports 6227

saratvemulapalli pushed a commit that referenced this pull request Jun 27, 2022

dblock commented Jun 28, 2022

@kartg I do care about users more than merge conflicts, but I hear you. Any feelings about user impact?


kavilla commented Jun 29, 2022

One of the functional tests from OpenSearch Dashboards displayed the incorrect value [link to issue]. We will update the value on main but keep the 2.x branches untouched.

kavilla added a commit to kavilla/OpenSearch-Dashboards-1 that referenced this pull request Jun 29, 2022
Origin:
opensearch-project/OpenSearch#3634

The previous value was actually incorrect; after OpenSearch bumped t-digest, the value is now correct.

Issue:
opensearch-project#1821

Signed-off-by: Kawika Avilla <kavilla414@gmail.com>
@sharp-pixel

> I settled on the lower value.

Then shouldn't the result be [1, 51, 101]?

[1, 51, 101] is the result I get from Mathematica as well:

vec = {1, 51, 101, 151};
Quantile[vec, #] & /@ {1/4, 1/2, 3/4}
{1, 51, 101}


dblock commented Jun 29, 2022

@tdunning ^

kavilla added a commit to opensearch-project/OpenSearch-Dashboards that referenced this pull request Jun 30, 2022
* [Tests] update expected value for percentile ranks

Origin:
opensearch-project/OpenSearch#3634

The previous value was actually incorrect; after OpenSearch bumped t-digest, the value is now correct.

Issue:
#1821

Signed-off-by: Kawika Avilla <kavilla414@gmail.com>

* skip inconsistent values

Signed-off-by: Kawika Avilla <kavilla414@gmail.com>

* use slice

Signed-off-by: Kawika Avilla <kavilla414@gmail.com>
imRishN pushed a commit to imRishN/OpenSearch that referenced this pull request Jul 3, 2022

kavilla commented Jul 5, 2022

So when I updated the values for this test, it seemed to get inconsistent values for the 50th, 75th, and 95th percentiles, for example: https://github.com/opensearch-project/OpenSearch-Dashboards/runs/7120932868?check_suite_focus=true

and

opensearch-project/OpenSearch-Dashboards#1822 (comment)


dblock commented Jul 5, 2022

@kavilla Are you sure? I felt like I was getting something similar, but it turned out the tests were seeded with some random value.

In any case if you are sure, open a new issue?


tdunning commented Jul 5, 2022

I am happy to comment on the t-digest side of things if somebody can say what the test is actually doing.


dblock commented Jul 6, 2022

> I settled on the lower value.
>
> Then shouldn't the result be [1, 51, 101]?
>
> [1, 51, 101] is the result I get from Mathematica as well:
>
> vec = {1, 51, 101, 151};
> Quantile[vec, #] & /@ {1/4, 1/2, 3/4}
> {1, 51, 101}

@tdunning Could you check out the above, please?


dblock commented Nov 1, 2022

@kavilla Want to open an issue in t-digest re: ^ ?


tdunning commented Nov 1, 2022 via email


dblock commented Nov 2, 2022

@tdunning The images didn't make it to GitHub, if you care to edit, but thanks for your explanation!

@sharp-pixel

From https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/quantile and https://mathworld.wolfram.com/Quantile.html, we could choose between 9 standardized definitions.

To make sure we are comparing the same things - especially in unit tests - we should probably decide on a default one, and optionally enable choosing among the other types.

Mathematica uses type 1 by default, R uses type 7 by default, but both provide the option to choose.
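To make the point concrete, even Python's standard library exposes two of these definitions side by side, and they disagree on this data (illustrative only; neither is what t-digest computes over retained samples):

```python
import statistics

data = [1, 51, 101, 151]

# "exclusive" (the default) estimates population quartiles (R type 6);
# "inclusive" treats the data as the entire population (R type 7).
print(statistics.quantiles(data, n=4, method="exclusive"))  # [13.5, 76.0, 138.5]
print(statistics.quantiles(data, n=4, method="inclusive"))  # [38.5, 76.0, 113.5]
```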


tdunning commented Nov 2, 2022

> @tdunning The images didn't make it to GitHub, if you care to edit, but thanks for your explanation!

Sorry about that. The image is very similar to what I posted in an earlier comment. I have edited my reply, but I had difficulty getting the image to show up.

[image]


tdunning commented Nov 3, 2022

So I have been experimenting a fair bit with the Julia implementation (easier than playing with the Java version because it is interactive).

I have changed the problem in question a tiny bit to make it clearer what is happening. I am using points at [1, 11, 15, 30].

My first experiment was to verify that the cdf function performs as expected. It does. In particular, the cdf at exactly the sample points shows the desired interpolation behavior. Here is a picture produced by scanning x in small increments to get the blue line and then plotting points at exactly the sample points:

[image: cdf scanned in small increments, with points at exactly the sample values]

The point of real interest, however, was to determine how the quantile function behaves. The following plot shows that the quantile (wider blue line) and cdf (thin gray line) functions lie right on top of each other. Further, evaluating quantile just before and just after q = [0.25, 0.5, 0.75] (green and purple dots), we see the desired boundary behavior.

[image: quantile and cdf functions overlaid, with boundary points]

I would contend that it is hard to do better than this due to inevitable floating point limits.

@dblock , @kavilla , @sharp-pixel what do you think?

Also, I looked into the R and Julia implementations of the quantile function. In fact, they are trying to estimate the theoretical distribution rather than the empirical inverse cdf. This is a different problem entirely. Adding the Julia quantile to the graph shows what I mean:

[image: Julia's quantile estimate added to the graph]

The result is far from the empirical inverse cdf function.
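For readers without Julia handy, the round trip the plots illustrate can be sketched in Python under the same midpoint-CDF and step-quantile conventions (illustrative, not the actual t-digest code):

```python
data = [1, 11, 15, 30]
n = len(data)

# Midpoint-rule CDF value at each retained sample: (i + 0.5) / n.
mids = [(i + 0.5) / n for i in range(n)]  # [0.125, 0.375, 0.625, 0.875]

def step_quantile(xs, q):
    """Empirical inverse CDF: a step function over the retained samples."""
    return xs[min(int(q * len(xs)), len(xs) - 1)]

# Inverting the CDF at those midpoints recovers the samples exactly.
print([step_quantile(data, q) for q in mids])  # [1, 11, 15, 30]
```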


sharp-pixel commented Nov 3, 2022

Thanks @tdunning.

It seems Julia uses type 7 (from https://docs.julialang.org/en/v1/stdlib/Statistics/#Statistics.quantile!), so we just need to pick one quantile function type as the default, document it, and optionally have the choice to override the type.


tdunning commented Nov 3, 2022

I am not so sure of that.

The different types of quantile estimation are all geared toward estimating a population quantile function assuming that the data we have is only a sample of that population.

That's an important problem.

But it isn't what t-digest is intended to do.

Instead, t-digest is intended to estimate the cdf and inverse cdf of the data we are given as it actually is. This refers to the empirical distribution as opposed to the population CDF. This is much simpler in many ways than trying to estimate the population, but it can be confusing because of the collision on the name "quantile".

There is clearly a problem here (user confusion is indisputably a problem), but I really think that the correct action here is to fix documentation on both TDigest.quantile and in the overview.


dblock commented Nov 7, 2022

I opened #5115 to discuss the user-facing aspects of this. Because there's no one correct result at the edges, I think we could support multiple strategies to everybody's satisfaction.


Successfully merging this pull request may close these issues.

[CI] o.o..search.aggregations.metrics.TDigestStateTests.testMoreThan4BValues failure