feat: form committees and find majority results #304
Conversation
@juliangruber I would like to get early feedback on the implementation. I implemented cross-checking/majority-finding for the first few fields (providerId, indexerResult, retrievalResult). Checking more fields is trivial; we just need to add more calls. I will need to fix the failing tests. The linked GH issue mentions that we should also change how we calculate RSR:
That's out of the scope of this pull request. I would also like to add new telemetry: how many committees were too small or did not reach an absolute majority. I'd like to do that as part of this pull request, so that we get visibility into what's going on after we deploy this change.
Group accepted measurements into committees on a per-retrieval-task basis. Evaluate each committee to find an absolute majority result. Reject results that are in the minority. When the committee is too small to give us confidence in a majority being honest, or if we cannot find an absolute majority, then reject all measurements in such a committee. Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
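The grouping step described above can be sketched as follows. This is a minimal illustration, not the actual spark-evaluate code; it assumes a retrieval task is identified by `(cid, minerId)`, while the real task key may include more fields.

```javascript
// Bucket accepted measurements into one committee per retrieval task.
function groupMeasurementsToCommittees (measurements) {
  /** @type {Map<string, {retrievalTask: object, measurements: object[]}>} */
  const committees = new Map()
  for (const m of measurements) {
    // Assumption: (cid, minerId) uniquely identifies a retrieval task.
    const key = `${m.cid}::${m.minerId}`
    let committee = committees.get(key)
    if (!committee) {
      committee = { retrievalTask: { cid: m.cid, minerId: m.minerId }, measurements: [] }
      committees.set(key, committee)
    }
    committee.measurements.push(m)
  }
  return [...committees.values()]
}
```

Each committee can then be evaluated independently to find its absolute majority result.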
I compared the evaluation output before and after this change:

```diff
-EVALUATE ROUND 12649n: built per-node task lists in 3804ms [Tasks=1000;TN=15;Nodes=55235]
-EVALUATE ROUND 12649n: added 2980n as rounding to MAX_SCORE
-EVALUATE ROUND 12649n: Evaluated 333523 measurements, found 93814 honest entries.
+EVALUATE ROUND 12649n: built per-node task lists in 3958ms [Tasks=1000;TN=15;Nodes=55235]
+EVALUATE ROUND 12649n: added 3871n as rounding to MAX_SCORE
+EVALUATE ROUND 12649n: Evaluated 333523 measurements, found 92574 honest entries.
 {
-  OK: 93814,
+  OK: 92574,
   TASK_NOT_IN_ROUND: 37557,
   DUP_INET_GROUP: 34102,
   TOO_MANY_TASKS: 81874,
-  TASK_WRONG_NODE: 86176
+  TASK_WRONG_NODE: 86176,
+  MINORITY_RESULT: 1213,
+  COMMITTEE_TOO_SMALL: 27
 }
```

The overall RSR went slightly up from 5.29% to 5.31%.

```diff
- success_rate: '0.052934530027501224',
+ success_rate: '0.053114265344481174',
```

The number of IPNI 404 errors decreased slightly from 61.08% to 60.89%.

```diff
- result_rate_IPNI_ERROR_404: '0.6107830387788603',
+ result_rate_IPNI_ERROR_404: '0.6089398751269255',
```

The CPU & memory used by the evaluation did not change much.

```diff
-Duration: 10114ms
+Duration: 10630ms
 {
-  rss: 450183168,
+  rss: 451018752,
```
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
@juliangruber the PR is ready for another round of review.
This should be trivial to add: update the array of fields checked and write more tests.
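A hypothetical sketch of that "array of fields checked" approach: each listed field gets its own absolute-majority vote, and a measurement is rejected as a minority result when it disagrees on any of them. The field list mirrors this PR (providerId, indexerResult, retrievalResult); the function and property names (`rejectMinorityResults`, `fraudAssessment`) are illustrative, not the actual spark-evaluate code.

```javascript
// Fields whose values must agree with the committee's absolute majority.
const FIELDS_TO_CROSS_CHECK = ['providerId', 'indexerResult', 'retrievalResult']

function rejectMinorityResults (measurements) {
  for (const field of FIELDS_TO_CROSS_CHECK) {
    // Count how often each value of this field occurs in the committee.
    const counts = new Map()
    for (const m of measurements) {
      counts.set(m[field], (counts.get(m[field]) ?? 0) + 1)
    }
    // An absolute majority needs strictly more than half of the votes.
    const majority = [...counts.entries()].find(([, n]) => n > measurements.length / 2)
    if (!majority) {
      // No field-level majority: reject the entire committee.
      for (const m of measurements) m.fraudAssessment = 'MAJORITY_NOT_FOUND'
      return
    }
    // Reject measurements disagreeing with the majority value.
    for (const m of measurements) {
      if (m[field] !== majority[0]) m.fraudAssessment = 'MINORITY_RESULT'
    }
  }
}
```

Checking an additional field is then a one-line change: append its name to `FIELDS_TO_CROSS_CHECK` and add tests for it.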
lib/committee.js (outdated)

```js
this.indexerResult = 'MAJORITY_NOT_FOUND'

/** @type {RetrievalResult} */
this.retrievalResult = 'MAJORITY_NOT_FOUND'
```
Do you think there can be a case where `Committee` is initialized, then there's an error during `evaluate()`, then its state is read and falsely interpreted as `'MAJORITY_NOT_FOUND'`, while it should rather be something like `'UNKNOWN_ERROR'`?
Yeah, you are right that this design (a mutable Committee class) allows that to happen in theory. I think it should not happen in practice right now because if there is an error, then an exception is thrown and the entire evaluation is aborted.
I propose to move these two "*Result" properties into a nested object stored in the `Committee.evaluation` property, and let this property be `undefined` until the evaluation completes.
See 029db0d
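A sketch of that proposed design, with illustrative names (this is not the actual code from 029db0d): the per-field results live in a nested `evaluation` object that stays `undefined` until `evaluate()` completes, so a half-evaluated committee can never be misread as `'MAJORITY_NOT_FOUND'`.

```javascript
class Committee {
  constructor ({ retrievalTask, measurements }) {
    this.retrievalTask = retrievalTask
    this.measurements = measurements
    // Stays undefined until evaluate() finishes successfully.
    /** @type {{ indexerResult: string, retrievalResult: string } | undefined} */
    this.evaluation = undefined
  }

  evaluate ({ requiredCommitteeSize }) {
    if (this.measurements.length < requiredCommitteeSize) {
      this.evaluation = {
        indexerResult: 'COMMITTEE_TOO_SMALL',
        retrievalResult: 'COMMITTEE_TOO_SMALL'
      }
      return
    }
    this.evaluation = {
      indexerResult: findAbsoluteMajority(this.measurements.map(m => m.indexerResult)),
      retrievalResult: findAbsoluteMajority(this.measurements.map(m => m.retrievalResult))
    }
  }
}

// Returns the value shared by strictly more than half of the entries,
// or 'MAJORITY_NOT_FOUND' when no such value exists.
function findAbsoluteMajority (values) {
  const counts = new Map()
  for (const v of values) counts.set(v, (counts.get(v) ?? 0) + 1)
  for (const [value, count] of counts.entries()) {
    if (count > values.length / 2) return value
  }
  return 'MAJORITY_NOT_FOUND'
}
```

Readers of `evaluation` can then treat `undefined` as "not evaluated (yet)", which is distinguishable from every real outcome.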
A possible next refactoring is to remove the Committee class entirely and turn `Committee.evaluate` into a pure function accepting `({ requiredCommitteeSize, retrievalTask, measurements })` and returning a `CommitteeEvaluation`.
Such a change would push the complexity to `runFraudDetection`, which would need to create the list of committees: an array of `{ retrievalTask, measurements, evaluation }` objects. This list of committees is later used to extract stats for InfluxDB and the spark-stats API.
In that light, such a refactoring does not seem to be an improvement.
Let me know what you think!
+1 to having the result be `undefined` or something else that we can differentiate from the previous default state
And I'm not sure whether removing the class helps; let's see how it goes?
Co-authored-by: Julian Gruber <julian@juliangruber.com>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Comparing the evaluation output before and after this change:

```diff
-EVALUATE ROUND 13944n: built per-node task lists in 845ms [Tasks=1000;TN=15;Nodes=12334]
-EVALUATE ROUND 13944n: added 3920n as rounding to MAX_SCORE
-EVALUATE ROUND 13944n: Evaluated 215182 measurements, found 98111 honest entries.
+EVALUATE ROUND 13944n: built per-node task lists in 865ms [Tasks=1000;TN=15;Nodes=12334]
+EVALUATE ROUND 13944n: added 3931n as rounding to MAX_SCORE
+EVALUATE ROUND 13944n: Evaluated 215182 measurements, found 97335 honest entries.
 {
-  OK: 98111,
+  OK: 97335,
   TASK_NOT_IN_ROUND: 16191,
   DUP_INET_GROUP: 24871,
   TOO_MANY_TASKS: 75847,
-  TASK_WRONG_NODE: 162
+  TASK_WRONG_NODE: 162,
+  MINORITY_RESULT: 573,
+  MAJORITY_NOT_FOUND: 203
 }
```

The overall RSR went slightly down from 8.72% to 8.65%.

```diff
- success_rate: '0.0871869617066384',
+ success_rate: '0.08654646324549237',
```

The number of IPNI 404 errors increased slightly from 59.87% to 60.35%.

```diff
- result_rate_IPNI_ERROR_404: '0.5986892397386634',
+ result_rate_IPNI_ERROR_404: '0.6034622694816869',
```

The CPU & memory used by the evaluation did not change much.

```diff
-Duration: 5148ms
+Duration: 5551ms
 {
-  rss: 370573312,
+  rss: 312492032,
```
@juliangruber the PR is ready for final review & landing
Great work!
Fix a regression introduced by 4925298 (#304).

With the recently introduced "committees & majorities", measurements that don't agree with the majority are rejected. As a result, the code calculating RSR receives only majority measurements: for each task, either a) all accepted measurements say the deal is retrievable, or b) all accepted measurements say the deal is not retrievable. Consequently, the RSR is heavily influenced by how many measurements were collected for each deal.

For example, let's say an SP has one deal that's retrievable and another that is not. Now consider two cases:

1. The task testing the retrievable deal produces 100 retrieval requests (accepted measurements) while the task testing the non-retrievable deal produces 50 retrieval requests (accepted measurements). The RSR is 100 / (100 + 50) = 66%.
2. The task testing the retrievable deal produces 50 retrieval requests (accepted measurements) while the task testing the non-retrievable deal produces 100 retrieval requests (accepted measurements). The RSR is 50 / (100 + 50) = 33%.

This commit fixes the problem by changing the implementation of `updatePublicStats` to iterate over all measurements assigned to all committees. This way we include all measurements in the calculation: minority measurements, measurements where no majority was found, and measurements belonging to committees that are too small.

Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
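The fix can be modeled with a short sketch (this is an illustration, not the actual `updatePublicStats` code): instead of computing RSR over only the measurements that agreed with the majority, iterate over every measurement assigned to every committee, so minority results still count towards the denominator.

```javascript
// Compute RSR over all measurements in all committees, including
// minority and no-majority measurements.
function computeRsrOverAllMeasurements (committees) {
  let total = 0
  let successful = 0
  for (const { measurements } of committees) {
    for (const m of measurements) {
      total += 1
      if (m.retrievalResult === 'OK') successful += 1
    }
  }
  return total === 0 ? 0 : successful / total
}

// One committee: 7 measurements say OK, 3 say ERROR_404. The majority
// result is OK, so a majority-only calculation would report 7/7 = 100%;
// counting all measurements reports 7/10 = 70%.
const committee = {
  measurements: [
    ...Array.from({ length: 7 }, () => ({ retrievalResult: 'OK' })),
    ...Array.from({ length: 3 }, () => ({ retrievalResult: 'ERROR_404' }))
  ]
}
console.log(computeRsrOverAllMeasurements([committee])) // 0.7
```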
Group accepted measurements into committees on a per-retrieval-task basis. Evaluate each committee to find an absolute majority result. Reject results that are in the minority.
When the committee is too small to give us confidence in a majority being honest, or if we cannot find an absolute majority, then reject all measurements in such a committee.
Links:
TODO:
OUT OF SCOPE
NOTE
We are not cross-checking the fields that were used to calculate the retrievalResult value: if there is agreement on the retrieval result, then those fields must have the same value, too.