
feat: form committees and find majority results #304

Merged: 21 commits from feat-majorities into main, Aug 28, 2024
Conversation

@bajtos (Member) commented Aug 1, 2024

Group accepted measurements into committees on a per-retrieval-task basis. Evaluate each committee to find an absolute majority result. Reject results that are in the minority.

When the committee is too small to give us confidence in the majority being honest, or if we cannot find an absolute majority, reject all measurements in such a committee.
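A minimal sketch of the scheme described above (helper names and the task key format are assumptions for illustration, not the actual spark-evaluate API):

```javascript
// Illustrative sketch only - helper names and the committee key
// format are assumed, not the production spark-evaluate code.

// Group accepted measurements into committees, one per retrieval task.
function groupByTask (measurements) {
  const committees = new Map()
  for (const m of measurements) {
    const key = `${m.cid}::${m.minerId}`
    const list = committees.get(key) ?? []
    list.push(m)
    committees.set(key, list)
  }
  return committees
}

// Return the value reported by more than half of the committee,
// or undefined when there is no absolute majority.
function findAbsoluteMajority (measurements, field) {
  const tally = new Map()
  for (const m of measurements) {
    tally.set(m[field], (tally.get(m[field]) ?? 0) + 1)
  }
  for (const [value, count] of tally.entries()) {
    if (count > measurements.length / 2) return value
  }
  return undefined
}
```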

Links:

TODO:

  • cross-check more fields
    • CAR size in bytes
    • CAR checksum
  • add telemetry to monitor committees too small or committees with no absolute majority

OUT OF SCOPE

  • Change how we calculate retrieval success rates from per-measurement to per-deal.

NOTE
We are not cross-checking the fields that were used to calculate the retrievalResult value:

  • Response status code
  • Was it a timeout?
  • Was the CAR too large?

If there is an agreement on the retrieval result, then those fields must have the same value, too.

@bajtos (Member Author) commented Aug 1, 2024

@juliangruber I would like to get early feedback on the implementation.

I implemented cross-checking/majority-finding for the first few fields (providerId, indexerResult, retrievalResult). Checking more fields is trivial: we just need to add more calls to this.#checkMeasuredField(). From this point of view, the current version is pretty complete.
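The pattern of cross-checking several fields might look roughly like this (a sketch with assumed names; the real implementation in lib/committee.js may differ):

```javascript
// Sketch only: a measurement is honest only if it matches the
// absolute majority on every checked field. Names are illustrative.
function crossCheck (measurements, fields) {
  const rejected = new Set()
  for (const field of fields) {
    const tally = new Map()
    for (const m of measurements) {
      tally.set(m[field], (tally.get(m[field]) ?? 0) + 1)
    }
    let majority
    for (const [value, count] of tally.entries()) {
      if (count > measurements.length / 2) majority = value
    }
    // No absolute majority for this field -> reject the whole committee.
    if (majority === undefined) return []
    for (const m of measurements) {
      if (m[field] !== majority) rejected.add(m)
    }
  }
  return measurements.filter(m => !rejected.has(m))
}
```

Adding a new field to check (e.g. CAR size or checksum, per the TODO list) would then be one more entry in the `fields` array plus tests.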

I will need to fix the failing tests.

The linked GH issue mentions that we should also change how we calculate RSR:

As part of this work, we need to change the way how we calculate retrieval success rates.

  • At the moment, we measure % of measurements that report a successful retrieval
  • After the change, we should measure % of committees (tasks/deals) where the majority reports a successful retrieval

That's out of the scope of this pull request.
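The two RSR definitions contrast roughly like this (illustrative helpers, not production code):

```javascript
// Current definition: fraction of individual measurements
// reporting a successful retrieval.
function rsrPerMeasurement (measurements) {
  const ok = measurements.filter(m => m.retrievalResult === 'OK').length
  return ok / measurements.length
}

// Proposed definition (out of scope for this PR): fraction of
// committees whose absolute majority reports success.
// `committees` is an array of measurement arrays, one per task.
function rsrPerCommittee (committees) {
  const ok = committees.filter(ms => {
    const okCount = ms.filter(m => m.retrievalResult === 'OK').length
    return okCount > ms.length / 2
  }).length
  return ok / committees.length
}
```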

I would also like to add new telemetry - how many committees were too small or the absolute majority was not found. I'd like to do that as part of this pull request, so that we get visibility into what's going on after we deploy this change.

@bajtos (Member Author) commented Aug 8, 2024

I compared the output of node bin/dry-run.js 12649.

  • Only 1.3% of measurements have a minority result
  • Only 0.1% of committees are too small (less than 30 measurements)
-EVALUATE ROUND 12649n: built per-node task lists in 3804ms [Tasks=1000;TN=15;Nodes=55235]
-EVALUATE ROUND 12649n: added 2980n as rounding to MAX_SCORE
-EVALUATE ROUND 12649n: Evaluated 333523 measurements, found 93814 honest entries.
+EVALUATE ROUND 12649n: built per-node task lists in 3958ms [Tasks=1000;TN=15;Nodes=55235]
+EVALUATE ROUND 12649n: added 3871n as rounding to MAX_SCORE
+EVALUATE ROUND 12649n: Evaluated 333523 measurements, found 92574 honest entries.
 {
-  OK: 93814,
+  OK: 92574,
   TASK_NOT_IN_ROUND: 37557,
   DUP_INET_GROUP: 34102,
   TOO_MANY_TASKS: 81874,
-  TASK_WRONG_NODE: 86176
+  TASK_WRONG_NODE: 86176,
+  MINORITY_RESULT: 1213,
+  COMMITTEE_TOO_SMALL: 27
 }

The overall RSR went slightly up from 5.29% to 5.31%.

- success_rate: '0.052934530027501224',
+ success_rate: '0.053114265344481174',

The number of IPNI 404 errors decreased slightly from 61.08% to 60.89%.

-  result_rate_IPNI_ERROR_404: '0.6107830387788603',
+  result_rate_IPNI_ERROR_404: '0.6089398751269255',

The CPU & memory used by the evaluation did not change much.

-Duration: 10114ms
+Duration: 10630ms
 {
-  rss: 450183168,
+  rss: 451018752,

@bajtos (Member Author) commented Aug 9, 2024

@juliangruber the PR is ready for another round of review.

cross-check more fields

This should be trivial to add - update the array of fields checked + write more tests.

lib/committee.js:
this.indexerResult = 'MAJORITY_NOT_FOUND'

/** @type {RetrievalResult} */
this.retrievalResult = 'MAJORITY_NOT_FOUND'
@juliangruber (Member):
Do you think there can be a case where Committee is initialized, then there's an error during evaluate(), then its state is read and falsely interpreted as 'MAJORITY_NOT_FOUND', while it should rather be something like 'UNKNOWN_ERROR'?

@bajtos (Member Author):

Yeah, you are right that this design (a mutable Committee class) allows that to happen in theory. I think it should not happen in practice right now because if there is an error, then an exception is thrown and the entire evaluation is aborted.

I propose to move these two "*Result" properties into a nested object stored in the Committee.evaluation property, and let this property be undefined until the evaluation completes.

See 029db0d
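A sketch of that proposal (property names assumed; not necessarily what 029db0d actually implements):

```javascript
// Sketch: keep per-committee results in a nested `evaluation`
// object that stays undefined until evaluate() finishes, so a
// half-evaluated committee can never be misread as
// 'MAJORITY_NOT_FOUND'.
class Committee {
  /** @type {{ indexerResult: string, retrievalResult: string } | undefined} */
  evaluation = undefined

  constructor (measurements) {
    this.measurements = measurements
  }

  evaluate () {
    // ...majority finding would happen here; only on successful
    // completion do we assign the results object.
    this.evaluation = {
      indexerResult: 'OK', // placeholder values for the sketch
      retrievalResult: 'OK'
    }
  }
}
```

If evaluate() throws, `evaluation` remains undefined, which callers can distinguish from any real result.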


A possible next refactoring is to remove the Committee class entirely and modify Committee.evaluate into a pure function accepting ({requiredCommitteeSize, retrievalTask, measurements}) and returning CommitteeEvaluation.

Such a change would push the complexity to runFraudDetection, which would need to create the list of committees - an array of {retrievalTask, measurements, evaluation} objects. This list of committees is later used to extract stats for InfluxDB and the spark-stats API.

In that light, such refactoring does not seem to be an improvement.

Let me know what you think!

@juliangruber (Member):
+1 to having the result be undefined or something else, which we can differentiate from the previous default state

@juliangruber (Member):
And I'm not sure whether removing the class does help, let's see how it goes?

bajtos and others added 8 commits August 26, 2024 (co-authored by Julian Gruber <julian@juliangruber.com>)
@bajtos bajtos closed this Aug 27, 2024
@bajtos bajtos reopened this Aug 27, 2024
@bajtos (Member Author) commented Aug 27, 2024

Comparing the output of node bin/dry-run.js 13944 - main vs b8916a9:

  • Only 0.26% (573/215182) of measurements have a minority result
  • All committees were large enough to give us confidence in the majority being honest
-EVALUATE ROUND 13944n: built per-node task lists in 845ms [Tasks=1000;TN=15;Nodes=12334]
-EVALUATE ROUND 13944n: added 3920n as rounding to MAX_SCORE
-EVALUATE ROUND 13944n: Evaluated 215182 measurements, found 98111 honest entries.
+EVALUATE ROUND 13944n: built per-node task lists in 865ms [Tasks=1000;TN=15;Nodes=12334]
+EVALUATE ROUND 13944n: added 3931n as rounding to MAX_SCORE
+EVALUATE ROUND 13944n: Evaluated 215182 measurements, found 97335 honest entries.
 {
-  OK: 98111,
+  OK: 97335,
   TASK_NOT_IN_ROUND: 16191,
   DUP_INET_GROUP: 24871,
   TOO_MANY_TASKS: 75847,
-  TASK_WRONG_NODE: 162
+  TASK_WRONG_NODE: 162,
+  MINORITY_RESULT: 573,
+  MAJORITY_NOT_FOUND: 203
 }

The overall RSR went slightly down from 8.72% to 8.65%.

-  success_rate: '0.0871869617066384',
+  success_rate: '0.08654646324549237',

The number of IPNI 404 errors increased slightly from 59.87% to 60.35%.

- result_rate_IPNI_ERROR_404: '0.5986892397386634',
+ result_rate_IPNI_ERROR_404: '0.6034622694816869',

The CPU & memory used by the evaluation did not change much.

-Duration: 5148ms
+Duration: 5551ms
 {
-  rss: 370573312,
+  rss: 312492032,

@bajtos bajtos marked this pull request as ready for review August 27, 2024 13:09
@bajtos (Member Author) commented Aug 27, 2024

@juliangruber the PR is ready for final review & landing

@juliangruber (Member) left an approving review:

Great work!

@bajtos bajtos enabled auto-merge (squash) August 28, 2024 10:03
@bajtos bajtos merged commit 4925298 into main Aug 28, 2024
6 checks passed
@bajtos bajtos deleted the feat-majorities branch August 28, 2024 10:04
bajtos added a commit that referenced this pull request Sep 2, 2024
Fix a regression introduced by 4925298 (#304).

With the recently introduced “committees & majorities”, measurements
that don’t agree with the majority are rejected. As a result, the code
calculating RSR receives only majority measurements, so it sees that
either a) all accepted measurements say the deal is retrievable or
b) all accepted measurements say the deal is not retrievable.
Consequently, the RSR is heavily influenced by how many measurements
were collected for each deal.

For example, let’s say an SP has one deal that’s retrievable and another
that is not. Now consider two cases:

1. The task testing the retrievable deal produces 100 retrieval requests
   (accepted measurements) while the task testing the non-retrievable
   deal produces 50 retrieval requests (accepted measurements).

   The RSR is 100 / (100 + 50) = 66%.

2. The task testing the retrievable deal produces 50 retrieval requests
   (accepted measurements) while the task testing the non-retrievable
   deal produces 100 retrieval requests (accepted measurements).

   The RSR is 50 / (100 + 50) = 33%.
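The bias in the two cases above can be spelled out in a few lines (a sketch; `okCount`/`failCount` are illustrative field names, not the production schema):

```javascript
// With only majority measurements kept, RSR becomes a
// measurement-count-weighted average dominated by whichever
// task happened to produce more measurements.
function rsrOverMeasurements (committees) {
  let ok = 0
  let total = 0
  for (const { okCount, failCount } of committees) {
    ok += okCount
    total += okCount + failCount
  }
  return ok / total
}
```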

This commit fixes the problem by changing the implementation of
`updatePublicStats` to iterate over all measurements assigned to all
committees. This way we include all measurements in the calculation:
minority measurements, measurements where no majority was found,
measurements belonging to committees that are too small.

Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>