feat: form committees and find majority results #304
Conversation
@juliangruber I would like to get early feedback on the implementation. I implemented cross-checking/majority-finding for the first few fields (providerId, indexerResult, retrievalResult). Checking more fields is trivial; we just need to add more calls. I will need to fix the failing tests. The linked GH issue mentions that we should also change how we calculate RSR:
That's out of the scope of this pull request. I would also like to add new telemetry: how many committees were too small or did not reach an absolute majority. I'd like to do that as part of this pull request, so that we get visibility into what's going on after we deploy this change.
Group accepted measurements into committees on a per-retrieval-task basis. Evaluate each committee to find an absolute majority result. Reject results that are in the minority. When the committee is too small to give us confidence in a majority being honest, or if we cannot find an absolute majority, then reject all measurements in such a committee. Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
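The grouping step described above can be sketched as follows. This is a minimal illustration, not the actual spark-evaluate code; it assumes a retrieval task is identified by `(cid, minerId)`, while the real task key may include more fields.

```javascript
// Bucket accepted measurements into one committee per retrieval task.
function groupMeasurementsToCommittees (measurements) {
  /** @type {Map<string, {retrievalTask: object, measurements: object[]}>} */
  const committees = new Map()
  for (const m of measurements) {
    // Assumption: (cid, minerId) uniquely identifies a retrieval task.
    const key = `${m.cid}::${m.minerId}`
    let committee = committees.get(key)
    if (!committee) {
      committee = { retrievalTask: { cid: m.cid, minerId: m.minerId }, measurements: [] }
      committees.set(key, committee)
    }
    committee.measurements.push(m)
  }
  return [...committees.values()]
}
```

Each committee can then be evaluated independently to find its absolute majority result.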
I compared the evaluation output before and after this change:

```diff
-EVALUATE ROUND 12649n: built per-node task lists in 3804ms [Tasks=1000;TN=15;Nodes=55235]
-EVALUATE ROUND 12649n: added 2980n as rounding to MAX_SCORE
-EVALUATE ROUND 12649n: Evaluated 333523 measurements, found 93814 honest entries.
+EVALUATE ROUND 12649n: built per-node task lists in 3958ms [Tasks=1000;TN=15;Nodes=55235]
+EVALUATE ROUND 12649n: added 3871n as rounding to MAX_SCORE
+EVALUATE ROUND 12649n: Evaluated 333523 measurements, found 92574 honest entries.
 {
-  OK: 93814,
+  OK: 92574,
   TASK_NOT_IN_ROUND: 37557,
   DUP_INET_GROUP: 34102,
   TOO_MANY_TASKS: 81874,
-  TASK_WRONG_NODE: 86176
+  TASK_WRONG_NODE: 86176,
+  MINORITY_RESULT: 1213,
+  COMMITTEE_TOO_SMALL: 27
 }
```

The overall RSR went slightly up from 5.29% to 5.31%.

```diff
- success_rate: '0.052934530027501224',
+ success_rate: '0.053114265344481174',
```

The number of IPNI 404 errors decreased slightly from 61.08% to 60.89%.

```diff
- result_rate_IPNI_ERROR_404: '0.6107830387788603',
+ result_rate_IPNI_ERROR_404: '0.6089398751269255',
```

The CPU & memory used by the evaluation did not change much.

```diff
-Duration: 10114ms
+Duration: 10630ms
 {
-  rss: 450183168,
+  rss: 451018752,
```
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
@juliangruber the PR is ready for another round of review.
This should be trivial to add: update the array of fields checked and write more tests.
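A hypothetical sketch of that "array of fields checked" approach: each listed field gets its own absolute-majority vote, and a measurement is rejected as a minority result when it disagrees on any of them. The field list mirrors this PR (providerId, indexerResult, retrievalResult); the function and property names (`rejectMinorityResults`, `fraudAssessment`) are illustrative, not the actual spark-evaluate code.

```javascript
// Fields whose values must agree with the committee's absolute majority.
const FIELDS_TO_CROSS_CHECK = ['providerId', 'indexerResult', 'retrievalResult']

function rejectMinorityResults (measurements) {
  for (const field of FIELDS_TO_CROSS_CHECK) {
    // Count how often each value of this field occurs in the committee.
    const counts = new Map()
    for (const m of measurements) {
      counts.set(m[field], (counts.get(m[field]) ?? 0) + 1)
    }
    // An absolute majority needs strictly more than half of the votes.
    const majority = [...counts.entries()].find(([, n]) => n > measurements.length / 2)
    if (!majority) {
      // No field-level majority: reject the entire committee.
      for (const m of measurements) m.fraudAssessment = 'MAJORITY_NOT_FOUND'
      return
    }
    // Reject measurements disagreeing with the majority value.
    for (const m of measurements) {
      if (m[field] !== majority[0]) m.fraudAssessment = 'MINORITY_RESULT'
    }
  }
}
```

Checking an additional field is then a one-line change: append its name to `FIELDS_TO_CROSS_CHECK` and add tests for it.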
lib/committee.js (outdated)

```js
this.indexerResult = 'MAJORITY_NOT_FOUND'

/** @type {RetrievalResult} */
this.retrievalResult = 'MAJORITY_NOT_FOUND'
```
Do you think there can be a case where `Committee` is initialized, then there's an error during `evaluate()`, then its state is read and falsely interpreted as `'MAJORITY_NOT_FOUND'`, while it should rather be something like `'UNKNOWN_ERROR'`?
Yeah, you are right that this design (a mutable Committee class) allows that to happen in theory. I think it should not happen in practice right now because if there is an error, then an exception is thrown and the entire evaluation is aborted.
I propose to move these two "*Result" properties into a nested object stored in the `Committee.evaluation` property, and let this property be `undefined` until the evaluation completes.
See 029db0d
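A sketch of that proposed design, with illustrative names (this is not the actual code from 029db0d): the per-field results live in a nested `evaluation` object that stays `undefined` until `evaluate()` completes, so a half-evaluated committee can never be misread as `'MAJORITY_NOT_FOUND'`.

```javascript
class Committee {
  constructor ({ retrievalTask, measurements }) {
    this.retrievalTask = retrievalTask
    this.measurements = measurements
    // Stays undefined until evaluate() finishes successfully.
    /** @type {{ indexerResult: string, retrievalResult: string } | undefined} */
    this.evaluation = undefined
  }

  evaluate ({ requiredCommitteeSize }) {
    if (this.measurements.length < requiredCommitteeSize) {
      this.evaluation = {
        indexerResult: 'COMMITTEE_TOO_SMALL',
        retrievalResult: 'COMMITTEE_TOO_SMALL'
      }
      return
    }
    this.evaluation = {
      indexerResult: findAbsoluteMajority(this.measurements.map(m => m.indexerResult)),
      retrievalResult: findAbsoluteMajority(this.measurements.map(m => m.retrievalResult))
    }
  }
}

// Returns the value shared by strictly more than half of the entries,
// or 'MAJORITY_NOT_FOUND' when no such value exists.
function findAbsoluteMajority (values) {
  const counts = new Map()
  for (const v of values) counts.set(v, (counts.get(v) ?? 0) + 1)
  for (const [value, count] of counts.entries()) {
    if (count > values.length / 2) return value
  }
  return 'MAJORITY_NOT_FOUND'
}
```

Readers of `evaluation` can then treat `undefined` as "not evaluated (yet)", which is distinguishable from every real outcome.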
A possible next refactoring is to remove the Committee class entirely and turn `Committee.evaluate` into a pure function accepting `({ requiredCommitteeSize, retrievalTask, measurements })` and returning a `CommitteeEvaluation`.
Such a change would push the complexity to `runFraudDetection`, which would need to create the list of committees: an array of `{ retrievalTask, measurements, evaluation }` objects. This list of committees is later used to extract stats for InfluxDB and the spark-stats API.
In that light, such a refactoring does not seem to be an improvement.
Let me know what you think!
+1 to having the result be `undefined` or something else that we can differentiate from the previous default state
And I'm not sure whether removing the class helps; let's see how it goes?
Co-authored-by: Julian Gruber <julian@juliangruber.com>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Comparing the evaluation output before and after this change:

```diff
-EVALUATE ROUND 13944n: built per-node task lists in 845ms [Tasks=1000;TN=15;Nodes=12334]
-EVALUATE ROUND 13944n: added 3920n as rounding to MAX_SCORE
-EVALUATE ROUND 13944n: Evaluated 215182 measurements, found 98111 honest entries.
+EVALUATE ROUND 13944n: built per-node task lists in 865ms [Tasks=1000;TN=15;Nodes=12334]
+EVALUATE ROUND 13944n: added 3931n as rounding to MAX_SCORE
+EVALUATE ROUND 13944n: Evaluated 215182 measurements, found 97335 honest entries.
 {
-  OK: 98111,
+  OK: 97335,
   TASK_NOT_IN_ROUND: 16191,
   DUP_INET_GROUP: 24871,
   TOO_MANY_TASKS: 75847,
-  TASK_WRONG_NODE: 162
+  TASK_WRONG_NODE: 162,
+  MINORITY_RESULT: 573,
+  MAJORITY_NOT_FOUND: 203
 }
```

The overall RSR went slightly down from 8.72% to 8.65%.

```diff
- success_rate: '0.0871869617066384',
+ success_rate: '0.08654646324549237',
```

The number of IPNI 404 errors increased slightly from 59.87% to 60.35%.

```diff
- result_rate_IPNI_ERROR_404: '0.5986892397386634',
+ result_rate_IPNI_ERROR_404: '0.6034622694816869',
```

The CPU & memory used by the evaluation did not change much.

```diff
-Duration: 5148ms
+Duration: 5551ms
 {
-  rss: 370573312,
+  rss: 312492032,
```
@juliangruber the PR is ready for final review & landing
Great work!
Fix a regression introduced by 4925298 (#304).

With the recently introduced "committees & majorities", measurements that don't agree with the majority are rejected. As a result, the code calculating RSR receives only majority measurements: for each task, either a) all accepted measurements say the deal is retrievable, or b) all accepted measurements say the deal is not retrievable. Consequently, the RSR is heavily influenced by how many measurements were collected for each deal.

For example, let's say an SP has one deal that's retrievable and another that is not. Now consider two cases:

1. The task testing the retrievable deal produces 100 retrieval requests (accepted measurements) while the task testing the non-retrievable deal produces 50 retrieval requests (accepted measurements). The RSR is 100 / (100 + 50) = 66%.
2. The task testing the retrievable deal produces 50 retrieval requests (accepted measurements) while the task testing the non-retrievable deal produces 100 retrieval requests (accepted measurements). The RSR is 50 / (100 + 50) = 33%.

This commit fixes the problem by changing the implementation of `updatePublicStats` to iterate over all measurements assigned to all committees. This way we include all measurements in the calculation: minority measurements, measurements where no majority was found, and measurements belonging to committees that are too small.

Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
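The fix can be modeled with a short sketch (this is an illustration, not the actual `updatePublicStats` code): instead of computing RSR over only the measurements that agreed with the majority, iterate over every measurement assigned to every committee, so minority results still count towards the denominator.

```javascript
// Compute RSR over all measurements in all committees, including
// minority and no-majority measurements.
function computeRsrOverAllMeasurements (committees) {
  let total = 0
  let successful = 0
  for (const { measurements } of committees) {
    for (const m of measurements) {
      total += 1
      if (m.retrievalResult === 'OK') successful += 1
    }
  }
  return total === 0 ? 0 : successful / total
}

// One committee: 7 measurements say OK, 3 say ERROR_404. The majority
// result is OK, so a majority-only calculation would report 7/7 = 100%;
// counting all measurements reports 7/10 = 70%.
const committee = {
  measurements: [
    ...Array.from({ length: 7 }, () => ({ retrievalResult: 'OK' })),
    ...Array.from({ length: 3 }, () => ({ retrievalResult: 'ERROR_404' }))
  ]
}
console.log(computeRsrOverAllMeasurements([committee])) // 0.7
```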
Group accepted measurements into committees on a per-retrieval-task basis. Evaluate each committee to find an absolute majority result. Reject results that are in the minority.
When the committee is too small to give us confidence in a majority being honest, or if we cannot find an absolute majority, then reject all measurements in such a committee.
Links:
TODO:
OUT OF SCOPE
NOTE
We are not cross-checking the fields that were used to calculate the retrievalResult value: if there is agreement on the retrieval result, then those fields must have the same value, too.