Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: per-miner & per-client daily deal stats #336

Merged
merged 16 commits into from
Sep 9, 2024

Conversation

bajtos
Copy link
Member

@bajtos bajtos commented Aug 30, 2024

Rework the daily_deals table into the following schema:

For each (day, miner_id, client_id), we want to know the following numbers (counts):

  • tested: (NEW) total deals tested
  • index_majority_found: (NEW) deals where we found a majority agreeing on the same result of the IPNI query
  • indexed: deals announcing retrievals to IPNI (HTTP or Graphsync retrievals)
  • indexed_http: (NEW) deals announcing HTTP retrievals to IPNI
  • retrieval_majority_found: (NEW) deals where we found a majority agreeing on the same result of the retrieval request
  • retrievable: deals where the majority agrees the content can be retrieved

BREAKING CHANGE: spark-stats endpoints consuming this table need to change queries to use tested instead of total.

Links:

Remaining work:

  • add tests for the new columns
  • add index to speed up date-based lookups (INDEX ON daily_deals (day))

Rework the daily_deals table into the following schema:

For each (day, miner_id, client_id), we want to know the following numbers (counts):
 - `tested`: (NEW) total deals tested
 - `indexed`: deals announcing retrievals to IPNI (HTTP or Graphsync retrievals)
 - `indexed_http`: (NEW) deals announcing HTTP retrievals to IPNI
 - `majority_found`: (NEW) deals where we found a majority agreeing on the same result
 - `retrievable`: deals where the majority agrees the content can be retrieved

BREAKING CHANGE: spark-stats endpoints consuming this table need to
change queries to use `tested` instead of `total`.

Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
@juliangruber
Copy link
Member

BREAKING CHANGE: spark-stats endpoints consuming this table need to change queries to use tested instead of total.

Can we use this as an opportunity to fix this coupling, by adding an HTTP endpoint to spark-evaluate and letting spark-stats consume that, instead of querying its db?

Copy link
Member

@juliangruber juliangruber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see #336 (comment)

lib/public-stats.js Outdated Show resolved Hide resolved
lib/public-stats.js Outdated Show resolved Hide resolved
@bajtos
Copy link
Member Author

bajtos commented Sep 2, 2024

BREAKING CHANGE: spark-stats endpoints consuming this table need to change queries to use tested instead of total.

Can we use this as an opportunity to fix this coupling, by adding an HTTP endpoint to spark-evaluate and letting spark-stats consume that, instead of querying its db?

spark-evaluate does not expose any HTTP interface right now. Changing that is way beyond the scope of this work, IMO.

IIRC, the direction we wanted to take is to improve spark-evaluate to publish the evaluation results to IPFS & commit the CID to the smart contract and then move all code building publicly-retrievable stats from spark-evaluate to spark-stats repository.

@bajtos
Copy link
Member Author

bajtos commented Sep 3, 2024

@juliangruber I realised I need two values for majority_found - one regarding the indexer result and another regarding the retrieval result. It's possible for a deal to be correctly indexed but to not have majority agreement on whether it can be retrieved.

What would be good column names?

I feel that retrievable_majority_found and indexed_majority_found could be confusing, I read it as "somebody found an indexed majority" or "a majority that's retrievable".

How about has_result_indexed and has_result_retrievable?

The scores will be then calculated as follows:

  • deals indexed: indexed / has_result_indexed
  • deals offering HTTP retrievals: indexed_http / has_result_indexed
  • retrievable deals: retrievable / has_result_retrievable

@juliangruber
Copy link
Member

What about these:

  • index_majority_found
  • retrieval_majority_found

I think these make sense because they stand for "majority found in the index process" and "majority found in the retrieval process". Strictly speaking it's "index resolution", so should be index_resolution_majority_found, but I don't think this level of precision is necessary

bajtos and others added 11 commits September 5, 2024 14:35
Co-authored-by: Julian Gruber <julian@juliangruber.com>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
@bajtos bajtos marked this pull request as ready for review September 6, 2024 08:23
@bajtos
Copy link
Member Author

bajtos commented Sep 6, 2024

The PR is ready for final review & landing. @juliangruber PTAL 🙏🏻

lib/public-stats.js Outdated Show resolved Hide resolved
lib/public-stats.js Outdated Show resolved Hide resolved
lib/public-stats.js Outdated Show resolved Hide resolved
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
@bajtos
Copy link
Member Author

bajtos commented Sep 9, 2024

@juliangruber LGTY now?

Copy link
Member

@juliangruber juliangruber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, Miro!!

@bajtos bajtos merged commit d2fe8ca into main Sep 9, 2024
6 checks passed
@bajtos bajtos deleted the deal-retrievability-score branch September 9, 2024 09:57
Copy link

sentry-io bot commented Sep 11, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ Error: contract runner does not support sending transactions (operation="sendTransaction", code=UNSUPPOR... ?(platform-stats) View Issue

Did you find this useful? React with a 👍 or 👎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ✅ done
Development

Successfully merging this pull request may close these issues.

2 participants