
RFM17 - Provider Record Liveness #16

Merged: 31 commits merged into probe-lab:master on Sep 6, 2022

Conversation

@cortze (Contributor) commented Aug 18, 2022

This is the first draft of the report for RFM17.

All feedback is more than welcome, so please go ahead and leave some comments. 😄

I still have to merge a few branches of the hoarder, so I haven't added the Git submodule yet (please be patient 😅, it will come soon).

@guillaumemichel (Contributor) left a comment:

Very good and well-detailed report!

Great description of the Hoarder and very interesting results. Would it make sense to add a small section between the results and the conclusion discussing and suggesting which changes could be made to improve IPFS, as you mention briefly in the conclusion? This section could contain recommendations for republish intervals, the K replication parameter, and Hydra nodes, along with the trade-offs associated with these suggestions. The conclusion could then sum up the preferred changes described in this section.

The boxplots contain a lot of outliers stacking on top of each other. Would it be possible to use circles of different sizes according to the number of stacked outliers? This would help visualize the outliers a bit better.
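
For illustration, a minimal sketch of one way to do this with matplotlib; the data, value bin width, and size scaling below are placeholders, not the report's actual measurements or plotting code:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data: one distribution per time slot (not the report's measurements).
rng = np.random.default_rng(0)
positions = np.arange(1, 9)
data = [rng.lognormal(mean=0.0, sigma=0.5, size=500) for _ in positions]

fig, ax = plt.subplots()
ax.boxplot(data, positions=positions, showfliers=False)  # hide the default outlier markers

# Re-draw the outliers as circles whose size reflects how many stack on (almost) the same value.
for x, samples in zip(positions, data):
    q1, q3 = np.percentile(samples, [25, 75])
    iqr = q3 - q1
    outliers = samples[(samples < q1 - 1.5 * iqr) | (samples > q3 + 1.5 * iqr)]
    if outliers.size == 0:
        continue
    values, counts = np.unique(np.round(outliers, 1), return_counts=True)  # coarse value bins
    ax.scatter([x] * len(values), values, s=20 * counts,
               facecolors="none", edgecolors="grey")

ax.set_xlabel("time bin")
ax.set_ylabel("metric value")
plt.show()
```

Sizing by count keeps the circles aligned with their boxes on the x-axis; horizontal jittering of the markers would be an alternative with a similar effect.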

Could you update the folder name implementations/rfm-17-provider-record-liveness to implementations/rfm17-provider-record-liveness?


In Figure 1, we can observe that the CDF follows a linear pattern, with an average of 39.06 CIDs (shown below) in each of the 256 normalized bins displayed. Although the distribution is fairly homogeneous, we can still appreciate in Figure 2 that the PDF's max and min values are 58 CIDs at 0.38 and 22 CIDs at 0.82, respectively.

Despite the randomness of the bytes used to generate the _CIDs_, the homogeneity that the _SHA256_ hash function provides might be affected by the relatively small size of the _CID_ dataset (10,000 CIDs).
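
As a side note on how such a uniformity check can be reproduced, here is a small self-contained sketch; it assumes the keyspace is split into 256 bins by the first byte of the SHA-256 digest, which may differ from the exact binning used in the report:

```python
import hashlib
import os
from collections import Counter

N_CIDS = 10_000

# Hash random content and bin each digest by its first byte (256 bins over the keyspace).
digests = [hashlib.sha256(os.urandom(1024)).digest() for _ in range(N_CIDS)]
counts = Counter(d[0] for d in digests)

pdf = [counts.get(b, 0) for b in range(256)]
print("expected average per bin:", N_CIDS / 256)   # 39.06
print("max bin:", max(pdf), "min bin:", min(pdf))  # spread caused by the finite sample size
```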
Contributor:

Just as a note: you could have directly generated random CIDs without having to generate random content

Contributor Author:

I didn't know that; do you have a link to the method or to an example?

Contributor:

I haven't seen an implementation, but generating a random multihash (instead of hashing random data) should work too
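
For reference, a minimal dependency-free sketch of that idea: it assembles a CIDv1 with the raw codec and a sha2-256 multihash whose digest is simply random bytes. A real implementation would more likely use the go-cid/go-multihash (or py-multiformats) helpers instead of hand-rolling the encoding.

```python
import base64
import os

def varint(n: int) -> bytes:
    """Unsigned varint encoding used by the multiformats specs."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def random_cid_v1() -> str:
    digest = os.urandom(32)                           # random bytes used directly as the "digest"
    multihash = varint(0x12) + varint(32) + digest    # sha2-256 code + digest length + digest
    cid_bytes = varint(1) + varint(0x55) + multihash  # CIDv1 + raw codec
    # Default CIDv1 text form: multibase prefix 'b' + lowercase base32 without padding.
    return "b" + base64.b32encode(cid_bytes).decode().lower().rstrip("=")

print(random_cid_v1())  # e.g. bafkrei...
```

Since the digest bytes are uniformly random, the resulting CIDs spread uniformly over the keyspace, which is all the experiment needs.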


**In-degree ratio**

Following the same comparison between K=20 with hydras and the current K=20 without hydras, Figure 42 compares the in-degree ratio percentage of the _PR Holders_ with the hydra filter on and off. In the figure, we can see that participation follows a similar distribution, with the dataset that includes the hydras showing a roughly 5% higher in-degree ratio. In both cases, the median never drops below 70%, a level that the "with-hydras" dataset only reaches later on, after ~32 hours.
Contributor:

In the figure, we can appreciate that the participation follows a similar distribution, where the data set that includes the hydras has a slightly 5% more in-degree ratio

Phrasing not very clear


The steady in-degree ratio measured for K=20 over 80+ hours showed that the initial closest peers remain the closest ones for more than 48 hours. This measurement dispels any remaining doubt about the health of existing _PRs_, and it opens the possibility of decreasing the network's overhead by increasing the _PR republish interval_.

In a currently over-saturated network, where running a DHT Server consumes far more CPU and bandwidth than running a DHT Client, any window of improvement has to be taken. Although reducing the _K_ value to 15 would imply a 25% overhead reduction, it also implies a performance risk that should be considered more carefully. Increasing the _PR republish interval_, however, seems a far more reasonable way to reduce the overhead without compromising performance and reliability.
Contributor:

Although reducing the K value to 15 would imply a 25% overhead reduction

25% of the number of Provider Records stored on each DHT server node?

Member:

I would say reduction in PR-related processes (connection, bandwidth, CPU, storage related to sending and storing PRs).

Contributor:

@yiannisbot I would say that only the storage gets affected. Connections and bandwidth are expected to stay the same: each node will store 25% fewer Provider Records, but it will have to respond to roughly 25% more requests for each of them, as they are stored on 25% fewer peers. Overall, the number of requests for the content stays the same.
There will be a small bandwidth reduction (25%) for the publish operation, as the content is published x15 instead of x20, but I don't think the publish operation is a big share of all requests.

@guillaumemichel requested review from guillaumemichel and removed request for guillaumemichel on August 22, 2022 20:46
yiannisbot and others added 9 commits August 30, 2022 08:50
Co-authored-by: Guillaume Michel - guissou <guillaumemichel@users.noreply.github.com> (×9)
@yiannisbot marked this pull request as ready for review on August 30, 2022 08:07
Co-authored-by: Guillaume Michel - guissou <guillaumemichel@users.noreply.github.com>
@yiannisbot (Member) left a comment:

This is just excellent work @cortze! I've made quite a few suggested edits - please commit directly if you agree. Conclusions are very informative.

One important point that would be great to address is adding a TL;DR, or "Summary of Findings/Results", at the top of the report, probably just before the "Methodology" section. This would roughly be a one-sentence summary of every paragraph in the conclusions section. The report is rather long, so a reader would have to spend a lot of time reading and understanding before getting to the results. Some people might not even be interested in the details and just want the "Take Home" message. Please address this in the next iteration.

In Figure 13, we can see the expected abrupt drop at hour 24 that we introduced earlier. As with the activity of the peers, during those first 24 hours we observe a certain stability, with the lower Q1 quartile set at 12 peers sharing the _PRs_. The distribution of the outliers also shows that none of the _CIDs_ reached a point where the _PRs_ were not retrievable.

This last statement is a bit trivial. Although we can assume that the _CIDs_ are reachable as long as the _PR Holders_ keep the records over the 24 hours, adverse events on the peer-to-peer network, such as sudden high node churn, could leave a node isolated from the rest. The bigger impact of this isolation is, in fact, that a given _PR Holder_ might no longer be included in the other peers' routing tables. Therefore, no one would reach out to that isolated peer asking for the _CID_.
Member:

I agree that the phrasing can be softened here to include the cases when that would be possible (i.e., severe network fragmentation, or similar).

![img](../implementations/rfm-17-provider-record-liveness/plots/kcomparison/active_total_pr_holders.png)
<p style="text-align: center;">Figure 22. Comparison of the active PR Holders for the different K values (median on the left, average on the right)</p>

This pattern becomes even more evident when displaying the percentage of active _PR Holders_ (see Figure 23). Here we can clearly see that the difference between the medians for K=15 and K=40 is on the order of 5% more active peers when we increase the _K_ value to 40. In the graph displaying the averages, we can distinguish with higher resolution the initial drop (first 10 hours) and the subsequent catch-up (hours 20 to 25) previously mentioned. This pattern has been observed across all the different K values, and we attribute it to a specific set of users in the same time zone who disconnect for a few hours each day (e.g., users shutting down their PCs during the night).
Member:

This is for active PR Holders, so it's intuitive AFAIU. The more peers you store the record with, the more peers will be active (and hold/provide it) after a period of time.

cortze and others added 12 commits August 31, 2022 12:13
Co-authored-by: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com> (×12)
cortze and others added 5 commits August 31, 2022 17:26
Co-authored-by: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com> (×3)
@cortze (Contributor Author) commented Sep 1, 2022

Thank you very much @yiannisbot and @guillaumemichel for the support and the feedback!
I think I covered all the typos 🙈, suggestions, and missing explanations. Let me know if I missed anything and what you think about this second iteration 🚀

@yiannisbot (Member) left a comment:

Great work!

@yiannisbot merged commit 5a00404 into probe-lab:master on Sep 6, 2022