
Fair queuing for block processor #4476

Merged: 10 commits merged into nanocurrency:develop on Apr 4, 2024

Conversation

@pwojcikdev (Contributor) commented Mar 10, 2024

The current design of the block processor queue is problematic. We have a single, large queue that is shared by all peers and node components (bootstrapping, local blocks, the unchecked table). This means that in periods of high activity the queue can quickly become overwhelmed, with excessively active peers consuming its capacity.

I propose a new design: a fair queue where each peer and each component gets its own smaller queue with a configurable size and priority. The block processor then uses this queue to process incoming blocks in a (weighted) round-robin manner. This should ensure that even when the network is under stress, data coming from well-behaved peers is ingested quickly.

Currently all peers are treated with equal priority, but the implementation can be modified to give more resources to representatives once we have a reliable way to authenticate channels.

The default queue size per peer is 128 blocks. This should be enough for the live network, but it should probably be adjusted for beta in order to be able to saturate the network.
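To make the design concrete, here is a minimal, self-contained sketch of the idea. This is not the actual nano::fair_queue; names, defaults, and structure are illustrative only:

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <optional>
#include <utility>

// Illustrative sketch only: each source gets its own bounded sub-queue,
// and next () serves the sub-queues round robin, taking up to `priority`
// items from each before moving on. The real nano::fair_queue also tracks
// origin liveness, statistics, etc.
template <typename Request, typename Source>
class fair_queue_sketch
{
	struct sub_queue
	{
		std::deque<Request> requests;
		std::size_t max_size{ 128 }; // default per-peer capacity mentioned above
		std::size_t priority{ 1 }; // weight: items served per round-robin turn
		std::size_t served{ 0 }; // items already taken in the current turn
	};

public:
	// Returns false when the source's sub-queue is full and the request is
	// dropped, so one overactive source cannot consume the whole capacity.
	bool push (Source source, Request request)
	{
		auto & queue = queues[source]; // creates the sub-queue on first use
		if (queue.requests.size () >= queue.max_size)
		{
			return false;
		}
		queue.requests.push_back (std::move (request));
		return true;
	}

	// Weighted round robin over all sub-queues.
	std::optional<Request> next ()
	{
		for (std::size_t visited = 0; visited < queues.size (); ++visited)
		{
			if (iterator == queues.end ())
			{
				iterator = queues.begin (); // wrap around
			}
			auto & queue = iterator->second;
			if (!queue.requests.empty ())
			{
				Request result = std::move (queue.requests.front ());
				queue.requests.pop_front ();
				if (++queue.served >= queue.priority)
				{
					queue.served = 0; // turn used up, next call moves to the next source
					++iterator;
				}
				return result;
			}
			queue.served = 0;
			++iterator;
		}
		return std::nullopt; // all sub-queues are empty
	}

private:
	std::map<Source, sub_queue> queues;
	typename std::map<Source, sub_queue>::iterator iterator{ queues.end () };
};
```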

@SparK-Cruz commented Mar 11, 2024

Bad, very bad.
Never trust a peer, even with signed connections, even with 65% weight and 100% uptime for the last century. (Let's not get started on MITM attacks)
"Y can't spam, but X and Z can spam if they want but they won't because they are trust worthy" is a very bad take.
Much prefer something like #3166

@gr0vity-dev (Contributor):

"Y can't spam, but X and Z can spam if they want but they won't because they are trust worthy" is a very bad take.

That's not what's happening here. Everybody gets a fair chance to submit their blocks to the network.
If on the one hand we have a spammer with 1000 blocks and on the other hand 10 peers with 1 block each, all the peers will have their first block processed before the spammer can offload the remaining 999 blocks. No one is excluded.

> Much prefer something like #3166

A better solution to this has already been implemented in #4454, afaik. Instead of limiting legit services by making them wait for the previous confirmation, all the PRs (principal representatives) in the network wait for the previous confirmation before publishing a block. So we have the best of both worlds: reduced traffic and no artificial limits for legit services that want to publish all their blocks at once.
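For illustration, the spammer-versus-peers scenario from the comment above plays out like this against the fair_queue_sketch shown earlier (the sources are hypothetical, not real node code):

```cpp
#include <string>

// A spammer with 1000 blocks versus two peers with one block each,
// using the fair_queue_sketch defined earlier in this thread.
int main ()
{
	fair_queue_sketch<int, std::string> queue;
	for (int block = 0; block < 1000; ++block)
	{
		queue.push ("spammer", block); // only the first 128 fit, the rest are dropped
	}
	queue.push ("peer_a", 0);
	queue.push ("peer_b", 0);

	// Round robin takes one block per source per turn (priority 1), so both
	// peers' blocks come out within the first three calls to next (), no
	// matter how large the spammer's backlog is.
	while (auto request = queue.next ())
	{
		// ... hand *request to the block processor ...
	}
}
```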

@pwojcikdev (Contributor, Author) commented Mar 11, 2024

As gr0vity-dev explained, the point of this PR is not to somehow filter for "trusted" peers, but to ensure every peer is treated equally and to ensure QoS for legit transactions, even under constant network stress. If there are 100 peers and 99 of them are spamming, the 1 well-behaved peer will have its transaction preempt the rest of the traffic.

@SparK-Cruz:

The "configurable size and priority" part got me reading this all wrong.


Review comment on nano/node/fair_queue.hpp:

```cpp
namespace nano
{
template <typename Request, typename Source>
```

Contributor:

A brief comment here to explain what fair_queue does would be nice.

@dsiganos (Contributor) commented Mar 11, 2024:

To help the issue, here is my understanding; use it as you see fit:

A fair queue holds objects of type Request that have arrived from an origin described by the origin type.
A fair queue behaves like a single queue at a high level, but it is composed of a number of sub-queues.
It creates a sub-queue for each origin, and each sub-queue has a priority and a max_size.
The purpose of the fair queue is to ensure that blocks are taken out of it in a round-robin fashion from each sub-queue.
So, for example, at most 10 blocks from each origin before moving to the next origin.
Origins have the concept of liveness, and their sub-queue is deleted when the origin is no longer alive.

@pwojcikdev (Contributor, Author):

I see, my description could be a bit misleading. Only local blocks (submitted via RPC) and bootstrapping are treated with higher priority and get allocated larger queues.

Review comment on nano/node/fair_queue.hpp:

```cpp
};

private:
	struct entry
```
Contributor:

The name entry is a bit too generic for something so important. How about origin_queue?

@dsiganos (Contributor):

Is this a possible problem? If a node (for example, one of @gr0vity-dev's test scripts) creates a TCP connection, pushes some blocks, and then immediately closes the connection, and the cleanup function is called fast enough, will the blocks be dropped? That would not be a major problem in the real network (because connections are not typically short-lived), but it might cause problems with test scripts.
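For reference, the cleanup being discussed has roughly this shape, sketched as a member of the fair_queue_sketch above; alive () is an assumed liveness predicate on the origin, not an API from the PR:

```cpp
// Sketch of the periodic cleanup under discussion: sub-queues whose origin
// is no longer alive are erased, together with any requests still buffered
// in them. If a peer pushes blocks and immediately disconnects, a cleanup
// pass that runs before the sub-queue is drained drops those blocks; that
// is the race described above.
void cleanup ()
{
	for (auto it = queues.begin (); it != queues.end ();)
	{
		if (it->first.alive ()) // assumed predicate, e.g. the channel is still open
		{
			++it;
		}
		else
		{
			// Note: a real implementation must also re-validate the round-robin
			// iterator if it currently points at the erased sub-queue.
			it = queues.erase (it);
		}
	}
}
```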


```cpp
TEST (fair_queue, construction)
{
	nano::fair_queue<source_enum, int> queue;
```

Contributor:

These types are reversed compared to other tests.

```cpp
forced.pop_front ();
return entry;
}
debug_assert (!queue.empty ()); // This should be checked before calling next
```

Contributor:

The behaviour of the forced queue is changed here.
Previously, forced always had priority, whereas now it is one of the round-robin queues.
Due to the round-robin nature of the implementation this is maybe fine, but it is a difference in behaviour that I thought I should flag up.

```cpp
case nano::block_source::local:
	return config.priority_local;
default:
	return 1;
```

Contributor:

This will give block_source::forced a priority of 1.
I would expect forced to have a higher priority.
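For illustration, giving forced its own weight would mean extending the mapping roughly like this; priority_forced is a hypothetical config field, not something in this PR:

```cpp
// Hypothetical variant of the source-to-priority mapping with an explicit
// case for forced. priority_forced is an assumed config field and would
// need a default greater than 1 to match the expectation above.
size_t priority (nano::block_source source) const
{
	switch (source)
	{
		case nano::block_source::forced:
			return config.priority_forced;
		case nano::block_source::local:
			return config.priority_local;
		default:
			return 1;
	}
}
```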

pwojcikdev (Contributor, Author):

Okay, I can make it higher priority, but can you explain exactly why it should be higher priority?

pwojcikdev (Contributor, Author):

Force insertion is used by elections to process forked blocks. I must be missing something, but it seems to me that giving higher priority to forked elections/blocks is unwanted behaviour. Why should we prioritize forks?

```cpp
using queue_t = std::deque<Request>;
queue_t requests;

size_t priority;
```

Contributor:

Priority seems like a misleading term, because blocks with the highest priority do not necessarily go first.
Throughput factor, queue width, or something else indicating heavier flow might be a better name.
I do not feel very strongly about this; feel free to ignore if you disagree.

pwojcikdev (Contributor, Author):

I'm happy to change it to a better name, though for aesthetic reasons it must be a single word.

Contributor:

allowance or quota?
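A small standalone demo of why quota-like names fit: with a weight of 4 for local blocks and 1 for a peer, one round-robin pass serves four local items and then one peer item, so nothing jumps the queue outright. The names and numbers here are made up for illustration:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

int main ()
{
	struct lane
	{
		std::string name;
		std::size_t weight; // the field whose name is being discussed
		std::size_t pending;
	};
	std::vector<lane> lanes{ { "local", 4, 10 }, { "peer", 1, 10 } };

	// Two round-robin passes: each lane may emit up to `weight` items per pass.
	for (int round = 0; round < 2; ++round)
	{
		for (auto & l : lanes)
		{
			for (std::size_t i = 0; i < l.weight && l.pending > 0; ++i, --l.pending)
			{
				std::printf ("%s ", l.name.c_str ());
			}
		}
	}
	std::printf ("\n"); // local local local local peer local local local local peer
}
```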

@pwojcikdev (Contributor, Author):

@dsiganos Is this behavior a problem or a feature? What if a peer repeatedly reconnects and pushes blocks?

dsiganos previously approved these changes Mar 14, 2024
@clemahieu (Contributor) left a comment:

Minor comments but I think it's at a good point to merge and iterate.

```diff
@@ -3,6 +3,7 @@
 #include <nano/lib/locks.hpp>
 #include <nano/lib/timer.hpp>
 #include <nano/node/transport/channel.hpp>
+#include <nano/node/transport/fake.hpp>
```

Contributor:

This can be forward declared
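That is, roughly the following, assuming fake lives in the nano::transport namespace like channel does:

```cpp
namespace nano::transport
{
class fake; // forward declaration instead of #include <nano/node/transport/fake.hpp>
}
```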

```diff
@@ -38,6 +38,26 @@ enum class block_source
 std::string_view to_string (block_source);
 nano::stat::detail to_stat_detail (block_source);

+class block_processor_config final
```

Contributor:

This could be its own file. Including node_config.hpp pulls in blockprocessor.hpp -> fair_queue.hpp -> transport/channel.hpp

pwojcikdev (Contributor, Author):

I've used this pattern of declaring the config next to the class in quite a few places at this point. I find it very convenient when working, to the point where always having to split the config out and keep it in a separate file would probably outweigh any compilation speed benefits, at least on my hardware.

But this can be fixed at the node_config level much more easily, by just using the same pattern as in the node header itself: forward declare the configs and store unique_ptrs to them, with an additional reference for convenience.
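A sketch of that pattern, with illustrative names only:

```cpp
// node_config.hpp: forward-declare the component config and hold it behind
// a unique_ptr, so including node_config.hpp does not pull in
// blockprocessor.hpp -> fair_queue.hpp -> transport/channel.hpp.
#include <memory>

namespace nano
{
class block_processor_config; // forward declaration only

class node_config
{
public:
	node_config ();
	~node_config (); // defined in the .cpp, where the config type is complete

private:
	std::unique_ptr<block_processor_config> block_processor_impl;

public:
	block_processor_config & block_processor; // convenience reference
};
}

// node_config.cpp (shown as comments):
//
// #include <nano/node/blockprocessor.hpp> // full definition available here
//
// nano::node_config::node_config () :
// 	block_processor_impl{ std::make_unique<block_processor_config> () },
// 	block_processor{ *block_processor_impl }
// {
// }
//
// nano::node_config::~node_config () = default;
```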

@pwojcikdev merged commit ec6207f into nanocurrency:develop on Apr 4, 2024. 25 of 27 checks passed.