Exclude Epoch Boundary from Health Check #105

Closed
wants to merge 5 commits into from
29 changes: 18 additions & 11 deletions relayer/src/health_manager.rs
@@ -10,7 +10,7 @@ use std::{

use crossbeam_channel::{select, tick, Receiver, Sender};
use solana_metrics::datapoint_info;
- use solana_sdk::clock::Slot;
+ use solana_sdk::clock::{Slot, DEFAULT_SLOTS_PER_EPOCH};

#[derive(PartialEq, Eq, Copy, Clone)]
pub enum HealthState {
@@ -42,23 +42,30 @@ impl HealthManager {
let mut slot_sender_max_len = 0usize;
let channel_len_tick = tick(Duration::from_secs(5));
let check_and_metrics_tick = tick(missing_slot_unhealthy_threshold / 2);
+ let mut outside_epoch_boundary = true;

while !exit.load(Ordering::Relaxed) {
select! {
recv(check_and_metrics_tick) -> _ => {
- let new_health_state =
- match last_update.elapsed() <= missing_slot_unhealthy_threshold {
- true => HealthState::Healthy,
- false => HealthState::Unhealthy,
- };
- *health_state.write().unwrap() = new_health_state;
- datapoint_info!(
- "relayer-health-state",
- ("health_state", new_health_state, i64)
- );
+ if outside_epoch_boundary {
Review comment (Contributor):

Can you move the outside_epoch_boundary check into the match statement (or an if/else) that computes the health state? It may be helpful to keep emitting data points during that window.
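
One way to read this suggestion is to fold the boundary flag into the state computation, so the metric can be emitted on every tick whether or not the state is allowed to change. The sketch below is a standalone illustration, not code from this PR: `next_health_state` is a hypothetical helper, the enum is redeclared locally (with `Debug` added for assertions), and holding the previous state inside the boundary window is an assumption about the intended behavior.

```rust
use std::time::Duration;

// Local stand-in for the relayer's HealthState so the sketch compiles alone.
// `Debug` is added here only so the value can be used in assertions.
#[derive(Debug, PartialEq, Eq, Copy, Clone)]
pub enum HealthState {
    Unhealthy,
    Healthy,
}

/// Hypothetical helper (not part of the PR): decides the next health state.
/// Near an epoch boundary the previous state is kept; otherwise the state
/// follows the time elapsed since the last slot update.
pub fn next_health_state(
    current: HealthState,
    elapsed_since_last_slot: Duration,
    missing_slot_unhealthy_threshold: Duration,
    outside_epoch_boundary: bool,
) -> HealthState {
    match (
        outside_epoch_boundary,
        elapsed_since_last_slot <= missing_slot_unhealthy_threshold,
    ) {
        // Inside the boundary window: do not change the state.
        (false, _) => current,
        // Outside the window: a recent slot update means healthy.
        (true, true) => HealthState::Healthy,
        (true, false) => HealthState::Unhealthy,
    }
}
```

With a helper like this, the tick arm could compute the state, write it, and call `datapoint_info!` unconditionally, so data points continue to be reported during the boundary window.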

+ let new_health_state =
+ match last_update.elapsed() <= missing_slot_unhealthy_threshold {
+ true => HealthState::Healthy,
+ false => HealthState::Unhealthy,
+ };
+ *health_state.write().unwrap() = new_health_state;
+ datapoint_info!(
+ "relayer-health-state",
+ ("health_state", new_health_state, i64)
+ );
+ }
}
recv(slot_receiver) -> maybe_slot => {
let slot = maybe_slot.expect("error receiving slot, exiting");
+ // Don't perform health updates within +/- 75 slots of epoch boundary
Review comment (Contributor):

Thoughts on moving this logic into the health-state handling above?

+ // Note: This is not necessarily correct for local testing
+ let slot_index = slot % DEFAULT_SLOTS_PER_EPOCH;
+ outside_epoch_boundary = 75 < slot_index && slot_index < (DEFAULT_SLOTS_PER_EPOCH - 75);
Review comment (Contributor):

It would be good to add a const for 75, with a comment explaining why that specific value was chosen.
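
A minimal sketch of what that could look like, assuming the same comparison as the diff. The names `EPOCH_BOUNDARY_SLOT_MARGIN` and `is_outside_epoch_boundary` are hypothetical, and `DEFAULT_SLOTS_PER_EPOCH` is redeclared locally with the value 432_000 (assumed to match `solana_sdk::clock`) so the block compiles without the crate. The rationale for 75 is not stated in this PR, so the comment is left as a placeholder.

```rust
// Local copy for illustration; in the relayer this would come from
// solana_sdk::clock::DEFAULT_SLOTS_PER_EPOCH (assumed here to be 432_000).
const DEFAULT_SLOTS_PER_EPOCH: u64 = 432_000;

/// Hypothetical constant: number of slots on each side of an epoch boundary
/// during which health updates are skipped.
/// TODO: document why 75 was chosen (the PR does not say).
const EPOCH_BOUNDARY_SLOT_MARGIN: u64 = 75;

/// Hypothetical helper: true when `slot` is more than the margin away from
/// both the start and the end of its epoch. As the diff notes, this is not
/// necessarily correct for local testing, where epochs may be shorter.
pub fn is_outside_epoch_boundary(slot: u64) -> bool {
    let slot_index = slot % DEFAULT_SLOTS_PER_EPOCH;
    EPOCH_BOUNDARY_SLOT_MARGIN < slot_index
        && slot_index < DEFAULT_SLOTS_PER_EPOCH - EPOCH_BOUNDARY_SLOT_MARGIN
}
```

Pulling the comparison into a named function also makes the boundary cases easy to unit test, for example slot indices 75 and 76 at the start of an epoch.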

slot_sender.send(slot).expect("error forwarding slot, exiting");
last_update = Instant::now();
}