Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Refactor] Overhaul of scheduler component #1169

Merged
merged 1 commit into from
Jul 17, 2024
Merged

[Refactor] Overhaul of scheduler component #1169

merged 1 commit into from
Jul 17, 2024

Conversation

allada
Copy link
Member

@allada allada commented Jul 16, 2024

This is a significant overhaul to Nativelink's scheduler
component. This new scheduler design is to enable a distributed
scheduling system.

The new components & definitions:

  • AwaitedActionDb - An interface that is easier to work with when
    dealing with key-value storage systems.
  • MemoryAwaitedActionDb - An in-memory set of hashmaps & btrees used
    to satisfy the requirements of AwaitedActionDb interface.
  • ClientStateManager - A minimal interface required to satisfy the
    requirements of a client-facing scheduler.
  • WorkerStateManager - A minimal interface required to satisfy the
    requirements of a worker-facing scheduler.
  • MatchingEngineStateManager - A minimal interface required to
    satisfy a engine that matches queued jobs to workers.
  • SimpleSchedulerStateManager - An implements that satisfies
    ClientStateManager, WorkerStateManager & MatchingEngineStateManager
    with all the logic of the previous "SimpleScheduler" logic moved
    behind each interface.
  • ApiWorkerScheduler - A component that handles all knowledge about
    workers state and implements the WorkerScheduler interface and
    translates them into the WorkerStateManager interface.
  • SimpleScheduler - Translation calls of the ClientScheduler
    interface into ClientStateManager & MatchingEngineStateManager.
    This component is currently always forwards calls to
    SimpleSchedulerStateManager then to MemoryAwaitedActionDb.
    Future changes will make these inner components dynamic via config.

In addition we have hardened the interactions of different kind of
IDs in NativeLink. Most relevant is the separation & introduction of:

  • OperationId - Represents an individual operation being requested
    to be executed that is unique across all of time.
  • ClientOperationId - An ID issued to the client when the client
    requests to execute a job. This ID will point to an OperationId
    internally, but the client is never exposed to the OperationId.
  • AwaitedActionHashKey - A key used to uniquely identify an action
    that is not unique across time. This means that this key might
    have multiple OperationId's that have executed it across different
    points in time. This key is used as a "fingerprint" of an operation
    that the client wants to execute and the scheduler may decide to
    join the stream onto an existing operation if this key has a hit.

Overall these changes pave the way for more robust scheduler
implementations, most notably, distributed scheduler implementations
will be easier to implement and will be introduced in follow-up PRs.

This commit was developed on a side branch and consisted of the
following commits with corresponding code reviews:
54ed73c
Add scheduler metrics back (#1171)
50fdbd7
fix formatting (#1170)
8926236
Merge in main and format (#1168)
9c2c7b9
key as u64 (#1166)
0192051
Cleanup unused code and comments (#1165)
080df5d
Add versioning to AwaitedAction (#1163)
73c19c4
Fix sequence bug in new memory store manager (#1162)
6e50d2c
New AwaitedActionDb implementation (#1157)
18db991
Fix test on running_actions_manager_test (#1141)
e50ef3c
Rename workers to worker_scheduler
1fdd505
SimpleScheduler now uses config for action pruning (#1137)
eaaa872
Change encoding for items that are cachable (#1136)
d647056
Errors are now properly handles in subscription (#1135)
7c3e730
Restructure files to be more appropriate (#1131)
5e98ec9
ClientAwaitedAction now uses a channel to notify drops happened (#1130)
52beaf9
Cleanup unused structs (#1128)
e86fe08
Remove all uses of salt and put under ActionUniqueQualifier (#1126)
3b86036
Remove all need for workers to know about ActionId (#1125)
5482d7f
Fix bazel build and test on dev (#1123)
ba52c7f
Implement get_action_info to all ActionStateResult impls (#1118)
2fa4fee
Remove MatchingEngineStateManager::remove_operation (#1119)
34dea06
Remove unused proto field (#1117)
3070a40
Remove metrics from new scheduler (#1116)
e95adfc
StateManager will now cleanup actions on client disconnect (#1107)
6f8c001
Fix worker execution issues (#1114)
d353c30
rename set_priority to upgrade_priority (#1112)
0d93671
StateManager can now be notified of noone listeneing (#1093)
cfc0cf6
ActionScheduler will now use ActionListener instead of tokio::watch (#1091)
d70d31d
QA fixes for scheduler-v2 (#1092)
f2cea0c
[Refactor] Complete rewrite of SimpleScheduler
34d93b7
[Refactor] Move worker notification in SimpleScheduler under Workers
b9d9702
[Refactor] Moves worker logic back to SimpleScheduler
7a16e2e
[Refactor] Move scheduler state behind mute

This change is Reviewable

Copy link

cloudflare-workers-and-pages bot commented Jul 16, 2024

Deploying nativelink with  Cloudflare Pages  Cloudflare Pages

Latest commit: 2954bfa
Status: ✅  Deploy successful!
Preview URL: https://575a1134.nativelink-3e2.pages.dev
Branch Preview URL: https://dev.nativelink-3e2.pages.dev

View logs

@allada allada force-pushed the dev branch 2 times, most recently from e4a3a89 to f103f63 Compare July 16, 2024 20:46
@allada allada changed the title New Scheduler [Refactor] Overhaul of scheduler component Jul 16, 2024
This is a significant overhaul to Nativelink's scheduler
component. This new scheduler design is to enable a distributed
scheduling system.

The new components & definitions:
* AwaitedActionDb - An interface that is easier to work with when
  dealing with key-value storage systems.
* MemoryAwaitedActionDb - An in-memory set of hashmaps & btrees used
  to satisfy the requirements of AwaitedActionDb interface.
* ClientStateManager - A minimal interface required to satisfy the
  requirements of a client-facing scheduler.
* WorkerStateManager - A minimal interface required to satisfy the
  requirements of a worker-facing scheduler.
* MatchingEngineStateManager - A minimal interface required to
  satisfy a engine that matches queued jobs to workers.
* SimpleSchedulerStateManager - An implements that satisfies
  ClientStateManager, WorkerStateManager & MatchingEngineStateManager
  with all the logic of the previous "SimpleScheduler" logic moved
  behind each interface.
* ApiWorkerScheduler - A component that handles all knowledge about
  workers state and implmenets the WorkerScheduler interface and
  translates them into the WorkerStateManager interface.
* SimpleScheduler - Translation calls of the ClientScheduler
  interface into ClientStateManager & MatchingEngineStateManager.
  This component is currently always forwards calls to
  SimpleSchedulerStateManager then to MemoryAwaitedActionDb.
  Future changes will make these inner components dynamic via config.

In addition we have hardened the interactions of different kind of
IDs in NativeLink. Most relevant is the separation & introduction of:
* OperationId - Represents an individual  operation being requested
  to be executed that is unique across all of time.
* ClientOperationId - An ID issued to the client when the client
  requests to execute a job. This ID will point to an OperationId
  internally, but the client is never exposed to the OperationId.
* AwaitedActionHashKey - A key used to uniquely identify an action
  that is not unique across time. This means that this key might
  have multiple OperationId's that have executed it across different
  points in time. This key is used as a "fingerprint" of an operation
  that the client wants to execute and the scheduler may decide to
  join the stream onto an existing operation if this key has a hit.

Overall these changes pave the way for more robust scheduler
implementations, most notably, distributed scheduler implementations
will be easier to implement and will be introduced in followup PRs.

This commit was developed on a side branch and consisted of the
following commits with corresponding code reviews:
54ed73c
    Add scheduler metrics back (#1171)
50fdbd7
    fix formatting (#1170)
8926236
    Merge in main and format (#1168)
9c2c7b9
    key as u64 (#1166)
0192051
    Cleanup unused code and comments (#1165)
080df5d
    Add versioning to AwaitedAction (#1163)
73c19c4
    Fix sequence bug in new memory store manager (#1162)
6e50d2c
    New AwaitedActionDb implementation (#1157)
18db991
    Fix test on running_actions_manager_test (#1141)
e50ef3c
    Rename workers to `worker_scheduler`
1fdd505
    SimpleScheduler now uses config for action pruning (#1137)
eaaa872
    Change encoding for items that are cachable (#1136)
d647056
    Errors are now properly handles in subscription (#1135)
7c3e730
    Restructure files to be more appropriate (#1131)
5e98ec9
    ClientAwaitedAction now uses a channel to notify drops happened (#1130)
52beaf9
    Cleanup unused structs (#1128)
e86fe08
    Remove all uses of salt and put under ActionUniqueQualifier (#1126)
3b86036
    Remove all need for workers to know about ActionId (#1125)
5482d7f
    Fix bazel build and test on dev (#1123)
ba52c7f
    Implement get_action_info to all ActionStateResult impls (#1118)
2fa4fee
    Remove MatchingEngineStateManager::remove_operation (#1119)
34dea06
    Remove unused proto field (#1117)
3070a40
    Remove metrics from new scheduler (#1116)
e95adfc
    StateManager will now cleanup actions on client disconnect (#1107)
6f8c001
    Fix worker execution issues (#1114)
d353c30
    rename set_priority to upgrade_priority (#1112)
0d93671
    StateManager can now be notified of noone listeneing (#1093)
cfc0cf6
    ActionScheduler will now use ActionListener instead of tokio::watch (#1091)
d70d31d
    QA fixes for scheduler-v2 (#1092)
f2cea0c
    [Refactor] Complete rewrite of SimpleScheduler
34d93b7
    [Refactor] Move worker notification in SimpleScheduler under Workers
b9d9702
    [Refactor] Moves worker logic back to SimpleScheduler
7a16e2e
    [Refactor] Move scheduler state behind mute
Copy link
Member

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments can be considered all nits / future things to change

:lgtm_strong: 🚢

Reviewed 57 of 57 files at r1, all commit messages.
Reviewable status: 1 of 1 LGTMs obtained, and 10 discussions need to be resolved


nativelink-scheduler/src/action_scheduler.rs line 35 at r1 (raw file):

    fn changed(
        &mut self,
    ) -> Pin<Box<dyn Future<Output = Result<Arc<ActionState>, Error>> + Send + '_>>;

nit: Maybe for a future conversation, what if we type aliased longer definitions on traits to reduce verbosity

type ChangedResult =  Pin<Box<dyn Future<Output = Result<Arc<ActionState>, Error>> + Send + '_>>;
[...]
     /// Waits for the action state to change.
     fn changed(
         &mut self,
    ) -> ChangedResult;
 }
[...]

Most of the hover over tooling in IDEs should easily resolve it. I get that the verbosity here does lend well for the way rust tends to pack a lot in a single file, so avoiding collisions in naming or going overboard with every variant of a type doesn't cause explosion in aliases.


nativelink-scheduler/src/api_worker_scheduler.rs line 313 at r1 (raw file):

    /// Attempts to find a worker that is capable of running this action.
    // TODO(blaise.bruer) This algorithm is not very efficient. Simple testing using a tree-like

This comment should be moved into inner_find_worker_for_action


nativelink-scheduler/src/api_worker_scheduler.rs line 326 at r1 (raw file):

    /// Checks to see if the worker exists in the worker pool. Should only be used in unit tests.
    #[must_use]
    pub async fn contains_worker_for_test(&self, worker_id: &WorkerId) -> bool {

Thoughts on using something like https://docs.rs/cfg-if/latest/cfg_if/ in the future to reduce the compiled in test utilities, make them only available on #[cfg(test)] compilation paths. I'm unsure if our integration tests compile with that cfg, there could be another flag that differentiates between those contexes, test/integration/prod


nativelink-scheduler/src/grpc_scheduler.rs line 130 at r1 (raw file):

            // Our operation_id is not needed here is just a place holder to recycle existing object.
            // The only thing that actually matters is the operation_id.
            let operation_id =

Could this be a const NullOperationId ?


nativelink-scheduler/src/memory_awaited_action_db.rs line 44 at r1 (raw file):

/// Number of events to process per cycle.
const MAX_ACTION_EVENTS_RX_PER_CYCLE: usize = 1024;

In the future this should be a configurable value


nativelink-scheduler/src/memory_awaited_action_db.rs line 47 at r1 (raw file):

/// Duration to wait before sending client keep alive messages.
const CLIENT_KEEPALIVE_DURATION: Duration = Duration::from_secs(10);

In the future this should be a configurable value


nativelink-scheduler/src/memory_awaited_action_db.rs line 139 at r1 (raw file):

        drop_tx: mpsc::UnboundedSender<ActionEvent>,
    ) -> Self {
        awaited_action_rx.mark_changed();

This mark_changed and the other one (i forget where), should be commented on why we are doing that, its a bit specific since it doesn't seem like a change is applied directly in this code


nativelink-scheduler/src/memory_awaited_action_db.rs line 529 at r1 (raw file):

                    make_err!(
                        Code::Internal,
                        "Failed to get operation id {}",

"Failed to get operation id {}", ->"Failed to get operation id {operation_id}", ?


nativelink-scheduler/src/memory_awaited_action_db.rs line 592 at r1 (raw file):

                make_err!(
                    Code::Internal,
                    "OperationId does not exist in map in AwaitedActionDb::update_awaited_action"

We should capture the new_awaited_action.operation_id() in the error message


nativelink-scheduler/src/simple_scheduler_state_manager.rs line 40 at r1 (raw file):

/// Maximum number of times an update to the database
/// can fail before giving up.
const MAX_UPDATE_RETRIES: usize = 5;

def in the future should be configurable

@allada allada merged commit 3b8c3a5 into main Jul 17, 2024
30 checks passed
@allada allada deleted the dev branch July 17, 2024 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants