Scheduled Configurations HLD #1700

Open. Wants to merge 11 commits into base: master

@amazor amazor commented May 20, 2024

High level design document for Scheduled Configurations feature.

This description will be updated to connect all the code changes in all SONiC Repositories.

| Repository | Pull Request | State |
|---|---|---|
| sonic-swss-common | TODO | Open |
| sonic-swss | TODO | Open |
| sonic-buildimage | TODO | Open |
| sonic-utilities | TODO | Open |

@amazor changed the title from "Scheduled Configurations HLD Initial Commit" to "Scheduled Configurations HLD" on May 27, 2024
##### CONFIG_DB

- **TIME_RANGE**: A table representing the time ranges that were configured by the user.
- **SCHEDULED_CONFIGURATIONS**: Represents a configuration that will be applied when the associated time range is enabled. Contains a field that binds this configuration to a time range.

Contributor

How do you validate that the scheduled configuration is valid from the YANG model point of view?

It is possible to create a scheduled configuration YANG model that has the specific scheduled configuration plus a choice of the already defined "features" (like "ACL_RULE"). We need to see whether the validator can support the include statement needed to pull in the feature-specific YANG models.

Author

It was mentioned during review that we can use a union type that can verify the syntax. Will look into this.


### Subscription APPL_DB

To integrate with the new scheduled configurations feature, every switch component that wants to use this functionality must subscribe to the `APPL_DB`. The table schema will be the same as the schema used in `CONFIG_DB` to apply new configurations; all that is needed is to add the new subscription to the orchagent subcomponents. This is done to distinguish between administrator configurations and configurations applied by an application (i.e. *Scheduled Configurations*).

Contributor

Not sure if the assumption that "all switch components desiring to utilize this functionality must subscribe to the APPL_DB" is true. @prsunny and @dgsudharsan

As the scheduled configuration is itself configuration, the embedded configuration should not be applied through config_db; if the scheduled configuration needs to change, the change is made in config_db to the scheduled configuration itself.
If this assumption is accepted, then the embedded configuration must not pass through config_db. A natural candidate is app_db, as the scheduled configuration is itself a sort of application (focused on setting configuration at a certain time).

Author

I am not sure what the issue is with using the APPL_DB. If another management application outside of orchagent needs access to the table in the APPL_DB, then it can subscribe to it, since all *mgr applications inherit from "orch".

There also should not be any collisions when configuring through the APPL_DB since there is no table schema from CONFIG_DB used in APPL_DB.

Author

@lguohan Can we get more details of problems with using APPL_DB?

4. `timerangemgrd` deletes the time range entry from the `TIME_RANGE_STATUS_TABLE` table in the `STATE_DB`
5. A subscription notification with the op *delete* is sent to `scheduledconfigmgrd`
6. `scheduledconfigmgrd` checks whether the time range is active; if it is, it deactivates the configurations bound to this time range by deleting them from the `APPL_DB`
7. Finally, `scheduledconfigmgrd` deletes the time range internally, and the scheduled configurations are unbound
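
Steps 6 and 7 can be sketched roughly as follows. This is only an in-memory illustration, not the actual `scheduledconfigmgrd` code: the class, its method names, the dict standing in for `APPL_DB`, and the example key are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ScheduledConfigManager:
    """Toy stand-in for scheduledconfigmgrd; APPL_DB is modelled as a dict."""
    appl_db: dict = field(default_factory=dict)
    active_ranges: set = field(default_factory=set)
    # time range name -> {APPL_DB key: configuration fields}
    bindings: dict = field(default_factory=dict)

    def bind(self, time_range, key, fields):
        self.bindings.setdefault(time_range, {})[key] = fields

    def on_time_range_active(self, time_range):
        self.active_ranges.add(time_range)
        for key, fields in self.bindings.get(time_range, {}).items():
            self.appl_db[key] = fields

    def on_time_range_delete(self, time_range):
        # Step 6: if the range is active, remove its configurations from APPL_DB.
        if time_range in self.active_ranges:
            for key in self.bindings.get(time_range, {}):
                self.appl_db.pop(key, None)
            self.active_ranges.discard(time_range)
        # Step 7: forget the range; its configurations are returned unbound.
        return self.bindings.pop(time_range, {})


mgr = ScheduledConfigManager()
# Hypothetical key and fields, for illustration only.
mgr.bind("BusinessHours", "ACL_RULE_TABLE:DATAACL:RULE_1", {"PACKET_ACTION": "DROP"})
mgr.on_time_range_active("BusinessHours")
unbound = mgr.on_time_range_delete("BusinessHours")
```

After the delete notification is handled, the sketch leaves nothing from that time range in the simulated `APPL_DB`, and the previously bound configurations are handed back to the caller without a time range.
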

Contributor

What if some old configuration was replaced by this scheduled configuration? Are we recovering the old configuration?

Also, what if the scheduled configuration is changed? How do we revert?

The feature was designed to apply an additional configuration rather than change existing configuration. This conceptually removes the need to "restore" old configuration or to handle conflicting configurations (e.g. the user changes the default configuration while the scheduled configuration is active).
I think it should be the system admin's responsibility to handle conflicts, as the scheduled configuration is configuration and the default configuration is also configuration. Two options can be used:
First option: every configuration that conflicts should itself be placed as a scheduled configuration, e.g. for a "new" configuration in time range X, set the "default" configuration in time range ~X (already supported).
Second option: at the end of the time range, allow the admin to set a configuration (rather than just delete the configuration). This allows the admin to reconfigure the "default".
Regarding the second question: on a configuration change, the scheduled configuration needs to be removed and a new one configured.

Author

@lguohan How do these options in the design sound to you?


## High-Level Design

- This feature will be a built-in part of SONiC, specifically inside the SWSS container.

Collaborator

@venkatmahalingam venkatmahalingam Jun 11, 2024

As discussed in the community meeting, can we explore and design this feature generically, without disturbing the existing flows between app-->appl_db-->orchagent? That is, just publish the configs at the required time range via a new module (maybe using an overlay DB instance) so that the internal design we have today stays intact.

For example, today we do:
config-db-->event1 (vlan, intf, lag configs)-->mgr (vlanmgr/intfmgr)-->appl_db-->orchagent
config-db-->event2 (e.g. FRR configs)-->bgpcfgd/frrcfgd-->FRR
config-db-->event3 (e.g. ACL)-->orchagent

Without disturbing any of the existing design, can we explore options for handling the schedule manager events external to the apps?
For example:
config-db-->scheduled-mgr-->config event fires when the time range is active/inactive or the config doesn't require scheduling-->various apps handling the config-db events

I think the simpler solution is to leave this to the admin; see #1700 (comment).

Collaborator

@venkatmahalingam venkatmahalingam Jun 14, 2024

I was thinking about introducing a scheduler profile and mapping the profile to config(s). When a config is stored into the config-DB, it is checked for a profile association; if one is present, immediate publish is avoided, and the scheduler manager handles the configs associated with the profile and publishes the config event when the timer (configured from the profile) expires, based on the cron job. IMO, the scheduler manager should work on top of the apps and should not make any changes to the APPL_DB.
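
A minimal sketch of that profile idea, assuming a manager that sits between config writes and the existing config-db event publication. The `SchedulerManager` class, its methods, and the table/key names are hypothetical and only illustrate the deferral; real timers and cron handling are left out.

```python
class SchedulerManager:
    """Holds back configs that carry a scheduler profile until its timer fires."""

    def __init__(self, publish):
        self.publish = publish   # callback standing in for the config-db event
        self.pending = {}        # profile name -> list of (table, key, fields)

    def write_config(self, table, key, fields, profile=None):
        if profile is None:
            # No profile association: publish immediately, as today.
            self.publish(table, key, fields)
        else:
            # Profile present: avoid immediate publish and defer to the timer.
            self.pending.setdefault(profile, []).append((table, key, fields))

    def on_timer_expired(self, profile):
        # Publish every config that was waiting on this profile.
        for table, key, fields in self.pending.pop(profile, []):
            self.publish(table, key, fields)


events = []
mgr = SchedulerManager(lambda table, key, fields: events.append((table, key, fields)))
mgr.write_config("VLAN", "Vlan100", {"vlanid": "100"})
mgr.write_config("ACL_RULE", "DATAACL|RULE_1", {"PACKET_ACTION": "DROP"},
                 profile="BusinessHours")
published_before_timer = list(events)
mgr.on_timer_expired("BusinessHours")
```

In this sketch the existing apps only ever see ordinary config events; the scheduling decision stays entirely inside the manager.
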

So I think we agree that conflicts and restoring the default should be handled by the admin who sets up the time ranges and the configuration (in your concept, the profile). Regarding which DB to use, this is a complex issue. If we adopt your concept of using config_db, then the user can change the configuration directly (e.g. for a profile that sets an ACL rule, the user can now directly change that configuration), unless you introduce a more fundamental change where all configuration has a "profile id", which I think is a very big change. Maybe the application DB is not the correct DB; this relates to the question of whether scheduled configuration is an application. In any case, I think config_db is not the right DB to use.

…guration.

This new option allows the admin to replace the deactivated configuration with another configuration.
- Does not apply changes to APPL_DB, instead configures CONFIG_DB
- Added new mandatory parameter to scheduled_configurations: deactivation_configuration. This field applies a new configuration at deactivation, or can be set to remove the previously applied configuration
This command will validate using the YANG models to verify that the internal (and external) data is correct.
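
The `deactivation_configuration` behavior described in these notes can be sketched as below. Only the field name `deactivation_configuration` comes from the notes; the `target` and `configuration` fields, the `"remove"` sentinel, and the dict standing in for CONFIG_DB are assumptions made for illustration.

```python
REMOVE = "remove"  # hypothetical sentinel meaning "delete what was applied"


def activate(config_db, entry):
    """Apply the scheduled configuration to the (simulated) CONFIG_DB."""
    table, key = entry["target"]
    config_db.setdefault(table, {})[key] = dict(entry["configuration"])


def deactivate(config_db, entry):
    """Apply the deactivation configuration, or remove the applied entry."""
    table, key = entry["target"]
    deactivation = entry["deactivation_configuration"]
    if deactivation == REMOVE:
        config_db.get(table, {}).pop(key, None)
    else:
        config_db.setdefault(table, {})[key] = dict(deactivation)


config_db = {}
replace_entry = {
    "target": ("ACL_RULE", "DATAACL|RULE_1"),
    "configuration": {"PACKET_ACTION": "DROP"},
    "deactivation_configuration": {"PACKET_ACTION": "FORWARD"},
}
remove_entry = {
    "target": ("ACL_RULE", "DATAACL|RULE_2"),
    "configuration": {"PACKET_ACTION": "DROP"},
    "deactivation_configuration": REMOVE,
}
for entry in (replace_entry, remove_entry):
    activate(config_db, entry)
for entry in (replace_entry, remove_entry):
    deactivate(config_db, entry)
```
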

## Overview

The *Scheduled Configurations* feature will allow administrators to schedule specific network configurations to be applied and later removed or reconfigured at predefined times without manual intervention. This capability is particularly useful for network scenarios where policies need to change dynamically at certain times of the day or week, such as different ACL rules that should be active only during business hours. The scheduled configuration feature easily integrates within the existing infrastructure of SONiC by following established practices and utilizing widely adopted tools like `cron`. `cron` is a time-based job scheduler in Unix-like operating systems, used to automate the execution of scripts and commands at specified times. This approach not only minimizes the learning curve associated with the adoption of new features but also leverages the proven reliability and efficiency of an open-source utility.

Contributor

Just a non-fancy proposal: why not keep this controller outside the SONiC device, where it keeps a gNMI connection and automatically sends the gNMI SetRequest at the scheduled time? The benefits are:

  1. clear separation of network switch devices and the intelligent controller
  2. prompt SetResponse to the controller, which can follow up with mitigation, an auto fix, or an alert if any failure happens
  3. deterministic ConfigDB: most likely the running config will be compared with a golden config, and any diff will be monitored in a large deployment.

### Design Requirements

- **Configurability**: Administrators must be able to define time range intervals that will activate and deactivate configurations. These configurations can update the CONFIG DB during activation and deactivation.
- **Generic**: The design should be able to configure all existing configurations available in CONFIG DB, such as ACL, QoS, port shutdown, etc...

Contributor

If the changes are limited to these tables, we may consider extending existing table fields to accept a time-varying variable (a TIME_RANGE entry) as the value:

    "PORT": {
        "Ethernet1": {
            "lanes": "25,26,27,28",
            "alias": "etp7",
            "index": "1",
            "speed": "10000",
            // "admin_status": "up"/"down",
            "admin_status_variable": "MaintenanceWindow",
            "mtu": "9100"
        }
    },
    "TIME_RANGE": {
        "MaintenanceWindow": {
            "start": "30 6 25 DEC *",
            "end": "0 8 2 JAN *",
            "start_year": "2024",
            "end_year": "2025",
            "start_value": "up",
            "end_value": "down"
        }
    }
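
One way such a `*_variable` field could be resolved into a concrete value is sketched below. The semantics are an assumption, since the proposal does not spell them out: the field takes `start_value` between the start and end boundaries and `end_value` otherwise. The helper names `boundary` and `resolve` are hypothetical, and only fully specified cron-style expressions with `*` in the weekday position are handled.

```python
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


def boundary(cron_expr, year):
    """Turn 'minute hour day month weekday' plus a year into a datetime."""
    minute, hour, day, month, weekday = cron_expr.split()
    if weekday != "*":
        raise ValueError("weekday field is not supported in this sketch")
    month_number = MONTHS[month.upper()] if month.upper() in MONTHS else int(month)
    return datetime(int(year), month_number, int(day), int(hour), int(minute))


def resolve(port, time_ranges, now):
    """Return the port fields with *_variable entries replaced by values."""
    resolved = {}
    for name, value in port.items():
        if not name.endswith("_variable"):
            resolved[name] = value
            continue
        time_range = time_ranges[value]
        start = boundary(time_range["start"], time_range["start_year"])
        end = boundary(time_range["end"], time_range["end_year"])
        active = start <= now < end
        # Assumed semantics: start_value inside the window, end_value outside.
        resolved[name[:-len("_variable")]] = (
            time_range["start_value"] if active else time_range["end_value"])
    return resolved


time_ranges = {"MaintenanceWindow": {
    "start": "30 6 25 DEC *", "end": "0 8 2 JAN *",
    "start_year": "2024", "end_year": "2025",
    "start_value": "up", "end_value": "down"}}
port = {"speed": "10000", "admin_status_variable": "MaintenanceWindow", "mtu": "9100"}

inside = resolve(port, time_ranges, datetime(2024, 12, 26))
outside = resolve(port, time_ranges, datetime(2025, 1, 3))
```
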
