[config reload]: On dual ToR systems, cache ARP and FDB tables #1465

Merged


theasianpianist

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>

What I did

On dual ToR systems, cache the ARP and FDB tables before performing a config reload (similar to fast reboot) so that the standby ToR can resume normal operation afterwards.

Note: this requires a separate change in sonic-buildimage to the docker_image_ctl.j2 file for the SWSS container.

How I did it

Add an optional flag to disable the caching behavior. Caching is enabled by default on dual ToR systems and disabled on all others.
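The default described above boils down to a two-input decision. The sketch below is only an illustration: `should_cache_arp` is a hypothetical helper, and its `disable_arp_cache` parameter borrows its name from the `--disable-arp-cache` option mentioned in the later revert commits (that this is the long form of `-d` is an assumption).

```python
def should_cache_arp(is_dual_tor, disable_arp_cache=False):
    """Return True when the ARP/FDB tables should be cached before a reload.

    Caching only applies to dual ToR systems, and only when the user has
    not asked for it to be disabled.
    """
    return bool(is_dual_tor) and not disable_arp_cache
```

With these rules, `should_cache_arp(True)` is the only combination that triggers caching; passing the disable flag or running on any other system skips it.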

Use the fast-reboot-dump and filter_fdb_entries scripts to cache the current ARP and FDB tables in /host/config-reload. Also create a file in the same directory to indicate to SWSS that it should restore from the cache.
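The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `cache_tables` and its `dump_cmd` and `filter_cmd` parameters are hypothetical names, and the exact arguments taken by the fast-reboot-dump and filter_fdb_entries scripts are deliberately left to the caller. The marker file name `needs-restore` is taken from the code comment quoted in the review thread.

```python
import os
import subprocess


def cache_tables(cache_dir, dump_cmd, filter_cmd, marker_name="needs-restore"):
    """Run the dump and filter commands, then create a marker file on success.

    dump_cmd and filter_cmd are argument lists supplied by the caller
    (hypothetical parameters; the real script arguments are not shown here).
    Returns True only if both commands exit with status 0.
    """
    os.makedirs(cache_dir, exist_ok=True)

    for cmd in (dump_cmd, filter_cmd):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Do not signal a restore if any caching step failed.
            return False

    # An empty marker file tells the consumer that the cache is usable.
    with open(os.path.join(cache_dir, marker_name), "w"):
        pass
    return True
```

Creating the marker only after every step succeeds means a partially written cache is never advertised as restorable.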

How to verify it

  • Run config reload -d on a dual ToR system. Confirm that no cache is created in /host/config-reload.
  • Run config reload on a dual ToR system. Confirm that a cache is created in /host/config-reload. It should contain arp.json, fdb.json, and default_routes.json (the last one is unused).
  • Run config reload and config reload -d on a non-dual ToR system. Confirm that no cache is created in /host/config-reload.
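The file checks in these steps could be scripted along the following lines. This is a hedged sketch: `missing_cache_files` is a hypothetical helper, and the expected file names are the three listed in the steps above.

```python
import os

# File names taken from the verification steps above.
EXPECTED_CACHE_FILES = ("arp.json", "fdb.json", "default_routes.json")


def missing_cache_files(cache_dir, expected=EXPECTED_CACHE_FILES):
    """Return the expected cache files that are absent from cache_dir."""
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(cache_dir, name))
    ]
```

An empty list from `missing_cache_files("/host/config-reload")` would indicate that a full cache was written. When checking that no cache is created, start from an empty or absent directory so that files left over from an earlier run do not mask the result.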

Previous command output (if the output of a command-line utility has changed)

admin@sonic:~$ sudo config reload -y
Executing stop of service telemetry...
Warning: Stopping telemetry.service, but it can still be activated by:
  telemetry.timer
Executing stop of service swss...
Executing stop of service lldp...
Executing stop of service pmon...
Executing stop of service bgp...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Executing reset-failed of service bgp...
Executing reset-failed of service dhcp_relay...
Executing reset-failed of service hostname-config...
Executing reset-failed of service interfaces-config...
Executing reset-failed of service lldp...
Executing reset-failed of service ntp-config...
Executing reset-failed of service pmon...
Executing reset-failed of service radv...
Executing reset-failed of service rsyslog-config...
Executing reset-failed of service snmp...
Executing reset-failed of service swss...
Executing reset-failed of service syncd...
Executing reset-failed of service teamd...
Executing reset-failed of service telemetry...
Executing restart of service hostname-config...
Executing restart of service interfaces-config...
Executing restart of service ntp-config...
Executing restart of service rsyslog-config...
Executing restart of service swss...
Executing restart of service bgp...
Executing restart of service pmon...
Executing restart of service lldp...
Executing restart of service telemetry...
Reloading Monit configuration ...
Reinitializing monit daemon

New command output (if the output of a command-line utility has changed)

admin@sonic:~$ sudo config reload -y
Caching ARP table to /host/config-reload                  <-------- New message, only on dual ToR systems
Executing stop of service telemetry...
Warning: Stopping telemetry.service, but it can still be activated by:
  telemetry.timer
Executing stop of service swss...
Executing stop of service lldp...
Executing stop of service pmon...
Executing stop of service bgp...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Executing reset-failed of service bgp...
Executing reset-failed of service dhcp_relay...
Executing reset-failed of service hostname-config...
Executing reset-failed of service interfaces-config...
Executing reset-failed of service lldp...
Executing reset-failed of service ntp-config...
Executing reset-failed of service pmon...
Executing reset-failed of service radv...
Executing reset-failed of service rsyslog-config...
Executing reset-failed of service snmp...
Executing reset-failed of service swss...
Executing reset-failed of service syncd...
Executing reset-failed of service teamd...
Executing reset-failed of service telemetry...
Executing restart of service hostname-config...
Executing restart of service interfaces-config...
Executing restart of service ntp-config...
Executing restart of service rsyslog-config...
Executing restart of service swss...
Executing restart of service bgp...
Executing restart of service pmon...
Executing restart of service lldp...
Executing restart of service telemetry...
Reloading Monit configuration ...
Reinitializing monit daemon

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>

# If we are able to successfully cache ARP table info, signal SWSS to restore from our cache
# by creating /host/config-reload/needs-restore
if success:

@prsunny prsunny Feb 25, 2021

Is this required? Similarly, SWSS could check whether the arp.json and fdb.json files exist and restore from them, correct?

Contributor Author

SWSS checks for file existence, but also checks /proc/cmdline to make sure fast-reboot actually occurred. I wanted to have a similar check for config reload for redundancy.
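The redundancy argued for here amounts to requiring two independent signals before restoring: the cache files themselves, plus an explicit marker that a reload actually wrote them. A minimal sketch of that idea follows. It is written in Python purely for illustration; the real check lives on the SWSS side and is not part of this PR, and `should_restore_from_cache` is a hypothetical name.

```python
import os


def should_restore_from_cache(cache_dir,
                              cache_files=("arp.json", "fdb.json"),
                              marker_name="needs-restore"):
    """Return True only if the cache files AND the marker file both exist.

    Stale cache files left over from an earlier run are ignored unless the
    marker confirms that the most recent reload produced them.
    """
    files_present = all(
        os.path.isfile(os.path.join(cache_dir, name)) for name in cache_files
    )
    marker_present = os.path.isfile(os.path.join(cache_dir, marker_name))
    return files_present and marker_present
```

A consumer that follows this pattern would presumably also remove the marker once the restore completes, so that the next start does not restore again; that cleanup step is an assumption and is not described in this thread.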


@tahmed-dev tahmed-dev left a comment

LGTM! minor question.

config/main.py (review thread resolved)
@theasianpianist theasianpianist merged commit 10de91d into sonic-net:master Mar 4, 2021
@theasianpianist theasianpianist deleted the dual-tor-config-reload-arp branch March 4, 2021 01:09
yxieca pushed a commit that referenced this pull request Mar 4, 2021
Use the fast-reboot-dump and filter_fdb_entries scripts to cache the current ARP and FDB tables in /host/config-reload. Also create a file in the same directory to indicate to SWSS that it should restore from the cache.

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
vaibhavhd added a commit that referenced this pull request Apr 29, 2022
Fix the path to config_db.json in config-reload for dual-tor device.
Currently, the config_db.json filename passed as an argument to filter_fdb_entries contains a typo, which causes config reload to fail.
Related PR: #1465
theasianpianist added a commit to theasianpianist/sonic-utilities that referenced this pull request Oct 27, 2022
sonic-net#1465)"

This reverts commit 10de91d.

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
theasianpianist added a commit that referenced this pull request Oct 27, 2022
#2461)

…s (#1465)"

- This reverts commit 10de91d.
- Also removes '--disable-arp-cache' option from config reload tests that was added in #2325 

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
theasianpianist added a commit that referenced this pull request Oct 28, 2022
#2460)

…s (#1465)"

- This reverts commit 10de91d.
- Also removes '--disable-arp-cache' option from config reload tests that was added in #2325 

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
theasianpianist added a commit to theasianpianist/sonic-utilities that referenced this pull request Nov 10, 2022
sonic-net#2460)

…s (sonic-net#1465)"

- This reverts commit 10de91d.
- Also removes '--disable-arp-cache' option from config reload tests that was added in sonic-net#2325

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
preetham-singh pushed a commit to preetham-singh/sonic-utilities that referenced this pull request Nov 21, 2022
sonic-net#2460)

…s (sonic-net#1465)"

- This reverts commit 10de91d.
- Also removes '--disable-arp-cache' option from config reload tests that was added in sonic-net#2325 

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
theasianpianist added a commit that referenced this pull request Dec 21, 2022
…DB tables (#1465)" (#2490)

- This reverts commit 10de91d.
- Also removes '--disable-arp-cache' option from config reload tests that was added in #2325

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>