[bgpcfg]: Batch bgp updates #6006

pavel-shirshov · 2020-11-23T17:38:55Z

- Why I did it
The vtysh -f command is slow. It sometimes takes about 3 seconds.
When we need to run many vtysh -f commands, that slows down the system.
With this fix, bgpcfgd configuration time is reduced from 10.58 sec to 2.80 sec on my DUT.

- How I did it

1. Read as many messages as possible from the database.
2. Run all the messages through bgpcfgd, collecting the FRR updates.
3. Commit all updates as a batch.
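
The steps above can be sketched roughly as follows. This is a minimal illustration, not the bgpcfgd code: `drain`, `handle_message`, and `process_batch` are hypothetical names, a `queue.Queue` stands in for the database subscription, and `commit` is any callable that pushes text to FRR.

```python
import queue


def drain(q):
    """Read as many pending messages as possible without blocking."""
    messages = []
    while True:
        try:
            messages.append(q.get_nowait())
        except queue.Empty:
            return messages


def handle_message(msg):
    """Hypothetical handler: turn one database message into FRR config lines."""
    key, value = msg
    return ["! update for %s" % key, value]


def process_batch(q, commit):
    """Collect the FRR updates for every pending message, then commit once."""
    lines = []
    for msg in drain(q):
        lines.extend(handle_message(msg))
    if lines:
        commit("\n".join(lines))  # one slow push instead of one per message
    return len(lines)
```

With two queued messages, `process_batch` calls `commit` once rather than twice, which is where the saving comes from when each push costs seconds.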

- How to verify it
Build an image and run it on your DUT. All tests must pass.

- Which release branch to backport (provide reason below if selected)

201811
201911
202006

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

The vtysh -f command is slow. It sometimes takes about 3 seconds. When we need to run many vtysh -f commands, that slows down the system. Batch vtysh -f updates.

lguohan · 2020-11-23T18:59:35Z

@qiluo-msft , can you review this? @pavel-shirshov , do you have the performance number before and after?

pavel-shirshov · 2020-11-23T21:10:23Z

@lguohan I added performance change to the description

qiluo-msft · 2020-11-23T22:38:30Z

src/sonic-bgpcfgd/bgpcfgd/frr.py

+                return
+            else:
+                log_warn("Can't read daemon status from FRR: %s" % str(err))
+            time.sleep(0.1)  # sleep 100 ms


time.sleep(0.1)  # sleep 100 ms

What happens if no sleep?

Higher CPU usage. I think it is better to wait for 0.1 s than to burn CPU on repeated requests.
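
The trade-off in this thread can be shown with a small polling loop. This is a sketch under assumptions, not the code in frr.py: `wait_until_ready`, its `timeout` parameter, and the injectable `sleep` argument are all hypothetical and exist only to make the example self-contained.

```python
import time


def wait_until_ready(check, interval=0.1, timeout=5.0, sleep=time.sleep):
    """Poll check() until it returns True, sleeping between attempts.

    Returns the number of attempts made, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        if check():
            return attempts
        if time.monotonic() >= deadline:
            raise TimeoutError("not ready after %.1f s" % timeout)
        sleep(interval)  # without this the loop would spin at full CPU
```

Dropping the `sleep(interval)` call would make the loop issue status requests back to back until the daemon answers, which is the "burn CPU" case described above.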

qiluo-msft

runner.py LGTM. Not familiar with other parts.

lguohan · 2020-11-24T07:31:57Z

src/sonic-bgpcfgd/tests/test_ipv6_nexthop_global.py

@@ -90,14 +90,13 @@ def check_routemap_in_file(filename, route_map_name):
            found_entry = False
            found_seq_no = None
        if route_map_re.match(line):
-            found_seq_no = None


This should go to the other PR.

lguohan · 2020-11-24T07:46:08Z

src/sonic-bgpcfgd/bgpcfgd/config.py

+        if self.changes.strip() == "":
+            return True
+        rc_write = self.frr.write(self.changes)
+        rc_restart = self.frr.restart_peer_groups(self.peer_groups_to_restart)


I am not sure it is always correct to restart a peer group after we make any change. For example, if we shut down a BGP session, which peer group do we restart?

self.peer_groups_to_restart would be empty in the "shutdown a bgp session" situation, and restart_peer_groups will do nothing in that case. I update self.peer_groups_to_restart in two cases now: BBR and allow_prefix.
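
The behaviour described in this reply can be sketched as below. Everything except the four quoted lines of `commit()` is an assumption made for illustration: `FakeFrr`, `ConfigMgrSketch`, the `update` helper, the use of a set, and the reset after commit are hypothetical and are not taken from config.py.

```python
class FakeFrr:
    """Stand-in for the FRR wrapper; records calls instead of running vtysh."""

    def __init__(self):
        self.written = []
        self.restarted = []

    def write(self, text):
        self.written.append(text)
        return True

    def restart_peer_groups(self, peer_groups):
        # An empty collection means no manager asked for a restart.
        self.restarted.extend(sorted(peer_groups))
        return True


class ConfigMgrSketch:
    def __init__(self, frr):
        self.frr = frr
        self.changes = ""
        self.peer_groups_to_restart = set()

    def update(self, text, restart=()):
        """Hypothetical helper: queue config text and optional restarts."""
        self.changes += text + "\n"
        self.peer_groups_to_restart.update(restart)

    def commit(self):
        if self.changes.strip() == "":
            return True
        rc_write = self.frr.write(self.changes)
        rc_restart = self.frr.restart_peer_groups(self.peer_groups_to_restart)
        # Assumed: the batch is cleared once it has been pushed.
        self.changes = ""
        self.peer_groups_to_restart = set()
        return rc_write and rc_restart
```

In this sketch a plain shutdown change is written without any restart, while a change queued with `restart=["PEER_V4"]` (a made-up peer group name) restarts only that group.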

lguohan · 2020-11-24T07:49:03Z

src/sonic-bgpcfgd/bgpcfgd/frr.py

+        ret_code, out, err = run_command(command)
+        if ret_code != 0:
+            err_tuple = tmp_filename, ret_code, out, err
+            log_err("ConfigMgr::commit(): can't push configuration from file='%s', rc='%d', stdout='%s', stderr='%s'" % err_tuple)


Do we want to keep this tmp file, or just put the tmp file content into syslog?

At startup the tmp file holds the full config, which is quite a lot. Do we want to see all of it in syslog?
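
One possible middle ground between the two positions, which this PR does not implement, is to keep the temporary file and log only a bounded excerpt of it. The helper below is a hypothetical sketch; `summarize_config` and `max_lines` are names invented here.

```python
def summarize_config(text, max_lines=5):
    """Return a short excerpt of a config that is safe to put in a log line."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return "\n".join(lines)
    omitted = len(lines) - max_lines
    return "\n".join(lines[:max_lines] + ["... (%d more lines)" % omitted])
```

A small update would then appear in full, while a full startup config would be cut to its first few lines plus a count of what was left out.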

lguohan · 2020-11-24T08:00:46Z

If we run config bgp shutdown all, which managers are we triggering? I am not able to identify them.

pavel-shirshov · 2020-11-24T16:41:42Z

@lguohan
managers_bgp.py
It subscribes to BGP peer object changes:
https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-bgpcfgd/bgpcfgd/main.py#L46
https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-bgpcfgd/bgpcfgd/managers_bgp.py#L153

pavel-shirshov · 2020-11-24T22:22:34Z

retest vs please

* [bgpcfgd]: Batch bgp updates. The vtysh -f command is slow. It sometimes takes about 3 seconds. When we need to run many vtysh -f commands, that slows down the system. Batch vtysh -f updates.
* Use correct file to import run_command

The issue was a typo introduced in sonic-net#6006. In that change, the BGP allow list configuration manager was updated to use a method of the common ConfigMgr for restarting peer groups. However, the method name 'restart_peers' was used instead of the correct 'restart_peer_groups'. This change updates managers_allow_list.py to use the correct method, 'restart_peer_groups', for restarting peer groups. Signed-off-by: Xin Wang <xiwang5@microsoft.com>

Pavel Shirshov added 2 commits November 20, 2020 18:35

[bgpcfgd]: Batch bgp updates.

5530bab

The vtysh -f command is slow. It sometimes takes about 3 seconds. When we need to run many vtysh -f commands, that slows down the system. Batch vtysh -f updates.

Use correct file to import run_command

5b8a376

pavel-shirshov self-assigned this Nov 23, 2020

pavel-shirshov requested a review from lguohan November 23, 2020 17:39

pavel-shirshov added bgp Enhancement ➕ Request for 201911 Branch Request for 202006 Branch Scaling Unit Tests labels Nov 23, 2020

lguohan requested a review from qiluo-msft November 23, 2020 18:59

pavel-shirshov marked this pull request as ready for review November 23, 2020 21:10

qiluo-msft reviewed Nov 23, 2020

qiluo-msft previously approved these changes Nov 23, 2020

lguohan reviewed Nov 24, 2020

Update test_ipv6_nexthop_global.py

85e6ce2

pavel-shirshov dismissed qiluo-msft’s stale review via 85e6ce2 November 24, 2020 16:45

Pavel Shirshov added 2 commits November 24, 2020 08:52

Revert "Update test_ipv6_nexthop_global.py"

c9c243c

This reverts commit 85e6ce2.

Revert changes in the test

4aa96f1

lguohan approved these changes Nov 24, 2020

pavel-shirshov merged commit 148436d into sonic-net:master Nov 25, 2020

abdosi added the Included in 201911 Branch label Nov 26, 2020

wangxin mentioned this pull request Dec 2, 2020

Fix bgp crash after BGP allow list configuration is added #6088

Merged
