Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bgpcfg]: Batch bgp updates #6006

Merged
merged 5 commits into from
Nov 25, 2020

Conversation

pavel-shirshov
Copy link
Contributor

@pavel-shirshov pavel-shirshov commented Nov 23, 2020

- Why I did it
vtysh -f command is slow. It sometimes takes about 3 seconds.
When we need to run many vtysh -f commands that slows down the system.
With this fix bgcfgd configuration time reduced from 10.58 sec to 2.80 sec on my dut.

- How I did it

  1. Read as many messages as possible from the database
  2. Run through bgpcfgd all the messages, collecting frr updates;
  3. commit all updates as a batch

- How to verify it
Build an image and run on your dut. All tests must be working

- Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

Pavel Shirshov added 2 commits November 20, 2020 18:35
vtysh -f command is slow. It is sometimes takes about 3 seconds.
When we need to run many vtysh -f commands that slows down the system.
Batch vtysh -f updates.
@lguohan
Copy link
Collaborator

lguohan commented Nov 23, 2020

@qiluo-msft , can you review this? @pavel-shirshov , do you have the performance number before and after?

@pavel-shirshov
Copy link
Contributor Author

@lguohan I added performance change to the description

@pavel-shirshov pavel-shirshov marked this pull request as ready for review November 23, 2020 21:10
return
else:
log_warn("Can't read daemon status from FRR: %s" % str(err))
time.sleep(0.1) # sleep 100 ms
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time.sleep(0.1) # sleep 100 ms [](start = 12, length = 31)

What happens if no sleep?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Higher cpu usage. I think it is better to wait for 0.1 than to burn cpu by requests.

qiluo-msft
qiluo-msft previously approved these changes Nov 23, 2020
Copy link
Collaborator

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runner.py LGTM. Not familiar with other parts.

@@ -90,14 +90,13 @@ def check_routemap_in_file(filename, route_map_name):
found_entry = False
found_seq_no = None
if route_map_re.match(line):
found_seq_no = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should go to the other pr.

if self.changes.strip() == "":
return True
rc_write = self.frr.write(self.changes)
rc_restart = self.frr.restart_peer_groups(self.peer_groups_to_restart)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not sure if it is always correct to restart peer group after we make any change. for example, if we shutdown a bgp session, which peer group to restart?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.peer_groups_to_restart would be empty in 'shutdown a bgp session" situation and restart_peer_groups will do nothing in this case. I update self.peer_groups_to_restart in two cases now: BBR and allow_prefix.

ret_code, out, err = run_command(command)
if ret_code != 0:
err_tuple = tmp_filename, ret_code, out, err
log_err("ConfigMgr::commit(): can't push configuration from file='%s', rc='%d', stdout='%s', stderr='%s'" % err_tuple)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to keep this tmp file, or just put the tmpfile content into syslog?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tmp file content it is full config on the start. Kind of a lot. Do we want to see it in the syslog?

@lguohan
Copy link
Collaborator

lguohan commented Nov 24, 2020

if we are doing this config bgp shutdown all, which managers are we triggering? I am not able to identify.

@pavel-shirshov
Copy link
Contributor Author

retest vs please

@pavel-shirshov pavel-shirshov merged commit 148436d into sonic-net:master Nov 25, 2020
abdosi pushed a commit that referenced this pull request Nov 25, 2020
* [bgpcfgd]: Batch bgp updates.

vtysh -f command is slow. It is sometimes takes about 3 seconds.
When we need to run many vtysh -f commands that slows down the system.
Batch vtysh -f updates.

* Use correct file to import run_command
wangxin added a commit to wangxin/sonic-buildimage that referenced this pull request Dec 2, 2020
The issue was a typo introduced in sonic-net#6006. In that change, the BGP allow list
configuration manager was updated to use a method of common ConfigMgr
for restarting peer groups. However, the method name 'restart_peers' was
used instead of the correct 'restart_peer_groups'.

This change updated the managers_allow_list.py to use correct method
'restart_peer_groups' for restarting peer groups.

Signed-off-by: Xin Wang <xiwang5@microsoft.com>
lguohan pushed a commit that referenced this pull request Dec 2, 2020
The issue was a typo introduced in #6006. In that change, the BGP allow list
configuration manager was updated to use a method of common ConfigMgr
for restarting peer groups. However, the method name 'restart_peers' was
used instead of the correct 'restart_peer_groups'.

This change updated the managers_allow_list.py to use correct method
'restart_peer_groups' for restarting peer groups.

Signed-off-by: Xin Wang <xiwang5@microsoft.com>
abdosi pushed a commit that referenced this pull request Dec 3, 2020
The issue was a typo introduced in #6006. In that change, the BGP allow list
configuration manager was updated to use a method of common ConfigMgr
for restarting peer groups. However, the method name 'restart_peers' was
used instead of the correct 'restart_peer_groups'.

This change updated the managers_allow_list.py to use correct method
'restart_peer_groups' for restarting peer groups.

Signed-off-by: Xin Wang <xiwang5@microsoft.com>
santhosh-kt pushed a commit to santhosh-kt/sonic-buildimage that referenced this pull request Feb 25, 2021
* [bgpcfgd]: Batch bgp updates.

vtysh -f command is slow. It is sometimes takes about 3 seconds.
When we need to run many vtysh -f commands that slows down the system.
Batch vtysh -f updates.

* Use correct file to import run_command
santhosh-kt pushed a commit to santhosh-kt/sonic-buildimage that referenced this pull request Feb 25, 2021
…ic-net#6088)

The issue was a typo introduced in sonic-net#6006. In that change, the BGP allow list
configuration manager was updated to use a method of common ConfigMgr
for restarting peer groups. However, the method name 'restart_peers' was
used instead of the correct 'restart_peer_groups'.

This change updated the managers_allow_list.py to use correct method
'restart_peer_groups' for restarting peer groups.

Signed-off-by: Xin Wang <xiwang5@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants