
Display large number of FDB entries fast from state_db and appl_db #968

Closed

Conversation

@JiangboHe (Contributor)

- What I did
I changed fdbshow to read the MAC table from the STATE_DB FDB_TABLE, so that a large number of FDB entries can be displayed much faster.
Display time drops from 50 seconds to 2 seconds for 8000 FDB entries, and from 20 seconds to 1 second for 5000 entries.

- How I did it
The key search and matching of FDB entries in the ASIC table is too expensive: it takes about 20 seconds for 5000 FDB entries and 50 seconds for 8000. That is too slow both for users and for scripts.

- How to verify it
Learn thousands of FDB entries and display them with fdbshow.

- Previous command output (if the output of a command-line utility has changed)

- New command output (if the output of a command-line utility has changed)
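For reference, a minimal sketch of the STATE_DB-based read described above (illustrative, not the exact PR code; it assumes STATE_DB FDB_TABLE keys of the form `FDB_TABLE|Vlan<id>:<mac>` with `port` and `type` fields):

```python
# Illustrative sketch of a STATE_DB-based FDB fetch, not the PR's exact code.
# Assumed key layout: "FDB_TABLE|Vlan100:00:00:00:00:00:01" with port/type fields.
from swsssdk import SonicV2Connector

def fetch_state_fdb_data():
    db = SonicV2Connector(host="127.0.0.1")
    db.connect(db.STATE_DB)
    entries = []
    for key in db.keys(db.STATE_DB, 'FDB_TABLE|*') or []:
        # Split "FDB_TABLE|Vlan100:<mac>" into vlan and mac parts
        vlan, _, mac = key.split('|', 1)[1].partition(':')
        fields = db.get_all(db.STATE_DB, key)
        entries.append((vlan, mac, fields.get('port'), fields.get('type')))
    return entries
```

A single keyspace scan like this replaces the per-entry ASIC_DB key search, which is where the reported speedup comes from.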

@JiangboHe JiangboHe changed the base branch from master to 201911 June 28, 2020 10:00
@jleveque jleveque requested a review from prsunny June 29, 2020 22:09
@renukamanavalan (Contributor)

retest this please

@prsunny (Contributor) commented Jun 30, 2020

@anilkpandey, can you take a look? There was a performance optimization done as part of #529, and this takes a different approach.

@JiangboHe (Contributor, Author)

I see the improvement from 10 minutes to 17-20 seconds in Layer 2 Forwarding Enhancements. What I do here displays 8K MAC entries in 2 seconds.

@prsunny (Contributor) commented Jun 30, 2020

retest this please

@JiangboHe (Contributor, Author)

I have tested again.
It took 2s instead of 30s to display 8K FDB entries (or instead of 13s for 5K and 7s for 3K).

@prsunny (Contributor) commented Jul 2, 2020

> I have tested again.
> It took 2s instead of 30s to display 8K FDB entries (or instead of 13s for 5K and 7s for 3K).

OK, @qiluo-msft, can you take a look as well? The improvement looks good to me.

@qiluo-msft qiluo-msft requested a review from kcudnik July 3, 2020 01:36
scripts/fdbshow Outdated
@@ -93,7 +93,24 @@ class FdbShow(object):
        self.bridge_mac_list.sort(key = lambda x: x[0])
        return

    def fetch_state_fdb_data(self):

@qiluo-msft (Contributor) commented Jul 3, 2020, on `fetch_state_fdb_data`

@kcudnik Please help check.
The StateDB FDB is learned from the syncd notification. So is it the same as AsicDB? For example, what about static FDB entries that are programmed? #Closed

Reply:

Yes, ASIC DB learns those from the FDB notification. Static entries are created by the user explicitly, so those come from OA.

@JiangboHe (Contributor, Author) commented Jul 6, 2020

Yes, it does not handle static FDB entries. That is a problem. #Closed

Contributor reply:

If you cannot find static FDB entries in StateDB, they will be inside AppDB. FYI, in case you want to explore further.
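To illustrate that pointer, a hedged sketch of pulling static entries from APPL_DB (the `FDB_TABLE:Vlan<id>:<mac>` key layout and field names here are assumptions to verify against a running system):

```python
# Hedged sketch: collect static FDB entries from APPL_DB (illustrative only).
# The "FDB_TABLE:Vlan<id>:<mac>" key layout and field names are assumptions.
def fetch_static_fdb_from_appl_db(db):
    db.connect(db.APPL_DB)
    static_entries = []
    for key in db.keys(db.APPL_DB, 'FDB_TABLE:*') or []:
        # Strip the "FDB_TABLE:" prefix, then split vlan from mac
        vlan, _, mac = key.split(':', 1)[1].partition(':')
        fields = db.get_all(db.APPL_DB, key)
        if fields.get('type') == 'static':
            static_entries.append((vlan, mac, fields.get('port'), 'static'))
    return static_entries
```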

Contributor reply:

So that's a good reason to have an option to display from state_db based on need. By default, and for a small number of entries, it can use the existing path; for a large number of entries, the user can specify to display from state_db. A minimal sketch of such a dispatch follows.
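The sketch reuses the method names already present in this PR (`fetch_fdb_data` for the existing ASIC_DB path, `fetch_state_fdb_data` for the fast one); it is illustrative only, with the fetch methods stubbed out:

```python
class FdbShow(object):
    def __init__(self, fast=False):
        self.fast = fast
        # Default keeps the existing ASIC_DB walk; the opt-in fast mode
        # reads from STATE_DB (plus APPL_DB for static entries).
        if self.fast:
            self.fetch_state_fdb_data()
        else:
            self.fetch_fdb_data()

    def fetch_fdb_data(self):        # existing ASIC_DB-based path (stub)
        pass

    def fetch_state_fdb_data(self):  # new STATE_DB/APPL_DB path (stub)
        pass
```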

@JiangboHe (Contributor, Author)

That is a good idea. I will add the option to the command.

@JiangboHe (Contributor, Author)

I have added an "-f" option for displaying a large number of FDB entries fast from state_db and appl_db.
With the fast option, static FDB entries are also displayed from appl_db.

root@sonic:/home/admin# fdbshow -h
usage: fdbshow [-h] [-p PORT] [-v VLAN] [-f]

Display ASIC FDB entries

optional arguments:
  -h, --help            show this help message and exit
  -p PORT, --port PORT  FDB learned on specific port: Ethernet0
  -v VLAN, --vlan VLAN  FDB learned on specific Vlan: 1001
  -f, --fast            Display large number of FDB entries fastly
root@sonic:/home/admin#
root@sonic:/home/admin# fdbshow -f
No.   Vlan MacAddress         Port       Type
1     100  00:00:00:00:00:01  Ethernet1  dynamic
2     100  11:00:00:00:00:01  Ethernet1  static
3     100  11:00:00:00:00:02  Ethernet1  static
4     100  11:00:00:00:00:03  Ethernet1  static

Total number of entries 4

scripts/fdbshow Outdated

-    def __init__(self):
+    def __init__(self,fast):

@qiluo-msft (Contributor) commented Aug 24, 2020, on the comma in `def __init__(self,fast):`

Add one space after `,` #Closed

scripts/fdbshow Outdated

        super(FdbShow,self).__init__()
        self.db = SonicV2Connector(host="127.0.0.1")
        self.if_name_map, \
            self.if_oid_map = port_util.get_interface_oid_map(self.db)
        self.if_br_oid_map = port_util.get_bridge_port_map(self.db)
        self.fetch_fdb_data()
        self.fast=fast

@qiluo-msft (Contributor) commented Aug 24, 2020, on the `=` in `self.fast=fast`

Add one space before and after `=` #Closed

scripts/fdbshow Outdated

        super(FdbShow,self).__init__()
        self.db = SonicV2Connector(host="127.0.0.1")
        self.if_name_map, \
            self.if_oid_map = port_util.get_interface_oid_map(self.db)
        self.if_br_oid_map = port_util.get_bridge_port_map(self.db)
        self.fetch_fdb_data()
        self.fast=fast
        if (self.fast is True):

@qiluo-msft (Contributor) commented Aug 24, 2020

Suggested change:
-        if (self.fast is True):
+        if self.fast:
#Closed
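For context on the suggestion: argparse's `store_true` action always produces a plain bool, so comparing against `True` is redundant. A small self-contained example:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-f', '--fast', action='store_true')
args = parser.parse_args(['-f'])

# args.fast is already a bool, so the idiomatic test is the bare truth check:
if args.fast:
    print("fast path selected")
```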

scripts/fdbshow Outdated
@@ -139,10 +175,11 @@ def main():
                                     formatter_class=argparse.RawTextHelpFormatter)
    parser.add_argument('-p', '--port', type=str, help='FDB learned on specific port: Ethernet0', default=None)
    parser.add_argument('-v', '--vlan', type=str, help='FDB learned on specific Vlan: 1001', default=None)
    parser.add_argument('-f', '--fast', action='store_true', help='Display large number of FDB entries fastly')

@qiluo-msft (Contributor) commented Aug 24, 2020, on `fastly`

Change `fastly` to `fast`. #Closed

@JiangboHe (Contributor, Author)

Thank you. I have changed the code as reviewed.

@qiluo-msft (Contributor) left a review

See the inline comments.

@JiangboHe JiangboHe changed the title show mac table from state FDB_TABLE to display large number of fdb tables faster Display large number of FDB entries fast from state_db and appl_db Sep 8, 2020
@JiangboHe (Contributor, Author)

Hi, do you know what's wrong with the check result "default — Build finished. No test results found."?

@qiluo-msft (Contributor) left a review

LGTM. In the future, please keep old commits if you iterate within one PR. We have internal tools that compare between commits; if you force-push, the tool gets confused and loses the context of existing comments. Thanks!

@qiluo-msft (Contributor)

Retest this please

@prsunny (Contributor) commented Sep 8, 2020

retest this please

@JiangboHe (Contributor, Author)

Hi qiluo,
I have fixed the LGTM failure, but what should I do about the check result "default — Build finished. No test results found."? I have tried pushing an empty commit to trigger a rebuild, but got the same result.

@prsunny (Contributor) commented Sep 9, 2020

retest this please

@qiluo-msft (Contributor)

> 08:56:41  Copied 323 artifacts from "vs » buildimage-vs-all" build number 420

These artifacts were built from old code; we already fixed this in sonic-net/sonic-buildimage@d4fc8e5.

@liat-grozovik (Collaborator)

@JiangboHe, will this PR be further handled and taken to 201911?

@JiangboHe (Contributor, Author)

retest this please

@JiangboHe (Contributor, Author)

> @JiangboHe, will this PR be further handled and taken to 201911?

Yes, we found and fixed it in the 201911 branch. It can also be merged to master.

@JiangboHe (Contributor, Author)

I am confused by the default check failure after "retest this please"; I have tried several times. Do you know what caused it?
The build console output is below (note the "No space left on device" error):
15:59:29 Adding one-line test results to commit status...
15:59:29 Setting status of 1e93983 to FAILURE with url https://sonic-jenkins.westus2.cloudapp.azure.com/job/common/job/sonic-utilities-build-pr/3236/ and message: 'Build finished. No test results found.'
15:59:30 hudson.remoting.ProxyException: java.io.IOException: No space left on device
15:59:30 at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
15:59:30 at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
15:59:30 at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
15:59:30 at sun.nio.ch.IOUtil.write(IOUtil.java:65)
15:59:30 at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
15:59:30 at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
15:59:30 at java.nio.channels.Channels.writeFully(Channels.java:101)
15:59:30 at java.nio.channels.Channels.access$000(Channels.java:61)
15:59:30 at java.nio.channels.Channels$1.write(Channels.java:174)
15:59:30 at java.io.OutputStream.write(OutputStream.java:75)
15:59:30 at hudson.remoting.ProxyOutputStream$Chunk$1.run(ProxyOutputStream.java:255)
15:59:30 at hudson.remoting.PipeWriter$1.run(PipeWriter.java:158)
15:59:30 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
15:59:30 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
15:59:30 at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
15:59:30 at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
15:59:30 Caused: java.io.IOException: Pipe is already closed
15:59:30 at hudson.remoting.PipeWindow.checkDeath(PipeWindow.java:122)
15:59:30 at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:220)
15:59:30 at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:124)
15:59:30 at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:108)
15:59:30 at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2315)
15:59:30 at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270)
15:59:30 at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291)
15:59:30 at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246)
15:59:30 at hudson.FilePath.copyFrom(FilePath.java:987)
15:59:30 at hudson.plugins.copyartifact.CopyArtifact.copyOne(CopyArtifact.java:617)
15:59:30 Caused: java.io.IOException: Failed to copy file:/var/jenkins_home/jobs/vs/jobs/buildimage-vs-201911/builds/215/archive/target/docker-nat.gz to /data/johnar/workspace/common/sonic-utilities-build-pr/buildimage/target/docker-nat.gz
15:59:30 at hudson.plugins.copyartifact.CopyArtifact.copyOne(CopyArtifact.java:634)
15:59:30 at hudson.plugins.copyartifact.CopyArtifact.copy(CopyArtifact.java:577)
15:59:30 at hudson.plugins.copyartifact.CopyArtifact.perform(CopyArtifact.java:537)
15:59:30 at hudson.plugins.copyartifact.CopyArtifact.perform(CopyArtifact.java:475)
15:59:30 at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:80)
15:59:30 at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:67)
15:59:30 at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
15:59:30 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
15:59:30 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
15:59:30 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
15:59:30 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
15:59:30 at java.lang.Thread.run(Thread.java:748)
15:59:30 Finished: FAILURE

@prsunny (Contributor) commented Nov 30, 2020

@JiangboHe, is this fix already in master? Just wondering why this was raised against 201911.

@JiangboHe (Contributor, Author) commented Dec 1, 2020

> @JiangboHe, is this fix already in master? Just wondering why this was raised against 201911.

No, we found and fixed it in the 201911 branch. I have since opened pull request Azure/sonic-utilities#1273 against master.

@prsunny (Contributor) commented Dec 1, 2020

OK, in that case, let's get the master PR in first and cherry-pick it to 201911. We can use this PR if the cherry-pick has conflicts. For now, you can close this.

@qiluo-msft (Contributor)

Closed in favor of the PR to the master branch.

@qiluo-msft qiluo-msft closed this Dec 1, 2020