Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge last master #19

Merged
merged 51 commits into from
May 17, 2022
Merged

Merge last master #19

merged 51 commits into from
May 17, 2022

Conversation

VadymYashchenko
Copy link
Owner

Why I did it

How I did it

How to verify it

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

alexrallen and others added 30 commits April 29, 2022 20:50
…10675)

Update SDK/FW to 4.5.1500/2010.1500 and SAI version to 1.21.1.1

SDK/FW features:
1. Added support for Finisar DR4 (FTCD4523E2PCM) on Spectrum-2 and Spectrum-3 systems.

SAI Features:
1. ECMP overlay support for IPv6
2. BFD offloading / 4K scale
3. Host interface user traps + improved trap registration (table entry)
4. gcc11 compilation fixes
5. Read support for ACL redirect action
6. Optimize ECMP DB size
7. Buffer descriptors new defaults
8. Updated port mapping for SN2201

SAI Fixes:
1. Debug counter removal when configured with all drop reasons

- Why I did it
Upgrade Mellanox SDK and SAI versions to latest

- How I did it
Updated submodule pointers

- How to verify it
Regression tested
Add the following commits:

- [orchagent, crm]: Reset crm threshold exceed count when threshold type changed 5ba6a54
- [pbh] [aclorch] Fixed a bug causes by updating the flow-counter value for the PBH rule 841f003
- [ACL]Avoid incrementing crm count when ACL rule create fails 3d3364f
- set remote vtep the netdev down before delete 7f53db7
- Removing Vnet with scope default 2ea8581
- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, PMON is delayed in 90 seconds until the system finish the init flow after fastboot.

- How I did it
Add a timer for PMON service.
Exclude for MLNX platform the start trigger of PMON when SYNCD starts in case of fastboot.
Copy the timer file to the host bin image.

- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
Why I did it
The buffer pool & profile setting in buffer template was not correct and caused the errors like the following:

ERR swss#orchagent: :- parseReference: malformed reference:[BUFFER_PROFILE|ingress_lossless_profile]. Must not be surrounded by [ ]

How I did it
Fix the buffer pool & profile setting by removing "[]".

How to verify it
Loaded image with this fix in a switch and made sure the error was not seen anymore.
#### Why I did it
Need to pass LY_CTX_DISABLE_SEARCHDIR_CWD to Context in order to disable automatically searching for schemas in current working directory (which is by default searched automatically)

#### How I did it
add additional attribute into YANG context

#### How to verify it
Create some invalid link on switch :
1) **ln -s /usr/abc xxx**
2) run **spm list**
--> There should not be these messages:
```
libyang[1]: Unable to get information about "xxx" file in "/tmp" when searching for (sub)modules (No such file or directory)
libyang[1]: Unable to get information about "xxx" file in "/tmp" when searching for (sub)modules (No such file or directory)
libyang[1]: Unable to get information about "xxx" file in "/tmp" when searching for (sub)modules (No such file or directory)
libyang[1]: Unable to get information about "xxx" file in "/tmp" when searching for (sub)modules (No such file or directory)
```
Co-authored-by: ecsonic <ecsonic@edge-core.com>
Why I did it
The customer report of the PCIe Bus Errors upon the SDK initialization of as7816-64x.

How I did it
Based on the internal info and discussion, update "pcie_aspm=off" into ONIE_PLATFORM_EXTRA_CMDLINE_LINUX of installer.conf to resolve it.
…10687)

- Why I did it
To add support for 800G speed for port in the yang.

- How I did it
Change limitation from 400G to 800G.

- How to verify it
Set a port speed to 800G and run the yang DB validation. e.g. by using dynamic port breakout.
Why I did it
When lldpmgrd handled events of other tables besides PORT_TABLE, error message was printed to log.

How I did it
Handle event according to its file descriptor instead of looping all registered selectables for each coming event.

How to verify it
I verified same events are being handled by printing events key and operation, before and after the change.
Also, before the change, in init flow after config reload, when lldpmgrd handled events of other tables besides PORT_TABLE, error messages were printed to log, this issue is solved now.
closes #10157

Why I did it
Add yang model for the bgp_internal_neighbor table in config_db

How I did it
Add new yang model file and unit tests

How to verify it
UT and compile sonic_yang_models-1.0-py3-none-any.whl and sonic_yang_mgmt-1.0-py3-none-any.whl

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com

Why I did it
Fixes #10158

How I did it
Add yang model for config_db table BGP_VOQ_CHASSIS_NEIGHBOR and UT
Signed-off-by: Neetha John nejo@microsoft.com

Why I did it
Address build failures due to sonic config engine unit tests failing. Failures are due to referencing format used in Arista 7800 sample output for buffer template

How I did it
Remove referencing format

How to verify it
Sonic config engine wheel should be built successfully
#### Why I did it

Fix issue: Non compliant leaf list in config_db schema: #9801

#### How I did it

The basic flow of DPB is like:
1.	Transfer config db json value to YANG json value, name it “yangIn”
2.	Validate “yangIn” by libyang
3.	Generate a YANG json value to represent the target configuration, name it “yangTarget”
4.	Do diff between “yangIn” and “yangTarget”
5.	Apply the diff to CONFIG DB json and save it back to DB
 
The fix:
•	For step #1, If value of a leaf-list field string type, transfer it to a list by splitting it with “,” the purpose here is to make step#2 happy. We also need to save <table_name>.<key>.<field_name> to a set named “leaf_list_with_string_value_set”.
•	For step#5, loop “leaf_list_with_string_value_set” and change those fields back to a string.


#### How to verify it

1. Manual test
2. Changed sample config DB and unit test passed
Includes below commits
```
1477c36 2022-05-04 | Fix: if routestr does not exist, skip (#257) [Qi Luo]
5c08435 2022-05-04 | Add VoQ Recirc interface (i.e., Ethernet-Rec) to interface maps for S… (#244) [Song Yuan]
57f1af6 2022-05-02 | Fix: not to use blocking get_all() after keys() (#255) [Qi Luo]
33fdf9d 2022-04-06 | [RFC2737, RFC3433] Exclude RJ45 port from Entity MIB and Entity sensor MIB (#247) [Kebo Liu]
```
…rvice (#10573)

Why I did it
The PR is aimed to fix a bug that mgmt port eth0 may loss IP even if user configured static IP of eth0. This is not a always reproduceable issue, the reproducing flow is like:

Systemd starts networking service, which runs a dhcp based configuration and assigned an ip from dhcp.
Systemd starts interface-config service who depends on networking service
Interface-config service runs command “ifdown –force eth0”, check line. but networking service is still running so that this line failed with error: “error: Another instance of this program is already running.”. This error is printed by ifupdown2 lib who is the main process of networking service. So, ifdown actually does not work here, the ip of eth0 is not down.
Interface-config service updates /etc/networking/interface to static configuration.
Interface-config service runs command “systemctl restart networking”. This command kills the previous networking related processes (log: networking.service: Main process exited, code=killed, status=15/TERM), and try to reconfigure the ip address with static configuration. But it detects that the configured IP and the existing IP are the same, and it does not really configure the ip to kernel. Hence, the ip is still getting from dhcp. (this could be a bug of ifupdown2: previous ip is from dhcp, new ip is a static ip, it treats them as same instead of re-configuring the IP)
When the lease of the ip expires, the ip of eth0 is removed by kernel and the issue reproduces.
The issue is not always reproduceable because networking service usually runs fast so that it won't hit step#3.

How I did it
Check networking service state before running "ifdown –force eth0", wait for it done if it is activating.

How to verify it
Manual test.
Why I did it
Support OpenSSL FIPS 140-3, see design doc: https://github.com/Azure/SONiC/blob/master/doc/fips/SONiC-OpenSSL-FIPS-140-3.md.

How I did it
Install the fips packages.
To build the fips packages, see https://github.com/Azure/sonic-fips
Azure pipelines: https://dev.azure.com/mssonic/build/_build?definitionId=412

How to verify it
Validate the SymCrypt engine:

admin@sonic:~$ dpkg-query -W | grep openssl
openssl 1.1.1k-1+deb11u1+fips
symcrypt-openssl        0.1

admin@sonic:~$ openssl engine -v | grep -i symcrypt
(symcrypt) SCOSSL (SymCrypt engine for OpenSSL)
admin@sonic:~$
Some places were not correctly setting the HTTPS proxy, and were only
setting the HTTP proxy. This was fine until Docker 20.10.10, which then
started using `https_proxy` for HTTPS connections.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Python 2 support for sonic-pcied was removed, and the Python 2 version
of the variable no longer exists.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>

Why I did it
submodule update for the following commits

7a203b1  [chassis] Add new tables in counter db for Voq counter support. (#530)
5effea3 add new table schema for bgp profile (#608)
130dca5 [ci] Update azure pipeline branch variable reference.
708ed39 [ci] Parameterize pipeline and improve azure pipeline (#599)
9c08456 Added new P4RT tables. (#604)
[master][sonic-linkmgrd] submodule updates

df51322 Longxiang Lyu   Fri May 6 10:01:46 2022 +0800   Add `ActiveActiveStateMachine` implementation (#64)
e721ceb Jing Zhang      Wed May 4 10:07:14 2022 -0700   Add doc for default route related changes  (#63)
7bb06fb Jing Zhang      Tue May 3 09:48:28 2022 -0700   Add Cli support to enable or disable default route related feature (#68)
e4b02cb Jing Zhang      Mon May 2 13:27:54 2022 -0700   Reset WaitActiveUp count before switching to active (#70)
212d960 Jing Zhang      Wed Apr 27 10:35:05 2022 -0700  lower log level to warning (#69)
48abc9e Jing Zhang      Thu Apr 14 16:50:04 2022 -0700  Add support to enable switchover time measurement (with link prober interval decreased to 10ms) feature  (#61)
c4858a6 Jing Zhang      Thu Apr 14 11:27:55 2022 -0700  Avoid proactively switching to `active` if default route is missing  (#62)

sign-off: Jing Zhang zhangjing@microsoft.com
#### Why I did it
This function is critical for is_multi_asic() and SonicDBConfig initializing. No explicit reading ConfigDB. Otherwise it will implicitly trigger SonicDBConfig initializing.

#### How I did it
1. No explicit reading ConfigDB in get_asic_conf_file_path()
2. Collect asic_conf_path_candidates lazily to prevent any unnecessary side effect and improve the performance
#### Why I did it

To pick up new commits:
*  60d2467 Add depends to p4rt debian package

#### How I did it

update sonic-p4rt/sonic-pins submodule pointer

#### How to verify it

should be able to build with p4rt enabled.
Signed-off-by: Ze Gan <ganze718@gmail.com>

#### Why I did it
The SSCI is wrong in the output of MACsec so that the virtual SAI cannot parse the output corretly.
The wrong output:
```
142: macsec_eth1: protect on validate strict sc off sa off encrypt on send_sci on end_station off scb off replay off
    cipher suite: GCM-AES-XPN-256, using ICV length 16
    TXSC: 5254008f4f1c0001 on SA 0
        0: PN 103, state on, key 12cbc4b64e26c9a1ba14d810da20d16e
 SSCI 33554432,    RXSC: 525400edac5b0001, state on
        0: PN 107, state on, key 12cbc4b64e26c9a1ba14d810da20d16e
    offload: off
```
Expected
```
142: macsec_eth1: protect on validate strict sc off sa off encrypt on send_sci on end_station off scb off replay off
    cipher suite: GCM-AES-XPN-256, using ICV length 16
    TXSC: 5254008f4f1c0001 on SA 0
        0: PN 252, state on, SSCI 33554432, key 12cbc4b64e26c9a1ba14d810da20d16e
    RXSC: 525400edac5b0001, state on
        0: PN 264, state on, key 12cbc4b64e26c9a1ba14d810da20d16e
```
#### How I did it
Move SSCI before the key so that SSCI will not be the front of SC information.
Why I did it
The image size is too large, when there are multiple lazy packages and multiple platforms. It is not necessary to keep the lazy installation packages in multiple copies.
For cisco image, the image size will reduce from 3.5G to 1.7G.

How I did it
Use symbol links to only keep one package for each of the lazy package.
Make a new folder fsroot/platform/common
Copy the lazy packages into the folder.
When using a package in each of the platform, such as x86_64-grub, x86_64-8800_rp-r0, x86_64-8201_on-r0, etc, only make a symbol link to the package in the common folder.
* [PDDF] Rename temp for 7816/7326/7726

Signed-off-by: Jostar Yang <jostar_yang@accton.com.tw>

* Change naming to pddf device

Co-authored-by: Jostar Yang <jostar_yang@accton.com.tw>
FuzailBrcm and others added 21 commits May 9, 2022 10:49
* [caclmgrd]Added logic to allow BFD port numbers
… in chassis when system is reboot (#10106)

Supervisor card fails to load config_db#.json in chassis when system reboot. 
This is an intermittent issue, fixes #10105
Why I did it
Fixes some pmon errors/warnings by providing missing configuration files

How I did it
Add missing pcie.yaml and sensors.conf for supported linecards

How to verify it
pcie-check should pass
sensors should display proper sensor names
… folder (#10777)

… folder

#### Why I did it
Update template to point to configuration.md in yang-model folder.
In Makefile.cache, for $(1)_DEP_PKGS_SHA, the intention is to include
the DEP_MOD_SHA and MOD_HASH of each of the current package's
dependencies. However, there's a level of dereferencing missing; instead
of grabbing the value of $(dfile)_DEP_MOD_SHA, it is literally using the
variable name $(dfile)_DEP_MOD_SHA. This means that the value of this
variable will not change when some dependency changes.

The impact of this is in transitive dependencies. For a specific
example, if there is some change in sairedis, then sairedis will be
rebuilt (because there's a change within that component), and swss will
be rebuilt (because it's a direct dependency), but
docker-swss-layer-buster will not get rebuilt, because only the direct
dependencies are effectively being checked, and those aren't changing.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
…and pmon to bullseye (#10580)

Fixes #9279

- Why I did it
Part of larger effort to move all SONiC systems to bullseye

- How I did it
1. Update container makefiles with correct dependencies
2. Update container Dockerfile with correct base image
3. Update container Dockerfile with correct apt dependencies
4. Update any other makefiles with dependencies to remove python2 support
5. Minor changes to support bullseye / python3

- How to verify it
Run regression on the switch:
1. Verify PTF community tests work
2. Verify syncd runs and all ports come up / pass traffic
3. Verify all platform tests succeed
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
#### Why I did it
Created SONiC Yang model for Kdump
Tables: KDUMP

#### How I did it
Defined Yang models for NAT based on Guideline doc:
https://github.com/Azure/SONiC/blob/master/doc/mgmt/SONiC_YANG_Model_Guidelines.md
and
https://github.com/Azure/sonic-utilities/blob/master/doc/Command-Reference.md

#### How to verify it
Added test cases to verify it.
…moval issue (#10751)

Signed-off-by: mlok <marty.lok@nokia.com>
288c2d8 Revert "[scripts/fast-reboot] Shutdown remaining containers through systemd (#2133)" (#2161)
bce4694 [autoneg] add support for remote speed advertisement (#2124)
a73f156 [show][vrf]Fixing show vrf to include vlan subinterface (#2158)
7a06457 [auto_ts] Enable register/de-register auto_ts config for APP Extension (#2139)
083ebcc Add transceiver-info items advertised for cmis-supported moddules (#2135)
0811214 Validate destination port is not LAG (#2053)
6ab1c51 [minigraph]  Consume golden_config_db.json while loading minigraph (#2140)
c37a957 [Kdump] Remove the duplicate logic if Kdump was disabled (#2128)
1143869 Ordering fix for sfpshow eeprom (#2113)
fdb79b8 Allow fw update for other boot type against on the previous "none" boot fw update (#2040)
a54a091 [GCU] Supressing YANG errors from libyang while sorting (#1991)
fbfa8bc [GCU] Enabling AddRack and adding RemoveRack tests (#2143)
d012be9 [Command-Reference] Add CLI docs for route flow counter (#2069)
8c07d59 [Mellanox] [reboot] [asan] stop asan-enabled containers on reboot (#2107)
697aae3 Fix speed parsing when speed is NOT fetched from APPL_DB (#2138)
22a388b [show] fix get routing stack routine (#2137)
cb3a047 Support option --ports of config qos reload for reloading ports' QoS and buffer configuration to default (#2125)
154a801 Enhance "config interface type/advertised-type" to be blocked on RJ45 ports  (#2112)
3732ac5 Add CLI for route flow counter feature (#2031)
29771e7 [techsupport] improve robustness (#2117)
f9dc681 [intfutil] Display RJ45 port and portchannel speed in 'M' instead of 'G' when it's <= 1000M (#2110)
781ae9f [config] Do not enable pfcwd for BmcMgmtToRRouter (#2136)
23e9398 [scripts/fast-reboot] Shutdown remaining containers through systemd (#2133)
576c9ef [scripts/fast-reboot] stop timers in advance (#2131)
4dad79c bugfix: incorrect command for portchannel creation (#2134)
c17b1f4 [show][muxcable] Decrease the timeout for show mux status/hwmode (#2130)
49d61f8 [scripts/fast-reboot] cleanup (#2132)
52ca324 [config/config_mgmt.py]: Fix dpb issue with upper case mac in (#2066)
9e2fbf4 Update db_migrator to support `pfcwd_sw_enable` (#2087)
4010bd0 FGNHG CLI changes (#1588)
6bd54d0 Fix 'show mac' output when FDB entry for default vlan is None instead of 1 (#2126)
Signed-off-by: Ze Gan <ganze718@gmail.com>

Existing pools cannot test MACsec scenario, So I add sonictest-sonic-t0 pool that can run MACsec testcases.
Why I did it
To upgrade SSD firmware in initramfs while rebooting from SONiC to SONiC and during NOS to SONiC migration.

How I did it
New option 'ssd-upgrader-part’ is introduced in grub command line, to indicate the partition and its filesystem type in which the SSD firmware updater is present. ‘ssd-upgrader-part’ syntax is ssd-upgrader-part=<partition>,<filesystem type>. Example: ssd-upgrader-part=/dev/sda8,ext4

A new initramfs script ‘ssd-upgrade’ is included in init-premount and it invokes the SSD firmware updater (ssd-fw-upgrade) present in the partition indicated by the boot option 'ssd-upgrader-part'

How to verify it
In SONiC, the SSD firmware updater is copied to “/host/” directory.
Fast-reboot is to be initiated with the ‘-u’ option ([scripts/fast-reboot] Add option to include ssd-upgrader-part boot option with SONiC partition sonic-utilities#2150)
After reboot, while booting into SONiC the SSD firmware updater will be executed in initramfs.
…are status' (#10493)

Why I did it
To include ONIE version in show platform firmware status command output in DellEMC S6100 and Z9332f platforms.

How I did it
Include ‘ONIE’ in the list of components provided by platform APIs in DellEMC S6100 and Z9332f.
Unmount ONIE-BOOT if mounted using fast/soft/warm-reboot plugins in DellEMC S6100.
Why I did it
Fixes #10793

How I did it
Removed the switch_type validation from the Yang model.

How to verify it
compile sonic_yang_mgmt-1.0-py3-none-any.whl and sonic_yang_mgmt-1.0-py3-none-any.whl

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
…10806)

- Why I did it
Platform_reboot files for simx doesn't do aything different apart from calling /sbin/reboot. which is anyway done in the /usr/local/bin/reboot script i.e. the parent script which calls the platform specific reboot scripts if present.

Moreover, /sbin/reboot invoked in the platform specific reboot script is a non-blocking call and thus it returns back to the original script (although /sbin/reboot does it job in the background) and we see messages like this.

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Why I did it
Previous subport unit tests uses port channel names like PortChannel01, so for subport name generated PortChannel01.10, it exceeds Linux network interface name 15 char limit.

Signed-off-by: Longxiang Lyu lolv@microsoft.com

How I did it
Modify PortChannel01 to PortChannel1.
This is part of HLD  sonic-net/SONiC#925

#### Why I did it
Add link-training support

#### How I did it
Update SONiC YANG for port link-training support

#### Description for the changelog
Add "link_training" to sonic-port.yang

#### Link to config_db schema for YANG module changes

https://github.com/sonic-net/SONiC/wiki/Configuration#port
…to storage (#10820)

Why I did it
Support to trigger a pipeline to download and publish artifacts to storage and container registry.
Support to specify the patterns which docker images to upload.

How I did it
Pass the pipeline information and the artifact information by pipeline parameters to the pipeline which will be triggered a new build. It is to decouple the artifacts generation and the publish logic, how and where the artifacts/docker images will be published, depends on the triggered pipeline.

How to verify it
Why I did it
Config db schema generated by minigraph should run yang validation.

How I did it
Modify run_script to add yang validation.

How to verify it
Run sonic-config-engine unit test.

Signed-off-by: Gang Lv ganglv@microsoft.com
@VadymYashchenko VadymYashchenko merged commit 89b47d1 into VadymYashchenko:master May 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.