Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sfp-refactoring] xcvrd: add initial support for CMIS application initialization #217

Merged
merged 14 commits into from
Dec 9, 2021

Conversation

ds952811
Copy link
Contributor

@ds952811 ds952811 commented Oct 4, 2021

Signed-off-by: Dante Su dante.su@broadcom.com

Why I did it

Add initial support for CMIS application initialization

How I did it

Add CmisManagerTask to handle the per-port CMIS application initialization in parallel

How to verify it

  1. Bring up Quanta IX9 with the default configuration and CMIS 4.0/5.0 QSFPDD modules
  2. Issue 'show logging xcvrd' to check if the application initialization succeeded
  3. Issue 'show interfaces transceiver eeprom --dom' to verify both the static and DOM information parsers
  4. Issue 'sudo sfputil show eeprom --dom' to verify both the static and DOM information parsers

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106

Description for the changelog

Repo PR title State
SONiC [sfp-refactoring] Initial support for CMIS application initialization
sonic-platform-common [sfp-refactoring] Initial support for CMIS application initialization
sonic-platform-daemon [sfp-refactoring] Initial support for CMIS application initialization

A picture of a cute animal (not mandatory but encouraged)

@lgtm-com
Copy link

lgtm-com bot commented Oct 4, 2021

This pull request introduces 3 alerts when merging f2b0adc into 6e4cf6f - view on LGTM.com

new alerts:

  • 3 for Except block handles 'BaseException'

sonic-xcvrd/xcvrd/xcvrd.py Outdated Show resolved Hide resolved
@ds952811 ds952811 changed the title sfp: add initial support for CMIS [sfp-refactoring] xcvrd: add initial support for CMIS application initialization Nov 3, 2021
@lgtm-com
Copy link

lgtm-com bot commented Nov 5, 2021

This pull request introduces 1 alert when merging 999e156 into ddb22b9 - view on LGTM.com

new alerts:

  • 1 for Except block handles 'BaseException'

@lgtm-com
Copy link

lgtm-com bot commented Nov 9, 2021

This pull request introduces 1 alert when merging a61387c into 1fd3d16 - view on LGTM.com

new alerts:

  • 1 for Except block handles 'BaseException'

@lgtm-com
Copy link

lgtm-com bot commented Nov 12, 2021

This pull request introduces 1 alert when merging abe64de into 7b95211 - view on LGTM.com

new alerts:

  • 1 for Except block handles 'BaseException'

vendor_pn_str = transceiver_dict['model']
vendor_key = (vendor_name_str.strip() + '#' + vendor_pn_str.strip()).upper()

for k, v in g_dict[CMIS_MEDIA_SETTINGS_KEY].items():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see that this could is optic specific(vendor_name_str + part number)?.
Can it be made it as generic?.
This code skips application code for flat memory/media types other than QSFP_DD, so can we apply the settings for other media types directly?.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does all the CMIS vendors also need to specify "CMIS_MEDIA_SETTINGS" in media_settings.json in their platforms?.

Copy link
Contributor Author

@ds952811 ds952811 Nov 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. If the attached xcvr is not specified in CMIS_MEDIA_SETTINGS, or the CMIS_MEDIA_SETTINGS is missing, the CMIS application selection defaults to be enabled
  2. The CMIS registers of application selection are at page 0x10 and 0x11, hence it's not available on modules with Flat Memory.
  3. The idea for module-specific controls is for certain modules that have paged memory support while application selection is not necessary

Copy link
Contributor

@aravindmani-1 aravindmani-1 Nov 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may include a condition to return failure for flat memory media. you can make use of the function defined in SFP refactored code. https://github.com/Azure/sonic-platform-common/blob/c8eceec53625b6c021b7c6619b2555d0c7a1b38d/sonic_platform_base/sonic_xcvr/api/public/cmis.py#L291

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"> 3. The idea for module-specific controls is for certain modules that have paged memory support while application selection is not necessary
"
How do you determine whether the specific media needs application selection?.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Thanks for feedback, although we already have a checker for flat media in the xcvrd, I agree it's better to have this in the cmis.py as well.
  2. Actually, the CMIS_MEDIA_SETTINGS is a backup plan for the faulty transceiver if any. And as of now, we don't really need this, please feel free to let me know if we should have it removed, thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for a record, the CMIS_MEDIA_SETTINGS is already removed.


has_cmis = False
for sfp in platform_chassis.get_all_sfps():
if sfp.get_port_type() in [sfp.SFP_PORT_TYPE_QSFPDD]:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more than just QSFPDD supports CMIS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now, only QSFPDD (type=18h) is verified, we could circle back on OSFP(type=19h) once the sample is available

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for a record, the get_port_type() checker is already removed in the latest commit

has_cmis = True
break

if not has_cmis:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modules may be plugged in or removed dynamically

Copy link
Contributor Author

@ds952811 ds952811 Nov 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check here is to validate the cage type rather than the media type of the attached transceiver, hence it does not matter what's the module attached to right now.
The main purpose is to reduce the CPU/memory usage when the service is not appliable to the target unit.
e.g. For a box with only SFP and QSFP28 cages, the CMIS Manager will not be activated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for a record, the get_port_type() checker is already removed in the latest commit

if conf.get('application_selection', True):
self.log_notice("{}: application update for {}G, {}-lanes".format(
lport, int(speed/1000), len(lanes)))
ret = sfp.set_cmis_application(speed, len(lanes))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we ensure that this is done after the ASIC SerDes taps are configured?

Copy link
Contributor Author

@ds952811 ds952811 Nov 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're referring to the pre-emphasis configuration on the MAC, it's handled by the orchagent when the serdes tap parameters are uploaded to the APPL_DB, hence it's outside the scope of this PR, the serdes tap parameters are independent with this PR, and the framework of media_settings.json needs to be revised to support speed/mode switches between PAM4 and NRZ if necessary.
If you're referring to the pre-emphasis configuration on the PHY, it's not supported as of now, we could circle back on this when the module-specific parameters are available and needs to be applied to the CMIS PHY registers.

Copy link
Contributor Author

@ds952811 ds952811 Nov 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: In the case of Broadocm ASIC, we have different value range for PAM4 and NRZ, hence having NRZ/PAM4 pre-emphasise applied to SerDes in PAM4/NRZ mode may or may not lead to checker failure in SAI, and in turn, the orchagent may or may not crash.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that (per my understanding) the ASIC SerDes pre/main/post setitngs should be configured before the CMIS ApSel sequence is started. There doesn't seem to be anything that enforces this sequencing here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need a synchronization mechanism between syncd/SAI and the xcvrd to gracefully switch in between the applications. Prince @ MSFT and I had this discussion yesterday, and he will be working on this, and we could deal with this in another PR

@lgtm-com
Copy link

lgtm-com bot commented Nov 18, 2021

This pull request introduces 4 alerts when merging a1d8e10 into e53ff26 - view on LGTM.com

new alerts:

  • 4 for Unused local variable

@@ -83,21 +88,28 @@ def logical_port_name_to_physical_port_list(self, port_name):
else:
return None

def subscribe_port_config_change(db_list=['CONFIG_DB']):
port_tbl_map = {
'APPL_DB': swsscommon.APP_PORT_TABLE_NAME,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll listen to APPL_DB for the PORT CONFIG changes, and STATE_DB for XCVR INSERTION/REMOVAL

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should listen to PORT_TABLE in STATE_TB instead of PORT_TABLE in APP_DB because, when you get SET event for a port in APP_DB, the orchagent has still not called the sai api to create the port yet. When SET event is received for a port in STATE_DB then its confirmed the PORT create has completed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trick is, not all the fields in APPL_DB will be updated to STATE_DB by the orchagent as of now, it's part of synchronization issue between syncd and orchagent, it's outside the scope of this PR

sonic-xcvrd/xcvrd/xcvrd.py Outdated Show resolved Hide resolved
@lgtm-com
Copy link

lgtm-com bot commented Nov 19, 2021

This pull request introduces 4 alerts when merging 2bf4580 into e53ff26 - view on LGTM.com

new alerts:

  • 4 for Unused local variable

@@ -8,7 +8,10 @@
class PortChangeEvent:
PORT_ADD = 0
PORT_REMOVE = 1
def __init__(self, port_name, port_index, asic_id, event_type):
PORT_SET = 2
PORT_DEL = 3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it hard to differentiate the four actions here from the name, would you mind rephrasing?

Copy link
Contributor Author

@ds952811 ds952811 Dec 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I could update the namings, but do you have any suggestions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I could surely update these namings, but I don't know which one is better, could you please help me on this?


self.dbg_print("Stopped")

def task_run(self, y_cable_presence):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the purpose of having "y_cable_presence" here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you're right, I'll remove it

return sel, asic_context


def handle_port_config_change(sel, asic_context, stop_event, port_mapping, logger, port_change_event_handler):
def handle_port_config_change(sel, asic_context, stop_event, port_mapping, logger, port_change_event_handler, promiscuous=False):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a separate handler for handling PORT_TABLE update events in APP_DB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the latest commit, the requested changes are already done

@@ -83,21 +88,28 @@ def logical_port_name_to_physical_port_list(self, port_name):
else:
return None

def subscribe_port_config_change(db_list=['CONFIG_DB']):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add another function to subscribe

subscribe_port_table_update

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's subscribe_port_update_events() in the latest commit, do we need to have it rephrased to subscribe_port_table_update()?

port_mapping.get_logical_to_physical(key)[0],
asic_context[port_tbl],
PortChangeEvent.PORT_REMOVE)
if promiscuous:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the promiscuous check for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a dead code that's already removed, please check the latest commit

has_cmis = False
for sfp in platform_chassis.get_all_sfps():
try:
ptype = sfp.get_port_type()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove this check for checking the cage type. Don't want platforms to implement this new API for enabling CMIS datapath initialization. Its OK for one thread running and doing nothing (CMIS state will be in failed state for those platform having only SFP).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already removed in the earlier commit, please check it out

@@ -83,21 +88,28 @@ def logical_port_name_to_physical_port_list(self, port_name):
else:
return None

def subscribe_port_config_change(db_list=['CONFIG_DB']):
port_tbl_map = {
'APPL_DB': swsscommon.APP_PORT_TABLE_NAME,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should listen to PORT_TABLE in STATE_TB instead of PORT_TABLE in APP_DB because, when you get SET event for a port in APP_DB, the orchagent has still not called the sai api to create the port yet. When SET event is received for a port in STATE_DB then its confirmed the PORT create has completed.


self.port_dict[lport]['cmis_state'] = self.CMIS_STATE_DP_DEINIT
elif state == self.CMIS_STATE_DP_DEINIT:
if api.get_module_state() != 'ModuleReady':
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need timeout, if the module does NOT settle for 'ModuleReady' see page 1 byte 167

done = False
continue
if not done:
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please bail out if not ConfigSuccess, otherwise we will keep continuing even though the configuration has failed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

done = False
continue
if not done:
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please bail out if datapath init failed. There is a timeout in page 1, byte 144

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This information is not available in the mem map as of now, hence we'll use a default expiration time for now, let's do the fine-grained expiration in phase 2 (DPB support)


# D.2.2 Software Deinitialization
api.set_datapath_deinit(host_lanes)
api.set_lpmode(True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make sure module indeed transitioned to low power.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but it looks like one of my CMIS4 optics will never be transitioned to ModuleLowPwr, hence I add 'ModuleReady' in the following checker

Comment on lines 1193 to 1310
if not done:
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please bail out if the datapath has not transitioned to DataPathActivated . see page 1 , byte 168 for timeout for completing datapath tx turn ON.

commit a1d8e10
Author: Dante Su <dante.su@broadcom.com>
Date:   Thu Nov 18 12:45:01 2021 +0000

    okay

    Signed-off-by: Dante Su <dante.su@broadcom.com>

commit 11b6850
Author: Dante Su <dante.su@broadcom.com>
Date:   Fri Nov 12 07:39:34 2021 +0000

    Add custom media setting for CMIS

    Signed-off-by: Dante Su <dante.su@broadcom.com>

commit 6dd3608
Author: Dante Su <dante.su@broadcom.com>
Date:   Tue Nov 9 11:34:10 2021 +0000

    add missing unittest for CMIS

    Signed-off-by: Dante Su <dante.su@broadcom.com>

commit ceea223
Author: Dante Su <dante.su@broadcom.com>
Date:   Tue Nov 9 03:34:59 2021 +0000

    fix port remove event

    Signed-off-by: Dante Su <dante.su@broadcom.com>

commit 9be5135
Author: Dante Su <dante.su@broadcom.com>
Date:   Tue Nov 9 01:46:27 2021 +0000

    port_mapping: drop the hacks for CMIS

    Signed-off-by: Dante Su <dante.su@broadcom.com>

commit 501134b
Author: Dante Su <dante.su@broadcom.com>
Date:   Fri Nov 5 09:08:59 2021 +0000

    [sfp-refactoring] xcvrd: add initial support for CMIS application initialization

    Signed-off-by: Dante Su <dante.su@broadcom.com>
Signed-off-by: Dante Su <dante.su@broadcom.com>
Signed-off-by: Dante Su <dante.su@broadcom.com>
Signed-off-by: Dante Su <dante.su@broadcom.com>
Signed-off-by: Dante Su <dante.su@broadcom.com>
Signed-off-by: Dante Su <dante.su@broadcom.com>
Signed-off-by: Dante Su <dante.su@broadcom.com>
… type checker

Signed-off-by: Dante Su <dante.su@broadcom.com>
Signed-off-by: Dante Su <dante.su@broadcom.com>
Signed-off-by: Dante Su <dante.su@broadcom.com>
Signed-off-by: Dante Su <dante.su@broadcom.com>
Signed-off-by: Dante Su <dante.su@broadcom.com>
Signed-off-by: Dante Su <dante.su@broadcom.com>
Signed-off-by: Dante Su <dante.su@broadcom.com>
@prgeor prgeor merged commit 2b0acfb into sonic-net:master Dec 9, 2021
@judyjoseph
Copy link
Contributor

@prgeor are we planning to to take this SFP refractoring into 202111, Is this needed for 400G in Chassis ?

@prgeor
Copy link
Collaborator

prgeor commented Jan 17, 2022

This change is not needed if the default 400G application is needed.

vdahiya12 added a commit to vdahiya12/sonic-platform-daemons that referenced this pull request Apr 4, 2022
…e 1.2 (sonic-net#217)

This release goes in sync with the following firmware version of Broadcom Y cable, which is consistent with release 6
{
"version_nic_active": "D203.9.D103.1",
"version_nic_inactive": "D203.9.D103.1",
"version_nic_next": "D203.9.D103.1",
"version_peer_active": "D302.8",
"version_peer_inactive": "D302.8",
"version_peer_next": "D302.8",
"version_self_active": "D302.8",
"version_self_inactive": "D302.8",
"version_self_next": "D302.8"
}

Signed-off-by: vaibhav-dahiya vdahiya@microsoft.com

Description
Basically a vendor specific implementation of abstract YCableBase class .
detailed design discussion can be found https://github.com/Azure/SONiC/pull/757/files

Motivation and Context
to support the Y-Cable API required to support Broadcom's Y-Cable.
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants