
[ycabled] fix exception-handling logic for ycabled #312

Merged: 8 commits merged into sonic-net:202012 on Nov 11, 2022

vdahiya12 (Contributor) commented Nov 2, 2022

Cherry-pick conflict resolution for #306

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
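This PR changes how exceptions raised in ycabled's worker threads are handled.
In Python 3.7, an exception raised in a child thread is not propagated to the parent thread: the threading module prints the traceback to stderr, the thread ends, and a plain join() returns normally, so the parent has no way of knowing that the child failed. This PR resolves that by capturing the exception in the child and re-raising it when the parent calls join().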
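A minimal standard-library illustration of the behavior described above. It behaves the same on later Python versions unless `threading.excepthook` (added in 3.8, so unavailable on 3.7) is overridden: the traceback goes to stderr, `join()` returns normally, and the parent carries on.

```python
import threading


def worker():
    # Simulates a task worker failing inside a child thread
    raise ValueError("child thread failure")


t = threading.Thread(target=worker)
t.start()
t.join()  # returns normally; the traceback is only printed to stderr

# The parent keeps running and is never told that the child failed
print("parent still running, child alive:", t.is_alive())
```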
The task classes are structured as follows:

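import logging
import threading
import traceback

# Stand-in for ycabled's helper_logger (a SONiC logger with log_error());
# stdlib logging is used here only so the snippet runs on its own
helper_logger = logging.getLogger("ycabled")


class YcableInfoUpdateTask(threading.Thread):

    def __init__(self):
        threading.Thread.__init__(self)
        self.exc = None  # exception captured in run(), re-raised by join()
        self.task_stopping_event = threading.Event()
        self.y_cable_presence = None  # set by the daemon; details elided

    def task_worker(self, y_cable_presence):
        pass  # Y-cable info update loop; body elided

    def run(self):
        if self.task_stopping_event.is_set():
            return

        # Call the worker directly in this thread (no extra nested thread),
        # so any exception it raises is caught here
        try:
            self.task_worker(self.y_cable_presence)
        except Exception as e:
            helper_logger.error("Exception occurred at child thread YcableInfoUpdateTask due to {} {}".format(repr(e), traceback.format_exc()))
            self.exc = e

    def join(self, timeout=None):
        threading.Thread.join(self, timeout)

        # Re-raise the child's exception in the thread that called join()
        if self.exc is not None:
            raise self.exc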

This lets an exception thrown in a child thread be caught and logged along with its traceback, and then re-raised in the parent thread on join().
The main thread runs a while loop that monitors each task thread; if any thread is no longer running, it kills the whole process, and supervisord brings ycabled back up.
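The monitoring loop described above can be sketched as follows. This is a minimal standalone illustration, not ycabled's actual code: `ExcThread`, `monitor_threads` and the polling interval are hypothetical, and the final step (exiting the process so supervisord restarts ycabled) is only indicated in a comment because the exact exit mechanism is an assumption.

```python
import threading
import time


class ExcThread(threading.Thread):
    """Hypothetical thread that records its target's exception and re-raises it on join()."""

    def __init__(self, target):
        super().__init__(daemon=True)
        self._work = target
        self.exc = None

    def run(self):
        try:
            self._work()
        except Exception as e:
            self.exc = e

    def join(self, timeout=None):
        super().join(timeout)
        if self.exc is not None:
            raise self.exc


def monitor_threads(threads, poll_interval=0.1):
    """Poll until any thread stops; return (stopped thread, its exception or None)."""
    while True:
        for t in threads:
            if not t.is_alive():
                try:
                    t.join()  # already finished: returns at once or re-raises
                except Exception as e:
                    return t, e
                return t, None
        time.sleep(poll_interval)


# In the daemon, the caller would now log the failure and terminate the
# process (for example with os.kill(os.getpid(), signal.SIGKILL), an assumed
# mechanism) so that supervisord restarts ycabled.
```

Polling `is_alive()` keeps the loop simple; since a finished thread's `join()` returns immediately, calling it there only serves to surface the stored exception.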

Description

Motivation and Context

How Has This Been Tested?

Unit tests, and deploying the changes on a testbed

Additional Information (Optional)

sonic-mgmt tests that pass with this change:

/var/src/sonic-mgmt-int/tests/logs/dualtor/test_ipinip.py::test_decap_active_tor
/var/src/sonic-mgmt-int/tests/logs/dualtor/test_ipinip.py::test_decap_standby_tor
/var/src/sonic-mgmt-int/tests/logs/dualtor/test_toggle_mux
/var/src/sonic-mgmt-int/dualtor_io/test_link_failure.py::test_standby_tor_downlink_down_downstream_active
/var/src/sonic-mgmt-int/tests/logs/platform_tests/api/test_chassis.py
In addition, ran a switchover regression and the script below on some Gemini devices to test the change.
time cost distribution:

key: time cost (ms)
value: counts
e.g. 10: 272 means 272 toggles took 0 to 10 ms

{10: 272, 20: 217, 30: 3, 50: 3, 100: 3, 101: 3}

average (ms): 11.204035928143714
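The distribution above can be reproduced from raw per-toggle timings with a sketch like the following. The bucket bounds (including 40 ms, which has no entry in the results) and the reading of key 101 as an overflow bucket for anything above 100 ms are assumptions inferred from the keys shown, not taken from the actual regression script.

```python
def bucket_toggle_times(times_ms, bounds=(10, 20, 30, 40, 50, 100), overflow_key=101):
    """Count toggles per bucket; key b counts times above the previous bound, up to b ms."""
    dist = {}
    for t in times_ms:
        # First bound the time fits under; anything larger goes to the (assumed) overflow key
        key = next((b for b in bounds if t <= b), overflow_key)
        dist[key] = dist.get(key, 0) + 1
    return dict(sorted(dist.items()))


def average_ms(times_ms):
    """Mean toggle time in ms."""
    return sum(times_ms) / len(times_ms)


print(bucket_toggle_times([4.2, 9.8, 12.5, 250.0]))  # {10: 2, 20: 1, 101: 1}
```

With the counts reported above, the buckets sum to 501 toggles.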

sudo cp <fw> /usr/share/sonic/firmware/.

declare -a PORTS=(Ethernet0 Ethernet4 Ethernet8 Ethernet12 Ethernet16 Ethernet20 Ethernet40 Ethernet44 Ethernet48 Ethernet52 Ethernet56 Ethernet60 Ethernet64 Ethernet68 Ethernet72 Ethernet76 Ethernet80 Ethernet84 Ethernet104 Ethernet108 Ethernet112 Ethernet116 Ethernet120 Ethernet124)

declare -a PORTS1=(Ethernet4)

for port in "${PORTS[@]}"
do
  echo -e "\n\nname: $port"
  sudo show muxcable firmware version $port
  sleep 1
  sudo config muxcable firmware download <fw> $port
  sleep 10
  sudo show muxcable firmware version $port

  sudo config muxcable firmware activate $port
  sleep 10
  sudo show muxcable firmware version $port

  show mux status
  sleep 1
  sudo config mux mode auto $port
  sudo config mux mode active $port
  sleep 1
  show mux metrics $port
  sudo config mux mode auto $port
  sudo config mux mode standby $port
  sleep 1
  show mux metrics $port
  sleep 1
  show mux cableinfo $port
  sleep 1
  show mux alivecablestatus $port
  sleep 1
  show mux eyeinfo $port NIC
  show mux eyeinfo $port TORA
  show mux eyeinfo $port TORB
  sleep 1
  show mux pcsstatistics $port NIC
  sleep 1
  show mux pcsstatistics $port TORA
  sleep 1
  show mux pcsstatistics $port TORB
  sleep 1
  show mux fecstatistics $port NIC
  sleep 1
  show mux fecstatistics $port TORA
  sleep 1
  show mux fecstatistics $port TORB
  sleep 1
  show mux get-fec-anlt-speed $port
  sleep 1
  show mux hwmode mux $port
  sleep 1
  show mux hwmode switch $port
  sleep 1
  redis-cli -n 6 hgetall "HW_MUX_CABLE_TABLE|$port"
  sleep 1
  redis-cli -n 6 hgetall "MUX_CABLE_INFO|$port"
  sleep 1
  redis-cli -n 6 hgetall "MUX_CABLE_STATIC_INFO|$port"
  sleep 1
  redis-cli -n 6 hset "TRANSCEIVER_STATUS|$port" "status" "0"
  sleep 1
  sudo config mux mode auto $port
  sudo config mux mode active $port
  sleep 1
  show mux metrics $port
  sudo config mux mode auto $port
  sudo config mux mode standby $port
  sleep 1
  show mux metrics $port
  sleep 1
  show mux cableinfo $port
  sleep 1
  show mux alivecablestatus $port
  sleep 1
  show mux eyeinfo $port NIC
  show mux eyeinfo $port TORA
  show mux eyeinfo $port TORB
  sleep 1
  show mux pcsstatistics $port NIC
  sleep 1
  show mux pcsstatistics $port TORA
  sleep 1
  show mux fecstatistics $port NIC
  sleep 1
  show mux fecstatistics $port TORA
  sleep 1
  show mux fecstatistics $port TORB
  sleep 1
  show mux get-fec-anlt-speed $port
  sleep 1
  show mux hwmode mux $port
  sleep 1
  show mux hwmode switch $port
  sleep 1
  redis-cli -n 6 hgetall "HW_MUX_CABLE_TABLE|$port"
  sleep 1
  redis-cli -n 6 hgetall "MUX_CABLE_INFO|$port"
  sleep 1
  redis-cli -n 6 hgetall "MUX_CABLE_STATIC_INFO|$port"
  sleep 1
  redis-cli -n 6 hset "TRANSCEIVER_STATUS|$port" "status" "1"
  sleep 1
  show mux status
  sleep 1
  sudo config mux mode auto $port
  sudo config mux mode active $port
  sleep 1
  show mux metrics $port
  sudo config mux mode auto $port
  sudo config mux mode standby $port
  sleep 1
  show mux metrics $port
  sleep 1
  show mux cableinfo $port
  sleep 1
  show mux alivecablestatus $port
  sleep 1
  show mux eyeinfo $port NIC
  show mux eyeinfo $port TORA
  show mux eyeinfo $port TORB
  sleep 1
  show mux pcsstatistics $port NIC
  sleep 1
  show mux pcsstatistics $port TORA
  sleep 1
  show mux pcsstatistics $port TORB
  sleep 1
  show mux fecstatistics $port NIC
  sleep 1
  show mux fecstatistics $port TORA
  sleep 1
  show mux fecstatistics $port TORB

done
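Each pass of the loop body repeats the same sequence of checks. As a hypothetical refactoring aid (not part of the tested script), the checks from the first pass, from the `config mux mode` toggles through the `redis-cli hgetall` reads, can be generated per port in Python. `mux_check_commands` and `run_commands` are illustrative names, and the commands are only printed by default rather than executed.

```python
import shlex
import subprocess
import time

SIDES = ("NIC", "TORA", "TORB")  # link sides queried in the script
DB_TABLES = ("HW_MUX_CABLE_TABLE", "MUX_CABLE_INFO", "MUX_CABLE_STATIC_INFO")


def mux_check_commands(port):
    """Build the first pass of per-port checks from the script (hypothetical helper)."""
    cmds = []
    for mode in ("active", "standby"):
        cmds += [
            "sudo config mux mode auto {}".format(port),
            "sudo config mux mode {} {}".format(mode, port),
            "show mux metrics {}".format(port),
        ]
    cmds += ["show mux cableinfo {}".format(port),
             "show mux alivecablestatus {}".format(port)]
    for stat in ("eyeinfo", "pcsstatistics", "fecstatistics"):
        cmds += ["show mux {} {} {}".format(stat, port, side) for side in SIDES]
    cmds += ["show mux get-fec-anlt-speed {}".format(port),
             "show mux hwmode mux {}".format(port),
             "show mux hwmode switch {}".format(port)]
    cmds += ['redis-cli -n 6 hgetall "{}|{}"'.format(table, port) for table in DB_TABLES]
    return cmds


def run_commands(cmds, dry_run=True, delay_s=1.0):
    """Print the commands (default) or run them one by one with a pause in between."""
    for cmd in cmds:
        if dry_run:
            print(cmd)
            continue
        subprocess.run(shlex.split(cmd), check=False)
        time.sleep(delay_s)


run_commands(mux_check_commands("Ethernet4"))
```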

No exceptions were observed; ycabled appears healthy, with the expected thread count.

lgtm-com bot commented Nov 2, 2022

This pull request introduces 4 alerts when merging bf9881f into aacb772 - view on LGTM.com

new alerts:

  • 4 for Illegal raise

lgtm-com bot commented Nov 7, 2022

This pull request introduces 4 alerts when merging ebd7d4f into 3d5470d - view on LGTM.com

new alerts:

  • 4 for Illegal raise

lgtm-com bot commented Nov 7, 2022

This pull request introduces 4 alerts when merging b1b40eb into 3d5470d - view on LGTM.com

new alerts:

  • 4 for Illegal raise

lgtm-com bot commented Nov 9, 2022

This pull request introduces 4 alerts when merging 35a8345 into 3d5470d - view on LGTM.com

new alerts:

  • 4 for Illegal raise

lgtm-com bot commented Nov 9, 2022

This pull request introduces 4 alerts when merging 8480ff3 into 3d5470d - view on LGTM.com

new alerts:

  • 4 for Illegal raise

lgtm-com bot commented Nov 10, 2022

This pull request introduces 4 alerts when merging 57626c8 into 3d5470d - view on LGTM.com

new alerts:

  • 4 for Illegal raise
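The repeated "Illegal raise" alerts are plausibly triggered by the `raise self.exc` statements in the overridden `join()` methods: `self.exc` starts out as None, and raising None fails with a TypeError at runtime. The `if self.exc:` guard already prevents that path at runtime; the hypothetical helper below (an illustration, not code from this PR, and not verified to silence the analyzer) makes the check explicit by testing for an exception instance before raising.

```python
def reraise_if_set(exc):
    """Hypothetical helper: re-raise a captured exception, if one was recorded."""
    if exc is None:
        return  # the child thread finished without an exception
    if not isinstance(exc, BaseException):
        # Raising a non-exception object would itself fail with TypeError
        raise TypeError("expected an exception instance, got {!r}".format(exc))
    raise exc


# Raising None directly is the failure the alert warns about:
try:
    raise None
except TypeError as e:
    print(e)  # exceptions must derive from BaseException
```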

vdahiya12 merged commit 8ec96c0 into sonic-net:202012 on Nov 11, 2022