mvrf_ip_rule_priority_change_to_32765: made changes to interfaces.j2 … #4058

Open
wants to merge 2 commits into
base: master
Conversation

kannankvs
Collaborator

When an IP address is configured on eth0, the interfaces.j2 template adds an ip rule for that address. This code has existed from the beginning, and it is not clear why it is required.

This eth0 ip rule causes a problem once a VRF (data VRF or management VRF) has also been created on the system.
When any VRF (data VRF or management VRF) is created, the kernel automatically adds a new rule, "1000: from all lookup [l3mdev-table]".
This l3mdev ip rule is never deleted, even if the VRF is deleted.
Once the l3mdev ip rule exists, configuring an IP address on eth0 makes interfaces.j2 add the eth0 ip rule as "1000: from 100.104.47.74 lookup default". The kernel chooses priority 1000 automatically, so this rule takes precedence over the existing rule "1001: from all lookup local".
As a result, ping from the console to the eth0 IP stops working once a VRF has been created, as explained in Issue551.
More details and possible solutions are given in the comments on Issue551.
This PR resolves the issue by always assigning the fixed low priority 32765 to the ip rule that is created for the eth0 IP address.
Tested with various combinations of VRF creation, VRF deletion and IP address configuration, along with ping from the console to the eth0 IP address.
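
The ordering condition behind those tests can be checked mechanically. The following is only a sketch: `local_rule_first` is a hypothetical helper, not part of this PR, and the "fixed" listing is the expected result of pinning the rule to 32765 rather than captured output. It reads `ip rule show` style text on stdin and succeeds only if no address-specific rule is listed ahead of "from all lookup local".

```shell
# Hypothetical helper (not part of this PR): exit status 0 only if no
# address-specific rule appears before "from all lookup local".
local_rule_first() {
    awk '
        $2 == "from" && $3 == "all" && $5 == "local" { seen_local = 1 }
        $2 == "from" && $3 != "all" && !seen_local   { bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}

# Listing reported in this thread after a VRF was created.
broken='1000: from all lookup [l3mdev-table]
1000: from 100.104.47.74 lookup default
1001: from all lookup local
32766: from all lookup main
32767: from all lookup default'

# Expected listing with the eth0 rule pinned to 32765 (assumed, not captured).
fixed='1000: from all lookup [l3mdev-table]
1001: from all lookup local
32765: from 100.104.47.74 lookup default
32766: from all lookup main
32767: from all lookup default'

for name in broken fixed; do
    eval "listing=\$$name"
    if printf '%s\n' "$listing" | local_rule_first; then
        echo "$name: local rule is consulted first"
    else
        echo "$name: eth0 rule precedes the local rule"
    fi
done
```

On a live system the same check could be fed from `ip rule show | local_rule_first`.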

@kannankvs
Collaborator Author

retest please

@jleveque
Contributor

@kannankvs: Can you update the PR to make it more easily understandable?

@jleveque
Contributor

Retest this please

@kannankvs
Collaborator Author

@kannankvs: Can you update the PR to make it more easily understandable?

@jleveque: Sure, Joe. I think it will be easier to follow if the issue, analysis, root cause and possible solutions are listed here.

** Issue **
Configure an IP address for eth0, then run "config vrf add mgmt" and "config mvrf del mgmt". When we log in to the device through the console port and ping the eth0 IP directly, the ping to the device's own IP fails.

** Analysis **
The issue is not specific to the management VRF. It is also observed with a data VRF, as follows.
a. By default, after SONiC is rebooted, the following three ip rules are present.

root@sonic:~# ip rule show
1001:    from all lookup local 
32766:  from all lookup main 
32767:  from all lookup default 

b. When any VRF (data VRF or management VRF) is created, the kernel automatically adds a new rule, as follows.

root@sonic:~# ip link add vrf-blue type vrf table 10
root@sonic:~# ip rule show
1000:    from all lookup [l3mdev-table] 
1001:    from all lookup local 
32766:  from all lookup main 
32767:  from all lookup default 
root@sonic:~#

c. Now, if an IP address is configured on eth0 and we try to ping it, the ping does not work. Configuring the IP address adds an ip rule for eth0’s IP address, and that rule causes the issue. The rule should not have a higher priority than the rule “1001:from all lookup local”. Self-originated packets whose source IP is the eth0 IP end up looking up the “default” table (they should look up the “local” table first), which results in wrong routing, and hence the ping fails.

root@sonic:~# config interface ip add eth0 100.104.47.74/24 100.104.47.254
root@sonic:~# ping 100.104.47.74
PING 100.104.47.74 (100.104.47.74) 56(84) bytes of data.
 
root@sonic:~# ip rule show
1000:    from all lookup [l3mdev-table] 
1000:    from 100.104.47.74 lookup default 
1001:    from all lookup local 
32766:  from all lookup main 
32767:  from all lookup default 
root@sonic:~#
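
To make the ordering in step c concrete, here is a minimal sketch that lists the tables a packet with a given source address would consult, in order, from `ip rule show` style text. `lookup_order` is a hypothetical helper, and it makes two simplifying assumptions: the packet is not bound to a VRF device, so the [l3mdev-table] rule is skipped, and only the "from" selector is considered.

```shell
# Hypothetical helper: print the tables consulted, in rule order, for a
# packet whose source address is $1. Rule listing is read from stdin.
lookup_order() {
    awk -v src="$1" '
        $5 == "[l3mdev-table]" { next }   # assumed: packet is not in a VRF
        $2 == "from" && $4 == "lookup" && ($3 == "all" || $3 == src) {
            printf "%s%s", sep, $5
            sep = " "
        }
        END { printf "\n" }
    '
}

# Listing from step c above.
rules_after_vrf='1000: from all lookup [l3mdev-table]
1000: from 100.104.47.74 lookup default
1001: from all lookup local
32766: from all lookup main
32767: from all lookup default'

# Source is the eth0 address: "default" is consulted before "local".
printf '%s\n' "$rules_after_vrf" | lookup_order 100.104.47.74

# Any other source address still reaches "local" first.
printf '%s\n' "$rules_after_vrf" | lookup_order 10.0.0.1
```

The first call prints `default local main default`, which is the ordering that breaks the self-ping; the second prints `local main default`.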

** Root Cause **
The root cause is the “eth0 ip rule” that interfaces.j2 adds (this code has existed from the beginning) when an IP address is configured for eth0. The priority of this “eth0 rule” varies with the ip rules that already exist. If no VRF has ever been created, the “eth0 rule” is added with priority 32765. In that case, all self-originated packets look up the “local” routing table (rule priority is 1001) before looking up the “default” routing table (rule priority is 32767). When a VRF is created for the very first time, the kernel adds an “l3mdev” rule with priority 1000, and that rule is never deleted, even when the VRF is deleted. Once any VRF has been created, the “eth0 rule” is added with priority 1000, which takes precedence over “1001:from all lookup local” and results in the ping failure.
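
The way the automatic priority shifts can be sketched as follows. This assumes the kernel's fallback for a rule added without an explicit preference is "priority of the second rule in the list, minus one". That assumption is consistent with both listings in this thread but should be verified against the running kernel; `auto_pref` is a hypothetical name.

```shell
# Hypothetical model of the assumed kernel fallback: the preference given to
# a rule added without "pref" is (priority of the second listed rule) - 1.
auto_pref() {
    awk '
        NR == 2 { sub(/:/, "", $1); pref = $1 - 1; if (pref < 0) pref = 0 }
        END { print pref + 0 }
    '
}

# Rules after reboot, before any VRF exists.
rules_default='1001: from all lookup local
32766: from all lookup main
32767: from all lookup default'

# Rules once a VRF has been created (the l3mdev rule stays forever).
rules_with_l3mdev='1000: from all lookup [l3mdev-table]
1001: from all lookup local
32766: from all lookup main
32767: from all lookup default'

printf '%s\n' "$rules_default"     | auto_pref
printf '%s\n' "$rules_with_l3mdev" | auto_pref
```

Under that assumption the first call prints 32765 and the second prints 1000, matching the two priorities described above.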

** Possible Solutions: **
a. Option1: Remove the “eth0 ip rule” that is specific to the interface IP address.
b. Option2: Retain the “eth0 rule”, but set its priority to “32765” explicitly (instead of leaving the preference unspecified), so that the rule always has a lower priority than the rule “1001:from all lookup local”.
c. Option3: Lower the preference value of "1001: from all lookup local" to 0, so that it is evaluated before every other rule.

This PR is based on Option2: it retains the eth0 ip rule, but with priority 32765. If another solution should be chosen instead of Option2, let me know.
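
For reference, Option2 amounts to passing an explicit `pref` to `ip rule add`. The sketch below only builds and prints the command (a dry run, so it needs no root). `mgmt_rule_cmd` is a hypothetical helper, and the exact line used in interfaces.j2 may differ from this form.

```shell
# Hypothetical dry-run helper: print the ip rule command for the eth0
# address with a pinned preference (default 32765) instead of running it.
mgmt_rule_cmd() {
    addr="$1"
    pref="${2:-32765}"
    printf 'ip rule add pref %s from %s lookup default\n' "$pref" "$addr"
}

mgmt_rule_cmd 100.104.47.74
```

This prints `ip rule add pref 32765 from 100.104.47.74 lookup default`. With the preference pinned, the rule lands below "1001: from all lookup local" whether or not the l3mdev rule exists.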

Let me know if you have further comments.

@kannankvs
Collaborator Author

Retest this please

@tylerlinp
Contributor

** Analysis **

a. Actually, vrfmgrd changes the pref of the local rule from its original value 0 to 1001, so that it has lower priority than the l3mdev-table rule (1000).
b. Yes.
c. Yes. If no VRF has been created, the pref would be 32765, and ping would be OK.
1001: from all lookup local
32765: from 100.104.47.74 lookup default
32766: from all lookup main
32767: from all lookup default

** Root Cause **

So I agree with you.

There may also be another issue when mgmt-vrf is in use. In that case the ip rules will be:
1000: from all lookup [l3mdev-table]
1000: from 100.104.47.74 lookup 5000
1001: from all lookup local
32766: from all lookup main
32767: from all lookup default
Now eth0 is in VRF mgmt. If the same IP is given to another interface such as Ethernet1, it would not work: a neighbor attached to Ethernet1 would fail to ping it, because the ip rule only checks the IP address, regardless of VRF. So the rule "1000: from 100.104.47.74 lookup 5000" would affect global lookup.
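
A small sketch of this second problem: the rule carries no interface selector, so the first table chosen depends only on the source address. `first_table` is a hypothetical helper; its second argument (the interface) is deliberately unused to show that it plays no part in the match. The packet is assumed to belong to the global table, so the [l3mdev-table] rule is skipped.

```shell
# Hypothetical helper: print the first table consulted for source $1.
# $2 (interface name) is ignored on purpose: the rule has no iif selector.
first_table() {
    awk -v src="$1" '
        $5 == "[l3mdev-table]" { next }   # assumed: interface is not in a VRF
        $2 == "from" && ($3 == "all" || $3 == src) { print $5; exit }
    '
}

# Listing from the mgmt-vrf case above.
rules_mgmt_vrf='1000: from all lookup [l3mdev-table]
1000: from 100.104.47.74 lookup 5000
1001: from all lookup local
32766: from all lookup main
32767: from all lookup default'

# A reply sourced from the same address on Ethernet1 is still steered to 5000.
printf '%s\n' "$rules_mgmt_vrf" | first_table 100.104.47.74 Ethernet1
```

The call prints `5000`, the management VRF table, even though the traffic belongs to Ethernet1 in the global table.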

** Possible Solutions: **

I prefer option 1 to option 2. I think this rule is useless, even harmful. Even though option 2 lowers the priority, it would not deal with the issue above very well, so mgmt-vrf would still affect global lookup. And option 3 cannot be chosen because it would revert an earlier bug fix: the higher pref value is there to avoid global lookup happening earlier than VRF lookup.

@lguohan
Collaborator

lguohan commented Apr 30, 2020

is this still needed?

@prsunny
Contributor

prsunny commented Oct 30, 2020

@kannankvs , could you please resolve the conflict for the change?


5 participants