Building NAT Gateway in load balancing mode based on ECMP does not work fine #112

Open
zhanrox2 opened this issue Jan 4, 2022 · 4 comments

zhanrox2 commented Jan 4, 2022

We plan to run two NAT gateway instances in load-balancing mode based on ECMP. During testing we found what looks like a flaw in ECMP and would like it confirmed: packets from the same session (same five-tuple) are assigned to different next hops. This can be reproduced with the following steps.

               r2 (port: r2-edge, distributed gateway port, bound to ha-chassis-group grp001) --|-- br-ex (Chassis A)
              /                                                                                 |
vm -- sw1 -- r1                                                                                edge
              \                                                                                 |
               r3 (port: r3-edge, distributed gateway port, bound to ha-chassis-group grp002) --|-- br-ex (Chassis B)

1. Build network topology -- Routers And Switches

# ovn-nbctl lr-add r1
# ovn-nbctl lrp-add r1 r1-sw1 00:00:c0:a8:01:fe 192.168.1.254/24

# ovn-nbctl ls-add sw1
# ovn-nbctl lsp-add sw1 port1
# ovn-nbctl lsp-set-addresses port1 "00:00:c0:a8:01:01 192.168.1.1"
# ovn-nbctl lsp-add sw1 sw1-r1
# ovn-nbctl lsp-set-type sw1-r1 router
# ovn-nbctl lsp-set-addresses sw1-r1 00:00:c0:a8:01:fe
# ovn-nbctl lsp-set-options sw1-r1 router-port=r1-sw1

2. Build network topology -- Simulate VM C01

# ip netns add c01
# ip link add veth0 type veth peer name veth1
# ip link set veth1 netns c01
# ip netns exec c01 ip link set lo up
# ip netns exec c01 ip link set veth1 up
# ip netns exec c01 ip link set veth1 address 00:00:c0:a8:01:01
# ip netns exec c01 ip addr add 192.168.1.1/24 dev veth1
# ip netns exec c01 ip route add default via 192.168.1.254
# ip link set veth0 up
# ovs-vsctl add-port br-int veth0
# ovs-vsctl set Interface veth0 external_ids:iface-id=port1

3. Build network topology -- Build Two NAT Gateway Instances R2&R3

# ovn-nbctl lrp-add r1 r1-r2 00:00:a9:fe:01:01 169.254.1.1/30 peer=r2-r1
# ovn-nbctl lrp-add r1 r1-r3 00:00:a9:fe:01:05 169.254.1.5/30 peer=r3-r1

# ovn-nbctl lr-add r2
# ovn-nbctl lrp-add r2 r2-r1 00:00:a9:fe:01:02 169.254.1.2/30 peer=r1-r2

# ovn-nbctl lr-add r3
# ovn-nbctl lrp-add r3 r3-r1 00:00:a9:fe:01:06 169.254.1.6/30 peer=r1-r3

# ovn-nbctl ls-add edge
# ovn-nbctl lsp-add edge localnet-port
# ovn-nbctl lsp-set-addresses localnet-port unknown
# ovn-nbctl lsp-set-type localnet-port localnet
# ovn-nbctl lsp-set-options localnet-port network_name=underlay

# ovn-nbctl lsp-add edge edge-r2
# ovn-nbctl lsp-set-type edge-r2 router
# ovn-nbctl lsp-set-addresses edge-r2 00:00:10:01:01:01
# ovn-nbctl lsp-set-options edge-r2 router-port=r2-edge

# ovn-nbctl lsp-add edge edge-r3
# ovn-nbctl lsp-set-type edge-r3 router
# ovn-nbctl lsp-set-addresses edge-r3 00:00:10:01:01:05
# ovn-nbctl lsp-set-options edge-r3 router-port=r3-edge

// Create ha-chassis-group grp001 and add Chassis A to it
# ovn-nbctl ha-chassis-group-add grp001
# ovn-nbctl ha-chassis-group-add-chassis grp001 464ab7f4-06ff-4417-8424-b9e8bdbd922c 30

// Create ha-chassis-group grp002 and add Chassis B to it
# ovn-nbctl ha-chassis-group-add grp002
# ovn-nbctl ha-chassis-group-add-chassis grp002 f5e3a5f7-b600-4763-af99-eb4e54f8c1f6 30

// Bind distributed gateway router port r2-edge to ha-chassis-group grp001
# ovn-nbctl lrp-add r2 r2-edge 00:00:10:01:01:01 10.1.1.1/30
# ovn-nbctl set Logical_Router_Port  r2-edge ha_chassis_group=af7d9487-a1dd-4473-ae20-5384926ffbb7

// Bind distributed gateway router port r3-edge to ha-chassis-group grp002
# ovn-nbctl lrp-add r3 r3-edge 00:00:10:01:01:05 10.1.1.5/30
# ovn-nbctl set Logical_Router_Port  r3-edge ha_chassis_group=59399aa1-a316-4538-892b-087e17d0e4c3
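
The ha_chassis_group UUIDs used above are specific to this deployment. A minimal sketch for looking them up by group name instead, assuming the generic `find` and `set` database commands of ovn-nbctl. The helper names `ha_group_uuid` and `bind_lrp` and the `OVN_NBCTL` override variable are hypothetical and only serve the illustration:

```shell
# Command used to talk to the northbound DB; overridable for dry runs.
: "${OVN_NBCTL:=ovn-nbctl}"

# ha_group_uuid NAME
# Print the _uuid of the HA_Chassis_Group row whose name is NAME.
ha_group_uuid() {
    $OVN_NBCTL --bare --columns=_uuid find HA_Chassis_Group name="$1"
}

# bind_lrp LRP GROUP
# Point logical router port LRP at the HA chassis group called GROUP.
bind_lrp() {
    uuid=$(ha_group_uuid "$2")
    [ -n "$uuid" ] || { echo "no HA chassis group named $2" >&2; return 1; }
    $OVN_NBCTL set Logical_Router_Port "$1" ha_chassis_group="$uuid"
}

# Usage (not executed here):
#   bind_lrp r2-edge grp001
#   bind_lrp r3-edge grp002
```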

// Enable SNAT on both gateway routers, r2 and r3
# ovn-nbctl lr-nat-add r2 snat 10.1.1.1 192.168.1.0/24
# ovn-nbctl lr-nat-add r3 snat 10.1.1.5 192.168.1.0/24

// Add outbound and return routes on r2 and r3
# ovn-nbctl lr-route-add r2 "0.0.0.0/0" 10.1.1.2
# ovn-nbctl lr-route-add r2 "192.168.1.0/24" 169.254.1.1
# ovn-nbctl lr-route-add r3 "0.0.0.0/0" 10.1.1.6
# ovn-nbctl lr-route-add r3 "192.168.1.0/24" 169.254.1.5

4. Build network topology -- Connected To Underlay Network

Chassis A

# ovs-vsctl add-br br-ex
# ovs-vsctl add-port br-ex ens38
# ip link set ens38 up
# ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=underlay:br-ex

Chassis B

# ovs-vsctl add-br br-ex
# ovs-vsctl add-port br-ex ens38
# ip link set ens38 up
# ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=underlay:br-ex

5. Build network topology -- Configure ECMP Routes On R1 Pointing To Both R2 And R3

# ovn-nbctl --policy=dst-ip --ecmp lr-route-add r1 172.16.138.0/24 169.254.1.2
# ovn-nbctl --policy=dst-ip --ecmp lr-route-add r1 172.16.138.0/24 169.254.1.6

6. Test

From the virtual machine c01 on Chassis A we tested the connection to the remote server 172.16.138.110 while capturing packets on Chassis A and Chassis B. Packets of the same session (same source port 55346) sent from c01 leave through both R2 (external IP 10.1.1.1) and R3 (external IP 10.1.1.5), so the telnet connection is closed abnormally, as shown below:

VM C01

#  ip netns exec c01 bash
#  telnet 172.16.138.110 22
Trying 172.16.138.110...
Connected to 172.16.138.110.
Escape character is '^]'.
Connection closed by foreign host.

Chassis A

# tcpdump -eni ens38 -vvv
16:12:21.873921 00:00:10:01:01:01 > 00:0c:29:dd:94:49, ethertype IPv4 (0x0800), length 74: (tos 0x10, ttl 62, id 1594, offset 0, flags [DF], proto TCP (6), length 60)
	10.1.1.1.55346 > 172.16.138.110.ssh: Flags [S], cksum 0x26eb (correct), seq 3898881646, win 29200, options [mss 1460,sackOK,TS val 3233592747 ecr 0,nop,wscale 7], length 0
16:12:21.874196 00:0c:29:dd:94:49 > 00:00:10:01:01:01, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
	172.16.138.110.ssh > 10.1.1.1.55346: Flags [S.], cksum 0xec1d (correct), seq 3286967432, ack 3898881647, win 28960, options [mss 1460,sackOK,TS val 3293744869 ecr 3233592747,nop,wscale 7], length 0
16:12:22.883901 00:0c:29:dd:94:49 > 00:00:10:01:01:01, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
	172.16.138.110.ssh > 10.1.1.1.55346: Flags [S.], cksum 0xe82b (correct), seq 3286967432, ack 3898881647, win 28960, options [mss 1460,sackOK,TS val 3293745879 ecr 3233592747,nop,wscale 7], length 0
16:12:24.931402 00:0c:29:dd:94:49 > 00:00:10:01:01:01, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
	172.16.138.110.ssh > 10.1.1.1.55346: Flags [S.], cksum 0xe02c (correct), seq 3286967432, ack 3898881647, win 28960, options [mss 1460,sackOK,TS val 3293747926 ecr 3233592747,nop,wscale 7], length 0
16:12:24.931640 00:00:10:01:01:01 > 00:0c:29:dd:94:49, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 40)
	10.1.1.1.55346 > 172.16.138.110.ssh: Flags [R], cksum 0x7743 (correct), seq 3898881647, win 0, length 0
16:12:27.112644 00:0c:29:dd:94:49 > 00:00:10:01:01:01, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.1.1.1 tell 10.1.1.2, length 46
16:12:27.115094 00:00:10:01:01:01 > 00:0c:29:dd:94:49, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 10.1.1.1 is-at 00:00:10:01:01:01, length 46

Chassis B

# tcpdump -eni ens38 -vvv
16:12:22.329406 00:00:10:01:01:05 > 00:0c:29:dd:94:53, ethertype IPv4 (0x0800), length 66: (tos 0x10, ttl 62, id 1595, offset 0, flags [DF], proto TCP (6), length 52)
	10.1.1.5.55346 > 172.16.138.110.ssh: Flags [.], cksum 0x8b20 (correct), seq 3898881647, ack 3286967433, win 229, options [nop,nop,TS val 3233592748 ecr 3293744869], length 0
16:12:22.329727 00:0c:29:dd:94:53 > 00:00:10:01:01:05, ethertype IPv4 (0x0800), length 60: (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
	172.16.138.110.ssh > 10.1.1.5.55346: Flags [R], cksum 0xad9e (correct), seq 3286967433, win 0, length 0
16:12:23.338420 00:00:10:01:01:05 > 00:0c:29:dd:94:53, ethertype IPv4 (0x0800), length 66: (tos 0x10, ttl 62, id 1596, offset 0, flags [DF], proto TCP (6), length 52)
	10.1.1.5.55346 > 172.16.138.110.ssh: Flags [.], cksum 0x872f (correct), seq 0, ack 1, win 229, options [nop,nop,TS val 3233593757 ecr 3293744869], length 0
16:12:23.338678 00:0c:29:dd:94:53 > 00:00:10:01:01:05, ethertype IPv4 (0x0800), length 60: (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
	172.16.138.110.ssh > 10.1.1.5.55346: Flags [R], cksum 0xad9e (correct), seq 3286967433, win 0, length 0
16:12:27.566972 00:0c:29:dd:94:53 > 00:00:10:01:01:05, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.1.1.5 tell 10.1.1.6, length 46
16:12:27.567659 00:00:10:01:01:05 > 00:0c:29:dd:94:53, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 10.1.1.5 is-at 00:00:10:01:01:05, length 46
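
The split visible in the two captures can also be checked mechanically. A minimal sketch, assuming the `tcpdump -v` text from both chassis has been saved to files; the function name `split_sessions` and the file names in the usage comment are hypothetical:

```shell
# split_sessions DST_IP
# Read tcpdump -v text on stdin and print every source port that was seen
# leaving toward DST_IP with more than one source (SNAT) address.
split_sessions() {
    awk -v dst="$1" '
        # Only the indented detail lines look like "src.port > dst.port: ...".
        $2 == ">" && index($3, dst ".") == 1 {
            n = split($1, parts, ".")      # a.b.c.d.port gives 5 parts
            if (n != 5) next
            ip = parts[1] "." parts[2] "." parts[3] "." parts[4]
            port = parts[5]
            if (!((port, ip) in seen)) {   # count each (port, ip) pair once
                seen[port, ip] = 1
                if (count[port]++ == 0) ips[port] = ip
                else ips[port] = ips[port] "," ip
            }
        }
        END {
            for (port in count)
                if (count[port] > 1) print port, ips[port]
        }
    '
}

# Usage (file names are placeholders):
#   cat chassis-a.txt chassis-b.txt | split_sessions 172.16.138.110
```

Fed the two captures above, this reports source port 55346 together with both 10.1.1.1 and 10.1.1.5. An empty result would mean every session stayed on one gateway.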

Can anyone tell me whether ECMP can be used in this scenario, or whether there is a problem with the configuration? @hzhou8

oilbeater (Contributor) commented Jan 6, 2022

We ran into similar problems. Further investigation showed that packets in the same session might be hashed to a different value when the TCP session is closing.

I'm not sure whether dp_hash is based only on the five-tuple; I guess other metadata also feeds into the hash.
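
For reference, the behaviour ECMP is expected to have when the hash covers only the five-tuple can be sketched as follows. This is an illustration, not the OVS dp_hash implementation: `cksum` stands in for the real hash, and the function name `pick_nexthop` is hypothetical. The next hops are the two ECMP routes configured on r1 above.

```shell
# Next hops of the two ECMP routes on r1.
NEXTHOPS="169.254.1.2 169.254.1.6"

# pick_nexthop PROTO SRC_IP SRC_PORT DST_IP DST_PORT
# Hash the five-tuple (cksum is a stand-in, NOT the OVS dp_hash) and map
# the result onto one of the next hops.
pick_nexthop() {
    key="$1|$2|$3|$4|$5"
    hash=$(printf '%s' "$key" | cksum | cut -d ' ' -f 1)
    set -- $NEXTHOPS              # positional parameters = next hops
    shift $(( hash % $# ))        # drop as many entries as the hash selects
    printf '%s\n' "$1"
}

# Every packet of one session carries the same five-tuple, so it should
# always resolve to the same next hop:
pick_nexthop tcp 192.168.1.1 55346 172.16.138.110 22
pick_nexthop tcp 192.168.1.1 55346 172.16.138.110 22
```

If anything besides these five fields feeds the hash, two packets of one session can land on different next hops, which matches the captures in this report.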

zhanrox2 (Author) commented Jan 6, 2022

One more thing: I tested with both [OVN 20.12 + OVS 2.15.0] and [OVN 21.12 + OVS 2.16.9], and the problem described above appears on both. In addition, when ECMP is in use, the SNAT action occasionally does not translate the source address.

oilbeater (Contributor) commented

@zhanrox2 I suspect that dp_hash might be based on some flow metadata and might hash to a different value when the flows change. I changed the hash method from dp_hash to hash with src_ip in kubeovn@71f831b, and it works in my environment. You can give it a try.
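
A sketch of what hashing on the source IP alone means for next-hop selection. It illustrates the idea only and is not the code from the kube-ovn commit: `cksum` stands in for the real hash, and `pick_by_src_ip` is a hypothetical name.

```shell
# Next hops of the two ECMP routes on r1.
NEXTHOPS="169.254.1.2 169.254.1.6"

# pick_by_src_ip PROTO SRC_IP SRC_PORT DST_IP DST_PORT
# Only SRC_IP feeds the hash; ports, protocol and destination are ignored.
pick_by_src_ip() {
    hash=$(printf '%s' "$2" | cksum | cut -d ' ' -f 1)
    set -- $NEXTHOPS              # positional parameters = next hops
    shift $(( hash % $# ))
    printf '%s\n' "$1"
}

# Two different sessions from c01 resolve to the same gateway:
pick_by_src_ip tcp 192.168.1.1 55346 172.16.138.110 22
pick_by_src_ip tcp 192.168.1.1 40000 172.16.138.110 443
```

The trade-off is coarser balancing: all traffic from one source address uses a single NAT gateway, so load is spread per source rather than per connection.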

oilbeater added a commit to kubeovn/kube-ovn that referenced this issue Feb 10, 2022
oilbeater added a commit to kubeovn/kube-ovn that referenced this issue Feb 11, 2022
hongzhen-ma pushed a commit to kubeovn/kube-ovn that referenced this issue Feb 28, 2022
hongzhen-ma added a commit to kubeovn/kube-ovn that referenced this issue Feb 28, 2022
hongzhen-ma added a commit to kubeovn/kube-ovn that referenced this issue Feb 28, 2022

dceara (Collaborator) commented Jul 13, 2023

Sorry it took so long to get this fixed. It is indeed a problem with dp-hash in the kernel: it's not a "real" stable hash. I posted a fix that switches to the L4 symmetric hash algorithm (which is now supported in the kernel too):
https://patchwork.ozlabs.org/project/ovn/patch/20230713143810.347542-1-dceara@redhat.com/
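
The defining property of a symmetric L4 hash is that both directions of a connection produce the same value. A minimal sketch of that property, not the kernel algorithm: `cksum` stands in for the real hash, the key layout is an assumption, and `symmetric_l4_hash` is a hypothetical name.

```shell
# symmetric_l4_hash PROTO IP_A PORT_A IP_B PORT_B
# Sort the two endpoints into a canonical order before hashing, so that
# A -> B and B -> A build the same key and therefore the same hash.
symmetric_l4_hash() {
    proto=$1
    a="$2:$3"
    b="$4:$5"
    endpoints=$(printf '%s\n%s\n' "$a" "$b" | LC_ALL=C sort | tr '\n' '|')
    printf '%s|%s' "$proto" "$endpoints" | cksum | cut -d ' ' -f 1
}

# Forward and reverse direction of the session from this report:
symmetric_l4_hash tcp 10.1.1.1 55346 172.16.138.110 22
symmetric_l4_hash tcp 172.16.138.110 22 10.1.1.1 55346
```

Both calls print the same number. Because the value is derived from header fields alone, it presumably also stays the same for every packet of the session, which is the stability that the plain dp-hash lacked here.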

ovsrobot pushed a commit to ovsrobot/ovn that referenced this issue Jul 13, 2023
Regular dp-hash is not a canonical L4 hash (at least with the netlink
datapath).  If the datapath supports l4 symmetrical dp-hash use that one
instead.

Reported-at: ovn-org#112
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2188679
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: 0-day Robot <robot@bytheb.org>
dceara added a commit to dceara/ovn that referenced this issue Jul 17, 2023
Regular dp-hash is not a canonical L4 hash (at least with the netlink
datapath).  If the datapath supports l4 symmetrical dp-hash use that one
instead.

Reported-at: ovn-org#112
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2188679
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Ales Musil <amusil@redhat.com>
dceara added a commit to dceara/ovn that referenced this issue Aug 15, 2023
Regular dp-hash is not a canonical L4 hash (at least with the netlink
datapath).  If the datapath supports l4 symmetrical dp-hash use that one
instead.

Reported-at: ovn-org#112
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2188679
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Ales Musil <amusil@redhat.com>
(cherry picked from commit 596ea7a)
dceara added a commit to dceara/ovn that referenced this issue Sep 27, 2023
Regular dp-hash is not a canonical L4 hash (at least with the netlink
datapath).  If the datapath supports l4 symmetrical dp-hash use that one
instead.

Reported-at: ovn-org#112
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2188679
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Ales Musil <amusil@redhat.com>
(cherry picked from commit 596ea7a)