Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Multiple IPv6 MPREACH attributes in one UPDATE with group-updates true #493

Closed
subsecond opened this issue Sep 11, 2016 · 15 comments
Closed
Assignees
Labels

Comments

@subsecond
Copy link

Hi Thomas

I am using ExaBGP for a customer project and would like to express my gratitude at this point. ExaBGP does a great job when it comes to injecting routes into BGP. Thanks a lot for all your efforts!

Observed behavior
Recently, our customer started using IPv6 and injecting a first /128 route using ExaBGP. Everything worked well. Then the customer started to inject two more /128 routes with the same next-hop IPv6 address. Still, everything worked well.
Finally, the customer started setting different next-hops for each route and suddenly, the neighbor device kept learning just one route while simply ignoring the other two. I suspected this to be a bug in the firmware of the neighbor device and started digging.

Long story short
Setting "group-updates true" - which, according to the log will become the default in ExaBGP soon - leads to an incorrect update message sent by ExaBGP.

ExaBGP Version

root@exabgp-test:~# exabgp --version
ExaBGP : 3.4.16
Python : 2.7.12 (default, Jul  1 2016, 15:12:24)  [GCC 5.4.0 20160609]
Uname  : #55-Ubuntu SMP Thu Aug 11 18:01:55 UTC 2016

Example Config

group example {
 router-id <A.B.C.D>;
 local-as 65002;
 peer-as 65001;
 md5 ExamplePassword123;
 hold-time 180;
 group-updates true;

 neighbor <neighbor-ipv6-address> {
   description "IPv6 eBGP session with example";
   local-address <exabgp-ipv6-address>;

   family {
     ipv6 unicast;
   }

   static {
     route 2001:db8::1/128 next-hop <nexthop1-ipv6-address>;
     route 2001:db8::2/128 next-hop <nexthop2-ipv6-address>;
   }
 }
}

With group-updates enabled (group-updates true)

IP6 (flowlabel 0x93575, hlim 64, next-header TCP (6) payload length: 156) <exabgp-ipv6-address>.45712 > <neighbor-ipv6-address>.179: Flags [P.], cksum 0xbdb4 (correct), seq 65:181, ack 65, win 225, options [nop,nop,...], length 116: BGP, length: 116
    Update Message (2), length: 116
      Origin (1), length: 1, Flags [T]: IGP
      AS Path (2), length: 4, Flags [T]: 65002
      Multi-Protocol Reach NLRI (14), length: 38, Flags [O]: 
        AFI: IPv6 (2), SAFI: Unicast (1)
        nexthop: <nexthop1-ipv6-address>, nh-length: 16, no SNPA
          2001:db8::1/128
      Multi-Protocol Reach NLRI (14), length: 38, Flags [O]: 
        AFI: IPv6 (2), SAFI: Unicast (1)
        nexthop: <nexthop2-ipv6-address>, nh-length: 16, no SNPA
          2001:db8::2/128

With group-updates disabled (group-updates false)

IP6 (flowlabel 0xa3adf, hlim 64, next-header TCP (6) payload length: 115) <exabgp-ipv6-address>.44929 > <neighbor-ipv6-address>.179: Flags [P.], cksum 0x9436 (correct), seq 65:140, ack 65, win 225, options [nop,nop,...], length 75: BGP, length: 75
    Update Message (2), length: 75
      Origin (1), length: 1, Flags [T]: IGP
      AS Path (2), length: 4, Flags [T]: 65002
      Multi-Protocol Reach NLRI (14), length: 38, Flags [O]: 
        AFI: IPv6 (2), SAFI: Unicast (1)
        nexthop: <nexthop1-ipv6-address>, nh-length: 16, no SNPA
          2001:db8::1/128
[...]
IP6 (flowlabel 0xa3adf, hlim 64, next-header TCP (6) payload length: 115) <exabgp-ipv6-address>.44929 > <neighbor-ipv6-address>.179: Flags [P.], cksum 0x133e (correct), seq 140:215, ack 65, win 225, options [nop,nop,...], length 75: BGP, length: 75
    Update Message (2), length: 75
      Origin (1), length: 1, Flags [T]: IGP
      AS Path (2), length: 4, Flags [T]: 65002
      Multi-Protocol Reach NLRI (14), length: 38, Flags [O]: 
        AFI: IPv6 (2), SAFI: Unicast (1)
        nexthop: <nexthop2-ipv6-address>, nh-length: 16, no SNPA
          2001:db8::2/128

Important RFCs

RFC 4271 (BGP-4), section 6.3 states:

If any attribute appears more than once in the UPDATE message, then
the Error Subcode MUST be set to Malformed Attribute List.

RFC 7606 (Revised Error Handling for BGP), section 3.g. states:

If the MP_REACH_NLRI attribute or the MP_UNREACH_NLRI [RFC4760]
attribute appears more than once in the UPDATE message, then a
NOTIFICATION message MUST be sent with the Error Subcode
"Malformed Attribute List". If any other attribute (whether
recognized or unrecognized) appears more than once in an UPDATE
message, then all the occurrences of the attribute other than the
first one SHALL be discarded and the UPDATE message will continue
to be processed.

Conclusion

It appears that both the neighbor device as well as ExaBGP are not entirely following the RFCs:

  • ExaBGP should not have multiple MPREACH attributes in the same UPDATE message.
  • The neighbor device should return a "Malformed Attribute List" Error instead of processing the update message and discarding (some of) the routes.

May I kindly ask you to take a quick look at this and let me know if you agree? I have advised the customer to set "group-updates false" for the time being, so we are in no hurry at all.

Let me know if you need anything else from me.

Cheers,
Manuel

@subsecond
Copy link
Author

Referencing #298 as related.

@thomas-mangin
Copy link
Member

Thank you very much for this very clear bug report. I will fix the code soon.

@thomas-mangin thomas-mangin self-assigned this Sep 11, 2016
@thomas-mangin
Copy link
Member

The neighbor device should return a "Malformed Attribute List" Error instead of processing the update message and discarding (some of) the routes.
The peer is most likely implementing https://tools.ietf.org/html/rfc7606

@subsecond
Copy link
Author

The peer is most likely implementing https://tools.ietf.org/html/rfc7606

Well, I disagree. Here is why:

Section 3.g. in RFC 7606 starts with...

If the MP_REACH_NLRI attribute [...] appears more than once
in the UPDATE message

=> true for ExaBGP version 3.4.16 with "group-updates true".

then a NOTIFICATION message MUST be sent with the Error
Subcode "Malformed Attribute List".

=> false. I did neither see that NOTIFICATION message in the logs nor when I ran tcpdump on the neighbor / peer device. Therefore I expect the vendor to add that NOTIFICATION message to comply with RFC 7606.

Do you agree with me here?

@thomas-mangin
Copy link
Member

Yes, I had not read the RFC - just looked like a good hypothesis. I stand corrected.

@thomas-mangin
Copy link
Member

Could you please confirm if this patch fixes the issue (sorry I mistakenly wrote it for master, I will backport). I think the patch should apply again 3.4.

7ff86d0

@thomas-mangin
Copy link
Member

backported 69b2107

@subsecond
Copy link
Author

Thanks for the patch!

I have tested the backport to 3.4 and unfortunately, it still looks the same:

IP6 (flowlabel 0xa18ff, hlim 64, next-header TCP (6) payload length: 197) <exabgp-ipv6-address>.41485 > <neighbor-ipv6-address>.179: Flags [P.], cksum 0x2dc4 (correct), seq 65:222, ack 65, win 225, options [nop,nop,md5valid], length 157: BGP, length: 157
    Update Message (2), length: 157
      Origin (1), length: 1, Flags [T]: IGP
      AS Path (2), length: 4, Flags [T]: 65002 
      Multi-Protocol Reach NLRI (14), length: 38, Flags [O]: 
        AFI: IPv6 (2), SAFI: Unicast (1)
        nexthop: <nexthop1-ipv6-address>, nh-length: 16, no SNPA
          2001:db8::1/128
      Multi-Protocol Reach NLRI (14), length: 38, Flags [O]: 
        AFI: IPv6 (2), SAFI: Unicast (1)
        nexthop: <nexthop2-ipv6-address>, nh-length: 16, no SNPA
          2001:db8::2/128
      Multi-Protocol Reach NLRI (14), length: 38, Flags [O]: 
        AFI: IPv6 (2), SAFI: Unicast (1)
        nexthop: <nexthop3-ipv6-address>, nh-length: 16, no SNPA
          2001:db8::3/128

while I would expect the packet to look something like this:

IP6 (flowlabel 0xa18ff, hlim 64, next-header TCP (6) payload length: 197) <exabgp-ipv6-address>.41485 > <neighbor-ipv6-address>.179: Flags [P.], cksum 0x2dc4 (correct), seq 65:222, ack 65, win 225, options [nop,nop,md5valid], length 157: BGP, length: 157
    Update Message (2), length: 157
      Origin (1), length: 1, Flags [T]: IGP
      AS Path (2), length: 4, Flags [T]: 65002 
      Multi-Protocol Reach NLRI (14), length: 38, Flags [O]: 
        AFI: IPv6 (2), SAFI: Unicast (1)
        nexthop: <nexthop1-ipv6-address>, nh-length: 16, no SNPA
          2001:db8::1/128
        nexthop: <nexthop2-ipv6-address>, nh-length: 16, no SNPA
          2001:db8::2/128
        nexthop: <nexthop3-ipv6-address>, nh-length: 16, no SNPA
          2001:db8::3/128

I double-checked ./lib/exabgp/rib/store.py for "AFI" and "SAFI" to make sure that I have the latest code.

As stated before: We have a workaround in place so there is really no need for you to hurry fixing this.

Please let me know if you need anything from me.

@thomas-mangin
Copy link
Member

Thank you - I will need to look into it in more depth as I was pretty sure that this patch would have done the trick (it disabled update grouping when there is families other than IPv4 Unicast) but I must have missed something.

@thomas-mangin
Copy link
Member

The approach was right, the details were wrong. The bug is fixed on 3.4
daf7c34
The issue does not exist on master as we already checked the family type before performing the grouping.

@subsecond
Copy link
Author

Thanks, Thomas!

Yes, I can confirm that it is now working on branch 3.4.
I see one MPREACH per UPDATE message with group-updates true:

IP6 (flowlabel 0x542a1, hlim 64, next-header TCP (6) payload length: 115) <exabgp-ipv6-address>.33002 > <neighbor-ipv6-address>.179: Flags [P.], cksum 0x977e (correct), seq 65:140, ack 65, win 225, options [nop,nop,md5valid], length 75: BGP, length: 75
    Update Message (2), length: 75
      Origin (1), length: 1, Flags [T]: IGP
      AS Path (2), length: 4, Flags [T]: 65003 
      Multi-Protocol Reach NLRI (14), length: 38, Flags [O]: 
        AFI: IPv6 (2), SAFI: Unicast (1)
        nexthop: <nexthop3-ipv6-address>, nh-length: 16, no SNPA
          2001:db8::3/128

IP6 (class 0xc0, hlim 64, next-header TCP (6) payload length: 40) <neighbor-ipv6-address>.179 > <exabgp-ipv6-address>.33002: Flags [.], cksum 0x3ed1 (correct), seq 65, ack 140, win 45, options [nop,nop,md5valid], length 0

IP6 (flowlabel 0x542a1, hlim 64, next-header TCP (6) payload length: 190) <exabgp-ipv6-address>.33002 > <neighbor-ipv6-address>.179: Flags [P.], cksum 0x73ab (correct), seq 140:290, ack 65, win 225, options [nop,nop,md5valid], length 150: BGP, length: 150
    Update Message (2), length: 75
      Origin (1), length: 1, Flags [T]: IGP
      AS Path (2), length: 4, Flags [T]: 65003 
          Multi-Protocol Reach NLRI (14), length: 38, Flags [O]: 
        AFI: IPv6 (2), SAFI: Unicast (1)
        nexthop: <nexthop2-ipv6-address>, nh-length: 16, no SNPA
          2001:db8::2/128
    Update Message (2), length: 75
      Origin (1), length: 1, Flags [T]: IGP
      AS Path (2), length: 4, Flags [T]: 65003 
      Multi-Protocol Reach NLRI (14), length: 38, Flags [O]: 
        AFI: IPv6 (2), SAFI: Unicast (1)
        nexthop: <nexthop1-ipv6-address>, nh-length: 16, no SNPA
          2001:db8::1/128

IP6 (class 0xc0, hlim 64, next-header TCP (6) payload length: 40) <neighbor-ipv6-address>.179 > <exabgp-ipv6-address>.33002: Flags [.], cksum 0x5c33 (correct), seq 65, ack 290, win 54, options [nop,nop,md5valid], length 0

For me it's case closed. Many thanks for the quick fix; it was a pleasure to work with you!

One last question remains:
I am not really used to python-pip, but in this case it seems like the easiest solution to use. When can we expect the pip package to be updated to this commit?

Cheers,
Manuel

@thomas-mangin
Copy link
Member

https://github.com/Exa-Networks/exabgp/releases/tag/3.4.17
https://pypi.python.org/pypi/exabgp/3.4.17

@pwo
Copy link

pwo commented Jul 4, 2017

Did this fix get lost in 0be49f6 ?

@thomas-mangin
Copy link
Member

will check. Thank you

@thomas-mangin thomas-mangin reopened this Jul 4, 2017
thomas-mangin added a commit to thomas-mangin/exabgp that referenced this issue Jul 4, 2017
@thomas-mangin
Copy link
Member

Can you please re-check - and if required re-open but I think it should be ok now :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants