Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bgpd: Assertion failed during shutdown. #4816

Merged
merged 1 commit into from
Aug 11, 2019

Conversation

NaveenThanikachalam
Copy link
Contributor

A race condition causes the failure.
The function "make_info()" sets the path info's peer to
bgp instance's "peer_self" which is created when BGP is first
configured and deleted only when BGP is brought down completely.
A race condition causes the bgp instances's "peer_self" to be
removed before the routes are being pulled off from the aggregate
address.

If the bgp instance's "peer_self" is NULL or, if BGP is being deleted,
the aggregate route must not be reinstalled.

Signed-off-by: NaveenThanikachalam nthanikachal@vmware.com

A race condition causes the failure.
The function "make_info()" sets the path info's peer to
bgp instance's "peer_self" which is created when BGP is first
configured and deleted only when BGP is brought down completely.
A race condition causes the bgp instances's "peer_self" to be
removed before the routes are being pulled off from the aggregate
address.

If the bgp instance's "peer_self" is NULL or, if BGP is being deleted,
the aggregate route must not be reinstalled.

Signed-off-by: NaveenThanikachalam nthanikachal@vmware.com
@polychaeta polychaeta added the bgp label Aug 11, 2019
@NaveenThanikachalam
Copy link
Contributor Author

The BT is as below:

(gdb) bt #0 0x000003c8f9394428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54 #1 0x000003c8f939602a in __GI_abort () at abort.c:89 #2 0x000003c8fa09e1f4 in _zlog_assert_failed (assertion=assertion@entry=0x632accdf67 "peer && (peer->lock >= 0)", file=file@entry=0x632accdb9f "bgpd.c", line=line@entry=1092, function=function@entry=0x632accef40 <__func__.19125> "peer_lock_with_caller") at lib/log.c:710 #3 0x000000632ac05e33 in peer_lock_with_caller (peer=<optimized out>, name=0x632acda898 <__FUNCTION__.19213> "bgp_info_add") at bgpd.c:1092 #4 0x000000632ac0ff80 in peer_lock_with_caller (name=name@entry=0x632acda898 <__FUNCTION__.19213> "bgp_info_add", peer=<optimized out>) at bgpd.c:1101 #5 0x000000632ac2e5e0 in bgp_info_add (rn=rn@entry=0x63310d3240, ri=<optimized out>) at bgp_route.c:314 #6 0x000000632ac335d8 in bgp_aggregate_install (bgp=bgp@entry=0x632ea454a0, afi=afi@entry=AFI_IP6, safi=safi@entry=SAFI_UNICAST, p=p@entry=0x632de51890, origin=<optimized out>, aspath=0x0, community=0x0, lcommunity=0x0, aggregate=0x632de51940, atomic_aggregate=0 '\000') at bgp_route.c:5458 #7 0x000000632ac338fe in bgp_remove_route_from_aggregate (aggr_p=0x632de51890, aggregate=0x632de51940, pi=0x63310d30e0, safi=SAFI_UNICAST, afi=AFI_IP6, bgp=0x632ea454a0) at bgp_route.c:5652 #8 bgp_aggregate_decrement (bgp=0x632ea454a0, p=p@entry=0x63310d3030, del=del@entry=0x63310d30e0, afi=AFI_IP6, safi=SAFI_UNICAST) at bgp_route.c:5722 #9 0x000000632ac33b2e in bgp_aggregate_decrement (safi=SAFI_UNICAST, afi=AFI_IP6, del=0x63310d30e0, p=0x63310d3030, bgp=<optimized out>) at bgp_route.c:2703 #10 bgp_rib_remove (rn=0x63310d3030, ri=0x63310d30e0, peer=0x632dcef080, afi=AFI_IP6, safi=SAFI_UNICAST) at bgp_route.c:2681 #11 0x000000632ac33bf1 in bgp_clear_route_node (wq=<optimized out>, data=<optimized out>) at bgp_route.c:3801 #12 0x000003c8fa0c4bac in work_queue_run (thread=0x3ead211db50) at lib/workqueue.c:282 #13 0x000003c8fa0be530 in thread_call (thread=thread@entry=0x3ead211db50) at lib/thread.c:1498 #14 0x000003c8fa09b7f8 in frr_run (master=0x632d448530) at lib/libfrr.c:879 #15 0x000000632ac07552 in main (argc=6, argv=0x3ead211dda8) at bgp_main.c:424 (gdb)

@NaveenThanikachalam
Copy link
Contributor Author

Another PR addresses a similar issue under a different circumstance.
PR: #4499

@LabN-CI
Copy link
Collaborator

LabN-CI commented Aug 11, 2019

💚 Basic BGPD CI results: SUCCESS, 0 tests failed

Results table
_ _
Result SUCCESS git merge/4816 dfb6fd1
Date 08/11/2019
Start 07:15:32
Finish 07:37:17
Run-Time 21:45
Total 1815
Pass 1815
Fail 0
Valgrind-Errors 0
Valgrind-Loss 0
Details vncregress-2019-08-11-07:15:32.txt
Log autoscript-2019-08-11-07:16:26.log.bz2
Memory 425 408 360

For details, please contact louberger

@NetDEF-CI
Copy link
Collaborator

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-8583/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.


CLANG Static Analyzer Summary

  • Github Pull Request 4816, comparing to Git base SHA ab0ef7a

No Changes in Static Analysis warnings compared to base

1 Static Analyzer issues remaining.

See details at
https://ci1.netdef.org/browse/FRR-FRRPULLREQ-8583/artifact/shared/static_analysis/index.html

Copy link
Member

@ton31337 ton31337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine to me

@donaldsharp donaldsharp merged commit 269bca5 into FRRouting:master Aug 11, 2019
sudhanshukumar22 added a commit to sudhanshukumar22/frr that referenced this pull request Oct 31, 2020
Description:
aggregate member route was enqueued for recalculation
    while bgp instance was deleted.
    As part of aggregate member route deletion, the aggregate route is
    reinstalled with self-peer as source, but self-peer is already removed.
    Assert() for null peer pointer is path attribute aborts bgp.
Problem Description/Summary :
BGP crashed while cleaning up aggregate route as part of bgp instance deletion.
-----------------------
Leaf-4(config)#
Leaf-4(config)# no router bgp 65179 vrf Vrf-red
Leaf-4(config)# no router bgp 65179
Leaf-4(config)#
Leaf-4(config)#
Leaf-4(config)# root@Leaf-4:~#

Sep 26 15:38:21.257554 System is not ready - Core services are down
------------
router bgp 65179
bgp router-id 100.2.0.3
no bgp default ipv4-unicast
bgp network import-check
neighbor LeafToHostv4 peer-group
neighbor LeafToHostv4 remote-as 65003
neighbor LeafToHostv6 peer-group
neighbor LeafToHostv6 remote-as 65003
neighbor LeafToSpinev4 peer-group
neighbor LeafToSpinev4 remote-as 65134
neighbor LeafToSpinev4 bfd
neighbor LeafToSpinev6 peer-group
neighbor LeafToSpinev6 remote-as 65134
neighbor LeafToSpinev6 bfd
neighbor WindowsServer peer-group
neighbor WindowsServer remote-as 65201
neighbor 155.1.0.4 peer-group LeafToSpinev4
neighbor 155.2.0.4 peer-group LeafToSpinev4
neighbor 2000:155:1::4 peer-group LeafToSpinev6
neighbor 2000:155:2::4 peer-group LeafToSpinev6
neighbor 172.16.11.2 peer-group WindowsServer
neighbor 172.16.1.2 remote-as 65101
neighbor 2000:172:16:1::2 remote-as 65101
bgp listen limit 400
bgp listen range 133.3.0.0/16 peer-group LeafToHostv4
bgp listen range 2000:133:3::/48 peer-group LeafToHostv6
!
address-family ipv4 unicast
aggregate-address 133.1.0.0/16 as-set
aggregate-address 133.2.0.0/16 as-set
aggregate-address 133.3.0.0/16 as-set
aggregate-address 133.4.0.0/16 as-set
redistribute connected
neighbor LeafToHostv4 activate
neighbor LeafToSpinev4 activate
neighbor LeafToSpinev4 allowas-in 1
neighbor LeafToSpinev4 route-map spine_v4_export out
neighbor WindowsServer activate
neighbor 172.16.1.2 activate
exit-address-family
!
address-family ipv6 unicast
aggregate-address 2000:133:1::/48 as-set
aggregate-address 2000:133:2::/48 as-set
aggregate-address 2000:133:3::/48 as-set
aggregate-address 2000:133:4::/48 as-set
redistribute connected
..
------------
(gdb) bt
name=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add")
at bgpd/bgpd.c:1159
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
pi=<optimized out>) at bgpd/bgp_route.c:313
afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_UNICAST,
p=p@entry=0x55607f1c4e10, origin=<optimized out>, aspath=0x55607f4bc8a0,
community=<optimized out>, ecommunity=<optimized out>,
lcommunity=<optimized out>, atomic_aggregate=0 '\000',
aggregate=0x55607f1c4ee0) at bgpd/bgp_route.c:5926
aggr_p=<optimized out>, aggregate=<optimized out>, pi=0x55607f41f9f0,
safi=SAFI_UNICAST, afi=AFI_IP, bgp=0x55607eeba5d0) at bgpd/bgp_route.c:6385
del=del@entry=0x55607f41f9f0, afi=afi@entry=AFI_IP,
--Type <return> to continue, or q <return> to quit--
safi=safi@entry=SAFI_UNICAST) at bgpd/bgp_route.c:6446
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
data=<optimized out>) at bgpd/bgp_route.c:4125
at lib/workqueue.c:291
at lib/thread.c:1540
at bgpd/bgp_main.c:498
(gdb) fr 5
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
1158 bgpd/bgpd.c: No such file or directory.
(gdb) fr 10
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
2885 bgpd/bgp_route.c: No such file or directory.
(gdb) p peer->lock
$2 = 210
(gdb) p peer->status
$3 = 8
(gdb)
(gdb) p bgp
$11 = (struct bgp *) 0x56121ba315d0
(gdb) p bgp->peer_self
$12 = (struct peer *) 0x0
(gdb) p bgp->name
$13 = 0x0
(gdb) p bgp->name_pretty
$14 = 0x56121bb046a0 "VRF default"
(gdb) p bgp->inst_type
$15 = BGP_INSTANCE_TYPE_DEFAULT
(gdb)

bgp_aggregate_install():
5920
5921 new = info_make(ZEBRA_ROUTE_BGP, BGP_ROUTE_AGGREGATE, 0,
5922 bgp->peer_self, attr, rn);
5923
5924 SET_FLAG(new->flags, BGP_PATH_VALID);
5925
5926 bgp_path_info_add(rn, new);
5927 bgp_process(bgp, rn, afi, safi);

299 void bgp_path_info_add(struct bgp_node *rn, struct bgp_path_info *pi):
...
310
311 bgp_path_info_lock(pi);
312 bgp_lock_node(rn);
313 peer_lock(pi->peer); /* bgp_path_info peer reference */ <<< This points to bgp->peer_self = NULL
314 }

1573 #define peer_lock(B) peer_lock_with_caller(_FUNCTION_, (B))

1156 /* increase reference count on a struct peer */
1157 struct peer *peer_lock_with_caller(const char *name, struct peer *peer)
1158 {
1159 assert(peer && (peer->lock >= 0)); <<< asserted here
1160

Similar issue was fixed in community and we already have the fix:
FRRouting#4816
root@sr407497_lxc2:/home/ubuntu/frr_repo/frr/bgpd# git diff dfb6fd1~ dfb6fd1
diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
index abad1db..a372568 100644
— a/bgpd/bgp_route.c
+++ b/bgpd/bgp_route.c
@@ -5332,6 +5332,13 @@ static void bgp_purge_af_static_redist_routes(struct bgp *bgp, afi_t afi,
struct bgp_node *rn;
struct bgp_path_info *pi;

+ /* Do not install the aggregate route if BGP is in the
+ * process of termination.
+ */
+ if (bgp_flag_check(bgp, BGP_FLAG_DELETE_IN_PROGRESS) ||
+ (bgp->peer_self == NULL))
+ return;
+
table = bgp->rib[afi][safi];
for (rn = bgp_table_top(table); rn; rn = bgp_route_next(rn)) {
for (pi = bgp_node_get_bgp_path_info(rn); pi; pi = pi->next) {

But looks like similar handling is required at other places as well:

Expected Behavior :
BGP daemon should not crash
Signed-off-by: sudhanshu kumar sudhanshu.kumar@broadcom.com
sudhanshukumar22 added a commit to sudhanshukumar22/frr that referenced this pull request Nov 2, 2020
Description:
aggregate member route was enqueued for recalculation
    while bgp instance was deleted.
    As part of aggregate member route deletion, the aggregate route is
    reinstalled with self-peer as source, but self-peer is already removed.
    Assert() for null peer pointer is path attribute aborts bgp.
Problem Description/Summary :
BGP crashed while cleaning up aggregate route as part of bgp instance deletion.
-----------------------
Leaf-4(config)#
Leaf-4(config)# no router bgp 65179 vrf Vrf-red
Leaf-4(config)# no router bgp 65179
Leaf-4(config)#
Leaf-4(config)#
Leaf-4(config)# root@Leaf-4:~#

Sep 26 15:38:21.257554 System is not ready - Core services are down
------------
router bgp 65179
bgp router-id 100.2.0.3
no bgp default ipv4-unicast
bgp network import-check
neighbor LeafToHostv4 peer-group
neighbor LeafToHostv4 remote-as 65003
neighbor LeafToHostv6 peer-group
neighbor LeafToHostv6 remote-as 65003
neighbor LeafToSpinev4 peer-group
neighbor LeafToSpinev4 remote-as 65134
neighbor LeafToSpinev4 bfd
neighbor LeafToSpinev6 peer-group
neighbor LeafToSpinev6 remote-as 65134
neighbor LeafToSpinev6 bfd
neighbor WindowsServer peer-group
neighbor WindowsServer remote-as 65201
neighbor 155.1.0.4 peer-group LeafToSpinev4
neighbor 155.2.0.4 peer-group LeafToSpinev4
neighbor 2000:155:1::4 peer-group LeafToSpinev6
neighbor 2000:155:2::4 peer-group LeafToSpinev6
neighbor 172.16.11.2 peer-group WindowsServer
neighbor 172.16.1.2 remote-as 65101
neighbor 2000:172:16:1::2 remote-as 65101
bgp listen limit 400
bgp listen range 133.3.0.0/16 peer-group LeafToHostv4
bgp listen range 2000:133:3::/48 peer-group LeafToHostv6
!
address-family ipv4 unicast
aggregate-address 133.1.0.0/16 as-set
aggregate-address 133.2.0.0/16 as-set
aggregate-address 133.3.0.0/16 as-set
aggregate-address 133.4.0.0/16 as-set
redistribute connected
neighbor LeafToHostv4 activate
neighbor LeafToSpinev4 activate
neighbor LeafToSpinev4 allowas-in 1
neighbor LeafToSpinev4 route-map spine_v4_export out
neighbor WindowsServer activate
neighbor 172.16.1.2 activate
exit-address-family
!
address-family ipv6 unicast
aggregate-address 2000:133:1::/48 as-set
aggregate-address 2000:133:2::/48 as-set
aggregate-address 2000:133:3::/48 as-set
aggregate-address 2000:133:4::/48 as-set
redistribute connected
..
------------
(gdb) bt
name=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add")
at bgpd/bgpd.c:1159
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
pi=<optimized out>) at bgpd/bgp_route.c:313
afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_UNICAST,
p=p@entry=0x55607f1c4e10, origin=<optimized out>, aspath=0x55607f4bc8a0,
community=<optimized out>, ecommunity=<optimized out>,
lcommunity=<optimized out>, atomic_aggregate=0 '\000',
aggregate=0x55607f1c4ee0) at bgpd/bgp_route.c:5926
aggr_p=<optimized out>, aggregate=<optimized out>, pi=0x55607f41f9f0,
safi=SAFI_UNICAST, afi=AFI_IP, bgp=0x55607eeba5d0) at bgpd/bgp_route.c:6385
del=del@entry=0x55607f41f9f0, afi=afi@entry=AFI_IP,
--Type <return> to continue, or q <return> to quit--
safi=safi@entry=SAFI_UNICAST) at bgpd/bgp_route.c:6446
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
data=<optimized out>) at bgpd/bgp_route.c:4125
at lib/workqueue.c:291
at lib/thread.c:1540
at bgpd/bgp_main.c:498
(gdb) fr 5
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
1158 bgpd/bgpd.c: No such file or directory.
(gdb) fr 10
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
2885 bgpd/bgp_route.c: No such file or directory.
(gdb) p peer->lock
$2 = 210
(gdb) p peer->status
$3 = 8
(gdb)
(gdb) p bgp
$11 = (struct bgp *) 0x56121ba315d0
(gdb) p bgp->peer_self
$12 = (struct peer *) 0x0
(gdb) p bgp->name
$13 = 0x0
(gdb) p bgp->name_pretty
$14 = 0x56121bb046a0 "VRF default"
(gdb) p bgp->inst_type
$15 = BGP_INSTANCE_TYPE_DEFAULT
(gdb)

bgp_aggregate_install():
5920
5921 new = info_make(ZEBRA_ROUTE_BGP, BGP_ROUTE_AGGREGATE, 0,
5922 bgp->peer_self, attr, rn);
5923
5924 SET_FLAG(new->flags, BGP_PATH_VALID);
5925
5926 bgp_path_info_add(rn, new);
5927 bgp_process(bgp, rn, afi, safi);

299 void bgp_path_info_add(struct bgp_node *rn, struct bgp_path_info *pi):
...
310
311 bgp_path_info_lock(pi);
312 bgp_lock_node(rn);
313 peer_lock(pi->peer); /* bgp_path_info peer reference */ <<< This points to bgp->peer_self = NULL
314 }

1573 #define peer_lock(B) peer_lock_with_caller(_FUNCTION_, (B))

1156 /* increase reference count on a struct peer */
1157 struct peer *peer_lock_with_caller(const char *name, struct peer *peer)
1158 {
1159 assert(peer && (peer->lock >= 0)); <<< asserted here
1160

Similar issue was fixed in community and we already have the fix:
FRRouting#4816
root@sr407497_lxc2:/home/ubuntu/frr_repo/frr/bgpd# git diff dfb6fd1~ dfb6fd1
diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
index abad1db..a372568 100644
— a/bgpd/bgp_route.c
+++ b/bgpd/bgp_route.c
@@ -5332,6 +5332,13 @@ static void bgp_purge_af_static_redist_routes(struct bgp *bgp, afi_t afi,
struct bgp_node *rn;
struct bgp_path_info *pi;

+ /* Do not install the aggregate route if BGP is in the
+ * process of termination.
+ */
+ if (bgp_flag_check(bgp, BGP_FLAG_DELETE_IN_PROGRESS) ||
+ (bgp->peer_self == NULL))
+ return;
+
table = bgp->rib[afi][safi];
for (rn = bgp_table_top(table); rn; rn = bgp_route_next(rn)) {
for (pi = bgp_node_get_bgp_path_info(rn); pi; pi = pi->next) {

But looks like similar handling is required at other places as well:

Expected Behavior :
BGP daemon should not crash
Signed-off-by: sudhanshu kumar sudhanshu.kumar@broadcom.com
sudhanshukumar22 added a commit to sudhanshukumar22/frr that referenced this pull request Nov 2, 2020
Description:
aggregate member route was enqueued for recalculation
    while bgp instance was deleted.
    As part of aggregate member route deletion, the aggregate route is
    reinstalled with self-peer as source, but self-peer is already removed.
    Assert() for null peer pointer is path attribute aborts bgp.
Problem Description/Summary :
BGP crashed while cleaning up aggregate route as part of bgp instance deletion.
-----------------------
Leaf-4(config)#
Leaf-4(config)# no router bgp 65179 vrf Vrf-red
Leaf-4(config)# no router bgp 65179
Leaf-4(config)#
Leaf-4(config)#
Leaf-4(config)# root@Leaf-4:~#

Sep 26 15:38:21.257554 System is not ready - Core services are down
------------
router bgp 65179
bgp router-id 100.2.0.3
no bgp default ipv4-unicast
bgp network import-check
neighbor LeafToHostv4 peer-group
neighbor LeafToHostv4 remote-as 65003
neighbor LeafToHostv6 peer-group
neighbor LeafToHostv6 remote-as 65003
neighbor LeafToSpinev4 peer-group
neighbor LeafToSpinev4 remote-as 65134
neighbor LeafToSpinev4 bfd
neighbor LeafToSpinev6 peer-group
neighbor LeafToSpinev6 remote-as 65134
neighbor LeafToSpinev6 bfd
neighbor WindowsServer peer-group
neighbor WindowsServer remote-as 65201
neighbor 155.1.0.4 peer-group LeafToSpinev4
neighbor 155.2.0.4 peer-group LeafToSpinev4
neighbor 2000:155:1::4 peer-group LeafToSpinev6
neighbor 2000:155:2::4 peer-group LeafToSpinev6
neighbor 172.16.11.2 peer-group WindowsServer
neighbor 172.16.1.2 remote-as 65101
neighbor 2000:172:16:1::2 remote-as 65101
bgp listen limit 400
bgp listen range 133.3.0.0/16 peer-group LeafToHostv4
bgp listen range 2000:133:3::/48 peer-group LeafToHostv6
!
address-family ipv4 unicast
aggregate-address 133.1.0.0/16 as-set
aggregate-address 133.2.0.0/16 as-set
aggregate-address 133.3.0.0/16 as-set
aggregate-address 133.4.0.0/16 as-set
redistribute connected
neighbor LeafToHostv4 activate
neighbor LeafToSpinev4 activate
neighbor LeafToSpinev4 allowas-in 1
neighbor LeafToSpinev4 route-map spine_v4_export out
neighbor WindowsServer activate
neighbor 172.16.1.2 activate
exit-address-family
!
address-family ipv6 unicast
aggregate-address 2000:133:1::/48 as-set
aggregate-address 2000:133:2::/48 as-set
aggregate-address 2000:133:3::/48 as-set
aggregate-address 2000:133:4::/48 as-set
redistribute connected
..
------------
(gdb) bt
name=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add")
at bgpd/bgpd.c:1159
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
pi=<optimized out>) at bgpd/bgp_route.c:313
afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_UNICAST,
p=p@entry=0x55607f1c4e10, origin=<optimized out>, aspath=0x55607f4bc8a0,
community=<optimized out>, ecommunity=<optimized out>,
lcommunity=<optimized out>, atomic_aggregate=0 '\000',
aggregate=0x55607f1c4ee0) at bgpd/bgp_route.c:5926
aggr_p=<optimized out>, aggregate=<optimized out>, pi=0x55607f41f9f0,
safi=SAFI_UNICAST, afi=AFI_IP, bgp=0x55607eeba5d0) at bgpd/bgp_route.c:6385
del=del@entry=0x55607f41f9f0, afi=afi@entry=AFI_IP,
--Type <return> to continue, or q <return> to quit--
safi=safi@entry=SAFI_UNICAST) at bgpd/bgp_route.c:6446
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
data=<optimized out>) at bgpd/bgp_route.c:4125
at lib/workqueue.c:291
at lib/thread.c:1540
at bgpd/bgp_main.c:498
(gdb) fr 5
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
1158 bgpd/bgpd.c: No such file or directory.
(gdb) fr 10
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
2885 bgpd/bgp_route.c: No such file or directory.
(gdb) p peer->lock
$2 = 210
(gdb) p peer->status
$3 = 8
(gdb)
(gdb) p bgp
$11 = (struct bgp *) 0x56121ba315d0
(gdb) p bgp->peer_self
$12 = (struct peer *) 0x0
(gdb) p bgp->name
$13 = 0x0
(gdb) p bgp->name_pretty
$14 = 0x56121bb046a0 "VRF default"
(gdb) p bgp->inst_type
$15 = BGP_INSTANCE_TYPE_DEFAULT
(gdb)

bgp_aggregate_install():
5920
5921 new = info_make(ZEBRA_ROUTE_BGP, BGP_ROUTE_AGGREGATE, 0,
5922 bgp->peer_self, attr, rn);
5923
5924 SET_FLAG(new->flags, BGP_PATH_VALID);
5925
5926 bgp_path_info_add(rn, new);
5927 bgp_process(bgp, rn, afi, safi);

299 void bgp_path_info_add(struct bgp_node *rn, struct bgp_path_info *pi):
...
310
311 bgp_path_info_lock(pi);
312 bgp_lock_node(rn);
313 peer_lock(pi->peer); /* bgp_path_info peer reference */ <<< This points to bgp->peer_self = NULL
314 }

1573 #define peer_lock(B) peer_lock_with_caller(_FUNCTION_, (B))

1156 /* increase reference count on a struct peer */
1157 struct peer *peer_lock_with_caller(const char *name, struct peer *peer)
1158 {
1159 assert(peer && (peer->lock >= 0)); <<< asserted here
1160

Similar issue was fixed in community and we already have the fix:
FRRouting#4816
root@sr407497_lxc2:/home/ubuntu/frr_repo/frr/bgpd# git diff dfb6fd1~ dfb6fd1
diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
index abad1db..a372568 100644
— a/bgpd/bgp_route.c
+++ b/bgpd/bgp_route.c
@@ -5332,6 +5332,13 @@ static void bgp_purge_af_static_redist_routes(struct bgp *bgp, afi_t afi,
struct bgp_node *rn;
struct bgp_path_info *pi;

+ /* Do not install the aggregate route if BGP is in the
+ * process of termination.
+ */
+ if (bgp_flag_check(bgp, BGP_FLAG_DELETE_IN_PROGRESS) ||
+ (bgp->peer_self == NULL))
+ return;
+
table = bgp->rib[afi][safi];
for (rn = bgp_table_top(table); rn; rn = bgp_route_next(rn)) {
for (pi = bgp_node_get_bgp_path_info(rn); pi; pi = pi->next) {

But looks like similar handling is required at other places as well:

Expected Behavior :
BGP daemon should not crash
Signed-off-by: sudhanshu kumar sudhanshu.kumar@broadcom.com
sudhanshukumar22 added a commit to sudhanshukumar22/frr that referenced this pull request Nov 2, 2020
Description:
aggregate member route was enqueued for recalculation
    while bgp instance was deleted.
    As part of aggregate member route deletion, the aggregate route is
    reinstalled with self-peer as source, but self-peer is already removed.
    Assert() for null peer pointer is path attribute aborts bgp.
Problem Description/Summary :
BGP crashed while cleaning up aggregate route as part of bgp instance deletion.
-----------------------
Leaf-4(config)#
Leaf-4(config)# no router bgp 65179 vrf Vrf-red
Leaf-4(config)# no router bgp 65179
Leaf-4(config)#
Leaf-4(config)#
Leaf-4(config)# root@Leaf-4:~#

Sep 26 15:38:21.257554 System is not ready - Core services are down
------------
router bgp 65179
bgp router-id 100.2.0.3
no bgp default ipv4-unicast
bgp network import-check
neighbor LeafToHostv4 peer-group
neighbor LeafToHostv4 remote-as 65003
neighbor LeafToHostv6 peer-group
neighbor LeafToHostv6 remote-as 65003
neighbor LeafToSpinev4 peer-group
neighbor LeafToSpinev4 remote-as 65134
neighbor LeafToSpinev4 bfd
neighbor LeafToSpinev6 peer-group
neighbor LeafToSpinev6 remote-as 65134
neighbor LeafToSpinev6 bfd
neighbor WindowsServer peer-group
neighbor WindowsServer remote-as 65201
neighbor 155.1.0.4 peer-group LeafToSpinev4
neighbor 155.2.0.4 peer-group LeafToSpinev4
neighbor 2000:155:1::4 peer-group LeafToSpinev6
neighbor 2000:155:2::4 peer-group LeafToSpinev6
neighbor 172.16.11.2 peer-group WindowsServer
neighbor 172.16.1.2 remote-as 65101
neighbor 2000:172:16:1::2 remote-as 65101
bgp listen limit 400
bgp listen range 133.3.0.0/16 peer-group LeafToHostv4
bgp listen range 2000:133:3::/48 peer-group LeafToHostv6
!
address-family ipv4 unicast
aggregate-address 133.1.0.0/16 as-set
aggregate-address 133.2.0.0/16 as-set
aggregate-address 133.3.0.0/16 as-set
aggregate-address 133.4.0.0/16 as-set
redistribute connected
neighbor LeafToHostv4 activate
neighbor LeafToSpinev4 activate
neighbor LeafToSpinev4 allowas-in 1
neighbor LeafToSpinev4 route-map spine_v4_export out
neighbor WindowsServer activate
neighbor 172.16.1.2 activate
exit-address-family
!
address-family ipv6 unicast
aggregate-address 2000:133:1::/48 as-set
aggregate-address 2000:133:2::/48 as-set
aggregate-address 2000:133:3::/48 as-set
aggregate-address 2000:133:4::/48 as-set
redistribute connected
..
------------
(gdb) bt
name=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add")
at bgpd/bgpd.c:1159
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
pi=<optimized out>) at bgpd/bgp_route.c:313
afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_UNICAST,
p=p@entry=0x55607f1c4e10, origin=<optimized out>, aspath=0x55607f4bc8a0,
community=<optimized out>, ecommunity=<optimized out>,
lcommunity=<optimized out>, atomic_aggregate=0 '\000',
aggregate=0x55607f1c4ee0) at bgpd/bgp_route.c:5926
aggr_p=<optimized out>, aggregate=<optimized out>, pi=0x55607f41f9f0,
safi=SAFI_UNICAST, afi=AFI_IP, bgp=0x55607eeba5d0) at bgpd/bgp_route.c:6385
del=del@entry=0x55607f41f9f0, afi=afi@entry=AFI_IP,
--Type <return> to continue, or q <return> to quit--
safi=safi@entry=SAFI_UNICAST) at bgpd/bgp_route.c:6446
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
data=<optimized out>) at bgpd/bgp_route.c:4125
at lib/workqueue.c:291
at lib/thread.c:1540
at bgpd/bgp_main.c:498
(gdb) fr 5
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
1158 bgpd/bgpd.c: No such file or directory.
(gdb) fr 10
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
2885 bgpd/bgp_route.c: No such file or directory.
(gdb) p peer->lock
$2 = 210
(gdb) p peer->status
$3 = 8
(gdb)
(gdb) p bgp
$11 = (struct bgp *) 0x56121ba315d0
(gdb) p bgp->peer_self
$12 = (struct peer *) 0x0
(gdb) p bgp->name
$13 = 0x0
(gdb) p bgp->name_pretty
$14 = 0x56121bb046a0 "VRF default"
(gdb) p bgp->inst_type
$15 = BGP_INSTANCE_TYPE_DEFAULT
(gdb)

bgp_aggregate_install():
5920
5921 new = info_make(ZEBRA_ROUTE_BGP, BGP_ROUTE_AGGREGATE, 0,
5922 bgp->peer_self, attr, rn);
5923
5924 SET_FLAG(new->flags, BGP_PATH_VALID);
5925
5926 bgp_path_info_add(rn, new);
5927 bgp_process(bgp, rn, afi, safi);

299 void bgp_path_info_add(struct bgp_node *rn, struct bgp_path_info *pi):
...
310
311 bgp_path_info_lock(pi);
312 bgp_lock_node(rn);
313 peer_lock(pi->peer); /* bgp_path_info peer reference */ <<< This points to bgp->peer_self = NULL
314 }

1573 #define peer_lock(B) peer_lock_with_caller(_FUNCTION_, (B))

1156 /* increase reference count on a struct peer */
1157 struct peer *peer_lock_with_caller(const char *name, struct peer *peer)
1158 {
1159 assert(peer && (peer->lock >= 0)); <<< asserted here
1160

Similar issue was fixed in community and we already have the fix:
FRRouting#4816
root@sr407497_lxc2:/home/ubuntu/frr_repo/frr/bgpd# git diff dfb6fd1~ dfb6fd1
diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
index abad1db..a372568 100644
— a/bgpd/bgp_route.c
+++ b/bgpd/bgp_route.c
@@ -5332,6 +5332,13 @@ static void bgp_purge_af_static_redist_routes(struct bgp *bgp, afi_t afi,
struct bgp_node *rn;
struct bgp_path_info *pi;

+ /* Do not install the aggregate route if BGP is in the
+ * process of termination.
+ */
+ if (bgp_flag_check(bgp, BGP_FLAG_DELETE_IN_PROGRESS) ||
+ (bgp->peer_self == NULL))
+ return;
+
table = bgp->rib[afi][safi];
for (rn = bgp_table_top(table); rn; rn = bgp_route_next(rn)) {
for (pi = bgp_node_get_bgp_path_info(rn); pi; pi = pi->next) {

But looks like similar handling is required at other places as well:

Expected Behavior :
BGP daemon should not crash
Signed-off-by: sudhanshukumar22 sudhanshu.kumar@broadcom.com
sudhanshukumar22 added a commit to sudhanshukumar22/frr that referenced this pull request Nov 2, 2020
Description:
aggregate member route was enqueued for recalculation
    while bgp instance was deleted.
    As part of aggregate member route deletion, the aggregate route is
    reinstalled with self-peer as source, but self-peer is already removed.
    Assert() for null peer pointer is path attribute aborts bgp.
Problem Description/Summary :
BGP crashed while cleaning up aggregate route as part of bgp instance deletion.
-----------------------
Leaf-4(config)#
Leaf-4(config)# no router bgp 65179 vrf Vrf-red
Leaf-4(config)# no router bgp 65179
Leaf-4(config)#
Leaf-4(config)#
Leaf-4(config)# root@Leaf-4:~#

Sep 26 15:38:21.257554 System is not ready - Core services are down
------------
router bgp 65179
bgp router-id 100.2.0.3
no bgp default ipv4-unicast
bgp network import-check
neighbor LeafToHostv4 peer-group
neighbor LeafToHostv4 remote-as 65003
neighbor LeafToHostv6 peer-group
neighbor LeafToHostv6 remote-as 65003
neighbor LeafToSpinev4 peer-group
neighbor LeafToSpinev4 remote-as 65134
neighbor LeafToSpinev4 bfd
neighbor LeafToSpinev6 peer-group
neighbor LeafToSpinev6 remote-as 65134
neighbor LeafToSpinev6 bfd
neighbor WindowsServer peer-group
neighbor WindowsServer remote-as 65201
neighbor 155.1.0.4 peer-group LeafToSpinev4
neighbor 155.2.0.4 peer-group LeafToSpinev4
neighbor 2000:155:1::4 peer-group LeafToSpinev6
neighbor 2000:155:2::4 peer-group LeafToSpinev6
neighbor 172.16.11.2 peer-group WindowsServer
neighbor 172.16.1.2 remote-as 65101
neighbor 2000:172:16:1::2 remote-as 65101
bgp listen limit 400
bgp listen range 133.3.0.0/16 peer-group LeafToHostv4
bgp listen range 2000:133:3::/48 peer-group LeafToHostv6
!
address-family ipv4 unicast
aggregate-address 133.1.0.0/16 as-set
aggregate-address 133.2.0.0/16 as-set
aggregate-address 133.3.0.0/16 as-set
aggregate-address 133.4.0.0/16 as-set
redistribute connected
neighbor LeafToHostv4 activate
neighbor LeafToSpinev4 activate
neighbor LeafToSpinev4 allowas-in 1
neighbor LeafToSpinev4 route-map spine_v4_export out
neighbor WindowsServer activate
neighbor 172.16.1.2 activate
exit-address-family
!
address-family ipv6 unicast
aggregate-address 2000:133:1::/48 as-set
aggregate-address 2000:133:2::/48 as-set
aggregate-address 2000:133:3::/48 as-set
aggregate-address 2000:133:4::/48 as-set
redistribute connected
..
------------
(gdb) bt
name=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add")
at bgpd/bgpd.c:1159
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
pi=<optimized out>) at bgpd/bgp_route.c:313
afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_UNICAST,
p=p@entry=0x55607f1c4e10, origin=<optimized out>, aspath=0x55607f4bc8a0,
community=<optimized out>, ecommunity=<optimized out>,
lcommunity=<optimized out>, atomic_aggregate=0 '\000',
aggregate=0x55607f1c4ee0) at bgpd/bgp_route.c:5926
aggr_p=<optimized out>, aggregate=<optimized out>, pi=0x55607f41f9f0,
safi=SAFI_UNICAST, afi=AFI_IP, bgp=0x55607eeba5d0) at bgpd/bgp_route.c:6385
del=del@entry=0x55607f41f9f0, afi=afi@entry=AFI_IP,
--Type <return> to continue, or q <return> to quit--
safi=safi@entry=SAFI_UNICAST) at bgpd/bgp_route.c:6446
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
data=<optimized out>) at bgpd/bgp_route.c:4125
at lib/workqueue.c:291
at lib/thread.c:1540
at bgpd/bgp_main.c:498
(gdb) fr 5
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
1158 bgpd/bgpd.c: No such file or directory.
(gdb) fr 10
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
2885 bgpd/bgp_route.c: No such file or directory.
(gdb) p peer->lock
$2 = 210
(gdb) p peer->status
$3 = 8
(gdb)
(gdb) p bgp
$11 = (struct bgp *) 0x56121ba315d0
(gdb) p bgp->peer_self
$12 = (struct peer *) 0x0
(gdb) p bgp->name
$13 = 0x0
(gdb) p bgp->name_pretty
$14 = 0x56121bb046a0 "VRF default"
(gdb) p bgp->inst_type
$15 = BGP_INSTANCE_TYPE_DEFAULT
(gdb)

bgp_aggregate_install():
5920
5921 new = info_make(ZEBRA_ROUTE_BGP, BGP_ROUTE_AGGREGATE, 0,
5922 bgp->peer_self, attr, rn);
5923
5924 SET_FLAG(new->flags, BGP_PATH_VALID);
5925
5926 bgp_path_info_add(rn, new);
5927 bgp_process(bgp, rn, afi, safi);

299 void bgp_path_info_add(struct bgp_node *rn, struct bgp_path_info *pi):
...
310
311 bgp_path_info_lock(pi);
312 bgp_lock_node(rn);
313 peer_lock(pi->peer); /* bgp_path_info peer reference */ <<< This points to bgp->peer_self = NULL
314 }

1573 #define peer_lock(B) peer_lock_with_caller(_FUNCTION_, (B))

1156 /* increase reference count on a struct peer */
1157 struct peer *peer_lock_with_caller(const char *name, struct peer *peer)
1158 {
1159 assert(peer && (peer->lock >= 0)); <<< asserted here
1160

Similar issue was fixed in community and we already have the fix:
FRRouting#4816
root@sr407497_lxc2:/home/ubuntu/frr_repo/frr/bgpd# git diff dfb6fd1~ dfb6fd1
diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
index abad1db..a372568 100644
— a/bgpd/bgp_route.c
+++ b/bgpd/bgp_route.c
@@ -5332,6 +5332,13 @@ static void bgp_purge_af_static_redist_routes(struct bgp *bgp, afi_t afi,
struct bgp_node *rn;
struct bgp_path_info *pi;

+ /* Do not install the aggregate route if BGP is in the
+ * process of termination.
+ */
+ if (bgp_flag_check(bgp, BGP_FLAG_DELETE_IN_PROGRESS) ||
+ (bgp->peer_self == NULL))
+ return;
+
table = bgp->rib[afi][safi];
for (rn = bgp_table_top(table); rn; rn = bgp_route_next(rn)) {
for (pi = bgp_node_get_bgp_path_info(rn); pi; pi = pi->next) {

But looks like similar handling is required at other places as well:

Expected Behavior :
BGP daemon should not crash
Signed-off-by: sudhanshukumar22 sudhanshu.kumar@broadcom.com
sudhanshukumar22 added a commit to sudhanshukumar22/frr that referenced this pull request Nov 2, 2020
Description:
aggregate member route was enqueued for recalculation
    while bgp instance was deleted.
    As part of aggregate member route deletion, the aggregate route is
    reinstalled with self-peer as source, but self-peer is already removed.
    Assert() for null peer pointer is path attribute aborts bgp.
Problem Description/Summary :
BGP crashed while cleaning up aggregate route as part of bgp instance deletion.
-----------------------
Leaf-4(config)#
Leaf-4(config)# no router bgp 65179 vrf Vrf-red
Leaf-4(config)# no router bgp 65179
Leaf-4(config)#
Leaf-4(config)#
Leaf-4(config)# root@Leaf-4:~#

Sep 26 15:38:21.257554 System is not ready - Core services are down
------------
router bgp 65179
bgp router-id 100.2.0.3
no bgp default ipv4-unicast
bgp network import-check
neighbor LeafToHostv4 peer-group
neighbor LeafToHostv4 remote-as 65003
neighbor LeafToHostv6 peer-group
neighbor LeafToHostv6 remote-as 65003
neighbor LeafToSpinev4 peer-group
neighbor LeafToSpinev4 remote-as 65134
neighbor LeafToSpinev4 bfd
neighbor LeafToSpinev6 peer-group
neighbor LeafToSpinev6 remote-as 65134
neighbor LeafToSpinev6 bfd
neighbor WindowsServer peer-group
neighbor WindowsServer remote-as 65201
neighbor 155.1.0.4 peer-group LeafToSpinev4
neighbor 155.2.0.4 peer-group LeafToSpinev4
neighbor 2000:155:1::4 peer-group LeafToSpinev6
neighbor 2000:155:2::4 peer-group LeafToSpinev6
neighbor 172.16.11.2 peer-group WindowsServer
neighbor 172.16.1.2 remote-as 65101
neighbor 2000:172:16:1::2 remote-as 65101
bgp listen limit 400
bgp listen range 133.3.0.0/16 peer-group LeafToHostv4
bgp listen range 2000:133:3::/48 peer-group LeafToHostv6
!
address-family ipv4 unicast
aggregate-address 133.1.0.0/16 as-set
aggregate-address 133.2.0.0/16 as-set
aggregate-address 133.3.0.0/16 as-set
aggregate-address 133.4.0.0/16 as-set
redistribute connected
neighbor LeafToHostv4 activate
neighbor LeafToSpinev4 activate
neighbor LeafToSpinev4 allowas-in 1
neighbor LeafToSpinev4 route-map spine_v4_export out
neighbor WindowsServer activate
neighbor 172.16.1.2 activate
exit-address-family
!
address-family ipv6 unicast
aggregate-address 2000:133:1::/48 as-set
aggregate-address 2000:133:2::/48 as-set
aggregate-address 2000:133:3::/48 as-set
aggregate-address 2000:133:4::/48 as-set
redistribute connected
..
------------
(gdb) bt
name=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add")
at bgpd/bgpd.c:1159
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
pi=<optimized out>) at bgpd/bgp_route.c:313
afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_UNICAST,
p=p@entry=0x55607f1c4e10, origin=<optimized out>, aspath=0x55607f4bc8a0,
community=<optimized out>, ecommunity=<optimized out>,
lcommunity=<optimized out>, atomic_aggregate=0 '\000',
aggregate=0x55607f1c4ee0) at bgpd/bgp_route.c:5926
aggr_p=<optimized out>, aggregate=<optimized out>, pi=0x55607f41f9f0,
safi=SAFI_UNICAST, afi=AFI_IP, bgp=0x55607eeba5d0) at bgpd/bgp_route.c:6385
del=del@entry=0x55607f41f9f0, afi=afi@entry=AFI_IP,
--Type <return> to continue, or q <return> to quit--
safi=safi@entry=SAFI_UNICAST) at bgpd/bgp_route.c:6446
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
data=<optimized out>) at bgpd/bgp_route.c:4125
at lib/workqueue.c:291
at lib/thread.c:1540
at bgpd/bgp_main.c:498
(gdb) fr 5
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
1158 bgpd/bgpd.c: No such file or directory.
(gdb) fr 10
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
2885 bgpd/bgp_route.c: No such file or directory.
(gdb) p peer->lock
$2 = 210
(gdb) p peer->status
$3 = 8
(gdb)
(gdb) p bgp
$11 = (struct bgp *) 0x56121ba315d0
(gdb) p bgp->peer_self
$12 = (struct peer *) 0x0
(gdb) p bgp->name
$13 = 0x0
(gdb) p bgp->name_pretty
$14 = 0x56121bb046a0 "VRF default"
(gdb) p bgp->inst_type
$15 = BGP_INSTANCE_TYPE_DEFAULT
(gdb)

bgp_aggregate_install():
5920
5921 new = info_make(ZEBRA_ROUTE_BGP, BGP_ROUTE_AGGREGATE, 0,
5922 bgp->peer_self, attr, rn);
5923
5924 SET_FLAG(new->flags, BGP_PATH_VALID);
5925
5926 bgp_path_info_add(rn, new);
5927 bgp_process(bgp, rn, afi, safi);

299 void bgp_path_info_add(struct bgp_node *rn, struct bgp_path_info *pi):
...
310
311 bgp_path_info_lock(pi);
312 bgp_lock_node(rn);
313 peer_lock(pi->peer); /* bgp_path_info peer reference */ <<< This points to bgp->peer_self = NULL
314 }

1573 #define peer_lock(B) peer_lock_with_caller(_FUNCTION_, (B))

1156 /* increase reference count on a struct peer */
1157 struct peer *peer_lock_with_caller(const char *name, struct peer *peer)
1158 {
1159 assert(peer && (peer->lock >= 0)); <<< asserted here
1160

Similar issue was fixed in community and we already have the fix:
FRRouting#4816
root@sr407497_lxc2:/home/ubuntu/frr_repo/frr/bgpd# git diff dfb6fd1~ dfb6fd1
diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
index abad1db..a372568 100644
— a/bgpd/bgp_route.c
+++ b/bgpd/bgp_route.c
@@ -5332,6 +5332,13 @@ static void bgp_purge_af_static_redist_routes(struct bgp *bgp, afi_t afi,
struct bgp_node *rn;
struct bgp_path_info *pi;

+ /* Do not install the aggregate route if BGP is in the
+ * process of termination.
+ */
+ if (bgp_flag_check(bgp, BGP_FLAG_DELETE_IN_PROGRESS) ||
+ (bgp->peer_self == NULL))
+ return;
+
table = bgp->rib[afi][safi];
for (rn = bgp_table_top(table); rn; rn = bgp_route_next(rn)) {
for (pi = bgp_node_get_bgp_path_info(rn); pi; pi = pi->next) {

But looks like similar handling is required at other places as well:

Expected Behavior :
BGP daemon should not crash
Signed-off-by: sudhanshukumar22 sudhanshu.kumar@broadcom.com
sudhanshukumar22 added a commit to sudhanshukumar22/frr that referenced this pull request Nov 2, 2020
Description:
aggregate member route was enqueued for recalculation
    while bgp instance was deleted.
    As part of aggregate member route deletion, the aggregate route is
    reinstalled with self-peer as source, but self-peer is already removed.
    Assert() for null peer pointer is path attribute aborts bgp.
Problem Description/Summary :
BGP crashed while cleaning up aggregate route as part of bgp instance deletion.
-----------------------
Leaf-4(config)#
Leaf-4(config)# no router bgp 65179 vrf Vrf-red
Leaf-4(config)# no router bgp 65179
Leaf-4(config)#
Leaf-4(config)#
Leaf-4(config)# root@Leaf-4:~#

Sep 26 15:38:21.257554 System is not ready - Core services are down
------------
router bgp 65179
bgp router-id 100.2.0.3
no bgp default ipv4-unicast
bgp network import-check
neighbor LeafToHostv4 peer-group
neighbor LeafToHostv4 remote-as 65003
neighbor LeafToHostv6 peer-group
neighbor LeafToHostv6 remote-as 65003
neighbor LeafToSpinev4 peer-group
neighbor LeafToSpinev4 remote-as 65134
neighbor LeafToSpinev4 bfd
neighbor LeafToSpinev6 peer-group
neighbor LeafToSpinev6 remote-as 65134
neighbor LeafToSpinev6 bfd
neighbor WindowsServer peer-group
neighbor WindowsServer remote-as 65201
neighbor 155.1.0.4 peer-group LeafToSpinev4
neighbor 155.2.0.4 peer-group LeafToSpinev4
neighbor 2000:155:1::4 peer-group LeafToSpinev6
neighbor 2000:155:2::4 peer-group LeafToSpinev6
neighbor 172.16.11.2 peer-group WindowsServer
neighbor 172.16.1.2 remote-as 65101
neighbor 2000:172:16:1::2 remote-as 65101
bgp listen limit 400
bgp listen range 133.3.0.0/16 peer-group LeafToHostv4
bgp listen range 2000:133:3::/48 peer-group LeafToHostv6
!
address-family ipv4 unicast
aggregate-address 133.1.0.0/16 as-set
aggregate-address 133.2.0.0/16 as-set
aggregate-address 133.3.0.0/16 as-set
aggregate-address 133.4.0.0/16 as-set
redistribute connected
neighbor LeafToHostv4 activate
neighbor LeafToSpinev4 activate
neighbor LeafToSpinev4 allowas-in 1
neighbor LeafToSpinev4 route-map spine_v4_export out
neighbor WindowsServer activate
neighbor 172.16.1.2 activate
exit-address-family
!
address-family ipv6 unicast
aggregate-address 2000:133:1::/48 as-set
aggregate-address 2000:133:2::/48 as-set
aggregate-address 2000:133:3::/48 as-set
aggregate-address 2000:133:4::/48 as-set
redistribute connected
..
------------
(gdb) bt
name=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add")
at bgpd/bgpd.c:1159
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
pi=<optimized out>) at bgpd/bgp_route.c:313
afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_UNICAST,
p=p@entry=0x55607f1c4e10, origin=<optimized out>, aspath=0x55607f4bc8a0,
community=<optimized out>, ecommunity=<optimized out>,
lcommunity=<optimized out>, atomic_aggregate=0 '\000',
aggregate=0x55607f1c4ee0) at bgpd/bgp_route.c:5926
aggr_p=<optimized out>, aggregate=<optimized out>, pi=0x55607f41f9f0,
safi=SAFI_UNICAST, afi=AFI_IP, bgp=0x55607eeba5d0) at bgpd/bgp_route.c:6385
del=del@entry=0x55607f41f9f0, afi=afi@entry=AFI_IP,
--Type <return> to continue, or q <return> to quit--
safi=safi@entry=SAFI_UNICAST) at bgpd/bgp_route.c:6446
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
data=<optimized out>) at bgpd/bgp_route.c:4125
at lib/workqueue.c:291
at lib/thread.c:1540
at bgpd/bgp_main.c:498
(gdb) fr 5
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
1158 bgpd/bgpd.c: No such file or directory.
(gdb) fr 10
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
2885 bgpd/bgp_route.c: No such file or directory.
(gdb) p peer->lock
$2 = 210
(gdb) p peer->status
$3 = 8
(gdb)
(gdb) p bgp
$11 = (struct bgp *) 0x56121ba315d0
(gdb) p bgp->peer_self
$12 = (struct peer *) 0x0
(gdb) p bgp->name
$13 = 0x0
(gdb) p bgp->name_pretty
$14 = 0x56121bb046a0 "VRF default"
(gdb) p bgp->inst_type
$15 = BGP_INSTANCE_TYPE_DEFAULT
(gdb)

bgp_aggregate_install():
5920
5921 new = info_make(ZEBRA_ROUTE_BGP, BGP_ROUTE_AGGREGATE, 0,
5922 bgp->peer_self, attr, rn);
5923
5924 SET_FLAG(new->flags, BGP_PATH_VALID);
5925
5926 bgp_path_info_add(rn, new);
5927 bgp_process(bgp, rn, afi, safi);

299 void bgp_path_info_add(struct bgp_node *rn, struct bgp_path_info *pi):
...
310
311 bgp_path_info_lock(pi);
312 bgp_lock_node(rn);
313 peer_lock(pi->peer); /* bgp_path_info peer reference */ <<< This points to bgp->peer_self = NULL
314 }

1573 #define peer_lock(B) peer_lock_with_caller(_FUNCTION_, (B))

1156 /* increase reference count on a struct peer */
1157 struct peer *peer_lock_with_caller(const char *name, struct peer *peer)
1158 {
1159 assert(peer && (peer->lock >= 0)); <<< asserted here
1160

Similar issue was fixed in community and we already have the fix:
FRRouting#4816
root@sr407497_lxc2:/home/ubuntu/frr_repo/frr/bgpd# git diff dfb6fd1~ dfb6fd1
diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
index abad1db..a372568 100644
— a/bgpd/bgp_route.c
+++ b/bgpd/bgp_route.c
@@ -5332,6 +5332,13 @@ static void bgp_purge_af_static_redist_routes(struct bgp *bgp, afi_t afi,
struct bgp_node *rn;
struct bgp_path_info *pi;

+ /* Do not install the aggregate route if BGP is in the
+ * process of termination.
+ */
+ if (bgp_flag_check(bgp, BGP_FLAG_DELETE_IN_PROGRESS) ||
+ (bgp->peer_self == NULL))
+ return;
+
table = bgp->rib[afi][safi];
for (rn = bgp_table_top(table); rn; rn = bgp_route_next(rn)) {
for (pi = bgp_node_get_bgp_path_info(rn); pi; pi = pi->next) {

But looks like similar handling is required at other places as well:

Expected Behavior :
BGP daemon should not crash

Signed-off-by: sudhanshukumar22 <sudhanshu.kumar@broadcom.com>
sudhanshukumar22 added a commit to sudhanshukumar22/frr that referenced this pull request Nov 5, 2020
Description:
aggregate member route was enqueued for recalculation
    while bgp instance was deleted.
    As part of aggregate member route deletion, the aggregate route is
    reinstalled with self-peer as source, but self-peer is already removed.
    Assert() for null peer pointer is path attribute aborts bgp.
Problem Description/Summary :
BGP crashed while cleaning up aggregate route as part of bgp instance deletion.
-----------------------
Leaf-4(config)#
Leaf-4(config)# no router bgp 65179 vrf Vrf-red
Leaf-4(config)# no router bgp 65179
Leaf-4(config)#
Leaf-4(config)#
Leaf-4(config)# root@Leaf-4:~#

Sep 26 15:38:21.257554 System is not ready - Core services are down
------------
router bgp 65179
bgp router-id 100.2.0.3
no bgp default ipv4-unicast
bgp network import-check
neighbor LeafToHostv4 peer-group
neighbor LeafToHostv4 remote-as 65003
neighbor LeafToHostv6 peer-group
neighbor LeafToHostv6 remote-as 65003
neighbor LeafToSpinev4 peer-group
neighbor LeafToSpinev4 remote-as 65134
neighbor LeafToSpinev4 bfd
neighbor LeafToSpinev6 peer-group
neighbor LeafToSpinev6 remote-as 65134
neighbor LeafToSpinev6 bfd
neighbor WindowsServer peer-group
neighbor WindowsServer remote-as 65201
neighbor 155.1.0.4 peer-group LeafToSpinev4
neighbor 155.2.0.4 peer-group LeafToSpinev4
neighbor 2000:155:1::4 peer-group LeafToSpinev6
neighbor 2000:155:2::4 peer-group LeafToSpinev6
neighbor 172.16.11.2 peer-group WindowsServer
neighbor 172.16.1.2 remote-as 65101
neighbor 2000:172:16:1::2 remote-as 65101
bgp listen limit 400
bgp listen range 133.3.0.0/16 peer-group LeafToHostv4
bgp listen range 2000:133:3::/48 peer-group LeafToHostv6
!
address-family ipv4 unicast
aggregate-address 133.1.0.0/16 as-set
aggregate-address 133.2.0.0/16 as-set
aggregate-address 133.3.0.0/16 as-set
aggregate-address 133.4.0.0/16 as-set
redistribute connected
neighbor LeafToHostv4 activate
neighbor LeafToSpinev4 activate
neighbor LeafToSpinev4 allowas-in 1
neighbor LeafToSpinev4 route-map spine_v4_export out
neighbor WindowsServer activate
neighbor 172.16.1.2 activate
exit-address-family
!
address-family ipv6 unicast
aggregate-address 2000:133:1::/48 as-set
aggregate-address 2000:133:2::/48 as-set
aggregate-address 2000:133:3::/48 as-set
aggregate-address 2000:133:4::/48 as-set
redistribute connected
..
------------
(gdb) bt
name=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add")
at bgpd/bgpd.c:1159
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
pi=<optimized out>) at bgpd/bgp_route.c:313
afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_UNICAST,
p=p@entry=0x55607f1c4e10, origin=<optimized out>, aspath=0x55607f4bc8a0,
community=<optimized out>, ecommunity=<optimized out>,
lcommunity=<optimized out>, atomic_aggregate=0 '\000',
aggregate=0x55607f1c4ee0) at bgpd/bgp_route.c:5926
aggr_p=<optimized out>, aggregate=<optimized out>, pi=0x55607f41f9f0,
safi=SAFI_UNICAST, afi=AFI_IP, bgp=0x55607eeba5d0) at bgpd/bgp_route.c:6385
del=del@entry=0x55607f41f9f0, afi=afi@entry=AFI_IP,
--Type <return> to continue, or q <return> to quit--
safi=safi@entry=SAFI_UNICAST) at bgpd/bgp_route.c:6446
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
data=<optimized out>) at bgpd/bgp_route.c:4125
at lib/workqueue.c:291
at lib/thread.c:1540
at bgpd/bgp_main.c:498
(gdb) fr 5
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
1158 bgpd/bgpd.c: No such file or directory.
(gdb) fr 10
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
2885 bgpd/bgp_route.c: No such file or directory.
(gdb) p peer->lock
$2 = 210
(gdb) p peer->status
$3 = 8
(gdb)
(gdb) p bgp
$11 = (struct bgp *) 0x56121ba315d0
(gdb) p bgp->peer_self
$12 = (struct peer *) 0x0
(gdb) p bgp->name
$13 = 0x0
(gdb) p bgp->name_pretty
$14 = 0x56121bb046a0 "VRF default"
(gdb) p bgp->inst_type
$15 = BGP_INSTANCE_TYPE_DEFAULT
(gdb)

bgp_aggregate_install():
5920
5921 new = info_make(ZEBRA_ROUTE_BGP, BGP_ROUTE_AGGREGATE, 0,
5922 bgp->peer_self, attr, rn);
5923
5924 SET_FLAG(new->flags, BGP_PATH_VALID);
5925
5926 bgp_path_info_add(rn, new);
5927 bgp_process(bgp, rn, afi, safi);

299 void bgp_path_info_add(struct bgp_node *rn, struct bgp_path_info *pi):
...
310
311 bgp_path_info_lock(pi);
312 bgp_lock_node(rn);
313 peer_lock(pi->peer); /* bgp_path_info peer reference */ <<< This points to bgp->peer_self = NULL
314 }

1573 #define peer_lock(B) peer_lock_with_caller(_FUNCTION_, (B))

1156 /* increase reference count on a struct peer */
1157 struct peer *peer_lock_with_caller(const char *name, struct peer *peer)
1158 {
1159 assert(peer && (peer->lock >= 0)); <<< asserted here
1160

Similar issue was fixed in community and we already have the fix:
FRRouting#4816
root@sr407497_lxc2:/home/ubuntu/frr_repo/frr/bgpd# git diff dfb6fd1~ dfb6fd1
diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
index abad1db..a372568 100644
— a/bgpd/bgp_route.c
+++ b/bgpd/bgp_route.c
@@ -5332,6 +5332,13 @@ static void bgp_purge_af_static_redist_routes(struct bgp *bgp, afi_t afi,
struct bgp_node *rn;
struct bgp_path_info *pi;

+ /* Do not install the aggregate route if BGP is in the
+ * process of termination.
+ */
+ if (bgp_flag_check(bgp, BGP_FLAG_DELETE_IN_PROGRESS) ||
+ (bgp->peer_self == NULL))
+ return;
+
table = bgp->rib[afi][safi];
for (rn = bgp_table_top(table); rn; rn = bgp_route_next(rn)) {
for (pi = bgp_node_get_bgp_path_info(rn); pi; pi = pi->next) {

But looks like similar handling is required at other places as well:

Expected Behavior :
BGP daemon should not crash

Signed-off-by: sudhanshukumar22 <sudhanshu.kumar@broadcom.com>
sudhanshukumar22 added a commit to sudhanshukumar22/frr that referenced this pull request Nov 5, 2020
Description:
aggregate member route was enqueued for recalculation
    while bgp instance was deleted.
    As part of aggregate member route deletion, the aggregate route is
    reinstalled with self-peer as source, but self-peer is already removed.
    Assert() for null peer pointer is path attribute aborts bgp.
Problem Description/Summary :
BGP crashed while cleaning up aggregate route as part of bgp instance deletion.
-----------------------
Leaf-4(config)#
Leaf-4(config)# no router bgp 65179 vrf Vrf-red
Leaf-4(config)# no router bgp 65179
Leaf-4(config)#
Leaf-4(config)#
Leaf-4(config)# root@Leaf-4:~#

Sep 26 15:38:21.257554 System is not ready - Core services are down
------------
router bgp 65179
bgp router-id 100.2.0.3
no bgp default ipv4-unicast
bgp network import-check
neighbor LeafToHostv4 peer-group
neighbor LeafToHostv4 remote-as 65003
neighbor LeafToHostv6 peer-group
neighbor LeafToHostv6 remote-as 65003
neighbor LeafToSpinev4 peer-group
neighbor LeafToSpinev4 remote-as 65134
neighbor LeafToSpinev4 bfd
neighbor LeafToSpinev6 peer-group
neighbor LeafToSpinev6 remote-as 65134
neighbor LeafToSpinev6 bfd
neighbor WindowsServer peer-group
neighbor WindowsServer remote-as 65201
neighbor 155.1.0.4 peer-group LeafToSpinev4
neighbor 155.2.0.4 peer-group LeafToSpinev4
neighbor 2000:155:1::4 peer-group LeafToSpinev6
neighbor 2000:155:2::4 peer-group LeafToSpinev6
neighbor 172.16.11.2 peer-group WindowsServer
neighbor 172.16.1.2 remote-as 65101
neighbor 2000:172:16:1::2 remote-as 65101
bgp listen limit 400
bgp listen range 133.3.0.0/16 peer-group LeafToHostv4
bgp listen range 2000:133:3::/48 peer-group LeafToHostv6
!
address-family ipv4 unicast
aggregate-address 133.1.0.0/16 as-set
aggregate-address 133.2.0.0/16 as-set
aggregate-address 133.3.0.0/16 as-set
aggregate-address 133.4.0.0/16 as-set
redistribute connected
neighbor LeafToHostv4 activate
neighbor LeafToSpinev4 activate
neighbor LeafToSpinev4 allowas-in 1
neighbor LeafToSpinev4 route-map spine_v4_export out
neighbor WindowsServer activate
neighbor 172.16.1.2 activate
exit-address-family
!
address-family ipv6 unicast
aggregate-address 2000:133:1::/48 as-set
aggregate-address 2000:133:2::/48 as-set
aggregate-address 2000:133:3::/48 as-set
aggregate-address 2000:133:4::/48 as-set
redistribute connected
..
------------
(gdb) bt
name=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add")
at bgpd/bgpd.c:1159
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
pi=<optimized out>) at bgpd/bgp_route.c:313
afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_UNICAST,
p=p@entry=0x55607f1c4e10, origin=<optimized out>, aspath=0x55607f4bc8a0,
community=<optimized out>, ecommunity=<optimized out>,
lcommunity=<optimized out>, atomic_aggregate=0 '\000',
aggregate=0x55607f1c4ee0) at bgpd/bgp_route.c:5926
aggr_p=<optimized out>, aggregate=<optimized out>, pi=0x55607f41f9f0,
safi=SAFI_UNICAST, afi=AFI_IP, bgp=0x55607eeba5d0) at bgpd/bgp_route.c:6385
del=del@entry=0x55607f41f9f0, afi=afi@entry=AFI_IP,
--Type <return> to continue, or q <return> to quit--
safi=safi@entry=SAFI_UNICAST) at bgpd/bgp_route.c:6446
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
data=<optimized out>) at bgpd/bgp_route.c:4125
at lib/workqueue.c:291
at lib/thread.c:1540
at bgpd/bgp_main.c:498
(gdb) fr 5
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
1158 bgpd/bgpd.c: No such file or directory.
(gdb) fr 10
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
2885 bgpd/bgp_route.c: No such file or directory.
(gdb) p peer->lock
$2 = 210
(gdb) p peer->status
$3 = 8
(gdb)
(gdb) p bgp
$11 = (struct bgp *) 0x56121ba315d0
(gdb) p bgp->peer_self
$12 = (struct peer *) 0x0
(gdb) p bgp->name
$13 = 0x0
(gdb) p bgp->name_pretty
$14 = 0x56121bb046a0 "VRF default"
(gdb) p bgp->inst_type
$15 = BGP_INSTANCE_TYPE_DEFAULT
(gdb)

bgp_aggregate_install():
5920
5921 new = info_make(ZEBRA_ROUTE_BGP, BGP_ROUTE_AGGREGATE, 0,
5922 bgp->peer_self, attr, rn);
5923
5924 SET_FLAG(new->flags, BGP_PATH_VALID);
5925
5926 bgp_path_info_add(rn, new);
5927 bgp_process(bgp, rn, afi, safi);

299 void bgp_path_info_add(struct bgp_node *rn, struct bgp_path_info *pi):
...
310
311 bgp_path_info_lock(pi);
312 bgp_lock_node(rn);
313 peer_lock(pi->peer); /* bgp_path_info peer reference */ <<< This points to bgp->peer_self = NULL
314 }

1573 #define peer_lock(B) peer_lock_with_caller(_FUNCTION_, (B))

1156 /* increase reference count on a struct peer */
1157 struct peer *peer_lock_with_caller(const char *name, struct peer *peer)
1158 {
1159 assert(peer && (peer->lock >= 0)); <<< asserted here
1160

Similar issue was fixed in community and we already have the fix:
FRRouting#4816
root@sr407497_lxc2:/home/ubuntu/frr_repo/frr/bgpd# git diff dfb6fd1~ dfb6fd1
diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
index abad1db..a372568 100644
— a/bgpd/bgp_route.c
+++ b/bgpd/bgp_route.c
@@ -5332,6 +5332,13 @@ static void bgp_purge_af_static_redist_routes(struct bgp *bgp, afi_t afi,
struct bgp_node *rn;
struct bgp_path_info *pi;

+ /* Do not install the aggregate route if BGP is in the
+ * process of termination.
+ */
+ if (bgp_flag_check(bgp, BGP_FLAG_DELETE_IN_PROGRESS) ||
+ (bgp->peer_self == NULL))
+ return;
+
table = bgp->rib[afi][safi];
for (rn = bgp_table_top(table); rn; rn = bgp_route_next(rn)) {
for (pi = bgp_node_get_bgp_path_info(rn); pi; pi = pi->next) {

But looks like similar handling is required at other places as well:

Expected Behavior :
BGP daemon should not crash

Signed-off-by: sudhanshukumar22 <sudhanshu.kumar@broadcom.com>
sudhanshukumar22 added a commit to sudhanshukumar22/frr that referenced this pull request Nov 5, 2020
Description:
aggregate member route was enqueued for recalculation
    while bgp instance was deleted.
    As part of aggregate member route deletion, the aggregate route is
    reinstalled with self-peer as source, but self-peer is already removed.
    Assert() for null peer pointer is path attribute aborts bgp.
Problem Description/Summary :
BGP crashed while cleaning up aggregate route as part of bgp instance deletion.
-----------------------
Leaf-4(config)#
Leaf-4(config)# no router bgp 65179 vrf Vrf-red
Leaf-4(config)# no router bgp 65179
Leaf-4(config)#
Leaf-4(config)#
Leaf-4(config)# root@Leaf-4:~#

Sep 26 15:38:21.257554 System is not ready - Core services are down
------------
router bgp 65179
bgp router-id 100.2.0.3
no bgp default ipv4-unicast
bgp network import-check
neighbor LeafToHostv4 peer-group
neighbor LeafToHostv4 remote-as 65003
neighbor LeafToHostv6 peer-group
neighbor LeafToHostv6 remote-as 65003
neighbor LeafToSpinev4 peer-group
neighbor LeafToSpinev4 remote-as 65134
neighbor LeafToSpinev4 bfd
neighbor LeafToSpinev6 peer-group
neighbor LeafToSpinev6 remote-as 65134
neighbor LeafToSpinev6 bfd
neighbor WindowsServer peer-group
neighbor WindowsServer remote-as 65201
neighbor 155.1.0.4 peer-group LeafToSpinev4
neighbor 155.2.0.4 peer-group LeafToSpinev4
neighbor 2000:155:1::4 peer-group LeafToSpinev6
neighbor 2000:155:2::4 peer-group LeafToSpinev6
neighbor 172.16.11.2 peer-group WindowsServer
neighbor 172.16.1.2 remote-as 65101
neighbor 2000:172:16:1::2 remote-as 65101
bgp listen limit 400
bgp listen range 133.3.0.0/16 peer-group LeafToHostv4
bgp listen range 2000:133:3::/48 peer-group LeafToHostv6
!
address-family ipv4 unicast
aggregate-address 133.1.0.0/16 as-set
aggregate-address 133.2.0.0/16 as-set
aggregate-address 133.3.0.0/16 as-set
aggregate-address 133.4.0.0/16 as-set
redistribute connected
neighbor LeafToHostv4 activate
neighbor LeafToSpinev4 activate
neighbor LeafToSpinev4 allowas-in 1
neighbor LeafToSpinev4 route-map spine_v4_export out
neighbor WindowsServer activate
neighbor 172.16.1.2 activate
exit-address-family
!
address-family ipv6 unicast
aggregate-address 2000:133:1::/48 as-set
aggregate-address 2000:133:2::/48 as-set
aggregate-address 2000:133:3::/48 as-set
aggregate-address 2000:133:4::/48 as-set
redistribute connected
..
------------
(gdb) bt
name=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add")
at bgpd/bgpd.c:1159
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
pi=<optimized out>) at bgpd/bgp_route.c:313
afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_UNICAST,
p=p@entry=0x55607f1c4e10, origin=<optimized out>, aspath=0x55607f4bc8a0,
community=<optimized out>, ecommunity=<optimized out>,
lcommunity=<optimized out>, atomic_aggregate=0 '\000',
aggregate=0x55607f1c4ee0) at bgpd/bgp_route.c:5926
aggr_p=<optimized out>, aggregate=<optimized out>, pi=0x55607f41f9f0,
safi=SAFI_UNICAST, afi=AFI_IP, bgp=0x55607eeba5d0) at bgpd/bgp_route.c:6385
del=del@entry=0x55607f41f9f0, afi=afi@entry=AFI_IP,
--Type <return> to continue, or q <return> to quit--
safi=safi@entry=SAFI_UNICAST) at bgpd/bgp_route.c:6446
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
data=<optimized out>) at bgpd/bgp_route.c:4125
at lib/workqueue.c:291
at lib/thread.c:1540
at bgpd/bgp_main.c:498
(gdb) fr 5
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
1158 bgpd/bgpd.c: No such file or directory.
(gdb) fr 10
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
2885 bgpd/bgp_route.c: No such file or directory.
(gdb) p peer->lock
$2 = 210
(gdb) p peer->status
$3 = 8
(gdb)
(gdb) p bgp
$11 = (struct bgp *) 0x56121ba315d0
(gdb) p bgp->peer_self
$12 = (struct peer *) 0x0
(gdb) p bgp->name
$13 = 0x0
(gdb) p bgp->name_pretty
$14 = 0x56121bb046a0 "VRF default"
(gdb) p bgp->inst_type
$15 = BGP_INSTANCE_TYPE_DEFAULT
(gdb)

bgp_aggregate_install():
5920
5921 new = info_make(ZEBRA_ROUTE_BGP, BGP_ROUTE_AGGREGATE, 0,
5922 bgp->peer_self, attr, rn);
5923
5924 SET_FLAG(new->flags, BGP_PATH_VALID);
5925
5926 bgp_path_info_add(rn, new);
5927 bgp_process(bgp, rn, afi, safi);

299 void bgp_path_info_add(struct bgp_node *rn, struct bgp_path_info *pi):
...
310
311 bgp_path_info_lock(pi);
312 bgp_lock_node(rn);
313 peer_lock(pi->peer); /* bgp_path_info peer reference */ <<< This points to bgp->peer_self = NULL
314 }

1573 #define peer_lock(B) peer_lock_with_caller(_FUNCTION_, (B))

1156 /* increase reference count on a struct peer */
1157 struct peer *peer_lock_with_caller(const char *name, struct peer *peer)
1158 {
1159 assert(peer && (peer->lock >= 0)); <<< asserted here
1160

Similar issue was fixed in community and we already have the fix:
FRRouting#4816
root@sr407497_lxc2:/home/ubuntu/frr_repo/frr/bgpd# git diff dfb6fd1~ dfb6fd1
diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
index abad1db..a372568 100644
— a/bgpd/bgp_route.c
+++ b/bgpd/bgp_route.c
@@ -5332,6 +5332,13 @@ static void bgp_purge_af_static_redist_routes(struct bgp *bgp, afi_t afi,
struct bgp_node *rn;
struct bgp_path_info *pi;

+ /* Do not install the aggregate route if BGP is in the
+ * process of termination.
+ */
+ if (bgp_flag_check(bgp, BGP_FLAG_DELETE_IN_PROGRESS) ||
+ (bgp->peer_self == NULL))
+ return;
+
table = bgp->rib[afi][safi];
for (rn = bgp_table_top(table); rn; rn = bgp_route_next(rn)) {
for (pi = bgp_node_get_bgp_path_info(rn); pi; pi = pi->next) {

But looks like similar handling is required at other places as well:

Expected Behavior :
BGP daemon should not crash

Signed-off-by: sudhanshukumar22 <sudhanshu.kumar@broadcom.com>
sudhanshukumar22 added a commit to sudhanshukumar22/frr that referenced this pull request Jan 15, 2021
Description:
aggregate member route was enqueued for recalculation
    while bgp instance was deleted.
    As part of aggregate member route deletion, the aggregate route is
    reinstalled with self-peer as source, but self-peer is already removed.
    Assert() for null peer pointer is path attribute aborts bgp.
Problem Description/Summary :
BGP crashed while cleaning up aggregate route as part of bgp instance deletion.
-----------------------
Leaf-4(config)#
Leaf-4(config)# no router bgp 65179 vrf Vrf-red
Leaf-4(config)# no router bgp 65179
Leaf-4(config)#
Leaf-4(config)#
Leaf-4(config)# root@Leaf-4:~#

Sep 26 15:38:21.257554 System is not ready - Core services are down
------------
router bgp 65179
bgp router-id 100.2.0.3
no bgp default ipv4-unicast
bgp network import-check
neighbor LeafToHostv4 peer-group
neighbor LeafToHostv4 remote-as 65003
neighbor LeafToHostv6 peer-group
neighbor LeafToHostv6 remote-as 65003
neighbor LeafToSpinev4 peer-group
neighbor LeafToSpinev4 remote-as 65134
neighbor LeafToSpinev4 bfd
neighbor LeafToSpinev6 peer-group
neighbor LeafToSpinev6 remote-as 65134
neighbor LeafToSpinev6 bfd
neighbor WindowsServer peer-group
neighbor WindowsServer remote-as 65201
neighbor 155.1.0.4 peer-group LeafToSpinev4
neighbor 155.2.0.4 peer-group LeafToSpinev4
neighbor 2000:155:1::4 peer-group LeafToSpinev6
neighbor 2000:155:2::4 peer-group LeafToSpinev6
neighbor 172.16.11.2 peer-group WindowsServer
neighbor 172.16.1.2 remote-as 65101
neighbor 2000:172:16:1::2 remote-as 65101
bgp listen limit 400
bgp listen range 133.3.0.0/16 peer-group LeafToHostv4
bgp listen range 2000:133:3::/48 peer-group LeafToHostv6
!
address-family ipv4 unicast
aggregate-address 133.1.0.0/16 as-set
aggregate-address 133.2.0.0/16 as-set
aggregate-address 133.3.0.0/16 as-set
aggregate-address 133.4.0.0/16 as-set
redistribute connected
neighbor LeafToHostv4 activate
neighbor LeafToSpinev4 activate
neighbor LeafToSpinev4 allowas-in 1
neighbor LeafToSpinev4 route-map spine_v4_export out
neighbor WindowsServer activate
neighbor 172.16.1.2 activate
exit-address-family
!
address-family ipv6 unicast
aggregate-address 2000:133:1::/48 as-set
aggregate-address 2000:133:2::/48 as-set
aggregate-address 2000:133:3::/48 as-set
aggregate-address 2000:133:4::/48 as-set
redistribute connected
..
------------
(gdb) bt
name=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add")
at bgpd/bgpd.c:1159
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
pi=<optimized out>) at bgpd/bgp_route.c:313
afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_UNICAST,
p=p@entry=0x55607f1c4e10, origin=<optimized out>, aspath=0x55607f4bc8a0,
community=<optimized out>, ecommunity=<optimized out>,
lcommunity=<optimized out>, atomic_aggregate=0 '\000',
aggregate=0x55607f1c4ee0) at bgpd/bgp_route.c:5926
aggr_p=<optimized out>, aggregate=<optimized out>, pi=0x55607f41f9f0,
safi=SAFI_UNICAST, afi=AFI_IP, bgp=0x55607eeba5d0) at bgpd/bgp_route.c:6385
del=del@entry=0x55607f41f9f0, afi=afi@entry=AFI_IP,
--Type <return> to continue, or q <return> to quit--
safi=safi@entry=SAFI_UNICAST) at bgpd/bgp_route.c:6446
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
data=<optimized out>) at bgpd/bgp_route.c:4125
at lib/workqueue.c:291
at lib/thread.c:1540
at bgpd/bgp_main.c:498
(gdb) fr 5
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
1158 bgpd/bgpd.c: No such file or directory.
(gdb) fr 10
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
2885 bgpd/bgp_route.c: No such file or directory.
(gdb) p peer->lock
$2 = 210
(gdb) p peer->status
$3 = 8
(gdb)
(gdb) p bgp
$11 = (struct bgp *) 0x56121ba315d0
(gdb) p bgp->peer_self
$12 = (struct peer *) 0x0
(gdb) p bgp->name
$13 = 0x0
(gdb) p bgp->name_pretty
$14 = 0x56121bb046a0 "VRF default"
(gdb) p bgp->inst_type
$15 = BGP_INSTANCE_TYPE_DEFAULT
(gdb)

bgp_aggregate_install():
5920
5921 new = info_make(ZEBRA_ROUTE_BGP, BGP_ROUTE_AGGREGATE, 0,
5922 bgp->peer_self, attr, rn);
5923
5924 SET_FLAG(new->flags, BGP_PATH_VALID);
5925
5926 bgp_path_info_add(rn, new);
5927 bgp_process(bgp, rn, afi, safi);

299 void bgp_path_info_add(struct bgp_node *rn, struct bgp_path_info *pi):
...
310
311 bgp_path_info_lock(pi);
312 bgp_lock_node(rn);
313 peer_lock(pi->peer); /* bgp_path_info peer reference */ <<< This points to bgp->peer_self = NULL
314 }

1573 #define peer_lock(B) peer_lock_with_caller(_FUNCTION_, (B))

1156 /* increase reference count on a struct peer */
1157 struct peer *peer_lock_with_caller(const char *name, struct peer *peer)
1158 {
1159 assert(peer && (peer->lock >= 0)); <<< asserted here
1160

Similar issue was fixed in community and we already have the fix:
FRRouting#4816
root@sr407497_lxc2:/home/ubuntu/frr_repo/frr/bgpd# git diff dfb6fd1~ dfb6fd1
diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
index abad1db..a372568 100644
— a/bgpd/bgp_route.c
+++ b/bgpd/bgp_route.c
@@ -5332,6 +5332,13 @@ static void bgp_purge_af_static_redist_routes(struct bgp *bgp, afi_t afi,
struct bgp_node *rn;
struct bgp_path_info *pi;

+ /* Do not install the aggregate route if BGP is in the
+ * process of termination.
+ */
+ if (bgp_flag_check(bgp, BGP_FLAG_DELETE_IN_PROGRESS) ||
+ (bgp->peer_self == NULL))
+ return;
+
table = bgp->rib[afi][safi];
for (rn = bgp_table_top(table); rn; rn = bgp_route_next(rn)) {
for (pi = bgp_node_get_bgp_path_info(rn); pi; pi = pi->next) {

But looks like similar handling is required at other places as well:

Expected Behavior :
BGP daemon should not crash

Signed-off-by: sudhanshukumar22 <sudhanshu.kumar@broadcom.com>
sudhanshukumar22 added a commit to sudhanshukumar22/frr that referenced this pull request Jan 18, 2021
Description:
aggregate member route was enqueued for recalculation
    while bgp instance was deleted.
    As part of aggregate member route deletion, the aggregate route is
    reinstalled with self-peer as source, but self-peer is already removed.
    Assert() for null peer pointer is path attribute aborts bgp.
Problem Description/Summary :
BGP crashed while cleaning up aggregate route as part of bgp instance deletion.
-----------------------
Leaf-4(config)#
Leaf-4(config)# no router bgp 65179 vrf Vrf-red
Leaf-4(config)# no router bgp 65179
Leaf-4(config)#
Leaf-4(config)#
Leaf-4(config)# root@Leaf-4:~#

Sep 26 15:38:21.257554 System is not ready - Core services are down
------------
router bgp 65179
bgp router-id 100.2.0.3
no bgp default ipv4-unicast
bgp network import-check
neighbor LeafToHostv4 peer-group
neighbor LeafToHostv4 remote-as 65003
neighbor LeafToHostv6 peer-group
neighbor LeafToHostv6 remote-as 65003
neighbor LeafToSpinev4 peer-group
neighbor LeafToSpinev4 remote-as 65134
neighbor LeafToSpinev4 bfd
neighbor LeafToSpinev6 peer-group
neighbor LeafToSpinev6 remote-as 65134
neighbor LeafToSpinev6 bfd
neighbor WindowsServer peer-group
neighbor WindowsServer remote-as 65201
neighbor 155.1.0.4 peer-group LeafToSpinev4
neighbor 155.2.0.4 peer-group LeafToSpinev4
neighbor 2000:155:1::4 peer-group LeafToSpinev6
neighbor 2000:155:2::4 peer-group LeafToSpinev6
neighbor 172.16.11.2 peer-group WindowsServer
neighbor 172.16.1.2 remote-as 65101
neighbor 2000:172:16:1::2 remote-as 65101
bgp listen limit 400
bgp listen range 133.3.0.0/16 peer-group LeafToHostv4
bgp listen range 2000:133:3::/48 peer-group LeafToHostv6
!
address-family ipv4 unicast
aggregate-address 133.1.0.0/16 as-set
aggregate-address 133.2.0.0/16 as-set
aggregate-address 133.3.0.0/16 as-set
aggregate-address 133.4.0.0/16 as-set
redistribute connected
neighbor LeafToHostv4 activate
neighbor LeafToSpinev4 activate
neighbor LeafToSpinev4 allowas-in 1
neighbor LeafToSpinev4 route-map spine_v4_export out
neighbor WindowsServer activate
neighbor 172.16.1.2 activate
exit-address-family
!
address-family ipv6 unicast
aggregate-address 2000:133:1::/48 as-set
aggregate-address 2000:133:2::/48 as-set
aggregate-address 2000:133:3::/48 as-set
aggregate-address 2000:133:4::/48 as-set
redistribute connected
..
------------
(gdb) bt
name=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add")
at bgpd/bgpd.c:1159
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
pi=<optimized out>) at bgpd/bgp_route.c:313
afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_UNICAST,
p=p@entry=0x55607f1c4e10, origin=<optimized out>, aspath=0x55607f4bc8a0,
community=<optimized out>, ecommunity=<optimized out>,
lcommunity=<optimized out>, atomic_aggregate=0 '\000',
aggregate=0x55607f1c4ee0) at bgpd/bgp_route.c:5926
aggr_p=<optimized out>, aggregate=<optimized out>, pi=0x55607f41f9f0,
safi=SAFI_UNICAST, afi=AFI_IP, bgp=0x55607eeba5d0) at bgpd/bgp_route.c:6385
del=del@entry=0x55607f41f9f0, afi=afi@entry=AFI_IP,
--Type <return> to continue, or q <return> to quit--
safi=safi@entry=SAFI_UNICAST) at bgpd/bgp_route.c:6446
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
data=<optimized out>) at bgpd/bgp_route.c:4125
at lib/workqueue.c:291
at lib/thread.c:1540
at bgpd/bgp_main.c:498
(gdb) fr 5
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
1158 bgpd/bgpd.c: No such file or directory.
(gdb) fr 10
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
2885 bgpd/bgp_route.c: No such file or directory.
(gdb) p peer->lock
$2 = 210
(gdb) p peer->status
$3 = 8
(gdb)
(gdb) p bgp
$11 = (struct bgp *) 0x56121ba315d0
(gdb) p bgp->peer_self
$12 = (struct peer *) 0x0
(gdb) p bgp->name
$13 = 0x0
(gdb) p bgp->name_pretty
$14 = 0x56121bb046a0 "VRF default"
(gdb) p bgp->inst_type
$15 = BGP_INSTANCE_TYPE_DEFAULT
(gdb)

bgp_aggregate_install():
5920
5921 new = info_make(ZEBRA_ROUTE_BGP, BGP_ROUTE_AGGREGATE, 0,
5922 bgp->peer_self, attr, rn);
5923
5924 SET_FLAG(new->flags, BGP_PATH_VALID);
5925
5926 bgp_path_info_add(rn, new);
5927 bgp_process(bgp, rn, afi, safi);

299 void bgp_path_info_add(struct bgp_node *rn, struct bgp_path_info *pi):
...
310
311 bgp_path_info_lock(pi);
312 bgp_lock_node(rn);
313 peer_lock(pi->peer); /* bgp_path_info peer reference */ <<< This points to bgp->peer_self = NULL
314 }

1573 #define peer_lock(B) peer_lock_with_caller(_FUNCTION_, (B))

1156 /* increase reference count on a struct peer */
1157 struct peer *peer_lock_with_caller(const char *name, struct peer *peer)
1158 {
1159 assert(peer && (peer->lock >= 0)); <<< asserted here
1160

Similar issue was fixed in community and we already have the fix:
FRRouting#4816
root@sr407497_lxc2:/home/ubuntu/frr_repo/frr/bgpd# git diff dfb6fd1~ dfb6fd1
diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
index abad1db..a372568 100644
— a/bgpd/bgp_route.c
+++ b/bgpd/bgp_route.c
@@ -5332,6 +5332,13 @@ static void bgp_purge_af_static_redist_routes(struct bgp *bgp, afi_t afi,
struct bgp_node *rn;
struct bgp_path_info *pi;

+ /* Do not install the aggregate route if BGP is in the
+ * process of termination.
+ */
+ if (bgp_flag_check(bgp, BGP_FLAG_DELETE_IN_PROGRESS) ||
+ (bgp->peer_self == NULL))
+ return;
+
table = bgp->rib[afi][safi];
for (rn = bgp_table_top(table); rn; rn = bgp_route_next(rn)) {
for (pi = bgp_node_get_bgp_path_info(rn); pi; pi = pi->next) {

But looks like similar handling is required at other places as well:

Expected Behavior :
BGP daemon should not crash

Signed-off-by: sudhanshukumar22 <sudhanshu.kumar@broadcom.com>
sudhanshukumar22 added a commit to sudhanshukumar22/frr that referenced this pull request Jan 18, 2021
Description:
aggregate member route was enqueued for recalculation
    while bgp instance was deleted.
    As part of aggregate member route deletion, the aggregate route is
    reinstalled with self-peer as source, but self-peer is already removed.
    Assert() for null peer pointer is path attribute aborts bgp.
Problem Description/Summary :
BGP crashed while cleaning up aggregate route as part of bgp instance deletion.
-----------------------
Leaf-4(config)#
Leaf-4(config)# no router bgp 65179 vrf Vrf-red
Leaf-4(config)# no router bgp 65179
Leaf-4(config)#
Leaf-4(config)#
Leaf-4(config)# root@Leaf-4:~#

Sep 26 15:38:21.257554 System is not ready - Core services are down
------------
router bgp 65179
bgp router-id 100.2.0.3
no bgp default ipv4-unicast
bgp network import-check
neighbor LeafToHostv4 peer-group
neighbor LeafToHostv4 remote-as 65003
neighbor LeafToHostv6 peer-group
neighbor LeafToHostv6 remote-as 65003
neighbor LeafToSpinev4 peer-group
neighbor LeafToSpinev4 remote-as 65134
neighbor LeafToSpinev4 bfd
neighbor LeafToSpinev6 peer-group
neighbor LeafToSpinev6 remote-as 65134
neighbor LeafToSpinev6 bfd
neighbor WindowsServer peer-group
neighbor WindowsServer remote-as 65201
neighbor 155.1.0.4 peer-group LeafToSpinev4
neighbor 155.2.0.4 peer-group LeafToSpinev4
neighbor 2000:155:1::4 peer-group LeafToSpinev6
neighbor 2000:155:2::4 peer-group LeafToSpinev6
neighbor 172.16.11.2 peer-group WindowsServer
neighbor 172.16.1.2 remote-as 65101
neighbor 2000:172:16:1::2 remote-as 65101
bgp listen limit 400
bgp listen range 133.3.0.0/16 peer-group LeafToHostv4
bgp listen range 2000:133:3::/48 peer-group LeafToHostv6
!
address-family ipv4 unicast
aggregate-address 133.1.0.0/16 as-set
aggregate-address 133.2.0.0/16 as-set
aggregate-address 133.3.0.0/16 as-set
aggregate-address 133.4.0.0/16 as-set
redistribute connected
neighbor LeafToHostv4 activate
neighbor LeafToSpinev4 activate
neighbor LeafToSpinev4 allowas-in 1
neighbor LeafToSpinev4 route-map spine_v4_export out
neighbor WindowsServer activate
neighbor 172.16.1.2 activate
exit-address-family
!
address-family ipv6 unicast
aggregate-address 2000:133:1::/48 as-set
aggregate-address 2000:133:2::/48 as-set
aggregate-address 2000:133:3::/48 as-set
aggregate-address 2000:133:4::/48 as-set
redistribute connected
..
------------
(gdb) bt
name=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add")
at bgpd/bgpd.c:1159
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
pi=<optimized out>) at bgpd/bgp_route.c:313
afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_UNICAST,
p=p@entry=0x55607f1c4e10, origin=<optimized out>, aspath=0x55607f4bc8a0,
community=<optimized out>, ecommunity=<optimized out>,
lcommunity=<optimized out>, atomic_aggregate=0 '\000',
aggregate=0x55607f1c4ee0) at bgpd/bgp_route.c:5926
aggr_p=<optimized out>, aggregate=<optimized out>, pi=0x55607f41f9f0,
safi=SAFI_UNICAST, afi=AFI_IP, bgp=0x55607eeba5d0) at bgpd/bgp_route.c:6385
del=del@entry=0x55607f41f9f0, afi=afi@entry=AFI_IP,
--Type <return> to continue, or q <return> to quit--
safi=safi@entry=SAFI_UNICAST) at bgpd/bgp_route.c:6446
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
data=<optimized out>) at bgpd/bgp_route.c:4125
at lib/workqueue.c:291
at lib/thread.c:1540
at bgpd/bgp_main.c:498
(gdb) fr 5
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
1158 bgpd/bgpd.c: No such file or directory.
(gdb) fr 10
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
2885 bgpd/bgp_route.c: No such file or directory.
(gdb) p peer->lock
$2 = 210
(gdb) p peer->status
$3 = 8
(gdb)
(gdb) p bgp
$11 = (struct bgp *) 0x56121ba315d0
(gdb) p bgp->peer_self
$12 = (struct peer *) 0x0
(gdb) p bgp->name
$13 = 0x0
(gdb) p bgp->name_pretty
$14 = 0x56121bb046a0 "VRF default"
(gdb) p bgp->inst_type
$15 = BGP_INSTANCE_TYPE_DEFAULT
(gdb)

bgp_aggregate_install():
5920
5921 new = info_make(ZEBRA_ROUTE_BGP, BGP_ROUTE_AGGREGATE, 0,
5922 bgp->peer_self, attr, rn);
5923
5924 SET_FLAG(new->flags, BGP_PATH_VALID);
5925
5926 bgp_path_info_add(rn, new);
5927 bgp_process(bgp, rn, afi, safi);

299 void bgp_path_info_add(struct bgp_node *rn, struct bgp_path_info *pi):
...
310
311 bgp_path_info_lock(pi);
312 bgp_lock_node(rn);
313 peer_lock(pi->peer); /* bgp_path_info peer reference */ <<< This points to bgp->peer_self = NULL
314 }

1573 #define peer_lock(B) peer_lock_with_caller(_FUNCTION_, (B))

1156 /* increase reference count on a struct peer */
1157 struct peer *peer_lock_with_caller(const char *name, struct peer *peer)
1158 {
1159 assert(peer && (peer->lock >= 0)); <<< asserted here
1160

Similar issue was fixed in community and we already have the fix:
FRRouting#4816
root@sr407497_lxc2:/home/ubuntu/frr_repo/frr/bgpd# git diff dfb6fd1~ dfb6fd1
diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
index abad1db..a372568 100644
— a/bgpd/bgp_route.c
+++ b/bgpd/bgp_route.c
@@ -5332,6 +5332,13 @@ static void bgp_purge_af_static_redist_routes(struct bgp *bgp, afi_t afi,
struct bgp_node *rn;
struct bgp_path_info *pi;

+ /* Do not install the aggregate route if BGP is in the
+ * process of termination.
+ */
+ if (bgp_flag_check(bgp, BGP_FLAG_DELETE_IN_PROGRESS) ||
+ (bgp->peer_self == NULL))
+ return;
+
table = bgp->rib[afi][safi];
for (rn = bgp_table_top(table); rn; rn = bgp_route_next(rn)) {
for (pi = bgp_node_get_bgp_path_info(rn); pi; pi = pi->next) {

But looks like similar handling is required at other places as well:

Expected Behavior :
BGP daemon should not crash

Signed-off-by: sudhanshukumar22 <sudhanshu.kumar@broadcom.com>
sudhanshukumar22 added a commit to sudhanshukumar22/frr that referenced this pull request Jan 18, 2021
Description:
aggregate member route was enqueued for recalculation
    while bgp instance was deleted.
    As part of aggregate member route deletion, the aggregate route is
    reinstalled with self-peer as source, but self-peer is already removed.
    Assert() for null peer pointer is path attribute aborts bgp.
Problem Description/Summary :
BGP crashed while cleaning up aggregate route as part of bgp instance deletion.
-----------------------
Leaf-4(config)#
Leaf-4(config)# no router bgp 65179 vrf Vrf-red
Leaf-4(config)# no router bgp 65179
Leaf-4(config)#
Leaf-4(config)#
Leaf-4(config)# root@Leaf-4:~#

Sep 26 15:38:21.257554 System is not ready - Core services are down
------------
router bgp 65179
bgp router-id 100.2.0.3
no bgp default ipv4-unicast
bgp network import-check
neighbor LeafToHostv4 peer-group
neighbor LeafToHostv4 remote-as 65003
neighbor LeafToHostv6 peer-group
neighbor LeafToHostv6 remote-as 65003
neighbor LeafToSpinev4 peer-group
neighbor LeafToSpinev4 remote-as 65134
neighbor LeafToSpinev4 bfd
neighbor LeafToSpinev6 peer-group
neighbor LeafToSpinev6 remote-as 65134
neighbor LeafToSpinev6 bfd
neighbor WindowsServer peer-group
neighbor WindowsServer remote-as 65201
neighbor 155.1.0.4 peer-group LeafToSpinev4
neighbor 155.2.0.4 peer-group LeafToSpinev4
neighbor 2000:155:1::4 peer-group LeafToSpinev6
neighbor 2000:155:2::4 peer-group LeafToSpinev6
neighbor 172.16.11.2 peer-group WindowsServer
neighbor 172.16.1.2 remote-as 65101
neighbor 2000:172:16:1::2 remote-as 65101
bgp listen limit 400
bgp listen range 133.3.0.0/16 peer-group LeafToHostv4
bgp listen range 2000:133:3::/48 peer-group LeafToHostv6
!
address-family ipv4 unicast
aggregate-address 133.1.0.0/16 as-set
aggregate-address 133.2.0.0/16 as-set
aggregate-address 133.3.0.0/16 as-set
aggregate-address 133.4.0.0/16 as-set
redistribute connected
neighbor LeafToHostv4 activate
neighbor LeafToSpinev4 activate
neighbor LeafToSpinev4 allowas-in 1
neighbor LeafToSpinev4 route-map spine_v4_export out
neighbor WindowsServer activate
neighbor 172.16.1.2 activate
exit-address-family
!
address-family ipv6 unicast
aggregate-address 2000:133:1::/48 as-set
aggregate-address 2000:133:2::/48 as-set
aggregate-address 2000:133:3::/48 as-set
aggregate-address 2000:133:4::/48 as-set
redistribute connected
..
------------
(gdb) bt
name=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add")
at bgpd/bgpd.c:1159
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
pi=<optimized out>) at bgpd/bgp_route.c:313
afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_UNICAST,
p=p@entry=0x55607f1c4e10, origin=<optimized out>, aspath=0x55607f4bc8a0,
community=<optimized out>, ecommunity=<optimized out>,
lcommunity=<optimized out>, atomic_aggregate=0 '\000',
aggregate=0x55607f1c4ee0) at bgpd/bgp_route.c:5926
aggr_p=<optimized out>, aggregate=<optimized out>, pi=0x55607f41f9f0,
safi=SAFI_UNICAST, afi=AFI_IP, bgp=0x55607eeba5d0) at bgpd/bgp_route.c:6385
del=del@entry=0x55607f41f9f0, afi=afi@entry=AFI_IP,
--Type <return> to continue, or q <return> to quit--
safi=safi@entry=SAFI_UNICAST) at bgpd/bgp_route.c:6446
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
data=<optimized out>) at bgpd/bgp_route.c:4125
at lib/workqueue.c:291
at lib/thread.c:1540
at bgpd/bgp_main.c:498
(gdb) fr 5
name=name@entry=0x55607dd49090 <_FUNCTION_.23915> "bgp_path_info_add",
peer=<optimized out>) at bgpd/bgpd.c:1158
1158 bgpd/bgpd.c: No such file or directory.
(gdb) fr 10
pi=0x55607f41f9f0, peer=0x55607ef22c10, afi=AFI_IP, safi=SAFI_UNICAST)
at bgpd/bgp_route.c:2885
2885 bgpd/bgp_route.c: No such file or directory.
(gdb) p peer->lock
$2 = 210
(gdb) p peer->status
$3 = 8
(gdb)
(gdb) p bgp
$11 = (struct bgp *) 0x56121ba315d0
(gdb) p bgp->peer_self
$12 = (struct peer *) 0x0
(gdb) p bgp->name
$13 = 0x0
(gdb) p bgp->name_pretty
$14 = 0x56121bb046a0 "VRF default"
(gdb) p bgp->inst_type
$15 = BGP_INSTANCE_TYPE_DEFAULT
(gdb)

bgp_aggregate_install():
5920
5921 new = info_make(ZEBRA_ROUTE_BGP, BGP_ROUTE_AGGREGATE, 0,
5922 bgp->peer_self, attr, rn);
5923
5924 SET_FLAG(new->flags, BGP_PATH_VALID);
5925
5926 bgp_path_info_add(rn, new);
5927 bgp_process(bgp, rn, afi, safi);

299 void bgp_path_info_add(struct bgp_node *rn, struct bgp_path_info *pi):
...
310
311 bgp_path_info_lock(pi);
312 bgp_lock_node(rn);
313 peer_lock(pi->peer); /* bgp_path_info peer reference */ <<< This points to bgp->peer_self = NULL
314 }

1573 #define peer_lock(B) peer_lock_with_caller(_FUNCTION_, (B))

1156 /* increase reference count on a struct peer */
1157 struct peer *peer_lock_with_caller(const char *name, struct peer *peer)
1158 {
1159 assert(peer && (peer->lock >= 0)); <<< asserted here
1160

Similar issue was fixed in community and we already have the fix:
FRRouting#4816
root@sr407497_lxc2:/home/ubuntu/frr_repo/frr/bgpd# git diff dfb6fd1~ dfb6fd1
diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
index abad1db..a372568 100644
— a/bgpd/bgp_route.c
+++ b/bgpd/bgp_route.c
@@ -5332,6 +5332,13 @@ static void bgp_purge_af_static_redist_routes(struct bgp *bgp, afi_t afi,
struct bgp_node *rn;
struct bgp_path_info *pi;

+ /* Do not install the aggregate route if BGP is in the
+ * process of termination.
+ */
+ if (bgp_flag_check(bgp, BGP_FLAG_DELETE_IN_PROGRESS) ||
+ (bgp->peer_self == NULL))
+ return;
+
table = bgp->rib[afi][safi];
for (rn = bgp_table_top(table); rn; rn = bgp_route_next(rn)) {
for (pi = bgp_node_get_bgp_path_info(rn); pi; pi = pi->next) {

But looks like similar handling is required at other places as well:

Expected Behavior :
BGP daemon should not crash

Signed-off-by: sudhanshukumar22 <sudhanshu.kumar@broadcom.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants