Keelan10 [Tue, 22 Aug 2023 21:00:46 +0000 (01:00 +0400)]
ospfd: fix area range memory leak
Addressed a memory leak in OSPF by fixing the improper deallocation of
area range nodes when removed from the table. Introducing a new function,
`ospf_range_table_node_destroy` for proper node cleanup, resolved the issue.
The ASan leak log for reference:
```
Direct leak of 56 byte(s) in 2 object(s) allocated from:
#0 0x7faf661d1d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
#1 0x7faf65bce1e9 in qcalloc lib/memory.c:105
#2 0x55a66e0b61cd in ospf_area_range_new ospfd/ospf_abr.c:43
#3 0x55a66e0b61cd in ospf_area_range_set ospfd/ospf_abr.c:195
#4 0x55a66e07f2eb in ospf_area_range ospfd/ospf_vty.c:631
#5 0x7faf65b51548 in cmd_execute_command_real lib/command.c:993
#6 0x7faf65b51f79 in cmd_execute_command_strict lib/command.c:1102
#7 0x7faf65b51fd8 in command_config_read_one_line lib/command.c:1262
#8 0x7faf65b522bf in config_from_file lib/command.c:1315
#9 0x7faf65c832df in vty_read_file lib/vty.c:2605
#10 0x7faf65c83409 in vty_read_config lib/vty.c:2851
#11 0x7faf65bb0341 in frr_config_read_in lib/libfrr.c:977
#12 0x7faf65c6cceb in event_call lib/event.c:1979
#13 0x7faf65bb1488 in frr_run lib/libfrr.c:1213
#14 0x55a66dfb28c4 in main ospfd/ospf_main.c:249
#15 0x7faf651c9c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
SUMMARY: AddressSanitizer: 56 byte(s) leaked in 2 allocation(s).
```
Mark Stapp [Fri, 1 Sep 2023 14:06:10 +0000 (10:06 -0400)]
lib,zebra: add tx queuelen to interface struct
Add the txqlen attribute to the common interface struct. Capture
the value in zebra, and distribute it through the interface lib
module's zapi messaging.
The command "show bgp all rpki notfound" includes not only RPKI
notfound routes but also RPKI valid and invalid routes in its results.
Fix the code to display only RPKI notfound routes.
Old output:
```
frr# show bgp all rpki notfound
For address family: IPv4 Unicast
BGP table version is 0, local router ID is 10.0.0.1, vrf id 0
Default local pref 100, local AS 64512
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
N x.x.x.0/18 a.a.a.a 100 0 64513 i
V y.y.y.0/19 a.a.a.a 200 0 64513 i
I z.z.z.0/16 a.a.a.a 10 0 64513 i
Displayed 3 routes and 3 total paths
```
New output:
```
frr# show bgp all rpki notfound
For address family: IPv4 Unicast
BGP table version is 0, local router ID is 10.0.0.1, vrf id 0
Default local pref 100, local AS 64512
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
N x.x.x.0/18 a.a.a.a 100 0 64513 i
```
Donald Sharp [Wed, 30 Aug 2023 11:25:06 +0000 (07:25 -0400)]
bgpd: Add peers back to peer hash when peer_xfer_conn fails
It was noticed that peering occasionally failed in a testbed.
Upon investigation it was found that the peer was not in the
peer hash, and we saw these failure messages:
Aug 25 21:31:15 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: %NOTIFICATION: sent to neighbor 2001:cafe:1ead:4::4 4/0 (Hold Timer Expired) 0 bytes
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 100663299] Can't get remote address and port: Transport endpoint is not connected
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 100663299] %bgp_getsockname() failed for peer 2001:cafe:1ead:4::4 fd 27 (from_peer fd -1)
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 33554464] %Neighbor failed in xfer_conn
Upon looking at the code, the peer_xfer_conn function can fail,
and the bgp_establish code will then return before adding the
peer back to the peer hash.
This is only part of the failure. The peer also appears to
be in a state where it is no longer initiating connection attempts,
but that will be another committed fix when we figure that one out.
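The failure path described above can be modelled in a few lines. This is only a sketch under assumptions: `peer_model`, `transfer_connection` and `establish` are hypothetical names, and the boolean `in_hash` stands in for membership in the real peer hash.

```c
#include <stdbool.h>

/* Hypothetical model: "in_hash" stands in for membership in the peer hash. */
struct peer_model {
	bool in_hash;
};

/* Hypothetical transfer step that the caller cannot assume will succeed. */
static bool transfer_connection(bool simulate_failure)
{
	return !simulate_failure;
}

/*
 * The peer leaves the hash while its connection is swapped. Both the
 * success path and the early-return failure path put it back.
 */
bool establish(struct peer_model *p, bool simulate_failure)
{
	p->in_hash = false; /* released for the duration of the transfer */

	if (!transfer_connection(simulate_failure)) {
		p->in_hash = true; /* the step the failure path was missing */
		return false;
	}

	p->in_hash = true;
	return true;
}
```

Whatever `establish` returns, the peer is findable in the hash afterwards, which is the property the commit restores.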
Donald Sharp [Wed, 30 Aug 2023 14:33:29 +0000 (10:33 -0400)]
ospfd: Prevent use after free( and crash of ospf ) when no router ospf
Consider this config:
router ospf
redistribute kernel
Then you issue:
no router ospf
ospf will crash with a use after free.
The problem is that the events associated with the
ospf pointer were shut off, and then ospf_external_delete
was called, which rescheduled the event. Let's just move
event deletion to the end of the no router ospf.
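The ordering problem can be shown with a toy model. This is a sketch, not ospfd code: `struct instance`, `external_delete` and `cancel_events` are hypothetical stand-ins, and the boolean models a pending event that still points at the instance.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; not FRR's real types or functions. */
struct instance {
	bool timer_scheduled; /* models a pending event that points at this instance */
	int externals;        /* models redistributed routes still attached */
};

/* Removing the external routes reschedules the timer as a side effect. */
static void external_delete(struct instance *inst)
{
	inst->externals = 0;
	inst->timer_scheduled = true;
}

static void cancel_events(struct instance *inst)
{
	inst->timer_scheduled = false;
}

/* Buggy order: cancel first, then a later cleanup step re-arms the timer. */
bool teardown_cancel_first(struct instance *inst)
{
	cancel_events(inst);
	external_delete(inst);
	return inst->timer_scheduled; /* true: an event would outlive the instance */
}

/* Fixed order: run every step that can schedule work, then cancel last. */
bool teardown_cancel_last(struct instance *inst)
{
	external_delete(inst);
	cancel_events(inst);
	return inst->timer_scheduled; /* false: nothing is left to fire after free */
}
```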
Donatas Abraitis [Tue, 29 Aug 2023 12:11:52 +0000 (15:11 +0300)]
bgpd: Add a warning for the operator that keepalive was changed
```
donatas-pc(config-router)# timers bgp 8 12
% keeplive value 8 is larger than 1/3 of the holdtime, setting to 4
donatas-pc(config-router)# do sh run | include timers bgp
timers bgp 4 12
donatas-pc(config-router)#
```
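The adjustment shown in the session above (keepalive 8 with holdtime 12 becomes 4) can be sketched as a small clamp. `clamp_keepalive` is a hypothetical helper, not the bgpd function, and the integer division is an assumption inferred from the example output.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the bgpd function: clamp keepalive to at most
 * one third of holdtime and report whether it had to change.
 */
uint32_t clamp_keepalive(uint32_t keepalive, uint32_t holdtime, int *changed)
{
	uint32_t limit = holdtime / 3; /* integer division, as in 12 / 3 = 4 */

	*changed = keepalive > limit;
	return *changed ? limit : keepalive;
}
```

A caller would print the operator warning only when `*changed` is nonzero, so a value that already fits is accepted silently.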
Philippe Guibert [Mon, 28 Aug 2023 10:23:24 +0000 (12:23 +0200)]
bgpd: fix redistribute table command after bgp restarts
When the BGP 'redistribute table' command is used for a given route
table, and BGP configuration is flushed and rebuilt, the redistribution
does not work.
When flushing the BGP configuration with the 'no router bgp'
command, the BGP redistribute entries related to the 'redistribute table'
entries are not flushed. At BGP deletion, the table number is
not given as a parameter to the bgp_redistribute_unset() function, and the
redistribution entry is not removed in zebra.
Fix this by adding some code to flush all the redistribute table
instances.
Fixes: 7c8ff89e9346 ("Multi-Instance OSPF Summary")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Donald Sharp [Fri, 25 Aug 2023 14:43:56 +0000 (10:43 -0400)]
bgpd: Prevent use after free
When bgp_stop finishes and deletes the peer, it sends
back a return code stating that the peer was deleted, but
the calling code was operating as if it was not deleted and continued
to access the data structure. Fix.
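A minimal sketch of the pattern, assuming hypothetical names throughout (`stop_session`, `stop_and_count`, `enum stop_result`); the real bgpd structures and return codes differ. The point is that the return code is the only safe signal once the callee may have freed the object.

```c
#include <stdlib.h>

/* Hypothetical types; the real bgpd structures and return codes differ. */
enum stop_result { STOP_KEPT, STOP_DELETED };

struct session {
	int dynamic; /* nonzero: stop_session() frees the object */
	int stops;   /* how many times the session has been stopped */
};

/* Stops the session and may free it. */
enum stop_result stop_session(struct session *s)
{
	if (s->dynamic) {
		free(s);
		return STOP_DELETED;
	}
	s->stops++;
	return STOP_KEPT;
}

/* Returns the stop count, or -1 when the session no longer exists. */
int stop_and_count(struct session *s)
{
	if (stop_session(s) == STOP_DELETED)
		return -1; /* s is dangling here: do not read it */
	return s->stops;
}
```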
Donald Sharp [Fri, 25 Aug 2023 14:28:02 +0000 (10:28 -0400)]
bgpd: bgp_event_update switch to a switch
The return code from an event handling perspective
is an enum. Let's intentionally make it a switch
so that all cases are ensured to be covered now
and in the future.
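As an illustration of why a switch helps here, the sketch below handles every enumerator and leaves out a `default:` label. The enum and its names are placeholders, not bgpd's. With GCC's `-Wswitch` (enabled by `-Wall`), adding a new enumerator without a matching case draws a warning.

```c
/* Hypothetical result codes; the names are placeholders, not bgpd's. */
enum event_result {
	EVENT_OK,
	EVENT_SESSION_CHANGED,
	EVENT_PEER_DELETED,
};

/*
 * No "default:" label on purpose, so the compiler can flag any
 * enumerator that is added later but not handled here.
 */
const char *event_result_str(enum event_result r)
{
	switch (r) {
	case EVENT_OK:
		return "ok";
	case EVENT_SESSION_CHANGED:
		return "session changed";
	case EVENT_PEER_DELETED:
		return "peer deleted";
	}
	return "unknown"; /* reached only for values outside the enum */
}
```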
Donatas Abraitis [Thu, 24 Aug 2023 15:06:17 +0000 (18:06 +0300)]
staticd: Accept full blackhole typed keywords for ip_route_cmd
Before this patch, any string was accepted as the next-hop interface name.
For example, we could type `ip route 10.10.10.10/32 bla`, but this would create a blackhole
route instead of using an interface `bla`.
The same happened with reject.
After the patch:
```
$ vtysh -c 'con' -c 'ip route 10.10.10.100/32 bla'
ERROR: SET_CONFIG request failed, Error: nexthop interface name must be (reject, blackhole)
$ ip link show dev bla
472: bla: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
link/ether fa:45:bd:f1:f8:f0 brd ff:ff:ff:ff:ff:ff
$ vtysh -c 'sh run | include ip route'
$ vtysh -c 'con' -c 'ip route 10.10.10.100/32 blac'
$ vtysh -c 'sh run | include ip route'
ip route 10.10.10.100/32 blackhole
$ vtysh -c 'con' -c 'no ip route 10.10.10.100/32 blac'
$ vtysh -c 'sh run | include ip route'
$ vtysh -c 'con' -c 'ip route 10.10.10.100/32 blackhole'
$ vtysh -c 'sh run | include ip route'
ip route 10.10.10.100/32 blackhole
$ vtysh -c 'con' -c 'no ip route 10.10.10.100/32 blackhole'
$ vtysh -c 'sh run | include ip route'
$ vtysh -c 'con' -c 'ip route 10.10.10.100/32 Null0'
$ vtysh -c 'sh run | include ip route'
ip route 10.10.10.100/32 Null0
$ vtysh -c 'con' -c 'no ip route 10.10.10.100/32 Null0'
$ vtysh -c 'sh run | include ip route'
$
```
乐倚 [Wed, 23 Aug 2023 08:42:33 +0000 (08:42 +0000)]
configure.ac: fix protobuf config
Bug description: frr_init fails to load zebra_fpm.so. Zebra can't
find the function `zfpm_protobuf_encode_route` in the symbol table.
Bug trigger condition (CI has this set):
./configure --enable-protobuf=no --enable-fpm=yes
/usr/lib/frr/zebra -M fpm
Cause: The macro `HAVE_PROTOBUF` and the compile condition variable
`HAVE_PROTOBUF` in `configure.ac` are not consistent. When
configured with `disable-protobuf`, the compile condition variable
`HAVE_PROTOBUF` is 0, but the macro is 1. This leads zebra to
load the protobuf module, but the protobuf module is not linked.
Fix: add the same condition statement to the macro definition.
Keelan10 [Wed, 23 Aug 2023 05:23:48 +0000 (09:23 +0400)]
ospf6d: Free Newly Created LSA when Non-Self-Originated Grace LSA is Discarded
The newly created LSA `new` is now properly freed to prevent memory leaks when
a non-self-originated Grace LSA which is not in LSDB is received.
The ASan leak log for reference:
```
Direct leak of 400 byte(s) in 2 object(s) allocated from:
#0 0x7f70e984bd28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
#1 0x7f70e92481c5 in qcalloc lib/memory.c:105
#2 0x55b35068c975 in ospf6_lsa_alloc ospf6d/ospf6_lsa.c:710
#3 0x55b35068c9f9 in ospf6_lsa_create ospf6d/ospf6_lsa.c:725
#4 0x55b35065ab2c in ospf6_receive_lsa ospf6d/ospf6_flood.c:912
#5 0x55b3506a1413 in ospf6_lsupdate_recv ospf6d/ospf6_message.c:1621
#6 0x55b3506a1413 in ospf6_read_helper ospf6d/ospf6_message.c:1896
#7 0x55b3506a1413 in ospf6_receive ospf6d/ospf6_message.c:1925
#8 0x7f70e92e6ccb in event_call lib/event.c:1979
#9 0x7f70e922b488 in frr_run lib/libfrr.c:1213
#10 0x55b35064345e in main ospf6d/ospf6_main.c:250
#11 0x7f70e8843c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
Indirect leak of 72 byte(s) in 2 object(s) allocated from:
#0 0x7f70e984bb40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
#1 0x7f70e9247ee5 in qmalloc lib/memory.c:100
#2 0x55b35068c987 in ospf6_lsa_alloc ospf6d/ospf6_lsa.c:711
#3 0x55b35068c9f9 in ospf6_lsa_create ospf6d/ospf6_lsa.c:725
#4 0x55b35065ab2c in ospf6_receive_lsa ospf6d/ospf6_flood.c:912
#5 0x55b3506a1413 in ospf6_lsupdate_recv ospf6d/ospf6_message.c:1621
#6 0x55b3506a1413 in ospf6_read_helper ospf6d/ospf6_message.c:1896
#7 0x55b3506a1413 in ospf6_receive ospf6d/ospf6_message.c:1925
#8 0x7f70e92e6ccb in event_call lib/event.c:1979
#9 0x7f70e922b488 in frr_run lib/libfrr.c:1213
#10 0x55b35064345e in main ospf6d/ospf6_main.c:250
#11 0x7f70e8843c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
SUMMARY: AddressSanitizer: 472 byte(s) leaked in 4 allocation(s).
```
Keelan10 [Tue, 22 Aug 2023 13:19:51 +0000 (17:19 +0400)]
lib: Fix memory leaks in LS Update Functions
Previously, when updating vertices, edges and subnets, if no update was required
because the existing value matched the new one, the memory associated with the new object
was not freed, leading to memory leaks. This commit fixes the leak by
freeing the memory associated with the new object when the update is unnecessary.
The ASan leak log for reference:
```
Direct leak of 312 byte(s) in 3 object(s) allocated from:
#0 0x7faf3afbfa37 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x7faf3ab5dbcf in qcalloc ../lib/memory.c:105
#2 0x7faf3ab42e00 in ls_parse_prefix ../lib/link_state.c:1323
#3 0x7faf3ab43c87 in ls_parse_msg ../lib/link_state.c:1373
#4 0x7faf3ab476a5 in ls_stream2ted ../lib/link_state.c:1885
#5 0x564e045046aa in sharp_opaque_handler ../sharpd/sharp_zebra.c:792
#6 0x7faf3aca35a9 in zclient_read ../lib/zclient.c:4410
#7 0x7faf3ac47474 in event_call ../lib/event.c:1979
#8 0x7faf3ab318b4 in frr_run ../lib/libfrr.c:1213
#9 0x564e044fdc6f in main ../sharpd/sharp_main.c:177
#10 0x7faf3a6f4d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
SUMMARY: AddressSanitizer: 312 byte(s) leaked in 3 allocation(s).
```
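The leak pattern described in this commit can be sketched as an update function that owns its argument. `struct attr` and `attr_update` are hypothetical and far simpler than the structures in lib/link_state.c; the sketch only shows that the "no update needed" branch must release the new object.

```c
#include <stdlib.h>

/* Hypothetical attribute record; not the lib/link_state.c structures. */
struct attr {
	int metric;
};

/*
 * Takes ownership of "incoming". Returns 1 when *slot was replaced and 0
 * when the stored value already matched; either way nothing is leaked.
 */
int attr_update(struct attr **slot, struct attr *incoming)
{
	if (*slot && (*slot)->metric == incoming->metric) {
		free(incoming); /* the branch that used to leak */
		return 0;
	}
	free(*slot); /* free(NULL) is a no-op on the first insert */
	*slot = incoming;
	return 1;
}
```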
bgpd: Convert from struct bgp_node to struct bgp_dest
This is based on @donaldsharp's work.
The current code base uses the struct bgp_node data structure.
The problem with this is that it creates a bunch of
extra data per route_node.
The table structure generates ‘holder’ nodes
that are never going to receive bgp routes,
and the memory of those nodes is allocated
as if they were a full bgp_node.
After splitting up the bgp_node into bgp_dest and route_node,
the memory of a ‘holder’ node which does not have any bgp data
will be allocated as a route_node, not a bgp_node,
and the memory usage is reduced.
The memory usage of a BGP node will be reduced from 200B to 96B.
The total memory usage optimization of this part is ~16.00%.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Yuqing Zhao <xiaopanghu99@163.com>
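The saving for holder nodes can be illustrated with two toy layouts. Everything here is hypothetical: the field names and sizes are invented and do not reproduce the 200B and 96B figures above. The sketch only shows that a holder node stops paying for the BGP payload once the payload is attached through a pointer.

```c
#include <stddef.h>

/* Hypothetical layouts; field names and sizes are not FRR's. */
struct tree_node {
	struct tree_node *parent, *left, *right;
	void *info; /* NULL for a holder node, else points at a bgp_payload */
};

struct bgp_payload {
	void *path_list;
	unsigned long version;
	unsigned long flags;
	void *extra[8];
};

/* Before the split: every node, holder or not, carried the payload inline. */
struct combined_node {
	struct tree_node node;
	struct bgp_payload payload;
};

size_t holder_cost_before(void)
{
	return sizeof(struct combined_node);
}

/* After the split: a holder costs only the tree node itself. */
size_t holder_cost_after(void)
{
	return sizeof(struct tree_node);
}
```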
Donald Sharp [Mon, 21 Aug 2023 19:37:32 +0000 (15:37 -0400)]
zebra: Prevent protodown_rc from going Bzonkas
The code that handles the protodown_rc setting for
VRRP interfaces in zebra is sending an interface
to be set into a protodown state *before* the
interface has been learned by the kernel, resulting
in crashes when the data plane sends the ctx back
to us saying hey man you are uncool.
Additionally change the protodown code to refuse
to send any protodown_rc codes *until* the interface
has actually been learned about from the kernel.
Ticket: 3582375
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Keelan10 [Sat, 19 Aug 2023 21:16:48 +0000 (01:16 +0400)]
lib: Clear Computed Path Pointer to Destination on Clean
This commit ensures proper cleanup by clearing the `algo->pdst` pointer if it points to a path that is being deleted.
It addresses memory leaks by freeing memory held by `algo->pdst` that might not have been released during the cleanup of processed paths.
The ASan leak log for reference:
```
Direct leak of 96 byte(s) in 1 object(s) allocated from:
#0 0x7fbffcec9a37 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x7fbffca67a81 in qcalloc ../lib/memory.c:105
#2 0x7fbffc9d1a54 in cpath_new ../lib/cspf.c:44
#3 0x7fbffc9d2829 in cspf_init ../lib/cspf.c:256
#4 0x7fbffc9d295d in cspf_init_v4 ../lib/cspf.c:287
#5 0x5601dcd34d3f in show_sharp_cspf_magic ../sharpd/sharp_vty.c:1262
#6 0x5601dcd2c2be in show_sharp_cspf sharpd/sharp_vty_clippy.c:1869
#7 0x7fbffc9afd61 in cmd_execute_command_real ../lib/command.c:993
#8 0x7fbffc9b00ee in cmd_execute_command ../lib/command.c:1052
#9 0x7fbffc9b0dc0 in cmd_execute ../lib/command.c:1218
#10 0x7fbffcb611c7 in vty_command ../lib/vty.c:591
#11 0x7fbffcb660ac in vty_execute ../lib/vty.c:1354
#12 0x7fbffcb6c4aa in vtysh_read ../lib/vty.c:2362
#13 0x7fbffcb51324 in event_call ../lib/event.c:1979
#14 0x7fbffca3b872 in frr_run ../lib/libfrr.c:1213
#15 0x5601dcd11c6f in main ../sharpd/sharp_main.c:177
#16 0x7fbffc5ffd8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
Indirect leak of 40 byte(s) in 1 object(s) allocated from:
#0 0x7fbffcec9a37 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x7fbffca67a81 in qcalloc ../lib/memory.c:105
#2 0x7fbffca3c108 in list_new ../lib/linklist.c:49
#3 0x7fbffc9d1acc in cpath_new ../lib/cspf.c:47
#4 0x7fbffc9d2829 in cspf_init ../lib/cspf.c:256
#5 0x7fbffc9d295d in cspf_init_v4 ../lib/cspf.c:287
#6 0x5601dcd34d3f in show_sharp_cspf_magic ../sharpd/sharp_vty.c:1262
#7 0x5601dcd2c2be in show_sharp_cspf sharpd/sharp_vty_clippy.c:1869
#8 0x7fbffc9afd61 in cmd_execute_command_real ../lib/command.c:993
#9 0x7fbffc9b00ee in cmd_execute_command ../lib/command.c:1052
#10 0x7fbffc9b0dc0 in cmd_execute ../lib/command.c:1218
#11 0x7fbffcb611c7 in vty_command ../lib/vty.c:591
#12 0x7fbffcb660ac in vty_execute ../lib/vty.c:1354
#13 0x7fbffcb6c4aa in vtysh_read ../lib/vty.c:2362
#14 0x7fbffcb51324 in event_call ../lib/event.c:1979
#15 0x7fbffca3b872 in frr_run ../lib/libfrr.c:1213
#16 0x5601dcd11c6f in main ../sharpd/sharp_main.c:177
#17 0x7fbffc5ffd8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
```
Donald Sharp [Sun, 20 Aug 2023 22:43:48 +0000 (18:43 -0400)]
tests: static_simple gives up after 3 seconds
Under heavy system load we can see that the static_simple
test is giving up too early in this micronet run:
8-17 15:00:27,105 DEBUG: topo: Waiting for [0.1]s as initial delay
2023-08-17 15:00:27,206 DEBUG: r1: cmd_status("/bin/bash -c 'ip -4 route show'")
2023-08-17 15:00:28,209 DEBUG: r1:
stdout: 101.0.0.0/24 dev r1-eth0 proto kernel scope link src 101.0.0.1
2023-08-17 15:00:28,209 DEBUG: topo: checking kernel routing table:
101.0.0.0/24 dev r1-eth0 proto kernel scope link src 101.0.0.1
2023-08-17 15:00:28,210 INFO: topo: Function raised exception: Failed to find
'10.0.0.0/8(?: nhid [0-9]+)? via 101.0.0.2 dev r1-eth0 proto (static|196) metric 20'
in
'101.0.0.0/24 dev r1-eth0 proto kernel scope link src 101.0.0.1
'
assert None
+ where None = <function search at 0x7f405b7bb0a0>('10.0.0.0/8(?: nhid [0-9]+)? via 101.0.0.2 dev r1-eth0 proto (static|196) metric 20', '101.0.0.0/24 dev r1-eth0 proto kernel scope link src 101.0.0.1 \n')
+ where <function search at 0x7f405b7bb0a0> = re.search
2023-08-17 15:00:28,210 DEBUG: topo: Sleeping 2s until next retry with 3.0 retry time left
2023-08-17 15:00:30,211 DEBUG: r1: cmd_status("/bin/bash -c 'ip -4 route show'")
2023-08-17 15:00:31,703 DEBUG: r1:
stdout: 101.0.0.0/24 dev r1-eth0 proto kernel scope link src 101.0.0.1
2023-08-17 15:00:31,703 DEBUG: topo: checking kernel routing table:
101.0.0.0/24 dev r1-eth0 proto kernel scope link src 101.0.0.1
2023-08-17 15:00:31,704 INFO: topo: Function raised exception: Failed to find
'10.0.0.0/8(?: nhid [0-9]+)? via 101.0.0.2 dev r1-eth0 proto (static|196) metric 20'
in
'101.0.0.0/24 dev r1-eth0 proto kernel scope link src 101.0.0.1
'
assert None
+ where None = <function search at 0x7f405b7bb0a0>('10.0.0.0/8(?: nhid [0-9]+)? via 101.0.0.2 dev r1-eth0 proto (static|196) metric 20', '101.0.0.0/24 dev r1-eth0 proto kernel scope link src 101.0.0.1 \n')
+ where <function search at 0x7f405b7bb0a0> = re.search
2023-08-17 15:00:31,704 INFO: topo: Retry timeout of 3s reached
2023-08-17 15:00:31,704 INFO: topo: Spawn collection of support bundle for r1
2023-08-17 15:00:31,704 DEBUG: r1: cmd_status("/bin/bash -c 'mkdir -p /tmp/topotests/static_simple.test_static_simple/r1/support_bundles/test_static_cli'")
2023-08-17 15:00:31,710 DEBUG: r1: popen("/usr/lib/frr/generate_support_bundle.py --log-dir=/tmp/topotests/static_simple.test_static_simple/r1/support_bundles/test_static_cli")
2023-08-17 15:00:31,711 DEBUG: topo: Waiting on support bundle for r1
2023-08-17 15:00:31,751 DEBUG: topo: RETRY DIAG: [failure] Sleeping 2s until next retry with 2.2 retry time left - too see if timeout was too short
2023-08-17 15:00:33,751 DEBUG: r1: cmd_status("/bin/bash -c 'ip -4 route show'")
2023-08-17 15:00:35,137 DEBUG: r1:
stdout: 10.0.0.0/8 nhid 12 via 101.0.0.2 dev r1-eth0 proto 196 metric 20...
2023-08-17 15:00:35,137 DEBUG: topo: checking kernel routing table:
10.0.0.0/8 nhid 12 via 101.0.0.2 dev r1-eth0 proto 196 metric 20
101.0.0.0/24 dev r1-eth0 proto kernel scope link src 101.0.0.1
2023-08-17 15:00:35,137 DEBUG: topo: Function returned None
2023-08-17 15:00:35,138 WARN: topo: RETRY DIAGNOSTIC: SUCCEED after FAILED with requested timeout of 3.0s; however, succeeded in 8.0s, investigate timeout timing
2023-08-17 15:00:35,138 INFO: topo: Function raised exception: Failed to find
'10.0.0.0/8(?: nhid [0-9]+)? via 101.0.0.2 dev r1-eth0 proto (static|196) metric 20'
in
'101.0.0.0/24 dev r1-eth0 proto kernel scope link src 101.0.0.1
'
assert None
+ where None = <function search at 0x7f405b7bb0a0>('10.0.0.0/8(?: nhid [0-9]+)? via 101.0.0.2 dev r1-eth0 proto (static|196) metric 20', '101.0.0.0/24 dev r1-eth0 proto kernel scope link src 101.0.0.1 \n')
+ where <function search at 0x7f405b7bb0a0> = re.search
2023-08-17 15:00:35,138 DEBUG: topo: RETRY DIAG: [failure] Sleeping 2s until next retry with 0.2 retry time left - too see if timeout was too short
2023-08-17 15:00:37,139 DEBUG: r1: cmd_status("/bin/bash -c 'ip -4 route show'")
2023-08-17 15:00:37,247 DEBUG: r1:
stdout: 10.0.0.0/8 nhid 12 via 101.0.0.2 dev r1-eth0 proto 196 metric 20...
Of course it works in the extra couple of times it tries but the test still fails.
Donatas Abraitis [Sun, 20 Aug 2023 21:01:42 +0000 (00:01 +0300)]
bgpd: Do not explicitly print MAXTTL value for ebgp-multihop vty output
1. Create /etc/frr/frr.conf
```
frr version 7.5
frr defaults traditional
hostname centos8.localdomain
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
line vty
router bgp 4250001000
neighbor 192.168.122.207 remote-as 65512
neighbor 192.168.122.207 ebgp-multihop
```
2. Start FRR
`# systemctl start frr`
3. Show running configuration. Note that FRR explicitly sets and shows the default TTL (255)
```
Building configuration...
Current configuration:
!
frr version 7.5
frr defaults traditional
hostname centos8.localdomain
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp 4250001000
neighbor 192.168.122.207 remote-as 65512
neighbor 192.168.122.207 ebgp-multihop 255
!
line vty
!
end
```
4. Copy initial frr.conf to frr.conf.new (no changes)
`# cp /etc/frr/frr.conf /root/frr.conf.new`
5. Run frr-reload.py:
```
$ /usr/lib/frr/frr-reload.py --test /root/frr.conf.new
2023-08-20 20:15:48,050 INFO: Called via "Namespace(bindir='/usr/bin', confdir='/etc/frr', daemon='', debug=False, filename='/root/frr.conf.new', input=None, log_level='info', overwrite=False, pathspace=None, reload=False, rundir='/var/run/frr', stdout=False, test=True, vty_socket=None)"
2023-08-20 20:15:48,050 INFO: Loading Config object from file /root/frr.conf.new
2023-08-20 20:15:48,124 INFO: Loading Config object from vtysh show running
Lines To Delete
===============
router bgp 4250001000
no neighbor 192.168.122.207 ebgp-multihop 255
```
Donatas Abraitis [Fri, 18 Aug 2023 08:28:03 +0000 (11:28 +0300)]
bgpd: Make sure we have enough data to read two bytes when validating AIGP
Found when fuzzing:
```
==3470861==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xffff77801ef7 at pc 0xaaaaba7b3dbc bp 0xffffcff0e760 sp 0xffffcff0df50
READ of size 2 at 0xffff77801ef7 thread T0
0 0xaaaaba7b3db8 in __asan_memcpy (/home/ubuntu/frr_8_5_2/frr_8_5_2_fuzz_clang/bgpd/bgpd+0x363db8) (BuildId: cc710a2356e31c7f4e4a17595b54de82145a6e21)
1 0xaaaaba81a8ac in ptr_get_be16 /home/ubuntu/frr_8_5_2/frr_8_5_2_fuzz_clang/./lib/stream.h:399:2
2 0xaaaaba819f2c in bgp_attr_aigp_valid /home/ubuntu/frr_8_5_2/frr_8_5_2_fuzz_clang/bgpd/bgp_attr.c:504:3
3 0xaaaaba808c20 in bgp_attr_aigp /home/ubuntu/frr_8_5_2/frr_8_5_2_fuzz_clang/bgpd/bgp_attr.c:3275:7
4 0xaaaaba7ff4e0 in bgp_attr_parse /home/ubuntu/frr_8_5_2/frr_8_5_2_fuzz_clang/bgpd/bgp_attr.c:3678:10
```
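The kind of check named in the commit title can be sketched as a bounds test before the two-byte read. `tlvs_valid` is a hypothetical validator, not `bgp_attr_aigp_valid`. It assumes a TLV layout of a 1-byte type followed by a 2-byte big-endian length that counts the whole TLV.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical validator, not bgp_attr_aigp_valid(): walks TLVs laid out as
 * a 1-byte type followed by a 2-byte big-endian length that counts the
 * whole TLV. Returns 1 when every TLV fits inside the buffer, else 0.
 * An empty buffer contains no TLVs and is reported as valid.
 */
int tlvs_valid(const uint8_t *data, size_t len)
{
	size_t off = 0;

	while (off < len) {
		/* Need the type byte plus both length bytes before reading. */
		if (len - off < 3)
			return 0;

		uint16_t tlv_len = (uint16_t)((data[off + 1] << 8) | data[off + 2]);

		/* A TLV shorter than its own header, or past the end, is bad. */
		if (tlv_len < 3 || tlv_len > len - off)
			return 0;

		off += tlv_len;
	}
	return 1;
}
```

The first check is the one that matters for the overflow in the trace above: without it, a buffer that ends one or two bytes after a type byte would still be read as if a full length field were present.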
Keelan10 [Thu, 17 Aug 2023 19:54:33 +0000 (23:54 +0400)]
pbrd: Correct Handling of Sequence Deletion
This commit ensures that sequence data
and associated structures are correctly deleted to prevent memory leaks.
The ASan leak log for reference:
```
Direct leak of 432 byte(s) in 1 object(s) allocated from:
#0 0x7f911ebaba37 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x7f911e749a4e in qcalloc ../lib/memory.c:105
#2 0x564fd444b2d3 in pbrms_get ../pbrd/pbr_map.c:527
#3 0x564fd443a82d in pbr_map ../pbrd/pbr_vty.c:90
#4 0x7f911e691d61 in cmd_execute_command_real ../lib/command.c:993
#5 0x7f911e6920ee in cmd_execute_command ../lib/command.c:1052
#6 0x7f911e692dc0 in cmd_execute ../lib/command.c:1218
#7 0x7f911e843197 in vty_command ../lib/vty.c:591
#8 0x7f911e84807c in vty_execute ../lib/vty.c:1354
#9 0x7f911e84e47a in vtysh_read ../lib/vty.c:2362
#10 0x7f911e8332f4 in event_call ../lib/event.c:1979
#11 0x7f911e71d828 in frr_run ../lib/libfrr.c:1213
#12 0x564fd4425795 in main ../pbrd/pbr_main.c:168
#13 0x7f911e2e1d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
```
Keelan10 [Sat, 19 Aug 2023 14:38:14 +0000 (18:38 +0400)]
ospfd: Delete `q_space->vertex_list` on No Backup Path
In scenarios where no backup paths are available, ensure proper
memory management by deleting `q_space->vertex_list`. This prevents
memory leaks.
The ASan leak log for reference:
```
Direct leak of 80 byte(s) in 2 object(s) allocated from:
#0 0x7fcf8c70aa37 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x7fcf8c2a8a45 in qcalloc ../lib/memory.c:105
#2 0x7fcf8c27d0cc in list_new ../lib/linklist.c:49
#3 0x55d6e8385e35 in ospf_spf_init ../ospfd/ospf_spf.c:540
#4 0x55d6e838c30d in ospf_spf_calculate ../ospfd/ospf_spf.c:1736
#5 0x55d6e83933cf in ospf_ti_lfa_generate_q_spaces ../ospfd/ospf_ti_lfa.c:673
#6 0x55d6e8394214 in ospf_ti_lfa_generate_p_space ../ospfd/ospf_ti_lfa.c:812
#7 0x55d6e8394c63 in ospf_ti_lfa_generate_p_spaces ../ospfd/ospf_ti_lfa.c:923
#8 0x55d6e8396390 in ospf_ti_lfa_compute ../ospfd/ospf_ti_lfa.c:1101
#9 0x55d6e838ca48 in ospf_spf_calculate_area ../ospfd/ospf_spf.c:1811
#10 0x55d6e838cd73 in ospf_spf_calculate_areas ../ospfd/ospf_spf.c:1840
#11 0x55d6e838cfb0 in ospf_spf_calculate_schedule_worker ../ospfd/ospf_spf.c:1871
#12 0x7fcf8c3922e4 in event_call ../lib/event.c:1979
#13 0x7fcf8c27c828 in frr_run ../lib/libfrr.c:1213
#14 0x55d6e82eeb6d in main ../ospfd/ospf_main.c:249
#15 0x7fcf8bd59d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
```
Keelan10 [Sat, 19 Aug 2023 10:00:17 +0000 (14:00 +0400)]
bgpd: Free memory in set_aspath_exclude_access_list
Properly free the dynamically allocated memory held by `str` after its use.
The change also preserves the return value of `nb_cli_apply_changes` by storing it in a `ret` variable.
The ASan leak log for reference:
```
Direct leak of 55 byte(s) in 2 object(s) allocated from:
#0 0x7f16f285f867 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
#1 0x7f16f23fda11 in qmalloc ../lib/memory.c:100
#2 0x7f16f23a01a0 in frrstr_join ../lib/frrstr.c:89
#3 0x7f16f23418c7 in argv_concat ../lib/command.c:183
#4 0x55aba24731f2 in set_aspath_exclude_access_list_magic ../bgpd/bgp_routemap.c:6327
#5 0x55aba2455cf4 in set_aspath_exclude_access_list bgpd/bgp_routemap_clippy.c:836
#6 0x7f16f2345d61 in cmd_execute_command_real ../lib/command.c:993
#7 0x7f16f23460ee in cmd_execute_command ../lib/command.c:1052
#8 0x7f16f2346dc0 in cmd_execute ../lib/command.c:1218
#9 0x7f16f24f7197 in vty_command ../lib/vty.c:591
#10 0x7f16f24fc07c in vty_execute ../lib/vty.c:1354
#11 0x7f16f250247a in vtysh_read ../lib/vty.c:2362
#12 0x7f16f24e72f4 in event_call ../lib/event.c:1979
#13 0x7f16f23d1828 in frr_run ../lib/libfrr.c:1213
#14 0x55aba2269e52 in main ../bgpd/bgp_main.c:510
#15 0x7f16f1dbfd8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
```
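The shape of this fix can be sketched with stand-in helpers. `join_words`, `apply_changes` and `set_value` are hypothetical. They model a helper that returns heap memory and a call whose status must still be returned after the string is freed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a call that reads the string and returns a status. */
static int apply_changes(const char *arg)
{
	return arg[0] == '\0' ? -1 : 0;
}

/* Hypothetical stand-in for a helper that returns heap memory the caller owns. */
static char *join_words(const char *a, const char *b)
{
	size_t n = strlen(a) + strlen(b) + 2; /* space separator plus NUL */
	char *out = malloc(n);

	if (!out)
		return NULL;
	strcpy(out, a);
	strcat(out, " ");
	strcat(out, b);
	return out;
}

int set_value(const char *a, const char *b)
{
	char *str = join_words(a, b);
	int ret;

	if (!str)
		return -1;

	/* Save the status first: returning the call directly would skip free(). */
	ret = apply_changes(str);
	free(str);
	return ret;
}
```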
Donald Sharp [Tue, 10 Aug 2021 03:43:46 +0000 (23:43 -0400)]
bgpd: Convert `struct peer_connection` to dynamically allocated
As part of the conversion to a `struct peer_connection` it will
be desirable to have 2 pointers: one for when we open a connection
and one for when we receive a connection. Start the actual
conversion over to this in `struct peer`. If this sounds confusing,
take a look at the bgp state machine for connections and how
it resolves the processing of this router opening -vs- this
router receiving an open. At some point in time the state
machine decides that we are keeping one of the two connections.
Future commits will allow us to untangle the peer/doppelganger
duality with this abstraction.
Donald Sharp [Fri, 30 Apr 2021 18:55:40 +0000 (14:55 -0400)]
bgpd: Start abstraction of `struct peer_connection`
BGP tracks connections based upon the peer. But the problem
with this is that a doppelganger structure is being
created for it. This has introduced a bunch of fragility in that
the peer exists independently of the connections to it.
The whole point of the doppelganger structure was to allow
BGP to both accept and initiate tcp connections and then
when we get one to a `good` state we collapse into the
appropriate one. The problem with this is that having
2 peer structures for this creates a situation where
we have to make sure we are configuring the `right` one
and also make sure that we collapse the two independent
peer structures into 1 acting peer. This makes no sense.
Let's abstract out the peer into having 2 connections,
one for incoming connections and one for outgoing connections;
then we can easily collapse down without having to do crazy
stuff. In addition, people adding new features don't
have to go touch a million places in the code.
This is the start of this abstraction. In this commit
we'll just pull out the fd and input/output buffers
into a connection data structure. Future commits
will abstract further.
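A rough sketch of where this abstraction is heading, under assumptions: the field names, the fixed-size `byte_queue`, and the `peer_init` and `peer_collapse` helpers are placeholders, not bgpd's definitions. It shows one peer owning two connection records, each holding an fd and input/output buffers, and collapsing to one without a second peer structure.

```c
#include <stddef.h>

/* Hypothetical sketch; field names are placeholders, not bgpd's definitions. */
struct byte_queue {
	unsigned char data[64];
	size_t used;
};

/* Per-connection state pulled out of the peer: the socket and its buffers. */
struct peer_connection {
	int fd; /* -1 when no socket is open */
	struct byte_queue ibuf;
	struct byte_queue obuf;
};

struct peer {
	const char *host;
	struct peer_connection incoming; /* connection the remote side opened */
	struct peer_connection outgoing; /* connection this router opened */
	struct peer_connection *active;  /* the survivor, or NULL before collapse */
};

void peer_init(struct peer *p, const char *host)
{
	p->host = host;
	p->incoming.fd = -1;
	p->incoming.ibuf.used = p->incoming.obuf.used = 0;
	p->outgoing.fd = -1;
	p->outgoing.ibuf.used = p->outgoing.obuf.used = 0;
	p->active = NULL;
}

/* Keep one connection and mark the other closed; no second peer is involved. */
void peer_collapse(struct peer *p, int keep_incoming)
{
	struct peer_connection *keep = keep_incoming ? &p->incoming : &p->outgoing;
	struct peer_connection *drop = keep_incoming ? &p->outgoing : &p->incoming;

	drop->fd = -1; /* a real implementation would close() the socket here */
	p->active = keep;
}
```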