Donald Sharp [Sun, 10 Sep 2023 13:22:47 +0000 (09:22 -0400)]
bgpd: bgp_connected_delete needs to ensure dest is still there
Again coverity believes that dest could be freed by a call
into bgp_dest_unlock_node, and it can if the lock count
is wrong. Let's fix that assumption for coverity
Donald Sharp [Sun, 10 Sep 2023 13:12:26 +0000 (09:12 -0400)]
bgpd: delete_evpn_route ensure that dest is not freed before usage
There exist two spots in this function where the dest could be
freed, but is not due to locking, but coverity thinks it might
so let's make the function happy.
Donald Sharp [Sun, 10 Sep 2023 13:09:08 +0000 (09:09 -0400)]
bgpd: evpn_cleanup_local_non_best_route could free dest
But never really does due to locking, but since it can
we need to treat it like it does and ensure that FRR
is not making a mistake, by using memory after it
has been freed.
Donald Sharp [Sun, 10 Sep 2023 12:53:36 +0000 (08:53 -0400)]
bgpd: bgp_clear_adj_in|remove dest may be freed
dest will not be freed due to lock but coverity does not know
that. Give it a hint. This change includes modifying bgp_dest_unlock_node
to return the dest pointer so that we can determine if we should
continue working on dest or not with an assert. Since this
is lock based we should be ok.
Donald Sharp [Wed, 6 Sep 2023 12:39:02 +0000 (08:39 -0400)]
zebra: Prevent Null pointer deref
If the kernel sends us bad data then the kind_str
will be NULL and a later strcmp operation will
cause a crash.
As a note: If the kernel is not sending us properly
formated netlink messages then we got bigger problems
than zebra crashing. But at least let's prevent zebra
from crashing.
Reported-by: Iggy Frankovic <iggyfran@amazon.com> Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Keelan10 [Tue, 22 Aug 2023 21:00:46 +0000 (01:00 +0400)]
ospfd: fix area range memory leak
Addressed a memory leak in OSPF by fixing the improper deallocation of
area range nodes when removed from the table. Introducing a new function,
`ospf_range_table_node_destroy` for proper node cleanup, resolved the issue.
The ASan leak log for reference:
```
Direct leak of 56 byte(s) in 2 object(s) allocated from:
#0 0x7faf661d1d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
#1 0x7faf65bce1e9 in qcalloc lib/memory.c:105
#2 0x55a66e0b61cd in ospf_area_range_new ospfd/ospf_abr.c:43
#3 0x55a66e0b61cd in ospf_area_range_set ospfd/ospf_abr.c:195
#4 0x55a66e07f2eb in ospf_area_range ospfd/ospf_vty.c:631
#5 0x7faf65b51548 in cmd_execute_command_real lib/command.c:993
#6 0x7faf65b51f79 in cmd_execute_command_strict lib/command.c:1102
#7 0x7faf65b51fd8 in command_config_read_one_line lib/command.c:1262
#8 0x7faf65b522bf in config_from_file lib/command.c:1315
#9 0x7faf65c832df in vty_read_file lib/vty.c:2605
#10 0x7faf65c83409 in vty_read_config lib/vty.c:2851
#11 0x7faf65bb0341 in frr_config_read_in lib/libfrr.c:977
#12 0x7faf65c6cceb in event_call lib/event.c:1979
#13 0x7faf65bb1488 in frr_run lib/libfrr.c:1213
#14 0x55a66dfb28c4 in main ospfd/ospf_main.c:249
#15 0x7faf651c9c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
SUMMARY: AddressSanitizer: 56 byte(s) leaked in 2 allocation(s).
```
This adds formatted input/output of binary integer numbers to the
printf(), scanf(), and strtol() families, including their wide-character
counterparts.
Ed Maste [Thu, 4 Aug 2022 20:52:23 +0000 (16:52 -0400)]
lib/printf: drop "All rights reserved" from Foundation copyrights
This has already been done for most files that have the Foundation as
the only listed copyright holder. Do it now for files that list
multiple copyright holders, but have the Foundation copyright in its own
section.
bgpd: Treat as4-path (17) attribute as withdraw if malformed
rfc7606 defines:
Attributes 17 (AS4_PATH), 18 (AS4_AGGREGATOR), 22 (PMSI_TUNNEL), 23 (Tunnel
Encapsulation Attribute), 26 (AIGP), 27 (PE Distinguisher Labels),
and 29 (BGP-LS Attribute) do have error handling consistent with
Section 8 and thus are not further discussed herein.
Section 8 defines:
The "treat-as-withdraw" approach is generally
preferred and the "session reset" approach is discouraged.
For any malformed attribute that is handled by the "attribute
discard" instead of the "treat-as-withdraw" approach, it is critical
to consider the potential impact of doing so.
Mark Stapp [Fri, 1 Sep 2023 14:06:10 +0000 (10:06 -0400)]
lib,zebra: add tx queuelen to interface struct
Add the txqlen attribute to the common interface struct. Capture
the value in zebra, and distribute it through the interface lib
module's zapi messaging.
A router that supports the PMSI Tunnel attribute considers this
attribute to be malformed if either (a) it contains an undefined
tunnel type in the Tunnel Type field of the attribute, or (b) the
router cannot parse the Tunnel Identifier field of the attribute as a
tunnel identifier of the tunnel types specified in the Tunnel Type
field of the attribute.
When a router that receives a BGP Update that contains the PMSI
Tunnel attribute with its Partial bit set determines that the
attribute is malformed, the router SHOULD treat this Update as though
all the routes contained in this Update had been withdrawn.
The command "show bgp all rpki notfound" includes not only RPKI
notfound routes but also RPKI valid and invalid routes in its results.
Fix the code to display only RPKI notfound routes.
Old output:
```
frr# show bgp all rpki notfound
For address family: IPv4 Unicast
BGP table version is 0, local router ID is 10.0.0.1, vrf id 0
Default local pref 100, local AS 64512
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
N x.x.x.0/18 a.a.a.a 100 0 64513 i
V y.y.y.0/19 a.a.a.a 200 0 64513 i
I z.z.z.0/16 a.a.a.a 10 0 64513 i
Displayed 3 routes and 3 total paths
```
New output:
```
frr# show bgp all rpki notfound
For address family: IPv4 Unicast
BGP table version is 0, local router ID is 10.0.0.1, vrf id 0
Default local pref 100, local AS 64512
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
N x.x.x.0/18 a.a.a.a 100 0 64513 i
Donald Sharp [Wed, 30 Aug 2023 11:25:06 +0000 (07:25 -0400)]
bgpd: Add peers back to peer hash when peer_xfer_conn fails
It was noticed that occassionally peering failed in a testbed
upon investigation it was found that the peer was not in the
peer hash and we saw these failure messages:
Aug 25 21:31:15 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: %NOTIFICATION: sent to neighbor 2001:cafe:1ead:4::4 4/0 (Hold Timer Expired) 0 bytes
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 100663299] Can't get remote address and port: Transport endpoint is not connected
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 100663299] %bgp_getsockname() failed for peer 2001:cafe:1ead:4::4 fd 27 (from_peer fd -1)
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 33554464] %Neighbor failed in xfer_conn
Upon looking at the code the peer_xfer_conn function can fail
and the bgp_establish code will then return before adding the
peer back to the peerhash.
This is only part of the failure. The peer also appears to
be in a state where it is no longer initiating connection attempts
but that will be another commited fix when we figure that one out.
Rajasekar Raja [Thu, 17 Aug 2023 07:47:05 +0000 (00:47 -0700)]
zebra: Fix zebra crash when replacing NHE during shutdown
During replace of a NHE from upper proto in zebra_nhg_proto_add(),
- rib_handle_nhg_replace() is invoked with old NHE where we walk all
RNs/REs & replace the re->nhe whose address points to old NHE.
- In this walk, if prev re->nhe refcnt is decremented to 0, we free up
the memory which the old NHE is pointing to.
Later in zebra_nhg_proto_add(), we end up accessing this freed memory
and crash.
Backtrace:
0 0x00007f833f5f48eb in raise () from /lib/x86_64-linux-gnu/libc.so.6
1 0x00007f833f5df535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
2 0x00007f833f636648 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
3 0x00007f833f63cd6a in ?? () from /lib/x86_64-linux-gnu/libc.so.6
4 0x00007f833f63cfb4 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
5 0x00007f833f63fbc8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
6 0x00007f833f64172a in malloc () from /lib/x86_64-linux-gnu/libc.so.6
7 0x00007f833f6c3fd2 in backtrace_symbols () from /lib/x86_64-linux-gnu/libc.so.6
8 0x00007f833f9013fc in zlog_backtrace_sigsafe (priority=priority@entry=2, program_counter=program_counter@entry=0x7f833f5f48eb <raise+267>) at lib/log.c:222
9 0x00007f833f901593 in zlog_signal (signo=signo@entry=6, action=action@entry=0x7f833f988ee8 "aborting...", siginfo_v=siginfo_v@entry=0x7ffee1ce4a30,
program_counter=program_counter@entry=0x7f833f5f48eb <raise+267>) at lib/log.c:154
10 0x00007f833f92dbd1 in core_handler (signo=6, siginfo=0x7ffee1ce4a30, context=<optimized out>) at lib/sigevent.c:254
11 <signal handler called>
12 0x00007f833f5f48eb in raise () from /lib/x86_64-linux-gnu/libc.so.6
13 0x00007f833f5df535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
14 0x00007f833f958f96 in _zlog_assert_failed (xref=xref@entry=0x7f833f9e4080 <_xref.10705>, extra=extra@entry=0x0) at lib/zlog.c:680
15 0x00007f833f905400 in mt_count_free (mt=0x7f833fa02800 <MTYPE_NH_LABEL>, ptr=0x51) at lib/memory.c:84
16 mt_count_free (ptr=0x51, mt=0x7f833fa02800 <MTYPE_NH_LABEL>) at lib/memory.c:80
17 qfree (mt=0x7f833fa02800 <MTYPE_NH_LABEL>, ptr=0x51) at lib/memory.c:140
18 0x00007f833f90799c in nexthop_del_labels (nexthop=nexthop@entry=0x56091d776640) at lib/nexthop.c:563
19 0x00007f833f907b91 in nexthop_free (nexthop=0x56091d776640) at lib/nexthop.c:393
20 0x00007f833f907be8 in nexthops_free (nexthop=<optimized out>) at lib/nexthop.c:408
21 0x000056091c21aa76 in zebra_nhg_free_members (nhe=0x56091d890840) at zebra/zebra_nhg.c:1628
22 zebra_nhg_free (nhe=0x56091d890840) at zebra/zebra_nhg.c:1628
23 0x000056091c21bab2 in zebra_nhg_proto_add (id=<optimized out>, type=9, instance=<optimized out>, session=0, nhg=nhg@entry=0x56091d7da028, afi=afi@entry=AFI_UNSPEC)
at zebra/zebra_nhg.c:3532
24 0x000056091c22bc4e in process_subq_nhg (lnode=0x56091d88c540) at zebra/zebra_rib.c:2689
25 process_subq (qindex=META_QUEUE_NHG, subq=0x56091d24cea0) at zebra/zebra_rib.c:3290
26 meta_queue_process (dummy=<optimized out>, data=0x56091d24d4c0) at zebra/zebra_rib.c:3343
27 0x00007f833f9492c8 in work_queue_run (thread=0x7ffee1ce55a0) at lib/workqueue.c:285
28 0x00007f833f93f60d in thread_call (thread=thread@entry=0x7ffee1ce55a0) at lib/thread.c:2008
29 0x00007f833f8f9888 in frr_run (master=0x56091d068660) at lib/libfrr.c:1223
30 0x000056091c1b8366 in main (argc=12, argv=0x7ffee1ce5988) at zebra/main.c:551
Donald Sharp [Wed, 30 Aug 2023 14:33:29 +0000 (10:33 -0400)]
ospfd: Prevent use after free( and crash of ospf ) when no router ospf
Consider this config:
router ospf
redistribute kernel
Then you issue:
no router ospf
ospf will crash with a use after free.
The problem is that the event's associated with the
ospf pointer were shut off then the ospf_external_delete
was called which rescheduled the event. Let's just move
event deletion to the end of the no router ospf.
Donatas Abraitis [Tue, 29 Aug 2023 07:02:14 +0000 (10:02 +0300)]
bgpd: Handle Graceful-Restart capability with dynamic capability
Graceful-Restart restart time is exchanged using OPEN messages. In order to
reduce restart time before doing an actual graceful restart, it might be useful
to increase the time, but this is not possible without resetting the session.
With this change, it's possible to send dynamic capability with a new value, and
GR will respect a new reset time value.
Donald Sharp [Wed, 30 Aug 2023 11:25:06 +0000 (07:25 -0400)]
bgpd: Add peers back to peer hash when peer_xfer_conn fails
It was noticed that occassionally peering failed in a testbed
upon investigation it was found that the peer was not in the
peer hash and we saw these failure messages:
Aug 25 21:31:15 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: %NOTIFICATION: sent to neighbor 2001:cafe:1ead:4::4 4/0 (Hold Timer Expired) 0 bytes
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 100663299] Can't get remote address and port: Transport endpoint is not connected
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 100663299] %bgp_getsockname() failed for peer 2001:cafe:1ead:4::4 fd 27 (from_peer fd -1)
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 33554464] %Neighbor failed in xfer_conn
Upon looking at the code the peer_xfer_conn function can fail
and the bgp_establish code will then return before adding the
peer back to the peerhash.
This is only part of the failure. The peer also appears to
be in a state where it is no longer initiating connection attempts
but that will be another commited fix when we figure that one out.
Donatas Abraitis [Tue, 29 Aug 2023 12:11:52 +0000 (15:11 +0300)]
bgpd: Add a warning for the operator that keepalive was changed
```
donatas-pc(config-router)# timers bgp 8 12
% keeplive value 8 is larger than 1/3 of the holdtime, setting to 4
donatas-pc(config-router)# do sh run | include timers bgp
timers bgp 4 12
donatas-pc(config-router)#
```
Philippe Guibert [Mon, 28 Aug 2023 10:23:24 +0000 (12:23 +0200)]
bgpd: fix redistribute table command after bgp restarts
When the BGP 'redistribute table' command is used for a given route
table, and BGP configuration is flushed and rebuilt, the redistribution
does not work.
Actually, when flushing the BGP configuration with the 'no router bgp'
command, the BGP redistribute entries related to the 'redistribute table'
entries are not flushed. Actually, at BGP deletion, the table number is
not given as parameter in bgp_redistribute_unset() function, and the
redistribution entry is not removed in zebra.
Fix this by adding some code to flush all the redistribute table
instances.
Fixes: 7c8ff89e9346 ("Multi-Instance OSPF Summary") Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Donald Sharp [Fri, 25 Aug 2023 14:43:56 +0000 (10:43 -0400)]
bgpd: Prevent use after free
When bgp_stop finishes and it deletes the peer it is sending
back a return code stating that the peer was deleted, but
the code was operating like it was not deleted and continued
to access the data structure. Fix.
Donald Sharp [Fri, 25 Aug 2023 14:28:02 +0000 (10:28 -0400)]
bgpd: bgp_event_update switch to a switch
The return code from a event handling perspective
is an enum. Let's intentionally make it a switch
so that all cases are ensured to be covered now
and in the future.
Donatas Abraitis [Thu, 24 Aug 2023 15:06:17 +0000 (18:06 +0300)]
staticd: Accept full blackhole typed keywords for ip_route_cmd
Before this patch we allow entering next-hop interface address as any string.
Like, we can type: `ip route 10.10.10.10/32 bla`, but this will create a blackhole
route instead of using an interface `bla`.
The same is with reject.
After the patch:
```
$ vtysh -c 'con' -c 'ip route 10.10.10.100/32 bla'
ERROR: SET_CONFIG request failed, Error: nexthop interface name must be (reject, blackhole)
$ ip link show dev bla
472: bla: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
link/ether fa:45:bd:f1:f8:f0 brd ff:ff:ff:ff:ff:ff
$ vtysh -c 'sh run | include ip route'
$ vtysh -c 'con' -c 'ip route 10.10.10.100/32 blac'
$ vtysh -c 'sh run | include ip route'
ip route 10.10.10.100/32 blackhole
$ vtysh -c 'con' -c 'no ip route 10.10.10.100/32 blac'
$ vtysh -c 'sh run | include ip route'
$ vtysh -c 'con' -c 'ip route 10.10.10.100/32 blackhole'
$ vtysh -c 'sh run | include ip route'
ip route 10.10.10.100/32 blackhole
$ vtysh -c 'con' -c 'no ip route 10.10.10.100/32 blackhole'
$ vtysh -c 'sh run | include ip route'
$ vtysh -c 'con' -c 'ip route 10.10.10.100/32 Null0'
$ vtysh -c 'sh run | include ip route'
ip route 10.10.10.100/32 Null0
$ vtysh -c 'con' -c 'no ip route 10.10.10.100/32 Null0'
$ vtysh -c 'sh run | include ip route'
$
```
乐倚 [Wed, 23 Aug 2023 08:42:33 +0000 (08:42 +0000)]
configure.ac: fix protobuf config
Bug description: frr_init load zebra_fpm.so error. Zebra can't
find function `zfpm_protobuf_encode_route` in symbol table.
Bug trigger condition ( CI have this set ):
./configure --enable-protobuf=no --enable-fpm=yes
/usr/lib/frr/zebra -M fpm
Cause: Macro `HAVE_PROTOBUF` and compile condition variable
`HAVE_PROTOBUF` in `configure.ac ` is not consistent. When
configure `disable-protobuf`, compile condition variable
`HAVE_PROTOBUF` is 0, but the macro is 1. It leads to zebra
load protobuf module, but protobuf module is not linked.
Fix: add a same condition statement to the macro define.
Keelan10 [Wed, 23 Aug 2023 05:23:48 +0000 (09:23 +0400)]
ospf6d: Free Newly Created LSA when Non-Self-Originated Grace LSA is Discarded
The newly created LSA `new` is now properly freed to prevent memory leaks when
a non-self-originated Grace LSA which is not in LSDB is received.
The ASan leak log for reference:
```
Direct leak of 400 byte(s) in 2 object(s) allocated from:
#0 0x7f70e984bd28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
#1 0x7f70e92481c5 in qcalloc lib/memory.c:105
#2 0x55b35068c975 in ospf6_lsa_alloc ospf6d/ospf6_lsa.c:710
#3 0x55b35068c9f9 in ospf6_lsa_create ospf6d/ospf6_lsa.c:725
#4 0x55b35065ab2c in ospf6_receive_lsa ospf6d/ospf6_flood.c:912
#5 0x55b3506a1413 in ospf6_lsupdate_recv ospf6d/ospf6_message.c:1621
#6 0x55b3506a1413 in ospf6_read_helper ospf6d/ospf6_message.c:1896
#7 0x55b3506a1413 in ospf6_receive ospf6d/ospf6_message.c:1925
#8 0x7f70e92e6ccb in event_call lib/event.c:1979
#9 0x7f70e922b488 in frr_run lib/libfrr.c:1213
#10 0x55b35064345e in main ospf6d/ospf6_main.c:250
#11 0x7f70e8843c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
Indirect leak of 72 byte(s) in 2 object(s) allocated from:
#0 0x7f70e984bb40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
#1 0x7f70e9247ee5 in qmalloc lib/memory.c:100
#2 0x55b35068c987 in ospf6_lsa_alloc ospf6d/ospf6_lsa.c:711
#3 0x55b35068c9f9 in ospf6_lsa_create ospf6d/ospf6_lsa.c:725
#4 0x55b35065ab2c in ospf6_receive_lsa ospf6d/ospf6_flood.c:912
#5 0x55b3506a1413 in ospf6_lsupdate_recv ospf6d/ospf6_message.c:1621
#6 0x55b3506a1413 in ospf6_read_helper ospf6d/ospf6_message.c:1896
#7 0x55b3506a1413 in ospf6_receive ospf6d/ospf6_message.c:1925
#8 0x7f70e92e6ccb in event_call lib/event.c:1979
#9 0x7f70e922b488 in frr_run lib/libfrr.c:1213
#10 0x55b35064345e in main ospf6d/ospf6_main.c:250
#11 0x7f70e8843c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
SUMMARY: AddressSanitizer: 472 byte(s) leaked in 4 allocation(s).
```