bgpd
Check mandatory attributes more carefully for update message
Do not explicitly print maxttl value for ebgp-multihop vty output
Do not process nlris if the attribute length is zero
Don't read the first byte of orf header if we are ahead of stream
Ensure community data is freed in some cases.
Ensure that the correct aspath is free'd
Evpn code was not properly unlocking rd_dest
Fix error handling when receiving bgp prefix sid attribute
Fix null argument warning
Fix session reset issue caused by malformed core attributes
Fix use beyond end of stream of labeled unicast parsing
Handle mp_reach_nlri malformed packets with session reset
Ignore handling nlris if we received mp_unreach_nlri
Include unsuppress-map as a valid outgoing policy
Prevent from one more cve triggering this place
Treat eor as withdrawn to avoid unwanted handling of malformed attrs
Use enum bgp_create_error_code as argument in header
Use treat-as-withdraw for tunnel encapsulation attribute
isisd
Fix heap-after-free with prefix sid
need to link directly against libyang
lib
Fix evpn nexthop config order
Allow unsetting walltime-warning and cpu-warning
Make cmd_element->attr a bitmask & clarify
Replace deprecated ares_gethostbyname
Replace deprecated ares_process()
nhrpd
Fix nhrp_peer leak
Fix core dump on shutdown
ospf6d
Fix crash because neighbor structure was freed
Fix uninitialized warnings
Ospfv3 route change comparision fixed for asbr-only change
Stop crash in ospf6_write
Prevent heap-buffer-overflow with unknown type
ospfd
Check for nulls in vty code
Correct opaque lsa extended parser
Prevent use after free( and crash of ospf ) when no router ospf
Protect call to get_edge() in ospf_te.c
Solved crash in ospf te parsing
Solved crash in ri parsing with ospf te
pimd
Fix dr-priority range
Fix null register before aging out reg-stop
Fix order of operations for evaluating join
Re-evaluated s,g oils upon rp changes and for empty sg upstream oils
Fix crash when mixing ssm/any-source joins
ripd
Revert "cleanup memory allocations on shutdown"
ripngd
Revert "cleanup memory allocations on shutdown"
vtysh
Print uniq lines when parsing `no service ...`
zebra
Deny the routes if ip protocol cli refers to an undefined rmap
Fix connected route deletion when multiple entry exists
Keelan10 [Wed, 5 Jun 2024 10:34:18 +0000 (13:34 +0300)]
nhrpd: Fix nhrp_peer leak
- Addressed memory leak by removing `&c->peer_notifier` from the notifier list on termination. Retaining it caused the notifier list to stay active, preventing the deletion of `c->cur.peer`
thereby causing a memory leak.
- Reordered termination steps to call `vrf_terminate` before `nhrp_vc_terminate`, preventing a heap-use-after-free issue when `nhrp_vc_notify_del` is invoked in `nhrp_peer_check_delete`.
- Added an if statement to avoid passing NULL as hash to `hash_release`, which leads to a SIGSEGV.
The ASan leak log for reference:
```
***********************************************************************************
Address Sanitizer Error detected in nhrp_topo.test_nhrp_topo/r1.asan.nhrpd.20265
Direct leak of 112 byte(s) in 1 object(s) allocated from:
#0 0x7f80270c9b40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
#1 0x7f8026ac1eb8 in qmalloc lib/memory.c:100
#2 0x560fd648f0a6 in nhrp_peer_create nhrpd/nhrp_peer.c:175
#3 0x7f8026a88d3f in hash_get lib/hash.c:147
#4 0x560fd6490a5d in nhrp_peer_get nhrpd/nhrp_peer.c:228
#5 0x560fd648a51a in nhrp_nhs_resolve_cb nhrpd/nhrp_nhs.c:297
#6 0x7f80266b000f in resolver_cb_literal lib/resolver.c:234
#7 0x7f8026b62e0e in event_call lib/event.c:1969
#8 0x7f8026aa5437 in frr_run lib/libfrr.c:1213
#9 0x560fd6488b4f in main nhrpd/nhrp_main.c:166
#10 0x7f8025eb2c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
SUMMARY: AddressSanitizer: 112 byte(s) leaked in 1 allocation(s).
***********************************************************************************
***********************************************************************************
Address Sanitizer Error detected in nhrp_topo.test_nhrp_topo/r2.asan.nhrpd.20400
Direct leak of 112 byte(s) in 1 object(s) allocated from:
#0 0x7fb6e3ca5b40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
#1 0x7fb6e369deb8 in qmalloc lib/memory.c:100
#2 0x562652de40a6 in nhrp_peer_create nhrpd/nhrp_peer.c:175
#3 0x7fb6e3664d3f in hash_get lib/hash.c:147
#4 0x562652de5a5d in nhrp_peer_get nhrpd/nhrp_peer.c:228
#5 0x562652de1e8e in nhrp_packet_recvraw nhrpd/nhrp_packet.c:325
#6 0x7fb6e373ee0e in event_call lib/event.c:1969
#7 0x7fb6e3681437 in frr_run lib/libfrr.c:1213
#8 0x562652dddb4f in main nhrpd/nhrp_main.c:166
#9 0x7fb6e2a8ec86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
SUMMARY: AddressSanitizer: 112 byte(s) leaked in 1 allocation(s).
***********************************************************************************
```
Iggy Frankovic [Thu, 30 May 2024 11:59:54 +0000 (07:59 -0400)]
ospf6d: Prevent heap-buffer-overflow with unknown type
When parsing a osf6 grace lsa field and we receive an
unknown tlv type, ospf6d was not incrementing the pointer
to get beyond the tlv. Leaving a situation where ospf6d
would parse the packet incorrectly.
There is no reason to call `igmp_anysource_forward_stop()` inside a call to
`igmp_get_source_by_addr()`; not only it is not expected for a "get" function
to perform such an action, but also the decision to start/stop forwarding is
already handled correctly by pim outside `igmp_get_source_by_addr()`.
That call was left there from the days pim was initially imported into the sources.
The problem/crash was happening because `igmp_find_source_by_addr()` would fail to
find the group/source combo when mixing `(*, G)` and `(S, G)`. When having an existing
flow `(*, G)`, and a new `(S, G)` igmp is received, a new entry is correctly created.
`igmp_anysource_forward_stop(group)` always stops and eventually frees `(*, G)`, even
when the new igmp is `(S, G)`, leaving a bad state. I.e, the new entry for `(S, G)`
causes `(*, G)` to be deleted.
Tested the fix with multiple receivers on the same interface with several ssm and
any source senders and receivers with various combination of start/stop orders and
they all worked correctly.
Acee [Tue, 28 May 2024 14:02:27 +0000 (10:02 -0400)]
ospf6d: OSPFv3 route change comparision fixed for ASBR-only change
When a router route already exists in the area border routers table
as an ABR and it solely changes its ABR or ASBR status, the change
was missed and border route is not updated. This fixes the comparison
for the router_bits in the ospf6_path structure.
This fixes issue https://github.com/FRRouting/frr/issues/16053 although
the actual problem is not the computing router (r2) and not the OSPFv3
redistribution (r3).
During fuzzing, Iggy Frankovic discovered that get_edge() function in ospf_te.c
could return null pointer, in particular when the link_id or advertised router
IP addresses are fuzzed. As the null pointer returned by get_edge() function is
not handlei by calling functions, this could cause ospfd crash.
This patch introduces new verification of returned pointer by get_edge()
function and stop the processing in case of null pointer. In addition, link ID
and advertiser router ID are validated before calling ls_find_edge_by_key() to
avoid the creation of a new edge with an invalid key.
Iggy Frankovic discovered another ospfd crash when performing fuzzing of OSPF
LSA packets. The crash occurs in ospf_te_parse_ext_link() function when
attemping to read Segment Routing Adjacency SID subTLVs. The original code
doesn't check if the size of the Extended Link TLVs and subTLVs have the correct
length. In presence of erronous LSA, this will cause a buffer overflow and ospfd
crashes.
This patch introduces new verification of the subTLVs size for Extended Link
TLVs and subTLVs. Similar check has been also introduced for the Extended
Prefix TLV.
Iggy Frankovic discovered another ospfd crash when performing fuzzing of OSPF
LSA packets. The crash occurs in ospf_te_parse_ri() function when attemping to
read Segment Routing subTLVs. The original code doesn't check if the size of
the SR subTLVs have the correct length. In presence of erronous LSA, this will
cause a buffer overflow and ospfd crash.
This patch introduces new verification of the subTLVs size for Router
Information TLV.
Louis Scalbert [Thu, 16 May 2024 14:44:03 +0000 (16:44 +0200)]
isisd: fix heap-after-free with prefix sid
> ==2334217==ERROR: AddressSanitizer: heap-use-after-free on address 0x61000001d0a0 at pc 0x563828c8de6f bp 0x7fffbdaee560 sp 0x7fffbdaee558
> READ of size 1 at 0x61000001d0a0 thread T0
> #0 0x563828c8de6e in prefix_sid_cmp isisd/isis_spf.c:187
> #1 0x7f84b8204f71 in hash_get lib/hash.c:142
> #2 0x7f84b82055ec in hash_lookup lib/hash.c:184
> #3 0x563828c8e185 in isis_spf_prefix_sid_lookup isisd/isis_spf.c:209
> #4 0x563828c90642 in isis_spf_add2tent isisd/isis_spf.c:598
> #5 0x563828c91cd0 in process_N isisd/isis_spf.c:824
> #6 0x563828c93852 in isis_spf_process_lsp isisd/isis_spf.c:1041
> #7 0x563828c98dde in isis_spf_loop isisd/isis_spf.c:1821
> #8 0x563828c998de in isis_run_spf isisd/isis_spf.c:1983
> #9 0x563828c99c7b in isis_run_spf_with_protection isisd/isis_spf.c:2009
> #10 0x563828c9a60d in isis_run_spf_cb isisd/isis_spf.c:2090
> #11 0x7f84b835c72d in event_call lib/event.c:2011
> #12 0x7f84b8236d93 in frr_run lib/libfrr.c:1217
> #13 0x563828c21918 in main isisd/isis_main.c:346
> #14 0x7f84b7e4fd09 in __libc_start_main ../csu/libc-start.c:308
> #15 0x563828c20df9 in _start (/usr/lib/frr/isisd+0xf5df9)
>
> 0x61000001d0a0 is located 96 bytes inside of 184-byte region [0x61000001d040,0x61000001d0f8)
> freed by thread T0 here:
> #0 0x7f84b88a9b6f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:123
> #1 0x7f84b8263bae in qfree lib/memory.c:130
> #2 0x563828c8e433 in isis_vertex_del isisd/isis_spf.c:249
> #3 0x563828c91c95 in process_N isisd/isis_spf.c:811
> #4 0x563828c93852 in isis_spf_process_lsp isisd/isis_spf.c:1041
> #5 0x563828c98dde in isis_spf_loop isisd/isis_spf.c:1821
> #6 0x563828c998de in isis_run_spf isisd/isis_spf.c:1983
> #7 0x563828c99c7b in isis_run_spf_with_protection isisd/isis_spf.c:2009
> #8 0x563828c9a60d in isis_run_spf_cb isisd/isis_spf.c:2090
> #9 0x7f84b835c72d in event_call lib/event.c:2011
> #10 0x7f84b8236d93 in frr_run lib/libfrr.c:1217
> #11 0x563828c21918 in main isisd/isis_main.c:346
> #12 0x7f84b7e4fd09 in __libc_start_main ../csu/libc-start.c:308
>
> previously allocated by thread T0 here:
> #0 0x7f84b88aa037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
> #1 0x7f84b8263a6c in qcalloc lib/memory.c:105
> #2 0x563828c8e262 in isis_vertex_new isisd/isis_spf.c:225
> #3 0x563828c904db in isis_spf_add2tent isisd/isis_spf.c:588
> #4 0x563828c91cd0 in process_N isisd/isis_spf.c:824
> #5 0x563828c93852 in isis_spf_process_lsp isisd/isis_spf.c:1041
> #6 0x563828c98dde in isis_spf_loop isisd/isis_spf.c:1821
> #7 0x563828c998de in isis_run_spf isisd/isis_spf.c:1983
> #8 0x563828c99c7b in isis_run_spf_with_protection isisd/isis_spf.c:2009
> #9 0x563828c9a60d in isis_run_spf_cb isisd/isis_spf.c:2090
> #10 0x7f84b835c72d in event_call lib/event.c:2011
> #11 0x7f84b8236d93 in frr_run lib/libfrr.c:1217
> #12 0x563828c21918 in main isisd/isis_main.c:346
> #13 0x7f84b7e4fd09 in __libc_start_main ../csu/libc-start.c:308
>
> SUMMARY: AddressSanitizer: heap-use-after-free isisd/isis_spf.c:187 in prefix_sid_cmp
> Shadow bytes around the buggy address:
> 0x0c207fffb9c0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
> 0x0c207fffb9d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa
> 0x0c207fffb9e0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
> 0x0c207fffb9f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa
> 0x0c207fffba00: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
> =>0x0c207fffba10: fd fd fd fd[fd]fd fd fd fd fd fd fd fd fd fd fa
> 0x0c207fffba20: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
> 0x0c207fffba30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa
> 0x0c207fffba40: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
> 0x0c207fffba50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa
> 0x0c207fffba60: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
> Shadow byte legend (one shadow byte represents 8 application bytes):
> Addressable: 00
> Partially addressable: 01 02 03 04 05 06 07
> Heap left redzone: fa
> Freed heap region: fd
> Stack left redzone: f1
> Stack mid redzone: f2
> Stack right redzone: f3
> Stack after return: f5
> Stack use after scope: f8
> Global redzone: f9
> Global init order: f6
> Poisoned by user: f7
> Container overflow: fc
> Array cookie: ac
> Intra object redzone: bb
> ASan internal: fe
> Left alloca redzone: ca
> Right alloca redzone: cb
> Shadow gap: cc
> ==2334217==ABORTING
Fixes: 2f7cc7bcd3 ("isisd: detect Prefix-SID collisions and handle them appropriately") Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
(cherry picked from commit e697de58431474cdb06eff79bcbc70de4215e222)
zebra: Deny the routes if ip protocol CLI refers to an undefined rmap
Currently zebra does not deny the routes if `ip protocol <proto> route-map
FOO`
commmand is configured with reference to an undefined route-map (FOO in
this case).
However, on FRR restart, in zebra_route_map_check() routes get denied
if route-map name is available but the route-map is not defined. This
change was introduced in fd303a4ba14c762550db972317e1e88528768005.
Fix:
When `ip protocol <proto> route-map FOO` CLI is configured with reference to an
undefined route-map FOO, let the processing in ip_protocol_rm_add() and
ip_protocol_rm_del() go through so that zebra can deny the routes instead
of simply returning. This will result in consistent behavior.
Testing Done:
Before fix:
```
spine-1# configure
spine-1(config)# ip protocol bgp route-map rmap7
spine-1(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, A - Babel, D - SHARP, F - PBR, f - OpenFabric,
Z - FRR,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
C>* 27.0.0.1/32 is directly connected, lo, 02:27:45
B>* 27.0.0.3/32 [20/0] via fe80::202:ff:fe00:21, downlink_1, weight 1, 02:27:35
B>* 27.0.0.4/32 [20/0] via fe80::202:ff:fe00:29, downlink_2, weight 1, 02:27:40
B>* 27.0.0.5/32 [20/0] via fe80::202:ff:fe00:31, downlink_3, weight 1, 02:27:40
B>* 27.0.0.6/32 [20/0] via fe80::202:ff:fe00:39, downlink_4, weight 1, 02:27:40
```
After fix:
```
spine-1(config)# ip protocol bgp route-map route-map67
spine-1(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, A - Babel, D - SHARP, F - PBR, f - OpenFabric,
Z - FRR,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
C>* 27.0.0.1/32 is directly connected, lo, 00:35:03
B 27.0.0.3/32 [20/0] via fe80::202:ff:fe00:21, downlink_1 inactive, weight 1, 00:34:58
B 27.0.0.4/32 [20/0] via fe80::202:ff:fe00:29, downlink_2 inactive, weight 1, 00:34:57
B 27.0.0.5/32 [20/0] via fe80::202:ff:fe00:31, downlink_3 inactive, weight 1, 00:34:57
B 27.0.0.6/32 [20/0] via fe80::202:ff:fe00:39, downlink_4 inactive, weight 1, 00:34:58
spine-1(config)#
root@spine-1:mgmt:/var/home/cumulus# ip route show
root@spine-1:mgmt:/var/home/cumulus#
```
David Lamparter [Mon, 12 Dec 2022 16:50:59 +0000 (17:50 +0100)]
pimd: fix order of operations for evaluating join
join_desired looks at whether up->channel_oil is empty. up->channel_oil
is updated from pim_forward_stop(), calling pim_channel_del_oif(). But
that was being called *after* updating join_desired, so join_desired saw
a non-empty OIL. Pull up the pim_forward_stop() call to before updating
join_desired.
David Lamparter [Mon, 17 Apr 2023 09:47:08 +0000 (11:47 +0200)]
pimd: fix null register before aging out reg-stop
It looks like the code was trying to do this with the null_register
parameter on pim_upstream_start_register_stop_timer(), but that didn't
quite work right. Restructure a bit to get it right.
Donatas Abraitis [Wed, 27 Mar 2024 17:08:38 +0000 (19:08 +0200)]
bgpd: Prevent from one more CVE triggering this place
If we receive an attribute that is handled by bgp_attr_malformed(), use
treat-as-withdraw behavior for unknown (or missing to add - if new) attributes.
Donatas Abraitis [Wed, 27 Mar 2024 16:42:56 +0000 (18:42 +0200)]
bgpd: Fix error handling when receiving BGP Prefix SID attribute
Without this patch, we always set the BGP Prefix SID attribute flag without
checking if it's malformed or not. RFC8669 says that this attribute MUST be discarded.
Also, this fixes the bgpd crash when a malformed Prefix SID attribute is received,
with malformed transitive flags and/or TLVs.
Olivier Dugeon [Mon, 26 Feb 2024 09:40:34 +0000 (10:40 +0100)]
ospfd: Solved crash in OSPF TE parsing
Iggy Frankovic discovered an ospfd crash when perfomring fuzzing of OSPF LSA
packets. The crash occurs in ospf_te_parse_te() function when attemping to
create corresponding egde from TE Link parameters. If there is no local
address, an edge is created but without any attributes. During parsing, the
function try to access to this attribute fields which has not been created
causing an ospfd crash.
The patch simply check if the te parser has found a valid local address. If not
found, we stop the parser which avoid the crash.
Donatas Abraitis [Sun, 29 Oct 2023 20:44:45 +0000 (22:44 +0200)]
bgpd: Ignore handling NLRIs if we received MP_UNREACH_NLRI
If we receive MP_UNREACH_NLRI, we should stop handling remaining NLRIs if
no mandatory path attributes received.
In other words, if MP_UNREACH_NLRI received, the remaining NLRIs should be handled
as a new data, but without mandatory attributes, it's a malformed packet.
In normal case, this MUST not happen at all, but to avoid crashing bgpd, we MUST
handle that.
Donatas Abraitis [Fri, 27 Oct 2023 08:56:45 +0000 (11:56 +0300)]
bgpd: Treat EOR as withdrawn to avoid unwanted handling of malformed attrs
Treat-as-withdraw, otherwise if we just ignore it, we will pass it to be
processed as a normal UPDATE without mandatory attributes, that could lead
to harmful behavior. In this case, a crash for route-maps with the configuration
such as:
Donatas Abraitis [Tue, 13 Jun 2023 13:08:24 +0000 (16:08 +0300)]
bgpd: Use enum bgp_create_error_code as argument in header
```
bgpd/bgp_vty.c:865:5: warning: conflicting types for ‘bgp_vty_return’ due to enum/integer mismatch; have ‘int(struct vty *, enum bgp_create_error_code)’ [-Wenum-int-mismatch]
865 | int bgp_vty_return(struct vty *vty, enum bgp_create_error_code ret)
| ^~~~~~~~~~~~~~
In file included from ./bgpd/bgp_mplsvpn.h:15,
from bgpd/bgp_vty.c:48:
./bgpd/bgp_vty.h:148:12: note: previous declaration of ‘bgp_vty_return’ with type ‘int(struct vty *, int)’
148 | extern int bgp_vty_return(struct vty *vty, int ret);
| ^~~~~~~~~~~~~~
```
David Lamparter [Sat, 16 Sep 2023 12:17:24 +0000 (14:17 +0200)]
ospf6d: fix uninitialized warnings
GCC 13.2.0 complains:
```
ospf6d/ospf6_intra.c:139:25: error: ‘json_arr’ may be used uninitialized [-Werror=maybe-uninitialized]
ospf6d/ospf6_intra.c:485:20: error: ‘json_arr’ may be used uninitialized [-Werror=maybe-uninitialized]
```
Donatas Abraitis [Mon, 23 Oct 2023 20:34:10 +0000 (23:34 +0300)]
bgpd: Check mandatory attributes more carefully for UPDATE message
If we send a crafted BGP UPDATE message without mandatory attributes, we do
not check if the length of the path attributes is zero or not. We only check
if attr->flag is at least set or not. Imagine we send only unknown transit
attribute, then attr->flag is always 0. Also, this is true only if graceful-restart
capability is received.
Thread 1 "bgpd" received signal SIGSEGV, Segmentation fault.
0x00005555556e37be in route_set_aspath_prepend (rule=0x555555aac0d0, prefix=0x7fffffffe050,
object=0x7fffffffdb00) at bgpd/bgp_routemap.c:2282
2282 if (path->attr->aspath->refcnt)
(gdb)
```
Donald Sharp [Sat, 2 Mar 2024 14:50:38 +0000 (09:50 -0500)]
bgpd: Ensure community data is freed in some cases.
Customer has this valgrind trace:
Direct leak of 2829120 byte(s) in 70728 object(s) allocated from:
0 in community_new ../bgpd/bgp_community.c:39
1 in community_uniq_sort ../bgpd/bgp_community.c:170
2 in route_set_community ../bgpd/bgp_routemap.c:2342
3 in route_map_apply_ext ../lib/routemap.c:2673
4 in subgroup_announce_check ../bgpd/bgp_route.c:2367
5 in subgroup_process_announce_selected ../bgpd/bgp_route.c:2914
6 in group_announce_route_walkcb ../bgpd/bgp_updgrp_adv.c:199
7 in hash_walk ../lib/hash.c:285
8 in update_group_af_walk ../bgpd/bgp_updgrp.c:2061
9 in group_announce_route ../bgpd/bgp_updgrp_adv.c:1059
10 in bgp_process_main_one ../bgpd/bgp_route.c:3221
11 in bgp_process_wq ../bgpd/bgp_route.c:3221
12 in work_queue_run ../lib/workqueue.c:282
The above leak detected by valgrind was from a screenshot so I copied it
by hand. Any mistakes in line numbers are purely from my transcription.
Additionally this is against a slightly modified 8.5.1 version of FRR.
Code inspection of 8.5.1 -vs- latest master shows the same problem
exists. Code should be able to be followed from there to here.
What is happening:
There is a route-map being applied that modifes the outgoing community
to a peer. This is saved in the attr copy created in
subgroup_process_announce_selected. This community pointer is not
interned. So the community->refcount is still 0. Normally when
a prefix is announced, the attr and the prefix are placed on a
adjency out structure where the attribute is interned. This will
cause the community to be saved in the community hash list as well.
In a non-normal operation when the decision to send is aborted after
the route-map application, the attribute is just dropped and the
pointer to the community is just dropped too, leading to situations
where the memory is leaked. The usage of bgp suppress-fib would
would be a case where the community is caused to be leaked.
Additionally the previous commit where an unsuppress-map is used
to modify the outgoing attribute but since unsuppress-map was
not considered part of outgoing policy the attribute would be dropped as
well. This pointer drop also extends to any dynamically allocated
memory saved by the attribute pointer that was not interned yet as well.
So let's modify the return case where the decision is made to
not send the prefix to the peer to always just flush the attribute
to ensure memory is not leaked.
Fixes: #15459 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Donald Sharp [Wed, 13 Mar 2024 14:26:58 +0000 (10:26 -0400)]
bgpd: Ensure that the correct aspath is free'd
Currently in subgroup_default_originate the attr.aspath
is set in bgp_attr_default_set, which hashs the aspath
and creates a refcount for it. If this is a withdraw
the subgroup_announce_check and bgp_adj_out_set_subgroup
is called which will intern the attribute. This will
cause the the attr.aspath to be set to a new value
finally at the bottom of the function it intentionally
uninterns the aspath which is not the one that was
created for this function. This reduces the other
aspath's refcount by 1 and if a clear bgp * is issued
fast enough the aspath for that will be removed
and the system will crash.
pimd: re-evaluated S,G OILs upon RP changes and for empty SG upstream oils
Topology:
TOR11 (FHR) --- LEAF-11---SPINE1 (RP)MSDP SPINE-2(RP)MSDP --- LEAF-12 -- TOR12 (LHR)
| | | | |
| -----------------------------------------------------(ECMP) |
| | | |
-----------------------------------------------------------------------(ECMP)
Issue:
In some triggers, S,G upstream is preserved even with the PP timer expiry, resulting
in S,G with NULL OILS. This could be because we create a dummy S,G upstream and
dummy channel_oif for *,G, where RPF is UNKNOWN. As a result, PIM+VXLAN traffic is never
forwarded downstream to LHR.
Fix:
when the S,G stream is running, Determine if a reevaluation of the outgoing interface
list (OIL) is required. S,G upstream should then inherit the OIL from *,G.
David Lamparter [Thu, 16 Mar 2023 10:00:02 +0000 (11:00 +0100)]
bgpd: fix NULL argument warning
gcc 12.2.0 complains `error: ‘%s’ directive argument is null`, even
though all enum values are covered with a string. Let's just go with a
`???` default.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
zebra: Fix connected route deletion when multiple entry exists
When multiple interfaces have addresses in the same network, deleting
one of them may cause the wrong connected route being deleted.
For example:
ip link add veth1 type veth peer veth2
ip link set veth1 up
ip link set veth2 up
ip addr add dev veth1 192.168.0.1/24
ip addr add dev veth2 192.168.0.2/24
ip addr flush dev veth1
Zebra deletes the route of interface veth2 rather than veth1.
Should match nexthop against ere->re_nhe instead of ere->re->nhe.
Donald Sharp [Wed, 30 Aug 2023 14:33:29 +0000 (10:33 -0400)]
ospfd: Prevent use after free( and crash of ospf ) when no router ospf
Consider this config:
router ospf
redistribute kernel
Then you issue:
no router ospf
ospf will crash with a use after free.
The problem is that the event's associated with the
ospf pointer were shut off then the ospf_external_delete
was called which rescheduled the event. Let's just move
event deletion to the end of the no router ospf.
Donatas Abraitis [Sun, 20 Aug 2023 21:01:42 +0000 (00:01 +0300)]
bgpd: Do not explicitly print MAXTTL value for ebgp-multihop vty output
1. Create /etc/frr/frr.conf
```
frr version 7.5
frr defaults traditional
hostname centos8.localdomain
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
line vty
router bgp 4250001000
neighbor 192.168.122.207 remote-as 65512
neighbor 192.168.122.207 ebgp-multihop
```
2. Start FRR
`# systemctl start frr
`
3. Show running configuration. Note that FRR explicitly set and shows the default TTL (225)
```
Building configuration...
Current configuration:
!
frr version 7.5
frr defaults traditional
hostname centos8.localdomain
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp 4250001000
neighbor 192.168.122.207 remote-as 65512
neighbor 192.168.122.207 ebgp-multihop 255
!
line vty
!
end
```
4. Copy initial frr.conf to frr.conf.new (no changes)
`# cp /etc/frr/frr.conf /root/frr.conf.new
`
5. Run frr-reload.sh:
```
$ /usr/lib/frr/frr-reload.py --test /root/frr.conf.new
2023-08-20 20:15:48,050 INFO: Called via "Namespace(bindir='/usr/bin', confdir='/etc/frr', daemon='', debug=False, filename='/root/frr.conf.new', input=None, log_level='info', overwrite=False, pathspace=None, reload=False, rundir='/var/run/frr', stdout=False, test=True, vty_socket=None)"
2023-08-20 20:15:48,050 INFO: Loading Config object from file /root/frr.conf.new
2023-08-20 20:15:48,124 INFO: Loading Config object from vtysh show running
Lines To Delete
===============
router bgp 4250001000
no neighbor 192.168.122.207 ebgp-multihop 255
Donatas Abraitis [Fri, 11 Aug 2023 15:21:12 +0000 (18:21 +0300)]
vtysh: Print uniq lines when parsing `no service ...`
Before this patch:
```
no service cputime-warning
no service cputime-warning
no ipv6 forwarding
no service cputime-warning
no service cputime-warning
no service cputime-warning
```
bgpd: Fix session reset issue caused by malformed core attributes
RCA:
On encountering any attribute error for core attributes in update message,
the error handling is set to 'treat as withdraw' and
further parsing of the remaining attributes is skipped.
But the stream pointer is not being correctly adjusted to
point to the next NLRI field skipping the rest of the attributes.
This leads to incorrect parsing of the NLRI field,
which causes BGP session to reset.
Fix:
The stream pointer offset is rightly adjusted to point to the NLRI field correctly
when the malformed attribute is encountered and remaining attribute parsing is skipped.
Donald Sharp [Sat, 1 Jul 2023 15:18:06 +0000 (11:18 -0400)]
ospf6d: Fix crash because neighbor structure was freed
The loading_done event needs a event pointer to prevent
use after free's. Testing found this:
ERROR: AddressSanitizer: heap-use-after-free on address 0x613000035130 at pc 0x55ad42d54e5f bp 0x7ffff1e942a0 sp 0x7ffff1e94290
READ of size 1 at 0x613000035130 thread T0
#0 0x55ad42d54e5e in loading_done ospf6d/ospf6_neighbor.c:447
#1 0x55ad42ed7be4 in event_call lib/event.c:1995
#2 0x55ad42e1df75 in frr_run lib/libfrr.c:1213
#3 0x55ad42cf332e in main ospf6d/ospf6_main.c:250
#4 0x7f5798133c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
#5 0x55ad42cf2b19 in _start (/usr/lib/frr/ospf6d+0x248b19)
0x613000035130 is located 48 bytes inside of 384-byte region [0x613000035100,0x613000035280)
freed by thread T0 here:
#0 0x7f57998d77a8 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xde7a8)
#1 0x55ad42e3b4b6 in qfree lib/memory.c:130
#2 0x55ad42d5d049 in ospf6_neighbor_delete ospf6d/ospf6_neighbor.c:180
#3 0x55ad42d1e1ea in interface_down ospf6d/ospf6_interface.c:930
#4 0x55ad42ed7be4 in event_call lib/event.c:1995
#5 0x55ad42ed84fe in _event_execute lib/event.c:2086
#6 0x55ad42d26d7b in ospf6_interface_clear ospf6d/ospf6_interface.c:2847
#7 0x55ad42d73f16 in ospf6_process_reset ospf6d/ospf6_top.c:755
#8 0x55ad42d7e98c in clear_router_ospf6_magic ospf6d/ospf6_top.c:778
#9 0x55ad42d7e98c in clear_router_ospf6 ospf6d/ospf6_top_clippy.c:42
#10 0x55ad42dc2665 in cmd_execute_command_real lib/command.c:994
#11 0x55ad42dc2b32 in cmd_execute_command lib/command.c:1053
#12 0x55ad42dc2fa9 in cmd_execute lib/command.c:1221
#13 0x55ad42ee3cd6 in vty_command lib/vty.c:591
#14 0x55ad42ee4170 in vty_execute lib/vty.c:1354
#15 0x55ad42eec94f in vtysh_read lib/vty.c:2362
#16 0x55ad42ed7be4 in event_call lib/event.c:1995
#17 0x55ad42e1df75 in frr_run lib/libfrr.c:1213
#18 0x55ad42cf332e in main ospf6d/ospf6_main.c:250
#19 0x7f5798133c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
previously allocated by thread T0 here:
#0 0x7f57998d7d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
#1 0x55ad42e3ab22 in qcalloc lib/memory.c:105
#2 0x55ad42d5c8ff in ospf6_neighbor_create ospf6d/ospf6_neighbor.c:119
#3 0x55ad42d4c86a in ospf6_hello_recv ospf6d/ospf6_message.c:464
#4 0x55ad42d4c86a in ospf6_read_helper ospf6d/ospf6_message.c:1884
#5 0x55ad42d4c86a in ospf6_receive ospf6d/ospf6_message.c:1925
#6 0x55ad42ed7be4 in event_call lib/event.c:1995
#7 0x55ad42e1df75 in frr_run lib/libfrr.c:1213
#8 0x55ad42cf332e in main ospf6d/ospf6_main.c:250
#9 0x7f5798133c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
Add an actual event pointer and just track it appropriately.
Donald Sharp [Fri, 30 Jun 2023 19:21:43 +0000 (15:21 -0400)]
ospf6d: Stop crash in ospf6_write
I'm seeing crashes in ospf6_write on the `assert(node)`. The only
sequence of events that I see that could possibly cause this to happen
is this:
a) Someone has scheduled a outgoing write to the ospf6->t_write and
placed item(s) on the ospf6->oi_write_q
b) A decision is made in ospf6_send_lsupdate() to send an immediate
packet via a event_execute(..., ospf6_write,....).
c) ospf6_write is called and the oi_write_q is cleaned out.
d) the t_write event is now popped and the oi_write_q is empty
and FRR asserts on the `assert(node)` <crash>
When event_execute is called for ospf6_write, just cancel the t_write
event. If ospf6_write has more data to send at the end of the function
it will reschedule itself. I've only seen this crash one time and am
unable to reliably reproduce this at all. But this is the only mechanism
that I can see that could make this happen, given how little the oi_write_q
is actually touched in code.
This commit introduced a crash. When the VRF is deleted, the RIPNG
instance should not be freed, because the NB infrastructure still stores
the pointer to it. The instance should be deleted only when it's actually
deleted from the configuration.
To reproduce the crash:
```
frr# conf t
frr(config)# vrf vrf1
frr(config-vrf)# exit
frr(config)# router ripng vrf vrf1
frr(config-router)# exit
frr(config)# no vrf vrf1
frr(config)# no router ripng vrf vrf1
vtysh: error reading from ripngd: Resource temporarily unavailable (11)Warning: closing connection to ripngd because of an I/O error!
frr(config)#
```
This commit introduced a crash. When the VRF is deleted, the RIP instance
should not be freed, because the NB infrastructure still stores the
pointer to it. The instance should be deleted only when it's actually
deleted from the configuration.
To reproduce the crash:
```
frr# conf t
frr(config)# vrf vrf1
frr(config-vrf)# exit
frr(config)# router rip vrf vrf1
frr(config-router)# exit
frr(config)# no vrf vrf1
frr(config)# no router rip vrf vrf1
vtysh: error reading from ripd: Resource temporarily unavailable (11)Warning: closing connection to ripd because of an I/O error!
frr(config)#
```
This a convenience release/tag for house keeping. We currently don't plan to publish
binary packages with this release.
Changelog:
bfdd
Fix malformed session with vrf
Remove redundant nb destroy callbacks
bgpd
Aggregate-address memory leak fix
Bmp fix peer-up ports byte order
Check 7 bytes for long-lived graceful-restart capability
Conform bgp_packet.h with coding standards
Copy the password from the previous peer on peer_xfer_config()
Do not allow a `no router bgp xxx` when autoimport is happening
Do not allow l3vni changes when shutting down
Do not announce routes immediatelly on filter updates
Ensure stream received has enough data
Fix bgpd core when unintern attr
Fix crash for `show bgp ... neighbor received-routes detail|prefix`
Fix debug output for route-map names when using a unsuppress-map
Fix ecommunity parsing for as4
Fix for ain->attr corruption during path update
Fix lcom->str string length to correctly cover aliases
Increase buffer size used for dumping bgp to mrt files
Limit flowspec to no attribute means a implicit withdrawal
Make bgp_keepalives.c not use mtype_tmp
Prevent null pointer deref when outputting data
Treat withdraw variable as a bool
Use interface name instead of pointer value
Use the actual pointer type instead of a void
lib
Adjust only `any` flag for prefix-list entries if destroying
Destroy `any` flag when creating a prefix-list entry with prefix
Fix link state memory leak
Fix vtysh core when handling questionmark
On bfd peer shutdown actually stop event
ospf6d
Stop using mtype_tmp in some cases
ospfd, ospf6d
Add more logging details
ospfd, ospfclient
Do not just include .c files in another .c
ospfd
Cleanup some memory leaks on shutdown in ospf_apiserver.c
Fix for vitual-link crash in signal handler
Fix interface param type update
Fix memory leaks w/ `show ip ospf int x json` commands
Fix ospf_lsa memory leak
Fix ospf_ti_lfa drop of an entire table
Fixing summary origination after range configuration
Free up q_space in early return path
Log adjacency changes with neighbor ip in addition to neighbor id
Ospf opaque lsa stale processing fix and topotests.
Remove mtype_tmp
Respect loopback's cost that is set and set loopback costs to 0
pbrd
Fix mismatching in match src-dst
pimd
Fix use after free issue for ifp's moving vrfs
Pim not sending register packets after changing from non dr to dr
Process no-forward bsm packet
ripd
Fix memory leak for ripd's route-map
tests
Add test to validate 4-byte ecomm parsing
Check if prefix-lists with ipv6 any works fine
Check if route-map works correctly if modifying prefix-lists
tools
Fix list value remove in frr-reload
Fix missing remote-as configuration when reload
Make check flag really work for reload
vtysh
Give actual pam error messages
zebra
Cleanup ctx leak on shutdown and turn off event
Evpn handle del event for dup detected mac
Fix evpn dup detected local mac del event
Fix for heap-use-after-free in evpn
Fix race during shutdown
Install directly connected route after interface flap
Reduce creation and fix memory leak of frrscripting pointers
Unlock the route node when sending route notifications
Chirag Shah [Fri, 26 May 2023 20:43:50 +0000 (13:43 -0700)]
ospfd: fix interface param type update
interface link update event needs
to be handle properly in ospf interface
cache.
Example:
When vrf (interface) is created its default type
would be set to BROADCAST because ifp->status
is not set to VRF.
Subsequent link event sets ifp->status to vrf,
ospf interface update need to compare current type
to new default type which would be VRF (OSPF_IFTYPE_LOOPBACK).
Since ospf type param was created in first add event,
ifp vrf link event didn't update ospf type param which
leads to treat vrf as non loopback interface.
Donald Sharp [Tue, 6 Dec 2022 15:23:11 +0000 (10:23 -0500)]
bgpd: Ensure stream received has enough data
BGP_PREFIX_SID_SRV6_L3_SERVICE attributes must not
fully trust the length value specified in the nlri.
Always ensure that the amount of data we need to read
can be fullfilled.
Reported-by: Iggy Frankovic <iggyfran@amazon.com> Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 06431bfa7570f169637ebb5898f0b0cc3b010802)
Chirag Shah [Tue, 6 Jun 2023 04:48:12 +0000 (21:48 -0700)]
tools: fix list value remove in frr-reload
There might be a time element(s) from
temporary list are removed more than once
which leads to valueError in certain python3
version.
commit-id 1543f58b5 did not handle valueError
properly. This caused regression where
prefix-list config leads to delete followed
by add.
The new fix should just pass the exception as
value removal from list_to_add or list_to_del
is best effort.
This allows prefix-list config has no change
then removes the lines from lines_to_del and
lines_to_add properly.
Configure prefix-list in frr.conf and perform
multiple frr-reload. After first reload operatoin
subsequent ones should not result in delete followed
by add of the prefix-list but rather no-op operation.
Donald Sharp [Wed, 31 May 2023 15:40:07 +0000 (11:40 -0400)]
zebra: Unlock the route node when sending route notifications
When using a context to send route notifications to upper
level protocols, the code was using a locking function to
get the route node. There is no need for this to be locked
as such FRR should free it up.
Yuan Yuan [Tue, 30 May 2023 19:20:09 +0000 (19:20 +0000)]
lib: fix vtysh core when handling questionmark
When issue vtysh command with ?, the initial buf size for the
element is 16. Then it would loop through each element in the cmd
output vector. If the required size for printing out the next
element is larger than the current buf size, realloc the buf memory
by doubling the current buf size regardless of the actual size
that's needed. This would cause vtysh core when the doubled size
is not enough for the next element.
Yuan Yuan [Tue, 30 May 2023 18:53:32 +0000 (18:53 +0000)]
bgpd: fix bgpd core when unintern attr
When the remote peer is neither EBGP nor confed, aspath is the
shadow copy of attr->aspath in bgp_packet_attribute(). Striping
AS4_PATH should not be done on the aspath directly, since
that would lead to bgpd core dump when unintern the attr.
Donald Sharp [Fri, 26 May 2023 11:44:11 +0000 (07:44 -0400)]
vtysh: Give actual pam error messages
Code was was written where the pam error message put out
was the result from a previous call to the pam modules
instead of the current call to the pam module.