bgpd: Fix session reset issue caused by malformed core attributes
RCA:
On encountering any attribute error for core attributes in update message,
the error handling is set to 'treat as withdraw' and
further parsing of the remaining attributes is skipped.
But the stream pointer is not being correctly adjusted to
point to the next NLRI field skipping the rest of the attributes.
This leads to incorrect parsing of the NLRI field,
which causes BGP session to reset.
Fix:
The stream pointer offset is rightly adjusted to point to the NLRI field correctly
when the malformed attribute is encountered and remaining attribute parsing is skipped.
- Major Highlights:
- Introduce `mgmtd` daemon
- Add BGP `neighbor path-attribute treat-as-withdraw` command
- Add BGP ASN dot notation support (RFC 5396)
- Add BGP Software Version capability
- Allow BGP peering via 127.0.0.0/8
- Deprecate BGP `internet` community - this is the Cisco-specific community, which is never been RFC-defined and confusing
- Implement `match source-protocol` for BGP route maps
- Implement BGP Node Target extended communities (draft-ietf-idr-node-target-ext-comm)
- Implement Flex-Algo for SR-MPLS (RFC 9350)
- Add support for IS-IS `advertise-passive-only`
- Add IS-IS `affinity-map` support
- Add the `graceful-restart hello-delay` OSPFv2/OSPFv3 command
- Add the `ipv6 mld join` PIMv6 command
- Add `allow-ecmp x` RIP/RIPng Command
- Add BFD support for RIP
- For a full list of new features and bug fixes, please refer to:
- https://frrouting.org/release/
Donald Sharp [Mon, 24 Jul 2023 14:33:21 +0000 (10:33 -0400)]
bgpd: Reduce size of ibuf_work ringbuf
The ringbuf is 650k in size. This is obscenely large and
in practical experimentation FRR never even approaches
that size at all. Let's reduce this to 1.5 max packet sizes.
If a BGP_MAX_PACKET_SIZE packet is ever received having a bit
of extra space ensures that we can read at least 1 packet.
This also will significantly reduce memory usage when the
operator has a lot of peers.
Introduced the idea of a input Q packet limit. Say you read in
635000 bytes of data and the input Q is already at it's limit
(currently 1000) then when bgp_process_reads runs it will
assert because there is less then a BGP_MAX_PACKET_SIZE in ibuf_work.
Don't assert as that it's irrelevant. Even if we can't read a full packet
in let's let the whole system keep working as that as the input Q length
comes down we will start pulling down the ibuf_work and it will be ok.
Donald Sharp [Mon, 24 Jul 2023 00:30:47 +0000 (20:30 -0400)]
bgpd: The last_reset_cause in the peer structure is too large
The last_reset_cause is a plain old BGP_MAX_PACKET_SIZE buffer
that is really enlarging the peer data structure. Let's just
copy the stream that failed and only allocate how ever much
the packet size actually was. While it's likely that we have
a reset reason, the packet typically is not going to be 65k
in size. Let's save space.
Donald Sharp [Fri, 21 Jul 2023 17:10:03 +0000 (13:10 -0400)]
bgpd: Replace peer->ibuf_scratch
The peer->ibuf_scratch was allocating 65535 * 10 bytes
for scratch space to hold data incoming from a read
from a peer. When you have 4k peers this is 262,1400,000
or 262 mb of data. Which is crazy large. Especially
since the i/o pthread is reading per peer without
any chance of having the data interfere with other reads.
Abhishek N R [Thu, 13 Jul 2023 09:54:27 +0000 (02:54 -0700)]
pim6d: Fixing core while running MLD conformance test.
While running MLD conformance test 9.2 core is getting generated.
Test setps:
1. ANVL: Listen (for upto <GeneralQueryRecvWaitTime> seconds) on <AIface-0>.
2. DUT: Send MLD General Query Message.
3. ANVL: Send MLD Report Message to <DIface-0> containing:
• IPv6 Source Address field set to link-local IPv6 Address of HOST-1
• IPv6 Destination Address field set to <McastAddrGroup>
• MLD Multicast Address field set to <McastAddrGroup>.
4. ANVL: Wait for <ProcessTime> seconds for DUT to process and add <Mcas- tAddrGroup> to its Multicast Address list.
5. ANVL: Send MLD General Query Message to <DIface-0> containing:
• IPv6 Source Address field set to link-local IPv6 Address of RTR-1 which is numerically less than the link-local IPv6 unicast address of <DIface-0>
• IPv6 Destination Address field set to link-scope all-nodes multicast address.
6. ANVL: Send MLD Multicast-Address-Specific Query Message to <DIface-0> containing:
• IPv6 Source Address field set to link-local IPv6 Address of RTR-1
• IPv6 Destination Address field set to <McastAddrGroup>
• MLD Multicast Address field set to <McastAddrGroup>
• MLD Maximum Response Delay field value set to 0.
7. ANVL: Verify that the Maximum Response Delay timer for <McastAd- drGroup> is set to zero.
While running above test, when group specific query is received we start gm_t_sg_expire timer.
Once this timer expires, we clear the corresponding entry.
During this sg->state was still set to JOIN. This happened because receiver went down without sending leave.
Added a condition to update the sg->state before starting the timer.
If receiver goes down without sending leave we will update sg->state to GM_SG_JOIN_EXPIRING or GM_SG_NOPRUNE_EXPIRING based on previous state.
If we receive a join then sg->state will be refreshed and will be updated to JOIN state.
bgpd: Do not try to redistribute routes if we are shutting down
When switching `router bgp`, `no router bgp` and doing redistributions, we should
ignore this action, otherwise memory leak happens:
```
Indirect leak of 400 byte(s) in 2 object(s) allocated from:
0 0x7f81b36b3a06 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:153
1 0x7f81b327bd2e in qcalloc lib/memory.c:105
2 0x55f301d28628 in bgp_node_create bgpd/bgp_table.c:92
3 0x7f81b3309d0b in route_node_new lib/table.c:52
4 0x7f81b3309d0b in route_node_set lib/table.c:61
5 0x7f81b330be0a in route_node_get lib/table.c:319
6 0x55f301ce89df in bgp_redistribute_add bgpd/bgp_route.c:8907
7 0x55f301dac182 in zebra_read_route bgpd/bgp_zebra.c:593
8 0x7f81b334dcd7 in zclient_read lib/zclient.c:4179
9 0x7f81b331d702 in event_call lib/event.c:1995
10 0x7f81b325d597 in frr_run lib/libfrr.c:1213
11 0x55f301b94b12 in main bgpd/bgp_main.c:505
12 0x7f81b2b57082 in __libc_start_main ../csu/libc-start.c:308
```
Donald Sharp [Mon, 17 Jul 2023 14:00:32 +0000 (10:00 -0400)]
zebra: Further handle route replace semantics
When an upper level protocol is installing a route X that needs to be
route replaced and at the same time the same or another protocol installs a
different route that depends on route X for nexthop resolution can leave
us with a state where the route is not accepted because zebra is still
really early in the route replace semantics ( route X is still on the work
Queue to be processed ) then the dependent route would not be installed.
This came up in the bgp_default_originate test cases frequently.
Further extendd the ROUTE_ENTR_ROUTE_REPLACING flag to cover this case
as well. This has come up because the early route processing queueing
that was implemented late last year.
Donald Sharp [Fri, 14 Jul 2023 16:14:20 +0000 (12:14 -0400)]
bgpd: Prevent use after free
When running bgp_always_compare_med, I am frequently seeing a crash
After running with valgrind I am seeing this and a invalid write
immediately after this as well.
==311743== Invalid read of size 2
==311743== at 0x4992421: route_map_counter_decrement (routemap.c:3308)
==311743== by 0x35664D: peer_route_map_unset (bgpd.c:7259)
==311743== by 0x306546: peer_route_map_unset_vty (bgp_vty.c:8037)
==311743== by 0x3066AC: no_neighbor_route_map (bgp_vty.c:8081)
==311743== by 0x49078DE: cmd_execute_command_real (command.c:990)
==311743== by 0x4907A63: cmd_execute_command (command.c:1050)
==311743== by 0x490801F: cmd_execute (command.c:1217)
==311743== by 0x49C5535: vty_command (vty.c:551)
==311743== by 0x49C7459: vty_execute (vty.c:1314)
==311743== by 0x49C97D1: vtysh_read (vty.c:2223)
==311743== by 0x49BE5E2: event_call (event.c:1995)
==311743== by 0x494786C: frr_run (libfrr.c:1204)
==311743== by 0x1F7655: main (bgp_main.c:505)
==311743== Address 0x9ec2180 is 64 bytes inside a block of size 120 free'd
==311743== at 0x484B27F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==311743== by 0x495A1BA: qfree (memory.c:130)
==311743== by 0x498D412: route_map_free_map (routemap.c:748)
==311743== by 0x498D176: route_map_add (routemap.c:672)
==311743== by 0x498D79B: route_map_get (routemap.c:857)
==311743== by 0x499C256: lib_route_map_create (routemap_northbound.c:102)
==311743== by 0x49702D8: nb_callback_create (northbound.c:1234)
==311743== by 0x497107F: nb_callback_configuration (northbound.c:1578)
==311743== by 0x4971693: nb_transaction_process (northbound.c:1709)
==311743== by 0x496FCF4: nb_candidate_commit_apply (northbound.c:1103)
==311743== by 0x496FE4E: nb_candidate_commit (northbound.c:1136)
==311743== by 0x497798F: nb_cli_classic_commit (northbound_cli.c:49)
==311743== by 0x4977B4F: nb_cli_pending_commit_check (northbound_cli.c:88)
==311743== by 0x49078C1: cmd_execute_command_real (command.c:987)
==311743== by 0x4907B44: cmd_execute_command (command.c:1068)
==311743== by 0x490801F: cmd_execute (command.c:1217)
==311743== by 0x49C5535: vty_command (vty.c:551)
==311743== by 0x49C7459: vty_execute (vty.c:1314)
==311743== by 0x49C97D1: vtysh_read (vty.c:2223)
==311743== by 0x49BE5E2: event_call (event.c:1995)
==311743== by 0x494786C: frr_run (libfrr.c:1204)
==311743== by 0x1F7655: main (bgp_main.c:505)
==311743== Block was alloc'd at
==311743== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==311743== by 0x495A068: qcalloc (memory.c:105)
==311743== by 0x498D0C8: route_map_new (routemap.c:646)
==311743== by 0x498D128: route_map_add (routemap.c:658)
==311743== by 0x498D79B: route_map_get (routemap.c:857)
==311743== by 0x499C256: lib_route_map_create (routemap_northbound.c:102)
==311743== by 0x49702D8: nb_callback_create (northbound.c:1234)
==311743== by 0x497107F: nb_callback_configuration (northbound.c:1578)
==311743== by 0x4971693: nb_transaction_process (northbound.c:1709)
==311743== by 0x496FCF4: nb_candidate_commit_apply (northbound.c:1103)
==311743== by 0x496FE4E: nb_candidate_commit (northbound.c:1136)
==311743== by 0x497798F: nb_cli_classic_commit (northbound_cli.c:49)
==311743== by 0x4977B4F: nb_cli_pending_commit_check (northbound_cli.c:88)
==311743== by 0x49078C1: cmd_execute_command_real (command.c:987)
==311743== by 0x4907B44: cmd_execute_command (command.c:1068)
==311743== by 0x490801F: cmd_execute (command.c:1217)
==311743== by 0x49C5535: vty_command (vty.c:551)
==311743== by 0x49C7459: vty_execute (vty.c:1314)
==311743== by 0x49C97D1: vtysh_read (vty.c:2223)
==311743== by 0x49BE5E2: event_call (event.c:1995)
==311743== by 0x494786C: frr_run (libfrr.c:1204)
Effectively the route_map that is being stored has been freed already
but we have not cleaned up properly yet. Go through and clean the
code up by ensuring that the pointer actually exists instead of trusting
it does when doing the decrement operation.
Christian Hopps [Fri, 14 Jul 2023 22:21:55 +0000 (18:21 -0400)]
vtysh: track and fix file-lock use in the workaround from 2004
There's a workaround in the code from a bug from back in 2004, it ends
and re-enters config mode anytime an `exit` is done from a level below
the top-level config node (e.g., from a `router isis` node). We need to
re-enter config mode with or without a lock according to how we actually
entered it to begin with.
Christian Hopps [Fri, 14 Jul 2023 20:13:54 +0000 (16:13 -0400)]
lib: mgmtd: only clear pending for the in-progress command
The lock/unlocks are being done short-circuit so they are never pending;
however, the handling of the unlock notification was always resuming the command
if pending was set. In all cases pending is set for another command. For example
implicit commit locks then when notified its done unlocks which was clearing the
set-config pending flag and resuming that command incorrectly.
lib: fix on-match when added to existing route-map entry
Currently, "on-match (next|goto)" only works if already present in a
route-map entry when the route-map is applied to the routes. However, if
the command is added to an existing route-map entry, the route-map is
not reapplied to the routes in order to accommodate the changes. And
service restart is needed. The problem is that setting the command
doesn't signal about the change to the listener (i.e. to a routing
daemon).
With this fix, signal to the listener about addition of "on-match
(next|goto)" to a route-map entry.
zebra: Fix crash when `dplane_fpm_nl` fails to process received routes
When `dplane_fpm_nl` receives a route, it allocates memory for a dplane
context and calls `netlink_route_change_read_unicast_internal` without
initializing the `intf_extra_list` contained in the dplane context. If
`netlink_route_change_read_unicast_internal` is not able to process the
route, we call `dplane_ctx_fini` to free the dplane context. This causes
a crash because `dplane_ctx_fini` attempts to access the intf_extra_list
which is not initialized.
To solve this issue, we can call `dplane_ctx_route_init`to initialize
the dplane route context properly, just after the dplane context
allocation.
(gdb) bt
#0 0x0000555dd5ceae80 in dplane_intf_extra_list_pop (h=0x7fae1c007e68) at ../zebra/zebra_dplane.c:427
#1 dplane_ctx_free_internal (ctx=0x7fae1c0074b0) at ../zebra/zebra_dplane.c:724
#2 0x0000555dd5cebc99 in dplane_ctx_free (pctx=0x7fae2aa88c98) at ../zebra/zebra_dplane.c:869
#3 dplane_ctx_free (pctx=0x7fae2aa88c98, pctx@entry=0x7fae2aa78c28) at ../zebra/zebra_dplane.c:855
#4 dplane_ctx_fini (pctx=pctx@entry=0x7fae2aa88c98) at ../zebra/zebra_dplane.c:890
#5 0x00007fae31e93f29 in fpm_read (t=) at ../zebra/dplane_fpm_nl.c:605
#6 0x00007fae325191dd in thread_call (thread=thread@entry=0x7fae2aa98da0) at ../lib/thread.c:2006
#7 0x00007fae324c42b8 in fpt_run (arg=0x555dd74777c0) at ../lib/frr_pthread.c:309
#8 0x00007fae32405ea7 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#9 0x00007fae32325a2f in clone () from /lib/x86_64-linux-gnu/libc.so.6
zebra: Abstract `dplane_ctx_route_init` to init route without copying
The function `dplane_ctx_route_init` initializes a dplane route context
from the route object passed as an argument. Let's abstract this
function to allow initializing the dplane route context without actually
copying a route object.
This allows us to use this function for initializing a dplane route
context when we don't have any route to copy in it.
anlan_cs [Mon, 19 Jun 2023 02:21:28 +0000 (10:21 +0800)]
zebra: fix wrong nexthop check for kernel routes
When changing one interface's vrf, the kernel routes are wrongly kept
in old vrf. Finally, the forwarding table in that old vrf can't forward
traffic correctly for those residual entries.
Follow these steps to make this problem happen:
( Firstly, "x1" interface of default vrf is with address of "6.6.6.6/24". )
```
anlan# ip route add 4.4.4.0/24 via 6.6.6.8 dev x1
anlan# ip link add vrf1 type vrf table 1
anlan# ip link set vrf1 up
anlan# ip link set x1 master vrf1
```
Then check `show ip route`, the route of "4.4.4.0/24" is still selected
in default vrf.
If the interface goes down, the kernel routes will be reevaluated. Those
kernel routes with active interface of nexthop can be kept no change, it
is a fast path. Otherwise, it enters into slow path to do careful examination
on this nexthop.
After the interface's vrf had been changed into new vrf, the down message of
this interface came. It means the interface is not in old vrf although it
still exists during that checking, so the kernel routes should be dropped
after this nexthop matching against a default route in slow path. But, in
current code they are wrongly kept in fast path for not checking vrf.
So, modified the checking active nexthop with vrf comparision for the interface
during reevaluation.
ryndia [Fri, 30 Jun 2023 11:02:27 +0000 (15:02 +0400)]
ospf6d: unlock lsa
The function ospf6_router_lsa_contains_adj(), ospf6_gr_check_adjs() and ospf6_find_interf_prefix_lsa() iterate through LSDB and lock each LSA. During testing, it was discovered that the lock count did not reach zero upon termination. The stack trace below indicates the leak. To resolve this issue, it was found that unlocking the LSA before returning from the functions solves the problem. This suggests that there was a missing unlock that caused the lock count to remain nonzero.
Direct leak of 400 byte(s) in 2 object(s) allocated from:
#0 0x7fa744ccea37 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x7fa744867562 in qcalloc ../lib/memory.c:105
#2 0x555cdbb37506 in ospf6_lsa_alloc ../ospf6d/ospf6_lsa.c:710
#3 0x555cdbb375d6 in ospf6_lsa_create ../ospf6d/ospf6_lsa.c:725
#4 0x555cdbaf1008 in ospf6_receive_lsa ../ospf6d/ospf6_flood.c:912
#5 0x555cdbb48ceb in ospf6_lsupdate_recv ../ospf6d/ospf6_message.c:1621
#6 0x555cdbb4ac90 in ospf6_read_helper ../ospf6d/ospf6_message.c:1896
#7 0x555cdbb4aecc in ospf6_receive ../ospf6d/ospf6_message.c:1925
#8 0x7fa744950c33 in event_call ../lib/event.c:1995
#9 0x7fa74483b34a in frr_run ../lib/libfrr.c:1213
#10 0x555cdbacf1eb in main ../ospf6d/ospf6_main.c:250
#11 0x7fa7443f9d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
Indirect leak of 80 byte(s) in 2 object(s) allocated from:
#0 0x7fa744cce867 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
#1 0x7fa744867525 in qmalloc ../lib/memory.c:100
#2 0x555cdbb37520 in ospf6_lsa_alloc ../ospf6d/ospf6_lsa.c:711
#3 0x555cdbb375d6 in ospf6_lsa_create ../ospf6d/ospf6_lsa.c:725
#4 0x555cdbaf1008 in ospf6_receive_lsa ../ospf6d/ospf6_flood.c:912
#5 0x555cdbb48ceb in ospf6_lsupdate_recv ../ospf6d/ospf6_message.c:1621
#6 0x555cdbb4ac90 in ospf6_read_helper ../ospf6d/ospf6_message.c:1896
#7 0x555cdbb4aecc in ospf6_receive ../ospf6d/ospf6_message.c:1925
#8 0x7fa744950c33 in event_call ../lib/event.c:1995
#9 0x7fa74483b34a in frr_run ../lib/libfrr.c:1213
#10 0x555cdbacf1eb in main ../ospf6d/ospf6_main.c:250
#11 0x7fa7443f9d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
Direct leak of 2000 byte(s) in 10 object(s) allocated from:
#0 0x7f2c3faeea37 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x7f2c3f68a6d9 in qcalloc ../lib/memory.c:105
#2 0x56431b83633d in ospf6_lsa_alloc ../ospf6d/ospf6_lsa.c:710
#3 0x56431b83640d in ospf6_lsa_create ../ospf6d/ospf6_lsa.c:725
#4 0x56431b7efe13 in ospf6_receive_lsa ../ospf6d/ospf6_flood.c:912
#5 0x56431b847b31 in ospf6_lsupdate_recv ../ospf6d/ospf6_message.c:1621
#6 0x56431b849ad6 in ospf6_read_helper ../ospf6d/ospf6_message.c:1896
#7 0x56431b849d12 in ospf6_receive ../ospf6d/ospf6_message.c:1925
#8 0x7f2c3f773c62 in event_call ../lib/event.c:1995
#9 0x7f2c3f65e2de in frr_run ../lib/libfrr.c:1213
#10 0x56431b7cdff6 in main ../ospf6d/ospf6_main.c:221
#11 0x7f2c3f21dd8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
Indirect leak of 460 byte(s) in 10 object(s) allocated from:
#0 0x7f2c3faee867 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
#1 0x7f2c3f68a69c in qmalloc ../lib/memory.c:100
#2 0x56431b836357 in ospf6_lsa_alloc ../ospf6d/ospf6_lsa.c:711
#3 0x56431b83640d in ospf6_lsa_create ../ospf6d/ospf6_lsa.c:725
#4 0x56431b7efe13 in ospf6_receive_lsa ../ospf6d/ospf6_flood.c:912
#5 0x56431b847b31 in ospf6_lsupdate_recv ../ospf6d/ospf6_message.c:1621
#6 0x56431b849ad6 in ospf6_read_helper ../ospf6d/ospf6_message.c:1896
#7 0x56431b849d12 in ospf6_receive ../ospf6d/ospf6_message.c:1925
#8 0x7f2c3f773c62 in event_call ../lib/event.c:1995
#9 0x7f2c3f65e2de in frr_run ../lib/libfrr.c:1213
#10 0x56431b7cdff6 in main ../ospf6d/ospf6_main.c:221
#11 0x7f2c3f21dd8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
Donald Sharp [Fri, 30 Jun 2023 13:56:51 +0000 (09:56 -0400)]
lib: Add two places we were not counting route-map applied
There were a couple of places where it was possible a route-map
was applied( and DENIED ) but the count for the number of times
the application happen was not incremented.
Donald Sharp [Sat, 1 Jul 2023 15:18:06 +0000 (11:18 -0400)]
ospf6d: Fix crash because neighbor structure was freed
The loading_done event needs a event pointer to prevent
use after free's. Testing found this:
ERROR: AddressSanitizer: heap-use-after-free on address 0x613000035130 at pc 0x55ad42d54e5f bp 0x7ffff1e942a0 sp 0x7ffff1e94290
READ of size 1 at 0x613000035130 thread T0
#0 0x55ad42d54e5e in loading_done ospf6d/ospf6_neighbor.c:447
#1 0x55ad42ed7be4 in event_call lib/event.c:1995
#2 0x55ad42e1df75 in frr_run lib/libfrr.c:1213
#3 0x55ad42cf332e in main ospf6d/ospf6_main.c:250
#4 0x7f5798133c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
#5 0x55ad42cf2b19 in _start (/usr/lib/frr/ospf6d+0x248b19)
0x613000035130 is located 48 bytes inside of 384-byte region [0x613000035100,0x613000035280)
freed by thread T0 here:
#0 0x7f57998d77a8 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xde7a8)
#1 0x55ad42e3b4b6 in qfree lib/memory.c:130
#2 0x55ad42d5d049 in ospf6_neighbor_delete ospf6d/ospf6_neighbor.c:180
#3 0x55ad42d1e1ea in interface_down ospf6d/ospf6_interface.c:930
#4 0x55ad42ed7be4 in event_call lib/event.c:1995
#5 0x55ad42ed84fe in _event_execute lib/event.c:2086
#6 0x55ad42d26d7b in ospf6_interface_clear ospf6d/ospf6_interface.c:2847
#7 0x55ad42d73f16 in ospf6_process_reset ospf6d/ospf6_top.c:755
#8 0x55ad42d7e98c in clear_router_ospf6_magic ospf6d/ospf6_top.c:778
#9 0x55ad42d7e98c in clear_router_ospf6 ospf6d/ospf6_top_clippy.c:42
#10 0x55ad42dc2665 in cmd_execute_command_real lib/command.c:994
#11 0x55ad42dc2b32 in cmd_execute_command lib/command.c:1053
#12 0x55ad42dc2fa9 in cmd_execute lib/command.c:1221
#13 0x55ad42ee3cd6 in vty_command lib/vty.c:591
#14 0x55ad42ee4170 in vty_execute lib/vty.c:1354
#15 0x55ad42eec94f in vtysh_read lib/vty.c:2362
#16 0x55ad42ed7be4 in event_call lib/event.c:1995
#17 0x55ad42e1df75 in frr_run lib/libfrr.c:1213
#18 0x55ad42cf332e in main ospf6d/ospf6_main.c:250
#19 0x7f5798133c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
previously allocated by thread T0 here:
#0 0x7f57998d7d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
#1 0x55ad42e3ab22 in qcalloc lib/memory.c:105
#2 0x55ad42d5c8ff in ospf6_neighbor_create ospf6d/ospf6_neighbor.c:119
#3 0x55ad42d4c86a in ospf6_hello_recv ospf6d/ospf6_message.c:464
#4 0x55ad42d4c86a in ospf6_read_helper ospf6d/ospf6_message.c:1884
#5 0x55ad42d4c86a in ospf6_receive ospf6d/ospf6_message.c:1925
#6 0x55ad42ed7be4 in event_call lib/event.c:1995
#7 0x55ad42e1df75 in frr_run lib/libfrr.c:1213
#8 0x55ad42cf332e in main ospf6d/ospf6_main.c:250
#9 0x7f5798133c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
Add an actual event pointer and just track it appropriately.
Donald Sharp [Fri, 30 Jun 2023 19:21:43 +0000 (15:21 -0400)]
ospf6d: Stop crash in ospf6_write
I'm seeing crashes in ospf6_write on the `assert(node)`. The only
sequence of events that I see that could possibly cause this to happen
is this:
a) Someone has scheduled a outgoing write to the ospf6->t_write and
placed item(s) on the ospf6->oi_write_q
b) A decision is made in ospf6_send_lsupdate() to send an immediate
packet via a event_execute(..., ospf6_write,....).
c) ospf6_write is called and the oi_write_q is cleaned out.
d) the t_write event is now popped and the oi_write_q is empty
and FRR asserts on the `assert(node)` <crash>
When event_execute is called for ospf6_write, just cancel the t_write
event. If ospf6_write has more data to send at the end of the function
it will reschedule itself. I've only seen this crash one time and am
unable to reliably reproduce this at all. But this is the only mechanism
that I can see that could make this happen, given how little the oi_write_q
is actually touched in code.
anlan_cs [Wed, 28 Jun 2023 04:31:52 +0000 (12:31 +0800)]
pbrd: fix crash with match command
Crash with empty `ip-protocol`:
```
anlan(config-pbr-map)# match ip-protocol
vtysh: error reading from pbrd: Resource temporarily unavailable (11)Warning: closing connection to pbrd because of an I/O error!
```
Keelan10 [Mon, 26 Jun 2023 10:56:26 +0000 (14:56 +0400)]
pimd: Fix memory leak in PIM interface deletion
This commit ensures proper cleanup by deleting the gm_join_list when a PIM interface is deleted. The gm_join_list was previously not being freed, causing a memory leak.
The ASan leak log for reference:
```
***********************************************************************************
Address Sanitizer Error detected in multicast_mld_join_topo1.test_multicast_mld_local_join/r1.asan.pim6d.28070
Direct leak of 40 byte(s) in 1 object(s) allocated from:
#0 0x7f3605dbfd28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
#1 0x56230373dd6b in qcalloc lib/memory.c:105
#2 0x56230372180f in list_new lib/linklist.c:49
#3 0x56230361b589 in pim_if_gm_join_add pimd/pim_iface.c:1313
#4 0x562303642247 in lib_interface_gmp_address_family_static_group_create pimd/pim_nb_config.c:2868
#5 0x562303767280 in nb_callback_create lib/northbound.c:1235
#6 0x562303767280 in nb_callback_configuration lib/northbound.c:1579
#7 0x562303768a1d in nb_transaction_process lib/northbound.c:1710
#8 0x56230376904a in nb_candidate_commit_apply lib/northbound.c:1104
#9 0x5623037692ba in nb_candidate_commit lib/northbound.c:1137
#10 0x562303769dec in nb_cli_classic_commit lib/northbound_cli.c:49
#11 0x56230376fb79 in nb_cli_pending_commit_check lib/northbound_cli.c:88
#12 0x5623036c5bcb in cmd_execute_command_real lib/command.c:991
#13 0x5623036c5f1b in cmd_execute_command lib/command.c:1053
#14 0x5623036c6392 in cmd_execute lib/command.c:1221
#15 0x5623037e75da in vty_command lib/vty.c:591
#16 0x5623037e7a74 in vty_execute lib/vty.c:1354
#17 0x5623037f0253 in vtysh_read lib/vty.c:2362
#18 0x5623037db4e8 in event_call lib/event.c:1995
#19 0x562303720f97 in frr_run lib/libfrr.c:1213
#20 0x56230368615d in main pimd/pim6_main.c:184
#21 0x7f360461bc86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
Indirect leak of 192 byte(s) in 4 object(s) allocated from:
#0 0x7f3605dbfd28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
#1 0x56230373dd6b in qcalloc lib/memory.c:105
#2 0x56230361b91d in gm_join_new pimd/pim_iface.c:1288
#3 0x56230361b91d in pim_if_gm_join_add pimd/pim_iface.c:1326
#4 0x562303642247 in lib_interface_gmp_address_family_static_group_create pimd/pim_nb_config.c:2868
#5 0x562303767280 in nb_callback_create lib/northbound.c:1235
#6 0x562303767280 in nb_callback_configuration lib/northbound.c:1579
#7 0x562303768a1d in nb_transaction_process lib/northbound.c:1710
#8 0x56230376904a in nb_candidate_commit_apply lib/northbound.c:1104
#9 0x5623037692ba in nb_candidate_commit lib/northbound.c:1137
#10 0x562303769dec in nb_cli_classic_commit lib/northbound_cli.c:49
#11 0x56230376fb79 in nb_cli_pending_commit_check lib/northbound_cli.c:88
#12 0x5623036c5bcb in cmd_execute_command_real lib/command.c:991
#13 0x5623036c5f1b in cmd_execute_command lib/command.c:1053
#14 0x5623036c6392 in cmd_execute lib/command.c:1221
#15 0x5623037e75da in vty_command lib/vty.c:591
#16 0x5623037e7a74 in vty_execute lib/vty.c:1354
#17 0x5623037f0253 in vtysh_read lib/vty.c:2362
#18 0x5623037db4e8 in event_call lib/event.c:1995
#19 0x562303720f97 in frr_run lib/libfrr.c:1213
#20 0x56230368615d in main pimd/pim6_main.c:184
#21 0x7f360461bc86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
Indirect leak of 96 byte(s) in 4 object(s) allocated from:
#0 0x7f3605dbfd28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
#1 0x56230373dd6b in qcalloc lib/memory.c:105
#2 0x562303721651 in listnode_new lib/linklist.c:71
#3 0x56230372182b in listnode_add lib/linklist.c:92
#4 0x56230361ba9a in gm_join_new pimd/pim_iface.c:1295
#5 0x56230361ba9a in pim_if_gm_join_add pimd/pim_iface.c:1326
#6 0x562303642247 in lib_interface_gmp_address_family_static_group_create pimd/pim_nb_config.c:2868
#7 0x562303767280 in nb_callback_create lib/northbound.c:1235
#8 0x562303767280 in nb_callback_configuration lib/northbound.c:1579
#9 0x562303768a1d in nb_transaction_process lib/northbound.c:1710
#10 0x56230376904a in nb_candidate_commit_apply lib/northbound.c:1104
#11 0x5623037692ba in nb_candidate_commit lib/northbound.c:1137
#12 0x562303769dec in nb_cli_classic_commit lib/northbound_cli.c:49
#13 0x56230376fb79 in nb_cli_pending_commit_check lib/northbound_cli.c:88
#14 0x5623036c5bcb in cmd_execute_command_real lib/command.c:991
#15 0x5623036c5f1b in cmd_execute_command lib/command.c:1053
#16 0x5623036c6392 in cmd_execute lib/command.c:1221
#17 0x5623037e75da in vty_command lib/vty.c:591
#18 0x5623037e7a74 in vty_execute lib/vty.c:1354
#19 0x5623037f0253 in vtysh_read lib/vty.c:2362
#20 0x5623037db4e8 in event_call lib/event.c:1995
#21 0x562303720f97 in frr_run lib/libfrr.c:1213
#22 0x56230368615d in main pimd/pim6_main.c:184
#23 0x7f360461bc86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
Indirect leak of 48 byte(s) in 1 object(s) allocated from:
#0 0x7f3605dbfd28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
#1 0x56230373dd6b in qcalloc lib/memory.c:105
#2 0x56230361b91d in gm_join_new pimd/pim_iface.c:1288
#3 0x56230361b91d in pim_if_gm_join_add pimd/pim_iface.c:1326
#4 0x562303642247 in lib_interface_gmp_address_family_static_group_create pimd/pim_nb_config.c:2868
#5 0x562303767280 in nb_callback_create lib/northbound.c:1235
#6 0x562303767280 in nb_callback_configuration lib/northbound.c:1579
#7 0x562303768a1d in nb_transaction_process lib/northbound.c:1710
#8 0x56230376904a in nb_candidate_commit_apply lib/northbound.c:1104
#9 0x5623037692ba in nb_candidate_commit lib/northbound.c:1137
#10 0x562303769dec in nb_cli_classic_commit lib/northbound_cli.c:49
#11 0x56230376fb79 in nb_cli_pending_commit_check lib/northbound_cli.c:88
#12 0x5623036c5bcb in cmd_execute_command_real lib/command.c:991
#13 0x5623036c5f6f in cmd_execute_command lib/command.c:1072
#14 0x5623036c6392 in cmd_execute lib/command.c:1221
#15 0x5623037e75da in vty_command lib/vty.c:591
#16 0x5623037e7a74 in vty_execute lib/vty.c:1354
#17 0x5623037f0253 in vtysh_read lib/vty.c:2362
#18 0x5623037db4e8 in event_call lib/event.c:1995
#19 0x562303720f97 in frr_run lib/libfrr.c:1213
#20 0x56230368615d in main pimd/pim6_main.c:184
#21 0x7f360461bc86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
Indirect leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x7f3605dbfd28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
#1 0x56230373dd6b in qcalloc lib/memory.c:105
#2 0x562303721651 in listnode_new lib/linklist.c:71
#3 0x56230372182b in listnode_add lib/linklist.c:92
#4 0x56230361ba9a in gm_join_new pimd/pim_iface.c:1295
#5 0x56230361ba9a in pim_if_gm_join_add pimd/pim_iface.c:1326
#6 0x562303642247 in lib_interface_gmp_address_family_static_group_create pimd/pim_nb_config.c:2868
#7 0x562303767280 in nb_callback_create lib/northbound.c:1235
#8 0x562303767280 in nb_callback_configuration lib/northbound.c:1579
#9 0x562303768a1d in nb_transaction_process lib/northbound.c:1710
#10 0x56230376904a in nb_candidate_commit_apply lib/northbound.c:1104
#11 0x5623037692ba in nb_candidate_commit lib/northbound.c:1137
#12 0x562303769dec in nb_cli_classic_commit lib/northbound_cli.c:49
#13 0x56230376fb79 in nb_cli_pending_commit_check lib/northbound_cli.c:88
#14 0x5623036c5bcb in cmd_execute_command_real lib/command.c:991
#15 0x5623036c5f6f in cmd_execute_command lib/command.c:1072
#16 0x5623036c6392 in cmd_execute lib/command.c:1221
#17 0x5623037e75da in vty_command lib/vty.c:591
#18 0x5623037e7a74 in vty_execute lib/vty.c:1354
#19 0x5623037f0253 in vtysh_read lib/vty.c:2362
#20 0x5623037db4e8 in event_call lib/event.c:1995
#21 0x562303720f97 in frr_run lib/libfrr.c:1213
#22 0x56230368615d in main pimd/pim6_main.c:184
#23 0x7f360461bc86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
SUMMARY: AddressSanitizer: 400 byte(s) leaked in 11 allocation(s).
***********************************************************************************
```
Chirag Shah [Mon, 26 Jun 2023 22:29:59 +0000 (15:29 -0700)]
zebra: fix evpn rmac nh list cmp function
EVPN RMAC (Router MAC) nexthop list compare
function needs to return all values so
the list element can be compared and added/deleted
properly.
Ticket:#3486989
Testing Done:
Originate EVPN Type-5 route with PIP IP and MAC as remote
nexthops.
Change the PIP IP address which triggers nexthop change.
Before fix:
When PIP IP changes RMAC is deleted from remote VTEPs.
TORS1# show evpn next-hops vni 4001 | include 00:02:00:00:00:2d
27.0.0.11 00:02:00:00:00:2d
TORS1# show evpn rmac vni 4001 | include 00:02:00:00:00:2d
00:02:00:00:00:2d 27.0.0.11
----- Remote VTEP change nexthop IP to 172.16.16.16 -----
TORS1# show evpn next-hops vni 4001 | include 00:02:00:00:00:2d
172.16.16.16 00:02:00:00:00:2d
TORS1# show evpn rmac vni 4001 | include 00:02:00:00:00:2d
TORS1#
After fix:
RMAC is retained as its nexthop list is not empty,
thus it is not deleted from remote VTEPs.
TORS1# show evpn rmac vni 4001 | include 00:02:00:00:00:2d
00:02:00:00:00:2d 172.16.16.16
Christian Hopps [Sun, 18 Jun 2023 20:19:54 +0000 (16:19 -0400)]
mgmtd: KISS the locking code
Move away from things like "lock if not locked" type code, require the
user has locked prior to geting to that point.
For now we warn if we are taking a lock we already had; however, this
should really be a failure point.
New requirements:
SETCFG -
not implicit commit - requires user has locked candidate DS and they
must unlock after
implicit commit - requires user has locked candidate and running DS
both locks will be unlocked on reply to the SETCFG
COMMITCFG -
requires user has locked candidate and running DS and they must unlock
after
rollback - this code now get both locks and then does an unlock and
early return thing on the adapter side. It needs to be un-special
cased in follow up work that would also include tests for this
functionality.
Christian Hopps [Fri, 9 Jun 2023 20:54:54 +0000 (16:54 -0400)]
lib: mgmtd: improvements in logging and commentary
- log names of datastores not numbers
- improve logging for mgmt_msg_read
- Rather than use a bool, instead store the pending const string name of
the command being run that has postponed the CLI. This adds some nice
information to the logging when enabled.