Donald Sharp [Mon, 19 Dec 2022 15:51:58 +0000 (10:51 -0500)]
ospfd: Fix some json mem leaks and some issues
a) if show_function happened to be NULL we would leak json memory
b) json_lsa_type was being allocated but only used in the default case, leaking memory
c) json output would sometimes produce text output and that is incorrect
Manoj Naragund [Mon, 19 Dec 2022 12:07:22 +0000 (04:07 -0800)]
ospf6d: Fixing memory leak in ospf6_lsa_create_headeronly for both master and slave.
Problem Statement:
=================
Memory leak backtraces
2022-11-23 01:51:10,525 - ERROR: ==842== 1,100 (1,000 direct, 100 indirect) bytes in 5 blocks are definitely lost in loss record 29 of 31
2022-11-23 01:51:10,525 - ERROR: ==842== at 0x4C31FAC: calloc (vg_replace_malloc.c:762)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x4E8A1BF: qcalloc (memory.c:111)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13555A: ospf6_lsa_alloc (ospf6_lsa.c:723)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x1355F3: ospf6_lsa_create_headeronly (ospf6_lsa.c:756)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x135702: ospf6_lsa_copy (ospf6_lsa.c:790)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13B64B: ospf6_dbdesc_recv_slave (ospf6_message.c:976)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13B64B: ospf6_dbdesc_recv (ospf6_message.c:1038)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13B64B: ospf6_read_helper (ospf6_message.c:1838)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13B64B: ospf6_receive (ospf6_message.c:1875)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x4EB741B: thread_call (thread.c:1692)
2022-11-23 01:51:10,526 - ERROR: ==842== by 0x4E85B17: frr_run (libfrr.c:1068)
2022-11-23 01:51:10,526 - ERROR: ==842== by 0x119585: main (ospf6_main.c:228)
2022-11-23 01:51:10,526 - ERROR: ==842==
2022-11-23 01:51:10,524 - ERROR: Found memory leak in module ospf6d
2022-11-23 01:51:10,525 - ERROR: ==842== 220 (200 direct, 20 indirect) bytes in 1 blocks are definitely lost in loss record 21 of 31
2022-11-23 01:51:10,525 - ERROR: ==842== at 0x4C31FAC: calloc (vg_replace_malloc.c:762)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x4E8A1BF: qcalloc (memory.c:111)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13555A: ospf6_lsa_alloc (ospf6_lsa.c:723)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x1355F3: ospf6_lsa_create_headeronly (ospf6_lsa.c:756)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x135702: ospf6_lsa_copy (ospf6_lsa.c:790)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13BBCE: ospf6_dbdesc_recv_master (ospf6_message.c:760)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13BBCE: ospf6_dbdesc_recv (ospf6_message.c:1036)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13BBCE: ospf6_read_helper (ospf6_message.c:1838)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13BBCE: ospf6_receive (ospf6_message.c:1875)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x4EB741B: thread_call (thread.c:1692)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x4E85B17: frr_run (libfrr.c:1068)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x119585: main (ospf6_main.c:228)
2022-11-23 01:51:10,525 - ERROR: ==842==
RCA:
====
These memory leaks are beacuse of last lsa in neighbour's request_list is not
getting freed beacuse of lsa lock. The last request has an addtional lock which
is added as a part of ospf6_make_lsreq, this lock needs to be removed
in order for the lsa to get freed.
Fix:
====
Check and remove the lock on the last request in all the functions.
Donald Sharp [Sat, 17 Dec 2022 19:09:29 +0000 (14:09 -0500)]
zebra: Ensure memory is not freed that dplane depends on in shutdown
Zebra has a shutdown setup where it asks the dplane to shutdown but can
still be processing data. This is especially true if something the dplane
is listening on receives data that will be processed by the main dplane thread
from netlink. When zebra_finalize is called it is possible that a bit
of data comes in before the zebra_dplane_shutdown() function is called
and the memory freed in ns_walk_func() causes the main dplane event
to crash when it cannot find the ns data anymore.
Reverse the order, stop the zebra dplane pthread and then free the
memory associated with the namespaces.
anlan_cs [Sat, 17 Dec 2022 08:25:56 +0000 (16:25 +0800)]
zebra: fix wrong gateway for fpm debug
The wrong parameter is passed in `inet_ntop()` of `zfpm_log_route_info()` in
old fpm module, so the display of gateway is always wrong. Just remove
that extra ampersand.
Additionally, use "none" as gateway value for the case of no gateway.
ospfd: json support for show ip ospf border-routers
show ip ospf border-routers json support added.
commands:
- show ip ospf vrf default border-routers json
- show ip ospf vrf all border-routers json
- show ip ospf border-routers json
Changes:
JSON support added for below commands,
- show ip nht route-map vrf all json
- show ip nht route-map vrf <name> json
- show ipv6 nht route-map vrf all json
- show ipv6 nht route-map vrf <name> json
- show ipv6 nht route-map json
- show ip nht route-map json
Donald Sharp [Fri, 16 Dec 2022 13:43:16 +0000 (08:43 -0500)]
bgpd: Convert bgp_packet_mpattr_prefix to use an enum for safi's
The function bgp_packet_mpattr_prefix was using an if statement
to encode packets to the peer. Change it to a switch and make
it handle all the cases and fail appropriately when something
has gone wrong. Hopefully in the future when a new afi/safi
is added we can catch it by compilation breaking instead of
weird runtime errors
Donald Sharp [Fri, 16 Dec 2022 13:17:18 +0000 (08:17 -0500)]
bgpd: make bgp_packet_mpattr_start more prescriptive when using enum's
This function was just using default: case statements for
the encoding of nlri's to a peer. Lay out all the different
cases and make things fail hard when a dev escape is found.
Donald Sharp [Fri, 16 Dec 2022 12:58:54 +0000 (07:58 -0500)]
bgpd: Convert bgp_packet_mpattr_prefix_size to use an enum
The function bgp_packet_mpattr_prefix_size had an if/else
body that allowed people to add encoding types to bgpd
such that we could build the wrong size packets. This
was exposed recently in commit: 0a9705a1e07c1d8176fd21f8f1bde2a9a155331b
Where it was discovered flowspec was causing bgp update
messages to exceed the maximum size and the peer to
drop the connection. Let's be proscriptive about this
and hopefully make it so that things don't work when
someone adds a new safi to the system ( and they'll have
to update this function ).
Donald Sharp [Fri, 16 Dec 2022 12:38:58 +0000 (07:38 -0500)]
lib, staticd: return values even after an assert
When compiling with -fsanitize=thread. I started getting this error:
staticd/static_zebra.c: In function ‘static_zebra_nht_get_prefix’:
staticd/static_zebra.c:316:1: error: control reaches end of non-void function [-Werror=return-type]
316 | }
| ^
Just to make future efforts still work, let's just make the compiler happy.
Trey Aspelund [Thu, 15 Dec 2022 18:49:43 +0000 (18:49 +0000)]
bgpd: cleanup macip_path_list for remote macip
ASAN reported the following memleak:
```
Direct leak of 40 byte(s) in 1 object(s) allocated from:
#0 0x4d4342 in calloc (/usr/lib/frr/bgpd+0x4d4342)
#1 0xbc3d68 in qcalloc /home/sharpd/frr8/lib/memory.c:116:27
#2 0xb869f7 in list_new /home/sharpd/frr8/lib/linklist.c:64:9
#3 0x5a38bc in bgp_evpn_remote_ip_hash_alloc /home/sharpd/frr8/bgpd/bgp_evpn.c:6789:24
#4 0xb358d3 in hash_get /home/sharpd/frr8/lib/hash.c:162:13
#5 0x593d39 in bgp_evpn_remote_ip_hash_add /home/sharpd/frr8/bgpd/bgp_evpn.c:6881:7
#6 0x59dbbd in install_evpn_route_entry_in_vni_common /home/sharpd/frr8/bgpd/bgp_evpn.c:3049:2
#7 0x59cfe0 in install_evpn_route_entry_in_vni_ip /home/sharpd/frr8/bgpd/bgp_evpn.c:3126:8
#8 0x59c6f0 in install_evpn_route_entry /home/sharpd/frr8/bgpd/bgp_evpn.c:3318:8
#9 0x59bb52 in install_uninstall_route_in_vnis /home/sharpd/frr8/bgpd/bgp_evpn.c:3888:10
#10 0x59b6d2 in bgp_evpn_install_uninstall_table /home/sharpd/frr8/bgpd/bgp_evpn.c:4019:5
#11 0x578857 in install_uninstall_evpn_route /home/sharpd/frr8/bgpd/bgp_evpn.c:4051:9
#12 0x58ada6 in bgp_evpn_import_route /home/sharpd/frr8/bgpd/bgp_evpn.c:6049:9
#13 0x713794 in bgp_update /home/sharpd/frr8/bgpd/bgp_route.c:4842:3
#14 0x583fa0 in process_type2_route /home/sharpd/frr8/bgpd/bgp_evpn.c:4518:9
#15 0x5824ba in bgp_nlri_parse_evpn /home/sharpd/frr8/bgpd/bgp_evpn.c:5732:8
#16 0x6ae6a2 in bgp_nlri_parse /home/sharpd/frr8/bgpd/bgp_packet.c:363:10
#17 0x6be6fa in bgp_update_receive /home/sharpd/frr8/bgpd/bgp_packet.c:2020:15
#18 0x6b7433 in bgp_process_packet /home/sharpd/frr8/bgpd/bgp_packet.c:2929:11
#19 0xd00146 in thread_call /home/sharpd/frr8/lib/thread.c:2006:2
```
The list itself was not being cleaned up when the final list entry was
removed, so make sure we do that instead of leaking memory.
Donald Sharp [Wed, 14 Dec 2022 20:53:06 +0000 (15:53 -0500)]
lib: Fix free function
The list delete function on creation was set to srv6_locator_chunk_free
Which takes a double pointer and dereferences it to free the data.
When list_delete is called it calls the delete function like this:
if (*list->del)
(*list->del)(node->data);
The data is not passed in by reference and as such we do not have
a double pointer. Fortunately this list_delete is only really
called on shutdown when the locator was deleted and we do not
have a fun situation where we were suddenly freeing 'something'.
Donald Sharp [Wed, 14 Dec 2022 19:20:26 +0000 (14:20 -0500)]
zebra: When freeing the early route queue, actually free it right
The early route queue has a series of `struct zebra_early_route *`
entries. Zebra is treating this memory as just a `struct route entry`.
This is wrong. Correct this to free the memory correctly.
Donald Sharp [Wed, 14 Dec 2022 19:05:11 +0000 (14:05 -0500)]
lib, tests, zebra: Remove unused workqueue error function
The wq->spec.errorfunc is never used in the code.
It's been in the code base since 2005 and I also
do not remember ever seeing it being called. No
workqueue process function ever returns error.
Since it's not used let's just remove it from the
code base.
Donald Sharp [Wed, 14 Dec 2022 17:53:53 +0000 (12:53 -0500)]
tests: Limit run of config_timing when building with --enable-address-sanitizer
Building FRR with --enable-address-sanitizer and then running the
config_timing test makes the test run for over an hour on my machine.
The goal of this test is to ensure that the test runs 10000 routes
in/out in a reasonable amount of time. We cannot test this with
address-sanitizer enabled. So just make the test meaningless
from a timing perspective but keep it `alive` from a it might
catch some address sanitizer issue with 50 -vs- 10000 routes
Direct leak of 128 byte(s) in 4 object(s) allocated from:
#0 0x4bd732 in calloc (/usr/lib/frr/zebra+0x4bd732)
#1 0x7feaeab8f798 in qcalloc /home/sharpd/frr8/lib/memory.c:116:27
#2 0x7feaeaba40f4 in nexthop_group_new /home/sharpd/frr8/lib/nexthop_group.c:270:9
#3 0x56859b in netlink_route_change_read_unicast /home/sharpd/frr8/zebra/rt_netlink.c:950:9
#4 0x5651c2 in netlink_route_change /home/sharpd/frr8/zebra/rt_netlink.c:1204:2
#5 0x54af15 in netlink_information_fetch /home/sharpd/frr8/zebra/kernel_netlink.c:407:10
#6 0x53e7a3 in netlink_parse_info /home/sharpd/frr8/zebra/kernel_netlink.c:1184:12
#7 0x548d46 in kernel_read /home/sharpd/frr8/zebra/kernel_netlink.c:501:2
#8 0x7feaeacc87f6 in thread_call /home/sharpd/frr8/lib/thread.c:2006:2
#9 0x7feaeab36503 in frr_run /home/sharpd/frr8/lib/libfrr.c:1198:3
#10 0x550d38 in main /home/sharpd/frr8/zebra/main.c:476:2
#11 0x7feaea492d09 in __libc_start_main csu/../csu/libc-start.c:308:16
Indirect leak of 576 byte(s) in 4 object(s) allocated from:
#0 0x4bd732 in calloc (/usr/lib/frr/zebra+0x4bd732)
#1 0x7feaeab8f798 in qcalloc /home/sharpd/frr8/lib/memory.c:116:27
#2 0x7feaeab9b3f8 in nexthop_new /home/sharpd/frr8/lib/nexthop.c:373:7
#3 0x56875e in netlink_route_change_read_unicast /home/sharpd/frr8/zebra/rt_netlink.c:960:15
#4 0x5651c2 in netlink_route_change /home/sharpd/frr8/zebra/rt_netlink.c:1204:2
#5 0x54af15 in netlink_information_fetch /home/sharpd/frr8/zebra/kernel_netlink.c:407:10
#6 0x53e7a3 in netlink_parse_info /home/sharpd/frr8/zebra/kernel_netlink.c:1184:12
#7 0x548d46 in kernel_read /home/sharpd/frr8/zebra/kernel_netlink.c:501:2
#8 0x7feaeacc87f6 in thread_call /home/sharpd/frr8/lib/thread.c:2006:2
#9 0x7feaeab36503 in frr_run /home/sharpd/frr8/lib/libfrr.c:1198:3
#10 0x550d38 in main /home/sharpd/frr8/zebra/main.c:476:2
#11 0x7feaea492d09 in __libc_start_main csu/../csu/libc-start.c:308:16
SUMMARY: AddressSanitizer: 704 byte(s) leaked in 8 allocation(s).
Louis Scalbert [Mon, 14 Feb 2022 13:18:10 +0000 (14:18 +0100)]
bgpd: fix the IGP metric for best path selection on VPN import
Since the commit da0c0ef70c ("bgpd: VRF-Lite fix best path selection"),
the best path selection is made from the comparison of the attributes
of the original route i.e. the ultimate path.
The IGP metric is currently set on the child path instead of the
ultimate path (i.e. the parent path). On eBGP, the ultimate path is the
child path. However, for imported routes, the ultimate path is always
set to 0, which results in skipping the IGP metric comparison when
selecting the best path.
Set the IGP metric on the ultimate path when a BGP nexthop is added or
updated.
Fixes: da0c0ef70c ("bgpd: VRF-Lite fix best path selection") Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
bgpd: Add support for flowspec prefixes in bgp_packet_mpattr_prefix_size
Currently, bgp_packet_mpattr_prefix_size (bgpd/bgp_attr.c:3978) always returns zero for Flowspec prefixes.
This is because, for flowspec prefixes, the prefixlen attribute of the prefix struct is always set to 0, and the actual length is bytes is set inside the flowspec_prefix struct instead (see lib/prefix.h:293 and lib/prefix.h:178).
Because of this, with a large number of flowspec NLRIs, bgpd ends up building update messages that exceed the maximum size and cause the peer to drop the connection (bgpd/bgp_updgrp_packet.c:L719).
The proposed change allows the bgp_packet_mpattr_prefix_size to return correct result for flowspec prefixes.
Donald Sharp [Thu, 3 Nov 2022 23:01:36 +0000 (19:01 -0400)]
bgpd: use the enum instead of an int
The bgp_fsm_change_status function takes an int
for the new bgp state, which is an `enum bgp_fsm_status status`
let's convert over to being explicit.bgpd: use the enum instead of an int
Kuldeep Kashyap [Fri, 9 Dec 2022 15:21:29 +0000 (07:21 -0800)]
tests: Topotests daemon start as per feature test
Currently topotests starts all daemons by default,
made changes to f/w so only needed daemons can
be started, daemons which are needed to tests
particular test suite.
David Lamparter [Tue, 13 Dec 2022 17:34:41 +0000 (18:34 +0100)]
accords: CLI coloring
This is pretty much a writeup of the discussion we had on the FRR weekly
call about an hour ago. I added a bunch of detail but hopefully it
still represents the overall consensus.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Rafael Zalamena [Mon, 12 Dec 2022 18:11:27 +0000 (15:11 -0300)]
ospfd: fix memory leak on SPF calculation
Fix the following problems:
- Always free vertex next hops on `vertex_parent_free`
- Signalize failure on `ospf_spf_add_parent` when parent already exists
so the caller has the chance to `free()` any allocated resources.
- Don't reuse vertex next hops without the reference count logic in
`ospf_nexthop_calculation`. Instead allocate a new copy so it can be
`free()`d later without complications
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Donald Sharp [Tue, 4 Oct 2022 19:41:36 +0000 (15:41 -0400)]
zebra: Add ctx to netlink message parsing
Add the initial step of passing in a dplane context
to reading route netlink messages. This code
will be run in two contexts:
a) The normal pthread for reading netlink messages from
the kernel
b) The dplane_fpm_nl pthread.
The goal of this commit is too just allow a) to work
b) will be filled in in the future. Effectively
everything should still be working as it should
pre this change. We will just possibly allow
the passing of the context around( but not used )
Donald Sharp [Mon, 3 Oct 2022 17:22:22 +0000 (13:22 -0400)]
zebra: Rearrange dplane_ctx_route_init
In order for a future commit to abstract the dplane_ctx_route_init
so that the kernel can use it, let's move some stuff around
and add a dplane_ctx_route_init_basic that can be used by multiple
different paths
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
create a dplane_ctx_route_init_basic so it can be used
Volta submitted notification changes for the dplane that had a
special use case for their system. Volta is no more, the code
is not being actively developed and from talking with ex-Volta
employees there is no current plans to even maintain this code.
Wrap the special handling of nexthops that their asic-dataplane
did in a bit of code to isolate it and allow for future removal,
as that I do not actually believe anyone else is using this code.
Add a CPP_NOTICE several years into the future that will tell us
to remove the code. If someone starts using it then they will
have to notice this variable to set it and hopefully they will
see my CPP_NOTICE to come talk to us. If this is being used then
we can just remove this wrapper.