Donald Sharp [Sat, 2 Mar 2024 14:50:38 +0000 (09:50 -0500)]
bgpd: Ensure community data is freed in some cases.
Customer has this valgrind trace:
Direct leak of 2829120 byte(s) in 70728 object(s) allocated from:
0 in community_new ../bgpd/bgp_community.c:39
1 in community_uniq_sort ../bgpd/bgp_community.c:170
2 in route_set_community ../bgpd/bgp_routemap.c:2342
3 in route_map_apply_ext ../lib/routemap.c:2673
4 in subgroup_announce_check ../bgpd/bgp_route.c:2367
5 in subgroup_process_announce_selected ../bgpd/bgp_route.c:2914
6 in group_announce_route_walkcb ../bgpd/bgp_updgrp_adv.c:199
7 in hash_walk ../lib/hash.c:285
8 in update_group_af_walk ../bgpd/bgp_updgrp.c:2061
9 in group_announce_route ../bgpd/bgp_updgrp_adv.c:1059
10 in bgp_process_main_one ../bgpd/bgp_route.c:3221
11 in bgp_process_wq ../bgpd/bgp_route.c:3221
12 in work_queue_run ../lib/workqueue.c:282
The above leak detected by valgrind was from a screenshot so I copied it
by hand. Any mistakes in line numbers are purely from my transcription.
Additionally this is against a slightly modified 8.5.1 version of FRR.
Code inspection of 8.5.1 -vs- latest master shows the same problem
exists. Code should be able to be followed from there to here.
What is happening:
There is a route-map being applied that modifes the outgoing community
to a peer. This is saved in the attr copy created in
subgroup_process_announce_selected. This community pointer is not
interned. So the community->refcount is still 0. Normally when
a prefix is announced, the attr and the prefix are placed on a
adjency out structure where the attribute is interned. This will
cause the community to be saved in the community hash list as well.
In a non-normal operation when the decision to send is aborted after
the route-map application, the attribute is just dropped and the
pointer to the community is just dropped too, leading to situations
where the memory is leaked. The usage of bgp suppress-fib would
would be a case where the community is caused to be leaked.
Additionally the previous commit where an unsuppress-map is used
to modify the outgoing attribute but since unsuppress-map was
not considered part of outgoing policy the attribute would be dropped as
well. This pointer drop also extends to any dynamically allocated
memory saved by the attribute pointer that was not interned yet as well.
So let's modify the return case where the decision is made to
not send the prefix to the peer to always just flush the attribute
to ensure memory is not leaked.
Fixes: #15459 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Donald Sharp [Wed, 13 Mar 2024 14:26:58 +0000 (10:26 -0400)]
bgpd: Ensure that the correct aspath is free'd
Currently in subgroup_default_originate the attr.aspath
is set in bgp_attr_default_set, which hashs the aspath
and creates a refcount for it. If this is a withdraw
the subgroup_announce_check and bgp_adj_out_set_subgroup
is called which will intern the attribute. This will
cause the the attr.aspath to be set to a new value
finally at the bottom of the function it intentionally
uninterns the aspath which is not the one that was
created for this function. This reduces the other
aspath's refcount by 1 and if a clear bgp * is issued
fast enough the aspath for that will be removed
and the system will crash.
Donald Sharp [Tue, 12 Mar 2024 17:12:48 +0000 (13:12 -0400)]
eigrpd, mgmtd, ospf6d: frr_fini is last
I noticed that ospf6d always had a linked list memory leak.
Tracking it down shows that frr_fini() shuts down the memory
system and prints out memory not cleaned up. eigrpd, mgmtd
and ospf6d all called cleanup functions after frr_fini leaving
memory leaked that was not really leaked.
Igor Ryzhov [Sun, 10 Mar 2024 15:35:21 +0000 (17:35 +0200)]
lib: fix initialization of northbound nodes
When actions and notification are defined as descendants of other nodes,
they are not getting initialized, because the iterator skips them. Fix
the iterator to include them when traversing the schema.
This adds specific width length modifiers in the form of wN and wfN
(where N is 8, 16, 32, or 64) which allow printing intN_t and
int_fastN_t without resorting to casts or PRI macros.
FRR changes only include printf(), scanf/strtol are not locally
implemented in FRR. Also added "(void) 0" to empty "else ..." to
avoid a compiler warning.
Donatas Abraitis [Thu, 29 Feb 2024 12:37:40 +0000 (14:37 +0200)]
docker: Do not use pip Python package manager
Alpine Linux gets this with 3.19:
This is already installed with `pytest` via apk package manager.
```
15 78.20 error: externally-managed-environment
15 78.20
15 78.20 × This environment is externally managed
15 78.20 ╰─>
15 78.20 The system-wide python installation should be maintained using the system
15 78.20 package manager (apk) only.
15 78.20
15 78.20 If the package in question is not packaged already (and hence installable via
15 78.20 "apk add py3-somepackage"), please consider installing it inside a virtual
15 78.20 environment, e.g.:
15 78.20
15 78.20 python3 -m venv /path/to/venv
15 78.20 . /path/to/venv/bin/activate
15 78.20 pip install mypackage
15 78.20
15 78.20 To exit the virtual environment, run:
15 78.20
15 78.20 deactivate
15 78.20
15 78.20 The virtual environment is not deleted, and can be re-entered by re-sourcing
15 78.20 the activate file.
15 78.20
15 78.20 To automatically manage virtual environments, consider using pipx (from the
15 78.20 pipx package).
15 78.20
15 78.20 note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
```
Donatas Abraitis [Thu, 29 Feb 2024 12:21:27 +0000 (14:21 +0200)]
vtysh: Include fnctl.h for vtysh_main
Fixing compilation for Alpine Linux:
```
25 91.59 vtysh/vtysh_main.c: In function 'vtysh_flock_config':
25 91.59 vtysh/vtysh_main.c:276:20: warning: implicit declaration of function 'open'; did you mean 'popen'? [-Wimplicit-function-declaration]
25 91.59 276 | flock_fd = open(flock_file, O_RDONLY, 0644);
25 91.59 | ^~~~
25 91.59 | popen
25 91.60 vtysh/vtysh_main.c:276:37: error: 'O_RDONLY' undeclared (first use in this function)
25 91.60 276 | flock_fd = open(flock_file, O_RDONLY, 0644);
25 91.60 | ^~~~~~~~
25 91.60 vtysh/vtysh_main.c:276:37: note: each undeclared identifier is reported only once for each function it appears in
25 91.60 CC zebra/if_netlink.o
25 91.61 vtysh/vtysh_main.c: In function 'main':
25 91.61 vtysh/vtysh_main.c:637:49: error: 'O_CREAT' undeclared (first use in this function)
25 91.61 637 | fp = open(history_file, O_CREAT | O_EXCL,
25 91.61 | ^~~~~~~
25 91.62 vtysh/vtysh_main.c:637:59: error: 'O_EXCL' undeclared (first use in this function)
25 91.62 637 | fp = open(history_file, O_CREAT | O_EXCL,
25 91.62 | ^~~~~~
```
Donald Sharp [Fri, 8 Mar 2024 18:04:34 +0000 (18:04 +0000)]
pimd: Cleanup inclusion of headers
FRR needs to properly include the FreeBSD headers for
compilation on FreeBSD. I have setup v6 as well
but I have not even tested it. Since I know
that the form is the same I think this is ok
at the moment. This is a step forward.
Because of this change *clearly* no-one is even
using pim on FreeBSD. <look at the MRT_XXX values
to prove to yourself>. In any event this is a step
in the direction of getting that working again.
Signed-off-by: Donald Sharp <sharpd@freebsd.network>
Rajasekar Raja [Mon, 12 Feb 2024 18:44:18 +0000 (10:44 -0800)]
zebra: backpressure - Zebra push back on Buffer/Stream creation
Currently, the way zebra works is it creates pthread per client (BGP is
of interest in this case) and this thread loops itself in zserv_read()
to check for any incoming data. If there is one, then it reads,
validates and adds it in the ibuf_fifo signalling the main thread to
process the message. The main thread when it gets a change, processes
the message, and invokes the function pointer registered in the header
command. (Ex: zserv_handlers).
Finally, if all of this was successful, this task reschedules itself and
loops in zserv_read() again
However, if there are already items on ibuf FIFO, that means zebra is
slow in processing. And with the current mechanism if Zebra main is
busy, the ibuf FIFO keeps growing holding up the memory.
Fix:
Client IO Thread: (zserv_read)
- Stop doing the read events when we know there are X number of items
on the FIFO already.(X - zebra zapi-packets <1-10000> (Default-1000)
- Determine the number of items on the zserv->ibuf_fifo. Subtract this
from the work items and only pull the number of items off that would
take us to X items on the ibuf_fifo again.
- If the number of items in the ibuf_fifo has reached to the maximum
* Either initially when zserv_read() is called (or)
* when processing the remainders of the incoming buffer
the client IO thread is woken by the the zebra main.
Main thread: (zserv_process_message)
If the client ibuf always schedules a wakeup to the client IO to read
more items from the socked buffer. This way we ensure
- Client IO thread always tries to read the socket buffer and add more
items to the ibuf_fifo (until max limit)
- hidden config change (zebra zapi-packets <>) is taken into account
Donald Sharp [Tue, 5 Mar 2024 20:55:07 +0000 (15:55 -0500)]
tests: test invokes a script which does not exist
Apparently test_bgp_peer_type_multipath_relax.py does
no really need to run a `setup_vrfs` script. Looking
at the other configuration for this test there are
no vrf's in the frr configuration. So let's remove it
Igor Ryzhov [Fri, 1 Mar 2024 11:28:25 +0000 (13:28 +0200)]
vty: change output of errors from mgmtd
Make errors look the same way as in regular non-mgmtd vty. We don't need
to show information about some internal request names.
Before:
```
ERROR: SET_CONFIG request failed, Error: YANG error(s):
Path: Data location "/frr-affinity-map:lib/affinity-maps/affinity-map[name='a']".
Error: Unique data leaf(s) "value" not satisfied in "/frr-affinity-map:lib/affinity-maps/affinity-map[name='b']" and "/frr-affinity-map:lib/affinity-maps/affinity-map[name='a']".
```
After:
```
% Configuration failed.
YANG error(s):
Path: Data location "/frr-affinity-map:lib/affinity-maps/affinity-map[name='b']".
Error: Unique data leaf(s) "value" not satisfied in "/frr-affinity-map:lib/affinity-maps/affinity-map[name='a']" and "/frr-affinity-map:lib/affinity-maps/affinity-map[name='b']".
```
Donald Sharp [Tue, 5 Mar 2024 15:37:48 +0000 (10:37 -0500)]
bgpd: pi->attr is deref'ed in all paths leading up to test
In make_prefix, the code checks to see if the pi->attr
is non-NULL. Since (A) we cannot have a path_info without
an attribute and (B) all paths leading up to the test
in make_prefix already have pi->attr deref'ed and the
code is not crashing we know this is safe to remove.
Igor Ryzhov [Mon, 4 Mar 2024 18:41:41 +0000 (20:41 +0200)]
lib: fix infinite loop in __darr_in_vsprintf
`darr_avail` returns the available capacity excluding the already
existing terminating NULL byte. Take this into account when using
`darr_avail`. Otherwise, if the error length is a power of 2, the
capacity is never enough and the function stucks in an infinite loop.
Donald Sharp [Mon, 4 Mar 2024 23:07:42 +0000 (18:07 -0500)]
zebra: fnc->obuf could be accessed without a lock
Found by coverity. Let's just lock the writeable
amount to see if it is possible. It's ok because
we want to know if we have room *now*. If another
pthread runs it will only remove data from fnc->obuf
and make more room. So this is ok.
Donald Sharp [Wed, 21 Feb 2024 17:50:13 +0000 (12:50 -0500)]
debian: Add a frr-test-tools debian package
This package will hold test tools that are built and useful for
developers of FRR but not useful for everyday usage of FRR. This
is separted out because these are useful enough to have in their
own package.
Donald Sharp [Wed, 14 Feb 2024 17:45:16 +0000 (12:45 -0500)]
tests: Add a very simple test for the dplane_fpm_nl module
Ensure that the fpm module connects to the specified listener
and then ensure that 10k routes from sharpd are installed
into the system and then are removed.
Donald Sharp [Thu, 15 Feb 2024 15:19:36 +0000 (10:19 -0500)]
doc, tools: Remove ARRAY_SIZE check
checkpatch.pl wants you to use ARRAY_SIZE in a kernel
header file. We don't have access to this kernel header
file for normal compilation. I'm just going to remove it.
Igor Ryzhov [Mon, 5 Feb 2024 17:04:39 +0000 (19:04 +0200)]
lib: fix __darr_in_vsprintf
If the initial darr capacity is not enough for the output, the `ap` is
reused multiple times, which is wrong, because it may be altered by
`vsnprintf`. Make a copy of `ap` each time instead of reusing.