Donald Sharp [Thu, 28 Oct 2021 14:42:23 +0000 (10:42 -0400)]
zebra: return void for dplane_ctx_get_pbr_ipset_entry
The dplane_ctx_get_pbr_ipset_entry function only
failed when the caller did not pass in a valid
usable pointer. Change the code to assert on
a pointer not being passed in and remove the
bool return
Donald Sharp [Thu, 28 Oct 2021 14:35:51 +0000 (10:35 -0400)]
zebra: return void for dplane_ctx_get_pbr_iptable
The only time this function ever failed is when
the developer does not pass in a usable pointer
to place the data in. Change it to an assert
to signify to the end developer that is what
we want and then remove all the if checks
for failure
Donald Sharp [Thu, 28 Oct 2021 13:15:44 +0000 (09:15 -0400)]
zebra: dplane_ctx_get_pbr_ipset should return void
The function call dplane_ctx_get_pbr_ipset only
returns false when the calling function fails to
pass in a valid ipset pointer. This should
be an assertion issue since it's a programming
issue as opposed to an actual run time issue.
Change the function call parameter to not return
a bool on success/fail for a compile time decision.
Igor Ryzhov [Mon, 15 Nov 2021 21:00:00 +0000 (00:00 +0300)]
zebra: fix memleak on shutdown
During shutdown, when table_manager_disable is called for the default
VRF, its vrf_id is already set to VRF_UNKNOWN, so the expression is true
and the table manager memory is not freed. Change the expression to
compare the VRF name instead of the id. The check in table_manager_enable
is changed for consistency.
Olivier Dugeon [Mon, 15 Nov 2021 17:19:35 +0000 (18:19 +0100)]
ospfd: Fix wrong parsing of TE subTLV
Function ospf_te_parse_te() and ospf_te_delete_te() browse TE TLV but also
subTLV. The loop that parse the subTLV check that cummulative read data doesn't
exceed the total size of the TLV. However, the sum variable that counts the
number of read data was wrongly intialize to 0 instead to 4 (i.e. the initial
TLV Header size that is located at the TOP of subTLV).
This patch adjust accordingly the initial value of the counter.
Quentin Young [Fri, 12 Nov 2021 19:45:36 +0000 (14:45 -0500)]
doc: update & clarify language in process arch doc
There was a historical blurb at the top of the process architecture
document that in several instances caused some confusion regarding
whether or not FRR supports multithreading. Remove this paragraph and
replace it with a summary of the page contents.
Donald Sharp [Fri, 12 Nov 2021 15:52:03 +0000 (10:52 -0500)]
tests: Ensure BGP has had time to import routes through the vpn
Currently I get bgp_instance_del-test as well as bgp_l3vpn_to_bgp_vrf
failures every ~3-4 runs when under a 40 parallel run with micronet.
Examination of the failure and passing cases always leads to the
failures showing convergence of bgp bestpath immediately after
the show commands to ensure that the routes are there.
Modify the code to look for the fact that the vrf has
converged from routes being passed around across vrf's
and ensure that bestpath has run on them.
Igor Ryzhov [Fri, 12 Nov 2021 16:32:06 +0000 (19:32 +0300)]
bgpd: fix source-address for BFD sessions when using update-source IFNAME
When "update-source IFNAME" is used for the neighbor, p->update_source
is set to NULL, so we can't use it as a source address and should use
the address from p->su_local.
Donald Sharp [Thu, 11 Nov 2021 18:25:35 +0000 (13:25 -0500)]
ospfd: Prevent use after free on shutdown
Running ospf_topo_vrf1 leads us to this valgrind issue:
==2386518== Invalid read of size 8
==2386518== at 0x4971520: route_top (table.c:401)
==2386518== by 0x181F08: ospf_interface_bfd_apply (ospf_bfd.c:126)
==2386518== by 0x182069: ospf_interface_disable_bfd (ospf_bfd.c:158)
==2386518== by 0x18BF51: ospf_del_if_params (ospf_interface.c:557)
==2386518== by 0x18C584: ospf_if_delete_hook (ospf_interface.c:712)
==2386518== by 0x490CA0B: hook_call_if_del (if.c:61)
==2386518== by 0x490D1F3: if_delete_retain (if.c:286)
==2386518== by 0x490D337: if_delete (if.c:309)
==2386518== by 0x490CDED: if_destroy_via_zapi (if.c:200)
==2386518== by 0x49940A9: zclient_interface_delete (zclient.c:2237)
==2386518== by 0x4998062: zclient_read (zclient.c:3969)
==2386518== by 0x4979529: thread_call (thread.c:1908)
==2386518== by 0x4919918: frr_run (libfrr.c:1164)
==2386518== by 0x181AC7: main (ospf_main.c:235)
==2386518== Address 0x5df39a0 is 0 bytes inside a block of size 56 free'd
==2386518== at 0x48399AB: free (vg_replace_malloc.c:538)
==2386518== by 0x492A03E: qfree (memory.c:141)
==2386518== by 0x4970C6F: route_table_free (table.c:141)
==2386518== by 0x4970A36: route_table_finish (table.c:61)
==2386518== by 0x18C543: ospf_if_delete_hook (ospf_interface.c:708)
==2386518== by 0x490CA0B: hook_call_if_del (if.c:61)
==2386518== by 0x490D1F3: if_delete_retain (if.c:286)
==2386518== by 0x490D337: if_delete (if.c:309)
==2386518== by 0x490CDED: if_destroy_via_zapi (if.c:200)
==2386518== by 0x49940A9: zclient_interface_delete (zclient.c:2237)
==2386518== by 0x4998062: zclient_read (zclient.c:3969)
==2386518== by 0x4979529: thread_call (thread.c:1908)
==2386518== by 0x4919918: frr_run (libfrr.c:1164)
==2386518== by 0x181AC7: main (ospf_main.c:235)
==2386518== Block was alloc'd at
==2386518== at 0x483AB65: calloc (vg_replace_malloc.c:760)
==2386518== by 0x4929EFC: qcalloc (memory.c:116)
==2386518== by 0x49709F8: route_table_init_with_delegate (table.c:53)
==2386518== by 0x49717F4: route_table_init (table.c:528)
==2386518== by 0x18C328: ospf_if_new_hook (ospf_interface.c:659)
==2386518== by 0x490C97D: hook_call_if_add (if.c:60)
==2386518== by 0x490CE85: if_create_name (if.c:223)
==2386518== by 0x490DF32: if_get_by_name (if.c:622)
==2386518== by 0x4993F73: zclient_interface_add (zclient.c:2186)
==2386518== by 0x4998062: zclient_read (zclient.c:3969)
==2386518== by 0x4979529: thread_call (thread.c:1908)
==2386518== by 0x4919918: frr_run (libfrr.c:1164)
==2386518== by 0x181AC7: main (ospf_main.c:235)
==2386518==
Fix the ordering to do the individual node tree cleanup after we delete
the data we care about.
Donald Sharp [Thu, 11 Nov 2021 14:39:52 +0000 (09:39 -0500)]
*: use compiler.h MIN/MAX macros instead of everyone having one
We had various forms of min/max macros across multiple daemons
all of which duplicated what we have in compiler.h. Convert
everyone to use the `correct` ones
Table manager is allocated when the VRF is created, but it is freed when
the VRF is disabled. When this VRF is re-enabled, zebra ends up with
table manager being NULL pointer and it crashes on any dereference.
Table manager should be freed when the VRF is deleted, not when it's
disabled.
Igor Ryzhov [Wed, 3 Nov 2021 09:40:40 +0000 (12:40 +0300)]
isisd: use time_t for last update and last flap
These variables are only assigned with time() which returns time_t.
This should also fix occasional CI build failures because of comparisons
of signed and unsigned integers.
Igor Ryzhov [Fri, 5 Nov 2021 22:22:07 +0000 (01:22 +0300)]
lib: fix vrf deletion when the last interface is deleted
Currently, we automatically delete an inactive VRF when its last
interface is deleted. This code introduces a couple of crashes because
of the following problems:
- vrf_delete is called before calling if_del hook, so daemons may try to
dereference an ifp->vrf pointer which is freed
- in if_terminate, we continue to use the VRF in the loop condition
after the last interface is deleted
This check is needed only when the interface is deleted by the user,
because if the interface is deleted by the system, VRF must still exist
in the system. Move the check to appropriate places to fix crashes.
Igor Ryzhov [Sat, 6 Nov 2021 00:36:48 +0000 (03:36 +0300)]
zebra: fix netns deletion
We don't receive interface down/delete notifications from kernel when a
netns is deleted. Therefore we have to manually replicate the necessary
actions, otherwise interfaces are kept in the system with stale pointers
to the deleted netns.
Igor Ryzhov [Thu, 4 Nov 2021 12:11:18 +0000 (15:11 +0300)]
lib: remove confusing bfd TTL API
There are two APIs to control the expected number of hops for a BFD
session – `bfd_sess_set_mininum_ttl` and `bfd_sess_set_hop_count`.
The former is very confusing, as it takes an expected TTL in the
BFD packet which is actually a protocol internal value. The latter is
simple and straightforward – it takes an expected number of hops, which
is always 1 for single-hop and >1 for multi-hop.
As the former API is not used anywhere, just remove it to avoid any
confusion.
David Lamparter [Thu, 11 Nov 2021 10:57:59 +0000 (11:57 +0100)]
build: work around AC_PROG_LEX deprecation warning
`AC_PROG_LEX without either yywrap or noyywrap is obsolete`, says
autoconf 2.70. Sadly, there is no transition window for this, in 2.69
the macro takes no arguments.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Igor Ryzhov [Sat, 6 Nov 2021 14:00:46 +0000 (17:00 +0300)]
ospfd: remove commands for broken GR helper mode
Issue #9983 explains what is wrong with the GR helper mode.
To unblock the CI that fails almost all the time on the ospf_gr_topo1
test, remove the commands and disable the test. Also add a reminder to
completely remove the helper mode if no one fixes the code in a month.
David Lamparter [Wed, 10 Nov 2021 11:16:40 +0000 (12:16 +0100)]
doc: stick `libunwind` into build docs
It's strictly optional, but… the backtraces are really much better.
Specifically, `libunwind` is notably more capable in figuring out
function names compared to glibc/libexecinfo `backtrace_symbols()`.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Igor Ryzhov [Wed, 10 Nov 2021 13:36:15 +0000 (16:36 +0300)]
bfdd: fix coverity warnings
show/clear DEFUNs always require either peer label or IP address to be
specified, so if `label` is NULL then `peer_str` is definitely not NULL.
But Coverity doesn't know about that, so it complains about possible
NULL dereference of `peer_str`. This commit should make Coverity happy.
rgirada [Tue, 9 Nov 2021 12:21:06 +0000 (04:21 -0800)]
ospfd: fixing few coverity issues in ospf_vty.c
Description:
timerval datastructure is being used without initialization.
Using these uninitialized parameters can lead unexpected results
so initializing before using it.
Chirag Shah [Wed, 3 Nov 2021 00:57:07 +0000 (17:57 -0700)]
zebra: svi down remove l2vni from l3vni list
Problem:
L2-VNI SVI down followed by L2-VNI's vxlan device
deletion leads to stale entry into L3VNI's
L2-VNI list.
Solution:
When L2-VNI associated SVI is down, default vrf
is the new tenant vrf.
Remove L2-VNI from L3VNI's l2vni list as
L3VNI/VRF is no longer valid in absence of associated
SVI.
When SVI is up re-add L2-VNI into associated VRF's
L3VNI.
The above remove/add from the L3VNI's L2VNI list is
already done when vxlan or L2-VNI is flaped, just need
to handle when SVI is flapped.
Without fix:
running config:
ip msdp mesh-group foo1 member *
Frr reoad failure log:
2021-11-02 11:05:04,317 INFO: Loading Config object from vtysh show running
line 5: % Unknown command: ip msdp mesh-group foo1 member *
Traceback (most recent call last):
File "/usr/lib/frr/frr-reload.py", line 1950, in <module>
With fix:
--------
running config displays:
ip msdp mesh-group foo1 member 0.0.0.0