David Lamparter [Thu, 11 Nov 2021 10:57:59 +0000 (11:57 +0100)]
build: work around AC_PROG_LEX deprecation warning
`AC_PROG_LEX without either yywrap or noyywrap is obsolete`, says
autoconf 2.70. Sadly, there is no transition window for this, in 2.69
the macro takes no arguments.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Igor Ryzhov [Sat, 6 Nov 2021 14:00:46 +0000 (17:00 +0300)]
ospfd: remove commands for broken GR helper mode
Issue #9983 explains what is wrong with the GR helper mode.
To unblock the CI that fails almost all the time on the ospf_gr_topo1
test, remove the commands and disable the test. Also add a reminder to
completely remove the helper mode if no one fixes the code in a month.
Igor Ryzhov [Wed, 10 Nov 2021 13:36:15 +0000 (16:36 +0300)]
bfdd: fix coverity warnings
show/clear DEFUNs always require either peer label or IP address to be
specified, so if `label` is NULL then `peer_str` is definitely not NULL.
But Coverity doesn't know about that, so it complains about possible
NULL dereference of `peer_str`. This commit should make Coverity happy.
Without fix:
running config:
ip msdp mesh-group foo1 member *
Frr reoad failure log:
2021-11-02 11:05:04,317 INFO: Loading Config object from vtysh show running
line 5: % Unknown command: ip msdp mesh-group foo1 member *
Traceback (most recent call last):
File "/usr/lib/frr/frr-reload.py", line 1950, in <module>
With fix:
--------
running config displays:
ip msdp mesh-group foo1 member 0.0.0.0
David Lamparter [Sun, 7 Nov 2021 14:49:17 +0000 (15:49 +0100)]
tests: allow common_cli.c with logging enabled
common_cli.c disables logging by default so stdio is usable as vty
without log messages getting strewn inbetween. This the right thing for
most tests, but not all; sometimes we do want log messages.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
David Lamparter [Sun, 7 Nov 2021 14:41:18 +0000 (15:41 +0100)]
lib: fix c-ares thread misuse
The `struct thread **ref` that the thread code takes is written to and
needs to stay valid over the lifetime of a thread. This does not hold
up if thread pointers are directly put in a `vector` since adding items
to a `vector` may reallocate the entire array. The thread code would
then write to a now-invalid `ref`, potentially corrupting entirely
unrelated data.
This should be extremely rare to trigger in practice since we only use
one c-ares channel, which will likely only ever use one fd, so the
vector is never resized. That said, c-ares using only one fd is just
plain fragile luck.
Either way, fix this by creating a resolver_fd tracking struct, and
clean up the code while we're at it.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Donald Sharp [Sun, 7 Nov 2021 13:37:09 +0000 (08:37 -0500)]
tests: bfd_ospf_topo1 expects unreasonable convergence times under load
When our CI test system is under high load, expecting bfd to
converge in under 2 seconds is not going to happen. Modify the test
suites to just ensure that things reconvderge.
Donald Sharp [Sun, 7 Nov 2021 12:45:27 +0000 (07:45 -0500)]
tests: Remove debugs from topotests
Debugs take up a significant amount of cpu time as well as
increased disk space for storage of results. Reduce test
over head by removing the debugs, Hopefully this helps
alleviate some of the overloading that we are seeing in
our CI systems.
Donald Sharp [Fri, 5 Nov 2021 21:56:42 +0000 (17:56 -0400)]
ospf6d: Prevent crash in adj_ok
The adj_ok thread event is being added but not killed
when the underlying interface is deleted. I am seeing
this crash:
OSPF6: Received signal 11 at 1636142186 (si_addr 0x0, PC 0x561d7fc42285); aborting...
OSPF6: zlog_signal+0x18c 7f227e93519a7ffdae024590 /lib/libfrr.so.0 (mapped at 0x7f227e884000)
OSPF6: core_handler+0xe3 7f227e97305e7ffdae0246b0 /lib/libfrr.so.0 (mapped at 0x7f227e884000)
OSPF6: funlockfile+0x50 7f227e8631407ffdae024800 /lib/x86_64-linux-gnu/libpthread.so.0 (mapped at 0x7f227e84f000)
OSPF6: ---- signal ----
OSPF6: need_adjacency+0x10 561d7fc422857ffdae024db0 /usr/lib/frr/ospf6d (mapped at 0x561d7fbc6000)
OSPF6: adj_ok+0x180 561d7fc42f0b7ffdae024dc0 /usr/lib/frr/ospf6d (mapped at 0x561d7fbc6000)
OSPF6: thread_call+0xc2 7f227e989e327ffdae024e00 /lib/libfrr.so.0 (mapped at 0x7f227e884000)
OSPF6: frr_run+0x217 7f227e92a7f37ffdae024ec0 /lib/libfrr.so.0 (mapped at 0x7f227e884000)
OSPF6: main+0xf3 561d7fc0f5737ffdae024fd0 /usr/lib/frr/ospf6d (mapped at 0x561d7fbc6000)
OSPF6: __libc_start_main+0xea 7f227e6b0d0a7ffdae025010 /lib/x86_64-linux-gnu/libc.so.6 (mapped at 0x7f227e68a000)
OSPF6: _start+0x2a 561d7fc0f06a7ffdae0250e0 /usr/lib/frr/ospf6d (mapped at 0x561d7fbc6000)
OSPF6: in thread adj_ok scheduled from ospf6d/ospf6_interface.c:678 dr_election()
The crash is in the on->ospf6_if pointer is NULL. The only way this could
happen from what I can tell is that the event is added to the system
and then we immediately delete the interface, removing the memory
but not freeing up the adj_ok thread event.
Donald Sharp [Thu, 4 Nov 2021 17:00:51 +0000 (13:00 -0400)]
ospf6d: Prevent use after free
I am seeing a crash of ospf6d with this stack trace:
OSPF6: Received signal 11 at 1636042827 (si_addr 0x0, PC 0x55efc2d09ec2); aborting...
OSPF6: zlog_signal+0x18c 7fe20c8ca19a7ffd08035590 /lib/libfrr.so.0 (mapped at 0x7fe20c819000)
OSPF6: core_handler+0xe3 7fe20c90805e7ffd080356b0 /lib/libfrr.so.0 (mapped at 0x7fe20c819000)
OSPF6: funlockfile+0x50 7fe20c7f81407ffd08035800 /lib/x86_64-linux-gnu/libpthread.so.0 (mapped at 0x7fe20c7e4000)
OSPF6: ---- signal ----
OSPF6: ospf6_neighbor_state_change+0xdc 55efc2d09ec27ffd08035d90 /usr/lib/frr/ospf6d (mapped at 0x55efc2c8e000)
OSPF6: exchange_done+0x15c 55efc2d0ab4a7ffd08035dc0 /usr/lib/frr/ospf6d (mapped at 0x55efc2c8e000)
OSPF6: thread_call+0xc2 7fe20c91ee327ffd08035df0 /lib/libfrr.so.0 (mapped at 0x7fe20c819000)
OSPF6: frr_run+0x217 7fe20c8bf7f37ffd08035eb0 /lib/libfrr.so.0 (mapped at 0x7fe20c819000)
OSPF6: main+0xf3 55efc2cd75737ffd08035fc0 /usr/lib/frr/ospf6d (mapped at 0x55efc2c8e000)
OSPF6: __libc_start_main+0xea 7fe20c645d0a7ffd08036000 /lib/x86_64-linux-gnu/libc.so.6 (mapped at 0x7fe20c61f000)
OSPF6: _start+0x2a 55efc2cd706a7ffd080360d0 /usr/lib/frr/ospf6d (mapped at 0x55efc2c8e000)
OSPF6: in thread exchange_done scheduled from ospf6d/ospf6_message.c:2264 ospf6_dbdesc_send_newone()
The stack trace when decoded is:
(gdb) l *(ospf6_neighbor_state_change+0xdc)
0x7bec2 is in ospf6_neighbor_state_change (ospf6d/ospf6_neighbor.c:200).
warning: Source file is more recent than executable.
195 on->name, ospf6_neighbor_state_str[prev_state],
196 ospf6_neighbor_state_str[next_state],
197 ospf6_neighbor_event_string(event));
198 }
199
200 /* Optionally notify about adjacency changes */
201 if (CHECK_FLAG(on->ospf6_if->area->ospf6->config_flags,
202 OSPF6_LOG_ADJACENCY_CHANGES)
203 && (CHECK_FLAG(on->ospf6_if->area->ospf6->config_flags,
204 OSPF6_LOG_ADJACENCY_DETAIL)
OSPFv3 is creating the event without a managing thread and as such
if the event is not run before a deletion event comes in memory
will be freed up and we'll start trying to access memory we should
not. Modify ospfv3 to track the thread and appropriately stop
it when the memory is deleted or it is no longer need to run
that bit of code.
Donald Sharp [Fri, 5 Nov 2021 15:49:37 +0000 (11:49 -0400)]
tests: pim_basic needs to wait for event to happen under load
The test system under load looks for upstream state only
1 time immediately after sending 2 streams of S,G data
flowing. Give the system some time to process this
and ensure that it actually shows up in a small
amount of time.
Donald Sharp [Fri, 5 Nov 2021 15:13:12 +0000 (11:13 -0400)]
tests: Ensure ospf has reconverged before continuing
The test_ldp_pseudowires_after_link_down test
shuts a link down and was blindly waiting 5 seconds
before just assuming the test system was in a sane
state. Remove the sleep(5) and actually look for
the changed state for the route 2.2.2.2 that the
psueudowire actually depends on.
Igor Ryzhov [Thu, 4 Nov 2021 23:04:02 +0000 (02:04 +0300)]
zebra: don't register same hook multiple times
Before 42d4b30e, table_manager_enable was called only once and the hook
was also registered once. After the change, the hook is registered per
each VRF that is created in the system. This is wrong.
Donald Sharp [Thu, 4 Nov 2021 15:45:27 +0000 (11:45 -0400)]
tests: Fix route replace test in all_protocol_startup
The route replace test was doing this seq of events:
a) Create nhg
b) Install route w/ sharpd
c) Ensure it worked
d) Modify nhg
d) Ensure the update group replace worked
The problem is that the sharp code is doing this:
/* Only send via ID if nhgroup has been successfully installed */
if (nhgid && sharp_nhgroup_id_is_installed(nhgid)) {
SET_FLAG(api.message, ZAPI_MESSAGE_NHG);
api.nhgid = nhgid;
} else {
for (ALL_NEXTHOPS_PTR(nhg, nh)) {
api_nh = &api.nexthops[i];
zapi_nexthop_from_nexthop(api_nh, nh);
i++;
}
api.nexthop_num = i;
}
The created nhg has not been successfully installed( or at least
sharpd has not read the results yet) when it gets the command
to install the routes. As such it passes down the individual
nexthops instead. The route replace is never going to work.
Modify the code to add a bit of sleep to allow sharpd to
get notified when the system is under load. At this point
there is no way to query sharpd for whether or not it
thinks it's nhg is installed properly or not. This
test is failing all over the place for a bunch of people
let's get this fixed so people can get running
Donald Sharp [Thu, 4 Nov 2021 12:01:14 +0000 (08:01 -0400)]
zebra: Send up ifindex for redistribution when appropriate
Currently the NEXTHOP_TYPE_IPV4 and NEXTHOP_TYPE_IPV6 are
not sending up the resolved ifindex for the route. This
is causing upper level protocols that have something like
this:
route-map FOO permit 10
match interface swp13
!
router ospf
redistribute static
!
ip route 4.5.6.7/32 10.10.10.10
where 10.10.10.10 resolves to interface swp13. The route-map
will never match in this case.
Since FRR has the resolved nexthop interface, FRR might as
well send it up to be selected on by the upper level protocol
as needed.
Rafael Zalamena [Tue, 2 Nov 2021 21:54:23 +0000 (18:54 -0300)]
bgpd: fix BFD configuration update on TTL change
When altering the TTL of a eBGP peer also update the BFD
configuration. This was only working when the configuration happened
after the peer connection had been established.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Philippe Guibert [Thu, 28 Oct 2021 16:28:42 +0000 (18:28 +0200)]
zebra: update dataplane flowspec address family in ipset_info
It is needed for the ipset entry to know for which address family
this ipset entry applies to. Actually, the family is in the original
ipset structure and was not passed as attribute in the dataplane
ipset_info structure. Add it.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Thu, 28 Oct 2021 11:42:57 +0000 (13:42 +0200)]
zebra: fix flowspec ipset operations
When injecting an ipset entry into the zebra dataplane context, the
ipset name is stored in a separate structure. This will permit the
flowspec plugin to be able to know which ipset has to be appended with
relevant ipset entry.
The problem was that the zebra dataplane objects related to ipset entries
is made up of an union between the ipset structure and the ipset info
structure. This was implying that the two structures were on the same
memory zone, and when extracting the data stored, the data were incomplete.
Fix this by replacing the union structure by a defined struct.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Wed, 27 Oct 2021 14:45:05 +0000 (16:45 +0200)]
ospf6d: avoid writing dumb ospf6 info at startup
in show: 'show ipv6 ospf6' handler command, the reason of SPF
executation is looked up and displayed. At startup, SPF has been
started, but shows no specific reason. Instead of dumping non
initialised string context, reset the string context.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Igor Ryzhov [Tue, 2 Nov 2021 21:29:19 +0000 (00:29 +0300)]
lib: fix crash when terminating inactive VRFs
If the VRF is not enabled, if_terminate deletes the VRF after the last
interface is removed from it. Therefore daemons crash on the subsequent
call to vrf_delete. We should call vrf_delete only for enabled VRFs.
Igor Ryzhov [Tue, 2 Nov 2021 20:54:43 +0000 (23:54 +0300)]
zebra: fix stale pointer when netns is deleted
When the netns is deleted, we should always clear the vrf->ns_ctxt
pointer. Currently, it is not cleared when there are interfaces in the
netns at the time of deletion.
If the netns is re-created, zebra crashes because it tries to use the
stale pointer.
Donald Sharp [Mon, 1 Nov 2021 19:08:50 +0000 (15:08 -0400)]
tests: All_protocol_startup sporadic failure
the test_nexthop_groups function is failing occassionally
because the test executes 4 in succession sharp install
routes commands. When I dumped the rib on a failed test
run there were only 2 of the 4 routes in the rib and
the two that were in were the last 2 installed.
The sharp daemon setups a event process where it
installs routes `automatically`. If the previous
run is not finished entering a new command to install
the routes will mess up the last one from ever happening.
It is assumed that the user doesn't do stupid stuff here.
In this case I am just adding a small sleep between each
installation to just let the test proceed.
Donald Sharp [Fri, 29 Oct 2021 17:03:42 +0000 (13:03 -0400)]
lib: Return Null when we have an empty string for script name
The script entries were being stored in a hash lookup with
the script name a pre-defined array of characters. The hash
lookup is succeeding since it is auto-installed at script
start time irrelevant if there is a handler function.
Modify the code so that if the scriptname is an empty
string "\0" just return a NULL so that zebra does
not attempt to actually load up the script
Donald Sharp [Mon, 1 Nov 2021 00:08:29 +0000 (20:08 -0400)]
tests: isis_topo1 needs to wait for results under load
the isis_topo1 test has two functions where immediately
after the test ensures that the routes are in isis
tests to see if they are in the rib. Under system
load I am seeing this test failing because the
routes are still queued. Modify the zebra check
for the isis routes to look for the proper results
for 10 seconds.
Tested with GoBGP (Helper):
```
long-lived-graceful-restart: advertised and received
Local:
ipv4-unicast, restart time 100000 sec
Remote:
ipv4-unicast, restart time 10 sec, forward flag set
```
Igor Ryzhov [Sat, 30 Oct 2021 00:15:24 +0000 (03:15 +0300)]
isisd: fix circuit is-type configuration
Currently, we have a lot of checks in CLI and NB layer to prevent
incompatible IS-types of circuits and areas. All these checks become
completely meaningless when the interface is moved between VRFs. If the
area IS-type is different in the new VRF, previously done checks mean
nothing and we still end up with incorrect circuit IS type. To actually
prevent incorrect IS type, all checks must be done in the processing
code.
Igor Ryzhov [Fri, 29 Oct 2021 22:58:50 +0000 (01:58 +0300)]
isisd: remove useless checks when configuring ldp-sync
We have checks on NB validation stage to prevent configuring LDP sync on
interfaces in non-default VRFs. These checks are completely useless,
because the interface can be easily moved to another VRF after
configuring LDP sync. Instead, the check must be done in the actual code
to cover the case when the interface is moved between VRFs.
Igor Ryzhov [Fri, 29 Oct 2021 22:49:04 +0000 (01:49 +0300)]
isisd: remove useless checks when configuring passive interfaces
Currently, we have some checks in the CLI and NB layer to "protect" from
setting loopback interfaces into non-passive mode. These checks are not
correct, because we can not rely on operational data during config
reading and validation stage as this data doesn't exist yet. There's
nothing wrong in allowing "incorrect" configuration – it is already
correctly handled by the actual code.
Igor Ryzhov [Fri, 29 Oct 2021 22:33:01 +0000 (01:33 +0300)]
isisd: don't remove interface config when isis router is deleted
In previous releases, it was not possible to configure ISIS on an
interfaces without configuring the ISIS router first. Therefore, we had
to delete the ISIS config from all interfaces when the router config was
deleted. This is fixed since version 8.0 – interface and router configs
are completely separate and don't depend on each other, so now we can
remove this hack and preserve the interface config when the router
config is deleted.
Donald Sharp [Fri, 29 Oct 2021 14:21:28 +0000 (10:21 -0400)]
tests: Fix zebra_seg6_route to not always reinstall the same route
This code has two issues:
a) The loop to test for successful installation re-installs
the route every time it loops. A system under load will
have issues ensuring the route is installed and repeated
attempts does not help
b) The nexthop group installation was always failing
but never noticed (because of the previous commit)
and the test was always passing, when it should
have never passed.