Philippe Guibert [Thu, 28 Oct 2021 16:28:42 +0000 (18:28 +0200)]
zebra: update dataplane flowspec address family in ipset_info
It is needed for the ipset entry to know for which address family
this ipset entry applies to. Actually, the family is in the original
ipset structure and was not passed as attribute in the dataplane
ipset_info structure. Add it.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Thu, 28 Oct 2021 11:42:57 +0000 (13:42 +0200)]
zebra: fix flowspec ipset operations
When injecting an ipset entry into the zebra dataplane context, the
ipset name is stored in a separate structure. This will permit the
flowspec plugin to be able to know which ipset has to be appended with
relevant ipset entry.
The problem was that the zebra dataplane objects related to ipset entries
is made up of an union between the ipset structure and the ipset info
structure. This was implying that the two structures were on the same
memory zone, and when extracting the data stored, the data were incomplete.
Fix this by replacing the union structure by a defined struct.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Wed, 27 Oct 2021 14:45:05 +0000 (16:45 +0200)]
ospf6d: avoid writing dumb ospf6 info at startup
in show: 'show ipv6 ospf6' handler command, the reason of SPF
executation is looked up and displayed. At startup, SPF has been
started, but shows no specific reason. Instead of dumping non
initialised string context, reset the string context.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Igor Ryzhov [Tue, 2 Nov 2021 21:29:19 +0000 (00:29 +0300)]
lib: fix crash when terminating inactive VRFs
If the VRF is not enabled, if_terminate deletes the VRF after the last
interface is removed from it. Therefore daemons crash on the subsequent
call to vrf_delete. We should call vrf_delete only for enabled VRFs.
Igor Ryzhov [Tue, 2 Nov 2021 20:54:43 +0000 (23:54 +0300)]
zebra: fix stale pointer when netns is deleted
When the netns is deleted, we should always clear the vrf->ns_ctxt
pointer. Currently, it is not cleared when there are interfaces in the
netns at the time of deletion.
If the netns is re-created, zebra crashes because it tries to use the
stale pointer.
Donald Sharp [Mon, 1 Nov 2021 19:08:50 +0000 (15:08 -0400)]
tests: All_protocol_startup sporadic failure
the test_nexthop_groups function is failing occassionally
because the test executes 4 in succession sharp install
routes commands. When I dumped the rib on a failed test
run there were only 2 of the 4 routes in the rib and
the two that were in were the last 2 installed.
The sharp daemon setups a event process where it
installs routes `automatically`. If the previous
run is not finished entering a new command to install
the routes will mess up the last one from ever happening.
It is assumed that the user doesn't do stupid stuff here.
In this case I am just adding a small sleep between each
installation to just let the test proceed.
Donald Sharp [Fri, 29 Oct 2021 17:03:42 +0000 (13:03 -0400)]
lib: Return Null when we have an empty string for script name
The script entries were being stored in a hash lookup with
the script name a pre-defined array of characters. The hash
lookup is succeeding since it is auto-installed at script
start time irrelevant if there is a handler function.
Modify the code so that if the scriptname is an empty
string "\0" just return a NULL so that zebra does
not attempt to actually load up the script
Donald Sharp [Mon, 1 Nov 2021 00:08:29 +0000 (20:08 -0400)]
tests: isis_topo1 needs to wait for results under load
the isis_topo1 test has two functions where immediately
after the test ensures that the routes are in isis
tests to see if they are in the rib. Under system
load I am seeing this test failing because the
routes are still queued. Modify the zebra check
for the isis routes to look for the proper results
for 10 seconds.
Donald Sharp [Fri, 29 Oct 2021 14:21:28 +0000 (10:21 -0400)]
tests: Fix zebra_seg6_route to not always reinstall the same route
This code has two issues:
a) The loop to test for successful installation re-installs
the route every time it loops. A system under load will
have issues ensuring the route is installed and repeated
attempts does not help
b) The nexthop group installation was always failing
but never noticed (because of the previous commit)
and the test was always passing, when it should
have never passed.
Donald Sharp [Fri, 29 Oct 2021 12:47:05 +0000 (08:47 -0400)]
tests: zebra_seg6local has a race condition
The test is checking installing of seg6 routes by this
loop:
for up to 5 times:
sharp install seg6 route
show ip route and is it installed
The problem is that if the system is under heavy
load the installation may not have happened yet
and by immediately reinstalling the same route
the same thing could happen again.
Modify the code to pull the route installation
outside of the loop and to increase to 10 attempts
in case there is very heavy system load.
Olivier Dugeon [Mon, 25 Oct 2021 09:52:19 +0000 (11:52 +0200)]
lib: Fix comparison function in link_state.c
ls_node_same, ls_attributes_same and ls_prefix_same are not producing expected
result due to a wrong usage of memcmp. In addition, if respective structures
are not initialized with 0, there is a risk that the comparison failed.
This patch correct usage of memcmp and expand comparison to each invidual
parameters of the respective structure for safer result.
Donald Sharp [Thu, 28 Oct 2021 19:51:46 +0000 (15:51 -0400)]
tests: Fix `check_ping` function in test_bgp_srv6l3vpn_to_bgp_vrf.py
The check_ping function `_check` function was asserting and being
passed to the topotests.run_and_expect() functionality causing
it to not run the full range of pings if one failed the test.
So effectively it was properly detecting pass / failure but
only allowing for 1 iteration if it was going to fail.
Modify the code to not assert and act like all the other
run_and_expect functionality.
Donald Sharp [Thu, 28 Oct 2021 12:10:28 +0000 (08:10 -0400)]
zebra: Fix netlink RTM_NEWNEXTHOP parsing for nested attributes
With the addition of resillient hashing for nexthops, the
parsing of nexthops requires telling the decoder functions
that there may be nested attributes. This was found by
code inspection of iproute2/ipnexthop.c when trying to
understand resillient hashing as well as statistics
gathering for nexthops that are / will be in upstream
kernels in the near future.
The default vrf name is obtained by zebra daemon. While isis is not
connected to zebra, i.e. at startup, when loading a startup configuration,
the macro VRF_DEFAULT_NAME is used and returns 'default'.
But because zebra connected and forces to a new default vrf name, the
configuration is not seen as the default one, and further attempts to
configure the isis instance via 'router isis 1' will trigger creation
of an other instance.
To handle this situation, at vrf_enable() event, which is called for
each default vrf name change, the associated isis instance is updated
with th new vrf name. The same is done for NB yang path.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Donatas Abraitis [Tue, 26 Oct 2021 07:37:22 +0000 (10:37 +0300)]
bgpd: Unify multiprotocol capability for OPEN logging
Before:
```
192.168.10.17 OPEN has MultiProtocol Extensions capability (1), length 4
192.168.10.17 OPEN has MP_EXT CAP for afi/safi: IPv4/unicast
```
After:
```
192.168.10.17 OPEN has MultiProtocol Extensions capability (1), length 4
192.168.10.17 OPEN has MultiProtocol Extensions capability for afi/safi: IPv4/unicast
```
Hiroki Shirokura [Mon, 25 Oct 2021 23:36:14 +0000 (23:36 +0000)]
lib: fix srv6 route hardcode with BGP
zclient_send_localsid is called by various routing protocol daemons. To set the
srv6 endpoint function. Fix a hard-coded error in the initial implementation.
Before this PR, the srv6 function will be registered to zebra as a BGP route
even if isisd executes zclient_send_localsid.
when gre information could not be retrieved because GRE interface has
been deleted, a GRE_UPDATE message may be sent to NHRP. In that case,
the gre values are reset. There was a missing tunnel destination value,
which has been omitted.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
So the 2.2.2.2 routes on r1,3 and 4 are received via ospf, but are
modified by some other process to add labels ( probably ldp, since
it is running too ). The 2nd ping to 2.2.2.2 is failing because
the 2.2.2.2 route on r4 is being replaced. As an example here
is `ip monitor all` on r4 during boot up. Please note timestamps
are not necessarily representative of what we will see on the
loaded ci system.
[2021-10-15T15:46:52.261456] [NEXTHOP]id 27 via 10.0.2.2 dev r4-eth0 scope link proto zebra
[2021-10-15T15:46:52.261490] [ROUTE]2.2.2.2 nhid 27 via 10.0.2.2 dev r4-eth0 proto ospf metric 20
<snip>
[2021-10-15T15:46:53.556405] [NEXTHOP]Deleted id 27 via 10.0.2.2 dev r4-eth0 scope link proto zebra
<snip>
[2021-10-15T15:46:53.566575] [NEXTHOP]id 32 via 10.0.2.2 dev r4-eth0 scope link proto zebra
[2021-10-15T15:46:53.566585] [ROUTE]2.2.2.2 nhid 32 via 10.0.2.2 dev r4-eth0 proto ospf metric 20
For a small amount of time the route was *gone*. I believe the upstream
CI system hits that window sometimes, causing the test to fail.
This patch attempts to ensure that the 2.2.2.2 route should be learned
appropriately ( thus slowing it down ) before the test moves onto
the ping. I suspect the long term answer might be to add a test to
the scripts/adjancies.py script to ensure that the test does not
continue until the appropriate label is in place, but I want to
make the test run a bit more perscriptive in what it is looking
for here.
Donald Sharp [Mon, 25 Oct 2021 13:03:48 +0000 (09:03 -0400)]
doc: Remove reference to non-existent command
The `neighbor <peer> version <X>` command does not
exist. I am unable to find it going back to version
2.0 of FRR. So this command has been not in the system
for a very long time.
In any event bgp already supports version 4 of bgp and
it auto-negotiates this.
Igor Ryzhov [Wed, 13 Oct 2021 15:30:06 +0000 (18:30 +0300)]
*: fix interface config write in NB-converted daemons
When writing the config from the NB-converted daemon, we must not rely
on the operational data. This commit changes the output of the interface
configuration to use only config data. As the code is the same for all
daemons, move it to the lib and remove all the duplicated code.
Donald Sharp [Fri, 22 Oct 2021 19:27:50 +0000 (15:27 -0400)]
tests: bfd_isis_topo1 expects unreasonable convergence times under load
When our ci test system is under high load, expecting bfd to converge
in under 2 seconds is not going to happen. Modify the test suites
to just ensure that things converge. If we need actual functional
testing of bfd response times the topotests are not an appropriate place
to do this or we need to modify the test system to gather the data for
how long it takes after the tests are run.
Donald Sharp [Fri, 22 Oct 2021 18:20:16 +0000 (14:20 -0400)]
tests: Fix bgp_ecmp_topo3 to look for a bit more state
During a local CI run, bgp_ecmp_topo3 was failing
to properly notice the fast-convergence command
issued before the interface is shut down. As
such there exists a race condition where under
high load the zebra process can actually shut
an interface down before we have properly ensured
that fast convergence is on for ibgp.
Modify the test for in two ways:
1) Ensure that previous section makes sure
that we have properly converged for when we
bring back up the interfaces instead of
assuming that we have done so.
2) After issuing the fast-convergence command.
Ensure that bgp has fully processed it and is
ready to receive the interface down events
as triggers for shutting down the ibgp session.
Donald Sharp [Fri, 22 Oct 2021 18:18:33 +0000 (14:18 -0400)]
tests: Make test_ldp_topo1.py aware of how many neighbors it needs
On a local CI run. The test_ldp_topo1.py showed fail to converge
on r3. r3 has 2 neighbors but only 1 was up when we got to
further steps in the test suites.
Modify the neighbor checking to `know` how many neighbors
should be operational and continue looking for them until
they are up and running.
Donald Sharp [Thu, 21 Oct 2021 14:21:40 +0000 (10:21 -0400)]
tests: all_protocol_startup needs some tweaks to allow for processing
The nexthop group code is installing routes and nexthop groups
and immediately expecting zebra to have processed the results
as a result there is a situation when the CI system is under
intense load that the nexthop group might not have been processed.
Add a bit of code to allow the test to give FRR some time
to finish work before declaring it not working.
David Lamparter [Thu, 21 Oct 2021 15:54:42 +0000 (17:54 +0200)]
zebra: set SELECTED before going into dplane code
There is a bit of an impedance mismatch in the sequence of events here.
Depending on the dplane behavior, the `ROUTE_ENTRY_SELECTED` bit will be
inconsistent for rib_process_result().
With an asynchronous dataplane:
0. rib_process() is called
1. rib_install_kernel() is called, dplane action is queued
2. rib_install_kernel() returns
3. rib_process() sets the SELECTED bit appropriately, returns
4. dplane is done, triggers rib_process_result()
5. SELECTED bit is seen in "after" state
(5a. NHT code looks at the SELECTED bit, works correctly.)
With a synchronous dataplane:
0. rib_process() is called
1. rib_install_kernel() is called, dplane action is executed
2. dplane (should) trigger rib_process_result()
3. SELECTED bit is seen in "before" state
(3a. NHT code looks at the SELECTED bit, fails.)
4. rib_install_kernel() returns
5. rib_process() sets the SELECTED bit appropriately, too late.
Essentially, poking the dataplane is a sequencing point where control is
handed over to the dplane. Control may or may not return immediately.
Doing /anything/ after triggering the dataplane is a recipe for odd race
conditions.
(FWIW, I'm not sure rib_process_result() is called correctly in the
synchronous case, but that's a separate problem.)
Unfortunately, this change might have some unforeseen side effects. I
haven't dug through the code to see if anything breaks. There
/shouldn't/ be anything looking at the SELECTED bit here, but who knows.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>