Igor Ryzhov [Fri, 13 Aug 2021 23:09:54 +0000 (02:09 +0300)]
vtysh: fix duplicated output of key chain configuration
When both ripd and eigrpd run at the same time, all key configuration in
key chain node is duplicated. This change adds a concept of nested nodes
into vtysh to fix the issue.
Donald Sharp [Tue, 30 Nov 2021 00:33:48 +0000 (19:33 -0500)]
tests: Fix Daemon Killing to actually notice when a deamon dies
Lot's of the GR topotests kill daemons in order to test code
that deals with crashing daemons. Under heavy system load
it was noticed that a kill command was sent and if told to
wait we would sleep 2 seconds send another kill command and
call it good. This was causiing issues when subsuquent
json commands would get errors like `lost connection to daemon`
as the daemon finally shut down after some time due to load.
Modify the kill the daemon function to notice that the daemon
was not actually killed and if we need to wait wait some
more time for it too happen
Donald Sharp [Mon, 29 Nov 2021 20:51:45 +0000 (15:51 -0500)]
zebra: Prevent thread usage of data after it being freed
On startup we create a thread timer event to do a rib sweep
of the system. On shutdown we never stopped this timer and
as such we have a situation where a thread event could be run
on shutdown after the data for it has been freed. Here is the
crash I am seeing:
(gdb) bt
(gdb)
Save the thread data in zebra_router and stop the thread so we don't
accidently do work on shutdown we don't mean to. In this case
it happened in our topotests with some severe system load.
Essentially we happened to kill the zebra daemon just as the
graceful_restart timer popped here.
Donald Sharp [Mon, 29 Nov 2021 17:11:43 +0000 (12:11 -0500)]
tests: Allow interface statistics to be gathered with some delay
Currently under system load tests that use verify_pim_interface_traffic
immediately after a interface down/up event are not giving any time
for pim to receive and process the data from that event. Give
the test some time to gather this data.
Donald Sharp [Mon, 29 Nov 2021 13:37:21 +0000 (08:37 -0500)]
test: Fix addKernelRoute looking for positive results
Under heavy system load, we are sometimes seeing this
output for addKernelRoute:
2021-11-28 16:17:27,604 INFO: topolog: [DUT: b1]: Running command: [ip route add 224.0.0.13 dev b1-f1-eth0]
2021-11-28 16:17:27,604 DEBUG: topolog.b1: LinuxNamespace(b1): cmd_status("['/bin/bash', '-c', 'ip route add 224.0.0.13 dev b1-f1-eth0']", kwargs: {'encoding': 'utf-8', 'stdout': -1, 'stderr': -2, 'shell': False, 'stdin': None})
2021-11-28 16:17:27,967 DEBUG: topolog.b1: LinuxNamespace(b1): cmd_status("['/bin/bash', '-c', 'ip route']", kwargs: {'encoding': 'utf-8', 'stdout': -1, 'stderr': -2, 'shell': False, 'stdin': None})
2021-11-28 16:17:28,243 DEBUG: topolog: ip route
70.0.0.0/24 dev b1-f1-eth0 proto kernel scope link src 70.0.0.1 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This tells us that the ip route add succeeded but when looking for it
the system failed to immediately find it. Why is this happening?
Probably we are under heavy system load and the two different
commands, 'ip route add..' and 'ip route show' are being executed
on different cpu's and the data has not been copied to the different
cpu yet in the kernel. This is not necessarily something normally
seen but entirely possible. Giving the system a few extra seconds
for the kernel to execute/work the memory barrier system seems
prudent for long term success of our programming.
Donald Sharp [Sat, 27 Nov 2021 18:12:50 +0000 (13:12 -0500)]
tests: Fix isis_topo1_vrf to wait a tiny bit for zebra route install
During repeated runs I am seeing this test fail to run successfully.
Upon inspecting the output:
{
"prefix":"10.0.10.0/24",
"prefixLen":24,
"protocol":"isis",
"vrfId":6,
"vrfName":"r1-cust1",
"selected":true,
"destSelected":true,
"distance":115,
"metric":10,
"queued":true,
We can see that the route is still queued. Under heavy system
load and not ensuring that isis has time to send the route to
zebra and for zebra to install the route, this test can fail.
Igor Ryzhov [Thu, 25 Nov 2021 18:17:58 +0000 (21:17 +0300)]
ospfd: fix summary-address deletion
When the summary-address is deleted, `ospf_aggr_handle_external_info` is
called for each aggregated route for the cleanup. It needs to find the
corresponding OSPF instance and it does it using the `ei->instance`
which is totally wrong, because it's the instance from which the route
is redistributed, not the local OSPF instance. A pointer to the correct
OSPF instance is already stored in the external_info structure.
ckishimo [Mon, 8 Nov 2021 23:25:06 +0000 (00:25 +0100)]
ospf6d: check N-bit set in Hello packet
RFC 3101 states both E-bit and N-bit need to be checked when receiving a Hello packet.
"To support the NSSA option an additional check must be made in the function
that handles the receiving of the Hello packet to verify that both the N-bit
and the E-bit found in the Hello packet's option field match the area type and
ExternalRoutingCapability of the area of the receiving interface."
Donald Sharp [Tue, 23 Nov 2021 12:56:03 +0000 (07:56 -0500)]
tests: Do not put area under `router ospf6`
The interface area command is deprecated under
router ospf6 and should be on the individual interface.
Let's modify the tests to not actually put the
interface foo area 0.0.0.0 command under the
router node.
Donald Sharp [Tue, 23 Nov 2021 12:47:33 +0000 (07:47 -0500)]
tests: Add clear event to creation of router for v4 and v6 ospf
When using build_config_from_json there exists a timing
window where neighbors can come up before the router-id
is applied. As a precaution, quickly clear the neighbors
to ensure that we get neighbors with the expected router-id.
This can especially happen under high system load.
Donald Sharp [Tue, 23 Nov 2021 12:24:34 +0000 (07:24 -0500)]
tests: Move area configuration to interface for ospv3
The test_ospf_dual_stack test had area configuration
under the `router ospf6` nodes. This is getting
lots of warning messages from the cli. Let's remove
this.
Donald Sharp [Tue, 23 Nov 2021 12:06:05 +0000 (07:06 -0500)]
tests: Don't double create ospfv3 config
When testers use the build_config_from_json function
the create_router_ospf function is double creating
the ospfv3 cli to be passed in. This is because
the create_router_ospf loops over both v2 and v3
and then create_router_ospf6 re-adds v3.
Donald Sharp [Wed, 24 Nov 2021 01:16:29 +0000 (20:16 -0500)]
tests: Do not pick an ip address that overlaps with ourselves
The ospf_basic_functionality/test_ospf_lan.py creates
a ethernet segment and attaches 4 routers to it and
assigns ip addresses in a /24. As one of the tests
it picks a new address for r0 which coincides with
a ip address on r3. Then the test immediatly
checks for other data. The problem is of course
that if a test is `slow` enough hello's will
start to be ignored from r3 to r0 and the
neighbor relationships will come down. Choose
an ip address that doesn't cause this issue.
Donald Sharp [Wed, 24 Nov 2021 01:19:05 +0000 (20:19 -0500)]
tests: Add aggressive timers to the new route server client test
The new bgp_route_server_client test is not setting the
timers for peers to be fast enough to have the ability
to converge in under 60 seconds if a packet is dropped/missed
at startup. Make the test have the ability to converge
under load
Donald Sharp [Wed, 24 Nov 2021 00:46:16 +0000 (19:46 -0500)]
ospf6d: Remove ospf6->external_id_table
The external_id_table was only ever used to store pointers to data
and was never used for lookup during the course of normal operations.
However it did lead to crashes because somewhere along the way
external routes stored in the external_table never had their
id associated into the external_id_table and we would assert
on the node lookup failing.
Since this code was never used for anything other than
storing data and it was never retrieved for anything useful
let's just remove it from ospf6d.