Christian Hopps [Thu, 8 Jun 2023 08:12:26 +0000 (04:12 -0400)]
tests: convert old pim test to more cleanly use pytest fixture
This is a good way to run a per-test background helper process. The helper
object is created before the test function that requests it (by parameter name
match) and is cleaned up after the test function exits, whether it passed or
failed. A context manager is used to further guarantee that the cleanup is done.
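A minimal sketch of the pattern described above, not the code from the commit. The names background_helper and helper_fixture are hypothetical, and the sleeping child process stands in for a real helper. In a test module, helper_fixture would carry the @pytest.fixture decorator; it is left off here so the sketch runs without pytest installed.

```python
import contextlib
import subprocess
import sys


@contextlib.contextmanager
def background_helper(argv):
    """Start a helper process and guarantee it is stopped on exit."""
    proc = subprocess.Popen(argv)
    try:
        yield proc
    finally:
        # Runs whether the body of the with-block passed or raised.
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()


# Hypothetical fixture body: in a real test module this generator would be
# decorated with @pytest.fixture, and a test would request it by naming a
# parameter "helper_fixture".
def helper_fixture():
    argv = [sys.executable, "-c", "import time; time.sleep(60)"]
    with background_helper(argv) as proc:
        # Setup is done; the test function runs while we are suspended here.
        yield proc
    # Leaving the with-block above performs the cleanup.
```

The code after the yield runs once the requesting test finishes, so the helper process does not outlive the test that asked for it.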
Christian Hopps [Thu, 8 Jun 2023 06:42:32 +0000 (02:42 -0400)]
tests: fixing pim6 topotest bugs
- Remove use of bespoke socat
- Use ipv6 support in mcast-tester.py
- do not run processes in the background behind munet/micronet's
back with `&` (ever) -- use popen or the helper class
pimd, pim6d: Move mld/igmp deletion code to a common api
Move the common MLD/IGMP deletion code into the API pim_gm_interface_delete.
The code for IPv6 deletion (gm_ifp_teardown) for MLD was missing in this flow;
making the code common fixes this too.
Move the pim_if_membership_clear API from pimd/pim_nb_config.c
to pimd/pim_iface.c.
Also fix the curly braces warning:
WARNING: braces {} are not necessary for single statement blocks
1773: FILE: /tmp/f1-127504/pim_iface.c:1773:
Christian Hopps [Tue, 6 Jun 2023 19:12:58 +0000 (15:12 -0400)]
mgmtd: assert an assertion for coverity
I believe coverity can't tell the length of the return value from strftime based
on the format string (like we can), so it allows `n` to be larger than it could
be. That in turn allows `sz - n` to go negative, which as a size_t wraps to a
very large positive value, so coverity thinks an overrun is possible.
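The wrap-around can be illustrated with a short sketch that emulates unsigned size_t subtraction in Python. The function names are hypothetical and the buffer sizes are made-up values; only `sz` and `n` come from the message above.

```python
import ctypes

# Width of size_t on this platform, in bits.
SIZE_T_BITS = ctypes.sizeof(ctypes.c_size_t) * 8


def size_t_sub(sz, n):
    """Return sz - n the way C computes it for unsigned size_t operands."""
    return (sz - n) % (1 << SIZE_T_BITS)


def remaining(sz, n):
    """Space left in a buffer of sz bytes after n bytes were written.

    The assertion states the bound that a static analyzer cannot infer.
    """
    assert n < sz
    return size_t_sub(sz, n)
```

With `sz = 32` and `n = 33`, size_t_sub returns the largest size_t value rather than -1, which is the overrun the analyzer is worried about. Asserting `n < sz` first rules that case out.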
Chirag Shah [Tue, 6 Jun 2023 04:48:12 +0000 (21:48 -0700)]
tools: fix list value remove in frr-reload
Sometimes an element of the temporary list is
removed more than once, which leads to a
ValueError in certain python3 versions.
commit-id 1543f58b5 did not handle the ValueError
properly. This caused a regression where
prefix-list config leads to a delete followed
by an add.
The new fix should just pass on the exception, as
value removal from list_to_add or list_to_del
is best effort.
This way, when the prefix-list config has no change,
the lines are removed from lines_to_del and
lines_to_add properly.
Configure a prefix-list in frr.conf and perform
multiple frr-reloads. After the first reload operation,
subsequent ones should not result in a delete followed
by an add of the prefix-list, but rather in a no-op.
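A minimal sketch of best-effort removal, assuming plain strings as list entries. The helper names are hypothetical and this is not the frr-reload code itself.

```python
def remove_best_effort(lines, line):
    """Remove line from lines; a line that is already gone is not an error."""
    try:
        lines.remove(line)
    except ValueError:
        # list.remove() raises ValueError when the value is absent, for
        # example because an earlier pass already removed it.
        pass


def cancel_unchanged(lines_to_add, lines_to_del, unchanged):
    """Drop every unchanged line from both the add list and the delete list."""
    for line in unchanged:
        remove_best_effort(lines_to_add, line)
        remove_best_effort(lines_to_del, line)
```

If the same line shows up twice in `unchanged`, the second removal attempt is silently skipped instead of aborting the loop and leaving a delete/add pair behind.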
Christian Hopps [Sat, 27 May 2023 16:11:48 +0000 (12:11 -0400)]
tests: fix some broken logging
- make sure we close and remove all handlers for named logs on each reuse.
- test module level exec.log no longer truncated to last test case output
- cleanup the log names, and make sure they are present in all exec logs
- keep separate exec logs for each pytest worker when running in distributed mode
- add per test case exec logs (code disabled because the CI infra can't handle it)
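The first item in the list above can be sketched with the standard logging module. The function name reset_named_logger is hypothetical, and a stream handler stands in for the file handlers a test harness would normally use.

```python
import logging


def reset_named_logger(name, stream):
    """Return the named logger with exactly one fresh handler attached."""
    logger = logging.getLogger(name)
    # getLogger() returns the same object for the same name, so handlers
    # from an earlier use are still attached. Close and remove them all.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    logger.addHandler(logging.StreamHandler(stream))
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    return logger
```

Without the cleanup loop, every reuse of the same log name would add another handler, and output would be duplicated or sent to a stale destination.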
Donald Sharp [Fri, 2 Jun 2023 19:04:38 +0000 (15:04 -0400)]
bgpd: entry->any is never true
The only place entry->any could ever be set to true was
when str was NULL. Unfortunately, with the way our CLI works,
it is impossible for str to be NULL. The entry->any value *used*
to work prior to commit e961923c7217b935027107cad30c35c3907c936f,
but it was changed back in 2016 and no one has noticed the change
in behavior.
Let's just admit that there are no users of this and remove this
dead code.
Problem:
In LHR, ipv6 pim state remains after an MLD prune is received.
Root Cause:
When LHR receives a join, it creates a (*,G) channel oil with
oil_ref_count = 2. The channel_oil is used by gm_sg sg->oil
and upstream->channel_oil.
When LHR receives a prune, currently upstream->channel_oil is
deleted but gm_sg sg->oil is still present. Because of this, channel_oil
is still present with oil_ref_count = 1.
Fix:
When LHR receives a prune, both upstream->channel_oil and gm_sg sg->oil
need to be deleted.
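The reference counting at play can be sketched as follows. ChannelOil here is a hypothetical stand-in for illustration only, not the pimd structure.

```python
class ChannelOil:
    """Toy reference-counted object; freed once the last holder lets go."""

    def __init__(self):
        self.ref_count = 0
        self.freed = False

    def ref(self):
        """Take a reference and return the object to the new holder."""
        self.ref_count += 1
        return self

    def unref(self):
        """Drop one reference; free the object when none remain."""
        self.ref_count -= 1
        if self.ref_count == 0:
            self.freed = True


# Join: two holders reference the same object.
oil = ChannelOil()
upstream_oil = oil.ref()
sg_oil = oil.ref()

# Prune as described under "Root Cause": only one holder lets go.
upstream_oil.unref()
leaked = not oil.freed  # one reference is still outstanding

# Prune as described under "Fix": the second holder lets go as well.
sg_oil.unref()
```

After the first unref the count is 1 and the object survives, which matches the stale state described above. Only the second unref brings the count to 0.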
Donald Sharp [Fri, 2 Jun 2023 15:02:54 +0000 (11:02 -0400)]
bgpd: Give more data when state machine fails to change state
When a state machine transition fails, bgpd would output
data about what happened, but not necessarily give the
reason why. Add that data to the output.
pimd: Change in PIM northbound error, when a path to RP is not found during config apply
Currently, in PIM northbound, when a path to the RP is not found during config apply, we treat this as NB_ERR_INCONSISTENCY.
However, there are two issues with this approach:
- Once OSPF or another IGP has converged, it is possible that the RPF check will succeed.
- If we have multiple groups and RPs (e.g. 50 RPs), we will receive 50 logs with inconsistency errors.
Example:
2023/05/27 22:57:45 PIM: [G822R-SBMNH] config-from-file# ip pim rp 192.168.100.1 239.100.0.0/28
2023/05/27 22:57:45 PIM: [VAKV3-NMY7B][EC 100663337] error processing configuration change: error [internal inconsistency] event [apply] operation [create] xpath [/frr-routing:routing/control-plane-protocols/control-plane-protocol[type='frr-pim:pimd'][name='pim'][vrf='default']/frr-pim:pim/address-family[address-family='frr-routing:ipv4']/frr-pim-rp:rp/static-rp/rp-list[rp-address='192.168.100.1']/group-list[.='239.100.0.0/28']] message: No Path to RP address specified: 192.168.100.1
Donald Sharp [Thu, 1 Jun 2023 13:57:48 +0000 (09:57 -0400)]
tests: new mgmt_startup tests are failing due to insufficient time
The tests are failing because the system is heavily loaded and there is
insufficient time for large configs to be handled. Increasing the time
allows the tests to complete locally for me under heavy load.
Louis Scalbert [Wed, 31 May 2023 14:53:58 +0000 (16:53 +0200)]
isisd: fix wrongly disabled flex-algo
A configured flex-algo algorithm may remain in disabled state after its
definition is advertised on the area.
In the isis_sr_flex_algo_topo1 topotest, flex-algo 203 is sometimes disabled
at step 4 or 8. It depends on the following sequence:
1. Flex-algo 203 is configured on a remote router to be re-advertised.
2. An LSP is received on the local router and contains the algo 203
definition.
3. The local router rebuilds its own LSP with lsp_build().
4. The local router's isis_run_spf() recomputes the algo 203 SPF tree.
A 1. 2. 3. 4. sequence results in a working test. The reception of the
remote LSP (2.) does not trigger the build of the local LSP. If, for
some reason, the sequence is 1. 3. 4. 2. 4., isis_run_spf() will not
know that flex-algo 203 has been re-enabled because
flex_algo_get_state() only returns the state from the local LSP.
In sequence step 4, compare the flex-algo state from the local LSP with
the actual state. If the states differ, request a new local LSP
generation and quit the re-computation of the algo SPF tree. The SPF tree
will be recomputed just after the build of the local LSP.
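The check described for step 4 can be sketched as below. All names are hypothetical; the two callbacks stand in for whatever isisd uses to schedule an LSP rebuild and to compute the SPF tree.

```python
def run_algo_spf(state_in_local_lsp, actual_state, request_lsp_regen, compute_spf):
    """Recompute the algo SPF tree only if the local LSP is up to date.

    Returns True when the SPF tree was computed, False when the computation
    was skipped and a new local LSP was requested instead.
    """
    if state_in_local_lsp != actual_state:
        # The local LSP is stale: rebuild it first. The SPF run is expected
        # to be triggered again once the new LSP has been built.
        request_lsp_regen()
        return False
    compute_spf()
    return True
```

In the failing 1. 3. 4. 2. 4. sequence, the second step 4 would see differing states, request the LSP rebuild and return early rather than compute with a stale view.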
Fixes: 3f55b8c621 ("isisd: fix disabled flex-algo on race condition")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Donald Sharp [Wed, 31 May 2023 15:40:07 +0000 (11:40 -0400)]
zebra: Unlock the route node when sending route notifications
When using a context to send route notifications to upper
level protocols, the code was using a locking function to
get the route node. There is no need for this to be locked;
as such, FRR should free it up.
David Ward [Wed, 31 May 2023 20:44:44 +0000 (16:44 -0400)]
ospf6d: Prevent redundant LSA generation before interface goes down
Commit 76249532faad ("ospf6d: Handle Premature Aging of LSAs") added a
duplicate call to OSPF6_INTRA_PREFIX_LSA_EXECUTE_TRANSIT(), when the
interface state changes to "Down".
Fixes: #1738
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Yuan Yuan [Tue, 30 May 2023 19:20:09 +0000 (19:20 +0000)]
lib: fix vtysh core when handling questionmark
When a vtysh command is issued with ?, the initial buf size for an
element is 16. The code then loops through each element in the cmd
output vector. If the size required to print the next
element is larger than the current buf size, it reallocs the buf memory
by doubling the current buf size, regardless of the actual size
that is needed. This causes vtysh to core when the doubled size
is still not enough for the next element.
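The sizing problem can be shown with two small functions. The message above does not say how the fix computes the new size, so grow_to_fit is only one possible remedy, shown as an assumption. Both function names are hypothetical.

```python
def double_once(cur_size, needed):
    """Behavior described above: double once, whatever the required size."""
    if needed > cur_size:
        return cur_size * 2
    return cur_size


def grow_to_fit(cur_size, needed):
    """One possible remedy: keep doubling until the element fits."""
    while cur_size < needed:
        cur_size *= 2
    return cur_size
```

Starting from 16 with an element that needs 40 bytes, double_once yields 32, which is still too small, while grow_to_fit yields 64.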
Donatas Abraitis [Wed, 31 May 2023 20:08:57 +0000 (23:08 +0300)]
doc: Update reference table for current and upcoming release dates
Keep only 3 release dates: the current one and two upcoming. On the next release,
just update one instead of multiple (there is no point looking too far into the
future).
Donatas Abraitis [Fri, 26 May 2023 11:52:45 +0000 (14:52 +0300)]
bgpd: Add an ability to control default-originate route-map timer
By default it's 5 seconds. That means that every 5 seconds bgpd iterates over the
whole BGP table and checks whether the route-map kicks in (if a route-map is defined).
With a full feed and many neighbors, this is a huge CPU-killer and takes
a lot of time.
Yuan Yuan [Tue, 30 May 2023 18:53:32 +0000 (18:53 +0000)]
bgpd: fix bgpd core when unintern attr
When the remote peer is neither EBGP nor confed, aspath is a
shadow copy of attr->aspath in bgp_packet_attribute(). Stripping
AS4_PATH should not be done on that aspath directly, since
doing so leads to a bgpd core dump when the attr is uninterned.