Christian Hopps [Fri, 9 Jun 2023 20:54:54 +0000 (16:54 -0400)]
lib: mgmtd: improvements in logging and commentary
- log names of datastores not numbers
- improve logging for mgmt_msg_read
- Rather than use a bool, instead store the pending const string name of
the command being run that has postponed the CLI. This adds some nice
information to the logging when enabled.
Donald Sharp [Wed, 7 Jun 2023 14:09:27 +0000 (10:09 -0400)]
bgpd: Add some color to why nexthop_set failed
We are seeing some frequent test failures with
setting the nexthop correctly. At this point
in time, I have no idea what is going wrong,
but I don't have a bunch of information either,
so let's add the local and remote values.
Christian Hopps [Mon, 12 Jun 2023 02:13:48 +0000 (22:13 -0400)]
lib: mgmtd: session create and destroy both short-circuit
For creation this is the first thing done so short-circuit just means inline
sync response. However, for destroy there could be commands in-flight, these
will be discarded when they match no session, and the state cleaned up
immediately when the message short-circuits.
Christian Hopps [Sun, 11 Jun 2023 21:53:10 +0000 (17:53 -0400)]
vtysh: stop reading config file if user `exit`s from root level.
This is required to make sure that we properly send the
XFRR_end_configuration tag to the daemons. Previously if the user had an
`exit` at the root level the parser would just drop out of the config
node and so XFRR_end_configuration, even if sent, would be ignored
Donald Sharp [Thu, 8 Jun 2023 16:03:49 +0000 (12:03 -0400)]
zebra: Prevent crash because nl is NULL on shutdown
When shutting down the main pthread was first closing
the sockets associated with the dplane pthread and
then telling it to shutdown the pthread at a later point
in time. This caused the dplane to crash because the nl
data has been freed already. Change the shutdown order
to stop the dplane pthread *and* then close the sockets.
Christian Hopps [Thu, 8 Jun 2023 08:12:26 +0000 (04:12 -0400)]
tests: convert old pim test to more cleanly use pytest fixture
This is a good way to run a per-test background helper process. Here the
helper object is created before the test function requesting it (through param
name match), and then cleaned up after the test function exits (pass or failed).
A context manager is used to further guarantee the cleanup is done.
Christian Hopps [Thu, 8 Jun 2023 06:42:32 +0000 (02:42 -0400)]
tests: fixing pim6 topotest bugs
- Remove use of bespoke socat
- Use ipv6 support in mcast-tester.py
- do not run processes in the background behind munet/micronet's
back with `&` (ever) -- use popen or the helper class
pimd, pim6d: Move mld/igmp deletion code to a common api
Move the mld/igmp deletion common code to api pim_gm_interface_delete
code for IPv6 deletion(gm_ifp_teardown) for MLD was missing in this flow
Making the code common fixes this too.
Move pim_if_membership_clear api from pimd/pim_nb_config.c
to pimd/pim_iface.c
Also fixed curly braces warning
WARNING: braces {} are not necessary for single statement blocks
1773: FILE: /tmp/f1-127504/pim_iface.c:1773:
Christian Hopps [Tue, 6 Jun 2023 19:12:58 +0000 (15:12 -0400)]
mgmtd: assert an assertion for coverity
I believe coverity can't tell the length of the return value from strftime based
on the format string (like we can), so it allows `n` to be larger than it could
be which then allows `sz - n` to be negative which is size_t positive and very
large so it thinks an overrun is possible.
Chirag Shah [Tue, 6 Jun 2023 04:48:12 +0000 (21:48 -0700)]
tools: fix list value remove in frr-reload
There might be a time element(s) from
temporary list are removed more than once
which leads to valueError in certain python3
version.
commit-id 1543f58b5 did not handle valueError
properly. This caused regression where
prefix-list config leads to delete followed
by add.
The new fix should just pass the exception as
value removal from list_to_add or list_to_del
is best effort.
This allows prefix-list config has no change
then removes the lines from lines_to_del and
lines_to_add properly.
Configure prefix-list in frr.conf and perform
multiple frr-reload. After first reload operatoin
subsequent ones should not result in delete followed
by add of the prefix-list but rather no-op operation.
Christian Hopps [Sat, 27 May 2023 16:11:48 +0000 (12:11 -0400)]
tests: fix some broken logging
- make sure we close and remove all handlers for named logs on each reuse.
- test module level exec.log no longer truncated to last test case output
- cleanup the log names, and make sure they are present in all exec logs
- keep separate exec logs for each pytest worker when running in distributed mode
- disabled code due to CI infra can't handle it: add per test case exec logs
Donald Sharp [Fri, 2 Jun 2023 19:04:38 +0000 (15:04 -0400)]
bgpd: entry->any is never true
The only places entry->any could ever be set to true was
when str was NULL. Unfortunately with the way our CLI works
str is impossible to be NonNULL. The entry->any value *used*
to work prior to commit e961923c7217b935027107cad30c35c3907c936f
but it was changed back in 2016 and no-one has noticed the changed
ability.
Let's just admit that there are no users of this and remove this
dead code.
Problem:
In LHR, ipv6 pim state remains after MLD prune received.
Root Cause:
When LHR receives join, it creates (*,G) channel oil with
oil_ref_count = 2. The channel_oil is used by gm_sg sg->oil
and upstream->channel_oil.
When LHR receives prune, currently upstream->channel_oil is
deleted and gm_sg sg->oil still present. Due to this channel_oil
is still present with oil_ref_count = 1
Fix:
When LHR receives prune, upstream->channel_oil and pim_sg sg->oil
needs to be deleted.
Donald Sharp [Fri, 2 Jun 2023 15:02:54 +0000 (11:02 -0400)]
bgpd: Give more data when state machine fails to change state
When a state machine transition fails, bgpd would output
data about what happened, but not necessarily give the
reason why. Add that data to the output.
pimd: Change in PIM northbound error, when a path to RP is not found during config apply
Currently, in PIM Northbound, when a path to RP is not found during config apply, we are treating this as a NB_ERR_INCONSISTENCY.
However, there are two issues with this approach:
When OSPF or IGP convergence is completed, it is possible that the RPF check will succeed.
If we have multiple groups and RPs (e.g. 50 RPs), we will receive 50 logs with inconsistency errors.
example:
2023/05/27 22:57:45 PIM: [G822R-SBMNH] config-from-file# ip pim rp 192.168.100.1 239.100.0.0/28 2023/05/27 22:57:45
PIM: [VAKV3-NMY7B][EC 100663337] error processing configuration change: error [internal inconsistency] event [apply]
operation [create] xpath [/frr-routing:routing/control-plane-protocols/control-plane-protocol[type='frr-pim:pimd']
[name='pim'][vrf='default']/frr-pim:pim/address-family[address-family='frr-routing:ipv4']/frr-pim-rp:rp/static-rp/rp-list
[rp-address='192.168.100.1']/group-list[.='239.100.0.0/28']] message: No Path to RP address specified: 192.168.100.1
Donald Sharp [Thu, 1 Jun 2023 13:57:48 +0000 (09:57 -0400)]
tests: new mgmt_startup tests are failing due to insufficient time
The tests are failing due to heavily loaded system and insufficient
time for large configs to be handled. Increasing the time
allows the tests to complete locally for me under heavy load.