Louis Scalbert [Fri, 1 Jul 2022 10:11:03 +0000 (12:11 +0200)]
topotests: bgp_conditional_advertisement cleanup
The bgp_conditional_advertisement topotest runs all the test cases in
a single function. This makes it hard to debug, because the pytest
"--pause" argument does not pause between test cases.
Split the test cases into separate functions to benefit from the
"--pause" feature.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Donald Sharp [Thu, 30 Jun 2022 13:03:12 +0000 (09:03 -0400)]
zebra: Add more cases to proto2zebra for understanding kernel routes
Add some missing cases. During testing, proto2zebra was logging
messages about kernel route protocols it did not handle; cover those
cases so that zebra stops complaining.
Donald Sharp [Thu, 30 Jun 2022 14:04:26 +0000 (10:04 -0400)]
zebra: Notify the end operator when a failure happens
When reading a multipath route, if we detect an encoding
error from the kernel (yeah, I don't think so either),
let's tell the operator what happened to that route.
Donald Sharp [Thu, 30 Jun 2022 12:03:02 +0000 (08:03 -0400)]
zebra: Realign SOL_NETLINK to warn when FRR does not have it
An end operator may have chosen to compile FRR on an
extremely old kernel that does not support the SOL_NETLINK
sockopt call. If so, let's note it for them instead of
having things silently not work.
Donald Sharp [Thu, 30 Jun 2022 11:50:04 +0000 (07:50 -0400)]
zebra: Correct implication of SOL_NETLINK NETLINK_ADD_MEMBERSHIP usage
SOL_NETLINK NETLINK_ADD_MEMBERSHIP adds one multicast group of
interest per call. The netlink_socket function implied that
the call could take a bitfield of values. This is not correct
at all, and it will trip someone up in the future when
a new value is needed. Let's get it right now, before
it becomes a problem.
Let's also add a bit of extra code to give the operator a better
understanding of what went wrong when a kernel does not
support the option.
Finally, as a point of future reference: should FRR just switch
over to a loop that adds the required groups, instead of this
hybrid approach where some groups are joined one way and
others another way?
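The one-group-per-call rule can be sketched in Python as follows. The constants are assumed to match the Linux headers (SOL_NETLINK is 270 and NETLINK_ADD_MEMBERSHIP is 1). `add_memberships` is a hypothetical helper and is not FRR's netlink_socket(). It accepts any socket-like object, so you can exercise it without a real netlink socket.

```python
SOL_NETLINK = 270            # assumed value, from <linux/socket.h>
NETLINK_ADD_MEMBERSHIP = 1   # assumed value, from <linux/netlink.h>


def add_memberships(sock, groups):
    """Join each netlink multicast group with its own setsockopt() call.

    NETLINK_ADD_MEMBERSHIP takes a single group number, not a bitfield,
    so OR-ing several groups together would join the wrong group.
    Returns the list of groups the kernel refused.
    """
    failed = []
    for group in groups:
        try:
            sock.setsockopt(SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, group)
        except OSError as err:
            # Tell the operator which group failed instead of failing silently.
            print(f"netlink: failed to join group {group}: {err}")
            failed.append(group)
    return failed
```

On Linux, a real socket from `socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)` can be passed in directly. Looping per group also works for group numbers above 32, which the legacy bind-time bitmask cannot express.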
Implement the TBD watermark-warn CLI for IPv6 MLD
This command can be used to warn the user
when more groups than the desired limit are configured.
Signed-off-by: Sai Gomathi N <nsaigomathi@vmware.com>
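A minimal Python sketch of the watermark check is shown below. The function name and message text are hypothetical. Treating a limit of 0 as "not configured" is an assumption and is not taken from the commit.

```python
def watermark_warning(group_count, watermark_limit):
    """Return a warning when more groups than the limit exist, else None.

    Assumption: a limit of 0 means watermark-warn is not configured.
    """
    if watermark_limit and group_count > watermark_limit:
        return (f"MLD group count {group_count} exceeds "
                f"watermark-warn limit {watermark_limit}")
    return None
```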
Donald Sharp [Tue, 28 Jun 2022 14:26:52 +0000 (10:26 -0400)]
lib: Allow downgrade of all caps when none are specified
When run, staticd tells privs.c that it does not need any
privileges. The lib/privs.c code was not downgrading
any of the permissions it may have been given at startup.
Since we don't need any, let's actually tell the system that
FRR no longer needs the capabilities in the case
where a daemon does not ask for any caps.
Sarita Patra [Wed, 29 Jun 2022 13:34:19 +0000 (06:34 -0700)]
pimd: Register stop message sent with mask 32
As per RFC 4601 section 4.9.4, for Register-Stops
the Mask Len field contains the full address length (in bytes) * 8
(e.g. 32 for IPv4 native encoding, 128 for IPv6)
if the message is sent for a single group.
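The Mask Len arithmetic works out as in this small Python sketch. The helper name is hypothetical. The address lengths are the native IPv4 (4 bytes) and IPv6 (16 bytes) sizes.

```python
import socket

# Address lengths in bytes for native encoding.
_ADDR_LEN = {socket.AF_INET: 4, socket.AF_INET6: 16}


def register_stop_group_mask_len(family):
    """Mask Len for a single-group Register-Stop: address length * 8 bits."""
    return _ADDR_LEN[family] * 8
```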
Trey Aspelund [Tue, 28 Jun 2022 14:08:55 +0000 (14:08 +0000)]
bgpd: include 0 in configured hold/keepalive
The default keepalive/hold timers are always exposed via this commit:
```
commit 9b1b96233d7204263d409ea6c504b316af9e533f (origin/bgp_timer_always_on)
Author: Trey Aspelund <taspelund@nvidia.com>
Date: Mon Jun 27 23:20:33 2022 +0000
bgpd: always display keepalive/hold intervals
`show bgp neighbors <peer> [json]` was only displaying the configured
keepalive and holdtime intervals when they differed from the default
values. Since default config is still config, let's make sure these
values are always displayed.
```
However, it mistakenly changed the logic to only display the peer's
timers if the configured value was non-zero. This updates the logic to
check PEER_FLAG_TIMER to determine whether the values were configured,
given that 0 is a valid value (it disables keepalives).
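The difference between the buggy check and the fixed check is shown in this Python sketch. Only the flag name PEER_FLAG_TIMER comes from bgpd. The bit value, the `Peer` class and the 60/180 second defaults are assumptions made for illustration.

```python
from dataclasses import dataclass

PEER_FLAG_TIMER = 0x1  # hypothetical bit value; only the flag name comes from bgpd


@dataclass
class Peer:
    flags: int = 0
    keepalive: int = 0
    holdtime: int = 0


def displayed_timers(peer, default_keepalive=60, default_holdtime=180):
    """Return (keepalive, holdtime) as shown by `show bgp neighbors`.

    Buggy logic: `if peer.keepalive: ...` hides a configured 0.
    Fixed logic: ask whether the timers were configured at all.
    """
    if peer.flags & PEER_FLAG_TIMER:
        return peer.keepalive, peer.holdtime
    return default_keepalive, default_holdtime
```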
pimd: Querier to non-querier transition to be ignored
Fix IGMPv2 ANVL conformance issue 3.10
As per RFC 2236 section 3, when a querier receives a Leave message,
it sends queries for Last Member Query Interval * Last Member Query Count.
During this time there should not be any querier to non-querier
transition, and the same router needs to send the remaining queries.
Currently the code handles this scenario only when the leave is received
for a group and the query is received for the same group.
But we need to handle it irrespective of group, since querier
election is per interface and not per group.
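In this toy Python model, the "ignore the transition" window is stored per interface rather than per group. The class is hypothetical and is not pimd's data structure. The defaults of a 1 second LMQI and an LMQC of 2 are the RFC 2236 defaults. The lower-IP-wins election rule also comes from RFC 2236.

```python
import ipaddress


class IgmpInterface:
    """Toy per-interface IGMPv2 querier state (hypothetical, not pimd's structs)."""

    def __init__(self, my_ip, lmqi=1.0, lmqc=2):
        self.my_ip = ipaddress.IPv4Address(my_ip)
        self.querier = True
        self.lmqi = lmqi   # Last Member Query Interval, seconds
        self.lmqc = lmqc   # Last Member Query Count
        # One deadline per interface, not per group: election is per interface.
        self.lmq_deadline = 0.0

    def on_leave(self, group, now):
        # `group` is deliberately not used: the window covers the whole interface.
        if self.querier:
            end = now + self.lmqi * self.lmqc
            self.lmq_deadline = max(self.lmq_deadline, end)

    def on_query(self, src_ip, now):
        if now < self.lmq_deadline:
            return  # ignore: finish sending the remaining queries first
        if ipaddress.IPv4Address(src_ip) < self.my_ip:
            self.querier = False  # lower IP address wins the election
```

A query from a lower address that arrives within the window is ignored, whichever group triggered the leave. The same query arriving after the window ends causes the transition.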
Trey Aspelund [Mon, 27 Jun 2022 23:20:33 +0000 (23:20 +0000)]
bgpd: always display keepalive/hold intervals
`show bgp neighbors <peer> [json]` was only displaying the configured
keepalive and holdtime intervals when they differed from the default
values. Since default config is still config, let's make sure these
values are always displayed.
Donald Sharp [Mon, 27 Jun 2022 19:30:55 +0000 (15:30 -0400)]
zebra: Add ability for netconf dplane to handle global values
Add the ability for the netconf dplane code to handle
the global NETCONFA_IFINDEX_DEFAULT and NETCONFA_IFINDEX_ALL
values. Then store the values we are interested in when we
get them from the kernel, and be able to display
them to the end operator.
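The following Python sketch shows how netconf updates could be filed under the global scopes or a per-interface scope. The two constants mirror <linux/netconf.h>, where ALL is -1 and DEFAULT is -2. The helper and the attribute names are hypothetical.

```python
NETCONFA_IFINDEX_ALL = -1      # as in <linux/netconf.h>
NETCONFA_IFINDEX_DEFAULT = -2  # as in <linux/netconf.h>


def store_netconf(state, ifindex, attrs):
    """File netconf attributes under the global or per-interface scope."""
    if ifindex == NETCONFA_IFINDEX_ALL:
        state.setdefault("all", {}).update(attrs)
    elif ifindex == NETCONFA_IFINDEX_DEFAULT:
        state.setdefault("default", {}).update(attrs)
    else:
        state.setdefault("interfaces", {}).setdefault(ifindex, {}).update(attrs)
    return state
```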
lynnemorrison [Mon, 6 Jun 2022 23:40:17 +0000 (19:40 -0400)]
bfdd: add IPv4 BFD Echo support that matches RFC
Modify the existing BFD Echo code to send an Echo message that will
be looped in the peer's forwarding plane. The existing Echo code
only works with other FRR implementations, because the Echo packet
must go up to BFD to be turned around and forwarded back to the
local router. The new BFD Echo code sets the src/dst IP of the
packet to the local router's IP and sets the dest MAC to the
peer's MAC address. The peer receives the packet and, because the
destination is not its own IP address, forwards it back to the local router.
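The addressing trick can be modelled with this Python sketch. The frame class and helpers are hypothetical. UDP port 3785 is the BFD Echo port from RFC 5881.

```python
from dataclasses import dataclass

BFD_ECHO_PORT = 3785  # UDP port for BFD Echo (RFC 5881)


@dataclass
class EchoFrame:
    dst_mac: str
    src_ip: str
    dst_ip: str
    dst_port: int = BFD_ECHO_PORT


def build_echo(local_ip, peer_mac):
    # Both IP addresses are the local router's; only the MAC points at the peer.
    return EchoFrame(dst_mac=peer_mac, src_ip=local_ip, dst_ip=local_ip)


def peer_handles(frame, peer_ip):
    # The peer does not own dst_ip, so its forwarding plane routes it back.
    return "deliver" if frame.dst_ip == peer_ip else "forward"
```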
configure, zebra: include DPDK headers and shared libs in the dp-dpdk build
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
-> Moved the newly needed capabilities under HAVE_DPDK
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
zebra: expand pbr rule action for dataplane programming
PBR rules are installed as match/action rules in most dataplanes. This
requires the action to be resolved via a GW, and the GW to be subsequently
resolved to {SMAC, DMAC}.
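This Python sketch shows the GW to {SMAC, DMAC} resolution step. It assumes a hypothetical neighbor table keyed by (ifindex, IP) and a table of interface MACs. It is not zebra's implementation.

```python
def resolve_pbr_action(gw_ip, out_ifindex, if_macs, neighbors):
    """Resolve a PBR action's gateway to (SMAC, DMAC), or None if unresolved."""
    smac = if_macs.get(out_ifindex)             # our MAC on the egress interface
    dmac = neighbors.get((out_ifindex, gw_ip))  # gateway MAC from the neighbor table
    if smac is None or dmac is None:
        return None  # cannot program the dataplane yet
    return smac, dmac
```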
zebra: add support for maintaining local neigh entries
Currently, specific local neighbors (attached to SVIs) are maintained
in an EVPN-specific database. There is a need to maintain L3 neighbors
for other purposes, including MAC resolution for PBR nexthops.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Clean up compile and fix crash
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
G. Paul Ziemba [Fri, 20 May 2022 16:26:56 +0000 (09:26 -0700)]
topotests/isis_sr_te_topo1: test out-of-order route/route-map changes
An SR policy matches a BGP nexthop based on the IP address of
the nexthop and the color of the route (color may be assigned
to routes using a route-map).
The order of events (BGP route arrival, route-map definition,
policy and candidate-path definition) should not affect the
matching/mapping.
These changes add tests for:
- removing/adding the BGP route after the policy and route-map are
defined and held constant
- changing the route-map color to differ from the policy color,
and then changing it back to match
After each change, the policy should be observed to be in effect,
unchanged from before; i.e., the route's nexthops should reflect
the matching SR policy.
Sarita Patra [Fri, 24 Jun 2022 14:48:03 +0000 (07:48 -0700)]
pimd: fix pim interface deletion flow
Deletion of a pim interface (pim_if_delete) should
do the following before cleanup:
1. Send a hello message with zero hold time.
2. Delete all the neighbors.
3. Close the pim socket.
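The deletion order can be sketched in Python with hypothetical hooks. The hooks stand in for pimd's real functions. A Hello with a hold time of 0 tells neighbors to time out the sender immediately (RFC 4601).

```python
def pim_if_delete(send_hello, neighbors, close_socket, cleanup):
    """Run the deletion steps in order (all arguments are hypothetical hooks)."""
    send_hello(holdtime=0)  # 1. tell neighbors to time us out immediately
    neighbors.clear()       # 2. delete all neighbors
    close_socket()          # 3. close the PIM socket
    cleanup()               # only now free the interface state
```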
Sarita Patra [Fri, 24 Jun 2022 10:04:37 +0000 (03:04 -0700)]
pimd: fix invalid memory access join_timer_stop
Issue:
==16837== Invalid read of size 8
==16837== at 0x17971C: pim_neighbor_find (pim_neighbor.c:431)
==16837== by 0x186439: join_timer_stop (pim_upstream.c:348)
==16837== by 0x186794: pim_upstream_del (pim_upstream.c:231)
==16837== by 0x189A66: pim_upstream_terminate (pim_upstream.c:1951)
==16837== by 0x17111B: pim_instance_terminate (pim_instance.c:54)
==16837== by 0x17111B: pim_vrf_delete (pim_instance.c:172)
==16837== by 0x4F1D6C8: vrf_delete (vrf.c:264)
==16837== by 0x19006F: pim_terminate (pimd.c:160)
==16837== by 0x1B2E4D: pim_sigterm (pim_signals.c:51)
==16837== by 0x4F08FA2: frr_sigevent_process (sigevent.c:130)
==16837== by 0x4F1A2CC: thread_fetch (thread.c:1771)
==16837== by 0x4ED4F92: frr_run (libfrr.c:1197)
==16837== by 0x15D81A: main (pim_main.c:176)
Root Cause:
In the pim_terminate flow, the interface is deleted
before the pim_interface cleanup. Because of this,
the pim_interface contains garbage values.
Fix:
Release the pim interface memory and then delete the
interface.
Donald Sharp [Mon, 25 Apr 2022 20:30:36 +0000 (16:30 -0400)]
bgpd: Add `bgp allow-martian-nexthop` command
The command `debug bgp allow-martian` is not actually
a debug command. It is a command that, when entered, allows
bgp to not reset a peering when a martian nexthop is
passed in the NLRI.
Add the `bgp allow-martian-nexthop` command and allow it to be
used.
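The effect of the knob is shown in this Python sketch. The martian ranges are an illustrative assumption: only 0.0.0.0/8, loopback and multicast are covered, and bgpd's actual checks may differ. Both helpers are hypothetical.

```python
import ipaddress

# Illustrative martian ranges only; bgpd's actual checks may differ.
_MARTIAN_V4 = [ipaddress.ip_network(n) for n in
               ("0.0.0.0/8", "127.0.0.0/8", "224.0.0.0/4")]


def is_martian_nexthop(nexthop):
    ip = ipaddress.IPv4Address(nexthop)
    return any(ip in net for net in _MARTIAN_V4)


def should_reset_peer(nexthop, allow_martian_nexthop):
    """With `bgp allow-martian-nexthop` set, keep the session up."""
    return is_martian_nexthop(nexthop) and not allow_martian_nexthop
```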
Donald Sharp [Fri, 17 Jun 2022 15:23:31 +0000 (11:23 -0400)]
zebra: Fix rtadv startup when config read in is before interface up
When an interface is configured with this:
int eva
 ipv6 nd ra-interval 5
 no ipv6 nd suppress-ra
!
And the interface is then subsequently created and brought up, FRR
would both fail to join the RA multicast address and never
work properly in this state.
Delay the startup of the join and start of the Router Advertisements
until after the ifindex has actually been found.
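This toy Python model shows the deferred start. IFINDEX_INTERNAL = 0 is assumed to be the value meaning "no kernel ifindex yet". The class and its methods are hypothetical and are not zebra's rtadv code.

```python
IFINDEX_INTERNAL = 0  # assumed "no kernel ifindex yet" value


class RtadvInterface:
    """Toy model of deferring RA start until the interface has an ifindex."""

    def __init__(self):
        self.ifindex = IFINDEX_INTERNAL
        self.ra_wanted = False
        self.ra_running = False

    def configure_ra(self):
        # e.g. `no ipv6 nd suppress-ra` read from config before the interface exists
        self.ra_wanted = True
        self._maybe_start()

    def kernel_ifindex(self, ifindex):
        # Interface created/brought up: the kernel ifindex is now known.
        self.ifindex = ifindex
        self._maybe_start()

    def _maybe_start(self):
        # Joining the RA multicast group needs a real ifindex, so wait for it.
        if self.ra_wanted and not self.ra_running and self.ifindex != IFINDEX_INTERNAL:
            self.ra_running = True
```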
Eugene Bogomazov [Fri, 24 Jun 2022 09:28:13 +0000 (12:28 +0300)]
bgpd: update topotests for role mismatch
In topotests, we also want to check for role mismatch cases. However, when
testing the sender of a role mismatch notification, the behavior is
sometimes non-deterministic (probably due to a configuration
change). Thus, we assume that the recipient of notifications will more
consistently display the reason why the session was terminated in the
first place.
rgirada [Thu, 23 Jun 2022 14:37:28 +0000 (07:37 -0700)]
vtysh: Account validity should be verified when authenticating users with PAM.
Description:
SonarQube detects the following behaviour as a vulnerability.
When authenticating users using PAM, it is strongly recommended to
check the validity of the account (not locked, not expired, ...);
otherwise it can lead to unauthorized access to resources.
pam_acct_mgmt() should be called to check account validity after
calling pam_authenticate().
Donald Sharp [Sat, 18 Jun 2022 18:37:14 +0000 (14:37 -0400)]
isisd: Fix crash with xfrm interface type
When an xfrm interface is created and configured with isis, FRR
crashes. This is because of the weird pattern of not allocating
lists until needed, which allows a crash when we hit
a usage pattern that was not expected. Just always allocate
the different lists that a circuit needs.
Fixes #11432
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Donald Sharp [Fri, 17 Jun 2022 19:40:36 +0000 (15:40 -0400)]
tests: Increase time for zebra_seg6local to look for sharp routes
I have a test failure:
r1.vtysh_cmd(
"sharp install seg6local-routes {} nexthop-seg6local dum0 {} 1".format(
dest, context
)
)
test_func = partial(
check,
r1,
dest,
manifest["out"],
)
success, result = topotest.run_and_expect(test_func, None, count=5, wait=1)
> assert result is None, "Failed"
E AssertionError: Failed
E assert Generated JSON diff error report:
E
E > $: d2 has the following element at index 0 which is not present in d1:
E
E {
E "prefix": "1::1/128",
E "protocol": "sharp",
E "selected": true,...
E
The test output for 1::1/128:
{
"1::1/128":[
{
"prefix":"1::1/128",
"prefixLen":128,
"protocol":"sharp",
"vrfId":0,
"vrfName":"default",
"selected":true,
"destSelected":true,
"distance":150,
"metric":0,
"queued":true,
"table":254,
"internalStatus":8,
Notice that it is still queued after 5 seconds. Under extremely heavy system load
this is not long enough for convergence. The zebra.log also shows thread starvation
as well as long-running tasks:
2022/06/17 15:30:02 ZEBRA: [PHJDC-499N2][EC 100663314] STARVATION: task dplane_incoming_request (55b3ce0fea8b) ran for 6369ms (cpu time 0ms)
2022/06/17 15:30:02 ZEBRA: [T83RR-8SM5G] zebra 8.4-dev starting: vty@2601
2022/06/17 15:30:02 ZEBRA: [YZRX4-ZXG0C][EC 100663315] Thread Starvation: {(thread *)0x55b3ce6c15b0 arg=0x0 timer r=-6.375 rib_sweep_route() &zrouter.sweeper from zebra/main.c:447} was scheduled to pop greater than 4s ago
Increase the time to 25 seconds to give it a chance.
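The timing budget can be illustrated with a hypothetical `poll_until` helper. It assumes that run_and_expect calls the check up to `count` times and sleeps `wait` seconds between attempts, so the worst case is roughly count * wait seconds. With wait=1, one way to reach about 25 seconds is count=25. The `sleep` parameter is injectable so that the sketch can be tested without real delays.

```python
import time


def poll_until(func, expected, count, wait, sleep=time.sleep):
    """Call func up to `count` times, `wait` seconds apart, until it returns expected.

    Rough worst-case budget: count * wait seconds (5 * 1 before, 25 * 1 after).
    """
    result = None
    for attempt in range(count):
        result = func()
        if result == expected:
            return True, result
        if attempt < count - 1:
            sleep(wait)
    return False, result
```

A route that needs 7 attempts to leave the queued state fails with count=5 but passes with count=25.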