Donald Sharp [Mon, 18 Apr 2022 18:06:26 +0000 (14:06 -0400)]
bgpd: Allow type 5 routes to be handled better when link is flapping
In some stress testing, we are seeing type-5 evpn routes being
left in a rejected state in zebra.
Sequence of events as I am seeing it:
a) Interface comes up that type5 routes nexthop depends on
b) zebra processes creates the connected and lets bgp know via nht
c) bgp installs the route to zebra
d) zebra processes and sends install to kernel
e) before route is installed, the interface the nexthop points at flaps
f) the route install is rejected, notify zebra
g) the interface comes up
h) zebra gets the notification about the route install rejection
i) zebra processes the down/up and turns it into a single up event
j) BGP never reinstalls the type 5 route
This up event does not translate into a nexthop tracking event
when the events happen quickly enough and/or zebra is extremelyh
busy and bgp would never see that the nexthops changed even very quickly.
This is the same thing that was going on with
https://github.com/FRRouting/frr/pull/7724
in PBR.
To fix this let's notice the interface up/down events for v4
in bgp now as well.
Optional recognized and unrecognized BGP attributes,
whether transitive or non-transitive, SHOULD NOT be updated by the
route server (unless enforced by local IXP operator configuration)
and SHOULD be passed on to other route server clients.
By default LB ext-community works with iBGP peers. When we receive a route
from eBGP peer, we can send LB ext-community to iBGP peers.
With this patch, allow sending LB ext-community to iBGP/eBGP peers if they
are set as RS clients.
FRR does not send non-transitive ext-communities to eBGP peers, but for
example GoBGP sends and if it's set as RS client, we should pass those attributes
towards another RS client.
frr(config-if)# ip igmp join 232.1.1.1 10.10.10.10
frr(config-if)# do sh ip igmp sources
Interface Address Group Source Timer Fwd Uptime
ens192 232.1.1.1 10.10.10.10 04:10 N 00:00:10
frr(config-if)#
The above output is misaligned and is having Address field which is not
required here.
Donald Sharp [Sun, 10 Apr 2022 11:47:01 +0000 (07:47 -0400)]
tests: Do not turn off multicast stream
The test is testing whether interface flaps are causing
the appropriate pim reactions. Unfortunately the test
is turning off the multicast stream and the test also
has a keep alive timer of 15 seconds set on all routers.
Which of course means the test has 15 seconds(at most) to finish
testing. This is not always possible given system loads.
Donald Sharp [Tue, 5 Apr 2022 13:26:18 +0000 (09:26 -0400)]
tests: Fix test_multicast_pim_sm_topo3.py from generating a support bundle
The test_multicast_pim_sm_topo3.py test is both spending extra time
looking for state that will never occurr but also generating a support
bundle when it doesn't find it. Fix the test to come to the correct
solution faster.
Donald Sharp [Sat, 9 Apr 2022 17:12:28 +0000 (13:12 -0400)]
zebra: Allow system routes to recurse through themselves
Currently if a end user has something like this:
Routing entry for 192.168.212.1/32
Known via "kernel", distance 0, metric 100, best
Last update 00:07:50 ago
* directly connected, ens5
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
K>* 0.0.0.0/0 [0/100] via 192.168.212.1, ens5, src 192.168.212.19, 00:00:15
C>* 192.168.212.0/27 is directly connected, ens5, 00:07:50
K>* 192.168.212.1/32 [0/100] is directly connected, ens5, 00:07:50
And FRR does a link flap, it refigures the route and rejects the default
route:
2022/04/09 16:38:20 ZEBRA: [NZNZ4-7P54Y] default(0:254):0.0.0.0/0: Processing rn 0x56224dbb5b00
2022/04/09 16:38:20 ZEBRA: [ZJVZ4-XEGPF] default(0:254):0.0.0.0/0: Examine re 0x56224dbddc20 (kernel) status: Changed Installed flags: Selected dist 0 metric 100
2022/04/09 16:38:20 ZEBRA: [GG8QH-195KE] nexthop_active_update: re 0x56224dbddc20 nhe 0x56224dbdd950 (7), curr_nhe 0x56224dedb550
2022/04/09 16:38:20 ZEBRA: [T9JWA-N8HM5] nexthop_active_check: re 0x56224dbddc20, nexthop 192.168.212.1, via ens5
2022/04/09 16:38:20 ZEBRA: [M7EN1-55BTH] nexthop_active: Route Type kernel has not turned on recursion
2022/04/09 16:38:20 ZEBRA: [HJ48M-MB610] nexthop_active_check: Unable to find active nexthop
2022/04/09 16:38:20 ZEBRA: [JPJF4-TGCY5] default(0:254):0.0.0.0/0: After processing: old_selected 0x56224dbddc20 new_selected 0x0 old_fib 0x56224dbddc20 new_fib 0x0
So the 192.168.212.1 route is matched for the nexthop but it is not connected and
zebra treats it as a problem. Modify the code such that if a system route
matches through another system route, then it should work imo.
bgpd: Show conditional advertisement timers in neighbor CLI output
```
spine1-debian-11# sh ip bgp neighbors 192.168.0.1
BGP neighbor is 192.168.0.1, remote AS 65001, local AS 65000, external link
Hostname: exit1-debian-11
BGP version 4, remote router ID 192.168.10.123, local router ID 192.168.100.1
BGP state = Established, up for 00:00:32
Last read 00:00:30, Last write 00:00:30
Hold time is 180, keepalive interval is 60 seconds
Configured conditional advertisements interval is 5 seconds
Time until conditional advertisements begin is 4 seconds
```
bgpd: Allow setting BGP [large]community in route-maps
Before:
```
spine1-debian-11(config-route-map)# bgp community alias 65001:65001 test1
spine1-debian-11(config)# route-map rm permit 10
spine1-debian-11(config-route-map)# set community 65001:65001
% Malformed communities attribute
```
After:
```
spine1-debian-11(config)# bgp community alias 65001:65001 test1
spine1-debian-11(config)# route-map rm permit 10
spine1-debian-11(config-route-map)# set community 65001:65001
spine1-debian-11(config-route-map)#
```
Donald Sharp [Wed, 6 Apr 2022 13:13:51 +0000 (09:13 -0400)]
watchfrr: Send operational state to systemd
When watchfrr has noticed issues, send operational state
to systemd so operators issuing `systemd status frr` can
see a more nuanced state of the daemon.
Add the `--operational-timeout X` value to the cli. After
the daemon has been restarted and communication re-established
wait this time before reporting to systemd that the daemon
is up and running.
Default value of 60 seconds was choosen to allow some small
delay in reporting so that, if the daemon is in a crash loop
status will not ping pong.
bgpd: Do not forget to update conditional advertisements rmaps for peer-groups
When the peer is configured for the first time:
```
neighbor P1 peer-group
neighbor P1 remote-as external
neighbor P1 advertise-map ADV exist-map EXIST
neighbor 10.10.10.1 peer-group P1
```
Conditional advertisements route-maps are not updated and cond. advertisements
do not work until FRR restarted. BGP sessions clear does not help.
Or even changing peer-group for a peer, causes this bug to kick in.
```
no neighbor 10.10.10.1
neighbor 10.10.10.1 peer-group P2
```
With this fix, cond. advertisements start working immediatelly.
Donald Sharp [Tue, 29 Mar 2022 14:55:34 +0000 (10:55 -0400)]
zebra: Allow multiple connected routes to be choosen for kernel routes
This bug should only really affect kernel routes. To reproduce:
a) Have multiple connected routes that point to the same prefix
swp8 up default 169.254.0.250/30
swp9 up default 169.254.0.250/30
b) Have a kernel route that uses one of those connected routes
7.6.2.8 via 169.254.0.249 dev swp8 proto static
(But have it choose a non-selected connected nexthop)
c) Introduce an event that causes the rib table to be reprocessed,
say a unrelated interface going up / down
This causes the route to be lost with this message:
2022/03/28 21:21:53 ZEBRA: [YXCJP-0WZWV] netlink_nexthop_msg_encode: ID (3454): 169.254.0.249, via swp8(1383) vrf default(0)
2022/03/28 21:21:53 ZEBRA: [YF2E6-J60JH] nexthop_active: 169.254.0.249, via swp8 given ifindex does not match nexthops ifindex found found: directly connected, swp9
Effectively the nexthop that zebra is choosing would not be the one
that the kernel route has choosen and FRR removes the route:
022/03/28 21:21:53 ZEBRA: [NM15X-X83N9] rib_process: (0:254):7.6.2.8/32: rn 0x56042e632e90, removing re 0x56042e6316e0
2022/03/28 21:21:53 ZEBRA: [Y53JX-CBC5H] rib_unlink: (0:254):7.6.2.8/32: rn 0x56042e632e90, re 0x56042e6316e0
2022/03/28 21:21:53 ZEBRA: [KT8QQ-45WQ0] rib_gc_dest: (0:?):7.6.2.8/32: removing dest from table
What is happening?
Zebra is not looking at all connected routes and if any of them
would have the appropriate ifindex and just blindly rejecting
the route.
So when nexthop resolution happens and it matches a connected
route and the dest->selected nexthop ifindex does not match, let's sort
through the rest of them and see if any of them match and if so
let's keep the route.
David Lamparter [Fri, 8 Apr 2022 08:30:24 +0000 (10:30 +0200)]
pimd: remove pim_interface->options
I should've removed this in #10960. It's a hazard in terms of
forgetting to adjust PRs/other changes that might accidentally still
reference the field.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
David Lamparter [Thu, 7 Apr 2022 12:44:23 +0000 (14:44 +0200)]
vtysh: remove extraneous newline
vtysh_client_execute() expects just a string without a newline; the
newline is passed through and ends up in logging output where newlines
are not quite wanted.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Two cosmetic change -
1) Remove unnecessary local variable - `es_vtep` used in one condition.
2) Remove unused variable - `es_cnt`. `proc_cnt` has already taken `es`
into account.
Since additional information such as block_bits_length is needed to
generate SIDs properly, the type of elements in srv6_locator_chunks
list is extended from "struct prefix_ipv6 *" to
"struct srv6_locator_chunk *". Even in terms of variable name,
"struct srv6_locator_chunk *" is appropriate.
ipv6 mld last-member-query-count (1-255)
This command can be use to tune the number of Multicast-Address- Specific Queries
sent before the router assumes there are no remaining listeners for an address on a link.
Signed-off-by: Sai Gomathi N <nsaigomathi@vmware.com>
ipv6 mld query-max-response-time <1-65535>
This command can be use to tune the max response time for general queries.
The number of seconds represented by the [Query Response Interval] must be less than the [Query Interval]
Signed-off-by: Sai Gomathi N <nsaigomathi@vmware.com>
The entire `case NEXTHOP_TYPE_IPV6_IFINDEX:` block here was a bit of a
tripwire to stumble over, since there was no indication at all that this
concerns RFC5549 nexthop handling. So it got mis-adapted for PIM IPv6
support.
Clarify this a whole bunch that this is for v4-over-v6 nexthop mangling,
and nothing else.
This should really also use neighbor's secondary address lists for the
lookup, but that is probably going to break compatibility with other
boxes that don't include v6 addrs in v4 hellos and needs further
machinery to do properly, so for now just leave a breadcrumb.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>