Donald Sharp [Thu, 27 Jul 2023 19:36:33 +0000 (15:36 -0400)]
tests: Convert isis to use 1 and 10 for hello/multiplier
Current isis tests use a variety of hello timers as well
as hello-multiplier, let's modify all of the isis test
cases to use 1 and 10. This cleans up some spurious test
failures I was seeing locally. As an example without
these changes running isis_tilfa_topo1 2r6 times I would
see 5-10 test failures now I am seeing ~2 test failures.
In any event part of the problem was that some tests were
not fully converged when looking at them under heavy
system load. Changing this to 1/10 gives us 10 chances
to see the incoming packet.
Donald Sharp [Sat, 29 Jul 2023 17:26:26 +0000 (13:26 -0400)]
tests: bfd_bgp_cbit_topo3 allow bgp to converge before testing
This test was failing upstream a bunch of times. Upon examining
the log files as well as the test script it was noticed that
the bfd peers were checked to see that they had come up. But
both the timers used for bgp as well as not checking that bgp
has actually come up would cause the test to fail in subsuquent
steps if bgp has not come up. Test that bgp peering is actually
established before testing link down events. It's possible
this test might need to be revisited to ensure that the routes
are actually installed and ready to go before as well, but I am
not seeing that right now.
Donald Sharp [Sat, 29 Jul 2023 17:24:55 +0000 (13:24 -0400)]
tests: Fix zebra_seg6_route to give more time for routes to be installed
This test is failing upstream regularly, when inspecting the log
files we see that the route being looked for is in a queued state
when the test fails. Give this test more time for when the
system is under severe load.
Donald Sharp [Sat, 29 Jul 2023 17:22:39 +0000 (13:22 -0400)]
tests: isis_te_topo1 can fail occassionally
Upstream ( and locally ) this test fails. The adj-sid value
being looked for in the testing is a dynamic value that is
assigned based upon how the network comes up. The reality
is that there is no enforced order of what the adj-sid
can be. As such this test looking for this value makes
no sense. Let's remove that from the test.
Additionally bring the isis hello-interval to 1 down
from 3 to make things converge faster.
Donald Sharp [Fri, 28 Jul 2023 14:31:30 +0000 (10:31 -0400)]
tests: zebra_netlink ensure the address is installed
Ran test under high load and system rejected the sharp
install of routes. Only reason that that would happen
would be if the address had not been set by the kernel
yet. The test log files had timestamp precision and the
addition of the sharp routes was under 1/10 of a second
after the address was attempted to be installed.
Starting from step 11, this topotest focuses on validating the TI-LFA
switchover functionality, where the backup nexthops are activated
after an adjacency expires, either with or without BFD.
Currently, the test checks the RIB shortly after the switchover using
a tight 5 seconds interval to ensure that the RIB update is due to the
switchover and not an SPF update (which is configured with an initial
delay of 15 seconds). However, it was observed that the kernel might
take longer than 5 seconds to install routes when the system is under
heavy load. To account for that, double the wait interval so that
this topotest will succeed even in those conditions.
tests: ensure BFD session is up before proceeding to the next step
In this topotest, BFD is configured at the end of step 13. However,
in certain cases where the testing machine is exceptionally fast (e.g.,
Donald's quantum computer), there is a possibility that the interface
shutdown event from step 14 may occur before BFD has had sufficient
time to establish the session, which leads to a test failure. To fix
this problem, ensure the BFD session is up before proceeding to the
next step.
Node-SIDs refer to Prefix-SIDs associated with host prefixes of
loopback addresses. As such, whenever an interface address is added
or deleted, all configured Prefix-SIDs must be reevaluated to check
if the N-flag needs to be set or unset.
This change fixes some race conditions in the TI-LFA topotest where
specific sequence of events could cause Prefix-SIDs to not have the
N-flag set when they should, resulting in various failures.
tests: increase hello multiplier in TI-LFA topotest
In this topotest, the IS-IS hello interval is set to 1 for fast
convergence. However, the current hello multiplier of 3 results in a
tight IS-IS adjacency holdtime of 3 seconds. This tight timeframe can
cause failures when the testing machine is running multiple tests at
full capacity. To improve stability under such conditions, this commit
raises the hello multiplier to 10, providing a more forgiving holdtime
and reducing the likelihood of failures.
When 'no neighbor .. update-source' is issued for a regular peer, that
peer is always reset. This is unnecessary if the peer is a member of a
peer-group and it inherits an identical update-source, so let's skip
the reset/Notification for that condition.
Before:
------------
ub20-2(config-router)# do show ip bgp sum | include .99
192.168.122.99 4 1 36 34 0 0 0 00:00:17 0 0 N/A
ub20-2(config-router)# do show ip bgp neighbors 192.168.122.99 | include Local host
Local host: 100.64.0.3, Local port: 46083
ub20-2(config-router)# no neighbor 192.168.122.99 update-source
ub20-2(config-router)# do show ip bgp sum | include .99
192.168.122.99 4 1 36 35 0 0 0 00:00:01 Idle 0 N/A
ub20-2(config-router)# do show ip bgp neighbors 192.168.122.99 | include Local host
Local host: 100.64.0.3, Local port: 39847
After:
------------
ub20-2(config-router)# do show ip bgp sum | include .99
192.168.122.99 4 1 3 3 0 0 0 00:00:20 0 0 N/A
ub20-2(config-router)# do show ip bgp neighbors 192.168.122.99 | include Local host
Local host: 100.64.0.3, Local port: 39415
ub20-2(config-router)# no neighbor 192.168.122.99 update-source
ub20-2(config-router)# do show ip bgp sum | include .99
192.168.122.99 4 1 3 3 0 0 0 00:00:28 0 0 N/A
ub20-2(config-router)# do show ip bgp neighbors 192.168.122.99 | include Local host
Local host: 100.64.0.3, Local port: 39415
Two changes for debug:
1. Add a field to indicate its vrf for nexthop. When the interface changes
vrf, we can't easily know the vrf of this nexthop according to current log.
2. Add a field to indicate operation type. We can't know whether to add or
remove route according to current log.
Before:
```
zebra_nhg_increment_ref: nhe 0x555623eb82c0 (76[if 6]) 0 => 1
zebra_interface_nhg_reinstall install nhe 75[77.75.1.75 if 6] nh type 3 flags 0x1
Route 77.75.1.0/24(8) queued for processing into sub-queue Early Route Processing
Route 77.75.1.0/24(8) queued for processing into sub-queue Early Route Processing
```
After:
```
zebra_nhg_increment_ref: nhe 0x555623eb82c0 (76[if 6 vrfid 9]) 0 => 1
zebra_interface_nhg_reinstall install nhe 75[77.75.1.75 if 6 vrfid 8] nh type 3 flags 0x1
Route 77.75.1.0/24(8) (add) queued for processing into sub-queue Early Route Processing
Route 77.75.1.0/24(8) (delete) queued for processing into sub-queue Early Route Processing
```
Donald Sharp [Mon, 24 Jul 2023 00:30:47 +0000 (20:30 -0400)]
bgpd: The last_reset_cause in the peer structure is too large
The last_reset_cause is a plain old BGP_MAX_PACKET_SIZE buffer
that is really enlarging the peer data structure. Let's just
copy the stream that failed and only allocate how ever much
the packet size actually was. While it's likely that we have
a reset reason, the packet typically is not going to be 65k
in size. Let's save space.
Quentin Young [Mon, 24 Jul 2023 23:01:51 +0000 (19:01 -0400)]
tests: fix strncpy warning
GCC/clang warns about using strncpy in such a way that it does not copy
the null byte of a string; as implemented it was fine, but to fix the
warning, just use strlcat which was purpose made for the task being
accomplished here.
Signed-off-by: Quentin Young <qlyoung@qlyoung.net>
Donald Sharp [Tue, 2 May 2023 13:25:04 +0000 (09:25 -0400)]
lib: Fix elf_py.c for coverity
David rightly pointed out that having a test for fd > 0 would
technically not be right, but not wrong for this portion of the
code since we know that we would never get a fd = 0 in this section.
In any event let's make coverity happy and move on with our life.
Donald Sharp [Mon, 24 Jul 2023 14:33:21 +0000 (10:33 -0400)]
bgpd: Reduce size of ibuf_work ringbuf
The ringbuf is 650k in size. This is obscenely large and
in practical experimentation FRR never even approaches
that size at all. Let's reduce this to 1.5 max packet sizes.
If a BGP_MAX_PACKET_SIZE packet is ever received having a bit
of extra space ensures that we can read at least 1 packet.
This also will significantly reduce memory usage when the
operator has a lot of peers.
Introduced the idea of a input Q packet limit. Say you read in
635000 bytes of data and the input Q is already at it's limit
(currently 1000) then when bgp_process_reads runs it will
assert because there is less then a BGP_MAX_PACKET_SIZE in ibuf_work.
Don't assert as that it's irrelevant. Even if we can't read a full packet
in let's let the whole system keep working as that as the input Q length
comes down we will start pulling down the ibuf_work and it will be ok.
zebra: fix nhg out of sync between zebra and kernel
PR#13413 introduces reinstall mechanism, but there is problem with the route
leak scenario.
With route leak configuration: ( `x1` and `x2` are binded to `vrf1` )
```
vrf vrf2
ip route 75.75.75.75/32 77.75.1.75 nexthop-vrf vrf1
ip route 75.75.75.75/32 77.75.2.75 nexthop-vrf vrf1
exit-vrf
```
Firstly, all are ok. But after `x1` is set down and up ( The interval
between the down and up operations should be less than 180 seconds. ) ,
`x1` is lost from the nexthop group:
```
anlan# ip nexthop
id 121 group 122/123 proto zebra
id 122 via 77.75.1.75 dev x1 scope link proto zebra
id 123 via 77.75.2.75 dev x2 scope link proto zebra
anlan# ip route show table 2
75.75.75.75 nhid 121 proto 196 metric 20
nexthop via 77.75.1.75 dev x1 weight 1
nexthop via 77.75.2.75 dev x2 weight 1
anlan# ip link set dev x1 down
anlan# ip link set dev x1 up
anlan# ip route show table 2 <- Wrong, one nexthop lost from group
75.75.75.75 nhid 121 via 77.75.2.75 dev x2 proto 196 metric 20
anlan# ip nexthop
id 121 group 123 proto zebra
id 122 via 77.75.1.75 dev x1 scope link proto zebra
id 123 via 77.75.2.75 dev x2 scope link proto zebra
anlan# show ip route vrf vrf2 <- Still ok
VRF vrf2:
S>* 75.75.75.75/32 [1/0] via 77.75.1.75, x1 (vrf vrf1), weight 1, 00:00:05
* via 77.75.2.75, x2 (vrf vrf1), weight 1, 00:00:05
```
From the impact on kernel:
The `nh->type` of `id 122` is *always* `NEXTHOP_TYPE_IPV4` in the route leak
case. Then, `nexthop_is_ifindex_type()` introduced by commit `5bb877` always
returns `false`, so its dependents can't be reinstalled. After `x1` is down,
there is only `id 123` in the group of `id 121`. So, Finally `id 121` remains
unchanged after `x1` is up, i.e., `id 122` is not added to the group even it is
reinstalled itself.
From the impact on zebra:
The `show ip route vrf vrf2` is still ok because the `id`s are reused/reinstalled
successfully within 180 seconds after `x1` is down and up. The group of `id 121`
is with old `NEXTHOP_GROUP_INSTALLED` flag, and it is still the group of `id 122`
and `id 123` as before.
In this way, kernel and zebra have become out of sync.
The `nh->type` of `id 122` should be adjusted to `NEXTHOP_TYPE_IPV4_IFINDEX`
after nexthop resolved. This commit is for doing this to make that reinstall
mechanism work.
Currently, json output of show ip route command are no pretty format.
This is an extremely expensive operation at high scale
(with high number of routes with many paths).
Donald Sharp [Fri, 21 Jul 2023 17:10:03 +0000 (13:10 -0400)]
bgpd: Replace peer->ibuf_scratch
The peer->ibuf_scratch was allocating 65535 * 10 bytes
for scratch space to hold data incoming from a read
from a peer. When you have 4k peers this is 262,1400,000
or 262 mb of data. Which is crazy large. Especially
since the i/o pthread is reading per peer without
any chance of having the data interfere with other reads.
G. Paul Ziemba [Mon, 17 Jul 2023 16:31:06 +0000 (09:31 -0700)]
lib: zapi PBR common encode/decode
bgpd, pbrd: use common pbr encoder
zebra: use common pbr decoder
tests: pbr_topo1: check more filter fields
Purpose:
1. Reduce likelihood of zapi format mismatches when adding
PBR fields due to multiple parallel encoder implementations
2. Encourage common PBR structure usage among various daemons
3. Reduce coding errors via explicit per-field enable flags
G. Paul Ziemba [Wed, 19 Jul 2023 14:58:02 +0000 (07:58 -0700)]
pbrd: add vlan filters pcp/vlan-id/vlan-flags; ip-protocol any (pbr feature)
Subset: feature in PBR
New PBR rule fields:
match ip-protocol (was only tcp|udp, now any value in /etc/protocols)
match pcp (0-7)
match vlan (1-4094)
match vlan (tagged|untagged|untagged-or-zero)
Filter flags
Add filter_bm (flags) field internally to indicate which
filter fields should be considered active. Bit definitions
as in lib/pbr.h.
This commit uses only the PBR_FILTER_PCP bit, but other
fields will be added in future commits. (Fixes bug related
to determining set/not-set state of pcp filter)
Shift vlan filter flags to lib/pbr.h
Changes by:
Josh Werner <joshuawerner@mitre.org>
Eli Baum <ebaum@mitre.org>
G. Paul Ziemba <paulz@labn.net>
zebra:fix a zebra crash issue caused by mac change
When the MAC address of the neighbor changes, a possible crash issue may occur.
In the zebra_evpn_local_neigh_update function, the value of old_zmac (n->mac) will be updated to the new MAC address when the neighbor's MAC address changes.
The pointer to the memory that this pointer points to may be released in the zebra_evpn_local_neigh_deref_mac function. This will cause old_zmac to become a dangling pointer. Accessing this dangling pointer in the zebra_evpn_ip_inherit_dad_from_mac function below will cause the zebra process to crash.
Here is the backtrace:
(gdb) bt
0 0x00007fc12c5f1fbf in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
1 0x00007fc12d52e19c in core_handler (signo=11, siginfo=0x7ffda1fd1570, context=<optimized out>) at lib/sigevent.c:262
2 <signal handler called>
3 zebra_evpn_ip_inherit_dad_from_mac (zvrf=<optimized out>, old_zmac=0x5579ac3ca520, new_zmac=0x5579aba82f80, nbr=0x5579abd65ec0) at zebra/ze
4 0x00005579aa8dbf6d in zebra_evpn_local_neigh_update (zevpn=0x5579abb81440, ifp=ifp@entry=0x5579ab8a1640, ip=ip@entry=0x7ffda1fd1b40, macadd
local_inactive=local_inactive@entry=253, dp_static=false) at zebra/zebra_evpn_neigh.c:1729
5 0x00005579aa9190a9 in zebra_vxlan_handle_kernel_neigh_update (ifp=ifp@entry=0x5579ab8a1640, link_if=link_if@entry=0x5579abd14f90, ip=ip@ent
is_ext=is_ext@entry=false, is_router=<optimized out>, local_inactive=false, dp_static=false) at zebra/zebra_vxlan.c:3791
6 0x00005579aa8b3048 in netlink_ipneigh_change (h=0x7ffda1fd1d50, len=<optimized out>, ns_id=<optimized out>) at zebra/rt_netlink.c:3649
7 0x00005579aa8ac667 in netlink_parse_info (filter=filter@entry=0x5579aa8ab630 <netlink_information_fetch>, nl=nl@entry=0x5579ab5861e8, zns=z
startup=startup@entry=0) at zebra/kernel_netlink.c:965
8 0x00005579aa8ac8c8 in kernel_read (thread=<optimized out>) at zebra/kernel_netlink.c:402
9 0x00007fc12d53e60b in thread_call (thread=thread@entry=0x7ffda1fd9fd0) at lib/thread.c:1834
10 0x00007fc12d4fba78 in frr_run (master=0x5579ab3a1740) at lib/libfrr.c:1155
11 0x00005579aa89c6e3 in main (argc=11, argv=0x7ffda1fda3c8) at zebra/main.c:485
(gdb) f 3
3 zebra_evpn_ip_inherit_dad_from_mac (zvrf=<optimized out>, old_zmac=0x5579ac3ca520, new_zmac=0x5579aba82f80, nbr=0x5579abd65ec0) at zebra/ze
1230 zebra/zebra_evpn_neigh.c: No such file or directory.
(gdb) p *old_zmac
Cannot access memory at address 0x5579ac3ca520
(gdb)
To fix this issue, the ZEBRA_MAC_DUPLICATE flag should be retrieved before old_zmac is released and used in the zebra_evpn_ip_inherit_dad_from_mac function.
tests: Update join state in verify_upstream_iif API
When JoinState is not passed to API it is expected to
be in Joined state, there was a minor bug in API, where
it was printng JoinState as None, which is default value
in API. Updated value to print Joined when verification
fails.
ospfd: fix default-metric change if external LSAs already sent
Currently, when redistribution of routes was configured, external LSAs
were already advertised to peers, and then default-metric is changed,
external LSAs refresh will not occur. In other words, the peers will not
receive the refreshed external LSAs with the new metric.
With this fix, changing default-metric will cause external LSAs to be
refreshed and flooded.
There is a similar task to refresh external LSAs when NSSA settings are
changed. And there is a function that accomplishes it -
ospf_schedule_asbr_nssa_redist_update(). Since the function does the
general work of refreshing external LSAs and is not specific to NSSA
settings, the idea is to give it a more general name and call it when
default-metric changes in order to fix the problem.
Signed-off-by: Alexander Chernavin <achernavin@netgate.com>
When route_next return node, it has lock the node. if return or break loop, should unlock node. Signed-off-by: guozhongfeng <guozhongfeng.gzf@alibaba-inc.com>
A route-map applied on incoming BGP updates is not able
to exclude the unwanted as segments, based on an AS path
access-list.
The below configuration illustrates the case:
router bgp 65001
address-family ipv4 unicast
neighbor 192.168.1.2 route-map rule_2 in
exit-address-family
bgp as-path access-list RULE permit ^65
route-map rule_2 permit 10
set as-path exclude as-path-access-list RULE
```
BGP routing table entry for 10.10.10.10/32, version 13
Paths: (1 available, best #1, table default)
Advertised to non peer-group peers:
192.168.10.65
65000 1 2 3 123
192.168.10.65 from 192.168.10.65 (10.10.10.11)
Origin IGP, metric 0, valid, external, best (First path received)
```
After:
```
do show ip bgp 10.10.10.10/32
BGP routing table entry for 10.10.10.10/32, version 15
Paths: (1 available, best #1, table default)
Advertised to non peer-group peers:
192.168.10.65
2 3 123
192.168.10.65 from 192.168.10.65 (10.10.10.11)
Origin IGP, metric 0, valid, external, best (First path
received)
```