Donald Sharp [Wed, 1 Dec 2021 22:03:38 +0000 (17:03 -0500)]
lib: Update hash.h documentation to warn of a possible crash
Multiple deletions from the hash_walk or hash_iteration calls
during a single invocation of the passed in function can and
will cause the program to crash. Warn against doing such a
thing.
Donald Sharp [Wed, 1 Dec 2021 21:28:42 +0000 (16:28 -0500)]
zebra: Ensure zebra_nhg_sweep_table accounts for double deletes
I'm seeing this crash in various forms:
Program terminated with signal SIGSEGV, Segmentation fault.
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f418efbc7c0 (LWP 3580253))]
(gdb) bt
(gdb) f 4
267 (*func)(hb, arg);
(gdb) p hb
$1 = (struct hash_bucket *) 0x558cdaafb250
(gdb) p *hb
$2 = {len = 0, next = 0x0, key = 0, data = 0x0}
(gdb)
I've also seen a crash where data is 0x03.
My suspicion is that hash_iterate is calling zebra_nhg_sweep_entry which
does delete the particular entry we are looking at as well as possibly other
entries when the ref count for those entries gets set to 0 as well.
Then we have this loop in hash_iterate.c:
for (i = 0; i < hash->size; i++)
for (hb = hash->index[i]; hb; hb = hbnext) {
/* get pointer to next hash bucket here, in case (*func)
* decides to delete hb by calling hash_release
*/
hbnext = hb->next;
(*func)(hb, arg);
}
Suppose in the previous loop hbnext is set to hb->next and we call
zebra_nhg_sweep_entry. This deletes the previous entry and also
happens to cause the hbnext entry to be deleted as well, because of nhg
refcounts. At this point in time the memory pointed to by hbnext is
not owned by the pthread anymore and we can end up on a state where
it's overwritten by another pthread in zebra with data for other incoming events.
What to do? Let's change the sweep function to a hash_walk and have
it stop iterating and to start over if there is a possible double
delete operation.
Donald Sharp [Mon, 24 Jan 2022 12:02:37 +0000 (07:02 -0500)]
bgpd: Prevent use after variable goes out of scope
`struct prefix p` was declared inside an if statement
where we assign the address of to a pointer that is
then passed to a sub function. This will eventually
leave us in a bad state.
Tomi Salminen [Wed, 2 Feb 2022 09:19:09 +0000 (11:19 +0200)]
ospfd: Core in ospf_if_down during shutdown.
Skip marking routes as changed in ospf_if_down if there's now
new_table present, which might be the case when the instance is
being finished
The backtrace for the core was:
raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:50
core_handler (signo=11, siginfo=0x7fffffffe170, context=<optimized out>) at lib/sigevent.c:262
<signal handler called>
route_top (table=0x0) at lib/table.c:401
ospf_if_down (oi=oi@entry=0x555555999090) at ospfd/ospf_interface.c:849
ospf_if_free (oi=0x555555999090) at ospfd/ospf_interface.c:339
ospf_finish_final (ospf=0x55555599c830) at ospfd/ospfd.c:749
ospf_deferred_shutdown_finish (ospf=0x55555599c830) at ospfd/ospfd.c:578
ospf_deferred_shutdown_check (ospf=<optimized out>) at ospfd/ospfd.c:627
ospf_finish (ospf=<optimized out>) at ospfd/ospfd.c:683
ospf_terminate () at ospfd/ospfd.c:653
sigint () at ospfd/ospf_main.c:109
quagga_sigevent_process () at lib/sigevent.c:130
thread_fetch (m=m@entry=0x5555556e45e0, fetch=fetch@entry=0x7fffffffe9b0) at lib/thread.c:1709
frr_run (master=0x5555556e45e0) at lib/libfrr.c:1174
main (argc=9, argv=0x7fffffffecb8) at ospfd/ospf_main.c:254
Signed-off-by: Tomi Salminen <tsalminen@forcepoint.com>
anlan_cs [Wed, 19 Jan 2022 07:10:18 +0000 (02:10 -0500)]
bfdd: fix broken FSM in passive mode
Problem:
One is with active mode, the other is with passive mode. Sometimes
the one with active mode is in `Down` stauts, but the other one
with passive mode is unluckily stuck in `Init` status:
It doesn't answer its peer with any packets, even receiving continuous
`Down` packets.
Root Cause:
bfdd with passive mode answers its peer only *one* packet in `Down` status,
then it enters into `Init` status and ignores subsequent `Down` packets.
Unluckily that *one* answered packet is lost, at that moment its peer
with active mode can only have to send `Down` packets.
Fix:
1) With passive mode, bfdd should start xmittimer after received `Down` packet.
Refer to RFC5880:
"A system taking the Passive role MUST NOT begin sending BFD packets for
a particular session until it has received a BFD packet for that session, and
thus has learned the remote system's discriminator value."
2) Currently this added xmittimer for passive mode can be safely removed
except receiving `AdminDown` packet:
- `bfd_session_enable/bfd_set_passive_mode` doesn't start xmittimer
- `ptm_bfd_sess_dn/bfd_set_shutdown` can remove xmittimer
Per RFC5880, receiving `AdminDown` packet should be also regarded as `Down`,
so just call `ptm_bfd_sess_dn`, which will safely remove the added xmittimer
for passive mode. In summary, call `ptm_bfd_sess_dn` for two status changes
on receiving `AdminDown`: `Init`->`Down` and `Up`->`Down`.
Mark Stapp [Tue, 1 Feb 2022 18:41:17 +0000 (13:41 -0500)]
zebra: reduce incoming netlink messages for dplane thread
The dataplane pthread only processes a limited set of incoming
netlink notifications: only register for that set of events,
reducing duplicate incoming netlink messages.
Igor Ryzhov [Sun, 23 Jan 2022 17:22:42 +0000 (20:22 +0300)]
zebra: fix cleanup of meta queues on vrf disable
Current code treats all metaqueues as lists of route_node structures.
However, some queues contain other structures that need to be cleaned up
differently. Casting the elements of those queues to struct route_node
and dereferencing them leads to a crash. The crash may be seen when
executing bgp_multi_vrf_topo2.
Fix the code by using the proper list element types.
ckishimo [Mon, 31 Jan 2022 22:58:14 +0000 (23:58 +0100)]
ospfd: restart spf when distance is updated
if r1 has a route received from a neighbor with the default administrative
distance configured
r1# sh ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
O>* 1.1.1.1/32 [110/20] via 10.0.12.2, r1-r2-eth0, weight 1, 00:00:41
this is not applied as there are no changes in the routing table
r1# sh ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
O>* 1.1.1.1/32 [110/20] via 10.0.12.2, r1-r2-eth0, weight 1, 00:00:13
This commit will force the update of the routing table with the new configured distance
r1# sh ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
O>* 1.1.1.1/32 [50/20] via 10.0.12.2, r1-r2-eth0, weight 1, 00:00:14
Donald Sharp [Mon, 31 Jan 2022 17:49:55 +0000 (12:49 -0500)]
ospfd: Convert output to host order from network order for route_tag
FRR stores the route_tag in network byte order. Bug filed indicates
that the `show ip ospf route` command shows the correct value.
Every place route_tag is dumped in ospf_vty.c the ntohl function
is used first.
Fixes: #10450 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Donald Sharp [Fri, 21 Jan 2022 17:03:04 +0000 (12:03 -0500)]
zebra: Make Router Advertisement warnings show up once every 6 hours
RA packets are pretty chatty and when there is a warning from
a missconfiguration on the network, the log file gets filed
up with warnings. Modify the code in rtadv.c to only spit
out the warning in these cases at most every 6 hours.
ckishimo [Wed, 26 Jan 2022 21:41:36 +0000 (22:41 +0100)]
ospf6d: restart spf when distance is updated
if r1 has a route received from a neighbor and the same route
configured as static, the administrative distance will determine
which route to use
r1(config)# ipv6 route 1:1::1/128 Null0 70
r1# sh ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
S>* 1:1::1/128 [70/0] unreachable (blackhole), weight 1, 00:00:12
O 1:1::1/128 [110/20] via fe80::1833:c9ff:fe7b:3e43, r1-r2-eth0, weight 1, 00:00:49
The static route is selected. If we now change the administrative distance
in ospf6, the OSPF route should be selected
r1# sh ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
S>* 1:1::1/128 [70/0] unreachable (blackhole), weight 1, 00:00:39
O 1:1::1/128 [110/20] via fe80::1833:c9ff:fe7b:3e43, r1-r2-eth0, weight 1, 00:01:16
However the distance is not applied as there are no changes in the routing table
This commit will force the update of the routing table with the new configured distance
r1# sh ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
O>* 1:1::1/128 [50/20] via fe80::8cb7:e6ff:fef5:2344, r1-r2-eth0, weight 1, 00:00:03
S 1:1::1/128 [70/0] unreachable (blackhole), weight 1, 00:00:19
Igor Ryzhov [Thu, 27 Jan 2022 18:05:40 +0000 (21:05 +0300)]
vrrpd: use ipaddr_is_zero when needed
Replace custom implementation or call to ipaddr_isset with a call to
ipaddr_is_zero.
ipaddr_isset is not fully correct, because it's fine to have some
non-zero bytes at the end of the struct in case of IPv4 and the function
doesn't allow that.
Donald Sharp [Mon, 24 Jan 2022 20:37:54 +0000 (15:37 -0500)]
zebra: Don't double delete the table we are cleaning up
vrf_disable is always called first before
vrf_delete. The rnh_table and rnh_table_multicast tables
are already deleted as part of vrf_disable. No need
to do it again.
ckishimo [Tue, 25 Jan 2022 17:49:27 +0000 (18:49 +0100)]
ospf6d: show if area is NSSA
This PR will include if the area is NSSA in the output of "show ipv6 ospf"
r2# show ipv6 ospf
...
Area 0.0.0.0
Number of Area scoped LSAs is 8
Interface attached to this area: r2-eth1
SPF last executed 20.46717s ago
Area 0.0.0.1[Stub]
Number of Area scoped LSAs is 9
Interface attached to this area: r2-eth0
SPF last executed 20.46911s ago
Area 0.0.0.2[NSSA]
Number of Area scoped LSAs is 14
Interface attached to this area: r2-eth2
SPF last executed 20.46801s ago