zyxwvu Shi [Wed, 15 Feb 2023 15:55:00 +0000 (23:55 +0800)]
zebra: Fix other table inactive when ip import-table is on
In `rib_link`, if is_zebra_import_table_enabled returns
true, `rib_queue_add` will not called, resulting in other
table route node never processed. This actually should not
be dependent on whether the route is imported.
In `rib_delnode`, if is_zebra_import_table_enabled returns
true, it will use `rib_unlink` instead of enqueuing the
route node for process. There is no reason that imported
route nodes should not be reprocessed. Long ago, the
behaviour was dependent on whether the route_entry comes
from a table other than main.
Donald Sharp [Thu, 2 Feb 2023 21:28:27 +0000 (16:28 -0500)]
lib: Fix non-use of option
Commit d7c6467ba2f55d1055babbb7fe82716ca3efdc7e added the
ability to specify non pretty printing but unfortunately
forgot to use the option variable to make the whole
thing work.
vivek [Fri, 18 Dec 2020 18:55:40 +0000 (10:55 -0800)]
bgpd: Prevent multipathing among EVPN and non-EVPN paths
Ensure that a multipath set is fully comprised of EVPN paths (i.e.,
paths imported into the VRF from EVPN address-family) or non-EVPN
paths. This is actually a condition that existed already in the code
but was not properly enforced.
This change, as a side effect, eliminates the known trigger condition
for bad or missing RMAC programming in an EVPN deployment, described
in tickets CM-29043 and CM-31222. Routes (actually, paths) in a VRF
routing table that require VXLAN tunneling to the next hop currently
need some special handling in zebra to deal with the nexthop (neigh)
and RMAC programming, and this is implemented for the entire route
(prefix), not per-path. This can lead to the bad or missing RMAC
situation, which is now eliminated by ensuring all paths in the route
are 'similar'.
The longer-term solution in CL 5.x will be to deal with the special
programming by means of explicit communication between bgpd and zebra.
This is already implemented for EVPN-MH via CM-31398. These changes
will be extended to non-MH also and the special code in zebra removed
or refined.
Signed-off-by: Vivek Venkatraman <vivek@nvidia.com> Acked-by: Trey Aspelund <taspelund@nvidia.com> Acked-by: Anuradha Karuppiah <anuradhak@nvidia.com> Acked-by: Chirag Shah <chirag@nvidia.com>
Ticket: CM-29043
Testing Done:
1. Manual testing
2. precommit on both MLX and BCM platforms
3. evpn-smoke - BCM and VX
vivek [Thu, 3 Dec 2020 04:04:19 +0000 (20:04 -0800)]
bgpd: Fix deterministic-med check for stale paths
When performing deterministic MED processing, ensure that the peer
status is not checked when we encounter a stale path. Otherwise, this
path will be skipped from the DMED consideration leading to it potentially
not being installed.
Test scenario: Consider a prefix with 2 (multi)paths. The peer that
announces the path with the winning DMED undergoes a graceful-restart.
Before it comes back up, the other path goes away. Prior to the fix, a
third router that receives both these paths would have ended up not
having any path installed to the prefix after the above events.
bgpd: Intern default-originate attributes to avoid use-after-free
When we receive a default route from a peer and we originate default route
using `neighbor default-originate`, we do not track of struct attr we use,
and when we do `no neighbor default-originate` we withdraw our generated
default route, but we announce default-route from the peer.
After we do this, we unintern aspath (which was used for default-originate),
BUT it was used also for peer's default route we received.
And here we have a use-after-free crash, because bgp_process_main_one()
reaps old paths that are marked as BGP_PATH_REMOVED with aspath->refcnt > 0,
but here it's 0.
```
0 0x55c24bbcd022 in aspath_key_make bgpd/bgp_aspath.c:2070
1 0x55c24b8f1140 in attrhash_key_make bgpd/bgp_attr.c:777
2 0x7f52322e66c9 in hash_release lib/hash.c:220
3 0x55c24b8f6017 in bgp_attr_unintern bgpd/bgp_attr.c:1271
4 0x55c24ba0acaa in bgp_path_info_free_with_caller bgpd/bgp_route.c:283
5 0x55c24ba0a7de in bgp_path_info_unlock bgpd/bgp_route.c:309
6 0x55c24ba0af6d in bgp_path_info_reap bgpd/bgp_route.c:426
7 0x55c24ba17b9a in bgp_process_main_one bgpd/bgp_route.c:3333
8 0x55c24ba18a1d in bgp_process_wq bgpd/bgp_route.c:3425
9 0x7f52323c2cd5 in work_queue_run lib/workqueue.c:282
10 0x7f52323aab92 in thread_call lib/thread.c:2006
11 0x7f5232300dc7 in frr_run lib/libfrr.c:1198
12 0x55c24b8ea792 in main bgpd/bgp_main.c:520
13 0x7f5231c3a082 in __libc_start_main ../csu/libc-start.c:308
14 0x55c24b8ef0bd in _start (/usr/lib/frr/bgpd+0x2c90bd)
```
anlan_cs [Mon, 6 Feb 2023 01:27:05 +0000 (09:27 +0800)]
bgpd: fix use-after-free crash for evpn
```
anlan(config-router-af)# vni 33
anlan(config-router-af-vni)# route-target both 44:55
anlan(config-router-af-vni)# no route-target both 44:55
vtysh: error reading from bgpd: Resource temporarily unavailable (11)Warning: closing connection to bgpd because of an I/O error!
```
When `bgp_evpn_vni_rt_cmd` deals with "both" type, it wrongly created
only one node ( should be two nodes ) for lists of both `vpn->import_rtl` and
`vpn->export_rtl`. At this time, the two lists are already wrong.
In `no route-target both RT`, it will free the single node from lists of both
`vpn->import_rtl` and `vpn->export_rtl`. After freed from `vpn->import_rtl`,
it is "use-after-free" at the time of freeing it from `vpn->export_rtl`.
It causes crash sometimes, or other unexpected behaviours.
This issue is introduced by commit `3b7e8d`, which have adjusted both
`bgp_evpn_vni_rt_cmd` and `bgp_evpn_vrf_rt_cmd`.
Since `bgp_evpn_vrf_rt_cmd/no_bgp_evpn_vrf_rt_cmd` works well again
unintentionally with commit `7022da`, only `bgp_evpn_vni_rt_cmd` needs to
modify - add two nodes for "both" type and some explicit comments for this
special case of "both" type.
Donald Sharp [Thu, 2 Feb 2023 15:40:07 +0000 (10:40 -0500)]
bgpd: Convert evpn output to not pretty print json
Commit: 3cdb03fba7b40240fb38469a12b7b05a11043e09
changed the vty_json output to not be pretty printing.
The previous commit in the tree added vty_json_no_pretty
let's use that instead
Donald Sharp [Thu, 2 Feb 2023 15:28:19 +0000 (10:28 -0500)]
lib, bgpd: Add ability to specify that some json output should not be pretty
Initial commit: 23b2a7ef524c9fe083b217c7f6ebaec0effc8f52
changed the json output of `show bgp <afi> <safi> json` to
not have pretty print because when under a situation where
there are a bunch of routes with a large scale ecmp show
output was taking forever and this commit cut 2 minutes out
of vtysh run time.
When upgrading to latest version the long run time was noticed
due to testing. Let's add back this functionality such that
FRR can have reduced run times with vtysh when it's really
needed.
Chirag Shah [Tue, 24 Jan 2023 06:18:24 +0000 (22:18 -0800)]
bgpd: evpn route detail json display non prett
For BGP evpn route table detail json to use
non pretty form of display.
Problem:
In scaled evpn route table detail json dump
occupies high resources (CPU + memory) of the system.
In high scale evpn route dump using pretty form
hogs CPU for a while which can trigger watchfrr
to kill bgpd.
Solution:
Avoid pretty JSON print for detail version dump
- bfdd: fix ipv4 socket source selection
- bgpd : fix crash for `set ipv4/ipv6 vpn next-hop` command
- bgpd: stop overriding nexthop when bgp unnumbered
- bgpd: fix aggregated routes are withdrawn abnormally
- bgpd: fix a few memory leaks
- build: enable pim6d by default
- build: fix sed regex in lua macro
- doc : add freebsd 13 build docs
- isisd: fix memory leak
- lib: disable vrf before terminating interfaces
- lib: do not log `echo ping` commands from watchfrr
- ospf6d: fix infinite loop when adding asbr route
- ospfd: fix rfc conformance test cases 25.19 and 27.6
- ospfd: fix typo and report the P2P link name in the warning
- ospfd: report the router IP with opaque capability mismatch
- ospfd: fixing memory leak
- pimd: consistently ignore prefix list mask len
- staticd: do not crash when modifying an existing static route with color
- zebra: free all memory associated ctx->u.iptable.interface_name_list
- zebra: fix tracepoint changes for lttng
- zebra: free up route map name memory on vrf deletion event
- zebra: use `mpls enable`, not `mpls` when generating a config
- tools: Ignore agentx command for frr-reload.py
David Lamparter [Fri, 6 Jan 2023 15:59:55 +0000 (16:59 +0100)]
lib: disable xref ELF note on mips64el
mips64el does not have a 64-bit PC-relative relocation, which is needed
to emit the ELF note for xrefs. Disabling the ELF note means clippy
takes the fallback path using section headers, so everything does still
work (... unless you strip the section headers.)
David Lamparter [Fri, 6 Jan 2023 15:55:38 +0000 (16:55 +0100)]
zebra: do not load/store wider-than-ptr atomics
Most 32-bit architectures cannot do atomic loads and stores of data
wider than their pointer size, i.e. 32 bit. Funnily enough they
generally *can* do a CAS2, i.e., 64-bit compare-and-swap, but while a
CAS can emulate atomic add/bitops, loads and stores aren't available.
Replace with a mutex; since this is 99% used from the zserv thread, the
mutex should take the local-to-thread fast path anyway. And while one
atomic might be faster than a mutex lock/unlock, we're doing several
here, and at some point a mutex wins on speed anyway.
This fixes build on armel, mipsel, m68k, powerpc, and sh4.
Donald Sharp [Wed, 21 Dec 2022 13:04:34 +0000 (08:04 -0500)]
vtysh: Remove double retrieve of env VTYSH_HISTFILE
The code is double checking the VTYSH_HISTFILE env variable,
additionally clang-15 when running SA over it doesn't fully
understand the code pattern. Reduce the double check to
one check to reduce program run-time (ha!) and make SA happy.
liuze03 [Wed, 7 Dec 2022 04:22:43 +0000 (12:22 +0800)]
bgpd: Fix aggregated routes are withdrawn abnormally.
The withdraw message and announcement message of a prefix are received continuously within 50ms, which may lead to abnormal aggregation route reference count.
Steps to reproduce:
--------------------------
step1:
local config aggregate route 111.0.0.0/24
received route:111.0.0.1/32 111.0.0.02/32
ref_count:2
step2:
peer withdraw 111.0.0.1/32 and network 111.0.0.1/32 in 50ms
received route:111.0.0.1/32 111.0.0.02/32
ref_count:1
step3:
peer withdraw 111.0.0.1/32
received route:111.0.0.02/32
ref_count:0
aggregate route will be withdrawn abnormally
Donald Sharp [Thu, 22 Dec 2022 00:26:58 +0000 (19:26 -0500)]
bgpd: bgp_connected_add memory was being leaked in some cases
On shutdown, bgp calls an unlock for the bnc connected table,
via the bgp_connected_cleanup function. This function is
only ever called on shutdown, so we know that bgp is going
away. The refcount for the connected data can be more than
1. Let's not worry about the refcount on shutdown and
just delete the nodes instead of leaving them around.
Donald Sharp [Wed, 21 Dec 2022 17:11:56 +0000 (12:11 -0500)]
bgpd: static routes are leaked on shutdown
Shutdown of bgp results in both the bgp_path_info,
bgp_dest and bgp_table's not being freed because
the bgp_path_info remains locked.
Effectively static routes are scheduled for deletion but bgp_process
skips the work because the work queue sees that the bgp router
is marked for deletion. Effectively not doing any work and leaving
data on the floor.
Modify the code when attempting to put into the work queue to
notice and not do so but just unlock the path info.
This is effectively the same as what goes on for normal peering
as that it checks for shutdown and just calls bgp_path_info_free
too.
Donatas Abraitis [Thu, 22 Dec 2022 15:55:40 +0000 (17:55 +0200)]
tools: Ignore agentx command for frr-reload.py
agentx can't be disabled once enabled, so we should ignore it for frr-reload.py.
```
$ /usr/lib/frr/frr-reload.py --reload /etc/frr/bgpd.conf --bindir /usr/local/bin
"no agentx" we failed to remove this command
SNMP AgentX support cannot be disabled once enabled
```
Ryoga Saito [Tue, 6 Dec 2022 17:27:16 +0000 (02:27 +0900)]
tests: Add topotest bgp_vrf_leaking_5549_routes
To verify previous changes, this PR introduces topotest to verify
whether imported routes learnt from BGP unnumbered peers will be active
on VPN RIB and other VRF RIB.
Ryoga Saito [Sun, 4 Dec 2022 14:51:45 +0000 (23:51 +0900)]
bgpd: Stop overriding nexthop when BGP unnumberred
When we use vrf-to-vrf export, the nexthop has already not been
overridden when the peer is BGP unnumberred. However, when we use normal
export, the nexthop will be oberridden. This behavior will make the VPN
routes invalid in VPN RIB.
This PR stops overriding nexthop even if we use normal export.
rgirada [Tue, 22 Nov 2022 09:59:40 +0000 (09:59 +0000)]
ospfd: Fixing memleak.
Description:
As part of signal handler ospf_finish_final(), lsas are originated
and added to refresh queues are not freed.
One such leak is :
==2869285== 432 (40 direct, 392 indirect) bytes in 1 blocks are definitely lost in loss record 159 of 221
==2869285== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==2869285== by 0x4910EC3: qcalloc (memory.c:116)
==2869285== by 0x199024: ospf_refresher_register_lsa (ospf_lsa.c:4017)
==2869285== by 0x199024: ospf_refresher_register_lsa (ospf_lsa.c:3979)
==2869285== by 0x19A37F: ospf_network_lsa_install (ospf_lsa.c:2680)
==2869285== by 0x19A37F: ospf_lsa_install (ospf_lsa.c:2941)
==2869285== by 0x19C18F: ospf_network_lsa_update (ospf_lsa.c:1099)
==2869285== by 0x1931ED: ism_change_state (ospf_ism.c:556)
==2869285== by 0x1931ED: ospf_ism_event (ospf_ism.c:596)
==2869285== by 0x494E0B0: thread_call (thread.c:2006)
==2869285== by 0x494E395: _thread_execute (thread.c:2098)
==2869285== by 0x19FBC6: nsm_change_state (ospf_nsm.c:695)
==2869285== by 0x19FBC6: ospf_nsm_event (ospf_nsm.c:861)
==2869285== by 0x494E0B0: thread_call (thread.c:2006)
==2869285== by 0x494E395: _thread_execute (thread.c:2098)
==2869285== by 0x19020B: ospf_if_cleanup (ospf_interface.c:322)
==2869285== by 0x192D0C: ism_interface_down (ospf_ism.c:393)
==2869285== by 0x193028: ospf_ism_event (ospf_ism.c:584)
==2869285== by 0x494E0B0: thread_call (thread.c:2006)
==2869285== by 0x494E395: _thread_execute (thread.c:2098)
==2869285== by 0x190F10: ospf_if_down (ospf_interface.c:851)
==2869285== by 0x1911D6: ospf_if_free (ospf_interface.c:341)
==2869285== by 0x1E6E98: ospf_finish_final (ospfd.c:748)
==2869285== by 0x1E6E98: ospf_deferred_shutdown_finish (ospfd.c:578)
==2869285== by 0x1E7727: ospf_finish (ospfd.c:682)
==2869285== by 0x1E7727: ospf_terminate (ospfd.c:652)
==2869285== by 0x18852B: sigint (ospf_main.c:105)
==2869285== by 0x493BE12: frr_sigevent_process (sigevent.c:130)
==2869285== by 0x494DCD4: thread_fetch (thread.c:1775)
==2869285== by 0x4905022: frr_run (libfrr.c:1197)
==2869285== by 0x187891: main (ospf_main.c:235)
Added a fix to cleanup all these queue pointers and corresponing lsas in it.
Rafael Zalamena [Thu, 24 Nov 2022 14:16:18 +0000 (11:16 -0300)]
bfdd: fix IPv4 socket source selection
The imported BFD code had some logic to ignore the source address when
using single hop IPv4. The BFD peer socket function should allow the
source to be selected so we can:
1. Select the source address in the outgoing packets
2. Only receive packets from that specific source
Renato Westphal [Thu, 24 Nov 2022 01:14:51 +0000 (22:14 -0300)]
ospf6d: fix infinite loop when adding ASBR route
Commit 8f359e1593c414322 removed a check that prevented the same route
from being added twice. In certain topologies, that change resulted in
the following infinite loop when adding an ASBR route:
ospf6_route_add
ospf6_top_brouter_hook_add
ospf6_abr_examin_brouter
ospf6_abr_examin_summary
ospf6_route_add
(repeat until stack overflow)
Revert the offending commit and update `ospf6_route_is_identical()` to
not do comparison using `memcmp()`.
Donald Sharp [Sat, 26 Nov 2022 14:23:50 +0000 (09:23 -0500)]
lib: Do not log `echo PING` commands from watchfrr
Since the `echo PING` commands are from watchfrr and are sent
a whole bunch when an operator has `log commands` on the amount
of logging done is quite significant.
ospfd: Fix RFC conformance test cases 25.19 and 27.6
Steps to reproduce:
--------------------------
1. ANVL: Establish full adjacency with DUT for neighbor Rtr-0-A on DIface-0 with DUT as DR.
2. ANVL: Listen (for up to 2 * <RxmtInterval> seconds) on DIface-0.
3. DUT: Send <OSPF-LSU> packet.
4. ANVL: Verify that the received <OSPF-LSU> packet contains a Network- LSA for network N1
originated by DUT, and the LS Sequence Number is set to <InitialSequenceNumber>.
5. ANVL: Establish full adjacency with DUT for neighbor Rtr-0-B on DIface-0 with DUT as DR.
6. ANVL: Listen (for up to 2 * <RxmtInterval> seconds) on DIface-0.
7. DUT: Send <OSPF-LSU> packet.
8. ANVL: Verify that the received <OSPF-LSU> packet contains a new instance of the
Network-LSA for network N1 originated by DUT, and the LS Sequence Number
is set to (<InitialSequenceNumber> + 1).
Both the test cases were failing while verifying the initial sequence number for network LSA.
This is because currently OSPF does not reset its LSA sequence number when it is going down.
Stephen Worley [Tue, 22 Nov 2022 22:18:02 +0000 (17:18 -0500)]
lib: disable vrf before terminating interfaces
We must disable the vrf before we start terminating interfaces.
On termination, we free the 'zebra_if' struct from the interface ->info
pointer. We rely on that for subsystems like vxlan for cleanup when
shutting down.
'''
==497406== Invalid read of size 8
==497406== at 0x47E70A: zebra_evpn_del (zebra_evpn.c:1103)
==497406== by 0x47F004: zebra_evpn_cleanup_all (zebra_evpn.c:1363)
==497406== by 0x4F2404: zebra_evpn_vxlan_cleanup_all (zebra_vxlan.c:1158)
==497406== by 0x4917041: hash_iterate (hash.c:267)
==497406== by 0x4F25E2: zebra_vxlan_cleanup_tables (zebra_vxlan.c:5676)
==497406== by 0x4D52EC: zebra_vrf_disable (zebra_vrf.c:209)
==497406== by 0x49A247F: vrf_disable (vrf.c:340)
==497406== by 0x49A2521: vrf_delete (vrf.c:245)
==497406== by 0x49A2E2B: vrf_terminate_single (vrf.c:533)
==497406== by 0x49A2D8F: vrf_terminate (vrf.c:561)
==497406== by 0x441240: sigint (main.c:192)
==497406== by 0x4981F6D: frr_sigevent_process (sigevent.c:130)
==497406== Address 0x6d68c68 is 200 bytes inside a block of size 272 free'd
==497406== at 0x48470E4: free (vg_replace_malloc.c:872)
==497406== by 0x4942CF0: qfree (memory.c:141)
==497406== by 0x49196A9: if_delete (if.c:293)
==497406== by 0x491C54C: if_terminate (if.c:1031)
==497406== by 0x49A2E22: vrf_terminate_single (vrf.c:532)
==497406== by 0x49A2D8F: vrf_terminate (vrf.c:561)
==497406== by 0x441240: sigint (main.c:192)
==497406== by 0x4981F6D: frr_sigevent_process (sigevent.c:130)
==497406== by 0x499A5F0: thread_fetch (thread.c:1775)
==497406== by 0x492850E: frr_run (libfrr.c:1197)
==497406== by 0x441746: main (main.c:476)
==497406== Block was alloc'd at
==497406== at 0x4849464: calloc (vg_replace_malloc.c:1328)
==497406== by 0x49429A5: qcalloc (memory.c:116)
==497406== by 0x491D971: if_new (if.c:174)
==497406== by 0x491ACC8: if_create_name (if.c:228)
==497406== by 0x491ABEB: if_get_by_name (if.c:613)
==497406== by 0x427052: netlink_interface (if_netlink.c:1178)
==497406== by 0x43BC18: netlink_parse_info (kernel_netlink.c:1188)
==497406== by 0x4266D7: interface_lookup_netlink (if_netlink.c:1288)
==497406== by 0x42B634: interface_list (if_netlink.c:2368)
==497406== by 0x4ABF83: zebra_ns_enable (zebra_ns.c:127)
==497406== by 0x4AC17E: zebra_ns_init (zebra_ns.c:216)
==497406== by 0x44166C: main (main.c:408)
'''
David Lamparter [Fri, 4 Nov 2022 15:55:55 +0000 (16:55 +0100)]
pimd: consistently ignore prefix list mask len
... the prefix length wasn't ignored as expected. Both S and G are
always /32. But expecting "le 32" in prefix-list config is unexpected &
counterintuitive.
- zebra: reuse netinet/if_ether.h to avoid redefinition of struct ethhdr
- zebra: fix build without AF_MPLS
- ospfd: get route-map name for default-information originate
- ospfd: allow unnumbered and numbered addresses to co-exist better
- ospfd: prevent from crashing when processing external lsa
- bgpd: fix "storing the address of local variable"
- bgpd: rpki was decrementing the node lock one time too many
- bgpd: do not send Deconfig/Shutdown message when restarting
- pimd: convert zlog_warn to debug
Ryoga Saito [Sat, 12 Nov 2022 08:45:19 +0000 (17:45 +0900)]
bgpd: fix invalid ipv4-vpn nexthop for IPv6 peer
Given that two routers are connected each other and they have IPv6
addresses and they establish BGP peer with extended-nexthop capability
and one router tries to advertise locally-generated IPv4-VPN routes to
other router.
In this situation, bgpd on the router that tries to advertise IPv4-VPN
routes will be crashed with "invalid MP nexthop length (AFI IP6)".
This issue is happened because MP_REACH_NLRI path attribute is not
generated correctly when ipv4-vpn routes are advertised to IPv6 peer.
When IPv4 routes are leaked from VRF RIB, the nexthop of these routes
are also IPv4 address (0.0.0.0/0 or specific addresses). However,
bgp_packet_mpattr_start only covers the case of IPv6 nexthop (for IPv6
peer).
ipv4-unicast routes were not affected by this issue because the case of
IPv4 nexthop is covered in `else` block.