This a convenience release/tag for house keeping. We currently don't plan to publish
binary packages with this release.
Changelog:
babeld
Fix filtering against all interfaces
bgpd
Avoid double aspath_dup() for confederation when remote-as != as_specified
Convert evpn output to not pretty print json
Evpn route detail json display non prett
Fix deterministic-med check for stale paths
Fix use-after-free crash for evpn
Flowspec overflow issue
Free rfapi callback object always
Increment version number even when no data is sent
Intern default-originate attributes to avoid use-after-free
Pass global asn for confederation peers if not as_specified
Prevent multipathing among evpn and non-evpn paths
lib, bgpd
Add ability to specify that some json output should not be pretty
lib
Fix non-use of option
Free dnode before returning if yang translator model is already loaded
Release memory of yang translation module on error
ospfclient
Bugfix and no implicit register
Remove register "ready" requirement
ospfd
Compare prefix values in host order
Inc. opaque data in `show opaque info detail json` output
Small bugfix for miscounting a[s]brs
pbrd
Fix large tableids displayed as negative
pceplib
Add <time.h> include for time_t
pimd
In_multicast needs host order
Make logs useful for input drops
Try to reinstall mfc when we get nocache
V6 intentionally turned off route-maps. turn it on.
tests
Check if bgp confederation works with as_external (!as_specified)
Improve the ospfapi test (move to square topology)
tools
Fix missing pbrd in rsyslog.d 45-frr.conf file
Frr-reload fix list value not present
vrrpd
Give null when using null ifp to lookup vr
zebrad
Add debug for bridge capability
Fix other table inactive when ip import-table is on
Chirag Shah [Wed, 15 Mar 2023 04:32:40 +0000 (21:32 -0700)]
tools: frr-reload fix list value not present
Check for value present in list before removing
as in certain python3 ValueError traceback is observed.
Traceback (most recent call last):
File "/usr/lib/frr/frr-reload.py",
line 2278, in <module>
(lines_to_add, lines_to_del, restart_frr)
= compare_context_objects(newconf, running)
File "/usr/lib/frr/frr-reload.py",
line 1933, in compare_context_objects
lines_to_add, lines_to_del
File "/usr/lib/frr/frr-reload.py",
line 1549, in ignore_delete_re_add_lines
lines_to_del.remove((ctx_keys, line))
ValueError: list.remove(x): x not in list
Testing Done:
With fix perform frr-relaod on frr.conf config where earlier
traceback was seen.
Signed-off-by: Donald Sharp <sharpd@nvidia.com> Signed-off-by: Chirag Shah <chirag@nvidia.com>
(cherry picked from commit 1543f58b5541c0ddb5e53bb7994136dcb5f836cb)
Donald Sharp [Sun, 12 Mar 2023 00:37:21 +0000 (19:37 -0500)]
pimd: IN_MULTICAST needs host order
New correct behavior:
eva# conf
eva(config)# ip pim rp 192.168.1.224 224.0.0.0/24
No Path to RP address specified: 192.168.1.224
eva(config)# ip pim rp 224.1.2.3 224.0.0.0/24
% Bad RP address specified: 224.1.2.3
eva(config)#
Donald Sharp [Sat, 11 Mar 2023 17:05:44 +0000 (12:05 -0500)]
bgpd: Increment version number even when no data is sent
When an update group decides to not send a prefix
announcement because it has not changed, still increment
the version number. Why? To allow for the situation
where you have say 2 peers in 1 peer group and shortly
after they come up a 3rd peer comes up. It will be
placed into a separate update group and could be
coalesced down, when it finishes updating all data
to it. Now imagine that a single prefix changes at
this point in time as well. Then first 2 peers may
decide to not send the data, since nothing has changed.
While the 3rd peer will and since the versions numbers
never match they will never coalesce. So when the decision
is made to skip, update the version number as well.
Donald Sharp [Thu, 23 Feb 2023 18:29:32 +0000 (13:29 -0500)]
bgpd: Flowspec overflow issue
According to the flowspec RFC 8955 a flowspec nlri is <length, <nlri data>>
Specifying 0 as a length makes BGP get all warm on the inside. Which
in this case is not a good thing at all. Prevent warmth, stay cold
on the inside.
Reported-by: Iggy Frankovic <iggyfran@amazon.com> Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 0b999c886e241c52bd1f7ef0066700e4b618ebb3)
David Lamparter [Wed, 1 Jun 2022 07:54:31 +0000 (09:54 +0200)]
pimd: try to reinstall MFC when we get NOCACHE
Whether due to a pimd bug, some expiry, or someone just deleting MFC
entries, when we're in NOCACHE we *know* there's no MFC entry. Add an
install call to make sure pimd's MFC view aligns with the actual kernel
MFC.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
David Lamparter [Thu, 10 Mar 2022 12:59:26 +0000 (13:59 +0100)]
pimd: make logs useful for input drops
This path here is pretty far on top of the list of issues that operators
will run into and have to debug when setting up PIM. Make the log
messages actually tell what's going on. Also escalate some from
`debug mroute detail` to `debug mroute`.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Donatas Abraitis [Tue, 21 Feb 2023 21:10:45 +0000 (23:10 +0200)]
bgpd: Pass global ASN for confederation peers if not AS_SPECIFIED
When we specify remote-as as external/internal, we need to set local_as to
bgp->as, instead of bgp->confed_id. Before this patch, (bgp->as != *as) is
always valid for such a case because *as is always 0.
Also, append peer->local_as as CONFED_SEQ to avoid other side withdrawing
the routes due to confederation own AS received and/or malformed as-path.
zyxwvu Shi [Wed, 15 Feb 2023 15:55:00 +0000 (23:55 +0800)]
zebra: Fix other table inactive when ip import-table is on
In `rib_link`, if is_zebra_import_table_enabled returns
true, `rib_queue_add` will not called, resulting in other
table route node never processed. This actually should not
be dependent on whether the route is imported.
In `rib_delnode`, if is_zebra_import_table_enabled returns
true, it will use `rib_unlink` instead of enqueuing the
route node for process. There is no reason that imported
route nodes should not be reprocessed. Long ago, the
behaviour was dependent on whether the route_entry comes
from a table other than main.
Donald Sharp [Thu, 2 Feb 2023 21:28:27 +0000 (16:28 -0500)]
lib: Fix non-use of option
Commit d7c6467ba2f55d1055babbb7fe82716ca3efdc7e added the
ability to specify non pretty printing but unfortunately
forgot to use the option variable to make the whole
thing work.
vivek [Fri, 18 Dec 2020 18:55:40 +0000 (10:55 -0800)]
bgpd: Prevent multipathing among EVPN and non-EVPN paths
Ensure that a multipath set is fully comprised of EVPN paths (i.e.,
paths imported into the VRF from EVPN address-family) or non-EVPN
paths. This is actually a condition that existed already in the code
but was not properly enforced.
This change, as a side effect, eliminates the known trigger condition
for bad or missing RMAC programming in an EVPN deployment, described
in tickets CM-29043 and CM-31222. Routes (actually, paths) in a VRF
routing table that require VXLAN tunneling to the next hop currently
need some special handling in zebra to deal with the nexthop (neigh)
and RMAC programming, and this is implemented for the entire route
(prefix), not per-path. This can lead to the bad or missing RMAC
situation, which is now eliminated by ensuring all paths in the route
are 'similar'.
The longer-term solution in CL 5.x will be to deal with the special
programming by means of explicit communication between bgpd and zebra.
This is already implemented for EVPN-MH via CM-31398. These changes
will be extended to non-MH also and the special code in zebra removed
or refined.
Signed-off-by: Vivek Venkatraman <vivek@nvidia.com> Acked-by: Trey Aspelund <taspelund@nvidia.com> Acked-by: Anuradha Karuppiah <anuradhak@nvidia.com> Acked-by: Chirag Shah <chirag@nvidia.com>
Ticket: CM-29043
Testing Done:
1. Manual testing
2. precommit on both MLX and BCM platforms
3. evpn-smoke - BCM and VX
vivek [Thu, 3 Dec 2020 04:04:19 +0000 (20:04 -0800)]
bgpd: Fix deterministic-med check for stale paths
When performing deterministic MED processing, ensure that the peer
status is not checked when we encounter a stale path. Otherwise, this
path will be skipped from the DMED consideration leading to it potentially
not being installed.
Test scenario: Consider a prefix with 2 (multi)paths. The peer that
announces the path with the winning DMED undergoes a graceful-restart.
Before it comes back up, the other path goes away. Prior to the fix, a
third router that receives both these paths would have ended up not
having any path installed to the prefix after the above events.
bgpd: Intern default-originate attributes to avoid use-after-free
When we receive a default route from a peer and we originate default route
using `neighbor default-originate`, we do not track of struct attr we use,
and when we do `no neighbor default-originate` we withdraw our generated
default route, but we announce default-route from the peer.
After we do this, we unintern aspath (which was used for default-originate),
BUT it was used also for peer's default route we received.
And here we have a use-after-free crash, because bgp_process_main_one()
reaps old paths that are marked as BGP_PATH_REMOVED with aspath->refcnt > 0,
but here it's 0.
```
0 0x55c24bbcd022 in aspath_key_make bgpd/bgp_aspath.c:2070
1 0x55c24b8f1140 in attrhash_key_make bgpd/bgp_attr.c:777
2 0x7f52322e66c9 in hash_release lib/hash.c:220
3 0x55c24b8f6017 in bgp_attr_unintern bgpd/bgp_attr.c:1271
4 0x55c24ba0acaa in bgp_path_info_free_with_caller bgpd/bgp_route.c:283
5 0x55c24ba0a7de in bgp_path_info_unlock bgpd/bgp_route.c:309
6 0x55c24ba0af6d in bgp_path_info_reap bgpd/bgp_route.c:426
7 0x55c24ba17b9a in bgp_process_main_one bgpd/bgp_route.c:3333
8 0x55c24ba18a1d in bgp_process_wq bgpd/bgp_route.c:3425
9 0x7f52323c2cd5 in work_queue_run lib/workqueue.c:282
10 0x7f52323aab92 in thread_call lib/thread.c:2006
11 0x7f5232300dc7 in frr_run lib/libfrr.c:1198
12 0x55c24b8ea792 in main bgpd/bgp_main.c:520
13 0x7f5231c3a082 in __libc_start_main ../csu/libc-start.c:308
14 0x55c24b8ef0bd in _start (/usr/lib/frr/bgpd+0x2c90bd)
```
anlan_cs [Mon, 6 Feb 2023 01:27:05 +0000 (09:27 +0800)]
bgpd: fix use-after-free crash for evpn
```
anlan(config-router-af)# vni 33
anlan(config-router-af-vni)# route-target both 44:55
anlan(config-router-af-vni)# no route-target both 44:55
vtysh: error reading from bgpd: Resource temporarily unavailable (11)Warning: closing connection to bgpd because of an I/O error!
```
When `bgp_evpn_vni_rt_cmd` deals with "both" type, it wrongly created
only one node ( should be two nodes ) for lists of both `vpn->import_rtl` and
`vpn->export_rtl`. At this time, the two lists are already wrong.
In `no route-target both RT`, it will free the single node from lists of both
`vpn->import_rtl` and `vpn->export_rtl`. After freed from `vpn->import_rtl`,
it is "use-after-free" at the time of freeing it from `vpn->export_rtl`.
It causes crash sometimes, or other unexpected behaviours.
This issue is introduced by commit `3b7e8d`, which have adjusted both
`bgp_evpn_vni_rt_cmd` and `bgp_evpn_vrf_rt_cmd`.
Since `bgp_evpn_vrf_rt_cmd/no_bgp_evpn_vrf_rt_cmd` works well again
unintentionally with commit `7022da`, only `bgp_evpn_vni_rt_cmd` needs to
modify - add two nodes for "both" type and some explicit comments for this
special case of "both" type.
Donald Sharp [Thu, 2 Feb 2023 15:40:07 +0000 (10:40 -0500)]
bgpd: Convert evpn output to not pretty print json
Commit: 3cdb03fba7b40240fb38469a12b7b05a11043e09
changed the vty_json output to not be pretty printing.
The previous commit in the tree added vty_json_no_pretty
let's use that instead
Donald Sharp [Thu, 2 Feb 2023 15:28:19 +0000 (10:28 -0500)]
lib, bgpd: Add ability to specify that some json output should not be pretty
Initial commit: 23b2a7ef524c9fe083b217c7f6ebaec0effc8f52
changed the json output of `show bgp <afi> <safi> json` to
not have pretty print because when under a situation where
there are a bunch of routes with a large scale ecmp show
output was taking forever and this commit cut 2 minutes out
of vtysh run time.
When upgrading to latest version the long run time was noticed
due to testing. Let's add back this functionality such that
FRR can have reduced run times with vtysh when it's really
needed.
Chirag Shah [Tue, 24 Jan 2023 06:18:24 +0000 (22:18 -0800)]
bgpd: evpn route detail json display non prett
For BGP evpn route table detail json to use
non pretty form of display.
Problem:
In scaled evpn route table detail json dump
occupies high resources (CPU + memory) of the system.
In high scale evpn route dump using pretty form
hogs CPU for a while which can trigger watchfrr
to kill bgpd.
Solution:
Avoid pretty JSON print for detail version dump
- bfdd: fix ipv4 socket source selection
- bgpd : fix crash for `set ipv4/ipv6 vpn next-hop` command
- bgpd: stop overriding nexthop when bgp unnumbered
- bgpd: fix aggregated routes are withdrawn abnormally
- bgpd: fix a few memory leaks
- build: enable pim6d by default
- build: fix sed regex in lua macro
- doc : add freebsd 13 build docs
- isisd: fix memory leak
- lib: disable vrf before terminating interfaces
- lib: do not log `echo ping` commands from watchfrr
- ospf6d: fix infinite loop when adding asbr route
- ospfd: fix rfc conformance test cases 25.19 and 27.6
- ospfd: fix typo and report the P2P link name in the warning
- ospfd: report the router IP with opaque capability mismatch
- ospfd: fixing memory leak
- pimd: consistently ignore prefix list mask len
- staticd: do not crash when modifying an existing static route with color
- zebra: free all memory associated ctx->u.iptable.interface_name_list
- zebra: fix tracepoint changes for lttng
- zebra: free up route map name memory on vrf deletion event
- zebra: use `mpls enable`, not `mpls` when generating a config
- tools: Ignore agentx command for frr-reload.py
David Lamparter [Fri, 6 Jan 2023 15:59:55 +0000 (16:59 +0100)]
lib: disable xref ELF note on mips64el
mips64el does not have a 64-bit PC-relative relocation, which is needed
to emit the ELF note for xrefs. Disabling the ELF note means clippy
takes the fallback path using section headers, so everything does still
work (... unless you strip the section headers.)
David Lamparter [Fri, 6 Jan 2023 15:55:38 +0000 (16:55 +0100)]
zebra: do not load/store wider-than-ptr atomics
Most 32-bit architectures cannot do atomic loads and stores of data
wider than their pointer size, i.e. 32 bit. Funnily enough they
generally *can* do a CAS2, i.e., 64-bit compare-and-swap, but while a
CAS can emulate atomic add/bitops, loads and stores aren't available.
Replace with a mutex; since this is 99% used from the zserv thread, the
mutex should take the local-to-thread fast path anyway. And while one
atomic might be faster than a mutex lock/unlock, we're doing several
here, and at some point a mutex wins on speed anyway.
This fixes build on armel, mipsel, m68k, powerpc, and sh4.
Donald Sharp [Wed, 21 Dec 2022 13:04:34 +0000 (08:04 -0500)]
vtysh: Remove double retrieve of env VTYSH_HISTFILE
The code is double checking the VTYSH_HISTFILE env variable,
additionally clang-15 when running SA over it doesn't fully
understand the code pattern. Reduce the double check to
one check to reduce program run-time (ha!) and make SA happy.
liuze03 [Wed, 7 Dec 2022 04:22:43 +0000 (12:22 +0800)]
bgpd: Fix aggregated routes are withdrawn abnormally.
The withdraw message and announcement message of a prefix are received continuously within 50ms, which may lead to abnormal aggregation route reference count.
Steps to reproduce:
--------------------------
step1:
local config aggregate route 111.0.0.0/24
received route:111.0.0.1/32 111.0.0.02/32
ref_count:2
step2:
peer withdraw 111.0.0.1/32 and network 111.0.0.1/32 in 50ms
received route:111.0.0.1/32 111.0.0.02/32
ref_count:1
step3:
peer withdraw 111.0.0.1/32
received route:111.0.0.02/32
ref_count:0
aggregate route will be withdrawn abnormally
Donald Sharp [Thu, 22 Dec 2022 00:26:58 +0000 (19:26 -0500)]
bgpd: bgp_connected_add memory was being leaked in some cases
On shutdown, bgp calls an unlock for the bnc connected table,
via the bgp_connected_cleanup function. This function is
only ever called on shutdown, so we know that bgp is going
away. The refcount for the connected data can be more than
1. Let's not worry about the refcount on shutdown and
just delete the nodes instead of leaving them around.
Donald Sharp [Wed, 21 Dec 2022 17:11:56 +0000 (12:11 -0500)]
bgpd: static routes are leaked on shutdown
Shutdown of bgp results in both the bgp_path_info,
bgp_dest and bgp_table's not being freed because
the bgp_path_info remains locked.
Effectively static routes are scheduled for deletion but bgp_process
skips the work because the work queue sees that the bgp router
is marked for deletion. Effectively not doing any work and leaving
data on the floor.
Modify the code when attempting to put into the work queue to
notice and not do so but just unlock the path info.
This is effectively the same as what goes on for normal peering
as that it checks for shutdown and just calls bgp_path_info_free
too.
Donatas Abraitis [Thu, 22 Dec 2022 15:55:40 +0000 (17:55 +0200)]
tools: Ignore agentx command for frr-reload.py
agentx can't be disabled once enabled, so we should ignore it for frr-reload.py.
```
$ /usr/lib/frr/frr-reload.py --reload /etc/frr/bgpd.conf --bindir /usr/local/bin
"no agentx" we failed to remove this command
SNMP AgentX support cannot be disabled once enabled
```
Ryoga Saito [Tue, 6 Dec 2022 17:27:16 +0000 (02:27 +0900)]
tests: Add topotest bgp_vrf_leaking_5549_routes
To verify previous changes, this PR introduces topotest to verify
whether imported routes learnt from BGP unnumbered peers will be active
on VPN RIB and other VRF RIB.
Ryoga Saito [Sun, 4 Dec 2022 14:51:45 +0000 (23:51 +0900)]
bgpd: Stop overriding nexthop when BGP unnumberred
When we use vrf-to-vrf export, the nexthop has already not been
overridden when the peer is BGP unnumberred. However, when we use normal
export, the nexthop will be oberridden. This behavior will make the VPN
routes invalid in VPN RIB.
This PR stops overriding nexthop even if we use normal export.
rgirada [Tue, 22 Nov 2022 09:59:40 +0000 (09:59 +0000)]
ospfd: Fixing memleak.
Description:
As part of signal handler ospf_finish_final(), lsas are originated
and added to refresh queues are not freed.
One such leak is :
==2869285== 432 (40 direct, 392 indirect) bytes in 1 blocks are definitely lost in loss record 159 of 221
==2869285== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==2869285== by 0x4910EC3: qcalloc (memory.c:116)
==2869285== by 0x199024: ospf_refresher_register_lsa (ospf_lsa.c:4017)
==2869285== by 0x199024: ospf_refresher_register_lsa (ospf_lsa.c:3979)
==2869285== by 0x19A37F: ospf_network_lsa_install (ospf_lsa.c:2680)
==2869285== by 0x19A37F: ospf_lsa_install (ospf_lsa.c:2941)
==2869285== by 0x19C18F: ospf_network_lsa_update (ospf_lsa.c:1099)
==2869285== by 0x1931ED: ism_change_state (ospf_ism.c:556)
==2869285== by 0x1931ED: ospf_ism_event (ospf_ism.c:596)
==2869285== by 0x494E0B0: thread_call (thread.c:2006)
==2869285== by 0x494E395: _thread_execute (thread.c:2098)
==2869285== by 0x19FBC6: nsm_change_state (ospf_nsm.c:695)
==2869285== by 0x19FBC6: ospf_nsm_event (ospf_nsm.c:861)
==2869285== by 0x494E0B0: thread_call (thread.c:2006)
==2869285== by 0x494E395: _thread_execute (thread.c:2098)
==2869285== by 0x19020B: ospf_if_cleanup (ospf_interface.c:322)
==2869285== by 0x192D0C: ism_interface_down (ospf_ism.c:393)
==2869285== by 0x193028: ospf_ism_event (ospf_ism.c:584)
==2869285== by 0x494E0B0: thread_call (thread.c:2006)
==2869285== by 0x494E395: _thread_execute (thread.c:2098)
==2869285== by 0x190F10: ospf_if_down (ospf_interface.c:851)
==2869285== by 0x1911D6: ospf_if_free (ospf_interface.c:341)
==2869285== by 0x1E6E98: ospf_finish_final (ospfd.c:748)
==2869285== by 0x1E6E98: ospf_deferred_shutdown_finish (ospfd.c:578)
==2869285== by 0x1E7727: ospf_finish (ospfd.c:682)
==2869285== by 0x1E7727: ospf_terminate (ospfd.c:652)
==2869285== by 0x18852B: sigint (ospf_main.c:105)
==2869285== by 0x493BE12: frr_sigevent_process (sigevent.c:130)
==2869285== by 0x494DCD4: thread_fetch (thread.c:1775)
==2869285== by 0x4905022: frr_run (libfrr.c:1197)
==2869285== by 0x187891: main (ospf_main.c:235)
Added a fix to cleanup all these queue pointers and corresponing lsas in it.
Rafael Zalamena [Thu, 24 Nov 2022 14:16:18 +0000 (11:16 -0300)]
bfdd: fix IPv4 socket source selection
The imported BFD code had some logic to ignore the source address when
using single hop IPv4. The BFD peer socket function should allow the
source to be selected so we can:
1. Select the source address in the outgoing packets
2. Only receive packets from that specific source