Donald Sharp [Sat, 12 Mar 2022 16:05:23 +0000 (11:05 -0500)]
zebra: prefixlen is not afi/safi dependent in encoding nexthops
When encoding a response to the upper level protocol the
prefixlen is not something that needs to be part of the
switch statement for handling of a prefix.
Donald Sharp [Sat, 12 Mar 2022 15:47:16 +0000 (10:47 -0500)]
*: When matching against a nexthop send and process what it matched against
Currently the nexthop tracking code is only sending to the requestor
what it was requested to match against. When the nexthop tracking
code was simplified to not need an import check and a nexthop check
in b8210849b8ac1abe2d5d9a5ab2459abfde65efa5 for bgpd, it was not
noticed that a longer prefix could match and would still be seen
as a match, because FRR was not sending up both the resolved
route prefix and the route FRR was asked to match against.
This code change causes the nexthop tracking code to pass
back up the matched requested route (so that the calling
protocol can figure out which one it is being told about)
as well as the actual prefix that was matched to.
Rafael Zalamena [Mon, 21 Feb 2022 11:28:11 +0000 (06:28 -0500)]
lib: tweak northbound gRPC default timeout
Don't let open sockets hang for too long. This will fix an issue where an
improperly coded client (e.g. socat) could exhaust the number of open
file descriptors.
Passing argument "&rec" of type "struct pfx_record *" and argument
"1UL" to function "read" is suspicious because
"sizeof (struct pfx_record) /*40*/" is expected.
The FRRouting community would like to announce FRR Release 8.2.
This release consists of just over 800 commits from 62 authors.
Selected features and bug fixes are listed below.
babeld:
Fix the checks for truncated packets
bfdd:
Correct one spelling error of comment
Fix detection timeout update
Fix possibly wrong counter of control packets
bgpd:
Add "json" option to a few more show commands
Add 'show bgp <afi> <safi> json detail' header data
Add a 6 hour warning to missing policy
Add an ability to match ipv6 next-hop by prefix-list
Add autocomplete for access-list under bmp node
Add autocomplete for as-path filters
Add autocomplete for set/match community/large/ext lists
Add long-lived graceful restart capability
Add peer-groups to neighbor autocomplete
Adjust symbolic names for cease notifications according to rfc4486
Deprecate dpa, advertiser and rcid_path path attributes
Extended bgp administrative shutdown communication
Fix crash when using "show bgp vrf all"
Fix inconsistency of match ip/ipv6 next-hop commands
Fix missing name of default vrf
Handle TCP connection errors with connection callbacks for RPKI
Implement llgr helper mode
Implement rfc9072
Support redirect import more than one route-target ipv6
docker:
Update alpine build enable set own version
isisd:
Add link state traffic engineering support
Fix router capability tlv parsing issues
Fix running-config for fast-reroute
Make isis work with default vrf name different than 'default'
ospf6d:
Add missing vrf parameter to "clear ipv6 ospf6 interface"
Add prompt for commands with non-existent vrf
Add support for nssa type-7 address ranges
Add the ability of specifying router-id/area-id in no debug ospf6
Do not originate type-4 lsa when nssa
Do not send type-5 into stub area
Fix ecmp inter-area route nexthop update
Fix memory leak for `show ipv6 ospf6 zebra json`
ospfd:
Fix wrong comparison of routemap name
Fix crash on "ospf send-extra-data zebra"
Fix incorrect detection of topology changes in helper mode
Fix loss of mixed form in "range" command
Fix no-form of "graceful-restart" command
Fix summary-address deletion
Fix wrong parsing of te subtlv
pbrd:
Add vlan actions to vty
Pbr route maps get addr family of nhgs
Protect from a possible null dereference
pimd:
Do not allow 224.0.0.0/24 range in igmp join
Fix igmp user config
Fix msdp mesh grp with wildcard member addr
Fix stale forwarding entries left around after join goes away
Fix FRR drops IGMP packets for TOS value other than 0XC0
redhat:
Check if frr.conf already exists
Logrotate file has typo for staticd
ripd:
Fix packet send for non primary addresses
vtysh:
Add missing rpki node when showing config
Improve startup time by ca. ×6
remove `address-family evpn`
watchfrr:
Allow an integrated config to work within a namespace
zebra:
Add optional nhg id output to `show ip ro`
Add resolver flag for nexthop in json
Add support for json output in srv6 locator detail command
Don't lose next hop weights while exporting via fpm
Fix buffer overflow
Fix netns deletion
Fix route-map application when using vrfs
* Contributors
Abhishek Naik
Adriano Marto Reis
Ahmad Caracalli
anlan_cs
Anuradha Karuppiah
ARShreenidhi
Baptiste Jonglez
Chirag Shah
Christian Hopps
ckishimo
David Lamparter
David Schweizer
Donald Lee
Donald Sharp
Donatas Abraitis
Eli Baum
ewlumpkin
Fabrice Fontaine
Fredi Raspall
github login name
Hiroki Shirokura
Igor Ryzhov
Iqra Siddiqui
Jafar Al-Gharaibeh
Javier Garcia
Jonas Gorski
Juraj Vijtiuk
Kantesh Mundaragi
Karel Van Hecke
kiselev99
LEI BAO
Lou Berger
Louis Scalbert
Manoj Naragund
Mark Stapp
Marlin Cremers
Martin Buck
Martin Winter
Mobashshera Rasool
Olivier Dugeon
Philippe Guibert
Punith Kumar
qingkaishi
Quentin Young
Rafael Zalamena
Renato Westphal
rgirada
ron
Ruslan Babayev
Ryoga Saito
Sai Gomathi
Sarita Patra
Solyn
Stephen Worley
Tomi Salminen
Trey Aspelund
wangshengjun
Xiao Liang
Yamato Sugawara
Yuan Yuan
zyxwvu Shi
pimd: FRR drops IGMP packets for TOS value other than 0XC0
Currently the code is expecting the TOS value for received
packet to be 0xC0 and hence it is discarding packets having
TOS value other than 0xc0.
We need to make sure that we are sending the packet with
TOS 0xC0 and while receiving we can allow any TOS value.
Let's follow Postel's law.
Checked Cisco behavior as well. It also accepts all TOS values.
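A minimal sketch of the policy described above, assuming a plain IPv4 socket. The names demo_set_tx_tos and demo_rx_tos_acceptable are hypothetical and are not pimd functions; only setsockopt() with IP_TOS is a real API.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <sys/socket.h>

/* TOS value used on transmit (Internetwork Control precedence). */
#define DEMO_IGMP_TX_TOS 0xC0

/* Hypothetical helper: mark every packet sent on fd with TOS 0xC0.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
static int demo_set_tx_tos(int fd)
{
	int tos = DEMO_IGMP_TX_TOS;

	assert(fd >= 0);
	return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}

/* Receive side: be liberal in what we accept, so any TOS passes. */
static int demo_rx_tos_acceptable(unsigned char tos)
{
	(void)tos; /* deliberately not compared against 0xC0 */
	return 1;
}
```

The strict half of Postel's law lives in the sender; the receiver performs no TOS comparison at all.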
On FreeBSD I have noticed that subsequent calls to clock_gettime(..)
can return an after time that is before the first call's value.
This in turn is generating CPU_HOGs because the subtraction
is wrapping into very very large numbers:
2022/02/28 20:12:58 SHARP: [PTDQA-70FG5] start: 35.741981000 now: 35.740581000
2022/02/28 20:12:58 SHARP: [XK9YH-ZD8FA][EC 100663313] CPU HOG: task zclient_read (800744240) ran for 0ms (cpu time 18446744073709550ms)
(Please note I added the first line of debug to figure this issue out).
I have been asked to open a FreeBSD bug report and have done so.
In the meantime I think that it is important that FRR does
not generate bogus CPU HOGs on FreeBSD (especially since
this may or may not be easily fixed and FRR has no control
over what version of the operating system operators are
going to be running with FRR).
So, add a bit of specialized code that checks to see if
the after time in FreeBSD is before the now time in
thread_consumed_time and do some quick manipulations
to not have this issue.
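The wrap can be reproduced with a few lines of C. This is only a sketch of the idea, assuming the elapsed time is kept as unsigned microseconds. The demo_* names are hypothetical and this is not the code in thread_consumed_time.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to whole microseconds. */
static uint64_t demo_ts_to_usec(struct timespec ts)
{
	assert(ts.tv_sec >= 0 && ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
	return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Naive version: wraps to a huge value when now is before start. */
static uint64_t demo_elapsed_usec_naive(struct timespec start,
					struct timespec now)
{
	return demo_ts_to_usec(now) - demo_ts_to_usec(start);
}

/* Guarded version: treat a backwards step as zero elapsed time. */
static uint64_t demo_elapsed_usec(struct timespec start, struct timespec now)
{
	uint64_t s = demo_ts_to_usec(start);
	uint64_t n = demo_ts_to_usec(now);

	return n < s ? 0 : n - s;
}
```

With the start and now values from the debug line above, the naive difference is 2^64 - 1400 microseconds, which integer division by 1000 turns into the 18446744073709550ms printed in the CPU HOG message. The guarded version returns 0 instead.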
Donald Sharp [Sun, 27 Feb 2022 19:00:41 +0000 (14:00 -0500)]
zebra: Prevent crash if ZEBRA_ROUTE_ALL is used for a route type
FRR will crash when the re->type is a ZEBRA_ROUTE_ALL and it
is inserted into the meta-queue. Let's just put some basic
code in place to prevent a crash from happening. No routing
protocol should be using ZEBRA_ROUTE_ALL as a value but
bugs do happen. Let's just accept the weird route type
gracefully and move on.
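One way to picture the guard is a bounds check in front of a per-type lookup table. The sketch below uses made-up route types and queue numbers. None of the demo_* names or values come from zebra, and the real meta-queue mapping may look quite different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical route types; DEMO_ROUTE_ALL is a wildcard past the real range. */
enum demo_route_type {
	DEMO_ROUTE_KERNEL,
	DEMO_ROUTE_CONNECT,
	DEMO_ROUTE_STATIC,
	DEMO_ROUTE_BGP,
	DEMO_ROUTE_MAX, /* number of real types */
	DEMO_ROUTE_ALL, /* wildcard, never a valid type for a route */
};

static_assert((int)DEMO_ROUTE_ALL >= (int)DEMO_ROUTE_MAX,
	      "wildcard must lie outside the lookup table");

/* Sub-queue index for each real route type. */
static const unsigned char demo_queue_map[DEMO_ROUTE_MAX] = {
	[DEMO_ROUTE_KERNEL] = 0,
	[DEMO_ROUTE_CONNECT] = 0,
	[DEMO_ROUTE_STATIC] = 1,
	[DEMO_ROUTE_BGP] = 2,
};

/* Catch-all sub-queue for anything the table does not know about. */
#define DEMO_QUEUE_OTHER 3

/* Never index the table with an out-of-range type; fall back instead. */
static unsigned char demo_queue_for_type(int type)
{
	if (type < 0 || type >= DEMO_ROUTE_MAX)
		return DEMO_QUEUE_OTHER;
	return demo_queue_map[type];
}
```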
Donald Sharp [Sun, 27 Feb 2022 19:11:13 +0000 (14:11 -0500)]
zebra: Get zebra graceful restart working when restarting on *BSD
Upon restart zebra reads in the kernel state. Under linux
there is a mechanism to read the route and convert the protocol
to the correct internal FRR protocol to allow the zebra graceful
restart efforts to work properly.
Under *BSD I do not see a mechanism to convey the original FRR
protocol into the kernel and thus back out of it. Thus when
zebra crashes (or restarts) the routes read back in are kernel
routes and are effectively lost to the system and FRR cannot
remove them properly. Why? Because FRR sees kernel routes
as routes that it should not own and in general the admin
distance for those routes will be a better one than the
admin distance from a routing protocol. This is even
worse because when the graceful restart timer pops and rib_sweep
is run, FRR becomes out of sync with the state of the kernel forwarding
on *BSD.
On restart, notice that the route is a self route and that there
is no way to know its originating protocol. In this case
let's set the protocol to ZEBRA_ROUTE_STATIC and set the admin
distance to 255.
This way when an upper level protocol reinstalls its route
the general zebra graceful restart code still works. The
high admin distance allows the code to just work in a way
that is graceful (HA!).
The drawback here is that the route shows up as a static
route for the time the system is doing its work. FRR
could introduce *another* route type but this seems like
a bad idea and the STATIC route type is loosely analogous
to the type of route it has become.
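The relabeling step can be sketched as below. The struct, the enum and the self_installed flag are all hypothetical stand-ins. How zebra actually recognizes a self route on *BSD is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical protocol tags standing in for the ZEBRA_ROUTE_* values. */
enum demo_proto { DEMO_PROTO_KERNEL, DEMO_PROTO_STATIC, DEMO_PROTO_BGP };

struct demo_route {
	enum demo_proto proto;
	unsigned char distance;
	bool self_installed; /* assumed flag: the daemon installed this route */
};

/* Worst possible admin distance, so any reinstalled route wins. */
#define DEMO_DISTANCE_WORST 255

/* On restart, adopt self routes whose originating protocol is unknown. */
static void demo_adopt_on_restart(struct demo_route *r)
{
	assert(r != NULL);
	if (r->self_installed && r->proto == DEMO_PROTO_KERNEL) {
		r->proto = DEMO_PROTO_STATIC;
		r->distance = DEMO_DISTANCE_WORST;
	}
}
```

Because the adopted route carries distance 255, the route an upper level protocol reinstalls later is always preferred over it.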
Donald Sharp [Sun, 27 Feb 2022 19:18:09 +0000 (14:18 -0500)]
doc: Update documentation to indicate *BSD struggles
*BSD has some special struggles associated with the graceful
restart code in zebra. Add a bit of documentation to outline
this problem and how it is solved.
Has broken `make check` with recently new compilers:
/usr/bin/ld: staticd/libstatic.a(static_nb_config.o): warning: relocation against `zebra_ecmp_count' in read-only section `.text'
CCLD tests/bgpd/test_peer_attr
CCLD tests/bgpd/test_packet
/usr/bin/ld: staticd/libstatic.a(static_zebra.o): in function `static_zebra_capabilities':
/home/sharpd/frr5/staticd/static_zebra.c:208: undefined reference to `zebra_ecmp_count'
/usr/bin/ld: staticd/libstatic.a(static_zebra.o): in function `static_zebra_route_add':
/home/sharpd/frr5/staticd/static_zebra.c:418: undefined reference to `zebra_ecmp_count'
/usr/bin/ld: staticd/libstatic.a(static_nb_config.o): in function `static_nexthop_create':
/home/sharpd/frr5/staticd/static_nb_config.c:174: undefined reference to `zebra_ecmp_count'
/usr/bin/ld: /home/sharpd/frr5/staticd/static_nb_config.c:175: undefined reference to `zebra_ecmp_count'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
collect2: error: ld returned 1 exit status
make: *** [Makefile:8679: tests/lib/test_grpc] Error 1
make: *** Waiting for unfinished jobs....
Essentially the newly introduced variable zebra_ecmp_count is not available in the
compiled libstatic.a, and make check has code that links against it.
The fix is to just move the variable to the library.
Donald Sharp [Sat, 26 Feb 2022 20:40:15 +0000 (15:40 -0500)]
zebra: Allow *BSD to specify a receive buffer size
End operator is reporting that they are receiving buffer overruns
when attempting to read from the kernel receive socket. It is
possible to adjust this size to more modern levels especially
for when the system is under load. Modify the code base
so that *BSD operators can use the zebra `-s XXX` option
to specify a read buffer.
Additionally set up the default receive buffer size on *BSD
to be 128k instead of 8k so that FRR does not run into
this issue again.
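Resizing a socket receive buffer is done with SO_RCVBUF. A minimal sketch, assuming a generic socket descriptor. demo_set_rcvbuf and DEMO_RCVBUF_DEFAULT are hypothetical names, and this is not the zebra kernel socket code.

```c
#include <assert.h>
#include <sys/socket.h>

/* Default used when the operator does not ask for a specific size. */
#define DEMO_RCVBUF_DEFAULT (128 * 1024)

/* Hypothetical helper: request a receive buffer of `bytes` (0 = default).
 * Returns the size the kernel reports afterwards, or -1 on error. */
static int demo_set_rcvbuf(int fd, int bytes)
{
	int got = 0;
	socklen_t len = sizeof(got);

	assert(fd >= 0 && bytes >= 0);
	if (bytes == 0)
		bytes = DEMO_RCVBUF_DEFAULT;
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
		return -1;
	/* Read the value back: the kernel may round or clamp the request. */
	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len) < 0)
		return -1;
	return got;
}
```

The value read back need not equal the request, so callers should log what they actually got rather than what they asked for.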
rgirada [Thu, 24 Feb 2022 17:33:08 +0000 (09:33 -0800)]
ospfd: NULL passed instead of ei pointer in external lsa origination
Description:
A NULL pointer was wrongly passed instead of the 'ei' pointer to the
ospf_external_lsa_originate() API in opaque capability enable/disable,
which always made origination fail.
Corrected it by passing the actual ei pointer.
Donald Sharp [Fri, 18 Feb 2022 15:45:46 +0000 (10:45 -0500)]
bfdd: Fix overflow possibility with time statements
If time ( a uint64_t ) is large enough doing division
and subtraction can still lead to situations where
the resulting number is greater than a uint32_t.
Just use uint32_t as an intermediate storage spot.
This is unlikely to ever occur in a time frame
I could possibly care about but makes Coverity happy.
Donald Sharp [Wed, 16 Feb 2022 00:47:23 +0000 (19:47 -0500)]
ripd: Fix packet send for non primary addresses
When rip is configured to work on secondary addresses
on an interface, rip was not properly sending out
the packets on secondary addresses because the source of the
packet was never properly set up. Instead rip would
send the packet out multiple times for the primary address,
not once for each address on the interface that is set up to work.
Donald Sharp [Tue, 15 Feb 2022 20:53:30 +0000 (15:53 -0500)]
bgpd: Convert bgp error codes for cli input to an enum
Conversion of bgp error codes returned for cli input into
an enum and then properly handling all the error cases
in bgp_vty_return.
Because not all error codes returned were properly handled
in this function there existed configuration examples that
were accepted on the cli without an error message but not
saved.
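The benefit of an enum here is that the compiler can check the switch for completeness. A sketch with made-up error codes and messages follows. The real BGP_ERR_* values and the body of bgp_vty_return are not reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CLI result codes. */
enum demo_cli_err {
	DEMO_CLI_OK,
	DEMO_CLI_ERR_INVALID_VALUE,
	DEMO_CLI_ERR_PEER_GROUP_MEMBER,
	DEMO_CLI_ERR_PEER_GROUP_TYPE,
};

/* Map a result code to the message shown to the operator, NULL on success.
 * There is no default label on purpose: with -Wswitch (part of -Wall in GCC,
 * on by default in Clang) a new enumerator without a case draws a warning. */
static const char *demo_cli_err_str(enum demo_cli_err err)
{
	switch (err) {
	case DEMO_CLI_OK:
		return NULL;
	case DEMO_CLI_ERR_INVALID_VALUE:
		return "% Invalid value";
	case DEMO_CLI_ERR_PEER_GROUP_MEMBER:
		return "% Peer is a member of a peer-group";
	case DEMO_CLI_ERR_PEER_GROUP_TYPE:
		return "% Peer type differs from the peer-group";
	}
	assert(!"value outside enum demo_cli_err");
	return "% Unknown error";
}
```

Every non-OK code yields a message, so a rejected command can no longer look as if it had been accepted.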
Donald Sharp [Tue, 15 Feb 2022 21:04:50 +0000 (16:04 -0500)]
bgpd: Move some error codes to bgp_vty_return handling
BGP_ERR_PEER_GROUP_MEMBER and BGP_ERR_PEER_GROUP_PEER_TYPE_DIFFERENT
are both not handled by bgp_vty_return, but both can be handled by
this function as there is nothing special going on here.
Donald Sharp [Tue, 15 Feb 2022 20:54:53 +0000 (15:54 -0500)]
bgpd: Remove impossible invalid state
Confederations are checking to see that the bgp pointer
is non-null. But it's impossible to have a null pointer
in the cli, and in all paths we have already deref'ed the bgp
pointer. Let's remove that error code since it is impossible
for it to happen.
Donald Sharp [Mon, 14 Feb 2022 12:57:45 +0000 (07:57 -0500)]
bgp: Add a 15 minute warning to missing policy
Add a 15 minute warning to the logging system when
bgp policy is not set up properly. Operators keep asking
about the missing policy (typically on upgrade). Let's
try to give them a bit more of a hint when something is
going wrong, as they are clearly missing the other
various places FRR tells them about it.
Igor Ryzhov [Wed, 9 Feb 2022 23:51:49 +0000 (02:51 +0300)]
tools: fix frr-reload context keywords
There are single-line commands inside `router bgp` that start with
`vnc ` or `bmp `. Those commands are currently treated as node-entering
commands. We need to specify such commands more precisely.
Igor Ryzhov [Wed, 9 Feb 2022 22:23:41 +0000 (01:23 +0300)]
bgpd: fix aspath memleak on error in vnc_direct_bgp_add_nve
bgp_attr_default_set creates a new empty aspath. If a family error happens,
this aspath is not freed. Move attr initialization after the family
check.
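The pattern is "validate before you allocate". A small sketch with hypothetical demo_* names, where a counter stands in for a leak checker. It is not the vnc_direct_bgp_add_nve code.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/socket.h>

/* Hypothetical attribute that owns one heap allocation. */
struct demo_attr {
	int *aspath;
};

static int demo_live_allocs; /* outstanding allocations, for the demo only */

static void demo_attr_init(struct demo_attr *a)
{
	a->aspath = calloc(1, sizeof(*a->aspath));
	assert(a->aspath != NULL);
	demo_live_allocs++;
}

static void demo_attr_free(struct demo_attr *a)
{
	free(a->aspath);
	a->aspath = NULL;
	demo_live_allocs--;
}

/* Returns 0 on success, -1 for an unsupported address family. */
static int demo_add_route(int family)
{
	struct demo_attr attr;

	/* Validate first: nothing is allocated yet, so bailing out is safe. */
	if (family != AF_INET && family != AF_INET6)
		return -1;

	demo_attr_init(&attr);
	/* ... the attribute would be used here ... */
	demo_attr_free(&attr);
	return 0;
}
```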
Juraj Vijtiuk [Wed, 13 Oct 2021 16:32:53 +0000 (18:32 +0200)]
isisd: fix router capability TLV parsing issues
isis_tlvs.c would fail at multiple places if incorrect TLVs were
received causing stream assertion violations.
This patch fixes the issues by adding missing length checks, missing
consumed length updates and handling malformed Segment Routing subTLVs.
Signed-off-by: Juraj Vijtiuk <juraj.vijtiuk@sartura.hr>
Small adjustments by Igor Ryzhov:
- fix incorrect replacement of srgb by srlb on lines 3052 and 3054
- add length check for ISIS_SUBTLV_ALGORITHM
- fix conflict in fuzzing data during rebase
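The kind of length checking this patch adds can be sketched on a generic layout of one type octet, one length octet and then the value. demo_tlv_count is a hypothetical function and is not taken from isis_tlvs.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walk a buffer of {type:1, length:1, value:length} records.
 * Returns the number of well-formed TLVs, or -1 if any is truncated. */
static int demo_tlv_count(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int count = 0;

	assert(buf != NULL || len == 0);
	while (off < len) {
		/* Header check: type and length octets must both be present. */
		if (len - off < 2)
			return -1;
		uint8_t vlen = buf[off + 1];
		off += 2; /* account for the consumed header */

		/* Body check: the claimed length must fit in what remains. */
		if (vlen > len - off)
			return -1;
		off += vlen; /* account for the consumed value */
		count++;
	}
	return count;
}
```

Both checks happen before any value octet is read, and the consumed length is updated after every step, so a malformed TLV ends in an error return instead of a read past the end of the buffer.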
Donald Sharp [Wed, 1 Dec 2021 22:03:38 +0000 (17:03 -0500)]
lib: Update hash.h documentation to warn of a possible crash
Multiple deletions from the hash_walk or hash_iteration calls
during a single invocation of the passed in function can and
will cause the program to crash. Warn against doing such a
thing.
Donald Sharp [Wed, 1 Dec 2021 21:28:42 +0000 (16:28 -0500)]
zebra: Ensure zebra_nhg_sweep_table accounts for double deletes
I'm seeing this crash in various forms:
Program terminated with signal SIGSEGV, Segmentation fault.
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f418efbc7c0 (LWP 3580253))]
(gdb) bt
(gdb) f 4
267 (*func)(hb, arg);
(gdb) p hb
$1 = (struct hash_bucket *) 0x558cdaafb250
(gdb) p *hb
$2 = {len = 0, next = 0x0, key = 0, data = 0x0}
(gdb)
I've also seen a crash where data is 0x03.
My suspicion is that hash_iterate is calling zebra_nhg_sweep_entry which
does delete the particular entry we are looking at as well as possibly other
entries when the ref count for those entries gets set to 0 as well.
Then we have this loop in hash_iterate.c:
    for (i = 0; i < hash->size; i++)
        for (hb = hash->index[i]; hb; hb = hbnext) {
            /* get pointer to next hash bucket here, in case (*func)
             * decides to delete hb by calling hash_release
             */
            hbnext = hb->next;
            (*func)(hb, arg);
        }
Suppose in the previous loop hbnext is set to hb->next and we call
zebra_nhg_sweep_entry. This deletes the previous entry and also
happens to cause the hbnext entry to be deleted as well, because of nhg
refcounts. At this point in time the memory pointed to by hbnext is
not owned by the pthread anymore and we can end up in a state where
it's overwritten by another pthread in zebra with data for other incoming events.
What to do? Let's change the sweep function to a hash_walk and have
it stop iterating and start over if there is a possible double
delete operation.
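The restart-on-delete idea can be shown on a much simpler structure. The sketch below uses a singly linked list with reference counts instead of the FRR hash API. Every demo_* name is hypothetical, and it only illustrates why a saved next pointer cannot be trusted once a callback may free more than one entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_entry {
	struct demo_entry *next;
	struct demo_entry *depends; /* entry we hold a reference on, or NULL */
	int refcnt;
};

static struct demo_entry *demo_head;

/* Push a new entry onto the list; the caller accounts for refcounts. */
static struct demo_entry *demo_add(struct demo_entry *depends, int refcnt)
{
	struct demo_entry *e = calloc(1, sizeof(*e));

	assert(e != NULL);
	e->depends = depends;
	e->refcnt = refcnt;
	e->next = demo_head;
	demo_head = e;
	return e;
}

/* Free e, then drop its reference; this may cascade into more frees. */
static void demo_unlink_free(struct demo_entry *e)
{
	struct demo_entry **pp = &demo_head;
	struct demo_entry *dep = e->depends;

	while (*pp && *pp != e)
		pp = &(*pp)->next;
	assert(*pp == e);
	*pp = e->next;
	free(e);
	if (dep && --dep->refcnt == 0)
		demo_unlink_free(dep);
}

/* Sweep unreferenced entries. Returns the number of restarts needed. */
static int demo_sweep(void)
{
	int restarts = 0;
	bool restart;

	do {
		restart = false;
		for (struct demo_entry *e = demo_head; e; e = e->next) {
			if (e->refcnt == 0) {
				demo_unlink_free(e); /* may free several */
				restarts++;
				restart = true;
				break; /* e and e->next are now suspect */
			}
		}
	} while (restart);
	return restarts;
}
```

Starting over after every deletion costs extra passes, but no pointer obtained before the deletion is ever followed afterwards.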
qingkaishi [Fri, 4 Feb 2022 21:41:11 +0000 (16:41 -0500)]
babeld: fix #10502 #10503 by repairing the checks on length
This patch repairs the checking conditions on length in four functions:
babel_packet_examin, parse_hello_subtlv, parse_ihu_subtlv, and parse_update_subtlv