Donald Sharp [Fri, 25 Mar 2022 11:44:55 +0000 (07:44 -0400)]
bgpd: Fix possible insufficient stream data
When reading the BGP_PREFIX_SID_SRV6_L3_SERVICE_SID_STRUCTURE
it is possible that the length read in the packet is insufficiently
large enough to read a BGP_PREFIX_SID_SRV6_L3_SERVICE_SID_STRUCTURE.
Let's ensure that it is.
Fixes: #10860 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Donald Sharp [Sat, 26 Mar 2022 20:20:53 +0000 (16:20 -0400)]
lib: Ensure order of operations is expected with SECONDS
These 3 values:
ONE_DAY_SECOND
ONE_WEEK_SECOND
ONE_YEAR_SECOND
Are defined based upon the number of seconds. Unfortunately doing math
on these values say something like:
days = t->tv_sec / ONE_DAY_SECOND;
Once you go over about a day causes the order of operations to cause the multiplication
to get messed up:
204 if (!t)
(gdb) n
207 w = d = h = m = ms = 0;
(gdb) set t->tv_sec = ONE_DAY_SECOND + 30
(gdb) n
208 memset(buf, 0, size);
(gdb)
210 us = t->tv_usec;
(gdb)
211 if (us >= 1000) {
(gdb)
212 ms = us / 1000;
(gdb)
213 us %= 1000;
(gdb)
217 if (ms >= 1000) {
(gdb)
222 if (t->tv_sec > ONE_WEEK_SECOND) {
(gdb)
227 if (t->tv_sec > ONE_DAY_SECOND) {
(gdb)
228 d = t->tv_sec / ONE_DAY_SECOND;
(gdb) n
229 t->tv_sec -= d * ONE_DAY_SECOND;
(gdb) n
232 if (t->tv_sec >= HOUR_IN_SECONDS) {
(gdb) p d
$6 = 2073600
(gdb) p t->tv_sec
$7 = -179158953570
(gdb)
Converting to adding paranthesis around around the ONE_DAY_SECOND causes
the order of operations to work as expected.
Donald Sharp [Fri, 25 Mar 2022 00:02:33 +0000 (20:02 -0400)]
zebra: Fix use after deletion event in freebsd
In the FreeBSD code if you delete the interface
and it has no configuration, the ifp pointer will
be deleted from the system *but* zebra continues
to dereference the just freed pointer.
==58624== Invalid read of size 1
==58624== at 0x48539F3: strlcpy (in /usr/local/libexec/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==58624== by 0x2B0565: ifreq_set_name (ioctl.c:48)
==58624== by 0x2B0565: if_get_flags (ioctl.c:416)
==58624== by 0x2B2D9E: ifan_read (kernel_socket.c:455)
==58624== by 0x2B2D9E: kernel_read (kernel_socket.c:1403)
==58624== by 0x499F46E: thread_call (thread.c:2002)
==58624== by 0x495D2B7: frr_run (libfrr.c:1196)
==58624== by 0x2B40B8: main (main.c:471)
==58624== Address 0x6baa7f0 is 64 bytes inside a block of size 432 free'd
==58624== at 0x484ECDC: free (in /usr/local/libexec/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==58624== by 0x4953A64: if_delete (if.c:283)
==58624== by 0x2A93C1: if_delete_update (interface.c:874)
==58624== by 0x2B2DF3: ifan_read (kernel_socket.c:453)
==58624== by 0x2B2DF3: kernel_read (kernel_socket.c:1403)
==58624== by 0x499F46E: thread_call (thread.c:2002)
==58624== by 0x495D2B7: frr_run (libfrr.c:1196)
==58624== by 0x2B40B8: main (main.c:471)
==58624== Block was alloc'd at
==58624== at 0x4851381: calloc (in /usr/local/libexec/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==58624== by 0x496A022: qcalloc (memory.c:116)
==58624== by 0x49546BC: if_new (if.c:164)
==58624== by 0x49546BC: if_create_name (if.c:218)
==58624== by 0x49546BC: if_get_by_name (if.c:603)
==58624== by 0x2B1295: ifm_read (kernel_socket.c:628)
==58624== by 0x2A7FB6: interface_list (if_sysctl.c:129)
==58624== by 0x2E99C8: zebra_ns_enable (zebra_ns.c:127)
==58624== by 0x2E99C8: zebra_ns_init (zebra_ns.c:214)
==58624== by 0x2B3FF2: main (main.c:401)
==58624==
Zebra needs to pass back whether or not the ifp pointer
was freed when if_delete_update is called and it should
then check in ifan_read as well as ifm_read that the
ifp pointer is still valid for use.
Donatas Abraitis [Thu, 24 Mar 2022 10:00:57 +0000 (12:00 +0200)]
bgpd: Turn off thread when running `no bmp targets X`
Avoid use-after-free and prevent from crashing:
```
(gdb) bt
0 raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
1 0x00007f2a15c2c30d in core_handler (signo=11, siginfo=0x7fffb915e630, context=<optimized out>) at lib/sigevent.c:261
2 <signal handler called>
3 0x00007f2a156201e4 in bmp_stats (thread=<optimized out>) at bgpd/bgp_bmp.c:1330
4 0x00007f2a15c3d553 in thread_call (thread=thread@entry=0x7fffb915ebf0) at lib/thread.c:2001
5 0x00007f2a15bfa570 in frr_run (master=0x55c43a393ae0) at lib/libfrr.c:1196
6 0x000055c43930627c in main (argc=<optimized out>, argv=<optimized out>) at bgpd/bgp_main.c:519
(gdb)
```
Manoj Naragund [Thu, 17 Mar 2022 11:28:02 +0000 (16:58 +0530)]
ospf6d: crash in ospf6_decrement_retrans_count.
Problem:
ospf6d crash is observed when lsack is received from the neighbour for
AS External LSA.
RCA:
The crash is observed in ospf6_decrement_retrans_count while decrementing
retransmit counter for the LSA when lsack is recived. This is because in
ospf6_flood_interace when new LSA is being added to the neighbour's list
the incrementing is happening on the received LSA instead of the already
present LSA in scope DB which is already carrying counters.
when this new LSA replaces the old one, the already present counters are
not copied on the new LSA this creates counter mismatch which results in
a crash when lsack is recevied due to counter going to negative.
Fix:
The fix involves following changes.
1. In ospf6_flood_interace when LSA is being added to retrans list
check if there is alreday lsa in the scoped db and increment
the counter on that if present.
2. In ospf6_lsdb_add copy the retrans counter from old to new lsa
when its being replaced.
Donald Sharp [Sat, 12 Mar 2022 16:05:23 +0000 (11:05 -0500)]
zebra: prefixlen is not afi/safi dependant in encoding nexthops
When encoding a response to the upper level protocol the
prefixlen is not something that needs to be part of the
switch statement for handling of a prefix.
Donald Sharp [Sat, 12 Mar 2022 15:47:16 +0000 (10:47 -0500)]
*: When matching against a nexthop send and process what it matched against
Currently the nexthop tracking code is only sending to the requestor
what it was requested to match against. When the nexthop tracking
code was simplified to not need an import check and a nexthop check
in b8210849b8ac1abe2d5d9a5ab2459abfde65efa5 for bgpd. It was not
noticed that a longer prefix could match but it would be seen
as a match because FRR was not sending up both the resolved
route prefix and the route FRR was asked to match against.
This code change causes the nexthop tracking code to pass
back up the matched requested route (so that the calling
protocol can figure out which one it is being told about )
as well as the actual prefix that was matched to.
Rafael Zalamena [Mon, 21 Feb 2022 11:28:11 +0000 (06:28 -0500)]
lib: tweak northbound gRPC default timeout
Don't let open sockets hang for too long. This will fix an issue where a
improperly coded client (e.g. socat) could exaust the amount of open
file descriptors.
Passing argument "&rec" of type "struct pfx_record *" and argument
"1UL" to function "read" is suspicious because
"sizeof (struct pfx_record) /*40*/" is expected.
The FRRouting community would like to announce FRR Release 8.2.
This release consists of just over 800 commits from 62 authors.
Selected features and bug fixes are listed below.
babeld:
Fix the checks for truncated packets
bfdd:
Correct one spelling error of comment
Fix detection timeout update
Fix possibly wrong counter of control packets
bgpd:
Add "json" option to a few more show commands
Add 'show bgp <afi> <safi> json detail' header data
Add a 6 hour warning to missing policy
Add an ability to match ipv6 next-hop by prefix-list
Add autocomplete for access-list under bmp node
Add autocomplete for as-path filters
Add autocomplete for set/match community/large/ext lists
Add long-lived graceful restart capability
Add peer-groups to neighbor autocomplete
Adjust symbolic names for cease notifications according to rfc4486
Deprecate dpa, advertiser and rcid_path path attributes
Extended bgp administrative shutdown communication
Fix crash when using "show bgp vrf all"
Fix inconsistency of match ip/ipv6 next-hop commands
Fix missing name of default vrf
Handle TCP connection errors with connection callbacks for RPKI
Implement llgr helper mode
Implement rfc9072
Support redirect import more than one route-target ipv6
docker:
Update alpine build enable set own version
isisd:
Add link state traffic engineering support
Fix router capability tlv parsing issues
Fix running-config for fast-reroute
Make isis work with default vrf name different than 'default'
ospf6d:
Add missing vrf parameter to "clear ipv6 ospf6 interface"
Add prompt for commands with non-exist vrf
Add support for nssa type-7 address ranges
Add the ability of specifying router-id/area-id in no debug ospf6
Do not originate type-4 lsa when nssa
Do not send type-5 into stub area
Fix ecmp inter-area route nexthop update
Fix memory leak for `show ipv6 ospf6 zebra json`
ospfd:
Fix wrong comparison of routemap name
Fix crash on "ospf send-extra-data zebra"
Fix incorrect detection of topology changes in helper mode
Fix loss of mixed form in "range" command
Fix no-form of "graceful-restart" command
Fix summary-address deletion
Fix wrong parsing of te subtlv
pbrd:
Add vlan actions to vty
Pbr route maps get addr family of nhgs
Protect from a possible null dereference
pimd:
Do not allow 224.0.0.0/24 range in igmp join
Fix igmp user config
Fix msdp mesh grp with wildcard member addr
Fix stale forwarding entries left around after join goes away
Fix FRR drops IGMP packets for TOS value other than 0XC0
redhat:
Check if frr.conf already exists
Logrotate file has typo for staticd
ripd:
Fix packet send for non primary addresses
vtysh:
Add missing rpki node when showing config
Improve startup time by ca. ×6
remove `address-family evpn`
watchfrr:
Allow an integrated config to work within a namespace
zebra:
Add optional nhg id output to `show ip ro`
Add resolver flag for nexthop in json
Add support for json output in srv6 locator detail command
Don't lose next hop weights while exporting via fpm
Fix buffer overflow
Fix netns deletion
Fix route-map application when when using vrfs
* Contributors
Abhishek Naik
Adriano Marto Reis
Ahmad Caracalli
anlan_cs
Anuradha Karuppiah
ARShreenidhi
Baptiste Jonglez
Chirag Shah
Christian Hopps
ckishimo
David Lamparter
David Schweizer
Donald Lee
Donald Sharp
Donatas Abraitis
Eli Baum
ewlumpkin
Fabrice Fontaine
Fredi Raspall
github login name
Hiroki Shirokura
Igor Ryzhov
Iqra Siddiqui
Jafar Al-Gharaibeh
Javier Garcia
Jonas Gorski
Juraj Vijtiuk
Kantesh Mundaragi
Karel Van Hecke
kiselev99
LEI BAO
Lou Berger
Louis Scalbert
Manoj Naragund
Mark Stapp
Marlin Cremers
Martin Buck
Martin Winter
Mobashshera Rasool
Olivier Dugeon
Philippe Guibert
Punith Kumar
qingkaishi
Quentin Young
Rafael Zalamena
Renato Westphal
rgirada
ron
Ruslan Babayev
Ryoga Saito
Sai Gomathi
Sarita Patra
Solyn
Stephen Worley
Tomi Salminen
Trey Aspelund
wangshengjun
Xiao Liang
Yamato Sugawara
Yuan Yuan
zyxwvu Shi
pimd: FRR drops IGMP packets for TOS value other than 0XC0
Currently the code is expecting the TOS value for received
packet to be 0xC0 and hence it is discarding packets having
TOS value other than 0xc0.
We need to make sure that we are sending the packet with
TOS 0xC0 and while receiving we can allow any TOS value.
Let's follow Postel's law.
Checked Cisco behavior as well. It also accepts all TOS values.
On FreeBSD I have noticed that subsuquent calls to clock_gettime(..)
can return an after time that is before first calls value.
This in turn is generating CPU_HOG's because the subtraction
is wrapping into very very large numbers:
2022/02/28 20:12:58 SHARP: [PTDQA-70FG5] start: 35.741981000 now: 35.740581000
2022/02/28 20:12:58 SHARP: [XK9YH-ZD8FA][EC 100663313] CPU HOG: task zclient_read (800744240) ran for 0ms (cpu time 18446744073709550ms)
(Please note I added the first line of debug to figure this issue out).
I have been asked to open a FreeBSD bug report and have done so.
In the mean time I think that it is important that FRR does
not generate bogus CPU HOG's on FreeBSD ( especially since
this may or may not be easily fixed and FRR has no control
over what version of the operating system, operators are
going to be running with FRR.
So, add a bit of specialized code that checks to see if
the after time in FreeBSD is before the now time in
thread_consumed_time and do some quick manipulations
to not have this issue.
Donald Sharp [Sun, 27 Feb 2022 19:00:41 +0000 (14:00 -0500)]
zebra: Prevent crash if ZEBRA_ROUTE_ALL is used for a route type
FRR will crash when the re->type is a ZEBRA_ROUTE_ALL and it
is inserted into the meta-queue. Let's just put some basic
code in place to prevent a crash from happening. No routing
protocol should be using ZEBRA_ROUTE_ALL as a value but
bugs do happen. Let's just accept the weird route type
gracefully and move on.
Donald Sharp [Sun, 27 Feb 2022 19:11:13 +0000 (14:11 -0500)]
zebra: Get zebra graceful restart working when restarting on *BSD
Upon restart zebra reads in the kernel state. Under linux
there is a mechanism to read the route and convert the protocol
to the correct internal FRR protocol to allow the zebra graceful
restart efforts to work properly.
Under *BSD I do not see a mechanism to convey the original FRR
protocol into the kernel and thus back out of it. Thus when
zebra crashes ( or restarts ) the routes read back in are kernel
routes and are effectively lost to the system and FRR cannot
remove them properly. Why? Because FRR see's kernel routes
as routes that it should not own and in general the admin
distance for those routes will be a better one than the
admin distance from a routing protocol. This is even
worse because when the graceful restart timer pops and rib_sweep
is run, FRR becomes out of sync with the state of the kernel forwarding
on *BSD.
On restart, notice that the route is a self route that there
is no way to know it's originating protocol. In this case
let's set the protocol to ZEBRA_ROUTE_STATIC and set the admin
distance to 255.
This way when an upper level protocol reinstalls it's route
the general zebra graceful restart code still works. The
high admin distance allows the code to just work in a way
that is graceful( HA! )
The drawback here is that the route shows up as a static
route for the time the system is doing it's work. FRR
could introduce *another* route type but this seems like
a bad idea and the STATIC route type is loosely analagous
to the type of route it has become.
Donald Sharp [Sun, 27 Feb 2022 19:18:09 +0000 (14:18 -0500)]
doc: Update documentation to indicate *BSD struggles
*BSD has some special struggles associated with the graceful
restart code in zebra. Add a bit of documentation to outline
this problem and how it is solved.
Has broken `make check` with recently new compilers:
/usr/bin/ld: staticd/libstatic.a(static_nb_config.o): warning: relocation against `zebra_ecmp_count' in read-only section `.text'
CCLD tests/bgpd/test_peer_attr
CCLD tests/bgpd/test_packet
/usr/bin/ld: staticd/libstatic.a(static_zebra.o): in function `static_zebra_capabilities':
/home/sharpd/frr5/staticd/static_zebra.c:208: undefined reference to `zebra_ecmp_count'
/usr/bin/ld: staticd/libstatic.a(static_zebra.o): in function `static_zebra_route_add':
/home/sharpd/frr5/staticd/static_zebra.c:418: undefined reference to `zebra_ecmp_count'
/usr/bin/ld: staticd/libstatic.a(static_nb_config.o): in function `static_nexthop_create':
/home/sharpd/frr5/staticd/static_nb_config.c:174: undefined reference to `zebra_ecmp_count'
/usr/bin/ld: /home/sharpd/frr5/staticd/static_nb_config.c:175: undefined reference to `zebra_ecmp_count'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
collect2: error: ld returned 1 exit status
make: *** [Makefile:8679: tests/lib/test_grpc] Error 1
make: *** Waiting for unfinished jobs....
Essentially the newly introduced variable zebra_ecmp_count is not available in the
libstatic.a compiled and make check has code that compiles against it.
The fix is to just move the variable to the library.
Donald Sharp [Sat, 26 Feb 2022 20:40:15 +0000 (15:40 -0500)]
zebra: Allow *BSD to specify a receive buffer size
End operator is reporting that they are receiving buffer overruns
when attempting to read from the kernel receive socket. It is
possible to adjust this size to more modern levels especially
for when the system is under load. Modify the code base
so that *BSD operators can use the zebra `-s XXX` option
to specify a read buffer.
Additionally setup the default receive buffer size on *BSD
to be 128k instead of the 8k so that FRR does not run into
this issue again.
rgirada [Thu, 24 Feb 2022 17:33:08 +0000 (09:33 -0800)]
ospfd: NULL passed instead of ei pointer in external lsa origination
Description:
NULL pointer wrongly passed instead of 'ei' pointer to
ospf_external_lsa_originate() API in opaque capability enable/disable
which always make it to fail in origination.
Corrected it by passing actual ei pointer.
Donald Sharp [Fri, 18 Feb 2022 15:45:46 +0000 (10:45 -0500)]
bfdd: Fix overflow possibility with time statements
If time ( a uint64_t ) is large enough doing division
and subtraction can still lead to situations where
the resulting number is greater than a uint32_t.
Just use uint32_t as an intermediate storage spot.
This is unlikely to every occur in a time frame
I could possibly care about but makes Coverity happy.
Donald Sharp [Wed, 16 Feb 2022 00:47:23 +0000 (19:47 -0500)]
ripd: Fix packet send for non primary addresses
When rip is configured to work on secondary addresses
on an interface, rip was not properly sending out
the packets on secondary addresses because the source of the
packet was never properly being setup and rip would
send the packet out multiple times for the primary address
not once for each address on the interface that is setup to work.
Donald Sharp [Tue, 15 Feb 2022 20:53:30 +0000 (15:53 -0500)]
bgpd: Convert bgp error codes for cli input to an enum
Conversion of bgp error codes returned for cli input into
an enum and then properly handling all the error cases
in bgp_vty_return.
Because not all error codes returned were properly handled
in this function there existed configuration examples that
were accepted on the cli without an error message but not
saved.
Donald Sharp [Tue, 15 Feb 2022 21:04:50 +0000 (16:04 -0500)]
bgpd: Move some error codes to bgp_vty_return handling
BGP_ERR_PEER_GROUP_MEMBER and BGP_ERR_PEER_GROUP_PEER_TYPE_DIFFERENT
both are not handled by bgp_vty_return, but both can be handled by
this function as that there is nothing special going on here.
Donald Sharp [Tue, 15 Feb 2022 20:54:53 +0000 (15:54 -0500)]
bgpd: Remove impossible invalid state
confederations are checking to see that the bgp pointer
is non-null. But it's impossible to have a null pointer
in the cli and in all paths we have already deref'ed the bgp
pointer. Let's remove that error code as that it is impossible
to happen.
Donald Sharp [Mon, 14 Feb 2022 12:57:45 +0000 (07:57 -0500)]
bgp: Add a 15 minute warning to missing policy
Add a 15 minute warning to the logging system when
bgp policy is not setup properly. Operators keep asking
about the missing policy( on upgrade typically ). Let's
try to give them a bit more of a hint when something is
going wrong as that they are clearly missing the other
various places FRR tells them about it.