Added this api to fill all multicast group address based on IP version.
For PIMv4 its 224.0.0.0/4, for PIMv6 its FF00::0/8.
Changed the code where its being used currently.
pim6d: Modify pim_*_cmd_worker api passing pim_addr parameter
Pass pim_addr as parameter for rp address to accomodate ipv6.
Modifying pim_rp_cmd_worker and pim_no_rp_cmd_worker function
parameters from in_addr to pim_addr.
Changes in the caller functions are done as well to make it work
for IPv6.
pim6d: Return type and parameter changes for api pim_rp_del_config
1. Return value of this function pim_rp_del_config is nowhere used.
So made it as a void function.
2. Paramater const char *rp is first converted to string from prefix
in the caller and then back to prefix in this api pim_rp_del_config.
Fixed it by directly passing the address instead of string.
Modified the bgp_clear_stale_route function to have
better indentation, but in the process changed some
`continue;` statements to `break;` which modified
the looping and caused stale paths to not always be
removed upon an update.
To reproduce: A ---- B, setup with addpath and GR
One side has a prefix with nhop1 and nhop2, kill one
side and then resend the same prefix with nhop3,
paths nhop1 and 2 become stale and never removed.
Code inspection clearly shows that that `continue`
statements became `break` statements causing the
loop over all paths to stop prematurely.
The fix is to change the break back to continue
statements so the loop can continue instead of
stopping.
Rafael Zalamena [Mon, 21 Feb 2022 11:28:11 +0000 (06:28 -0500)]
lib: tweak northbound gRPC default timeout
Don't let open sockets hang for too long. This will fix an issue where a
improperly coded client (e.g. socat) could exaust the amount of open
file descriptors.
lib: Route-map failed for OSPF routes even for matching prefixes
This issue is applicable to other protocols as well.
When user has used route-map, even though the prefixes are falling
under the permit rule, the prefixes were denied and were shown
as inactive route in zebra.
Reason being the parameter which is of type enum was passed to the api
route_map_get_index and was typecasted to uint8_t *.
This problem is visible in case of Big Endian systems because we are
accessing the most significant byte.
'match_ret' field is an enum in the caller and so it is of 4 bytes,
the typecasting it to 1 byte and passing it to the api made
the api to put the value in the most significant byte
which was already zero previously. Therefore the actual value
RMAP_NOMATCH which was 1 never gets reset in this case.
Therefore the api always returns 'RMAP_NOMATCH' and hence
the prefixes are always denied.
Chirag Shah [Thu, 4 Jun 2020 16:41:31 +0000 (09:41 -0700)]
zebra: fix crash in evpn neigh cleanup all
zebra crash is seen during shutdown (frr restart).
During shutdown, remote neigh and remote mac clean up
is triggered first, followed by per vni all neigh
(including local) and macs cleanup is triggered.
The crash occurs when a remote mac is cleaned up first
and its reference is remained in local neigh.
When local neigh attempt removes itself from its associated
mac's neigh_list it triggers inaccessible memory crash.
The fix is during mac deletion if its neigh_list is non-empty
then retain the MAC in AUTO state.
This can arise when MAC and neigh duo are in different state
(remote/local). Otherwise, the order of cleanup operation
is neighs followed by macs.
The auto mac will be cleaned up when per vni all neighs and macs
are cleaned up.
On FreeBSD I have noticed that subsuquent calls to clock_gettime(..)
can return an after time that is before first calls value.
This in turn is generating CPU_HOG's because the subtraction
is wrapping into very very large numbers:
2022/02/28 20:12:58 SHARP: [PTDQA-70FG5] start: 35.741981000 now: 35.740581000
2022/02/28 20:12:58 SHARP: [XK9YH-ZD8FA][EC 100663313] CPU HOG: task zclient_read (800744240) ran for 0ms (cpu time 18446744073709550ms)
(Please note I added the first line of debug to figure this issue out).
I have been asked to open a FreeBSD bug report and have done so.
In the mean time I think that it is important that FRR does
not generate bogus CPU HOG's on FreeBSD ( especially since
this may or may not be easily fixed and FRR has no control
over what version of the operating system, operators are
going to be running with FRR.
So, add a bit of specialized code that checks to see if
the after time in FreeBSD is before the now time in
thread_consumed_time and do some quick manipulations
to not have this issue.
Donald Sharp [Sun, 27 Feb 2022 19:18:09 +0000 (14:18 -0500)]
doc: Update documentation to indicate *BSD struggles
*BSD has some special struggles associated with the graceful
restart code in zebra. Add a bit of documentation to outline
this problem and how it is solved.
Donald Sharp [Sun, 27 Feb 2022 19:11:13 +0000 (14:11 -0500)]
zebra: Get zebra graceful restart working when restarting on *BSD
Upon restart zebra reads in the kernel state. Under linux
there is a mechanism to read the route and convert the protocol
to the correct internal FRR protocol to allow the zebra graceful
restart efforts to work properly.
Under *BSD I do not see a mechanism to convey the original FRR
protocol into the kernel and thus back out of it. Thus when
zebra crashes ( or restarts ) the routes read back in are kernel
routes and are effectively lost to the system and FRR cannot
remove them properly. Why? Because FRR see's kernel routes
as routes that it should not own and in general the admin
distance for those routes will be a better one than the
admin distance from a routing protocol. This is even
worse because when the graceful restart timer pops and rib_sweep
is run, FRR becomes out of sync with the state of the kernel forwarding
on *BSD.
On restart, notice that the route is a self route that there
is no way to know it's originating protocol. In this case
let's set the protocol to ZEBRA_ROUTE_STATIC and set the admin
distance to 255.
This way when an upper level protocol reinstalls it's route
the general zebra graceful restart code still works. The
high admin distance allows the code to just work in a way
that is graceful( HA! )
The drawback here is that the route shows up as a static
route for the time the system is doing it's work. FRR
could introduce *another* route type but this seems like
a bad idea and the STATIC route type is loosely analagous
to the type of route it has become.
Donald Sharp [Sun, 27 Feb 2022 19:00:41 +0000 (14:00 -0500)]
zebra: Prevent crash if ZEBRA_ROUTE_ALL is used for a route type
FRR will crash when the re->type is a ZEBRA_ROUTE_ALL and it
is inserted into the meta-queue. Let's just put some basic
code in place to prevent a crash from happening. No routing
protocol should be using ZEBRA_ROUTE_ALL as a value but
bugs do happen. Let's just accept the weird route type
gracefully and move on.
Has broken `make check` with recently new compilers:
/usr/bin/ld: staticd/libstatic.a(static_nb_config.o): warning: relocation against `zebra_ecmp_count' in read-only section `.text'
CCLD tests/bgpd/test_peer_attr
CCLD tests/bgpd/test_packet
/usr/bin/ld: staticd/libstatic.a(static_zebra.o): in function `static_zebra_capabilities':
/home/sharpd/frr5/staticd/static_zebra.c:208: undefined reference to `zebra_ecmp_count'
/usr/bin/ld: staticd/libstatic.a(static_zebra.o): in function `static_zebra_route_add':
/home/sharpd/frr5/staticd/static_zebra.c:418: undefined reference to `zebra_ecmp_count'
/usr/bin/ld: staticd/libstatic.a(static_nb_config.o): in function `static_nexthop_create':
/home/sharpd/frr5/staticd/static_nb_config.c:174: undefined reference to `zebra_ecmp_count'
/usr/bin/ld: /home/sharpd/frr5/staticd/static_nb_config.c:175: undefined reference to `zebra_ecmp_count'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
collect2: error: ld returned 1 exit status
make: *** [Makefile:8679: tests/lib/test_grpc] Error 1
make: *** Waiting for unfinished jobs....
Essentially the newly introduced variable zebra_ecmp_count is not available in the
libstatic.a compiled and make check has code that compiles against it.
The fix is to just move the variable to the library.