Optional recognized and unrecognized BGP attributes,
whether transitive or non-transitive, SHOULD NOT be updated by the
route server (unless enforced by local IXP operator configuration)
and SHOULD be passed on to other route server clients.
By default LB ext-community works with iBGP peers. When we receive a route
from eBGP peer, we can send LB ext-community to iBGP peers.
With this patch, allow sending LB ext-community to iBGP/eBGP peers if they
are set as RS clients.
FRR does not send non-transitive ext-communities to eBGP peers, but for
example GoBGP sends and if it's set as RS client, we should pass those attributes
towards another RS client.
bgpd: Do not forget to update conditional advertisements rmaps for peer-groups
When the peer is configured for the first time:
```
neighbor P1 peer-group
neighbor P1 remote-as external
neighbor P1 advertise-map ADV exist-map EXIST
neighbor 10.10.10.1 peer-group P1
```
Conditional advertisements route-maps are not updated and cond. advertisements
do not work until FRR restarted. BGP sessions clear does not help.
Or even changing peer-group for a peer, causes this bug to kick in.
```
no neighbor 10.10.10.1
neighbor 10.10.10.1 peer-group P2
```
With this fix, cond. advertisements start working immediatelly.
bgpd: Allow setting BGP [large]community in route-maps
Before:
```
spine1-debian-11(config-route-map)# bgp community alias 65001:65001 test1
spine1-debian-11(config)# route-map rm permit 10
spine1-debian-11(config-route-map)# set community 65001:65001
% Malformed communities attribute
```
After:
```
spine1-debian-11(config)# bgp community alias 65001:65001 test1
spine1-debian-11(config)# route-map rm permit 10
spine1-debian-11(config-route-map)# set community 65001:65001
spine1-debian-11(config-route-map)#
```
Donald Sharp [Thu, 31 Mar 2022 17:07:06 +0000 (13:07 -0400)]
pimd: Send immediate join( with possible SG RPT prune bit set
When pimd has this setup:
src ----- rtr ------ receiver
|
rp
And the receiver sends a *,G join to rtr. When the
src starts sending a S,G, rtr can wait up to one join/prune
interval before sending a S,G rpt prune. This interval
causes the pimreg device to be in the S,G OIL as that the
RP does not know to prune this leg off.
rtr receives a *,G igmp join, sends a *,G join towards the rp
rtr receives a S,G packet <WRVIFWHOLE>
creates the S,G upstream, sends the register packet to the rp
the rp sees that it still has downstream interest so it forwards the packet on
After (up to 60 seconds ) the rtr, sends the normally scheduled join for
the G and sends the S,GRPT prune as part of it.
What is being done to fix it:
In wrvifwhole handling, when pimd detects that this is the FHR
and is not the RP *and* the incoming interface for the *,G
is different than the incomding interface for the S,G immediately
send a single join packet for the G( which will have the S,G RPT
prune in it ). Only do this on the first time receiving the
WRVIFWHOLE.
Donald Sharp [Fri, 1 Apr 2022 12:47:38 +0000 (08:47 -0400)]
tests: Reduce some pim test timings to more manageable levels
a) Remove the retry mechanism to continue looking for 75%
of the time for pim code.
This alone saves a bunch of time in tests that use lib/pim.py
Effectively all the times given for retry are already long
enough. Additionally some tests are gathering data with
the expectation that they will not find data so the entire
time is being taken up in retry's. Extending the retry
mechanism makes this even worse. This is especially bad
for pim in that keep alive timers are counting down and
state can be removed due to excessive time waiting.
b) Reduce verify verify_multicast_traffic from 40 seconds
to 20 seconds to gather traffic data.
A bunch of tests are doing this:
a) gather pre test start traffic data( taking about 70
seconds to run, because a bunch of time it was looking
for data that does not exist yet)
b) run a change to introduce a different traffic flow
c) gather post test traffic data ( taking about 70
seconds to run )
Why does this matter? Tests were iterating through
all the different routers looking for traffic flow
as well as different mroute state. This is against
the keepalive timer of 210 seconds. It does not take
long before the stream can be removed and the test is
still looking for data that is no longer there due
to state timeout.
The multicast_pim_sm_topo3/test_multicast_pim_sm_topo3.py
test reduced run time from 398 seconds to 297 seconds.
Greatly reducing keepalive timeout problems.
Trey Aspelund [Thu, 31 Mar 2022 16:35:18 +0000 (16:35 +0000)]
zebra: don't send RAs w/o LLv6 or on bridge-ports
It's confusing for a user to see 'Tx RA failed' in the logs when
they've enabled RAs (either through interface config or BGP unnumbered)
on an interface that can't send them. Let's avoid sending RAs on
interfaces that are bridge_slaves or don't have a link-local address,
since they are the two of the most common reasons for RA Tx failures.
Donald Sharp [Thu, 31 Mar 2022 19:56:24 +0000 (15:56 -0400)]
isisd, lib, ospfd, pathd: Null out free'd pointer
The commands:
router isis 1
mpls-te on
no mpls-te on
mpls-te on
no mpls-te on
!
Will crash
Valgrind gives us this:
==652336== Invalid read of size 8
==652336== at 0x49AB25C: typed_rb_min (typerb.c:495)
==652336== by 0x4943B54: vertices_const_first (link_state.h:424)
==652336== by 0x493DCE4: vertices_first (link_state.h:424)
==652336== by 0x493DADC: ls_ted_del_all (link_state.c:1010)
==652336== by 0x47E77B: isis_instance_mpls_te_destroy (isis_nb_config.c:1871)
==652336== by 0x495BE20: nb_callback_destroy (northbound.c:1131)
==652336== by 0x495B5AC: nb_callback_configuration (northbound.c:1356)
==652336== by 0x4958127: nb_transaction_process (northbound.c:1473)
==652336== by 0x4958275: nb_candidate_commit_apply (northbound.c:906)
==652336== by 0x49585B8: nb_candidate_commit (northbound.c:938)
==652336== by 0x495CE4A: nb_cli_classic_commit (northbound_cli.c:64)
==652336== by 0x495D6C5: nb_cli_apply_changes_internal (northbound_cli.c:250)
==652336== Address 0x6f928e0 is 272 bytes inside a block of size 320 free'd
==652336== at 0x48399AB: free (vg_replace_malloc.c:538)
==652336== by 0x494BA30: qfree (memory.c:141)
==652336== by 0x493D99D: ls_ted_del (link_state.c:997)
==652336== by 0x493DC20: ls_ted_del_all (link_state.c:1018)
==652336== by 0x47E77B: isis_instance_mpls_te_destroy (isis_nb_config.c:1871)
==652336== by 0x495BE20: nb_callback_destroy (northbound.c:1131)
==652336== by 0x495B5AC: nb_callback_configuration (northbound.c:1356)
==652336== by 0x4958127: nb_transaction_process (northbound.c:1473)
==652336== by 0x4958275: nb_candidate_commit_apply (northbound.c:906)
==652336== by 0x49585B8: nb_candidate_commit (northbound.c:938)
==652336== by 0x495CE4A: nb_cli_classic_commit (northbound_cli.c:64)
==652336== by 0x495D6C5: nb_cli_apply_changes_internal (northbound_cli.c:250)
==652336== Block was alloc'd at
==652336== at 0x483AB65: calloc (vg_replace_malloc.c:760)
==652336== by 0x494B6F8: qcalloc (memory.c:116)
==652336== by 0x493D7D2: ls_ted_new (link_state.c:967)
==652336== by 0x47E4DD: isis_instance_mpls_te_create (isis_nb_config.c:1832)
==652336== by 0x495BB29: nb_callback_create (northbound.c:1034)
==652336== by 0x495B547: nb_callback_configuration (northbound.c:1348)
==652336== by 0x4958127: nb_transaction_process (northbound.c:1473)
==652336== by 0x4958275: nb_candidate_commit_apply (northbound.c:906)
==652336== by 0x49585B8: nb_candidate_commit (northbound.c:938)
==652336== by 0x495CE4A: nb_cli_classic_commit (northbound_cli.c:64)
==652336== by 0x495D6C5: nb_cli_apply_changes_internal (northbound_cli.c:250)
==652336== by 0x495D23E: nb_cli_apply_changes (northbound_cli.c:268)
Let's null out the pointer. After this change. Valgrind no longer reports issues
and isisd no longer crashes.
Donatas Abraitis [Mon, 28 Mar 2022 08:41:35 +0000 (11:41 +0300)]
bgpd: Stop LLGR timer when the connection is established
When the connection goes up, the timer is not stopped and if we have a
subsequent GR event we have an old timer which is not as we expect.
Before:
```
spine1-debian-11# sh ip bgp 192.168.100.1/32
BGP routing table entry for 192.168.100.1/32, version 95
Paths: (1 available, best #1, table default, mark routes to be retained for a longer time. Requires support for Long-lived BGP Graceful Restart)
Not advertised to any peer
65001 47583, (stale)
192.168.0.1 from 192.168.0.1 (100.100.200.100)
Origin incomplete, valid, external, best (First path received)
Community: llgr-stale
Last update: Mon Mar 28 08:27:53 2022
Time until Long-lived stale route deleted: 23 <<<<<<<<<<<<
spine1-debian-11# sh ip bgp 192.168.100.1/32
BGP routing table entry for 192.168.100.1/32, version 103
Paths: (1 available, best #1, table default)
Advertised to non peer-group peers:
192.168.0.1
65001 47583
192.168.0.1 from 192.168.0.1 (100.100.200.100)
Origin incomplete, valid, external, best (First path received)
Last update: Mon Mar 28 08:43:29 2022
spine1-debian-11# sh ip bgp 192.168.100.1/32
BGP routing table entry for 192.168.100.1/32, version 103
Paths: (1 available, best #1, table default, mark routes to be retained for a longer time. Requires support for Long-lived BGP Graceful Restart)
Not advertised to any peer
65001 47583, (stale)
192.168.0.1 from 192.168.0.1 (100.100.200.100)
Origin incomplete, valid, external, best (First path received)
Community: llgr-stale
Last update: Mon Mar 28 08:43:30 2022
Time until Long-lived stale route deleted: 17 <<<<<<<<<<<<<<<
```
After:
```
spine1-debian-11# sh ip bgp 192.168.100.1/32
BGP routing table entry for 192.168.100.1/32, version 79
Paths: (1 available, best #1, table default, mark routes to be retained for a longer time. Requires support for Long-lived BGP Graceful Restart)
Not advertised to any peer
65001 47583, (stale)
192.168.0.1 from 192.168.0.1 (0.0.0.0)
Origin incomplete, valid, external, best (First path received)
Community: llgr-stale
Last update: Mon Mar 28 09:05:18 2022
Time until Long-lived stale route deleted: 24 <<<<<<<<<<<<<<<
spine1-debian-11# sh ip bgp 192.168.100.1/32
BGP routing table entry for 192.168.100.1/32, version 87
Paths: (1 available, best #1, table default)
Advertised to non peer-group peers:
192.168.0.1
65001 47583
192.168.0.1 from 192.168.0.1 (100.100.200.100)
Origin incomplete, valid, external, best (First path received)
Last update: Mon Mar 28 09:05:25 2022
spine1-debian-11# sh ip bgp 192.168.100.1/32
BGP routing table entry for 192.168.100.1/32, version 87
Paths: (1 available, best #1, table default, mark routes to be retained for a longer time. Requires support for Long-lived BGP Graceful Restart)
Not advertised to any peer
65001 47583, (stale)
192.168.0.1 from 192.168.0.1 (100.100.200.100)
Origin incomplete, valid, external, best (First path received)
Community: llgr-stale
Last update: Mon Mar 28 09:05:29 2022
Time until Long-lived stale route deleted: 29 <<<<<<<<<<<<<<
```
rgirada [Mon, 21 Mar 2022 09:48:46 +0000 (02:48 -0700)]
ospfd: Default route becomes stale route in nbrs even after flush from originator.
Description:
Default route is not getting flushed from neighbours though originator
triggered flush and deleted LSA from its database. It become as stale
LSA in neighbours databse forever. This could seen in the following
sequence of configurations with less than a second interval b/w configs.
And this could happen only when originator shouldnt have default route
in its rib so it originates default route only when configure with 'always'
option.
In step-1, default route will be originated to AS.
In step-2, default route will be flushed to AS, but neighbours will be
discarding this update due to minlsainterval condition.
And it is expected that DUT need to keep send this update
until it receives the ack from neighbours by adding each
neighbour's retransmission list.
In Step-3: It is deleting the lsas from nbr's retransmission list
by assuming it initiated the flush. This is cuasing to not
send the lsa update anymore to neighbours which makes
stale lsa in nbrs forever.
Fix:
Allowed to delete the lsa from retransmission list only when lsa is
not in maxage during flushing procedure.
Donald Sharp [Fri, 25 Mar 2022 11:44:55 +0000 (07:44 -0400)]
bgpd: Fix possible insufficient stream data
When reading the BGP_PREFIX_SID_SRV6_L3_SERVICE_SID_STRUCTURE
it is possible that the length read in the packet is insufficiently
large enough to read a BGP_PREFIX_SID_SRV6_L3_SERVICE_SID_STRUCTURE.
Let's ensure that it is.
Fixes: #10860 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Donald Sharp [Sat, 26 Mar 2022 20:20:53 +0000 (16:20 -0400)]
lib: Ensure order of operations is expected with SECONDS
These 3 values:
ONE_DAY_SECOND
ONE_WEEK_SECOND
ONE_YEAR_SECOND
Are defined based upon the number of seconds. Unfortunately doing math
on these values say something like:
days = t->tv_sec / ONE_DAY_SECOND;
Once you go over about a day causes the order of operations to cause the multiplication
to get messed up:
204 if (!t)
(gdb) n
207 w = d = h = m = ms = 0;
(gdb) set t->tv_sec = ONE_DAY_SECOND + 30
(gdb) n
208 memset(buf, 0, size);
(gdb)
210 us = t->tv_usec;
(gdb)
211 if (us >= 1000) {
(gdb)
212 ms = us / 1000;
(gdb)
213 us %= 1000;
(gdb)
217 if (ms >= 1000) {
(gdb)
222 if (t->tv_sec > ONE_WEEK_SECOND) {
(gdb)
227 if (t->tv_sec > ONE_DAY_SECOND) {
(gdb)
228 d = t->tv_sec / ONE_DAY_SECOND;
(gdb) n
229 t->tv_sec -= d * ONE_DAY_SECOND;
(gdb) n
232 if (t->tv_sec >= HOUR_IN_SECONDS) {
(gdb) p d
$6 = 2073600
(gdb) p t->tv_sec
$7 = -179158953570
(gdb)
Converting to adding paranthesis around around the ONE_DAY_SECOND causes
the order of operations to work as expected.
Donald Sharp [Fri, 25 Mar 2022 00:02:33 +0000 (20:02 -0400)]
zebra: Fix use after deletion event in freebsd
In the FreeBSD code if you delete the interface
and it has no configuration, the ifp pointer will
be deleted from the system *but* zebra continues
to dereference the just freed pointer.
==58624== Invalid read of size 1
==58624== at 0x48539F3: strlcpy (in /usr/local/libexec/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==58624== by 0x2B0565: ifreq_set_name (ioctl.c:48)
==58624== by 0x2B0565: if_get_flags (ioctl.c:416)
==58624== by 0x2B2D9E: ifan_read (kernel_socket.c:455)
==58624== by 0x2B2D9E: kernel_read (kernel_socket.c:1403)
==58624== by 0x499F46E: thread_call (thread.c:2002)
==58624== by 0x495D2B7: frr_run (libfrr.c:1196)
==58624== by 0x2B40B8: main (main.c:471)
==58624== Address 0x6baa7f0 is 64 bytes inside a block of size 432 free'd
==58624== at 0x484ECDC: free (in /usr/local/libexec/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==58624== by 0x4953A64: if_delete (if.c:283)
==58624== by 0x2A93C1: if_delete_update (interface.c:874)
==58624== by 0x2B2DF3: ifan_read (kernel_socket.c:453)
==58624== by 0x2B2DF3: kernel_read (kernel_socket.c:1403)
==58624== by 0x499F46E: thread_call (thread.c:2002)
==58624== by 0x495D2B7: frr_run (libfrr.c:1196)
==58624== by 0x2B40B8: main (main.c:471)
==58624== Block was alloc'd at
==58624== at 0x4851381: calloc (in /usr/local/libexec/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==58624== by 0x496A022: qcalloc (memory.c:116)
==58624== by 0x49546BC: if_new (if.c:164)
==58624== by 0x49546BC: if_create_name (if.c:218)
==58624== by 0x49546BC: if_get_by_name (if.c:603)
==58624== by 0x2B1295: ifm_read (kernel_socket.c:628)
==58624== by 0x2A7FB6: interface_list (if_sysctl.c:129)
==58624== by 0x2E99C8: zebra_ns_enable (zebra_ns.c:127)
==58624== by 0x2E99C8: zebra_ns_init (zebra_ns.c:214)
==58624== by 0x2B3FF2: main (main.c:401)
==58624==
Zebra needs to pass back whether or not the ifp pointer
was freed when if_delete_update is called and it should
then check in ifan_read as well as ifm_read that the
ifp pointer is still valid for use.
Donatas Abraitis [Thu, 24 Mar 2022 10:00:57 +0000 (12:00 +0200)]
bgpd: Turn off thread when running `no bmp targets X`
Avoid use-after-free and prevent from crashing:
```
(gdb) bt
0 raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
1 0x00007f2a15c2c30d in core_handler (signo=11, siginfo=0x7fffb915e630, context=<optimized out>) at lib/sigevent.c:261
2 <signal handler called>
3 0x00007f2a156201e4 in bmp_stats (thread=<optimized out>) at bgpd/bgp_bmp.c:1330
4 0x00007f2a15c3d553 in thread_call (thread=thread@entry=0x7fffb915ebf0) at lib/thread.c:2001
5 0x00007f2a15bfa570 in frr_run (master=0x55c43a393ae0) at lib/libfrr.c:1196
6 0x000055c43930627c in main (argc=<optimized out>, argv=<optimized out>) at bgpd/bgp_main.c:519
(gdb)
```
Manoj Naragund [Thu, 17 Mar 2022 11:28:02 +0000 (16:58 +0530)]
ospf6d: crash in ospf6_decrement_retrans_count.
Problem:
ospf6d crash is observed when lsack is received from the neighbour for
AS External LSA.
RCA:
The crash is observed in ospf6_decrement_retrans_count while decrementing
retransmit counter for the LSA when lsack is recived. This is because in
ospf6_flood_interace when new LSA is being added to the neighbour's list
the incrementing is happening on the received LSA instead of the already
present LSA in scope DB which is already carrying counters.
when this new LSA replaces the old one, the already present counters are
not copied on the new LSA this creates counter mismatch which results in
a crash when lsack is recevied due to counter going to negative.
Fix:
The fix involves following changes.
1. In ospf6_flood_interace when LSA is being added to retrans list
check if there is alreday lsa in the scoped db and increment
the counter on that if present.
2. In ospf6_lsdb_add copy the retrans counter from old to new lsa
when its being replaced.
Donald Sharp [Sat, 12 Mar 2022 16:05:23 +0000 (11:05 -0500)]
zebra: prefixlen is not afi/safi dependant in encoding nexthops
When encoding a response to the upper level protocol the
prefixlen is not something that needs to be part of the
switch statement for handling of a prefix.
Donald Sharp [Sat, 12 Mar 2022 15:47:16 +0000 (10:47 -0500)]
*: When matching against a nexthop send and process what it matched against
Currently the nexthop tracking code is only sending to the requestor
what it was requested to match against. When the nexthop tracking
code was simplified to not need an import check and a nexthop check
in b8210849b8ac1abe2d5d9a5ab2459abfde65efa5 for bgpd. It was not
noticed that a longer prefix could match but it would be seen
as a match because FRR was not sending up both the resolved
route prefix and the route FRR was asked to match against.
This code change causes the nexthop tracking code to pass
back up the matched requested route (so that the calling
protocol can figure out which one it is being told about )
as well as the actual prefix that was matched to.
Rafael Zalamena [Mon, 21 Feb 2022 11:28:11 +0000 (06:28 -0500)]
lib: tweak northbound gRPC default timeout
Don't let open sockets hang for too long. This will fix an issue where a
improperly coded client (e.g. socat) could exaust the amount of open
file descriptors.
Passing argument "&rec" of type "struct pfx_record *" and argument
"1UL" to function "read" is suspicious because
"sizeof (struct pfx_record) /*40*/" is expected.
The FRRouting community would like to announce FRR Release 8.2.
This release consists of just over 800 commits from 62 authors.
Selected features and bug fixes are listed below.
babeld:
Fix the checks for truncated packets
bfdd:
Correct one spelling error of comment
Fix detection timeout update
Fix possibly wrong counter of control packets
bgpd:
Add "json" option to a few more show commands
Add 'show bgp <afi> <safi> json detail' header data
Add a 6 hour warning to missing policy
Add an ability to match ipv6 next-hop by prefix-list
Add autocomplete for access-list under bmp node
Add autocomplete for as-path filters
Add autocomplete for set/match community/large/ext lists
Add long-lived graceful restart capability
Add peer-groups to neighbor autocomplete
Adjust symbolic names for cease notifications according to rfc4486
Deprecate dpa, advertiser and rcid_path path attributes
Extended bgp administrative shutdown communication
Fix crash when using "show bgp vrf all"
Fix inconsistency of match ip/ipv6 next-hop commands
Fix missing name of default vrf
Handle TCP connection errors with connection callbacks for RPKI
Implement llgr helper mode
Implement rfc9072
Support redirect import more than one route-target ipv6
docker:
Update alpine build enable set own version
isisd:
Add link state traffic engineering support
Fix router capability tlv parsing issues
Fix running-config for fast-reroute
Make isis work with default vrf name different than 'default'
ospf6d:
Add missing vrf parameter to "clear ipv6 ospf6 interface"
Add prompt for commands with non-exist vrf
Add support for nssa type-7 address ranges
Add the ability of specifying router-id/area-id in no debug ospf6
Do not originate type-4 lsa when nssa
Do not send type-5 into stub area
Fix ecmp inter-area route nexthop update
Fix memory leak for `show ipv6 ospf6 zebra json`
ospfd:
Fix wrong comparison of routemap name
Fix crash on "ospf send-extra-data zebra"
Fix incorrect detection of topology changes in helper mode
Fix loss of mixed form in "range" command
Fix no-form of "graceful-restart" command
Fix summary-address deletion
Fix wrong parsing of te subtlv
pbrd:
Add vlan actions to vty
Pbr route maps get addr family of nhgs
Protect from a possible null dereference
pimd:
Do not allow 224.0.0.0/24 range in igmp join
Fix igmp user config
Fix msdp mesh grp with wildcard member addr
Fix stale forwarding entries left around after join goes away
Fix FRR drops IGMP packets for TOS value other than 0XC0
redhat:
Check if frr.conf already exists
Logrotate file has typo for staticd
ripd:
Fix packet send for non primary addresses
vtysh:
Add missing rpki node when showing config
Improve startup time by ca. ×6
remove `address-family evpn`
watchfrr:
Allow an integrated config to work within a namespace
zebra:
Add optional nhg id output to `show ip ro`
Add resolver flag for nexthop in json
Add support for json output in srv6 locator detail command
Don't lose next hop weights while exporting via fpm
Fix buffer overflow
Fix netns deletion
Fix route-map application when when using vrfs
* Contributors
Abhishek Naik
Adriano Marto Reis
Ahmad Caracalli
anlan_cs
Anuradha Karuppiah
ARShreenidhi
Baptiste Jonglez
Chirag Shah
Christian Hopps
ckishimo
David Lamparter
David Schweizer
Donald Lee
Donald Sharp
Donatas Abraitis
Eli Baum
ewlumpkin
Fabrice Fontaine
Fredi Raspall
github login name
Hiroki Shirokura
Igor Ryzhov
Iqra Siddiqui
Jafar Al-Gharaibeh
Javier Garcia
Jonas Gorski
Juraj Vijtiuk
Kantesh Mundaragi
Karel Van Hecke
kiselev99
LEI BAO
Lou Berger
Louis Scalbert
Manoj Naragund
Mark Stapp
Marlin Cremers
Martin Buck
Martin Winter
Mobashshera Rasool
Olivier Dugeon
Philippe Guibert
Punith Kumar
qingkaishi
Quentin Young
Rafael Zalamena
Renato Westphal
rgirada
ron
Ruslan Babayev
Ryoga Saito
Sai Gomathi
Sarita Patra
Solyn
Stephen Worley
Tomi Salminen
Trey Aspelund
wangshengjun
Xiao Liang
Yamato Sugawara
Yuan Yuan
zyxwvu Shi
pimd: FRR drops IGMP packets for TOS value other than 0XC0
Currently the code is expecting the TOS value for received
packet to be 0xC0 and hence it is discarding packets having
TOS value other than 0xc0.
We need to make sure that we are sending the packet with
TOS 0xC0 and while receiving we can allow any TOS value.
Let's follow Postel's law.
Checked Cisco behavior as well. It also accepts all TOS values.
On FreeBSD I have noticed that subsuquent calls to clock_gettime(..)
can return an after time that is before first calls value.
This in turn is generating CPU_HOG's because the subtraction
is wrapping into very very large numbers:
2022/02/28 20:12:58 SHARP: [PTDQA-70FG5] start: 35.741981000 now: 35.740581000
2022/02/28 20:12:58 SHARP: [XK9YH-ZD8FA][EC 100663313] CPU HOG: task zclient_read (800744240) ran for 0ms (cpu time 18446744073709550ms)
(Please note I added the first line of debug to figure this issue out).
I have been asked to open a FreeBSD bug report and have done so.
In the mean time I think that it is important that FRR does
not generate bogus CPU HOG's on FreeBSD ( especially since
this may or may not be easily fixed and FRR has no control
over what version of the operating system, operators are
going to be running with FRR.
So, add a bit of specialized code that checks to see if
the after time in FreeBSD is before the now time in
thread_consumed_time and do some quick manipulations
to not have this issue.
Donald Sharp [Sun, 27 Feb 2022 19:00:41 +0000 (14:00 -0500)]
zebra: Prevent crash if ZEBRA_ROUTE_ALL is used for a route type
FRR will crash when the re->type is a ZEBRA_ROUTE_ALL and it
is inserted into the meta-queue. Let's just put some basic
code in place to prevent a crash from happening. No routing
protocol should be using ZEBRA_ROUTE_ALL as a value but
bugs do happen. Let's just accept the weird route type
gracefully and move on.
Donald Sharp [Sun, 27 Feb 2022 19:11:13 +0000 (14:11 -0500)]
zebra: Get zebra graceful restart working when restarting on *BSD
Upon restart zebra reads in the kernel state. Under linux
there is a mechanism to read the route and convert the protocol
to the correct internal FRR protocol to allow the zebra graceful
restart efforts to work properly.
Under *BSD I do not see a mechanism to convey the original FRR
protocol into the kernel and thus back out of it. Thus when
zebra crashes ( or restarts ) the routes read back in are kernel
routes and are effectively lost to the system and FRR cannot
remove them properly. Why? Because FRR see's kernel routes
as routes that it should not own and in general the admin
distance for those routes will be a better one than the
admin distance from a routing protocol. This is even
worse because when the graceful restart timer pops and rib_sweep
is run, FRR becomes out of sync with the state of the kernel forwarding
on *BSD.
On restart, notice that the route is a self route that there
is no way to know it's originating protocol. In this case
let's set the protocol to ZEBRA_ROUTE_STATIC and set the admin
distance to 255.
This way when an upper level protocol reinstalls it's route
the general zebra graceful restart code still works. The
high admin distance allows the code to just work in a way
that is graceful( HA! )
The drawback here is that the route shows up as a static
route for the time the system is doing it's work. FRR
could introduce *another* route type but this seems like
a bad idea and the STATIC route type is loosely analagous
to the type of route it has become.
Donald Sharp [Sun, 27 Feb 2022 19:18:09 +0000 (14:18 -0500)]
doc: Update documentation to indicate *BSD struggles
*BSD has some special struggles associated with the graceful
restart code in zebra. Add a bit of documentation to outline
this problem and how it is solved.
Has broken `make check` with recently new compilers:
/usr/bin/ld: staticd/libstatic.a(static_nb_config.o): warning: relocation against `zebra_ecmp_count' in read-only section `.text'
CCLD tests/bgpd/test_peer_attr
CCLD tests/bgpd/test_packet
/usr/bin/ld: staticd/libstatic.a(static_zebra.o): in function `static_zebra_capabilities':
/home/sharpd/frr5/staticd/static_zebra.c:208: undefined reference to `zebra_ecmp_count'
/usr/bin/ld: staticd/libstatic.a(static_zebra.o): in function `static_zebra_route_add':
/home/sharpd/frr5/staticd/static_zebra.c:418: undefined reference to `zebra_ecmp_count'
/usr/bin/ld: staticd/libstatic.a(static_nb_config.o): in function `static_nexthop_create':
/home/sharpd/frr5/staticd/static_nb_config.c:174: undefined reference to `zebra_ecmp_count'
/usr/bin/ld: /home/sharpd/frr5/staticd/static_nb_config.c:175: undefined reference to `zebra_ecmp_count'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
collect2: error: ld returned 1 exit status
make: *** [Makefile:8679: tests/lib/test_grpc] Error 1
make: *** Waiting for unfinished jobs....
Essentially the newly introduced variable zebra_ecmp_count is not available in the
libstatic.a compiled and make check has code that compiles against it.
The fix is to just move the variable to the library.
Donald Sharp [Sat, 26 Feb 2022 20:40:15 +0000 (15:40 -0500)]
zebra: Allow *BSD to specify a receive buffer size
End operator is reporting that they are receiving buffer overruns
when attempting to read from the kernel receive socket. It is
possible to adjust this size to more modern levels especially
for when the system is under load. Modify the code base
so that *BSD operators can use the zebra `-s XXX` option
to specify a read buffer.
Additionally setup the default receive buffer size on *BSD
to be 128k instead of the 8k so that FRR does not run into
this issue again.