Igor Zhukov [Fri, 4 Oct 2024 06:16:02 +0000 (13:16 +0700)]
zebra: Fix crash during reconnect
fpm_enqueue_rmac_table expects an fpm_rmac_arg* as its argument.
The issue can be reproduced by dropping the TCP session using:
ss -K dst 127.0.0.1 dport = 2620
I used Fedora 40 and frr 9.1.2 and I got the gdb backtrace:
(gdb) bt
0 0x00007fdd7d6997ea in fpm_enqueue_rmac_table (bucket=0x2134dd0, arg=0x2132b60) at zebra/dplane_fpm_nl.c:1217
1 0x00007fdd7dd1560d in hash_iterate (hash=0x21335f0, func=0x7fdd7d6997a0 <fpm_enqueue_rmac_table>, arg=0x2132b60) at lib/hash.c:252
2 0x00007fdd7dd1560d in hash_iterate (hash=0x1e5bf10, func=func@entry=0x7fdd7d698900 <fpm_enqueue_l3vni_table>,
arg=arg@entry=0x7ffed983bef0) at lib/hash.c:252
3 0x00007fdd7d698b5c in fpm_rmac_send (t=<optimized out>) at zebra/dplane_fpm_nl.c:1262
4 0x00007fdd7dd6ce22 in event_call (thread=thread@entry=0x7ffed983c010) at lib/event.c:1970
5 0x00007fdd7dd20758 in frr_run (master=0x1d27f10) at lib/libfrr.c:1213
6 0x0000000000425588 in main (argc=10, argv=0x7ffed983c2e8) at zebra/main.c:492
bgpd: Actually make ` --v6-with-v4-nexthops` it work
It was using `-v` which is actually a _version_.
Fixes: 0435b31bb8ed55377f83d0e19bc085abc3c71b44 ("bgpd: Allow bgp to specify if it will allow v6 routing with v4 nexthops") Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Donald Sharp [Tue, 1 Oct 2024 18:31:08 +0000 (14:31 -0400)]
lib: nexthop code should use uint16_t for nexthop counting
It's possible to specify via the cli and configure how many
nexthops that are allowed on the system. If you happen to
have > 255 then things are about to get interesting otherwise.
Donald Sharp [Mon, 30 Sep 2024 19:09:42 +0000 (15:09 -0400)]
bgpd: Cleanup multipath figuring out in bgp
Currently bgp multipath has these properties:
a) mp_info may or may not be on a single path, based
upon path perturbations in the past.
b) mp_info->count started counting at 0( meaning 1 ). As that the
bestpath path_info was never included in the count
c) The first mp_info in the list held the multipath data associated
with the multipath. As such if you were at any other node that data
was not filled in.
d) As such the mp_info's that are not first on the list basically
were just pointers to the corresponding bgp_path_info that was in
the multipath.
e) On bestpath calculation, a linklist(struct linklist *) of bgp_path_info's was
created.
f) This linklist was passed in to a comparison function that took the
old mpinfo list and compared it item by item to the linklist and
doing magic to figure out how to create a new mp_info list.
g) the old mp_info and the link list had to be memory managed and
freed up.
h) BGP_PATH_MULTIPATH is only set on non bestpath nodes in the
multipath.
This is really complicated. Let's change the algorithm to this:
a) When running bestpath, mark a bgp_path_info node that could be in the ecmp path as
BGP_PATH_MULTIPATH_NEW.
b) When running multipath, just walk the list of bgp_path_info's and if
it has BGP_PATH_MULTIPATH_NEW on it, decide if it is in BGP_MULTIPATH.
If we run out of space to put in the ecmp, clear the flag on the rest.
c) Clean up the counting of sometimes adding 1 to the mpath count.
d) Only allocate a mpath_info node for the bestpath. Clean it up
when done with it.
e) remove the unneeded list management associated with the linklist and
the mp_list.
This greatly simplifies multipath computation for bgp and reduces memory
load for large scale deployments.
Donald Sharp [Thu, 26 Sep 2024 14:46:23 +0000 (10:46 -0400)]
bgpd: Ensure mpath data is only on bestpath
The mpath data structure has data that is only relevant
for the first mpath in the list. It is not being used
anywhere else. Let's document that a bit more.
Rafael Zalamena [Mon, 30 Sep 2024 14:31:56 +0000 (11:31 -0300)]
lib: fix calloc warning on recent compiler
Fix the following compiler warning:
```
lib/elf_py.c: In function _elffile_load_:
lib/elf_py.c:1310:34: warning: _calloc_ sizes specified with _sizeof_ in the earlier argument and not in the later argument [-Wcalloc-transposed-args]
1310 | w->sects = calloc(sizeof(PyObject *), w->ehdr->e_shnum);
| ^~~~~~~~
lib/elf_py.c:1310:34: note: earlier argument should specify number of elements, later size of each element
```
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
tools: fix missing check interfaces for reloading pim
Without checking interfaces, the other interfaces' changes will be wrongly
lost.
Running config:
```
interface A
ip pim
ip pim use-source 11.0.0.1
exit
!
interface B
ip pim
ip pim use-source 22.0.0.1
exit
!
```
Reload the new config:
```
interface A
exit
!
interface B
ip pim
exit
```
Before:
```
2024-09-29 10:08:27,686 INFO: Executed "interface A no ip pim exit"
```
After:
```
2024-09-29 10:05:01,356 INFO: Executed "interface A no ip pim exit"
2024-09-29 10:05:01,376 INFO: Executed "interface B no ip pim use-source 22.0.0.1 exit"
```
pimd: fix a possible use after free bug when doing pim trace
```
ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000aecf0 at pc 0x5555557ecdb9 bp 0x7fffffffe350 sp 0x7fffffffe340
READ of size 4 at 0x6160000aecf0 thread T0
#0 0x5555557ecdb8 in igmp_source_delete pimd/pim_igmpv3.c:340
#1 0x5555557ed475 in igmp_source_delete_expired pimd/pim_igmpv3.c:405
#2 0x5555557de574 in igmp_group_timer pimd/pim_igmp.c:1346
#3 0x7ffff7275421 in event_call lib/event.c:1996
#4 0x7ffff7140797 in frr_run lib/libfrr.c:1237
#5 0x5555557f5840 in main pimd/pim_main.c:166
#6 0x7ffff6a54082 in __libc_start_main ../csu/libc-start.c:308
#7 0x555555686eed in _start (/usr/lib/frr/pimd+0x132eed)
0x6160000aecf0 is located 112 bytes inside of 600-byte region [0x6160000aec80,0x6160000aeed8)
freed by thread T0 here:
#0 0x7ffff767b40f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:122
#1 0x7ffff716ed34 in qfree lib/memory.c:131
#2 0x5555557169ae in pim_channel_oil_free pimd/pim_oil.c:84
#3 0x555555717981 in pim_channel_oil_del pimd/pim_oil.c:199
#4 0x55555573c42c in tib_sg_gm_prune pimd/pim_tib.c:196
#5 0x5555557d6d04 in igmp_source_forward_stop pimd/pim_igmp.c:229
#6 0x5555557d5855 in igmp_anysource_forward_stop pimd/pim_igmp.c:61
#7 0x5555557de539 in igmp_group_timer pimd/pim_igmp.c:1344
#8 0x7ffff7275421 in event_call lib/event.c:1996
#9 0x7ffff7140797 in frr_run lib/libfrr.c:1237
#10 0x5555557f5840 in main pimd/pim_main.c:166
#11 0x7ffff6a54082 in __libc_start_main ../csu/libc-start.c:308
previously allocated by thread T0 here:
#0 0x7ffff767ba06 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:153
#1 0x7ffff716ebe1 in qcalloc lib/memory.c:106
#2 0x555555716eb7 in pim_channel_oil_add pimd/pim_oil.c:133
#3 0x55555573b2b9 in tib_sg_oil_setup pimd/pim_tib.c:30
#4 0x55555573bdd3 in tib_sg_gm_join pimd/pim_tib.c:119
#5 0x5555557d6788 in igmp_source_forward_start pimd/pim_igmp.c:193
#6 0x5555557d5771 in igmp_anysource_forward_start pimd/pim_igmp.c:51
#7 0x5555557ecaa0 in group_exclude_fwd_anysrc_ifempty pimd/pim_igmpv3.c:310
#8 0x5555557ef937 in toex_incl pimd/pim_igmpv3.c:839
#9 0x5555557f00a2 in igmpv3_report_toex pimd/pim_igmpv3.c:938
#10 0x5555557f543d in igmp_v3_recv_report pimd/pim_igmpv3.c:2000
#11 0x5555557da2b4 in pim_igmp_packet pimd/pim_igmp.c:787
#12 0x5555556ee46a in process_igmp_packet pimd/pim_mroute.c:763
#13 0x5555556ee5f3 in pim_mroute_msg pimd/pim_mroute.c:787
#14 0x5555556eef58 in mroute_read pimd/pim_mroute.c:877
#15 0x7ffff7275421 in event_call lib/event.c:1996
#16 0x7ffff7140797 in frr_run lib/libfrr.c:1237
#17 0x5555557f5840 in main pimd/pim_main.c:166
#18 0x7ffff6a54082 in __libc_start_main ../csu/libc-start.c:308
SUMMARY: AddressSanitizer: heap-use-after-free pimd/pim_igmpv3.c:340 in igmp_source_delete
Shadow bytes around the buggy address:
0x0c2c8000dd40: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c2c8000dd50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c2c8000dd60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c2c8000dd70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c2c8000dd80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c2c8000dd90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd[fd]fd
0x0c2c8000dda0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c2c8000ddb0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c2c8000ddc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c2c8000ddd0: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
0x0c2c8000dde0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
```
Donald Sharp [Wed, 25 Sep 2024 16:09:40 +0000 (12:09 -0400)]
zebra: Correctly report metrics
Report the routes metric in IPFORWARDMETRIC1 and return
-1 for the other metrics as required by the IP-FORWARD-MIB.
inetCidrRouteMetric2 OBJECT-TYPE
SYNTAX Integer32
MAX-ACCESS read-create
STATUS current
DESCRIPTION
"An alternate routing metric for this route. The
semantics of this metric are determined by the routing-
protocol specified in the route's inetCidrRouteProto
value. If this metric is not used, its value should be
set to -1."
DEFVAL { -1 }
::= { inetCidrRouteEntry 13 }
I've included metric2 but it's the same for all of them.
Donald Sharp [Wed, 25 Sep 2024 16:06:29 +0000 (12:06 -0400)]
zebra: Fix snmp walk of zebra rib
The snmp walk of the zebra rib was skipping entries
because in_addr_cmp was replaced with a prefix_cmp
which worked slightly differently causing parts
of the zebra rib tree to be skipped.
Donald Sharp [Wed, 25 Sep 2024 13:30:37 +0000 (09:30 -0400)]
bgpd: Remove dead code from recent commit
Recent commit 4d0e7a49cf8d4311a485281fa50bbff6ee8ca6cc
brought in changes that moved a check for ret up
in the code, caused some code to be left around
and be effectively dead since it would never be called.
paths key is not there for
'show bgp l2vpn evpn route rd <rd-id> mac <mac> json' uses
evpn prefix as key for each path.
Replace the evpn prefix with "paths".
This aligned with overall EVPN RIB json output like
'show bgp l2vpn evpn route json'
'show bgp l2vpn evpn route rd <> type 2 json'
tests: Addition of AutoRP Discovery uncovered broken PIM test
With AutoRP discovery running by default, that adds a new
IGMP group that needs to be accounted for in IGMP output.
For pim.py
The clear IGMP interfaces function is in a broken state. It was
already ignoring any errors and returned true always, but with
the addition of the AutoRP discovery group, you could end up
with a different group order in the json which would cause a key
error making the test fail. For now I just added a check to
avoid the key error.
Uses hardcoded sample AutoRP packets injected in to test
message parsing and proper application of AutoRP learned
RP info. Tests mix of AutoRP and static RP's.
autorp discovery
Enables Auto RP discovery for learning dynamic RP information using the
AutoRP protocol.
autorp announce RP-ADDR [GROUP | group-list PREFIX-LIST]
Enable announcements of a candidate RP with the given group range, or
prefix list of group ranges, to an AutoRP mapping agent.
autorp announce {scope (1-255) | interval (1-65535) | holdtime (0-65535)}
Configure the parameters of the AutoRP announcement messages.
The scope sets the packet TTL.
The interval sets the time between TX of announcements.
The holdtime sets the hold time in the message, the time the mapping
agent should wait before invalidating the candidate RP information.
debug pim autorp
Enable debug logging of the AutoRP protocol
show ip pim [vrf NAME] autorp [json]
Show details of the AutoRP protocol.
To view learned RP info, use the existing command 'show ip pim rp-info'
Perform AutoRP discovery and candidate RP announcements using the
AutoRP protocol.
Mapping agent is not yet implemented, but this feature is not
necessary for FRR to support AutoRP as we only need one AutoRP
mapping agent in the network.
Donald Sharp [Mon, 23 Sep 2024 19:07:28 +0000 (15:07 -0400)]
tests: Add v4/v6 forwarding off/on
There are no tests that ensured that turning off then on
v4 and v6 forwarding actually worked. This does so.
This was found via looking at the code coverage.
bgpd: Delete EVPN VRF instace if it's not a hidden instance only
```
==5445==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000008 (pc 0x7ff4c6bedb19 bp 0x7ffc95f2e400 sp 0x7ffc95f2e3c0 T0)
==5445==The signal is caused by a READ memory access.
==5445==Hint: address points to the zero page.
#0 0x7ff4c6bedb19 in hash_iterate lib/hash.c:246
#1 0x5618f41f5f59 in bgp_evpn_nh_finish bgpd/bgp_evpn_mh.c:4663
#2 0x5618f41dcbe8 in bgp_evpn_vrf_delete bgpd/bgp_evpn.c:7336
#3 0x5618f43bdd35 in bgp_delete bgpd/bgpd.c:4098
#4 0x5618f417ef6e in bgp_exit bgpd/bgp_main.c:206
#5 0x5618f417ef6e in sigint bgpd/bgp_main.c:164
#6 0x7ff4c6cac6c4 in frr_sigevent_process lib/sigevent.c:117
#7 0x7ff4c6cd8258 in event_fetch lib/event.c:1767
#8 0x7ff4c6c0dcbc in frr_run lib/libfrr.c:1230
#9 0x5618f418080d in main bgpd/bgp_main.c:555
#10 0x7ff4c670c249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
#11 0x7ff4c670c304 in __libc_start_main_impl ../csu/libc-start.c:360
#12 0x5618f417ea20 in _start (/usr/lib/frr/bgpd+0x2e4a20)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV lib/hash.c:246 in hash_iterate
```