bgpd
Add peers back to peer hash when peer_xfer_conn fails
Do not explicitly print maxttl value for ebgp-multihop vty output
Do not process nlris if the attribute length is zero
Do not try to redistribute routes if we are shutting down
Don't read the first byte of orf header if we are ahead of stream
Evpn code was not properly unlocking rd_dest
Fix `show bgp all rpki notfound`
Fix session reset issue caused by malformed core attributes
Free bgp vpn policy
Free previously dup'ed aspath attribute for aggregate routes
Free temporary memory after using argv_concat()
Intern attributes before putting into rib-out
Make sure we have enough data to read two bytes when validating aigp
Prevent use after free
Rfapi memleak fixes, clean ce tables at exit
Unlock dest if we return earlier for aggregate install
Use treat-as-withdraw for tunnel encapsulation attribute
lib
Allow unsetting walltime-warning and cpu-warning
Skip route-map optimization if !af_inet(6)
Use max_bitlen instead of magic number
ospf6d
Fix crash because neighbor structure was freed
Stop crash in ospf6_write
ospfd
Check for nulls in vty code
Prevent use after free( and crash of ospf ) when no router ospf
pbrd
Fix crash with match command
pimd
Prevent crash when receiving register message when the rp() is unknown
When receiving a packet be more careful with length in pim_pim_packet
ripd, ripngd
Revert "Cleanup memory allocations on shutdown"
tools
Add what frr thinks as the fib routes for support_bundle
vtysh
Print uniq lines when parsing `no service ...`
zebra
Abstract `dplane_ctx_route_init` to init route without copying
Fix crash when `dplane_fpm_nl` fails to process received routes
Further handle route replace semantics
Fix command ipv6 nht xxx
Fix evpn nexthop config order
Donald Sharp [Wed, 30 Aug 2023 11:25:06 +0000 (07:25 -0400)]
bgpd: Add peers back to peer hash when peer_xfer_conn fails
It was noticed that occassionally peering failed in a testbed
upon investigation it was found that the peer was not in the
peer hash and we saw these failure messages:
Aug 25 21:31:15 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: %NOTIFICATION: sent to neighbor 2001:cafe:1ead:4::4 4/0 (Hold Timer Expired) 0 bytes
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 100663299] Can't get remote address and port: Transport endpoint is not connected
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 100663299] %bgp_getsockname() failed for peer 2001:cafe:1ead:4::4 fd 27 (from_peer fd -1)
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 33554464] %Neighbor failed in xfer_conn
Upon looking at the code the peer_xfer_conn function can fail
and the bgp_establish code will then return before adding the
peer back to the peerhash.
This is only part of the failure. The peer also appears to
be in a state where it is no longer initiating connection attempts
but that will be another commited fix when we figure that one out.
The command "show bgp all rpki notfound" includes not only RPKI
notfound routes but also RPKI valid and invalid routes in its results.
Fix the code to display only RPKI notfound routes.
Old output:
```
frr# show bgp all rpki notfound
For address family: IPv4 Unicast
BGP table version is 0, local router ID is 10.0.0.1, vrf id 0
Default local pref 100, local AS 64512
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
N x.x.x.0/18 a.a.a.a 100 0 64513 i
V y.y.y.0/19 a.a.a.a 200 0 64513 i
I z.z.z.0/16 a.a.a.a 10 0 64513 i
Displayed 3 routes and 3 total paths
```
New output:
```
frr# show bgp all rpki notfound
For address family: IPv4 Unicast
BGP table version is 0, local router ID is 10.0.0.1, vrf id 0
Default local pref 100, local AS 64512
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
N x.x.x.0/18 a.a.a.a 100 0 64513 i
Donald Sharp [Wed, 30 Aug 2023 14:33:29 +0000 (10:33 -0400)]
ospfd: Prevent use after free( and crash of ospf ) when no router ospf
Consider this config:
router ospf
redistribute kernel
Then you issue:
no router ospf
ospf will crash with a use after free.
The problem is that the event's associated with the
ospf pointer were shut off then the ospf_external_delete
was called which rescheduled the event. Let's just move
event deletion to the end of the no router ospf.
Donatas Abraitis [Sun, 20 Aug 2023 21:01:42 +0000 (00:01 +0300)]
bgpd: Do not explicitly print MAXTTL value for ebgp-multihop vty output
1. Create /etc/frr/frr.conf
```
frr version 7.5
frr defaults traditional
hostname centos8.localdomain
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
line vty
router bgp 4250001000
neighbor 192.168.122.207 remote-as 65512
neighbor 192.168.122.207 ebgp-multihop
```
2. Start FRR
`# systemctl start frr
`
3. Show running configuration. Note that FRR explicitly set and shows the default TTL (225)
```
Building configuration...
Current configuration:
!
frr version 7.5
frr defaults traditional
hostname centos8.localdomain
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp 4250001000
neighbor 192.168.122.207 remote-as 65512
neighbor 192.168.122.207 ebgp-multihop 255
!
line vty
!
end
```
4. Copy initial frr.conf to frr.conf.new (no changes)
`# cp /etc/frr/frr.conf /root/frr.conf.new
`
5. Run frr-reload.sh:
```
$ /usr/lib/frr/frr-reload.py --test /root/frr.conf.new
2023-08-20 20:15:48,050 INFO: Called via "Namespace(bindir='/usr/bin', confdir='/etc/frr', daemon='', debug=False, filename='/root/frr.conf.new', input=None, log_level='info', overwrite=False, pathspace=None, reload=False, rundir='/var/run/frr', stdout=False, test=True, vty_socket=None)"
2023-08-20 20:15:48,050 INFO: Loading Config object from file /root/frr.conf.new
2023-08-20 20:15:48,124 INFO: Loading Config object from vtysh show running
Lines To Delete
===============
router bgp 4250001000
no neighbor 192.168.122.207 ebgp-multihop 255
Donatas Abraitis [Fri, 18 Aug 2023 08:28:03 +0000 (11:28 +0300)]
bgpd: Make sure we have enough data to read two bytes when validating AIGP
Found when fuzzing:
```
==3470861==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xffff77801ef7 at pc 0xaaaaba7b3dbc bp 0xffffcff0e760 sp 0xffffcff0df50
READ of size 2 at 0xffff77801ef7 thread T0
0 0xaaaaba7b3db8 in __asan_memcpy (/home/ubuntu/frr_8_5_2/frr_8_5_2_fuzz_clang/bgpd/bgpd+0x363db8) (BuildId: cc710a2356e31c7f4e4a17595b54de82145a6e21)
1 0xaaaaba81a8ac in ptr_get_be16 /home/ubuntu/frr_8_5_2/frr_8_5_2_fuzz_clang/./lib/stream.h:399:2
2 0xaaaaba819f2c in bgp_attr_aigp_valid /home/ubuntu/frr_8_5_2/frr_8_5_2_fuzz_clang/bgpd/bgp_attr.c:504:3
3 0xaaaaba808c20 in bgp_attr_aigp /home/ubuntu/frr_8_5_2/frr_8_5_2_fuzz_clang/bgpd/bgp_attr.c:3275:7
4 0xaaaaba7ff4e0 in bgp_attr_parse /home/ubuntu/frr_8_5_2/frr_8_5_2_fuzz_clang/bgpd/bgp_attr.c:3678:10
```
Donatas Abraitis [Fri, 11 Aug 2023 15:21:12 +0000 (18:21 +0300)]
vtysh: Print uniq lines when parsing `no service ...`
Before this patch:
```
no service cputime-warning
no service cputime-warning
no ipv6 forwarding
no service cputime-warning
no service cputime-warning
no service cputime-warning
```
bgpd: Fix session reset issue caused by malformed core attributes
RCA:
On encountering any attribute error for core attributes in update message,
the error handling is set to 'treat as withdraw' and
further parsing of the remaining attributes is skipped.
But the stream pointer is not being correctly adjusted to
point to the next NLRI field skipping the rest of the attributes.
This leads to incorrect parsing of the NLRI field,
which causes BGP session to reset.
Fix:
The stream pointer offset is rightly adjusted to point to the NLRI field correctly
when the malformed attribute is encountered and remaining attribute parsing is skipped.
Trey Aspelund [Fri, 17 Feb 2023 21:47:09 +0000 (21:47 +0000)]
lib: skip route-map optimization if !AF_INET(6)
Currently we unconditionally send a prefix through the optimized
route-map codepath if the v4 and v6 LPM tables have been allocated and
optimization has not been disabled.
However prefixes from address-families that are not IPv4/IPv6 unicast
always fail the optimized route-map index lookup, because they occur on
an LPM tree that is IPv4 or IPv6 specific.
e.g.
Even if you have an empty permit route-map clause, Type-3 EVPN routes
are always denied:
```
--config
route-map soo-foo permit 10
--logs
2023/02/17 19:38:42 BGP: [KZK58-6T4Y6] No best match sequence for pfx: [3]:[0]:[32]:[2.2.2.2] in route-map: soo-foo, result: no match
2023/02/17 19:38:42 BGP: [H5AW4-JFYQC] Route-map: soo-foo, prefix: [3]:[0]:[32]:[2.2.2.2], result: deny
```
There is some existing code that creates an AF_INET/AF_INET6 prefix
using the IP/prefix information from a Type-2/5 EVPN route, which
allowed only these two route-types to successfully attempt an LPM lookup
in the route-map optimization trees via the converted prefix.
This commit does 3 things:
1) Reverts to non-optimized route-map lookup for prefixes that are not
AF_INET or AF_INET6.
2) Cleans up the route-map code so that the AF check is part of the
index lookup + the EVPN RT-2/5 -> AF_INET/6 prefix conversion occurs
outside the index lookup.
3) Adds "debug route-map detail" logs to indicate when we attempt to
convert an AF_EVPN prefix into an AF_INET/6 prefix + when we fallback
to a non-optimized lookup.
Additional functionality for optimized lookups of prefixes from other
address-families can be added prior to the index lookup, similar to how
the existing EVPN conversion works today.
New behavior:
```
2023/02/17 21:44:27 BGP: [WYP1M-NE4SY] Converted EVPN prefix [5]:[0]:[32]:[192.0.2.7] into 192.0.2.7/32 for optimized route-map lookup
2023/02/17 21:44:27 BGP: [MT1SJ-WEJQ1] Best match route-map: soo-foo, sequence: 10 for pfx: 192.0.2.7/32, result: match
2023/02/17 21:44:27 BGP: [H5AW4-JFYQC] Route-map: soo-foo, prefix: 192.0.2.7/32, result: permit
2023/02/17 21:44:27 BGP: [WYP1M-NE4SY] Converted EVPN prefix [2]:[0]:[48]:[aa:bb:cc:00:22:22]:[32]:[20.0.0.2] into 20.0.0.2/32 for optimized route-map lookup
2023/02/17 21:44:27 BGP: [MT1SJ-WEJQ1] Best match route-map: soo-foo, sequence: 10 for pfx: 20.0.0.2/32, result: match
2023/02/17 21:44:27 BGP: [H5AW4-JFYQC] Route-map: soo-foo, prefix: 20.0.0.2/32, result: permit
2023/02/17 21:44:27 BGP: [KHG7H-RH4PN] Unable to convert EVPN prefix [3]:[0]:[32]:[2.2.2.2] into IPv4/IPv6 prefix. Falling back to non-optimized route-map lookup
2023/02/17 21:44:27 BGP: [MT1SJ-WEJQ1] Best match route-map: soo-foo, sequence: 10 for pfx: [3]:[0]:[32]:[2.2.2.2], result: match
2023/02/17 21:44:27 BGP: [H5AW4-JFYQC] Route-map: soo-foo, prefix: [3]:[0]:[32]:[2.2.2.2], result: permit
```
bgpd: Do not try to redistribute routes if we are shutting down
When switching `router bgp`, `no router bgp` and doing redistributions, we should
ignore this action, otherwise memory leak happens:
```
Indirect leak of 400 byte(s) in 2 object(s) allocated from:
0 0x7f81b36b3a06 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:153
1 0x7f81b327bd2e in qcalloc lib/memory.c:105
2 0x55f301d28628 in bgp_node_create bgpd/bgp_table.c:92
3 0x7f81b3309d0b in route_node_new lib/table.c:52
4 0x7f81b3309d0b in route_node_set lib/table.c:61
5 0x7f81b330be0a in route_node_get lib/table.c:319
6 0x55f301ce89df in bgp_redistribute_add bgpd/bgp_route.c:8907
7 0x55f301dac182 in zebra_read_route bgpd/bgp_zebra.c:593
8 0x7f81b334dcd7 in zclient_read lib/zclient.c:4179
9 0x7f81b331d702 in event_call lib/event.c:1995
10 0x7f81b325d597 in frr_run lib/libfrr.c:1213
11 0x55f301b94b12 in main bgpd/bgp_main.c:505
12 0x7f81b2b57082 in __libc_start_main ../csu/libc-start.c:308
```
Donald Sharp [Mon, 17 Jul 2023 14:00:32 +0000 (10:00 -0400)]
zebra: Further handle route replace semantics
When an upper level protocol is installing a route X that needs to be
route replaced and at the same time the same or another protocol installs a
different route that depends on route X for nexthop resolution can leave
us with a state where the route is not accepted because zebra is still
really early in the route replace semantics ( route X is still on the work
Queue to be processed ) then the dependent route would not be installed.
This came up in the bgp_default_originate test cases frequently.
Further extendd the ROUTE_ENTR_ROUTE_REPLACING flag to cover this case
as well. This has come up because the early route processing queueing
that was implemented late last year.
Donald Sharp [Fri, 14 Jul 2023 16:14:20 +0000 (12:14 -0400)]
bgpd: Prevent use after free
When running bgp_always_compare_med, I am frequently seeing a crash
After running with valgrind I am seeing this and a invalid write
immediately after this as well.
==311743== Invalid read of size 2
==311743== at 0x4992421: route_map_counter_decrement (routemap.c:3308)
==311743== by 0x35664D: peer_route_map_unset (bgpd.c:7259)
==311743== by 0x306546: peer_route_map_unset_vty (bgp_vty.c:8037)
==311743== by 0x3066AC: no_neighbor_route_map (bgp_vty.c:8081)
==311743== by 0x49078DE: cmd_execute_command_real (command.c:990)
==311743== by 0x4907A63: cmd_execute_command (command.c:1050)
==311743== by 0x490801F: cmd_execute (command.c:1217)
==311743== by 0x49C5535: vty_command (vty.c:551)
==311743== by 0x49C7459: vty_execute (vty.c:1314)
==311743== by 0x49C97D1: vtysh_read (vty.c:2223)
==311743== by 0x49BE5E2: event_call (event.c:1995)
==311743== by 0x494786C: frr_run (libfrr.c:1204)
==311743== by 0x1F7655: main (bgp_main.c:505)
==311743== Address 0x9ec2180 is 64 bytes inside a block of size 120 free'd
==311743== at 0x484B27F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==311743== by 0x495A1BA: qfree (memory.c:130)
==311743== by 0x498D412: route_map_free_map (routemap.c:748)
==311743== by 0x498D176: route_map_add (routemap.c:672)
==311743== by 0x498D79B: route_map_get (routemap.c:857)
==311743== by 0x499C256: lib_route_map_create (routemap_northbound.c:102)
==311743== by 0x49702D8: nb_callback_create (northbound.c:1234)
==311743== by 0x497107F: nb_callback_configuration (northbound.c:1578)
==311743== by 0x4971693: nb_transaction_process (northbound.c:1709)
==311743== by 0x496FCF4: nb_candidate_commit_apply (northbound.c:1103)
==311743== by 0x496FE4E: nb_candidate_commit (northbound.c:1136)
==311743== by 0x497798F: nb_cli_classic_commit (northbound_cli.c:49)
==311743== by 0x4977B4F: nb_cli_pending_commit_check (northbound_cli.c:88)
==311743== by 0x49078C1: cmd_execute_command_real (command.c:987)
==311743== by 0x4907B44: cmd_execute_command (command.c:1068)
==311743== by 0x490801F: cmd_execute (command.c:1217)
==311743== by 0x49C5535: vty_command (vty.c:551)
==311743== by 0x49C7459: vty_execute (vty.c:1314)
==311743== by 0x49C97D1: vtysh_read (vty.c:2223)
==311743== by 0x49BE5E2: event_call (event.c:1995)
==311743== by 0x494786C: frr_run (libfrr.c:1204)
==311743== by 0x1F7655: main (bgp_main.c:505)
==311743== Block was alloc'd at
==311743== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==311743== by 0x495A068: qcalloc (memory.c:105)
==311743== by 0x498D0C8: route_map_new (routemap.c:646)
==311743== by 0x498D128: route_map_add (routemap.c:658)
==311743== by 0x498D79B: route_map_get (routemap.c:857)
==311743== by 0x499C256: lib_route_map_create (routemap_northbound.c:102)
==311743== by 0x49702D8: nb_callback_create (northbound.c:1234)
==311743== by 0x497107F: nb_callback_configuration (northbound.c:1578)
==311743== by 0x4971693: nb_transaction_process (northbound.c:1709)
==311743== by 0x496FCF4: nb_candidate_commit_apply (northbound.c:1103)
==311743== by 0x496FE4E: nb_candidate_commit (northbound.c:1136)
==311743== by 0x497798F: nb_cli_classic_commit (northbound_cli.c:49)
==311743== by 0x4977B4F: nb_cli_pending_commit_check (northbound_cli.c:88)
==311743== by 0x49078C1: cmd_execute_command_real (command.c:987)
==311743== by 0x4907B44: cmd_execute_command (command.c:1068)
==311743== by 0x490801F: cmd_execute (command.c:1217)
==311743== by 0x49C5535: vty_command (vty.c:551)
==311743== by 0x49C7459: vty_execute (vty.c:1314)
==311743== by 0x49C97D1: vtysh_read (vty.c:2223)
==311743== by 0x49BE5E2: event_call (event.c:1995)
==311743== by 0x494786C: frr_run (libfrr.c:1204)
Effectively the route_map that is being stored has been freed already
but we have not cleaned up properly yet. Go through and clean the
code up by ensuring that the pointer actually exists instead of trusting
it does when doing the decrement operation.
Donatas Abraitis [Mon, 27 Feb 2023 21:40:32 +0000 (23:40 +0200)]
bgpd: Intern attributes before putting into rib-out
After we call subgroup_announce_check(), we leave communities, large-communities
that are modified by route-maps uninterned, and here we have a memory leak.
```
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323:Direct leak of 80 byte(s) in 2 object(s) allocated from:
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #0 0x7f0858d90037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #1 0x7f08589b15b2 in qcalloc lib/memory.c:105
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #2 0x561f5c4e08d2 in lcommunity_new bgpd/bgp_lcommunity.c:28
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #3 0x561f5c4e11d9 in lcommunity_dup bgpd/bgp_lcommunity.c:141
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #4 0x561f5c5c3b8b in route_set_lcommunity bgpd/bgp_routemap.c:2491
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #5 0x7f0858a177a5 in route_map_apply_ext lib/routemap.c:2675
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #6 0x561f5c5696f9 in subgroup_announce_check bgpd/bgp_route.c:2352
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #7 0x561f5c5fb728 in subgroup_announce_table bgpd/bgp_updgrp_adv.c:682
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #8 0x561f5c5fbd95 in subgroup_announce_route bgpd/bgp_updgrp_adv.c:765
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #9 0x561f5c5f6105 in peer_af_announce_route bgpd/bgp_updgrp.c:2187
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #10 0x561f5c5790be in bgp_announce_route_timer_expired bgpd/bgp_route.c:5032
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #11 0x7f0858a76e4e in thread_call lib/thread.c:1991
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #12 0x7f0858974c24 in frr_run lib/libfrr.c:1185
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #13 0x561f5c3e955d in main bgpd/bgp_main.c:505
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #14 0x7f08583a9d09 in __libc_start_main ../csu/libc-start.c:308
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323-
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323:Indirect leak of 144 byte(s) in 2 object(s) allocated from:
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #0 0x7f0858d8fe8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #1 0x7f08589b1579 in qmalloc lib/memory.c:100
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #2 0x561f5c4e1282 in lcommunity_dup bgpd/bgp_lcommunity.c:144
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #3 0x561f5c5c3b8b in route_set_lcommunity bgpd/bgp_routemap.c:2491
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #4 0x7f0858a177a5 in route_map_apply_ext lib/routemap.c:2675
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #5 0x561f5c5696f9 in subgroup_announce_check bgpd/bgp_route.c:2352
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #6 0x561f5c5fb728 in subgroup_announce_table bgpd/bgp_updgrp_adv.c:682
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #7 0x561f5c5fbd95 in subgroup_announce_route bgpd/bgp_updgrp_adv.c:765
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #8 0x561f5c5f6105 in peer_af_announce_route bgpd/bgp_updgrp.c:2187
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #9 0x561f5c5790be in bgp_announce_route_timer_expired bgpd/bgp_route.c:5032
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #10 0x7f0858a76e4e in thread_call lib/thread.c:1991
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #11 0x7f0858974c24 in frr_run lib/libfrr.c:1185
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #12 0x561f5c3e955d in main bgpd/bgp_main.c:505
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323- #13 0x7f08583a9d09 in __libc_start_main ../csu/libc-start.c:308
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323-
./bgp_large_community.test_bgp_large_community_topo_2/r1.bgpd.asan.2465323-SUMMARY: AddressSanitizer: 224 byte(s) leaked in 4 allocation(s).
```
Direct leak of 80 byte(s) in 2 object(s) allocated from:
0 0x7f8065294d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
1 0x7f8064cfd216 in qcalloc lib/memory.c:105
2 0x5646b7024073 in ecommunity_dup bgpd/bgp_ecommunity.c:252
3 0x5646b7153585 in route_set_ecommunity_lb bgpd/bgp_routemap.c:2925
4 0x7f8064d459be in route_map_apply_ext lib/routemap.c:2675
5 0x5646b7116584 in subgroup_announce_check bgpd/bgp_route.c:2374
6 0x5646b711b907 in subgroup_process_announce_selected bgpd/bgp_route.c:2918
7 0x5646b717ceb8 in group_announce_route_walkcb bgpd/bgp_updgrp_adv.c:184
8 0x7f8064cc6acd in hash_walk lib/hash.c:270
9 0x5646b717ae0c in update_group_af_walk bgpd/bgp_updgrp.c:2046
10 0x5646b7181275 in group_announce_route bgpd/bgp_updgrp_adv.c:1030
11 0x5646b711a986 in bgp_process_main_one bgpd/bgp_route.c:3303
12 0x5646b711b5bf in bgp_process_wq bgpd/bgp_route.c:3444
13 0x7f8064da12d7 in work_queue_run lib/workqueue.c:267
14 0x7f8064d891fb in thread_call lib/thread.c:1991
15 0x7f8064cdffcf in frr_run lib/libfrr.c:1185
16 0x5646b6feca67 in main bgpd/bgp_main.c:505
17 0x7f8063f96c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
Indirect leak of 16 byte(s) in 2 object(s) allocated from:
0 0x7f8065294b40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
1 0x7f8064cfcf01 in qmalloc lib/memory.c:100
2 0x5646b7024151 in ecommunity_dup bgpd/bgp_ecommunity.c:256
3 0x5646b7153585 in route_set_ecommunity_lb bgpd/bgp_routemap.c:2925
4 0x7f8064d459be in route_map_apply_ext lib/routemap.c:2675
5 0x5646b7116584 in subgroup_announce_check bgpd/bgp_route.c:2374
6 0x5646b711b907 in subgroup_process_announce_selected bgpd/bgp_route.c:2918
7 0x5646b717ceb8 in group_announce_route_walkcb bgpd/bgp_updgrp_adv.c:184
8 0x7f8064cc6acd in hash_walk lib/hash.c:270
9 0x5646b717ae0c in update_group_af_walk bgpd/bgp_updgrp.c:2046
10 0x5646b7181275 in group_announce_route bgpd/bgp_updgrp_adv.c:1030
11 0x5646b711a986 in bgp_process_main_one bgpd/bgp_route.c:3303
12 0x5646b711b5bf in bgp_process_wq bgpd/bgp_route.c:3444
13 0x7f8064da12d7 in work_queue_run lib/workqueue.c:267
14 0x7f8064d891fb in thread_call lib/thread.c:1991
15 0x7f8064cdffcf in frr_run lib/libfrr.c:1185
16 0x5646b6feca67 in main bgpd/bgp_main.c:505
17 0x7f8063f96c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
```
zebra: Fix crash when `dplane_fpm_nl` fails to process received routes
When `dplane_fpm_nl` receives a route, it allocates memory for a dplane
context and calls `netlink_route_change_read_unicast_internal` without
initializing the `intf_extra_list` contained in the dplane context. If
`netlink_route_change_read_unicast_internal` is not able to process the
route, we call `dplane_ctx_fini` to free the dplane context. This causes
a crash because `dplane_ctx_fini` attempts to access the intf_extra_list
which is not initialized.
To solve this issue, we can call `dplane_ctx_route_init`to initialize
the dplane route context properly, just after the dplane context
allocation.
(gdb) bt
#0 0x0000555dd5ceae80 in dplane_intf_extra_list_pop (h=0x7fae1c007e68) at ../zebra/zebra_dplane.c:427
#1 dplane_ctx_free_internal (ctx=0x7fae1c0074b0) at ../zebra/zebra_dplane.c:724
#2 0x0000555dd5cebc99 in dplane_ctx_free (pctx=0x7fae2aa88c98) at ../zebra/zebra_dplane.c:869
#3 dplane_ctx_free (pctx=0x7fae2aa88c98, pctx@entry=0x7fae2aa78c28) at ../zebra/zebra_dplane.c:855
#4 dplane_ctx_fini (pctx=pctx@entry=0x7fae2aa88c98) at ../zebra/zebra_dplane.c:890
#5 0x00007fae31e93f29 in fpm_read (t=) at ../zebra/dplane_fpm_nl.c:605
#6 0x00007fae325191dd in thread_call (thread=thread@entry=0x7fae2aa98da0) at ../lib/thread.c:2006
#7 0x00007fae324c42b8 in fpt_run (arg=0x555dd74777c0) at ../lib/frr_pthread.c:309
#8 0x00007fae32405ea7 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#9 0x00007fae32325a2f in clone () from /lib/x86_64-linux-gnu/libc.so.6
zebra: Abstract `dplane_ctx_route_init` to init route without copying
The function `dplane_ctx_route_init` initializes a dplane route context
from the route object passed as an argument. Let's abstract this
function to allow initializing the dplane route context without actually
copying a route object.
This allows us to use this function for initializing a dplane route
context when we don't have any route to copy in it.
Donald Sharp [Sat, 1 Jul 2023 15:18:06 +0000 (11:18 -0400)]
ospf6d: Fix crash because neighbor structure was freed
The loading_done event needs a event pointer to prevent
use after free's. Testing found this:
ERROR: AddressSanitizer: heap-use-after-free on address 0x613000035130 at pc 0x55ad42d54e5f bp 0x7ffff1e942a0 sp 0x7ffff1e94290
READ of size 1 at 0x613000035130 thread T0
#0 0x55ad42d54e5e in loading_done ospf6d/ospf6_neighbor.c:447
#1 0x55ad42ed7be4 in event_call lib/event.c:1995
#2 0x55ad42e1df75 in frr_run lib/libfrr.c:1213
#3 0x55ad42cf332e in main ospf6d/ospf6_main.c:250
#4 0x7f5798133c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
#5 0x55ad42cf2b19 in _start (/usr/lib/frr/ospf6d+0x248b19)
0x613000035130 is located 48 bytes inside of 384-byte region [0x613000035100,0x613000035280)
freed by thread T0 here:
#0 0x7f57998d77a8 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xde7a8)
#1 0x55ad42e3b4b6 in qfree lib/memory.c:130
#2 0x55ad42d5d049 in ospf6_neighbor_delete ospf6d/ospf6_neighbor.c:180
#3 0x55ad42d1e1ea in interface_down ospf6d/ospf6_interface.c:930
#4 0x55ad42ed7be4 in event_call lib/event.c:1995
#5 0x55ad42ed84fe in _event_execute lib/event.c:2086
#6 0x55ad42d26d7b in ospf6_interface_clear ospf6d/ospf6_interface.c:2847
#7 0x55ad42d73f16 in ospf6_process_reset ospf6d/ospf6_top.c:755
#8 0x55ad42d7e98c in clear_router_ospf6_magic ospf6d/ospf6_top.c:778
#9 0x55ad42d7e98c in clear_router_ospf6 ospf6d/ospf6_top_clippy.c:42
#10 0x55ad42dc2665 in cmd_execute_command_real lib/command.c:994
#11 0x55ad42dc2b32 in cmd_execute_command lib/command.c:1053
#12 0x55ad42dc2fa9 in cmd_execute lib/command.c:1221
#13 0x55ad42ee3cd6 in vty_command lib/vty.c:591
#14 0x55ad42ee4170 in vty_execute lib/vty.c:1354
#15 0x55ad42eec94f in vtysh_read lib/vty.c:2362
#16 0x55ad42ed7be4 in event_call lib/event.c:1995
#17 0x55ad42e1df75 in frr_run lib/libfrr.c:1213
#18 0x55ad42cf332e in main ospf6d/ospf6_main.c:250
#19 0x7f5798133c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
previously allocated by thread T0 here:
#0 0x7f57998d7d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
#1 0x55ad42e3ab22 in qcalloc lib/memory.c:105
#2 0x55ad42d5c8ff in ospf6_neighbor_create ospf6d/ospf6_neighbor.c:119
#3 0x55ad42d4c86a in ospf6_hello_recv ospf6d/ospf6_message.c:464
#4 0x55ad42d4c86a in ospf6_read_helper ospf6d/ospf6_message.c:1884
#5 0x55ad42d4c86a in ospf6_receive ospf6d/ospf6_message.c:1925
#6 0x55ad42ed7be4 in event_call lib/event.c:1995
#7 0x55ad42e1df75 in frr_run lib/libfrr.c:1213
#8 0x55ad42cf332e in main ospf6d/ospf6_main.c:250
#9 0x7f5798133c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
Add an actual event pointer and just track it appropriately.
Signed-off-by: Donald Sharp <sharpd@nvidia.com> Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Donald Sharp [Fri, 30 Jun 2023 19:21:43 +0000 (15:21 -0400)]
ospf6d: Stop crash in ospf6_write
I'm seeing crashes in ospf6_write on the `assert(node)`. The only
sequence of events that I see that could possibly cause this to happen
is this:
a) Someone has scheduled a outgoing write to the ospf6->t_write and
placed item(s) on the ospf6->oi_write_q
b) A decision is made in ospf6_send_lsupdate() to send an immediate
packet via a event_execute(..., ospf6_write,....).
c) ospf6_write is called and the oi_write_q is cleaned out.
d) the t_write event is now popped and the oi_write_q is empty
and FRR asserts on the `assert(node)` <crash>
When event_execute is called for ospf6_write, just cancel the t_write
event. If ospf6_write has more data to send at the end of the function
it will reschedule itself. I've only seen this crash one time and am
unable to reliably reproduce this at all. But this is the only mechanism
that I can see that could make this happen, given how little the oi_write_q
is actually touched in code.
anlan_cs [Wed, 28 Jun 2023 04:31:52 +0000 (12:31 +0800)]
pbrd: fix crash with match command
Crash with empty `ip-protocol`:
```
anlan(config-pbr-map)# match ip-protocol
vtysh: error reading from pbrd: Resource temporarily unavailable (11)Warning: closing connection to pbrd because of an I/O error!
```
This commit introduced a crash. When the VRF is deleted, the RIPNG
instance should not be freed, because the NB infrastructure still stores
the pointer to it. The instance should be deleted only when it's actually
deleted from the configuration.
To reproduce the crash:
```
frr# conf t
frr(config)# vrf vrf1
frr(config-vrf)# exit
frr(config)# router ripng vrf vrf1
frr(config-router)# exit
frr(config)# no vrf vrf1
frr(config)# no router ripng vrf vrf1
vtysh: error reading from ripngd: Resource temporarily unavailable (11)Warning: closing connection to ripngd because of an I/O error!
frr(config)#
```
This commit introduced a crash. When the VRF is deleted, the RIP instance
should not be freed, because the NB infrastructure still stores the
pointer to it. The instance should be deleted only when it's actually
deleted from the configuration.
To reproduce the crash:
```
frr# conf t
frr(config)# vrf vrf1
frr(config-vrf)# exit
frr(config)# router rip vrf vrf1
frr(config-router)# exit
frr(config)# no vrf vrf1
frr(config)# no router rip vrf vrf1
vtysh: error reading from ripd: Resource temporarily unavailable (11)Warning: closing connection to ripd because of an I/O error!
frr(config)#
```
bfdd
Fix malformed session with vrf
Remove redundant nb destroy callbacks
bgpd
Ensure stream received has enough data
Fix bgpd core when unintern attr
Fix the json output of show bgp all json to be in a valid format
Make sure aigp attribute is non-transitive
Using no pretty json output for l2vpn-evpn routes
doc
Add `neighbor aigp` command for bgp
lib
Fix memory leak in in link state
Fix vtysh core when handling questionmark
Link state memory corruption
ospfd
Fix interface param type update
Fix memory leaks w/ `show ip ospf int x json` commands
Ospf opaque lsa stale processing fix and topotests.
Respect loopback's cost that is set and set loopback costs to 0
pim6d
Fix crash in ipv6 pim command
pimd
Pim not sending register packets after changing from non dr to dr
tests
Adjust aigp metric numbers for ibgp setup
tools
Fix list value remove in frr-reload
vtysh
Give actual pam error messages
zebra
Evpn handle del event for dup detected mac
Fix dp_out_queued counter to actually reflect real life
Fix evpn dup detected local mac del event
Reduce creation and fix memory leak of frrscripting pointers
Unlock the route node when sending route notifications
Chirag Shah [Fri, 26 May 2023 20:43:50 +0000 (13:43 -0700)]
ospfd: fix interface param type update
interface link update event needs
to be handle properly in ospf interface
cache.
Example:
When vrf (interface) is created its default type
would be set to BROADCAST because ifp->status
is not set to VRF.
Subsequent link event sets ifp->status to vrf,
ospf interface update need to compare current type
to new default type which would be VRF (OSPF_IFTYPE_LOOPBACK).
Since ospf type param was created in first add event,
ifp vrf link event didn't update ospf type param which
leads to treat vrf as non loopback interface.
Donald Sharp [Tue, 6 Dec 2022 15:23:11 +0000 (10:23 -0500)]
bgpd: Ensure stream received has enough data
BGP_PREFIX_SID_SRV6_L3_SERVICE attributes must not
fully trust the length value specified in the nlri.
Always ensure that the amount of data we need to read
can be fullfilled.
Reported-by: Iggy Frankovic <iggyfran@amazon.com> Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 06431bfa7570f169637ebb5898f0b0cc3b010802)
Chirag Shah [Tue, 6 Jun 2023 04:48:12 +0000 (21:48 -0700)]
tools: fix list value remove in frr-reload
There might be a time element(s) from
temporary list are removed more than once
which leads to valueError in certain python3
version.
commit-id 1543f58b5 did not handle valueError
properly. This caused regression where
prefix-list config leads to delete followed
by add.
The new fix should just pass the exception as
value removal from list_to_add or list_to_del
is best effort.
This allows prefix-list config has no change
then removes the lines from lines_to_del and
lines_to_add properly.
Configure prefix-list in frr.conf and perform
multiple frr-reload. After first reload operatoin
subsequent ones should not result in delete followed
by add of the prefix-list but rather no-op operation.
Donald Sharp [Wed, 31 May 2023 15:40:07 +0000 (11:40 -0400)]
zebra: Unlock the route node when sending route notifications
When using a context to send route notifications to upper
level protocols, the code was using a locking function to
get the route node. There is no need for this to be locked
as such FRR should free it up.
Yuan Yuan [Tue, 30 May 2023 19:20:09 +0000 (19:20 +0000)]
lib: fix vtysh core when handling questionmark
When issue vtysh command with ?, the initial buf size for the
element is 16. Then it would loop through each element in the cmd
output vector. If the required size for printing out the next
element is larger than the current buf size, realloc the buf memory
by doubling the current buf size regardless of the actual size
that's needed. This would cause vtysh core when the doubled size
is not enough for the next element.
Sarita Patra [Fri, 5 May 2023 17:52:33 +0000 (10:52 -0700)]
pim6d: Fix crash in ipv6 pim command
Problem:
Execute the below commands, pim6d core happens.
interface ens193
ip address 69.0.0.2/24
ipv6 address 8000::1/120
ipv6 mld
ipv6 pim
We see crash only if the interface is not configured, and
we are executing PIM/MLD commands.
RootCause:
Interface ens193 is not configured. So, it will have
ifindex = 0 and mroute_vif_index = -1.
Currently, we don't enable MLD on an interface if
mroute_vif_index < 0. So, pim_ifp->MLD = NULL.
In the API pim_if_membership_refresh(), we are accessing
pim_ifp->MLD NULL pointer which leads to crash.
Fix:
Added NULL check before accessing pim_ifp->MLD pointer in
the API pim_if_membership_refresh().
Yuan Yuan [Tue, 30 May 2023 18:53:32 +0000 (18:53 +0000)]
bgpd: fix bgpd core when unintern attr
When the remote peer is neither EBGP nor confed, aspath is the
shadow copy of attr->aspath in bgp_packet_attribute(). Striping
AS4_PATH should not be done on the aspath directly, since
that would lead to bgpd core dump when unintern the attr.
Donald Sharp [Fri, 26 May 2023 11:44:11 +0000 (07:44 -0400)]
vtysh: Give actual pam error messages
Code was was written where the pam error message put out
was the result from a previous call to the pam modules
instead of the current call to the pam module.
Rajasekar Raja [Mon, 22 May 2023 21:14:30 +0000 (14:14 -0700)]
bgpd: Using no pretty json output for l2vpn-Evpn routes
The output of show bgp all json is inconsistent across Address-families
i.e. ipv4/ipv6 is a no pretty format while l2vpn-evpn is in a pretty
format. For huge scale (lots of routes with lots of paths), it is better
to use no_pretty format.
The vrf check should use the carefully adjusted `vrfid`, which is
based on globally/reliable interface. We can't believe the
`bvrf->vrf->vrf_id` because the `/proc/sys/net/ipv4/udp_l3mdev_accept`
maybe is set "1" in VRF-lite backend even with security drawback.
Acee [Tue, 9 May 2023 20:51:03 +0000 (16:51 -0400)]
ospfd: OSPF opaque LSA stale processing fix and topotests.
1. Fix OSPF opaque LSA processing to preserve the stale opaque
LSAs in the Link State Database for 60 seconds consistent with
what is done for other LSA types.
2. Add a topotest that tests for cases where ospfd is restarted
and a stale OSPF opaque LSA exists in the OSPF routing domain
both when the LSA is purged and when the LSA is reoriginagted
with a more recent instance.
Donald Sharp [Tue, 9 May 2023 17:10:35 +0000 (13:10 -0400)]
ospfd: Respect loopback's cost that is set and set loopback costs to 0
When setting an loopback's cost, set the value to 0, unless the operator
has assigned a value for the loopback's cost.
RFC states:
If the state of the interface is Loopback, add a Type 3
link (stub network) as long as this is not an interface
to an unnumbered point-to-point network. The Link ID
should be set to the IP interface address, the Link Data
set to the mask 0xffffffff (indicating a host route),
and the cost set to 0.
FRR is going to allow this to be overridden if the operator specifically
sets a value too.