]> git.puffer.fish Git - matthieu/frr.git/log
matthieu/frr.git
8 months agoRevert "tests: ipv6 global removal in bgp_nexthop_mp_ipv4_6"
Donatas Abraitis [Wed, 14 Aug 2024 17:15:08 +0000 (20:15 +0300)]
Revert "tests: ipv6 global removal in bgp_nexthop_mp_ipv4_6"

This reverts commit 04c220bedb63334a65677a46ef84938cc812221f.

8 months agoRevert "bgpd: set ipv4-mapped ipv6 for ipv4 with ipv6 nexthop"
Donatas Abraitis [Wed, 14 Aug 2024 17:15:04 +0000 (20:15 +0300)]
Revert "bgpd: set ipv4-mapped ipv6 for ipv4 with ipv6 nexthop"

This reverts commit fc5a738409eac9ca938cbb398872ea77d9cc5023.

8 months agoRevert "bgpd: prefer link-local to a ipv4-mapped ipv6 global"
Donatas Abraitis [Wed, 14 Aug 2024 17:14:59 +0000 (20:14 +0300)]
Revert "bgpd: prefer link-local to a ipv4-mapped ipv6 global"

This reverts commit 5dd731af8421e8b3070eec0b3af4ad234c95c6bb.

8 months agoRevert "topotests: update bgp_vrf_leaking_5549_routes"
Donatas Abraitis [Wed, 14 Aug 2024 17:14:55 +0000 (20:14 +0300)]
Revert "topotests: update bgp_vrf_leaking_5549_routes"

This reverts commit f1b8364ab3784cebfc0689883efdb21ac7d06213.

8 months agoRevert "bgpd: optimize bgp_interface_address_add"
Donatas Abraitis [Wed, 14 Aug 2024 17:14:20 +0000 (20:14 +0300)]
Revert "bgpd: optimize bgp_interface_address_add"

This reverts commit 8599fe2b5e34b2ac1a46a14983ddcc2336e9116d.

8 months agoRevert "bgpd: reduce bgp_interface_address_add indentation"
Donatas Abraitis [Wed, 14 Aug 2024 17:14:16 +0000 (20:14 +0300)]
Revert "bgpd: reduce bgp_interface_address_add indentation"

This reverts commit 778e0df87b7a846f46d84f61ea889a32fe578e49.

8 months agoRevert "bgpd: log new ipv6 global in bgp_interface_address_add"
Donatas Abraitis [Wed, 14 Aug 2024 17:14:12 +0000 (20:14 +0300)]
Revert "bgpd: log new ipv6 global in bgp_interface_address_add"

This reverts commit b083885198157555bbb916ecae9809c5d67a567b.

8 months agoRevert "bgpd: fix sending ipv6 local nexthop if global present"
Donatas Abraitis [Wed, 14 Aug 2024 17:13:44 +0000 (20:13 +0300)]
Revert "bgpd: fix sending ipv6 local nexthop if global present"

This reverts commit 424fe0bf809c1d84f16aba3f5e5f8249af29083b.

8 months agoMerge pull request #16560 from FRRouting/mergify/bp/stable/10.1/pr-16554
Donald Sharp [Tue, 13 Aug 2024 12:45:54 +0000 (08:45 -0400)]
Merge pull request #16560 from FRRouting/mergify/bp/stable/10.1/pr-16554

zebra: Ensure non-equal id's are not same nhg's (backport #16554)

8 months agoMerge pull request #16564 from FRRouting/mergify/bp/stable/10.1/pr-16545
Donald Sharp [Tue, 13 Aug 2024 12:45:09 +0000 (08:45 -0400)]
Merge pull request #16564 from FRRouting/mergify/bp/stable/10.1/pr-16545

isisd: fix memory handling in isis_adj_process_threeway() (backport #16545)

8 months agoMerge pull request #16568 from FRRouting/mergify/bp/stable/10.1/pr-16551
Donald Sharp [Tue, 13 Aug 2024 12:44:39 +0000 (08:44 -0400)]
Merge pull request #16568 from FRRouting/mergify/bp/stable/10.1/pr-16551

lib: fix distribute-list deletion (backport #16551)

8 months agolib: fix distribute-list deletion
Igor Ryzhov [Fri, 9 Aug 2024 22:32:55 +0000 (01:32 +0300)]
lib: fix distribute-list deletion

When a whole distribute-list is deleted (can be done only using API),
all its children must be cleaned up manually.

Fixes #16538

Signed-off-by: Igor Ryzhov <idryzhov@gmail.com>
(cherry picked from commit 8fad4f317ebd3de7677d7600e7a024f713b20d70)

8 months agoisisd: fix memory handling in isis_adj_process_threeway()
Mark Stapp [Fri, 9 Aug 2024 14:08:21 +0000 (10:08 -0400)]
isisd: fix memory handling in isis_adj_process_threeway()

The adj_process_threeway() api may call the adj_state_change()
api, which may delete the adj struct being examined. Change the
signature so that callers pass a ptr-to-ptr so that they will
see that deletion.

Signed-off-by: Mark Stapp <mjs@cisco.com>
(cherry picked from commit 3eb7d1641166872591554519607483f6d77657f5)

8 months agozebra: Ensure non-equal id's are not same nhg's
Donald Sharp [Sat, 10 Aug 2024 23:43:08 +0000 (19:43 -0400)]
zebra: Ensure non-equal id's are not same nhg's

The function zebra_nhg_hash_equal is only used
as a hash function for storage of NHG's and retrieval.
If you have say two nhg's:

31 (25/26)
32 (25/26)

This function would return them as being equal.  Which
of course leads to the problem when you attempt to
hash_release 32 but release 31 from the hash.  Then later
when you attempt to do hash comparisons 32 has actually
been freed leaving to use after free situations and shit
goes down hill fast.

This hash is only used as part of the hash comparison
function for nexthop group storage.  Since this is so
let's always return the 31/32 nhg's are not equal at all.

We possibly have a different problem where we are creating
31 and 32 ( when 31 should have just been used instead of 32 )
but we need to prevent any type of hash release problem at all.
This supercedes any other issue( that should be tracked down
on it's own ).  Since you can have use after free situation
that leads to a crash -vs- some possible nexthop group duplication
which is very minor in comparison.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 5a1b61aeba1b83825e1656a047aab35be0c1db55)

8 months agoMerge pull request #16552 from FRRouting/mergify/bp/stable/10.1/pr-16529
Donald Sharp [Sat, 10 Aug 2024 23:50:08 +0000 (19:50 -0400)]
Merge pull request #16552 from FRRouting/mergify/bp/stable/10.1/pr-16529

mgmtd: don't add implicit state data when reading config from file (backport #16529)

8 months agomgmtd: don't add implicit state data when reading config from file
Igor Ryzhov [Wed, 7 Aug 2024 21:40:51 +0000 (00:40 +0300)]
mgmtd: don't add implicit state data when reading config from file

When mgmt reads configuration from file, it shouldn't add implicit state
data to the candidate datastore. Configuration datastores like candidate
should never store state, otherwise they fail validation.

Fixes #15814

Signed-off-by: Igor Ryzhov <idryzhov@gmail.com>
(cherry picked from commit 61e8d5e0b9f0ccb7647a04974f7134ede67fedd8)

8 months agoMerge pull request #16540 from FRRouting/mergify/bp/stable/10.1/pr-16531
Donald Sharp [Fri, 9 Aug 2024 00:40:15 +0000 (20:40 -0400)]
Merge pull request #16540 from FRRouting/mergify/bp/stable/10.1/pr-16531

ripd: fix show run output for distribute-list (backport #16531)

8 months agoripd: fix show run output for distribute-list
Igor Ryzhov [Wed, 7 Aug 2024 22:25:02 +0000 (01:25 +0300)]
ripd: fix show run output for distribute-list

CLI show callbacks should be defined in frr_ripd_cli_info instead of
frr_ripd_info, because only the former is loaded by mgmtd and only its
callbacks are getting called for config output.

Signed-off-by: Igor Ryzhov <idryzhov@gmail.com>
(cherry picked from commit 25d94ec3eedca978ce7c37359105b0518dcf0f5f)

8 months agoMerge pull request #16536 from FRRouting/mergify/bp/stable/10.1/pr-16530
Donald Sharp [Thu, 8 Aug 2024 16:18:48 +0000 (12:18 -0400)]
Merge pull request #16536 from FRRouting/mergify/bp/stable/10.1/pr-16530

lib: fix crash on distribute-list delete (backport #16530)

8 months agolib: fix crash on distribute-list delete
Igor Ryzhov [Wed, 7 Aug 2024 22:17:11 +0000 (01:17 +0300)]
lib: fix crash on distribute-list delete

The destroy callback must be executed only once on APPLY stage.

Fixes #16528

Signed-off-by: Igor Ryzhov <idryzhov@gmail.com>
(cherry picked from commit 2b12d62e38bf41648b2703b5a5c48e47eb01edc7)

8 months agoFRR Release 10.1 docker/10.1.0 frr-10.1
Jafar Al-Gharaibeh [Thu, 1 Aug 2024 04:32:17 +0000 (23:32 -0500)]
FRR Release 10.1

Release Overview:

* Breaking changes
 - Enable BGP dynamic capability by default for datacenter profile
 - Split BGP `rpki cache` command into separate per SSH/TCP
 - Add deprecation cycle for OSPF `router-info X [A.B.C.D]` command

* Features
 - BGP dampening per-neighbor support
 - BMP send-experimental stats
 - Implement extended link-bandwidth for BGP
 - Paths Limit for Multiple Paths in BGP
 - New command for OSPFv2 `ip ospf neighbor-filter NAME [A.B.C.D]`
 - Implement non-broadcast support for point-to-multipoint networks

* Other significant changes
bgpd
- Fix route leaking from the default l3vrf
- Fix `match peer` when switching between IPv4/IPv6/interface
- Fix dynamic peer graceful restart race condition
- Fix colored routes not installed after a switchover
- Fix crash when deleting the SRv6 locator
- Fix `no set as-path prepend ASNUM...`
- Fix negative commands for Graceful-Restart operations
- Fix ipv4-mapped ipv6 on non 6pe
- Fix show run of network route-distinguisher
- Fix display when using `missing-as-worst`
- Fix `show bgp neighbors` output
- Fix error handling for MP/GR capabilities as a dynamic capability
- Fix error handling when receiving BGP Prefix-SID attribute
- Fix route-target display with a dotted format
- Fix `no bgp as-path access-list`
- Fix `no` form for `neighbor X capability software-version`
- Check against extended community unit size for link bandwidth
- Make sure we have enough data to handle extended link bandwidth
- Check if FQDN capability length is in valid ranges
- Allow using different ASNs per VRF instances
- Send End-of-RIB not only if Graceful-Restart capability is received
- Implement backpressure to avoid CPU hog
- Ignore validating the attribute flags if path-attribute is configured
- Prevent deletion of BGP peer groups associated with `bgp listen range`
- Inherit some peer flags from the peer-group
- Allow specification of AS 0 for RPKI commands
- Allow using `maximum-prefix` for EVPN
- Increase install/uninstall speed of EVPN VNIs
- Update default-originate route-map actual map structure
- Include `unsuppress-map` as a valid outgoing eBGP policy
- Allow dynamically disable graceful-restart/long-lived graceful-restart
- Unset advertised capabilities if the capability is disabled
- Aggregated summary-only remove suppressed from EVPN

isisd
- Fix crash when deactivating ISIS adjacency on the interface
- Fix `show isis database [detail] json`
- Fix `show isis algorithm`
- Fix crash when configuring the circuit type for the interface
- Fix IP/IPv6 reachability TLVs
- When the metric-type is configured as "wide", the IS-IS generates
  incorrect metric values for IPv4 directly connected routes
- Add link state support for SRv6 adjacencies
- The hold time of hello packets on a P2P link does not match the
  sending interval

mgmtd
- Implement YANG RPC/action support

ospfd
- Fix crash in OSPF TE parsing
- Fix the bug where ip_ospf_dead-interval_minimal_hello-multiplier did
  not reset the hello timer
- Fix `no write-multiplier` command
- Fix `no maximum-paths` command
- Solved crash in RI parsing with OSPF TE
- Assure OSPF AS External routes are installed after the link flap
- Send LS Updates in response to LS Request as unicast

ospf6d
- Handle topo change in Graceful-Restart Helper mode for max-age LSAs
- Prevent heap-buffer-overflow with an unknown type
- Redistribute metric for AS-external route
- Fix next-hop computation for inter-area multi-ABR ECMP
- Fix interface type vs. connected routes updates

pathd
- Retry synchronous label-manager ZAPI connection

pimd
- Fix null register before aging out reg-stop
- Fix dr-priority range
- Fix crash unconfiguring rp keepalive timer

lib
- Fix keychain NB crash
- Do not convert EVPN prefixes into IPv4/IPv6 if not needed

ripd
- Fix `clear ip rip` command

ripngd
- Fix `clear ipv6 ripng` command

tools
- Handle seq num for BGP as-path in frr-reload.py

vtysh
- Fix 'show ip[v6] prefix-list ... json' formatting by moving it to vtysh
- Fix `show route-map` command when calling via `do`
- Show `ip ospf network ...` even if it's not the same as the interface type

zebra
- Fix `mpls label bind` command
- Fix excessive `exit` commands
- Fix static SRv6 segment-list SID order
- Fix JSON output for `show route summary json`
- Fix malformed json output for multiple vrfs in command
  `show ip route vrf all json`
- Fix crash if MAC-VLAN link in another netns
- Fix crash on MAC-VLAN link down/up
- Deny the routes if ip protocol CLI refers to an undefined route-map
- Bridge flap handle VLAN membership update
- Add `show fpm status [json]` command

8 months agodebian, redhat: updating changelog for 10.1 release
Jafar Al-Gharaibeh [Thu, 1 Aug 2024 04:31:57 +0000 (23:31 -0500)]
debian, redhat: updating changelog for 10.1 release

Signed-off-by: Jafar Al-Gharaibeh <jafar@atcorp.com>
8 months agoMerge pull request #16502 from FRRouting/mergify/bp/stable/10.1/pr-16497
Donald Sharp [Wed, 31 Jul 2024 16:33:46 +0000 (12:33 -0400)]
Merge pull request #16502 from FRRouting/mergify/bp/stable/10.1/pr-16497

bgpd: Check the actual remaining stream length before taking TLV value (backport #16497)

8 months agobgpd: Check the actual remaining stream length before taking TLV value
Donatas Abraitis [Wed, 31 Jul 2024 05:35:14 +0000 (08:35 +0300)]
bgpd: Check the actual remaining stream length before taking TLV value

```
    0 0xb50b9f898028 in __sanitizer_print_stack_trace (/home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/.libs/bgpd+0x368028) (BuildId: 3292703ed7958b20076550c967f879db8dc27ca7)
    1 0xb50b9f7ed8e4 in fuzzer::PrintStackTrace() (/home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/.libs/bgpd+0x2bd8e4) (BuildId: 3292703ed7958b20076550c967f879db8dc27ca7)
    2 0xb50b9f7d4d9c in fuzzer::Fuzzer::CrashCallback() (/home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/.libs/bgpd+0x2a4d9c) (BuildId: 3292703ed7958b20076550c967f879db8dc27ca7)
    3 0xe0d12d7469cc  (linux-vdso.so.1+0x9cc) (BuildId: 1a77697e9d723fe22246cfd7641b140c427b7e11)
    4 0xe0d12c88f1fc in __pthread_kill_implementation nptl/pthread_kill.c:43:17
    5 0xe0d12c84a678 in gsignal signal/../sysdeps/posix/raise.c:26:13
    6 0xe0d12c83712c in abort stdlib/abort.c:79:7
    7 0xe0d12d214724 in _zlog_assert_failed /home/ubuntu/frr-public/frr_public_private-libfuzzer/lib/zlog.c:789:2
    8 0xe0d12d1285e4 in stream_get /home/ubuntu/frr-public/frr_public_private-libfuzzer/lib/stream.c:324:3
    9 0xb50b9f8e47c4 in bgp_attr_encap /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_attr.c:2758:3
    10 0xb50b9f8dcd38 in bgp_attr_parse /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_attr.c:3783:10
    11 0xb50b9faf74b4 in bgp_update_receive /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:2383:20
    12 0xb50b9faf1dcc in bgp_process_packet /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:4075:11
    13 0xb50b9f8c90d0 in LLVMFuzzerTestOneInput /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_main.c:582:3
```

Reported-by: Iggy Frankovic <iggyfran@amazon.com>
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit 0998b38e4d61179441f90dd7e7fd6a3a8b7bd8c5)

9 months agoMerge pull request #16493 from FRRouting/mergify/bp/stable/10.1/pr-16491
Donald Sharp [Mon, 29 Jul 2024 11:55:47 +0000 (07:55 -0400)]
Merge pull request #16493 from FRRouting/mergify/bp/stable/10.1/pr-16491

bgpd: Do not process VRF import/export to/from auto created VRF instances (backport #16491)

9 months agobgpd: Do not process VRF import/export to/from auto created VRF instances
Donatas Abraitis [Sun, 28 Jul 2024 11:26:13 +0000 (14:26 +0300)]
bgpd: Do not process VRF import/export to/from auto created VRF instances

Fixes the crash:

```
(gdb) bt
0  __pthread_kill_implementation (no_tid=0, signo=11, threadid=124583315603008) at ./nptl/pthread_kill.c:44
1  __pthread_kill_internal (signo=11, threadid=124583315603008) at ./nptl/pthread_kill.c:78
2  __GI___pthread_kill (threadid=124583315603008, signo=signo@entry=11) at ./nptl/pthread_kill.c:89
3  0x0000714ed0242476 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
4  0x0000714ed074cfb7 in core_handler (signo=11, siginfo=0x7ffe6d9792b0, context=0x7ffe6d979180) at lib/sigevent.c:258
5  <signal handler called>
6  0x000060f55e33ffdd in route_table_get_info (table=0x0) at ./lib/table.h:177
7  0x000060f55e340053 in bgp_dest_table (dest=0x60f56dabb840) at ./bgpd/bgp_table.h:156
8  0x000060f55e340c9f in is_route_injectable_into_vpn (pi=0x60f56dbc4a60) at ./bgpd/bgp_mplsvpn.h:331
9  0x000060f55e34507c in vpn_leak_from_vrf_update (to_bgp=0x60f56da52070, from_bgp=0x60f56da75af0, path_vrf=0x60f56dbc4a60) at bgpd/bgp_mplsvpn.c:1575
10 0x000060f55e346657 in vpn_leak_from_vrf_update_all (to_bgp=0x60f56da52070, from_bgp=0x60f56da75af0, afi=AFI_IP) at bgpd/bgp_mplsvpn.c:2028
11 0x000060f55e340c10 in vpn_leak_postchange (direction=BGP_VPN_POLICY_DIR_TOVPN, afi=AFI_IP, bgp_vpn=0x60f56da52070, bgp_vrf=0x60f56da75af0) at ./bgpd/bgp_mplsvpn.h:310
12 0x000060f55e34a692 in vpn_leak_postchange_all () at bgpd/bgp_mplsvpn.c:3737
13 0x000060f55e3d91fc in router_bgp (self=0x60f55e5cbc20 <router_bgp_cmd>, vty=0x60f56e2d7660, argc=3, argv=0x60f56da19830) at bgpd/bgp_vty.c:1601
14 0x0000714ed069ddf5 in cmd_execute_command_real (vline=0x60f56da32a80, vty=0x60f56e2d7660, cmd=0x0, up_level=0) at lib/command.c:1002
15 0x0000714ed069df6e in cmd_execute_command (vline=0x60f56da32a80, vty=0x60f56e2d7660, cmd=0x0, vtysh=0) at lib/command.c:1061
16 0x0000714ed069e51e in cmd_execute (vty=0x60f56e2d7660, cmd=0x60f56dbf07d0 "router bgp 100\n", matched=0x0, vtysh=0) at lib/command.c:1227
17 0x0000714ed076faa0 in vty_command (vty=0x60f56e2d7660, buf=0x60f56dbf07d0 "router bgp 100\n") at lib/vty.c:616
18 0x0000714ed07719c4 in vty_execute (vty=0x60f56e2d7660) at lib/vty.c:1379
19 0x0000714ed07740f0 in vtysh_read (thread=0x7ffe6d97c700) at lib/vty.c:2374
20 0x0000714ed07685c4 in event_call (thread=0x7ffe6d97c700) at lib/event.c:1995
21 0x0000714ed06e3351 in frr_run (master=0x60f56d1d2e40) at lib/libfrr.c:1232
22 0x000060f55e2c4b44 in main (argc=7, argv=0x7ffe6d97c978) at bgpd/bgp_main.c:555
(gdb)
```

Fixes https://github.com/FRRouting/frr/issues/16484

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit 04f9372409a11a59dafbbf8423f0cf832b99cf0e)

9 months agoMerge pull request #16479 from opensourcerouting/fix/backpressue_10.1
Donald Sharp [Fri, 26 Jul 2024 14:23:31 +0000 (10:23 -0400)]
Merge pull request #16479 from opensourcerouting/fix/backpressue_10.1

backpressure backports for dev/10.1

9 months agoMerge pull request #16476 from FRRouting/mergify/bp/dev/10.1/pr-16472
Donald Sharp [Fri, 26 Jul 2024 11:33:45 +0000 (07:33 -0400)]
Merge pull request #16476 from FRRouting/mergify/bp/dev/10.1/pr-16472

pimd: Fix msdp setting of sa->rp (backport #16472)

9 months agobgpd: backpressure - fix evpn route sync to zebra
Chirag Shah [Mon, 17 Jun 2024 20:58:03 +0000 (13:58 -0700)]
bgpd: backpressure - fix evpn route sync to zebra

In scaled EVPN + ipv4/ipv6 uni route sync to zebra,
some of the ipv4/ipv6 routes skipped reinstallation
due to incorrect local variable's stale value.

Once the local variable value reset in each loop
iteration all skipped routes synced to zebra properly.

Ticket: #3948828

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Signed-off-by: Chirag Shah <chirag@nvidia.com>
9 months agobgpd: backpressure - log error for evpn when route install to zebra fails.
Rajasekar Raja [Fri, 19 Jul 2024 05:23:23 +0000 (22:23 -0700)]
bgpd: backpressure - log error for evpn when route install to zebra fails.

log error for evpn in case route install to zebra fails.

Ticket :#3992392

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
9 months agobgpd: backpressure - fix ret value evpn_route_select_install
Rajasekar Raja [Wed, 17 Jul 2024 06:34:15 +0000 (23:34 -0700)]
bgpd: backpressure - fix ret value evpn_route_select_install

The return value of evpn_route_select_install is ignored in all cases
except during vni route table install/uninstall and based on the
returned value, an error is logged. Fixing this.

Ticket :#3992392

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
9 months agopimd: Fix msdp setting of sa->rp
Donald Sharp [Thu, 25 Jul 2024 11:50:32 +0000 (07:50 -0400)]
pimd: Fix msdp setting of sa->rp

The code is clearly incorrect.  After consultation with
the original author this is the decided change.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit c4b4c242ec8cfcdb23f0f90faaa0ff76577e1364)

9 months agoMerge pull request #16458 from FRRouting/mergify/bp/dev/10.1/pr-16451
Donatas Abraitis [Thu, 25 Jul 2024 10:25:14 +0000 (13:25 +0300)]
Merge pull request #16458 from FRRouting/mergify/bp/dev/10.1/pr-16451

lib: mgmtd: fix too early daemon detach of mgmtd (backport #16451)

9 months agolib: mgmtd: fix too early daemon detach of mgmtd
Christian Hopps [Tue, 23 Jul 2024 21:42:07 +0000 (17:42 -0400)]
lib: mgmtd: fix too early daemon detach of mgmtd

Correct FRR startup counts on a daemon's vty socket to be open when the
parent process exits. The parent process waits for `frr_check_detach()`
to be called by the child before exiting. The problem is when the
`FRR_MANUAL_VTY_START` flag is set the vty socket was not opened but
`frr_check_detach()` was called anyway.

Instead add a bool option for `frr_check_detach()` to be called when the
socket is opened with `frr_vty_serv_start()`, and do so when "manually"
calling said function (i.e., when FRR_MANUAL_VTY_START is set).

The `FRR_MANUAL_VTY_START` flag is only set by mgmtd. The reason we
wait to open the vty socket is so that mgmtd can parse the various
daemon specific config files it has taken over, after the event loop has
started, but before we receive any possible new config from `vtysh`.

fixes #16362

Signed-off-by: Christian Hopps <chopps@labn.net>
(cherry picked from commit be9a6fc0ea8180a4aaa558c5402ea543427e2e7e)

9 months agoMerge pull request #16453 from FRRouting/mergify/bp/dev/10.1/pr-16428
Donald Sharp [Wed, 24 Jul 2024 14:20:57 +0000 (10:20 -0400)]
Merge pull request #16453 from FRRouting/mergify/bp/dev/10.1/pr-16428

yang: Added missed prefix to the frr-deviations-ietf-routing yang file (backport #16428)

9 months agoyang: Added missed prefix to the yang file
Y Bharath [Mon, 22 Jul 2024 11:17:54 +0000 (16:47 +0530)]
yang: Added missed prefix to the yang file

Corrected warning by including the module

Signed-off-by: y-bharath14 <y.bharath@samsung.com>
(cherry picked from commit ed832b70ec0ea7ad0cf22a844de2151592b5b0e8)

9 months agoMerge pull request #16442 from FRRouting/mergify/bp/dev/10.1/pr-16330
Jafar Al-Gharaibeh [Wed, 24 Jul 2024 04:47:58 +0000 (00:47 -0400)]
Merge pull request #16442 from FRRouting/mergify/bp/dev/10.1/pr-16330

zebra: Properly note that a nhg's nexthop has gone down (backport #16330)

9 months agoMerge pull request #16443 from FRRouting/mergify/bp/dev/10.1/pr-16376
Russ White [Tue, 23 Jul 2024 22:11:46 +0000 (18:11 -0400)]
Merge pull request #16443 from FRRouting/mergify/bp/dev/10.1/pr-16376

ospfd: fix internal ldp-sync state flags when feature is disabled (backport #16376)

9 months agoMerge pull request #16441 from FRRouting/mergify/bp/dev/10.1/pr-16437
Donatas Abraitis [Tue, 23 Jul 2024 20:38:23 +0000 (23:38 +0300)]
Merge pull request #16441 from FRRouting/mergify/bp/dev/10.1/pr-16437

bgpd: backpressure - Avoid use after free (backport #16437)

9 months agoospfd: fix internal ldp-sync state flags when feature is disabled
Christian Breunig [Sat, 13 Jul 2024 06:43:36 +0000 (08:43 +0200)]
ospfd: fix internal ldp-sync state flags when feature is disabled

When enabling "mpls ldp-sync" under "router ospf" ospfd configures
SET_FLAG(ldp_sync_info->flags, LDP_SYNC_FLAG_IF_CONFIG) so internally knowing
that the ldp-sync feature is enabled. However the flag is not cleared when
turning of the feature using "nompls ldp-sync"!

https://github.com/FRRouting/frr/issues/16375

Signed-off-by: Christian Breunig <christian@breunig.cc>
(cherry picked from commit 5a70378a47f541b0354fbb96770dd0a65ec552b8)

9 months agozebra: Properly note that a nhg's nexthop has gone down
Donald Sharp [Wed, 7 Feb 2024 19:56:15 +0000 (14:56 -0500)]
zebra: Properly note that a nhg's nexthop has gone down

Current code when a link is set down is to just mark the
nexthop group as not properly setup.  Leaving situations
where when an interface goes down and show output is
entered we see incorrect state.  This is true for anything
that would be checking those flags at that point in time.

Modify the interface down nexthop group code to notice the
nexthops appropriately ( and I mean set the appropriate flags )
and to allow a `show ip route` command to actually display
what is going on with the nexthops.

eva# show ip route 1.0.0.0
Routing entry for 1.0.0.0/32
  Known via "sharp", distance 150, metric 0, best
  Last update 00:00:06 ago
  * 192.168.44.33, via dummy1, weight 1
  * 192.168.45.33, via dummy2, weight 1

sharpd@eva:~/frr1$ sudo ip link set dummy2 down

eva# show ip route 1.0.0.0
Routing entry for 1.0.0.0/32
  Known via "sharp", distance 150, metric 0, best
  Last update 00:00:12 ago
  * 192.168.44.33, via dummy1, weight 1
    192.168.45.33, via dummy2 inactive, weight 1

Notice now that the 1.0.0.0/32 route now correctly
displays the route for the nexthop group entry.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 266b0619942edd8d838235e14dc0cb71a772f585)

9 months agobgpd: backpressure - Avoid use after free
Rajasekar Raja [Mon, 22 Jul 2024 17:13:19 +0000 (10:13 -0700)]
bgpd: backpressure - Avoid use after free

Coverity complains there is a use after free (1598495 and 1598496)
At this point, most likely dest->refcount cannot go 1 and free up
the dest, but there might be some code path where this can happen.

Fixing this with a simple order change (no harm fix).

Ticket :#4001204

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
(cherry picked from commit 40965e599975b019bbe6f4b1dfb3ff22d8980876)

9 months agoMerge pull request #16433 from FRRouting/mergify/bp/dev/10.1/pr-16309
Donald Sharp [Tue, 23 Jul 2024 14:45:11 +0000 (10:45 -0400)]
Merge pull request #16433 from FRRouting/mergify/bp/dev/10.1/pr-16309

pimd: fix crash on non-existent interface (backport #16309)

9 months agoMerge pull request #16430 from FRRouting/mergify/bp/dev/10.1/pr-16429
Jafar Al-Gharaibeh [Mon, 22 Jul 2024 18:02:32 +0000 (14:02 -0400)]
Merge pull request #16430 from FRRouting/mergify/bp/dev/10.1/pr-16429

lib: move non-error from __log_err to __dbg (backport #16429)

9 months agopimd: fix crash on non-existent interface
Louis Scalbert [Fri, 28 Jun 2024 11:22:36 +0000 (13:22 +0200)]
pimd: fix crash on non-existent interface

Fix the following crash when pim options are (un)configured on an
non-existent interface.

> r1(config)# int fgljdsf
> r1(config-if)# no ip pim unicast-bsm
> vtysh: error reading from pimd: Connection reset by peer (104)Warning: closing connection to pimd because of an I/O error!

> #0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007f70c8f32249 in core_handler (signo=11, siginfo=0x7fffff88e4f0, context=0x7fffff88e3c0) at lib/sigevent.c:258
> #2  <signal handler called>
> #3  0x0000556cfdd9b16d in lib_interface_pim_address_family_unicast_bsm_modify (args=0x7fffff88f130) at pimd/pim_nb_config.c:1910
> #4  0x00007f70c8efdcb5 in nb_callback_modify (context=0x556d00032b60, nb_node=0x556cffeeb9b0, event=NB_EV_APPLY, dnode=0x556d00031670, resource=0x556d00032b48, errmsg=0x7fffff88f710 "", errmsg_len=8192)
>     at lib/northbound.c:1538
> #5  0x00007f70c8efe949 in nb_callback_configuration (context=0x556d00032b60, event=NB_EV_APPLY, change=0x556d00032b10, errmsg=0x7fffff88f710 "", errmsg_len=8192) at lib/northbound.c:1888
> #6  0x00007f70c8efee82 in nb_transaction_process (event=NB_EV_APPLY, transaction=0x556d00032b60, errmsg=0x7fffff88f710 "", errmsg_len=8192) at lib/northbound.c:2016
> #7  0x00007f70c8efd658 in nb_candidate_commit_apply (transaction=0x556d00032b60, save_transaction=true, transaction_id=0x0, errmsg=0x7fffff88f710 "", errmsg_len=8192) at lib/northbound.c:1356
> #8  0x00007f70c8efd78e in nb_candidate_commit (context=..., candidate=0x556cffeb0e80, save_transaction=true, comment=0x0, transaction_id=0x0, errmsg=0x7fffff88f710 "", errmsg_len=8192) at lib/northbound.c:1389
> #9  0x00007f70c8f03e58 in nb_cli_classic_commit (vty=0x556d00025a80) at lib/northbound_cli.c:51
> #10 0x00007f70c8f043f8 in nb_cli_apply_changes_internal (vty=0x556d00025a80,
>     xpath_base=0x7fffff893bb0 "/frr-interface:lib/interface[name='fgljdsf']/frr-pim:pim/address-family[address-family='frr-routing:ipv4']", clear_pending=false) at lib/northbound_cli.c:178
> #11 0x00007f70c8f0475d in nb_cli_apply_changes (vty=0x556d00025a80, xpath_base_fmt=0x556cfdde9fe0 "./frr-pim:pim/address-family[address-family='%s']") at lib/northbound_cli.c:234
> #12 0x0000556cfdd8298f in pim_process_no_unicast_bsm_cmd (vty=0x556d00025a80) at pimd/pim_cmd_common.c:3493
> #13 0x0000556cfddcf782 in no_ip_pim_ucast_bsm (self=0x556cfde40b20 <no_ip_pim_ucast_bsm_cmd>, vty=0x556d00025a80, argc=4, argv=0x556d00031500) at pimd/pim_cmd.c:4950
> #14 0x00007f70c8e942f0 in cmd_execute_command_real (vline=0x556d00032070, vty=0x556d00025a80, cmd=0x0, up_level=0) at lib/command.c:1002
> #15 0x00007f70c8e94451 in cmd_execute_command (vline=0x556d00032070, vty=0x556d00025a80, cmd=0x0, vtysh=0) at lib/command.c:1061
> #16 0x00007f70c8e9499f in cmd_execute (vty=0x556d00025a80, cmd=0x556d00030320 "no ip pim unicast-bsm", matched=0x0, vtysh=0) at lib/command.c:1227
> #17 0x00007f70c8f51e44 in vty_command (vty=0x556d00025a80, buf=0x556d00030320 "no ip pim unicast-bsm") at lib/vty.c:616
> #18 0x00007f70c8f53bdd in vty_execute (vty=0x556d00025a80) at lib/vty.c:1379
> #19 0x00007f70c8f55d59 in vtysh_read (thread=0x7fffff896600) at lib/vty.c:2374
> #20 0x00007f70c8f4b209 in event_call (thread=0x7fffff896600) at lib/event.c:2011
> #21 0x00007f70c8ed109e in frr_run (master=0x556cffdb4ea0) at lib/libfrr.c:1217
> #22 0x0000556cfdddec12 in main (argc=2, argv=0x7fffff896828, envp=0x7fffff896840) at pimd/pim_main.c:165
> (gdb) f 3
> #3  0x0000556cfdd9b16d in lib_interface_pim_address_family_unicast_bsm_modify (args=0x7fffff88f130) at pimd/pim_nb_config.c:1910
> 1910 pim_ifp->ucast_bsm_accept =
> (gdb) list
> 1905 case NB_EV_ABORT:
> 1906 break;
> 1907 case NB_EV_APPLY:
> 1908 ifp = nb_running_get_entry(args->dnode, NULL, true);
> 1909 pim_ifp = ifp->info;
> 1910 pim_ifp->ucast_bsm_accept =
> 1911 yang_dnode_get_bool(args->dnode, NULL);
> 1912
> 1913 break;
> 1914 }
> (gdb) p pim_ifp
> $1 = (struct pim_interface *) 0x0

Fixes: 3bb513c399 ("lib: adapt to version 2 of libyang")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
(cherry picked from commit 6952bea5cdd38057bf8c0a5e9c0fbe916dc73953)

9 months agolib: move non-error from __log_err to __dbg
Christian Hopps [Mon, 22 Jul 2024 11:52:10 +0000 (07:52 -0400)]
lib: move non-error from __log_err to __dbg

Additionally, print `errmsg_if_any` in successful debug messages
if non-NULL.

fixes #16386 #16043

Signed-off-by: Christian Hopps <chopps@labn.net>
(cherry picked from commit 7afd7d99f2fa39be073625c630d46f96e5dd66a5)

9 months agoMerge pull request #16404 from FRRouting/mergify/bp/dev/10.1/pr-16368
Donatas Abraitis [Wed, 17 Jul 2024 05:08:00 +0000 (22:08 -0700)]
Merge pull request #16404 from FRRouting/mergify/bp/dev/10.1/pr-16368

bgpd: backpressure - fix to properly remove dest for bgp under deletion (backport #16368)

9 months agoMerge pull request #16399 from FRRouting/mergify/bp/dev/10.1/pr-16374
Jafar Al-Gharaibeh [Wed, 17 Jul 2024 00:07:15 +0000 (20:07 -0400)]
Merge pull request #16399 from FRRouting/mergify/bp/dev/10.1/pr-16374

bgpd: Mark VRF instance as auto created if import vrf is configured for this instance (backport #16374)

9 months agoMerge pull request #16379 from FRRouting/mergify/bp/dev/10.1/pr-16350
Donatas Abraitis [Tue, 16 Jul 2024 23:00:05 +0000 (16:00 -0700)]
Merge pull request #16379 from FRRouting/mergify/bp/dev/10.1/pr-16350

zebra: Fix to avoid two Vrfs with same table ids (backport #16350)

9 months agoMerge pull request #16395 from FRRouting/mergify/bp/dev/10.1/pr-16365
Donatas Abraitis [Tue, 16 Jul 2024 22:51:01 +0000 (15:51 -0700)]
Merge pull request #16395 from FRRouting/mergify/bp/dev/10.1/pr-16365

isisd: fix crash when calculating the neighbor spanning tree based on the fragmented LSP (backport #16365)

9 months agobgpd: backpressure - Improve debuggability
Rajasekar Raja [Thu, 11 Jul 2024 03:17:14 +0000 (20:17 -0700)]
bgpd: backpressure - Improve debuggability

Improve debuggability in backpressure code.

Ticket :#3980988

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
(cherry picked from commit 186db96c06e4f44b4450fcba88f0fa680ee0b92d)

9 months agobgpd: backpressure - fix to properly remove dest for bgp under deletion
Rajasekar Raja [Wed, 10 Jul 2024 23:46:29 +0000 (16:46 -0700)]
bgpd: backpressure - fix to properly remove dest for bgp under deletion

In case of imported routes (L3vni/vrf leaks), when a bgp instance is
being deleted, the peer->bgp comparision with the incoming bgp to remove
the dest from the pending fifo is wrong. This can lead to the fifo
having stale entries resulting in crash.

Two changes are done here.
 - Instead of pop/push items in list if the struct bgp doesnt match,
   simply iterate the list and remove the expected ones.

 - Corrected the way bgp is fetched from dest rather than relying on
   path_info->peer so that it works for all kinds of routes.

Ticket :#3980988

Signed-off-by: Chirag Shah <chirag @nvidia.com>
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
(cherry picked from commit 4395fcd8e120958a91d3a11f918e9071b1cb5619)

9 months agobgpd: Skip empty (auto created) VRF instances when deleting a default BGP instance
Donatas Abraitis [Mon, 15 Jul 2024 13:20:31 +0000 (16:20 +0300)]
bgpd: Skip empty (auto created) VRF instances when deleting a default BGP instance

Auto created VRF instances does not have any config, so it's not relevant
depending on them.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit bfedb38110e8d3e5471718a0f9abe8836ffc7143)

9 months agotests: Check if VRF instance has a different ASN than a default VRF
Donatas Abraitis [Sat, 13 Jul 2024 10:14:33 +0000 (13:14 +0300)]
tests: Check if VRF instance has a different ASN than a default VRF

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit c6c0403c61c157a9507781c332e152a2b220da52)

9 months agobgpd: Skip automatically created BGP instances for show CMDs
Donatas Abraitis [Sat, 13 Jul 2024 20:19:57 +0000 (23:19 +0300)]
bgpd: Skip automatically created BGP instances for show CMDs

When using e.g. `adverise-all-vni`, and/or `import vrf ...`, the VRF instance
is created with a default's VRF ASN and tagged as AUTO_VRF. We MUST skip them
here also.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit 03c086866bdee9daf55420b88593345b9eb6be15)

9 months agotests: Check if multiple VRF instances can have different ASNs
Donatas Abraitis [Sat, 13 Jul 2024 09:43:31 +0000 (12:43 +0300)]
tests: Check if multiple VRF instances can have different ASNs

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit 7540364e58b08da7442927c1a9ffbd535d94fc46)

9 months agobgpd: Mark VRF instance as auto created if import vrf is configured for this instance
Donatas Abraitis [Fri, 12 Jul 2024 14:09:16 +0000 (17:09 +0300)]
bgpd: Mark VRF instance as auto created if import vrf is configured for this instance

If we create a new BGP instance (in this case VRF instance), it MUST be marked
as auto created, to avoid bgpd changing VRF instance's ASN to the default VRF's.

That's because of the ordering when FRR reload is happening.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit 80a4f87c9a38d5e893f7e24da11cc0c885db682e)

9 months agoisisd: fix crash when calculating the neighbor spanning tree based on the fragmented LSP
zhou-run [Thu, 11 Jul 2024 03:35:34 +0000 (11:35 +0800)]
isisd: fix crash when calculating the neighbor spanning tree based on the fragmented LSP

1. When the root IS regenerates an LSP, it calls lsp_build() -> lsp_clear_data() to free the TLV memory of the first fragment and all other fragments. If the number of fragments in the regenerated LSP decreases or if no fragmentation is needed, the extra LSP fragments are not immediately deleted. Instead, lsp_seqno_update() -> lsp_purge() is called to set the remaining time to zero and start aging, while also notifying other IS nodes to age these fragments. lsp_purge() usually does not reset lsp->hdr.seqno to zero because the LSP might recover during the aging process.
2. When other IS nodes receive an LSP, they always call process_lsp() -> isis_unpack_tlvs() to allocate TLV memory for the LSP. This does not differentiate whether the received LSP has a remaining lifetime of zero. Therefore, it is rare for an LSP of a non-root IS to have empty TLVs. Of course, if an LSP with a remaining time of zero and already corrupted is received, lsp_update() -> lsp_purge() will be called to free the TLV memory of the LSP, but this scenario is rare.
3. In LFA calculations, neighbors of the root IS are traversed, and each neighbor is taken as a new root to compute the neighbor SPT. During this process, the old root IS will serve as a neighbor of the new root IS, triggering a call to isis_spf_process_lsp() to parse the LSP of the old root IS and obtain its IP vertices and neighboring IS vertices. However, isis_spf_process_lsp() only checks whether the TLVs in the first fragment of the LSP exist, and does not check the TLVs in the fragmented LSP. If the TLV memory of the fragmented LSP of the old root IS has been freed, it can lead to a null pointer access, causing the current crash.

Additionally, for the base SPT, there are only two places where the LSP of the root IS is parsed:
1. When obtaining the UP neighbors of the root IS via spf_adj_list_parse_lsp().
2. When preloading the IP vertices of the root IS via isis_lsp_iterate_ip_reach().
Both of these checks ensure that frag->tlvs is not null, and they do not subsequently call isis_spf_process_lsp() to parse the root IS's LSP. It is very rare for non-root IS LSPs to have empty TLVs unless they are corrupted LSPs awaiting deletion. If it happens, a crash will occur.

The backtrace is as follows:
(gdb) bt
#0  0x00007f3097281fe1 in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f30973a2972 in core_handler (signo=11, siginfo=0x7ffce66c2870, context=0x7ffce66c2740) at ../lib/sigevent.c:261
#2  <signal handler called>
#3  0x000055dfa805512b in isis_spf_process_lsp (spftree=0x55dfa950eee0, lsp=0x55dfa94cb590, cost=10, depth=1, root_sysid=0x55dfa950ef6c "", parent=0x55dfa952fca0)
    at ../isisd/isis_spf.c:898
#4  0x000055dfa805743b in isis_spf_loop (spftree=0x55dfa950eee0, root_sysid=0x55dfa950ef6c "") at ../isisd/isis_spf.c:1688
#5  0x000055dfa805784f in isis_run_spf (spftree=0x55dfa950eee0) at ../isisd/isis_spf.c:1808
#6  0x000055dfa8037ff5 in isis_spf_run_neighbors (spftree=0x55dfa9474440) at ../isisd/isis_lfa.c:1259
#7  0x000055dfa803ac17 in isis_spf_run_lfa (area=0x55dfa9477510, spftree=0x55dfa9474440) at ../isisd/isis_lfa.c:2300
#8  0x000055dfa8057964 in isis_run_spf_with_protection (area=0x55dfa9477510, spftree=0x55dfa9474440) at ../isisd/isis_spf.c:1827
#9  0x000055dfa8057c15 in isis_run_spf_cb (thread=0x7ffce66c38e0) at ../isisd/isis_spf.c:1889
#10 0x00007f30973bbf04 in thread_call (thread=0x7ffce66c38e0) at ../lib/thread.c:1990
#11 0x00007f309735497b in frr_run (master=0x55dfa91733c0) at ../lib/libfrr.c:1198
#12 0x000055dfa8029d5d in main (argc=5, argv=0x7ffce66c3b08, envp=0x7ffce66c3b38) at ../isisd/isis_main.c:273
(gdb) f 3
#3  0x000055dfa805512b in isis_spf_process_lsp (spftree=0x55dfa950eee0, lsp=0x55dfa94cb590, cost=10, depth=1, root_sysid=0x55dfa950ef6c "", parent=0x55dfa952fca0)
    at ../isisd/isis_spf.c:898
898     ../isisd/isis_spf.c: No such file or directory.
(gdb) p te_neighs
$1 = (struct isis_item_list *) 0x120
(gdb) p lsp->tlvs
$2 = (struct isis_tlvs *) 0x0
(gdb) p lsp->hdr
$3 = {pdu_len = 27, rem_lifetime = 0, lsp_id = "\000\000\000\000\000\001\000\001", seqno = 4, checksum = 59918, lsp_bits = 1 '\001'}

The backtrace provided above pertains to version 8.5.4, but it seems that the same issue exists in the code of the master branch as well.

I have reviewed the process for calculating the SPT based on the LSP, and isis_spf_process_lsp() is the only function that does not check whether the TLVs in the fragments are empty. Therefore, I believe that modifying this function alone should be sufficient. If the TLVs of the current fragment are already empty, we do not need to continue processing subsequent fragments. This is consistent with the behavior where we do not process fragments if the TLVs of the first fragment are empty.
Of course, one could argue that lsp_purge() should still retain the TLV memory, freeing it and then reallocating it if needed. However, this is a debatable point because in some scenarios, it is permissible for the LSP to have empty TLVs. For example, after receiving an SNP (Sequence Number PDU) message, an empty LSP (with lsp->hdr.seqno = 0) might be created by calling lsp_new. If the corresponding LSP message is discarded due to domain or area authentication failure, the TLV memory wouldn't be allocated.

Test scenario:
In an LFA network, importing a sufficient number of static routes to cause LSP fragmentation, and then rolling back the imported static routes so that the LSP is no longer fragmented, can easily result in this issue.

Signed-off-by: zhou-run <zhou.run@h3c.com>
(cherry picked from commit e905177a8c9d67713682d46934c7a87a0913c250)

9 months agoMerge pull request #16387 from FRRouting/mergify/bp/dev/10.1/pr-16373
Mark Stapp [Tue, 16 Jul 2024 11:55:28 +0000 (07:55 -0400)]
Merge pull request #16387 from FRRouting/mergify/bp/dev/10.1/pr-16373

staticd: fix missing static routes (backport #16373)

9 months agozebra: fix missing static routes
anlan_cs [Fri, 12 Jul 2024 09:03:03 +0000 (17:03 +0800)]
zebra: fix missing static routes

Use `vtysh` with this input file:
```
ip route A nh1
ip route A nh2
ip route B nh1
ip route B nh2
```

When running "ip route B" with "nh1" and "nh2", the procedure maybe is:
1) Create the two nexthops: "nh1" and "nh2".
2) Register "nh1" with `static_zebra_nht_register()`, then the states of both
   "nh1" and "nht2" are set to "STATIC_SENT_TO_ZEBRA".
3) Register "nh2" with `static_zebra_nht_register()`, then only the routes with
   nexthop of "STATIC_START" will be sent to zebra.

So, send the routes with the nexthop of "STATIC_SENT_TO_ZEBRA" to zebra.

Signed-off-by: anlan_cs <vic.lan@pica8.com>
(cherry picked from commit 4518d386f7683289b079708fcdb0c42ced4754d9)

9 months agoMerge pull request #16378 from FRRouting/mergify/bp/dev/10.1/pr-16363
Jafar Al-Gharaibeh [Mon, 15 Jul 2024 18:43:21 +0000 (14:43 -0400)]
Merge pull request #16378 from FRRouting/mergify/bp/dev/10.1/pr-16363

tests: tweak timers to avoid frequent failures on slow CI hardware (backport #16363)

9 months agozebra: Fix to avoid two Vrfs with same table ids
Rajasekar Raja [Fri, 5 Jul 2024 23:02:12 +0000 (16:02 -0700)]
zebra: Fix to avoid two Vrfs with same table ids

During internal testing, when the following sequence is followed, two
non default vrfs end up pointing to the same table-id

 - Initially vrf201 has table id 1002
 - ip link add dev vrf202 type vrf table 1002
 - ip link set dev vrf202 up
 - ip link set dev <intrerface> master vrf202

This will ideally lead to zebra exit since this is a misconfiguration as
expected.

However if we perform a restart frr.service at this point, we end up
having two vrfs pointing to same table-id and bad things can happen.
This is because in the interface_vrf_change, we incorrectly check for
vrf_lookup_by_id() to evaluate if there is a misconfig. This works well
for a non restart case but not for the startup case.

root@mlx-3700-20:mgmt:/var/log/frr# sudo vtysh -c "sh vrf"
vrf mgmt id 37 table 1001
vrf vrf201 id 46 table 1002
vrf vrf202 id 59 table 1002 >>>>

Fix: in all cases of misconfiguration, exit zebra as expected.

Ticket :#3970414

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
(cherry picked from commit c77e15710d6a3a9be71f41a9ce608f06b2795dfb)

9 months agotests: tweak timers to avoid frequent failures on slow CI hardware
Jafar Al-Gharaibeh [Wed, 10 Jul 2024 19:18:51 +0000 (14:18 -0500)]
tests: tweak timers to avoid frequent failures on slow CI hardware

Signed-off-by: Jafar Al-Gharaibeh <jafar@atcorp.com>
(cherry picked from commit ad7a1f9487edc75fbfaf932e929a008c8bcbc4f9)

9 months agoMerge pull request #16335 from FRRouting/mergify/bp/dev/10.1/pr-16226
Donald Sharp [Wed, 3 Jul 2024 12:41:01 +0000 (08:41 -0400)]
Merge pull request #16335 from FRRouting/mergify/bp/dev/10.1/pr-16226

ldpd: fix wrong gtsm count (backport #16226)

9 months agoMerge pull request #16328 from FRRouting/mergify/bp/dev/10.1/pr-16303
Jafar Al-Gharaibeh [Tue, 2 Jul 2024 20:55:12 +0000 (16:55 -0400)]
Merge pull request #16328 from FRRouting/mergify/bp/dev/10.1/pr-16303

isisd: fix crash when obtaining the next hop to calculate LFA on LAN links (backport #16303)

9 months agoldpd: fix wrong gtsm count
anlan_cs [Sat, 15 Jun 2024 12:34:20 +0000 (20:34 +0800)]
ldpd: fix wrong gtsm count

In linux networking stack, the received mpls packets will be processed
by the host *twice*, one as mpls packet, the other as ip packet, so
its ttl decreased 1.

So, we need release the `IP_MINTTL` value if gtsm is enabled, it is for the
mpls packets of neighbor session caused by the command:
`label local advertise explicit-null`.

This change makes the gtsm mechanism a bit deviation.

Fix PR #8313

Signed-off-by: anlan_cs <vic.lan@pica8.com>
(cherry picked from commit 1919df3a64d3fe6d4084c1d0b050b3e368860170)

9 months agoisisd: fix crash when obtaining the next hop to calculate LFA on LAN links
zhou-run [Thu, 27 Jun 2024 03:51:02 +0000 (11:51 +0800)]
isisd: fix crash when obtaining the next hop to calculate LFA on LAN links

When a neighbor connection is disconnected, it may trigger LSP re-generation as a timer task, but this process may be delayed. As a result, the list of neighbors in area->adjacency_list may be inconsistent with the neighbors in lsp->tlvs->oldstyle_reach/extended_reach. For example, the area->adjacency_list may lack certain neighbors even though they are present in the LSP. When computing SPF, the call to isis_spf_build_adj_list() generates the spftree->sadj_list, which reflects the real neighbors in the area->adjacency_list. However, in the case of LAN links, spftree->sadj_list may include additional pseudo neighbors.
The pre-loading of tents through the call to isis_spf_preload_tent involves two steps:
1. isis_spf_process_lsp() is called to generate real neighbor vertices based on the root LSP and pseudo LSP.
2. isis_spf_add_local() is called to add corresponding next hops to the vertex->Adj_N list for the real neighbor vertices.
In the case of LAN links, the absence of corresponding real neighbors in the spftree->sadj_list prevents the execution of the second step. Consequently, the vertex->Adj_N list for the real neighbor vertices lacks corresponding next hops. This leads to a null pointer access when isis_lfa_compute() is called to calculate LFA.
As for P2P links, since there are no pseudo neighbors, only the second step is executed, which does not create real neighbor vertices and therefore does not encounter this issue.
The backtrace is as follows:
(gdb) bt
#0  0x00007fd065277fe1 in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007fd065398972 in core_handler (signo=11, siginfo=0x7ffc5c0636b0, context=0x7ffc5c063580) at ../lib/sigevent.c:261
#2  <signal handler called>
#3  0x00005564d82f8408 in isis_lfa_compute (area=0x5564d8b143f0, circuit=0x5564d8b21d10, spftree=0x5564d8b06bf0, resource=0x7ffc5c064410) at ../isisd/isis_lfa.c:2134
#4  0x00005564d82f8d78 in isis_spf_run_lfa (area=0x5564d8b143f0, spftree=0x5564d8b06bf0) at ../isisd/isis_lfa.c:2344
#5  0x00005564d8315964 in isis_run_spf_with_protection (area=0x5564d8b143f0, spftree=0x5564d8b06bf0) at ../isisd/isis_spf.c:1827
#6  0x00005564d8315c15 in isis_run_spf_cb (thread=0x7ffc5c064590) at ../isisd/isis_spf.c:1889
#7  0x00007fd0653b1f04 in thread_call (thread=0x7ffc5c064590) at ../lib/thread.c:1990
#8  0x00007fd06534a97b in frr_run (master=0x5564d88103c0) at ../lib/libfrr.c:1198
#9  0x00005564d82e7d5d in main (argc=5, argv=0x7ffc5c0647b8, envp=0x7ffc5c0647e8) at ../isisd/isis_main.c:273
(gdb) f 3
#3  0x00005564d82f8408 in isis_lfa_compute (area=0x5564d8b143f0, circuit=0x5564d8b21d10, spftree=0x5564d8b06bf0, resource=0x7ffc5c064410) at ../isisd/isis_lfa.c:2134
2134    ../isisd/isis_lfa.c: No such file or directory.
(gdb) p vadj_primary
$1 = (struct isis_vertex_adj *) 0x0
(gdb) p vertex->Adj_N->head
$2 = (struct listnode *) 0x0
(gdb) p (struct isis_vertex *)spftree->paths->l.list->head->next->next->next->next->data
$8 = (struct isis_vertex *) 0x5564d8b5b240
(gdb) p $8->type
$9 = VTYPE_NONPSEUDO_TE_IS
(gdb) p $8->N.id
$10 = "\000\000\000\000\000\002"
(gdb) p $8->Adj_N->count
$11 = 0
(gdb) p (struct isis_vertex *)spftree->paths->l.list->head->next->next->next->next->next->data
$12 = (struct isis_vertex *) 0x5564d8b73dd0
(gdb) p $12->type
$13 = VTYPE_NONPSEUDO_TE_IS
(gdb) p $12->N.id
$14 = "\000\000\000\000\000\003"
(gdb) p $12->Adj_N->count
$15 = 0
(gdb) p area->adjacency_list->count
$16 = 0
The backtrace provided above pertains to version 8.5.4, but it seems that the same issue exists in the code of the master branch as well.
The scenario where a vertex has no next hop is normal. For example, the "clear isis neighbor" command invokes isis_vertex_adj_del() to delete the next hop of a vertex. Upon reviewing all the instances where the vertex->Adj_N list is used, I found that only isis_lfa_compute() lacks a null check. Therefore, I believe that modifying this part will be sufficient. Additionally, the vertex->parents list for IP vertices is guaranteed not to be empty.
Test scenario:
Setting up LFA for LAN links and executing the "clear isis neighbor" command easily reproduces the issue.

Signed-off-by: zhou-run <zhou.run@h3c.com>
(cherry picked from commit a970bb51b5fe32335c783860a03bb02ce74a49aa)

9 months agoMerge pull request #16312 from FRRouting/mergify/bp/dev/10.1/pr-16305
Russ White [Tue, 2 Jul 2024 12:00:43 +0000 (08:00 -0400)]
Merge pull request #16312 from FRRouting/mergify/bp/dev/10.1/pr-16305

bgpd: Ignore RFC8212 for BGP Confederations (backport #16305)

9 months agoMerge pull request #16318 from FRRouting/mergify/bp/dev/10.1/pr-16233
Russ White [Tue, 2 Jul 2024 11:59:50 +0000 (07:59 -0400)]
Merge pull request #16318 from FRRouting/mergify/bp/dev/10.1/pr-16233

ripd/ripd.c - rip_auth_md5 : Change the start value of sequence 1 to 0 (backport #16233)

9 months agoripd: Change the start value of sequence 1 to 0
T-Nicolas [Mon, 17 Jun 2024 13:05:58 +0000 (15:05 +0200)]
ripd: Change the start value of sequence 1 to 0

Signed-off-by: T-Nicolas <github@toselli.email>
(cherry picked from commit 1a64fe4254759245a67fb279d67478922e00255e)

9 months agotests: Test if RFC 8212 is not involved for BGP confederations
Donatas Abraitis [Thu, 27 Jun 2024 19:53:24 +0000 (22:53 +0300)]
tests: Test if RFC 8212 is not involved for BGP confederations

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit dd6a679e3a0e9415827643942bcc103c48a89adb)

9 months agobgpd: Ignore RFC8212 for BGP Confederations
Donatas Abraitis [Thu, 27 Jun 2024 19:46:58 +0000 (22:46 +0300)]
bgpd: Ignore RFC8212 for BGP Confederations

RFC 8212 should be restricted for eBGP peers.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit fa2cc09d45d3f843564f7bd1e02346373c5741a8)

9 months agoMerge pull request #16306 from FRRouting/mergify/bp/dev/10.1/pr-16068
Donatas Abraitis [Sun, 30 Jun 2024 16:03:38 +0000 (18:03 +0200)]
Merge pull request #16306 from FRRouting/mergify/bp/dev/10.1/pr-16068

bgpd: Ignore routes from evpn if VRF is unknown (backport #16068)

10 months agobgpd: Ignore routes from evpn if VRF is unknown
Piotr Suchy [Wed, 22 May 2024 08:41:52 +0000 (10:41 +0200)]
bgpd: Ignore routes from evpn if VRF is unknown

Fix for a bug, where FRR fails to install route received for an unknown but later-created VRF - detailed description can be found here https://github.com/FRRouting/frr/issues/13708

Signed-off-by: Piotr Suchy <psuchy@akamai.com>
(cherry picked from commit 8044d733009dd428c291460eb8b0e539b53b78fa)

10 months agoMerge pull request #16302 from FRRouting/mergify/bp/dev/10.1/pr-16271
Donald Sharp [Thu, 27 Jun 2024 16:16:28 +0000 (12:16 -0400)]
Merge pull request #16302 from FRRouting/mergify/bp/dev/10.1/pr-16271

bgpd: avoid clearing routes for peers that were never established (backport #16271)

10 months agobgpd: avoid clearing routes for peers that were never established
Loïc Sang [Wed, 19 Jun 2024 14:19:22 +0000 (16:19 +0200)]
bgpd: avoid clearing routes for peers that were never established

Under heavy system load with many peers in passive mode and a large
number of routes, bgpd can enter an infinite loop. This occurs while
processing timeout BGP_OPEN messages, which prevents it from accepting
new connections. The following log entries illustrate the issue:
>bgpd[6151]: [VX6SM-8YE5W][EC 33554460] 3.3.2.224: nexthop_set failed, resetting connection - intf 0x0
>bgpd[6151]: [P790V-THJKS][EC 100663299] bgp_open_receive: bgp_getsockname() failed for peer: 3.3.2.224
>bgpd[6151]: [HTQD2-0R1WR][EC 33554451] bgp_process_packet: BGP OPEN receipt failed for peer: 3.3.2.224
... repeating

The issue occurs when bgpd handles a massive number of routes in the RIB
while receiving numerous BGP_OPEN packets. If bgpd is overloaded, it
fails to process these packets promptly, leading the remote peer to
close the connection and resend BGP_OPEN packets.

When bgpd eventually starts processing these timeout BGP_OPEN packets,
it finds the TCP connection closed by the remote peer, resulting in
"bgp_stop()" being called. For each timeout peer, bgpd must iterate
through the routing table, which is time-consuming and causes new
incoming BGP_OPEN packets to timeout, perpetuating the infinite loop.

To address this issue, the code is modified to check if the peer has
been established at least once before calling "bgp_clear_route_all()".
This ensures that routes are only cleared for peers that had a
successful session, preventing unnecessary iterations over the routing
table for peers that never established a connection.

With this change, BGP_OPEN timeout messages may still occur, but in the
worst case, bgpd will stabilize. Before this patch, bgpd could enter a
loop where it was unable to accpet any new connections.

Signed-off-by: Loïc Sang <loic.sang@6wind.com>
(cherry picked from commit e0ae285eb8beeef7b43bdadc073d8ae346eaeb6c)

10 months agoMerge pull request #16285 from FRRouting/mergify/bp/dev/10.1/pr-15838
Russ White [Tue, 25 Jun 2024 12:42:02 +0000 (08:42 -0400)]
Merge pull request #16285 from FRRouting/mergify/bp/dev/10.1/pr-15838

 bgpd: fix "bgp as-pah access-list" with "set aspath exclude" set/unset issue (backport #15838)

10 months agoMerge pull request #16284 from FRRouting/mergify/bp/dev/10.1/pr-16261
Donatas Abraitis [Tue, 25 Jun 2024 11:49:39 +0000 (14:49 +0300)]
Merge pull request #16284 from FRRouting/mergify/bp/dev/10.1/pr-16261

zebra: clear evpn dup-addr return error-msg when there is no vni (backport #16261)

10 months agoMerge pull request #16291 from FRRouting/mergify/bp/dev/10.1/pr-16214
Russ White [Tue, 25 Jun 2024 11:30:45 +0000 (07:30 -0400)]
Merge pull request #16291 from FRRouting/mergify/bp/dev/10.1/pr-16214

bgpd: A couple more fixes for Tunnel encapsulation handling (backport #16214)

10 months agoMerge pull request #16289 from FRRouting/mergify/bp/dev/10.1/pr-16273
Russ White [Tue, 25 Jun 2024 11:30:24 +0000 (07:30 -0400)]
Merge pull request #16289 from FRRouting/mergify/bp/dev/10.1/pr-16273

bgpd: Relax OAD (One-Administration-Domain) for RFC8212 (backport #16273)

10 months agobgpd: Check if we have real stream data for tunnel encapsulation sub-tlvs
Donatas Abraitis [Thu, 13 Jun 2024 06:00:21 +0000 (09:00 +0300)]
bgpd: Check if we have real stream data for tunnel encapsulation sub-tlvs

When the packet is malformed it can use whatever values it wants. Let's check
what the real data we have in a stream instead of relying on malformed values.

Reported-by: Iggy Frankovic <iggyfran@amazon.com>
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit 9929486d6bdb28469a5b626a17d5bc9991c83ce3)

10 months agobgpd: Adjust the length of tunnel encap sub-tlv by sub-tlv type
Donatas Abraitis [Thu, 13 Jun 2024 05:43:21 +0000 (08:43 +0300)]
bgpd: Adjust the length of tunnel encap sub-tlv by sub-tlv type

Fixes: 79563af564ad0fe5b9c8d95bf080d570f87b1859 ("bgpd: Get 1 or 2 octets for Sub-TLV length (Tunnel Encap attr)")
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit 34b209f0ae2caca0d1ebcde9d4095375ac31b562)

10 months agobgpd: Relax OAD (One-Administration-Domain) for RFC8212
Donatas Abraitis [Mon, 24 Jun 2024 17:16:16 +0000 (20:16 +0300)]
bgpd: Relax OAD (One-Administration-Domain) for RFC8212

RFC 8212 defines leak prevention for eBGP peers, but BGP-OAD defines a new
peering type One Administrative Domain (OAD), where multiple ASNs could be used
inside a single administrative domain. OAD allows sending non-transitive attributes,
so this prevention should be relaxed too.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit 3b98ddf5018cf7526b50c15018cbaf71a38fa752)

10 months agoMerge pull request #16281 from FRRouting/mergify/bp/dev/10.1/pr-16213
Donatas Abraitis [Tue, 25 Jun 2024 10:48:18 +0000 (13:48 +0300)]
Merge pull request #16281 from FRRouting/mergify/bp/dev/10.1/pr-16213

bgpd: Check if we have really enough data before doing memcpy for FQDN capability (backport #16213)

10 months agoMerge pull request #16278 from FRRouting/mergify/bp/dev/10.1/pr-16211
Donatas Abraitis [Tue, 25 Jun 2024 10:47:50 +0000 (13:47 +0300)]
Merge pull request #16278 from FRRouting/mergify/bp/dev/10.1/pr-16211

bgpd: Check if we have really enough data before doing memcpy for software version (backport #16211)

10 months agoMerge pull request #16239 from FRRouting/mergify/bp/dev/10.1/pr-16224
Donatas Abraitis [Tue, 25 Jun 2024 10:47:33 +0000 (13:47 +0300)]
Merge pull request #16239 from FRRouting/mergify/bp/dev/10.1/pr-16224

zebra: Prevent starvation in dplane_thread_loop (backport #16224)

10 months agoMerge pull request #16274 from FRRouting/mergify/bp/dev/10.1/pr-16242
Jafar Al-Gharaibeh [Tue, 25 Jun 2024 05:25:25 +0000 (01:25 -0400)]
Merge pull request #16274 from FRRouting/mergify/bp/dev/10.1/pr-16242

bgpd: Set last reset reason to admin shutdown if it was manually (backport #16242)

10 months agotests: improve tests for aspath exclude and bgp access list
Francois Dumontet [Wed, 24 Apr 2024 12:34:48 +0000 (14:34 +0200)]
tests: improve tests for aspath exclude and bgp access list

add some match in route map rules
add some set unset bgp access path list
add another prefix for better tests discrimination
update expected results

Signed-off-by: Francois Dumontet <francois.dumontet@6wind.com>
(cherry picked from commit 0df2e149970beff39915d0095614d56d5859f3ff)

10 months agobgpd: fix "bgp as-pah access-list" with "set aspath exclude" set/unset issues
Francois Dumontet [Tue, 23 Apr 2024 09:16:24 +0000 (11:16 +0200)]
bgpd: fix "bgp as-pah access-list" with "set aspath exclude" set/unset issues

whith the following config

router bgp 65001
 no bgp ebgp-requires-policy
 neighbor 192.168.1.2 remote-as external
 neighbor 192.168.1.2 timers 3 10
 !
 address-family ipv4 unicast
  neighbor 192.168.1.2 route-map r2 in
 exit-address-family
exit
!
bgp as-path access-list FIRST seq 5 permit ^65
bgp as-path access-list SECOND seq 5 permit 2$
!
route-map r2 permit 6
 match ip address prefix-list p2
 set as-path exclude as-path-access-list SECOND
exit
!
route-map r2 permit 10
 match ip address prefix-list p1
 set as-path exclude 65003
exit
!
route-map r2 permit 20
 match ip address prefix-list p3
 set as-path exclude all
exit

making some
no bgp as-path access-list SECOND permit 2$
bgp as-path access-list SECOND permit 3$

clear bgp *

no bgp as-path access-list SECOND permit 3$
bgp as-path access-list SECOND permit 2$

clear bgp *

will induce some crashes

thus  we rework the links between aslists and aspath_exclude

Signed-off-by: Francois Dumontet <francois.dumontet@6wind.com>
(cherry picked from commit 094dcc3cdac19d3da65b38effc45aa88d960909f)

10 months agozebra: clear evpn dup-addr return error-msg when there is no vni
Sindhu Parvathi Gopinathan [Wed, 19 Jun 2024 14:35:31 +0000 (07:35 -0700)]
zebra: clear evpn dup-addr return error-msg when there is no vni

clear evpn dup-addr cli returns error-msg for below conditions,

 - If evpn is not enabled &
 - If there is no VNI exists.

supported command:

```
clear evpn dup-addr vni <vni-id>
```

Ticket: #3495573

Testing:

bharat# clear evpn dup-addr vni all
Error type: validation
Error description: % EVPN not enabled

bharat# clear evpn dup-addr vni 20
Error type: validation
Error description: % VNI 20 does not exist

Signed-off-by: Sindhu Parvathi Gopinathan's <sgopinathan@nvidia.com>
Signed-off-by: Chirag Shah <chirag@nvidia.com>
(cherry picked from commit 56c16ee529b546058c8d1fabbb701d8ed2fded75)

10 months agobgpd: Check if we have really enough data before doing memcpy for FQDN capability
Donatas Abraitis [Thu, 13 Jun 2024 05:12:10 +0000 (08:12 +0300)]
bgpd: Check if we have really enough data before doing memcpy for FQDN capability

We advance data pointer (data++), but we do memcpy() with the length that is 1-byte
over, which is technically heap overflow.

```
==411461==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x50600011da1a at pc 0xc4f45a9786f0 bp 0xffffed1e2740 sp 0xffffed1e1f30
READ of size 4 at 0x50600011da1a thread T0
    0 0xc4f45a9786ec in __asan_memcpy (/home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/.libs/bgpd+0x3586ec) (BuildId: e794c5f796eee20c8973d7efb9bf5735e54d44cd)
    1 0xc4f45abf15f8 in bgp_dynamic_capability_fqdn /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3457:4
    2 0xc4f45abdd408 in bgp_capability_msg_parse /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3911:4
    3 0xc4f45abdbeb4 in bgp_capability_receive /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3980:9
    4 0xc4f45abde2cc in bgp_process_packet /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:4109:11
    5 0xc4f45a9b6110 in LLVMFuzzerTestOneInput /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_main.c:582:3
```

Found by fuzzing.

Reported-by: Iggy Frankovic <iggyfran@amazon.com>
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit b685ab5e1bdec0848502c20e9596b9716b236639)

10 months agobgpd: Check if we have really enough data before doing memcpy for software version
Donatas Abraitis [Wed, 12 Jun 2024 19:54:45 +0000 (22:54 +0300)]
bgpd: Check if we have really enough data before doing memcpy for software version

If we receive CAPABILITY message (software-version), we SHOULD check if we really
have enough data before doing memcpy(), that could also lead to buffer overflow.

(data + len > end) is not enough, because after this check we do data++ and later
memcpy(..., data, len). That means we have one more byte.

Hit this through fuzzing by

```
    0 0xaaaaaadf872c in __asan_memcpy (/home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/.libs/bgpd+0x35872c) (BuildId: 9c6e455d0d9a20f5a4d2f035b443f50add9564d7)
    1 0xaaaaab06bfbc in bgp_dynamic_capability_software_version /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3713:3
    2 0xaaaaab05ccb4 in bgp_capability_msg_parse /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3839:4
    3 0xaaaaab05c074 in bgp_capability_receive /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3980:9
    4 0xaaaaab05e48c in bgp_process_packet /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:4109:11
    5 0xaaaaaae36150 in LLVMFuzzerTestOneInput /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_main.c:582:3
```

Hit this again by Iggy \m/

Reported-by: Iggy Frankovic <iggyfran@amazon.com>
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit 5d7af51c4f7980507135babd94d392ca179c1bf7)

10 months agobgpd: Remove redundant whitespace before printing the reason of the failed peer
Donatas Abraitis [Wed, 19 Jun 2024 11:32:16 +0000 (14:32 +0300)]
bgpd: Remove redundant whitespace before printing the reason of the failed peer

Before:

```
Neighbor        EstdCnt DropCnt ResetTime Reason
127.0.0.1             0       0     never  Waiting for peer OPEN (n/a)
```

After:

```
Neighbor        EstdCnt DropCnt ResetTime Reason
127.0.0.1             0       0     never Waiting for peer OPEN (n/a)
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit b5bd626a82b2541bee8e3120139e19ba05e444c8)

10 months agobgpd: Set last reset reason to admin shutdown if it was manually
Donatas Abraitis [Wed, 19 Jun 2024 11:09:00 +0000 (14:09 +0300)]
bgpd: Set last reset reason to admin shutdown if it was manually

Before this patch, we always printed the last reason "Waiting for OPEN", but
if it's a manual shutdown, then we technically are not waiting for OPEN.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit c25c7e929d550c2faca3af74a29593b8c0b75db3)

10 months agoMerge pull request #16255 from FRRouting/mergify/bp/dev/10.1/pr-16059
Donatas Abraitis [Fri, 21 Jun 2024 14:51:43 +0000 (17:51 +0300)]
Merge pull request #16255 from FRRouting/mergify/bp/dev/10.1/pr-16059

bgpd: fixed failing to remove VRF if there is a stale l3vni (backport #16059)

10 months agoMerge pull request #16264 from FRRouting/mergify/bp/dev/10.1/pr-16252
Donatas Abraitis [Fri, 21 Jun 2024 14:50:52 +0000 (17:50 +0300)]
Merge pull request #16264 from FRRouting/mergify/bp/dev/10.1/pr-16252

zebra: fix evpn mh bond member proto reinstall (backport #16252)

10 months agoMerge pull request #16262 from FRRouting/mergify/bp/dev/10.1/pr-16260
Donatas Abraitis [Fri, 21 Jun 2024 14:50:20 +0000 (17:50 +0300)]
Merge pull request #16262 from FRRouting/mergify/bp/dev/10.1/pr-16260

bgpd: fix do not use api.backup_nexthop in ZAPI message (backport #16260)

10 months agozebra: fix evpn mh bond member proto reinstall
Chirag Shah [Wed, 19 Jun 2024 00:21:49 +0000 (17:21 -0700)]
zebra: fix evpn mh bond member proto reinstall

In case of EVPN MH bond, a member port going in
protodown state due to external reason (one case being linkflap),
frr updates the state correctly but upon manually
clearing external reason trigger FRR to reinstate
protodown without any reason code.

Fix is to ensure if the protodown reason was external
and new state is to have protodown 'off' then do no reinstate
protodown.

Ticket: #3947432
Testing:
switch:#ip link show swp1
4: swp1: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 9216 qdisc
   pfifo_fast master bond1 state DOWN mode DEFAULT group default qlen
   1000
       link/ether 1c:34:da:2c:aa:68 brd ff:ff:ff:ff:ff:ff protodown on
       protodown_reason <linkflap>

switch:#ip link set swp1 protodown off protodown_reason linkflap off
switch:#ip link show swp1
 4: swp1: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 9216 qdisc
    pfifo_fast master bond1 state DOWN mode DEFAULT group default qlen
    1000
        link/ether 1c:34:da:2c:aa:68 brd ff:ff:ff:ff:ff:ff

Signed-off-by: Chirag Shah <chirag@nvidia.com>
(cherry picked from commit e4d843b438ae7cbae89ae47af0754fb1db153c6c)

10 months agobgpd: fix do not use api.backup_nexthop in ZAPI message
Philippe Guibert [Thu, 20 Jun 2024 16:02:26 +0000 (18:02 +0200)]
bgpd: fix do not use api.backup_nexthop in ZAPI message

The backup_nexthop entry list has been populated by mistake,
and should not. Fix this by reverting the introduced behavior.

Fixes: 237ebf8d4503 ("bgpd: rework bgp_zebra_announce() function, separate nexthop handling")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
(cherry picked from commit d4390fc21795b09b84a6b95b1f8fa1ac2b3dcda9)

10 months agobgpd: fixed failing remove of vrf if there is a stale l3vni
Kacper Kwaśny [Mon, 27 May 2024 09:03:30 +0000 (11:03 +0200)]
bgpd: fixed failing remove of vrf if there is a stale l3vni

Problem statement:
==================
When a vrf is deleted from the kernel, before its removed from the FRR
config, zebra gets to delete the the vrf and assiciated state.

It does so by sending a request to delete the l3 vni associated with the
vrf followed by a request to delete the vrf itself.

2023/10/06 06:22:18 ZEBRA: [JAESH-BABB8] Send L3_VNI_DEL 1001 VRF
testVRF1001 to bgp
2023/10/06 06:22:18 ZEBRA: [XC3P3-1DG4D] MESSAGE: ZEBRA_VRF_DELETE
testVRF1001

The zebra client communication is asynchronous and about 1/5 cases the
bgp client process them in a different order.

2023/10/06 06:22:18 BGP: [VP18N-HB5R6] VRF testVRF1001(766) is to be
deleted.
2023/10/06 06:22:18 BGP: [RH4KQ-X3CYT] VRF testVRF1001(766) is to be
disabled.
2023/10/06 06:22:18 BGP: [X8ZE0-9TS5H] VRF disable testVRF1001 id 766
2023/10/06 06:22:18 BGP: [X67AQ-923PR] Deregistering VRF 766
2023/10/06 06:22:18 BGP: [K52W0-YZ4T8] VRF Deletion:
testVRF1001(4294967295)
.. and a bit later :
2023/10/06 06:22:18 BGP: [MRXGD-9MHNX] DJERNAES: process L3VNI 1001 DEL
2023/10/06 06:22:18 BGP: [NCEPE-BKB1G][EC 33554467] Cannot process L3VNI
1001 Del - Could not find BGP instance

When the bgp vrf config is removed later it fails on the sanity check if
l3vni is removed.

        if (bgp->l3vni) {
            vty_out(vty, "%% Please unconfigure l3vni %u\n",
                bgp->l3vni);
            return CMD_WARNING_CONFIG_FAILED;
        }

Solution:
=========
The solution is to make bgp cleanup the l3vni a bgp instance is going
down.

The fix:
========
The fix is to add a function in bgp_evpn.c to be responsible for for
deleting the local vni, if it should be needed, and call the function
from bgp_instance_down().

Testing:
========
Created a test, which can run in container lab that remove the vrf on
the host before removing the vrf and the bgp config form frr. Running
this test in a loop trigger the problem 18 times of 100 runs. After the
fix it did not fail.

To verify the fix a log message (which is not in the code any longer)
were used when we had a stale l3vni and needed to call
bgp_evpn_local_l3vni_del() to do the cleanup. This were hit 20 times in
100 test runs.

Signed-off-by: Kacper Kwasny <kkwasny@akamai.com>
bgpd: braces {} are not necessary for single line block

Signed-off-by: Kacper Kwasny <kkwasny@akamai.com>
(cherry picked from commit 171d2583d0373b456335477dea6688d2e9e95db7)