Donald Sharp [Sun, 12 Mar 2023 00:37:21 +0000 (19:37 -0500)]
pimd: IN_MULTICAST needs host order
New correct behavior:
eva# conf
eva(config)# ip pim rp 192.168.1.224 224.0.0.0/24
No Path to RP address specified: 192.168.1.224
eva(config)# ip pim rp 224.1.2.3 224.0.0.0/24
% Bad RP address specified: 224.1.2.3
eva(config)#
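A minimal sketch of why byte order matters here, written in Python rather than pimd's C. The helper names are hypothetical. It models the C `IN_MULTICAST()` test and compares a host-order value against network-order bytes read as a little-endian integer (what happens when the conversion is skipped on a little-endian machine), using the two addresses from the transcript above.

```python
import ipaddress


def in_multicast(value: int) -> bool:
    """Model of the IN_MULTICAST() test: top four bits are 1110 (224.0.0.0/4)."""
    return (value & 0xF0000000) == 0xE0000000


def as_host_order(addr: str) -> int:
    """The address as a plain integer, i.e. the value ntohl() would yield."""
    return int(ipaddress.IPv4Address(addr))


def as_unswapped_little_endian(addr: str) -> int:
    """Network-order bytes misread as a little-endian integer (conversion forgotten)."""
    return int.from_bytes(ipaddress.IPv4Address(addr).packed, "little")
```

With the unswapped value, 192.168.1.224 is misclassified as multicast because its last octet (224) lands in the top byte, while 224.1.2.3 is not recognised as multicast at all.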
Donald Sharp [Sat, 11 Mar 2023 17:05:44 +0000 (12:05 -0500)]
bgpd: Increment version number even when no data is sent
When an update group decides not to send a prefix
announcement because it has not changed, still increment
the version number. Why? Consider 2 peers in 1 peer group,
with a 3rd peer coming up shortly after they do. The 3rd peer
is placed into a separate update group, which could be
coalesced into the first once all data has been sent
to it. Now imagine that a single prefix changes at
this point in time as well. The first 2 peers may
decide not to send the data, since nothing has changed,
while the 3rd peer will send it. Since the version numbers
then never match, the groups never coalesce. So when the decision
is made to skip, update the version number as well.
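The scenario can be replayed with a toy model. This is a sketch in Python, not bgpd's data structures: `UpdateGroup`, `process_change`, `can_coalesce` and the `bump_on_skip` switch are all hypothetical names, and coalescing is reduced to "versions are equal".

```python
from dataclasses import dataclass


@dataclass
class UpdateGroup:
    name: str
    version: int  # table version this group has processed up to


def process_change(group, table_version, attrs_changed, bump_on_skip):
    """Handle one prefix change; return True if an UPDATE would be sent."""
    if attrs_changed:
        group.version = table_version
        return True
    if bump_on_skip:
        # The behaviour described by the commit: record progress even on a skip.
        group.version = table_version
    return False


def can_coalesce(a, b):
    """Assumption for this sketch: groups merge only when versions match."""
    return a.version == b.version


def run(bump_on_skip):
    table_version = 10
    old = UpdateGroup("peers 1+2", version=10)
    new = UpdateGroup("peer 3", version=10)
    table_version += 1  # a single prefix changes
    process_change(old, table_version, attrs_changed=False, bump_on_skip=bump_on_skip)
    process_change(new, table_version, attrs_changed=True, bump_on_skip=bump_on_skip)
    return can_coalesce(old, new)
```

In this model `run(False)` leaves the two groups at different versions, while `run(True)` lets them coalesce.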
Sarita Patra [Sat, 25 Feb 2023 08:33:13 +0000 (00:33 -0800)]
pimd, pim6d: Don't track nexthop for RP 0.0.0.0 & 0::0
Topology:
========
FHR----Source
Problem:
=======
When the FHR receives multicast traffic and no RP is configured,
PIMD registers NHT for RP address 0.0.0.0 and group 224.0.0.0/4, and
PIM6D registers NHT for RP address 0::0 and group FF00::0/8.
frr# show ip pim nexthop
Number of registered addresses: 1
Address Interface Nexthop
---------------------------------------------
frr# show ipv6 pim nexthop
Number of registered addresses: 1
Address Interface Nexthop
---------------------------------------------
Fix:
====
Don't track nexthop for RP 0.0.0.0 & 0::0.
frr# show ip pim nexthop
Number of registered addresses: 0
frr# show ipv6 pim nexthop
Number of registered addresses: 0
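The guard amounts to skipping the unspecified address in both families. A minimal Python sketch of that check follows; the function name `should_track_rp_nexthop` is hypothetical and the real change lives in pimd's C code.

```python
import ipaddress


def should_track_rp_nexthop(rp: str) -> bool:
    """Return False for the unspecified address (0.0.0.0 or ::), True otherwise."""
    return not ipaddress.ip_address(rp).is_unspecified
```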
Donald Sharp [Wed, 1 Mar 2023 19:41:21 +0000 (14:41 -0500)]
pimd: Prevent crash when pimreg already exists.
If the pimreg device exists but has not been assigned to the pim->pimreg pointer, pimd can
crash. Just prevent the crash, since it stems from some sort of startup or network
re-organization issue.
(gdb) bt
0 0x00007f0485b035cb in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
1 0x00007f0485c0fbec in core_handler (signo=6, siginfo=0x7ffdc0198030, context=<optimized out>) at lib/sigevent.c:264
2 <signal handler called>
3 0x00007f04859668eb in raise () from /lib/x86_64-linux-gnu/libc.so.6
4 0x00007f0485951535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
5 0x00007f0485c3af76 in _zlog_assert_failed (xref=xref@entry=0x55692269b940 <_xref.23164>, extra=extra@entry=0x0) at lib/zlog.c:680
6 0x00005569226150d0 in pim_if_new (ifp=0x556922c82900, gm=gm@entry=false, pim=pim@entry=false, ispimreg=ispimreg@entry=true,
is_vxlan_term=is_vxlan_term@entry=false) at pimd/pim_iface.c:124
7 0x0000556922615140 in pim_if_create_pimreg (pim=pim@entry=0x556922cc11e0) at pimd/pim_iface.c:1549
8 0x0000556922616bc8 in pim_if_create_pimreg (pim=0x556922cc11e0) at pimd/pim_iface.c:1613
9 pim_ifp_create (ifp=0x556922cc0e70) at pimd/pim_iface.c:1641
10 0x00007f0485c32cf9 in zclient_interface_add (cmd=<optimized out>, zclient=<optimized out>, length=<optimized out>, vrf_id=77) at lib/zclient.c:2214
11 0x00007f0485c3346a in zclient_read (thread=<optimized out>) at lib/zclient.c:4003
12 0x00007f0485c215ed in thread_call (thread=thread@entry=0x7ffdc0198880) at lib/thread.c:2008
13 0x00007f0485bdbbc8 in frr_run (master=0x556922a10470) at lib/libfrr.c:1223
14 0x000055692260312b in main (argc=<optimized out>, argv=0x7ffdc0198b98, envp=<optimized out>) at pimd/pim_main.c:176
Sarita Patra [Wed, 19 Oct 2022 00:32:11 +0000 (17:32 -0700)]
pimd, pim6d: Fix pim upstream rpf change
When the upstream RPF address is a secondary address and the
neighborship is built with the primary address,
pim_neighbor_find() fails, so no prune is sent when the
upstream changes.
Verify that the nexthop is present in the neighbor's primary
and secondary address lists.
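A rough sketch of the lookup difference, in Python with hypothetical names (`Neighbor`, `find_by_primary`, `find_by_any_address`). The actual pimd function signatures are not given in the commit message; addresses are compared as plain strings for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Neighbor:
    primary: str
    secondaries: list = field(default_factory=list)


def find_by_primary(neighbors, addr):
    """Lookup that only matches the address the neighborship was built with."""
    for nbr in neighbors:
        if nbr.primary == addr:
            return nbr
    return None


def find_by_any_address(neighbors, addr):
    """Lookup that also accepts any secondary address advertised by the neighbor."""
    for nbr in neighbors:
        if nbr.primary == addr or addr in nbr.secondaries:
            return nbr
    return None
```

With a nexthop that is one of the neighbor's secondary addresses, the first lookup returns nothing while the second finds the neighbor.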
Sarita Patra [Tue, 18 Oct 2022 23:27:14 +0000 (16:27 -0700)]
pimd, pim6d: Fix RP Unknown IIF
When the route to the RP has a secondary address as its nexthop and the
neighborship is built with the primary address,
pim_neighbor_find() fails, which causes RP IIF
Unknown.
Fix:
Verify pim neighborship on the RP connected interface.
Sarita Patra [Tue, 18 Oct 2022 23:06:12 +0000 (16:06 -0700)]
pimd, pim6d: Fix BSM packet process
Problem 1:
When the route to the BSR has a secondary address as its nexthop and the
neighborship is built with the primary address,
pim_neighbor_find() fails, which causes the BSM
packet to be dropped.
Fix 1:
Verify pim neighborship on the BSM received interface.
Problem 2:
The BSM source IP is a primary address, whereas
the nexthop can be either a primary or a secondary address.
Fix 2:
Avoid the check (nhaddr == src_ip) for PIMv6.
Sarita Patra [Mon, 10 Oct 2022 18:06:10 +0000 (11:06 -0700)]
zebra: Send nexthop ifindex for type NEXTHOP_TYPE_IPV6
Once RP/BSR address is learned in PIMD, PIMD does nexthop tracking
in Zebra.
For IPV6 address, the nexthop type is either NEXTHOP_TYPE_IPV6
or NEXTHOP_TYPE_IPV6_IFINDEX.
Zebra should send nexthop ifindex information along with nexthop address
to the client (PIMD).
Sarita Patra [Fri, 24 Feb 2023 15:13:30 +0000 (07:13 -0800)]
pim6d: Fix missing parameters in "show ipv6 mld interface" command
Before fix:
==========
frr# show ipv6 mld interface
Interface State V Querier Timer Uptime
ens224 up 1 fe80::250:56ff:feb7:a7e3 query 00:00:24.219 00:00:07.031
After fix:
=========
frr(config-if)# do show ipv6 mld interface
Interface  State  Address                   V  Querier  QuerierIp                 Query Timer   Uptime
ens224     up     fe80::250:56ff:feb7:a7e3  1  local    fe80::250:56ff:feb7:a7e3  00:01:22.263  00:08:00.237
Sarita Patra [Fri, 24 Feb 2023 15:01:22 +0000 (07:01 -0800)]
pim6d: Don't display MLD disabled or down interfaces in "show ipv6 mld interface" cmd
We should not display down interfaces or MLD disabled interfaces in
"show ipv6 mld interface" command.
Before fix:
==========
frr# show ipv6 mld interface
Interface State V Querier Timer Uptime
ens192 up 2 fe80::250:56ff:feb7:d04 query 00:00:25.432 00:00:07.038
ens224 up 1 fe80::250:56ff:feb7:a7e3 query 00:00:24.219 00:00:07.031
pim6reg down
After fix:
=========
frr# show ipv6 mld interface
Interface State V Querier Timer Uptime
ens192 up 2 fe80::250:56ff:feb7:d04 query 00:00:25.432 00:00:07.038
ens224 up 1 fe80::250:56ff:feb7:a7e3 query 00:00:24.219 00:00:07.031
Sarita Patra [Mon, 27 Feb 2023 06:25:05 +0000 (22:25 -0800)]
pimd, pim6d: Upstream IIF pointing towards PIM and IGMP disabled source connected interface
Topology:
=========
RP---FHR<ens224>---Source
Problem Statement:
=================
Step 1:
Enable PIM and IGMP on source connected interface ens224
Step 2:
Start multicast traffic. (s,g) mroute and upstream will be created as expected.
dev# show ip mroute
IP Multicast Routing Table
Flags: S - Sparse, C - Connected, P - Pruned
R - SGRpt Pruned, F - Register flag, T - SPT-bit for SSM FHR
Source    Group      Flags  Proto  Input   Output  TTL  Uptime
50.0.0.4  225.1.1.1  SF     PIM    ens224  pimreg  1    00:37:55
dev# show ip pim upstream
Iif     Source    Group      State      Uptime    JoinTimer  RSTimer   KATimer   RefCnt
ens224  50.0.0.4  225.1.1.1  NotJ,RegJ  00:37:57  --:--:--   --:--:--  00:02:43  1
Step 3:
Disable PIM on source connected interface ens224
dev# show ip mroute
IP Multicast Routing Table
Flags: S - Sparse, C - Connected, P - Pruned
R - SGRpt Pruned, F - Register flag, T - SPT-bit for SSM FHR
Source    Group      Flags  Proto  Input   Output  TTL  Uptime
50.0.0.4  225.1.1.1  SF     PIM    ens224  pimreg  1    00:38:05
dev# show ip pim upstream
Iif     Source    Group      State      Uptime    JoinTimer  RSTimer   KATimer   RefCnt
ens224  50.0.0.4  225.1.1.1  NotJ,RegJ  00:38:08  --:--:--   --:--:--  00:02:32  1
Step 4:
Disable IGMP on source connected interface ens224
dev# show ip pim upstream
Iif     Source    Group      State      Uptime    JoinTimer  RSTimer   KATimer   RefCnt
ens224  50.0.0.4  225.1.1.1  NotJ,RegJ  00:38:15  --:--:--   --:--:--  00:03:27  1
dev# show ip mroute
IP Multicast Routing Table
Flags: S - Sparse, C - Connected, P - Pruned
R - SGRpt Pruned, F - Register flag, T - SPT-bit for SSM FHR
Source    Group      Flags  Proto  Input   Output  TTL  Uptime
50.0.0.4  225.1.1.1  SF     PIM    <iif?>  pimreg  1    00:38:18
The PIM upstream IIF still points towards the source connected
interface, which is neither PIM enabled nor IGMP enabled. The
mroute is still present in the kernel and the KAT timer is still running
on the interface, where ifp->info is already set to NULL.
This leads to a crash.
Root Cause:
==========
When the "no ip pim" command is executed on the source connected interface,
the upstream IIF is updated only when IGMP is not enabled on the same
interface.
Fix:
===
When PIM is disabled on source connected interface, update upstream IIF
no matter if IGMP is enabled or not on the same interface.
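The steps above can be replayed with a small model. This is a Python sketch under stated assumptions, not pimd code: `Interface`, `Upstream`, `disable_pim`, `disable_igmp` and the `fixed` switch are hypothetical, and "update upstream IIF" is reduced to clearing the stale interface name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interface:
    name: str
    pim: bool = True
    igmp: bool = True


@dataclass
class Upstream:
    iif: Optional[str]


def disable_pim(ifp, upstream, fixed):
    """Model of "no ip pim": old code updates the IIF only if IGMP is off."""
    ifp.pim = False
    if fixed or not ifp.igmp:
        if upstream.iif == ifp.name:
            upstream.iif = None  # stand-in for re-resolving the upstream IIF


def disable_igmp(ifp):
    """Model of disabling IGMP; assumed not to revisit the upstream."""
    ifp.igmp = False


def replay(fixed):
    ifp = Interface("ens224")
    upstream = Upstream(iif="ens224")
    disable_pim(ifp, upstream, fixed)  # Step 3
    disable_igmp(ifp)                  # Step 4
    return upstream.iif
```

In this model the old condition leaves the upstream pointing at ens224 after both protocols are disabled, while the unconditional update clears it at Step 3.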
Donald Sharp [Thu, 23 Feb 2023 18:29:32 +0000 (13:29 -0500)]
bgpd: Flowspec overflow issue
According to the flowspec RFC 8955 a flowspec nlri is <length, <nlri data>>
Specifying 0 as a length makes BGP get all warm on the inside. Which
in this case is not a good thing at all. Prevent warmth, stay cold
on the inside.
Reported-by: Iggy Frankovic <iggyfran@amazon.com>
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 0b999c886e241c52bd1f7ef0066700e4b618ebb3)
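A hedged sketch of a length-checked read of one `<length, <nlri data>>` element, in Python. The function name `read_flowspec_nlri` is hypothetical and this is not bgpd's parser. The one- or two-octet length encoding follows RFC 8955's description (values of 240 and above use the extended form); rejecting a zero length is the point the commit makes.

```python
def read_flowspec_nlri(buf: bytes, offset: int = 0):
    """Return (nlri_bytes, next_offset), or raise ValueError on a bad length."""
    if offset >= len(buf):
        raise ValueError("truncated: no length octet")
    length = buf[offset]
    offset += 1
    if length >= 0xF0:
        # Extended form: low nibble of the first octet plus the next octet.
        if offset >= len(buf):
            raise ValueError("truncated: missing second length octet")
        length = ((length & 0x0F) << 8) | buf[offset]
        offset += 1
    if length == 0:
        raise ValueError("zero-length flowspec NLRI")
    if offset + length > len(buf):
        raise ValueError("NLRI length exceeds remaining data")
    return buf[offset:offset + length], offset + length
```

A caller would treat `ValueError` as a malformed NLRI instead of continuing to walk the buffer.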
Donatas Abraitis [Wed, 22 Feb 2023 20:22:28 +0000 (22:22 +0200)]
bgpd: Align `show bgp ...` output with the header for wide option
Before:
```
r1# sh ip bgp wide
BGP table version is 1, local router ID is 192.168.2.1, vrf id 0
Default local pref 100, local AS 65001
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
* 172.16.255.254/32 192.168.2.2 0 0 (65003) i
*> 192.168.1.2 0 0 (65002) i
Displayed 1 routes and 2 total paths
r1#
```
After:
```
r1# sh ip bgp wide
BGP table version is 1, local router ID is 192.168.2.1, vrf id 0
Default local pref 100, local AS 65001
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
* 172.16.255.254/32 192.168.2.2 0 0 (65003) i
*> 192.168.1.2 0 0 (65002) i
```
David Lamparter [Wed, 1 Jun 2022 07:54:31 +0000 (09:54 +0200)]
pimd: try to reinstall MFC when we get NOCACHE
Whether due to a pimd bug, some expiry, or someone just deleting MFC
entries, when we're in NOCACHE we *know* there's no MFC entry. Add an
install call to make sure pimd's MFC view aligns with the actual kernel
MFC.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
David Lamparter [Thu, 10 Mar 2022 12:59:26 +0000 (13:59 +0100)]
pimd: make logs useful for input drops
This path here is pretty far on top of the list of issues that operators
will run into and have to debug when setting up PIM. Make the log
messages actually tell what's going on. Also escalate some from
`debug mroute detail` to `debug mroute`.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Donald Sharp [Wed, 22 Feb 2023 16:38:00 +0000 (11:38 -0500)]
bgpd: Give better debug message when configuration is being read in
Sometimes bgp connections can be rejected for a variety of reasons. Give
a bit more context about what is going wrong so that the operator can
make better decisions about their network.
Donatas Abraitis [Tue, 21 Feb 2023 21:10:45 +0000 (23:10 +0200)]
bgpd: Pass global ASN for confederation peers if not AS_SPECIFIED
When we specify remote-as as external/internal, we need to set local_as to
bgp->as, instead of bgp->confed_id. Before this patch, (bgp->as != *as) is
always valid for such a case because *as is always 0.
Also, append peer->local_as as CONFED_SEQ to avoid other side withdrawing
the routes due to confederation own AS received and/or malformed as-path.
zyxwvu Shi [Wed, 15 Feb 2023 15:55:00 +0000 (23:55 +0800)]
zebra: Fix other table inactive when ip import-table is on
In `rib_link`, if is_zebra_import_table_enabled returns
true, `rib_queue_add` will not be called, so the route node
in the other table is never processed. This should not
depend on whether the route is imported.
In `rib_delnode`, if is_zebra_import_table_enabled returns
true, it will use `rib_unlink` instead of enqueuing the
route node for processing. There is no reason that imported
route nodes should not be reprocessed. Long ago, the
behaviour depended on whether the route_entry came
from a table other than main.
Philippe Guibert [Mon, 13 Feb 2023 11:18:33 +0000 (12:18 +0100)]
bgpd: clarify when the vpnv6 nexthop length must be modified
The use of a route-map to update the local ipv6 address needs
clarifying. When a VPN SAFI is used, the nexthop
length must be changed to 48 bytes. In other cases, the length will
be 32 bytes.
Fixes: 9795e9f23465 ("bgpd: fix when route-map changes the link local
nexthop for vpnv6")
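The two lengths can be derived as follows. This Python sketch assumes the 48-byte case is two IPv6 addresses (global and link-local), each prefixed by an 8-byte route distinguisher field, and the 32-byte case is the same two addresses without that prefix. The constant and function names are hypothetical.

```python
IPV6_ADDR_LEN = 16  # bytes in one IPv6 address
RD_LEN = 8          # bytes in the route distinguisher field (assumption, see lead-in)


def global_and_link_local_nexthop_len(is_vpn_safi: bool) -> int:
    """Nexthop length when both a global and a link-local address are carried."""
    per_address = IPV6_ADDR_LEN + (RD_LEN if is_vpn_safi else 0)
    return 2 * per_address
```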
Donald Sharp [Thu, 2 Feb 2023 21:28:27 +0000 (16:28 -0500)]
lib: Fix non-use of option
Commit d7c6467ba2f55d1055babbb7fe82716ca3efdc7e added the
ability to specify non pretty printing but unfortunately
forgot to use the option variable to make the whole
thing work.
vivek [Fri, 18 Dec 2020 18:55:40 +0000 (10:55 -0800)]
bgpd: Prevent multipathing among EVPN and non-EVPN paths
Ensure that a multipath set is fully comprised of EVPN paths (i.e.,
paths imported into the VRF from EVPN address-family) or non-EVPN
paths. This is actually a condition that existed already in the code
but was not properly enforced.
This change, as a side effect, eliminates the known trigger condition
for bad or missing RMAC programming in an EVPN deployment, described
in tickets CM-29043 and CM-31222. Routes (actually, paths) in a VRF
routing table that require VXLAN tunneling to the next hop currently
need some special handling in zebra to deal with the nexthop (neigh)
and RMAC programming, and this is implemented for the entire route
(prefix), not per-path. This can lead to the bad or missing RMAC
situation, which is now eliminated by ensuring all paths in the route
are 'similar'.
The longer-term solution in CL 5.x will be to deal with the special
programming by means of explicit communication between bgpd and zebra.
This is already implemented for EVPN-MH via CM-31398. These changes
will be extended to non-MH also and the special code in zebra removed
or refined.
Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
Acked-by: Trey Aspelund <taspelund@nvidia.com>
Acked-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Acked-by: Chirag Shah <chirag@nvidia.com>
Ticket: CM-29043
Testing Done:
1. Manual testing
2. precommit on both MLX and BCM platforms
3. evpn-smoke - BCM and VX
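The rule that a multipath set must be all-EVPN or all-non-EVPN can be sketched as a filter relative to the best path. This is an illustrative Python model with hypothetical names (`Path`, `from_evpn`, `build_multipath_set`); it ignores every other multipath eligibility criterion bgpd applies.

```python
from dataclasses import dataclass


@dataclass
class Path:
    nexthop: str
    from_evpn: bool  # True if the path was imported into the VRF from EVPN


def build_multipath_set(best, candidates):
    """Keep only candidates whose EVPN-ness matches the best path."""
    return [best] + [
        p for p in candidates
        if p is not best and p.from_evpn == best.from_evpn
    ]
```

Because every member matches the best path, per-route handling that depends on whether the route needs VXLAN tunneling sees a consistent answer for all paths.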
vivek [Thu, 3 Dec 2020 04:04:19 +0000 (20:04 -0800)]
bgpd: Fix deterministic-med check for stale paths
When performing deterministic MED processing, ensure that the peer
status is not checked when we encounter a stale path. Otherwise, this
path will be skipped from the DMED consideration leading to it potentially
not being installed.
Test scenario: Consider a prefix with 2 (multi)paths. The peer that
announces the path with the winning DMED undergoes a graceful-restart.
Before it comes back up, the other path goes away. Prior to the fix, a
third router that receives both these paths would have ended up not
having any path installed to the prefix after the above events.
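A small sketch of the eligibility check, in Python with hypothetical names (`DmedPath`, `dmed_eligible_old`, `dmed_eligible_new`). It assumes the old code skipped any path whose peer was not established and that the fix exempts stale paths from that test; the exact condition in bgpd is not quoted in the commit message.

```python
from dataclasses import dataclass


@dataclass
class DmedPath:
    med: int
    peer_established: bool
    stale: bool = False  # retained during the peer's graceful restart


def dmed_eligible_old(path: DmedPath) -> bool:
    """Assumed pre-fix rule: only paths from established peers are compared."""
    return path.peer_established


def dmed_eligible_new(path: DmedPath) -> bool:
    """Assumed post-fix rule: a stale path is compared regardless of peer status."""
    return path.stale or path.peer_established
```

In the test scenario, the winning path is stale and its peer is down, so the old rule drops it from consideration while the new rule keeps it.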
bgpd: Intern default-originate attributes to avoid use-after-free
When we receive a default route from a peer and we originate a default route
using `neighbor default-originate`, we do not keep track of the struct attr we use.
When we do `no neighbor default-originate`, we withdraw our generated
default route and announce the default route from the peer instead.
After we do this, we unintern the aspath (which was used for default-originate),
BUT it was also used for the peer's default route we received.
And here we have a use-after-free crash, because bgp_process_main_one()
reaps old paths that are marked as BGP_PATH_REMOVED with aspath->refcnt > 0,
but here it's 0.
```
0 0x55c24bbcd022 in aspath_key_make bgpd/bgp_aspath.c:2070
1 0x55c24b8f1140 in attrhash_key_make bgpd/bgp_attr.c:777
2 0x7f52322e66c9 in hash_release lib/hash.c:220
3 0x55c24b8f6017 in bgp_attr_unintern bgpd/bgp_attr.c:1271
4 0x55c24ba0acaa in bgp_path_info_free_with_caller bgpd/bgp_route.c:283
5 0x55c24ba0a7de in bgp_path_info_unlock bgpd/bgp_route.c:309
6 0x55c24ba0af6d in bgp_path_info_reap bgpd/bgp_route.c:426
7 0x55c24ba17b9a in bgp_process_main_one bgpd/bgp_route.c:3333
8 0x55c24ba18a1d in bgp_process_wq bgpd/bgp_route.c:3425
9 0x7f52323c2cd5 in work_queue_run lib/workqueue.c:282
10 0x7f52323aab92 in thread_call lib/thread.c:2006
11 0x7f5232300dc7 in frr_run lib/libfrr.c:1198
12 0x55c24b8ea792 in main bgpd/bgp_main.c:520
13 0x7f5231c3a082 in __libc_start_main ../csu/libc-start.c:308
14 0x55c24b8ef0bd in _start (/usr/lib/frr/bgpd+0x2c90bd)
```
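The reference-counting problem can be modelled with a toy intern table. This is a Python sketch, not bgpd's attribute hash: `InternTable` and `scenario` are hypothetical, and a `KeyError` on the second release stands in for the use-after-free seen in the backtrace.

```python
class InternTable:
    """Toy interning table that tracks a reference count per value."""

    def __init__(self):
        self._refs = {}

    def intern(self, value):
        self._refs[value] = self._refs.get(value, 0) + 1
        return value

    def unintern(self, value):
        if value not in self._refs:
            raise KeyError(f"unintern of {value!r} with no live reference")
        self._refs[value] -= 1
        if self._refs[value] == 0:
            del self._refs[value]  # the object would be freed here in C


def scenario(intern_default_originate: bool) -> str:
    table = InternTable()
    aspath = "65002"
    table.intern(aspath)            # reference held by the peer's default route
    if intern_default_originate:
        table.intern(aspath)        # the fix: default-originate takes its own reference
    table.unintern(aspath)          # `no neighbor default-originate`
    try:
        table.unintern(aspath)      # later reap of the peer's old path
    except KeyError:
        return "use-after-free"
    return "ok"
```

Without its own reference, the default-originate release consumes the peer route's reference, so the later reap touches an already released object.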
anlan_cs [Mon, 6 Feb 2023 01:27:05 +0000 (09:27 +0800)]
bgpd: fix use-after-free crash for evpn
```
anlan(config-router-af)# vni 33
anlan(config-router-af-vni)# route-target both 44:55
anlan(config-router-af-vni)# no route-target both 44:55
vtysh: error reading from bgpd: Resource temporarily unavailable (11)
Warning: closing connection to bgpd because of an I/O error!
```
When `bgp_evpn_vni_rt_cmd` deals with the "both" type, it wrongly creates
only one node (there should be two) shared by the lists `vpn->import_rtl` and
`vpn->export_rtl`. At this point, the two lists are already wrong.
`no route-target both RT` then frees the single node from both
`vpn->import_rtl` and `vpn->export_rtl`. After it has been freed from `vpn->import_rtl`,
freeing it from `vpn->export_rtl` is a use-after-free.
This sometimes causes a crash, or other unexpected behaviour.
This issue was introduced by commit `3b7e8d`, which adjusted both
`bgp_evpn_vni_rt_cmd` and `bgp_evpn_vrf_rt_cmd`.
Since `bgp_evpn_vrf_rt_cmd/no_bgp_evpn_vrf_rt_cmd` unintentionally works well
again with commit `7022da`, only `bgp_evpn_vni_rt_cmd` needs to be
modified: add two nodes for the "both" type, plus some explicit comments for this
special case.
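A toy model of the shared-node problem, in Python. `RtNode`, `free_node`, `add_both` and `del_both` are hypothetical names, and a `RuntimeError` on the second free stands in for the use-after-free that C code would hit silently.

```python
class RtNode:
    """Stand-in for a route-target list node; `freed` models released memory."""

    def __init__(self, rt):
        self.rt = rt
        self.freed = False


def free_node(node):
    if node.freed:
        raise RuntimeError("use-after-free: node already freed")
    node.freed = True


def add_both(import_rtl, export_rtl, rt, separate_nodes):
    """Add RT to both lists; the buggy variant shares a single node."""
    node = RtNode(rt)
    import_rtl.append(node)
    export_rtl.append(RtNode(rt) if separate_nodes else node)


def del_both(import_rtl, export_rtl, rt):
    """Remove and free the matching node from each list in turn."""
    for rtl in (import_rtl, export_rtl):
        for node in [n for n in rtl if n.rt == rt]:
            rtl.remove(node)
            free_node(node)
```

With one node per list, each list frees its own node; with a shared node, the export list frees what the import list already released.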