Donald Sharp [Thu, 25 Mar 2021 13:28:30 +0000 (09:28 -0400)]
bgpd: Use rpki_curr_state instead of curr_state
During Review it was suggested that appending rpki_
to curr_state and target_state would be better
variable names. Instead of going and fixing
3 or so commits up. Just do this one.
bgpd: handle local ES del or transition to LACP bypass
1. When a local ES is deleted or the ES-bond goes into bypass we treat
imported MAC-IP routes with that ES destination as remote routes instead
of sync routes. This requires a re-evaluation of the routes as
"non-local-dest" and an update to zebra.
2. When a ES is attached to an access port or the ES-bond transitions from
bypass to LACP-up we treat imported MAC-IP routes with that ES destination as
sync routes. This requires a re-evaluation of the routes as
"local-dest" and an update to zebra.
Setup a mh_info indirection in the path extra. This has been done to
avoid increasing evpn route's path size to add new (type based) pointers
in path_info_extra.
lib/zebra: zapi for installing EVPN nexthops from bgp
EVPN nexthops are installed as remote neighs by zebra. This was earlier
done only via VRF IPvX uni routes imported from EVPN routes.
With EVPN-MH these VRF routes now reference a L3NHG which is setup based
on the EAD and doesn't include the RMAC. To workaround that BGP now
consolidates and maintains EVPN nexthops which are then sent to zebra.
zebra sets up these nexthops as L3-VNI nh entries using a dummy type-1
route as reference.
bgpd: Disable L3NHG support for routes leaked from another VRF
Theoretically we should just be able to use the L3 NHG in the other-VRF/nh-VRF.
But there is some change list handling (when an ES is added to or
removed from a VRF) that needs to be updated to account for routes in other
VRFs using that ES-VRF as nexthop. Till that is done we will disable L3-NHG
use for routes leaked from a different VRF.
Route in tenant2 with ES/NHG as destination -
===========================================
root@leaf11:mgmt:~# ip route show vrf tenant2 22.1.0.7
22.1.0.7 nhid 75000012 proto bgp metric 20
root@leaf11:mgmt:~# ip nexthop list id 75000012
id 75000012 group 103/107/111 proto bgp
root@leaf11:mgmt:~# ip nexthop |grep "103\|107\|111"
id 103 via 6.0.0.11 dev vlan12 scope link proto bgp onlink
id 107 via 6.0.0.12 dev vlan12 scope link proto bgp onlink
id 111 via 6.0.0.13 dev vlan12 scope link proto bgp onlink
id 75000012 group 103/107/111 proto bgp
root@leaf11:mgmt:~#
Leaked into VRF1 with a flat/exploded mpaths
============================================
root@leaf11:mgmt:~# ip route show vrf tenant1 |grep -A3 22.1.0.7
22.1.0.7 proto bgp metric 20
nexthop via 6.0.0.11 dev vlan12 weight 1 onlink
nexthop via 6.0.0.12 dev vlan12 weight 1 onlink
nexthop via 6.0.0.13 dev vlan12 weight 1 onlink
root@leaf11:mgmt:~#
bgpd: re-eval use-l3nhg when a remote ES is [de]activated in a VRF
There are two changes in this commit -
1. Maintain a list of global MAC-IP routes per-ES. This list is maintained
for quick processing on the following events -
a. When the first VTEP/PE becomes active in the ES-VRF, the L3 NHG is
activated and the route can be sent to zebra.
b. When there are no active PEs in the ES-VRF the L3 NHG is
de-activated and -
- If the ES is present in the VRF -
The route is not installed in zebra as there are no active PEs for
the ES-VRF
- If the ES is not present in the VRF -
The route is installed with a flat multi-path list i.e. without L3NHG.
This is to handle the case where there are no locally attached L2VNIs
on the ES (for that tenant VRF).
2. Reinstall VRF route when an ES is installed or uninstalled in a
tenant VRF (the global MAC-IP list in #1 is used for this purpose also).
If an ES is present in the VRF we use L3NHG to enable fast-failover of
routed traffic.
bgpd: allow routes to be imported if the ES/ES-VRF is not present
In a sym-IRB setup the remote ES may not be installed if the tenant
VRF is not present locally. To allow that case while retaining the
fast-failover benefits for the case where the tenant VRF is locally
present we use the following approach -
1. If ES is present in the tenant VRF we use the L3NHG for installing
the MAC-IP based tenant route. This allows for efficient failover via
L3NHG updates.
2. If the ES is not present locally in the corresponding tenant VRF we
fall back to a non-NHG multi-path based routing approach. In this
case individual routes are updated when the ES links flap.
PS: #1 can be turned off entirely by disabling use-l3-nhg in BGP.
bgpd: on ES down re-advertise the MAC-IP entry without the L3 ECOM
When an ES goes down the MAC-IP route must be updated to remove it from
the tenant VRF routing table. This is because the fast-failover
(via EAD-per-ES withdraw) procedures described in RFC 7432 are only
applicable to L2 forwarding/MAC-ECMP. For L3/routed traffic (in a
sym-IRB setup) failover, individual paths need to be withdrawn.
To handle this difference in L2/L3 requirements BGP updates the MAC-IP
route to include the L3 ECOM if local destination ES is oper-up and
to exclude the L3 ECOM if local ES is oper-down.
Donald Sharp [Fri, 19 Mar 2021 13:27:51 +0000 (09:27 -0400)]
isisd: Prevent OOM crash in isis
When you set the isis mtu to 200, isis ends up in a infinite loop
trying to fragment the tlv's.
Specifically ( for me ) the extended reachability function
for packing pack_item_extended_reach requires 11 + ISIS_SUBTLV_MAX_SIZE
room in the packet. Which is 180 bytes. At this point we have
174 bytes that we can write into a packet.
I created this by modifying the isis-topo1 topology to all
the isis routers to have a lsp-mtu of 200 and immediately
saw the crash.
Effectively the pack_items_ function had no detection for
when a part of the next bit it was writing into the stream
could not even fit and it would go into an infinite loop
allocating ~800 bytes at a time. This would cause the
router to run out of memory very very fast and the OOM
detector would kill the process.
Modify the code to notice that we have insufficient space to
even write any data into the stream.
I suspect that pack_item_extended_reach could also be optimized
to figure out exactly how much space is needed. But I also
think we need this protection in the function if this ever
happens again.
I also do not understand the use case of saying the min mtu is
200.
Fixes: #8289 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
pimd: in 'no ip pim hello' add hold time as optional when hello interval given
Issue:
User is allowed to configure only hello without hold timer but when undo
config, the hold timer is mandatory as shown below:
FRR-4(config-if)# ip pim hello 10
<cr>
(1-180) Time in seconds for Hold Interval
FRR-4(config-if)# ip pim hello 10
FRR-4(config-if)# no ip pim hello 10
(1-180) Time in seconds for Hold Interval
FRR-4(config-if)# no ip pim hello 10
% Command incomplete: no ip pim hello 20
Fix:
Making the hold timer as optional when undo config.
pimd: Validation that hello should be less than hold time config.
Also included display of hold time in CLI 'show ip pim int <intf>' cmd
and json commands.
Issue:
PIM neighbor not coming up if hold time is less than hello timer
since hello is sent every 4 sec and hold is 1 sec,
because of this nbr is flapping
Fix:
Do not allow configuration of hold timer less than hello timer
Also reset the value of hold timer to 3.5 times to hello whenever
only hello is modified so that the relationship holds good.
Yash Ranjan [Thu, 11 Mar 2021 13:41:05 +0000 (05:41 -0800)]
ospf6d: Link LSAs are not getting MAX_AGE in neighbor
When the ospf6 daemon goes down, it originates MAX_AGE
LSAs for all the self-originated LSAs so that it gets
flushed from the neighbor's database. But the link-LSAs
are not getting MAX_AGE.
Set the self-originated link-LSAs age to MAX_AGE and
flood it
ckishimo [Mon, 2 Nov 2020 14:28:52 +0000 (06:28 -0800)]
ospfd: add support for suppress_fa
This command will trigger the OSPF forwarding address suppression in
translated type-5 LSAs, causing a NSSA ABR to use 0.0.0.0 as a forwarding
address instead of copying the address from the type-7 LSA
Example: In a topology like: R1 --- R2(ABR) --- R3(ASBR)
R3 is announcing a type-7 LSA that is translated to type-5 by the R2 ABR.
The forwarding address in the type-5 is by default copied from the type-7
r1# sh ip os da external
AS External Link States
LS age: 6
Options: 0x2 : *|-|-|-|-|-|E|-
LS Flags: 0x6
LS Type: AS-external-LSA
Link State ID: 3.3.3.3 (External Network Number)
Advertising Router: 10.0.25.2
LS Seq Number: 80000001
Checksum: 0xcf99
Length: 36
Network Mask: /32
Metric Type: 2 (Larger than any link state path)
TOS: 0
Metric: 20
Forward Address: 10.0.23.3 <--- address copied from type-7 lsa
External Route Tag: 0
r2# sh ip os database
NSSA-external Link States (Area 0.0.0.1 [NSSA])
Link ID ADV Router Age Seq# CkSum Route
3.3.3.3 10.0.23.3 8 0x80000001 0x431d E2 3.3.3.3/32 [0x0]
AS External Link States
Link ID ADV Router Age Seq# CkSum Route
3.3.3.3 10.0.25.2 0 0x80000001 0xcf99 E2 3.3.3.3/32 [0x0]
r2# conf t
r2(config)# router ospf
r2(config-router)# area 1 nssa suppress-fa
r2(config-router)# exit
r2(config)# exit
r2# sh ip os database
NSSA-external Link States (Area 0.0.0.1 [NSSA])
Link ID ADV Router Age Seq# CkSum Route
3.3.3.3 10.0.23.3 66 0x80000001 0x431d E2 3.3.3.3/32 [0x0]
AS External Link States
Link ID ADV Router Age Seq# CkSum Route
3.3.3.3 10.0.25.2 16 0x80000002 0x0983 E2 3.3.3.3/32 [0x0]
r1# sh ip os da external
OSPF Router with ID (11.11.11.11)
AS External Link States
LS age: 34
Options: 0x2 : *|-|-|-|-|-|E|-
LS Flags: 0x6
LS Type: AS-external-LSA
Link State ID: 3.3.3.3 (External Network Number)
Advertising Router: 10.0.25.2
LS Seq Number: 80000002
Checksum: 0x0983
Length: 36
Network Mask: /32
Metric Type: 2 (Larger than any link state path)
TOS: 0
Metric: 20
Forward Address: 0.0.0.0 <--- address set to 0
External Route Tag: 0
r2# conf t
r2(config)# router ospf
r2(config-router)# no area 1 nssa suppress-fa
r2(config-router)# exit
r1# sh ip os da external
OSPF Router with ID (11.11.11.11)
AS External Link States
LS age: 1
Options: 0x2 : *|-|-|-|-|-|E|-
LS Flags: 0x6
LS Type: AS-external-LSA
Link State ID: 3.3.3.3 (External Network Number)
Advertising Router: 10.0.25.2
LS Seq Number: 80000003
Checksum: 0xcb9b
Length: 36
Network Mask: /32
Metric Type: 2 (Larger than any link state path)
TOS: 0
Metric: 20
Forward Address: 0.0.0.0 <--- address set to 0
External Route Tag: 0
r2# conf t
r2(config)# router ospf
r2(config-router)# no area 1 nssa suppress-fa
r2(config-router)# exit
r1# sh ip os da external
OSPF Router with ID (11.11.11.11)
AS External Link States
LS age: 1
Options: 0x2 : *|-|-|-|-|-|E|-
LS Flags: 0x6
LS Type: AS-external-LSA
Link State ID: 3.3.3.3 (External Network Number)
Advertising Router: 10.0.25.2
LS Seq Number: 80000003
Checksum: 0xcb9b
Length: 36
Network Mask: /32
Metric Type: 2 (Larger than any link state path)
TOS: 0
Metric: 20
Forward Address: 10.0.23.3 <--- address copied from type-7 lsa
External Route Tag: 0
Igor Ryzhov [Wed, 24 Mar 2021 10:16:15 +0000 (13:16 +0300)]
vtysh: hide "show configuration running" command
This command is currently useful only for developers.
Let's hide it to not confuse end users by having both
"show runnning-config" and "show configuration running".
Mark Stapp [Tue, 23 Mar 2021 18:48:58 +0000 (14:48 -0400)]
lib: enlarge the local buffer for printfrr extension tokens
Make the local buffer offered to printfrr extension tokens
bigger; existing size wasn't quite enough for some of the
more elaborate struct prefix types.
Rafael Zalamena [Tue, 23 Mar 2021 15:19:20 +0000 (12:19 -0300)]
bgpd: improve BFD with timers configuration
Move `bgp_peer_config_apply` outside `bgp_peer_configure_bfd` (and
document it) so we only call the session installation once with one
set of timers. It also makes all calls of that function
equal (e.g. always calls `bgp_peer_config_apply` afterwards).
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Rafael Zalamena [Fri, 26 Feb 2021 19:50:51 +0000 (16:50 -0300)]
bgpd: rework BFD integration
Remove old BFD API usage and replace it with the new one.
Highlights:
- More shared code: the daemon gets notified with callbacks instead of
having to roll its own code to find the notified sessions.
- Less code to integrate with BFD.
- Remove hidden commands to configure single / multi hop. Use
protocol data instead.
BGP can determine if a peer is single/multi hop according to the
following criteria:
a. If the IP address is a link-local address (single hop)
b. The network is shared with peer (single hop)
c. BGP is configured for eBGP multi hop / TTL security (multi hop)
- Respect the configuration hierarchy:
a. Peer configuration take precendence over peer-group
configuration.
b. When peer group configuration is removed, reset peer
BFD configurations to defaults (unless peer had specific
configs).
Example:
neighbor foo peer-group
neighbor foo bfd profile X
neighbor 192.168.0.2 peer-group foo
neighbor 192.168.0.2 bfd
! If peer-group is removed the profile configuration gets
! removed from peer 192.168.0.2, but BFD will still enabled
! because of the neighbor specific bfd configuration.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Rafael Zalamena [Fri, 26 Feb 2021 19:51:00 +0000 (16:51 -0300)]
bgpd: remove cumulus specific code
The BFD function `bgp_bfd_is_peer_multihop` will no longer exist and now
both code paths are equal.
Longer explanation:
Cumulus was previously using the BFD function to help determine whether a
peer is multi hop or not, because there is a configuration to set BFD
to use single or multi hop.
Current BFD code can automatically pick between single/multi hop by
using the protocol information and so it is a good idea to have that
tested/used than relying on yet another duplicated information.
(BFD extracts the TTL information from protocol and selects
single/multi hop based on that)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Igor Ryzhov [Tue, 16 Mar 2021 19:09:27 +0000 (22:09 +0300)]
ospfd: fix "show ip ospf database" issues
Current implementation of commands `show_ip_ospf_instance_database_cmd`
and `show_ip_ospf_instance_database_type_adv_router_cmd` have the
following problems:
- they doesn't have "vrf all" argument, however the processing of this
argument is implemented,
- they incorrectly implement json output for instances - they don't
output anything to the vty and don't release the json object.
To fix the problems, let's do the following:
1. Split `show_ip_ospf_instance_database_cmd` into two aliases to
`show_ip_ospf_database_max_cmd` and `show_ip_ospf_instance_database_max_cmd`.
The code is the same and doesn't need to be duplicated.
2. Split `show_ip_ospf_instance_database_type_adv_router_cmd` into two
separate functions - one regular and one for instances, which now
correctly implements the processing for json output.
David Lamparter [Mon, 22 Mar 2021 19:21:19 +0000 (20:21 +0100)]
tools: run `vtysh -b` once for all-startup
As noted by Donald:
When FRR is starting all daemons (or restarting them all) FRR is reading
in the configuration 1 time for each daemon specified to run. This is
not a big deal if you have a very small configuration. But with large
configurations FRR is taking long enough that watchfrr is not
establishing connection to all the daemons and starting some over.
Modify the code so that vtysh is only read in at the end of a all
sequence. If we are restarting an individual daemon allow the read in of
the whole config.
Reported-by: Donald Sharp <sharpd@nvidia.com> Signed-off-by: David Lamparter <equinox@diac24.net>