This test demonstrates that a label is allocated for each
IPv6 next-hop. The IPv6 test introduces link-local IPv6 addresses
as next-hops; compared to IPv4, there can be two different
next-hops depending on whether the next-hop is defined by a global
address (redistributed static route) or by a BGP peer.
This test checks that:
- The labels are correctly allocated per connected next-hop.
- The default label is used for non-connected prefixes.
- The withdraw operation frees the MPLS entry.
- If a recursive route is redistributed by BGP, then nexthop
tracking will find the appropriate nexthop entry, and the
associated label will be found.
- When a prefix moves from one peer to another behind the
VRF, then the MPLS switching operation for return
traffic changes the outgoing interface accordingly.
- When the 'label vpn export <value>' MPLS label value is changed,
then the modification is propagated to prefixes which use that value.
- Also, when unconfiguring the per-nexthop allocation mode, check
that the MPLS entries and the VPNv4 entries of r1 are changed
accordingly.
- Conversely, when re-configuring the per-nexthop allocation mode,
check that the allocation mode reuses the other label values.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Tue, 10 Jan 2023 13:53:54 +0000 (14:53 +0100)]
topotests: add bgp mpls allocation per next-hop test
A new test suite checks the MPLS label allocation
per-nexthop mode. This test checks that:
- The labels are correctly allocated per connected
next-hop.
- The default label is used for non-connected prefixes.
- The withdraw operation frees the MPLS entry.
- If a recursive route is redistributed by BGP, then nexthop
tracking will find the appropriate nexthop entry, and the associated
label will be found.
- When a prefix moves from one peer to another behind the VRF,
then the MPLS switching operation for return traffic changes
the outgoing interface accordingly.
- When the 'label vpn export <value>' MPLS label value is changed,
then the modification is propagated to prefixes which use that value.
- When unconfiguring the per-nexthop allocation mode, check
that the MPLS entries and the VPNv4 entries of r1 are changed
accordingly.
- Conversely, when re-configuring the per-nexthop allocation mode,
check that the allocation mode reuses the other label values.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Thu, 16 Feb 2023 12:46:32 +0000 (13:46 +0100)]
bgpd: update time of last change when label nexthop entry changed
A timer attribute is added for each label nexthop entry, in order
to know when the last change occurred.
The timer value will be used for troubleshooting by a show
command in the next commit.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
bgpd: export redistributed routes with label allocation per nexthop
The label allocation per nexthop mode requires the use of a nexthop
tracking context. For redistributed routes, a nexthop tracking
context is created, and the resolution helps to know the real
nexthop IP address used. The below configuration example has
been used:
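(The original example is not preserved in this excerpt. The sketch
below is an assumption of the kind of setup involved: a VRF BGP
instance redistributing static routes and exporting them to the VPN
with per-nexthop label allocation; the ASN, RD and RT values are
hypothetical.)
> router bgp 65500 vrf vrf1
>  address-family ipv4 unicast
>   redistribute static
>   label vpn export allocation-mode per-nexthop
>   label vpn export auto
>   rd vpn export 444:1
>   rt vpn both 52:100
>   export vpn
>   import vpn
>  exit-address-family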
Without that patch, only the redistributed routes that rely on a
pre-existing nexthop tracking context could be exported.
Also, a comment in the code about redistributed routes is updated
accordingly, to explain that redistributed routes may be subject
to nexthop tracking in the case label allocation per next-hop is
used.
note:
VNC routes have been removed from the redistribution,
because of a test failure in the bgp_l3vpn_to_bgp_direct test.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Fri, 13 Jan 2023 14:51:59 +0000 (15:51 +0100)]
bgpd: correctly initialize the IP nexthop of redistributed routes
This is preliminary work to export redistributed routes from
a given VRF into a VPN network. The export works well when
the label allocation is based on the per-vrf mode, but not on
the per-nexthop mode.
To associate a label with a connected nexthop, the nexthop
tracking contexts are used. Until today, there was no tracking
context for redistributed routes. But when using this vpn
allocation mode, one needs to know whether the route is directly
connected or not. When using the nexthop tracking context, the
nexthop attribute of the bgp update needs to have the nexthop
properly set. This was not the case for the mp_nexthop_global_in
attribute, which was empty.
This commit is mandatory in order to later use nexthop tracking
contexts.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Thu, 16 Feb 2023 09:39:40 +0000 (10:39 +0100)]
bgpd: use nexthop interface when adding LSP in BGP MPLSVPN
BGP MPLSVPN next-hop label allocation was using only the next-hop
IP address. As MPLSVPN contexts rely on bnc contexts, the real
nexthop interface is known, and the LSP entry to install can apply
to the specific interface. To illustrate, the BGP service is able
to handle the following two iproute2 commands:
> ip -f mpls route add 105 via inet 192.0.2.45 dev r1-eth1
> ip -f mpls route add 105 via inet 192.0.2.46 dev r1-eth2
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Tue, 28 Feb 2023 13:25:02 +0000 (14:25 +0100)]
bgpd: add support for l3vpn per-nexthop label
This commit introduces a new method to associate a label to
prefixes exported to a VPNv4 backbone. All the methods to
associate a label to a BGP update are documented in RFC 4364,
chapter 4.3.2. Initially, only the "single label for an entire
VRF" method was available. This commit adds the "single label
for each attachment circuit" method.
The change impacts the control plane, because each BGP update
is checked to know whether the nexthop has reachability in the VRF
or not. If this is the case, then a unique label for a given
destination IP in the VRF will be picked. This label will
be reused for another BGP update that has the same
nexthop IP address.
The change impacts the data plane, because the MPLS pop
mechanism applied to incoming labelled packets changes: the
MPLS label is popped, and the packet is directly sent to the
connected nexthop described in the previous outgoing BGP VPN
update.
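As a hedged illustration only (the label values, addresses and
layout are assumptions, not taken from this commit), the resulting
per-nexthop MPLS entries could look as follows in zebra:
> r1# show mpls table
>  Inbound Label  Type  Nexthop      Outbound Label
>  -------------------------------------------------
>  105            BGP   192.0.2.45   implicit-null
>  106            BGP   192.0.2.46   implicit-null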
By default, the per-vrf mode is used, but the user may choose
the per-nexthop mode by using the vty command from the
previous commit. In the latter case, a per-vrf label
will still be allocated to handle networks that are not directly
connected, for instance local traffic.
The change also includes the following:
- ECMP case
When a route is learnt in a given VRF and is resolved via an
ECMP nexthop, exporting the route as a BGP update with label
allocation per nexthop would require two possible MPLS values,
which is not possible with the current implementation: the NLRI
for VPNv4 stores one prefix and one single label value, not two.
Today, RFC 8277 with the multiple label capability is not yet
available.
To avoid this corner case, when a route is resolved via more than one
nexthop, the label allocation per nexthop does not apply, and the
default per-vrf label is chosen.
For example, imagine BGP redistributes a static route using the `172.31.0.20`
nexthop, and the nexthop resolution finds two different nexthops for a
unique BGP update. In this situation, the BGP update resolving over
multiple nexthops uses the unique per-vrf label.
- recursive route case
Prefixes that need a recursive route to be resolved can
also be eligible for MPLS allocation per nexthop. In that
case, the nexthop will be the calculated recursive nexthop.
To achieve this, all nexthop types in bnc contexts are valid,
except for blackhole nexthops.
- network declared prefixes
Nexthop tracking is used to look for the reachability of the
prefixes. When the 'no bgp network import-check' command
is used, network declared prefixes are kept active
even if there is no active nexthop (see the sketch below).
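A minimal sketch of that last case (the AS number, VRF name and
prefix are assumptions):
> router bgp 65500 vrf vrf1
>  no bgp network import-check
>  address-family ipv4 unicast
>   network 172.31.0.0/24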
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Tue, 28 Feb 2023 13:17:17 +0000 (14:17 +0100)]
bgpd: add the bgp_label_per_nexthop_cache struct and apis
This commit introduces the necessary structs and APIs to
create the cache entries that store the label information
associated with a given nexthop.
A hash table is created in each BGP instance for both
AFIs, IPv4 and IPv6; that hash table is initialised.
An API is provided to look up and/or create an entry based on a given
nexthop.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Tue, 28 Feb 2023 13:11:30 +0000 (14:11 +0100)]
bgpd: introduce LP_TYPE_NEXTHOP label type
A new label type is introduced: LP_TYPE_NEXTHOP. This new
label type will be used in the next commits to allocate labels
for a specific nexthop IP address.
The commit also adds vty and json outputs to display
the new label type and the associated label values.
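As an illustration (hypothetical: the exact command and rendering
are assumptions, not taken from this commit), the new type could
appear in the label pool outputs, e.g.:
> r1# show bgp labelpool inuse
> Prefix                Label
> ---------------------------
> 192.0.2.45/32         105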
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This command updates the label values associated with each
BGP update exported to the global instance. Two modes are
available: per-nexthop and per-vrf. The latter is the default
one.
With this commit only, configuring label allocation per nexthop
merely resets the BGP updates, and the per-vrf label
allocation is still chosen.
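A minimal sketch of selecting the mode (the allocation-mode syntax
is assumed from this series; the ASN and VRF name are hypothetical):
> router bgp 65500 vrf vrf1
>  address-family ipv4 unicast
>   label vpn export allocation-mode per-nexthop
Removing the line, or selecting per-vrf, falls back to the default
per-vrf allocation.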
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
bgpd: remove ATTR_NEXT_HOP for redistributed ipv6 nexthops
This commit addresses an issue with an MPLS VPN network
redistributing static routes that are exported to the VPN,
and where the labels are allocated per next-hop.
For that purpose, the nexthop of the static routes is
checked against nexthop tracking. Successful validation
of a nexthop triggers the use of a unique
label for all prefixes using that destination.
However, the nexthop fails to be validated, with the
following message:
> evaluate_paths: prefix 172:31::14/128 (vrf vrf1), ignoring path due to
> martian or self-next-hop
The reason lies in the way the attr is created.
By default, the ATTR_NEXT_HOP attribute is set for
all prefixes, whereas this flag should only be valid
for IPv4. In the case of an IPv6 nexthop, remove
the ATTR_NEXT_HOP flag.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
zebra: mpls nexthop entry displays also interface when available
The 'show mpls table json' command displays the outgoing interface
name only when the nexthop type is either NEXTHOP_TYPE_IFINDEX or
NEXTHOP_TYPE_IPV6_IFINDEX. Add the interface name for the nexthop
type NEXTHOP_TYPE_IPV4_IFINDEX.
Fixes: b78b820d46d6 ("MPLS: Display enhancements and JSON support")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Thu, 12 Jan 2023 07:44:29 +0000 (08:44 +0100)]
zebra: handle nexthop vrf_id in ZEBRA_MPLS_LABELS messages
This commit addresses the case where a service wants to install
an LSP entry with a next-hop located in a VRF instance. The incoming
MPLS packet arrives in the namespace and has to be directed to a nexthop
located behind an interface that sits in a specific VRF instance.
The iproute2 commands below illustrate this:
> ip link add vrf1 type vrf table 10
> ip link set dev vrf1 up
> ip link set dev eth0 master vrf1
> ip a a 192.0.2.1/24 dev eth0
> ip -f mpls route add 105 via inet 192.0.2.45 dev eth0
If a service uses the ZEBRA_MPLS_LABELS messages, the LSP
message is ignored: from zebra's perspective, the MPLS entries are
visible via the 'show mpls table' command, but no LSP entry is
installed in the kernel.
The issue is in the nhlfe_nexthop_active_ipv[4/6] function: the
outgoing interface mentioned in the nexthop is searched in the
main VRF, whereas the interface is in a separate VRF. The interface
is not found, and the nhlfe to install is considered not active.
To address this issue, reuse the incoming vrf_id parameter transmitted
in the nexthop structure of the ZEBRA_MPLS_LABELS message. When
creating an NHLFE entry, the vrf_id is used instead of DEFAULT_VRF,
and the NHLFE entry can then be considered active.
One alternative to reusing the vrf_id parameter from the MPLS network
context would be to modify the search in the nhlfe_nexthop_active..()
functions to look for an existing ifindex in the zns. However, this
solution may not fit later when the netns backend is used.
Note that some changes have not been done yet, as the current
behavior is considered sufficient for now:
- The 'nhlfe_find' API: the assumption is made that only the Linux VRF
backend is used for now.
- The 'mpls_lsp_install()' API: it is currently used by the CLI command,
which does not handle the interface parameter, and by the SR-TE service,
which always sends LSPs towards a nexthop located in the VRF_DEFAULT.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Thu, 12 Jan 2023 07:33:50 +0000 (08:33 +0100)]
zebra: accept LSP entries with an mpls-less outgoing interface
The ZEBRA_MPLS_LABELS_[ADD/DELETE/REPLACE] messages may change an
LSP entry based on an incoming MPLS entry, followed by a given
next-hop.
A next hop with no label information inside is rejected
by the zebra layer. As an illustration, the following ZAPI message
would be rejected, because the next hop does not contain any
label information.
> ip -f mpls route add 105 via inet 192.0.2.45
At the same time, supporting such a configuration is desirable:
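(The original illustration is missing from this excerpt; presumably
it is the same pop-and-forward entry shown above, an assumption:)
> ip -f mpls route add 105 via inet 192.0.2.45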
An attempt was made to configure the next-hop with an implicit-
null label, but the message is rejected by the kernel:
> ip -f mpls route add 104 as 3 via inet 192.0.2.45
> Error: Implicit NULL Label (3) can not be used in encapsulation.
The commit proposes to accept ZEBRA_MPLS_LABELS_[XX] messages with
a nexthop that does not contain any label information.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Eugene Crosser [Mon, 8 May 2023 16:38:40 +0000 (18:38 +0200)]
tests: Fix out of tree build for lua scripting
test_frrscript is run from the `tests` directory and expects the sample
lua script `script1.lua` to be present in the `lib` directory. When the
package is built out of tree (which always happens when a debian
package is built), and scripting is enabled, the test fails because the lua
file is not present in the `tests/lib/` subdir of the _build_ directory.
Fix this by adding `script1.lua` as an extra dependency for
`test_frrscript`, and a recipe that copies the file from the source tree
to the build tree (note: it needs to be marked ".PHONY" because
otherwise `make` thinks that it already exists in the source tree).
After this commit, the following command starts to work:
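(The command itself is missing from this excerpt. A plausible
reconstruction, assuming a typical out-of-tree build with Lua
scripting enabled:)
> mkdir build && cd build
> ../configure --enable-scripting
> make check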
Donald Sharp [Mon, 8 May 2023 12:06:30 +0000 (08:06 -0400)]
tests: ospf_metric_propagation should not look for a specific vrfId
There is no guarantee that the vrfId is going to be the same across
tests, as the vrfId is chosen based upon the ifindex of the
vrf device. As such we should not be looking for the vrfId, but for
the correct vrf name.
Donald Sharp [Mon, 8 May 2023 11:47:49 +0000 (07:47 -0400)]
tests: ospf_metric_propagation is looking for a specific ifindex
The test ospf_metric_propagation is looking for a specific ifindex.
This ifindex is not guaranteed to be any particular value by the underlying
OS, so let's remove this check. As a side note, I am seeing
tests fail in upstream CI because of this.
Jack.zhang [Fri, 5 May 2023 06:58:32 +0000 (14:58 +0800)]
bgpd: fix the issue of connected tag error when BGP subscribes to NHT from Zebra
Imagine the following scenario:
1. Create a multihop eBGP peer and configure the TTL as 254 on both sides.
2. Call bgp_start and start an active connection.
BGP will send an NHT register with the non-connected flag.
3. The function bgp_accept is called for the remote connection.
BGP will create an accept peer as a passive connection with the default TTL (1), and will then send an NHT register again with the connected flag. This registration overwrites the first one.
4. The active connection reaches the established state first. In the function "peer_xfer_conn", the check for the "PEER_FLAG_CONFIG_NODE" flag of "from_peer->doppelganger" does not pass, so we can never repair the NHT registration error.
Then the BGP nexthop will look like this:
2000::60 invalid, #paths 0, peer 2000::60
Must be Connected
Last update: Thu May 4 09:35:14 2023
Routes from this peer can never be treated as having a valid nexthop.
This change fixes the error.
Intermittently, zebra and the kernel get out of sync
when an interface flaps and the adds/deletes are in the
same processing queue, and zebra assumes no change in the nexthop.
Hence we need to reinstall the
nexthops and routes to the kernel to sync their states.
Upon an interface flap, the kernel would have deleted the NHGs
associated with the interface (the one that flapped);
zebra retains NHGs for 3 minutes even though the upper-
layer protocol removes the nexthops (associated NHGs).
As part of interface address add,
re-add the singleton NHGs associated with the interface.
1. With no configuration in FRR, run `ip link add vrf1 type vrf ...`.
Currently, everything is OK.
2. Run `ip link del vrf1`.
`zebra` will wrongly/redundantly notify clients to add "vrf1" as a normal
interface after the correct deletion of "vrf1".
```
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_DELLINK(17), len=588, seq=0, pid=0
ZEBRA: [TDJW2-B9KJW] RTM_DELLINK for vrf1(93) <- Wrongly as normal interface, not vrf
ZEBRA: [WEEJX-M4HA0] interface vrf1 vrf vrf1(93) index 93 is now inactive.
ZEBRA: [NXAHW-290AC] MESSAGE: ZEBRA_INTERFACE_DELETE vrf1 vrf vrf1(93)
ZEBRA: [H97XA-ABB3A] MESSAGE: ZEBRA_INTERFACE_VRF_UPDATE/DEL vrf1 VRF Id 93 -> 0
ZEBRA: [HP8PZ-7D6D2] MESSAGE: ZEBRA_INTERFACE_VRF_UPDATE/ADD vrf1 VRF Id 93 -> 0 <-
ZEBRA: [Y6R2N-EF2N4] interface vrf1 is being deleted from the system
ZEBRA: [KNFMR-AFZ53] RTM_DELLINK for VRF vrf1(93)
ZEBRA: [P0CZ5-RF5FH] VRF vrf1 id 93 is now inactive
ZEBRA: [XC3P3-1DG4D] MESSAGE: ZEBRA_VRF_DELETE vrf1
ZEBRA: [ZMS2F-6K837] VRF vrf1 id 4294967295 deleted
OSPF: [JKWE3-97M3J] Zebra: interface add vrf1 vrf default[0] index 0 flags 480 metric 0 mtu 65575 speed 0 <- Wrongly add interface
```
`if_handle_vrf_change()` moves the interface from the specific vrf to the
default vrf, but it doesn't skip interfaces of vrf type. So, the
wrong/redundant add operation is done.
Note that the wrongly re-added interface is regarded as a normal interface
because `ifp->status` is cleared too early, so it is without the VRF flag
(`ZEBRA_INTERFACE_VRF_LOOPBACK`). As a result, ospfd will initialize
`ifp->type` to `OSPF_IFTYPE_BROADCAST`.
3. Run `ip link add vrf1 type vrf ...` to add "vrf1" again. FRR ends up with
a wrong display:
```
interface vrf1
ip ospf network broadcast
exit
```
Here, zebra will send `ZEBRA_INTERFACE_ADD` again for "vrf1" with
the correct `ifp->status`, so it will be updated to vrf type. But
it can't update `ifp->type` from `OSPF_IFTYPE_BROADCAST` to
`OSPF_IFTYPE_LOOPBACK` because it had already been configured in
step 2 above.
Two changes fix it:
1. Skip the procedure of switching VRF for interfaces of vrf type.
This means: don't send `ZEBRA_INTERFACE_ADD` to clients when deleting a vrf.
2. Clear this flag last,
so that clients get the correct `ifp->status`.
Christian Hopps [Fri, 28 Apr 2023 15:11:41 +0000 (11:11 -0400)]
tests: change topotest log timestamp precision to 6.
- Often millisecond precision is not good enough to differentiate things that
occur directly one after another from things that have some pause in between;
increase the reported precision to microseconds.
Verify activation and deactivation of per-vrf and per-af
SID export. Modify the configuration of r2 and verify that the
changes are reflected on r1 and in the connectivity between ce1 and ce2.