Martin Winter [Wed, 21 Jul 2021 13:06:19 +0000 (15:06 +0200)]
FRR Release 8.0
Major changes:
* A new daemon, `pathd`, has been added. This daemon implements support
for segment routing.
* EVPN Multihoming is now fully supported
* OSPFv3 now supports VRFs
* TI-LFA has been implemented in IS-IS and OSPF
* Zebra now has the ability to dump netlink messages in a human-friendly format
* LDP gained SNMP support
* Minimum libyang version is now 2.0
Signed-off-by: Martin Winter <mwinter@opensourcerouting.org>
Problem: Sometimes the configured Local GR state is not reflected in
show command and peer node. This is causing failures in few of the
BGP-GR topotests.
RCA: This problem is seen when the configuration of local GR state
happens when the BGP session is in OpenSent state and moves to
Established after the configuration is complete.
When the session gets established, we move the GR state value from stub peer
to the config peer. This will result in overriding the GR state to
previous value.
Fix: The local GR state is modified only through CLI configuration and
does not change during BGP FSM transition. In this case it is not necessary
to transfer the GR state value from stub peer to config peer. This way we
can ensure that always the most recent config value is present in peer
datastructure.
Igor Ryzhov [Mon, 12 Jul 2021 19:51:49 +0000 (22:51 +0300)]
ospf6d: fix freebsd mcast group issues
There's a delay in FreeBSD between issuing a command to leave a
multicast group and an actual leave. If we execute "no router ospf6" and
"router ospf6" fast enough, we can end up in a situation when OS
performs the leave later than it performs the join and the interface
remains without a multicast group.
Instead of counting on a one second delay, we must wait until the
interface actually leaves the group.
Donald Sharp [Wed, 7 Jul 2021 20:00:12 +0000 (16:00 -0400)]
lib: Allow ZAPI_MESSAGE_OPAQUE_LENGTH length of data
We are sending up to ZAPI_MESSAGE_OPAQUE_LENGTH but checking
for one less. We know the data will fit in it to that size.
Also we have asserts on the write to ensure we don't go over
it
Ondřej Surý [Mon, 15 Feb 2021 07:43:26 +0000 (08:43 +0100)]
doc: Use dpkg-buildpackage to build packages (add note about debuild)
The debuild command fails when we are doing source package only build
because it expects the arch-dependent .changes file to be present. Thus
in the instructions we switch to using dpkg-buildpackage directly and
add a note about using debuild in more complicated scenarios.
Ondřej Surý [Mon, 15 Feb 2021 07:40:08 +0000 (08:40 +0100)]
doc: Add instructions on how to build only source packages
In the CI, it's better to build the source package only once and then
instead of checking out the whole repository, only distribute the source
packages to the individual jobs.
Igor Ryzhov [Fri, 25 Jun 2021 11:59:28 +0000 (14:59 +0300)]
ospf6d: fix duplicated packet read
When OSPFv3 router is configured in both default and non-default VRFs,
every packet destined to a non-default VRF is read twice. This makes it
impossible to establish neighborship because every DbDesc packet is
treated as duplicated and we end up infinitely exchanging DbDescs.
We should drop packets received in the default VRF if an interface we
received it on is bound to another VRF.
Igor Ryzhov [Tue, 22 Jun 2021 21:27:55 +0000 (00:27 +0300)]
isisd: fix interface ldp-sync configuration
There are two checks done when configuring ldp-sync on an interface:
- interface is not a loopback
- interface is in the default VRF
Both checks are incorrectly done using the operational data.
The second check can be done using only config data - do that.
The first check can't be done using only configurational data, but it's
not necessary. LDP sync code doesn't operate on loopback interfaces
already. There's no harm in allowing this to be configured.
Igor Ryzhov [Fri, 18 Jun 2021 10:06:13 +0000 (13:06 +0300)]
lib: remove pure attribute from functions that modify memory
Almost all functions currently marked with pure attribute acquire a
route_node lock. By marking them pure we allow compiler to optimize the
code and not call them when it already knows the return value. This is
completely incorrect.
Only two of eleven functions can be marked as pure. And they still won't
be optimized because they are never called from the same function twice.
Let's remove the ext_pure macro completely to reduce the chance of
repeating this mistake in the future.
Wesley Coakley [Wed, 9 Jun 2021 03:50:43 +0000 (23:50 -0400)]
docker: Use tini unilaterally and stop tailing /dev/null
tini is a hyper-minimal PID 0 which spawns a child process (watchfrr.sh
in our case), reaps zombies and forwards signals to the script. Starting
watchfrr.sh directly instead of through the old `tail /dev/null` or
`sleep 365d` helps keep things clean too :)
While tini was previously only used in the Alpine container it is useful
to apply this PID 0 to all containers except the special CI ones.
Igor Ryzhov [Thu, 10 Jun 2021 17:21:51 +0000 (20:21 +0300)]
bgpd: fix routemap update with disabled delay timer
- vnc_routemap_update is called only for the last bgp instance
- vpn_policy_routemap_event is not called at all
- unguarded debug (there's already a debug inside the called function)
Make the code consistent with the callback code that is used when the
delay timer is enabled.
Igor Ryzhov [Fri, 11 Jun 2021 15:27:46 +0000 (18:27 +0300)]
isisd: per-instance dynamic hostname cache
Currently, the dynamic hostname cache is global. It is incorrect because
neighbors in different VRFs may have the same system ID and different
hostnames.
This also fixes a memory leak - when the instance is deleted, the cache
must be cleaned up and the cleanup thread must be cancelled.
Igor Ryzhov [Thu, 17 Jun 2021 16:31:03 +0000 (19:31 +0300)]
ospfd: fix routemap update
Currently, if the routemap already exists, we delete the pointer to it
when it is updated. We should delete the pointer only if the route-map
is actually deleted.
Igor Ryzhov [Tue, 25 May 2021 22:49:30 +0000 (01:49 +0300)]
ospf6d: fix interface area configuration
Currently the interface area is configured from the router node using
"interface IFNAME area ID" command. There are multiple problems with
this command:
- it is not in line with all other interface-related commands - other
parameters are configured from the interface node using "ipv6 ospf6"
prefix
- it is not in line with OSPFv2 - area is configured from the interface
node using "ip ospf area" command
- most importantly, it doesn't work correctly when the interface is in
a different VRF - instead of configuring the interface, it creates a
new fake interface and configuring it instead
To fix all the problems, this commit adds a new command to the interface
configuration node - "ipv6 ospf6 area ID". The purpose of the command is
completely the same, but it works correctly in a multi-VRF environment.
The old command is preserved for the backward compatibility, but the
warning is added that it is deprecated because it doesn't work correctly
with VRFs.
Igor Ryzhov [Mon, 7 Jun 2021 18:58:26 +0000 (21:58 +0300)]
tests: fix ospf6_topo1_vrf
ospf6d (and all other daemons except zebra) doesn't correctly process
`interface X vrf Y`, because it doesn't know existing VRFs at the time
of configuration file reading. Therefore it doesn't apply configuration
provided in the interface node.
Fix the problem by removing `vrf Y` part, having just an interface name
is enough.
Igor Ryzhov [Tue, 8 Jun 2021 14:01:56 +0000 (17:01 +0300)]
ospfd: fix memory leaks in summarization
To reproduce the issue:
1. Create summary-address: `summary-address 1.1.1.0/24`.
2. Try to delete it with the wrong tag: `no summary-address 1.1.1.0/24 tag 1`.
Each time this command is executed, route_node_lookup is called which
locks route node one more time. As the tag is wrong, the function
return immediately without unlock.
3. Finally delete the summary-address: `no summary-address 1.1.1.0/24`.
Donald Sharp [Tue, 8 Jun 2021 19:38:11 +0000 (15:38 -0400)]
zebra: Give extra space and stop processing if we run out of space
When processing bulk messages we need more space to handle more
mroutes. In this case we are doubling the stream size from
16k -> 32k, which should roughly double the number of mroutes
we can handle in one go.
Additionally. If we cannot parse the passed message into
the stream to pass up to pimd then gracefully stop processing
Renato Westphal [Mon, 31 May 2021 13:27:51 +0000 (10:27 -0300)]
ospfd: fix crash when displaying neighbor data in JSON
Add a null check to protect against the case where the neighbor
inactive timer is disabled. That can happen when the router is
acting as a helper for another router that is attempting to restart
gracefully.
Renato Westphal [Mon, 31 May 2021 13:27:51 +0000 (10:27 -0300)]
lib, ospfd, ospf6d: fix logging of pointer addresses
The %p printf format specifier does already print the pointer address
with a leading "0x" prefix (indicating a hexadecimal number). There's
no need to add that prefix manually.
While here, replace explicit function names in log messages by
__func__.
Renato Westphal [Mon, 31 May 2021 13:27:51 +0000 (10:27 -0300)]
ospfd: fix cleanup of MaxAge LSAs on exit
During shutdown, the ospf->maxage_lsa table is iterated over to
clean up all existing entries. While doing that, route_unlock_node()
should be called only for the nodes that have an associated entry,
otherwise the table will get corrupted and ospfd will crash.
As a side note, using a routing table to store MaxAge LSAs was a
very poor choice of a data structure, considering that a simple
rb-tree or hash table would get the job done with a much simpler
(and less error-prone) API. Something to cleanup in the future...
Renato Westphal [Mon, 31 May 2021 13:27:51 +0000 (10:27 -0300)]
ospfd: fix dangling pointer when exiting from the helper mode
When exiting from the helper mode for a given router after an
unsuccessful graceful restart, removing the neighborship to that
router straight away leads to a dangling pointer in the associated
interface, which inevitably leads to a crash. To solve this
problem, schedule the removal of the neighbor instead of removing
it immediately.
Renato Westphal [Mon, 31 May 2021 13:27:51 +0000 (10:27 -0300)]
ospfd: fix small issue when exiting from the GR helper mode
When exiting from the GR helper mode, recalculate the DR only for
interfaces of the appropriate types (broadcast and NMBA).
This fixes a problem where the state of a neighbor reachable over a
p2p interface was changing from Full/DROther to Full/Backup across
a graceful restart.
Renato Westphal [Mon, 31 May 2021 13:27:51 +0000 (10:27 -0300)]
ospfd: fix GR helper initialization and termination
Since a single ospfd process can have multiple OSPF interfaces
configured, we need to separate the global GR initialization and
termination from per-instance initialization and termination.
Igor Ryzhov [Fri, 4 Jun 2021 14:47:32 +0000 (17:47 +0300)]
ospfd: fix passive interface configuration
Currently, passive interface flag is configured from the router node
using "passive-interface IFNAME". There are multiple problems with this
command:
- it is not in line with all other interface-related commands - other
parameters are configured from the interface node using "ip ospf"
prefix
- it is not in line with OSPFv3 - passive flag is configured from the
interface node using "ipv6 ospf6 passive" command
- most importantly, it doesn't work correctly when the interface is in
a different VRF - when using VRF-lite, it incorrectly changes the
vrf_id of the interface and it becomes desynced with the actual state;
when using netns, it creates a new fake interface and configures it
instead of configuring the necessary interface
To fix all the problems, this commit adds a new command to the interface
configuration node - "ip ospf passive". The purpose of the command is
completely the same, but it works correctly in a multi-VRF environment.
The old command is preserved for the backward compatibility, but the
warning is added that it is deprecated because it doesn't work correctly
with VRFs.
Rafael Zalamena [Mon, 7 Jun 2021 14:02:16 +0000 (11:02 -0300)]
lib: fix address sanitizer crash on `find`
Fix the following address sanitizer crash when running the command `find`:
ERROR: AddressSanitizer: dynamic-stack-buffer-overflow
WRITE of size 1 at 0x7fff4840fc1d thread T0
0 in print_cmd ../lib/command.c:1541
1 in cmd_find_cmds ../lib/command.c:2364
2 in find ../vtysh/vtysh.c:3732
3 in cmd_execute_command_real ../lib/command.c:995
4 in cmd_execute_command ../lib/command.c:1055
5 in cmd_execute ../lib/command.c:1219
6 in vtysh_execute_func ../vtysh/vtysh.c:486
7 in vtysh_execute ../vtysh/vtysh.c:671
8 in main ../vtysh/vtysh_main.c:721
9 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
10 in _start (/usr/bin/vtysh+0x21f64d)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Igor Ryzhov [Wed, 2 Jun 2021 14:27:02 +0000 (17:27 +0300)]
zebra: fix config after exit from vrf
When the VRF node is exited using "exit" or "quit", there's still a VRF
pointer stored in the vty context. If you try to configure some router
related command, it will be applied to the previous VRF instead of the
default VRF. For example:
```
(config)# vrf test
(config-vrf)# ip router-id 1.1.1.1
(config-vrf)# do show run
...
!
vrf test
ip router-id 1.1.1.1
exit-vrf
!
...
(config-vrf)# exit
(config)# ip router-id 2.2.2.2
(config)# do show run
...
!
vrf test
ip router-id 2.2.2.2
exit-vrf
!
...
```
`vrf-exit` works correctly, because it stores a pointer to the default
VRF into the vty context (but weirdly keeping the VRF_NODE instead of
changing it to CONFIG_NODE).
Instead of relying on the behavior of exit function, always use the
default VRF when in CONFIG_NODE.
Another problem is missing `VTY_CHECK_CONTEXT`. If someone deletes the
VRF in which node the user enters the command, then zebra applies the
command to the default VRF instead of throwing an error.
Igor Ryzhov [Tue, 1 Jun 2021 17:30:13 +0000 (20:30 +0300)]
bfdd: fix bfd key structure
There's a padding byte between "mhop" and "peer" fields in this structure.
This structure is sometimes passed by value to functions and used in
assignments. The standard doesn't guarantee that the padding bytes are
copied on assignments. As this structure is used as a hash key, having
this padding byte with unspecified value can lead to unwanted behavior.
Fix the possible issue by making the "mhop" field to be 2 bytes. Also
make the struct packed as a precaution for future changes.
the config for dynamic candidate paths with bandwidth preferences
was using a different order of keywords (required bandwidth X) than
the corresponding command (bandwidth X required). This confuses
frr-reload, and possibly users too. Make both use the same order.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
Igor Ryzhov [Thu, 6 May 2021 23:49:40 +0000 (02:49 +0300)]
lib: fix binding to a vrf
There are two possible use-cases for the `vrf_bind` function:
- bind socket to an interface in a vrf
- bind socket to a vrf device
For the former case, there's one problem - success is returned when the
interface is not found. In that case, the socket is left unbound without
throwing an error.
For the latter case, there are multiple possible problems:
- If the name is not set, then the socket is left unbound (zebra, vrrp).
- If the name is "default" and there's an interface with that name in the
default VRF, then the socket is bound to that interface.
- In most daemons, if the router is configured before the VRF is actually
created, we're trying to open and bind the socket right after the
daemon receives a VRF registration from zebra. We may not receive the
VRF-interface registration from zebra yet at that point. Therefore,
`if_lookup_by_name` fails, and the socket is left unbound.
This commit fixes all the issues and updates the function description.
Suggested-by: Pat Ruddy <pat@voltanet.io> Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>