Wesley Coakley [Wed, 9 Jun 2021 03:50:43 +0000 (23:50 -0400)]
docker: Use tini unilaterally and stop tailing /dev/null
tini is a hyper-minimal PID 0 which spawns a child process (watchfrr.sh
in our case), reaps zombies and forwards signals to the script. Starting
watchfrr.sh directly instead of through the old `tail /dev/null` or
`sleep 365d` helps keep things clean too :)
While tini was previously only used in the Alpine container it is useful
to apply this PID 0 to all containers except the special CI ones.
Igor Ryzhov [Thu, 10 Jun 2021 17:21:51 +0000 (20:21 +0300)]
bgpd: fix routemap update with disabled delay timer
- vnc_routemap_update is called only for the last bgp instance
- vpn_policy_routemap_event is not called at all
- unguarded debug (there's already a debug inside the called function)
Make the code consistent with the callback code that is used when the
delay timer is enabled.
Igor Ryzhov [Fri, 11 Jun 2021 15:27:46 +0000 (18:27 +0300)]
isisd: per-instance dynamic hostname cache
Currently, the dynamic hostname cache is global. It is incorrect because
neighbors in different VRFs may have the same system ID and different
hostnames.
This also fixes a memory leak - when the instance is deleted, the cache
must be cleaned up and the cleanup thread must be cancelled.
Igor Ryzhov [Thu, 17 Jun 2021 16:31:03 +0000 (19:31 +0300)]
ospfd: fix routemap update
Currently, if the routemap already exists, we delete the pointer to it
when it is updated. We should delete the pointer only if the route-map
is actually deleted.
Igor Ryzhov [Tue, 25 May 2021 22:49:30 +0000 (01:49 +0300)]
ospf6d: fix interface area configuration
Currently the interface area is configured from the router node using
"interface IFNAME area ID" command. There are multiple problems with
this command:
- it is not in line with all other interface-related commands - other
parameters are configured from the interface node using "ipv6 ospf6"
prefix
- it is not in line with OSPFv2 - area is configured from the interface
node using "ip ospf area" command
- most importantly, it doesn't work correctly when the interface is in
a different VRF - instead of configuring the interface, it creates a
new fake interface and configuring it instead
To fix all the problems, this commit adds a new command to the interface
configuration node - "ipv6 ospf6 area ID". The purpose of the command is
completely the same, but it works correctly in a multi-VRF environment.
The old command is preserved for the backward compatibility, but the
warning is added that it is deprecated because it doesn't work correctly
with VRFs.
Igor Ryzhov [Mon, 7 Jun 2021 18:58:26 +0000 (21:58 +0300)]
tests: fix ospf6_topo1_vrf
ospf6d (and all other daemons except zebra) doesn't correctly process
`interface X vrf Y`, because it doesn't know existing VRFs at the time
of configuration file reading. Therefore it doesn't apply configuration
provided in the interface node.
Fix the problem by removing `vrf Y` part, having just an interface name
is enough.
Igor Ryzhov [Tue, 8 Jun 2021 14:01:56 +0000 (17:01 +0300)]
ospfd: fix memory leaks in summarization
To reproduce the issue:
1. Create summary-address: `summary-address 1.1.1.0/24`.
2. Try to delete it with the wrong tag: `no summary-address 1.1.1.0/24 tag 1`.
Each time this command is executed, route_node_lookup is called which
locks route node one more time. As the tag is wrong, the function
return immediately without unlock.
3. Finally delete the summary-address: `no summary-address 1.1.1.0/24`.
Donald Sharp [Tue, 8 Jun 2021 19:38:11 +0000 (15:38 -0400)]
zebra: Give extra space and stop processing if we run out of space
When processing bulk messages we need more space to handle more
mroutes. In this case we are doubling the stream size from
16k -> 32k, which should roughly double the number of mroutes
we can handle in one go.
Additionally. If we cannot parse the passed message into
the stream to pass up to pimd then gracefully stop processing
Renato Westphal [Mon, 31 May 2021 13:27:51 +0000 (10:27 -0300)]
ospfd: fix crash when displaying neighbor data in JSON
Add a null check to protect against the case where the neighbor
inactive timer is disabled. That can happen when the router is
acting as a helper for another router that is attempting to restart
gracefully.
Renato Westphal [Mon, 31 May 2021 13:27:51 +0000 (10:27 -0300)]
lib, ospfd, ospf6d: fix logging of pointer addresses
The %p printf format specifier does already print the pointer address
with a leading "0x" prefix (indicating a hexadecimal number). There's
no need to add that prefix manually.
While here, replace explicit function names in log messages by
__func__.
Renato Westphal [Mon, 31 May 2021 13:27:51 +0000 (10:27 -0300)]
ospfd: fix cleanup of MaxAge LSAs on exit
During shutdown, the ospf->maxage_lsa table is iterated over to
clean up all existing entries. While doing that, route_unlock_node()
should be called only for the nodes that have an associated entry,
otherwise the table will get corrupted and ospfd will crash.
As a side note, using a routing table to store MaxAge LSAs was a
very poor choice of a data structure, considering that a simple
rb-tree or hash table would get the job done with a much simpler
(and less error-prone) API. Something to cleanup in the future...
Renato Westphal [Mon, 31 May 2021 13:27:51 +0000 (10:27 -0300)]
ospfd: fix dangling pointer when exiting from the helper mode
When exiting from the helper mode for a given router after an
unsuccessful graceful restart, removing the neighborship to that
router straight away leads to a dangling pointer in the associated
interface, which inevitably leads to a crash. To solve this
problem, schedule the removal of the neighbor instead of removing
it immediately.
Renato Westphal [Mon, 31 May 2021 13:27:51 +0000 (10:27 -0300)]
ospfd: fix small issue when exiting from the GR helper mode
When exiting from the GR helper mode, recalculate the DR only for
interfaces of the appropriate types (broadcast and NMBA).
This fixes a problem where the state of a neighbor reachable over a
p2p interface was changing from Full/DROther to Full/Backup across
a graceful restart.
Renato Westphal [Mon, 31 May 2021 13:27:51 +0000 (10:27 -0300)]
ospfd: fix GR helper initialization and termination
Since a single ospfd process can have multiple OSPF interfaces
configured, we need to separate the global GR initialization and
termination from per-instance initialization and termination.
Igor Ryzhov [Fri, 4 Jun 2021 14:47:32 +0000 (17:47 +0300)]
ospfd: fix passive interface configuration
Currently, passive interface flag is configured from the router node
using "passive-interface IFNAME". There are multiple problems with this
command:
- it is not in line with all other interface-related commands - other
parameters are configured from the interface node using "ip ospf"
prefix
- it is not in line with OSPFv3 - passive flag is configured from the
interface node using "ipv6 ospf6 passive" command
- most importantly, it doesn't work correctly when the interface is in
a different VRF - when using VRF-lite, it incorrectly changes the
vrf_id of the interface and it becomes desynced with the actual state;
when using netns, it creates a new fake interface and configures it
instead of configuring the necessary interface
To fix all the problems, this commit adds a new command to the interface
configuration node - "ip ospf passive". The purpose of the command is
completely the same, but it works correctly in a multi-VRF environment.
The old command is preserved for the backward compatibility, but the
warning is added that it is deprecated because it doesn't work correctly
with VRFs.
Rafael Zalamena [Mon, 7 Jun 2021 14:02:16 +0000 (11:02 -0300)]
lib: fix address sanitizer crash on `find`
Fix the following address sanitizer crash when running the command `find`:
ERROR: AddressSanitizer: dynamic-stack-buffer-overflow
WRITE of size 1 at 0x7fff4840fc1d thread T0
0 in print_cmd ../lib/command.c:1541
1 in cmd_find_cmds ../lib/command.c:2364
2 in find ../vtysh/vtysh.c:3732
3 in cmd_execute_command_real ../lib/command.c:995
4 in cmd_execute_command ../lib/command.c:1055
5 in cmd_execute ../lib/command.c:1219
6 in vtysh_execute_func ../vtysh/vtysh.c:486
7 in vtysh_execute ../vtysh/vtysh.c:671
8 in main ../vtysh/vtysh_main.c:721
9 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
10 in _start (/usr/bin/vtysh+0x21f64d)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Igor Ryzhov [Wed, 2 Jun 2021 14:27:02 +0000 (17:27 +0300)]
zebra: fix config after exit from vrf
When the VRF node is exited using "exit" or "quit", there's still a VRF
pointer stored in the vty context. If you try to configure some router
related command, it will be applied to the previous VRF instead of the
default VRF. For example:
```
(config)# vrf test
(config-vrf)# ip router-id 1.1.1.1
(config-vrf)# do show run
...
!
vrf test
ip router-id 1.1.1.1
exit-vrf
!
...
(config-vrf)# exit
(config)# ip router-id 2.2.2.2
(config)# do show run
...
!
vrf test
ip router-id 2.2.2.2
exit-vrf
!
...
```
`vrf-exit` works correctly, because it stores a pointer to the default
VRF into the vty context (but weirdly keeping the VRF_NODE instead of
changing it to CONFIG_NODE).
Instead of relying on the behavior of exit function, always use the
default VRF when in CONFIG_NODE.
Another problem is missing `VTY_CHECK_CONTEXT`. If someone deletes the
VRF in which node the user enters the command, then zebra applies the
command to the default VRF instead of throwing an error.
Igor Ryzhov [Tue, 1 Jun 2021 17:30:13 +0000 (20:30 +0300)]
bfdd: fix bfd key structure
There's a padding byte between "mhop" and "peer" fields in this structure.
This structure is sometimes passed by value to functions and used in
assignments. The standard doesn't guarantee that the padding bytes are
copied on assignments. As this structure is used as a hash key, having
this padding byte with unspecified value can lead to unwanted behavior.
Fix the possible issue by making the "mhop" field to be 2 bytes. Also
make the struct packed as a precaution for future changes.
the config for dynamic candidate paths with bandwidth preferences
was using a different order of keywords (required bandwidth X) than
the corresponding command (bandwidth X required). This confuses
frr-reload, and possibly users too. Make both use the same order.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
Igor Ryzhov [Thu, 6 May 2021 23:49:40 +0000 (02:49 +0300)]
lib: fix binding to a vrf
There are two possible use-cases for the `vrf_bind` function:
- bind socket to an interface in a vrf
- bind socket to a vrf device
For the former case, there's one problem - success is returned when the
interface is not found. In that case, the socket is left unbound without
throwing an error.
For the latter case, there are multiple possible problems:
- If the name is not set, then the socket is left unbound (zebra, vrrp).
- If the name is "default" and there's an interface with that name in the
default VRF, then the socket is bound to that interface.
- In most daemons, if the router is configured before the VRF is actually
created, we're trying to open and bind the socket right after the
daemon receives a VRF registration from zebra. We may not receive the
VRF-interface registration from zebra yet at that point. Therefore,
`if_lookup_by_name` fails, and the socket is left unbound.
This commit fixes all the issues and updates the function description.
Suggested-by: Pat Ruddy <pat@voltanet.io> Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Christian Hopps [Thu, 20 May 2021 06:50:34 +0000 (06:50 +0000)]
lib: fix threading bug in GRPC code
The code that actually calls FRR northbound functions needs to be running in the
master thread. The previous code was running on a GRPC pthread. While fixing
moved to more functional vs OOP to make this easier to see.
Also fix ly merge to merge siblings not throw the originals away.
Christian Hopps [Fri, 28 May 2021 19:16:18 +0000 (19:16 +0000)]
northbound: KISS always batch yang config (file read), it's faster
The backoff code assumed that yang operations always completed quickly.
It checked for > 100 YANG modeled commands happening in under 1 second
to enable batching. If 100 yang modeled commands always take longer than
1 second batching is never enabled. This is the exact opposite of what
we want to happen since batching speeds the operations up.
Here are the results for libyang2 code without and with batching.
(IPv6 numbers are basically the same as iPv4, a couple percent slower)
Clearly we need this. Please note the growth (1K to 2K) w/o batching is
non-linear and 100 times slower than batched.
Notes on code: The use of the new `nb_cli_apply_changes_clear_pending`
is to commit any pending changes (including the current one). This is
done when the code would not correctly handle a single diff that
included the current changes with possible following changes. For
example, a "no" command followed by a new value to replace it would be
merged into a change, and the code would not deal well with that. A good
example of this is BGP neighbor peer-group changing. The other use is
after entering a router level (e.g., "router bgp") where the follow-on
command handlers expect that router object to now exists. The code
eventually needs to be cleaned up to not fail in these cases, but that
is for future NB cleanup.
Pat Ruddy [Wed, 3 Mar 2021 11:59:30 +0000 (11:59 +0000)]
ospf6d: read ospf6 socket until failure
To ensure we read all the datagrams availabe from a socket when the
read task is scheduled, make the read helper return and error or
continue enum and loop unitl an error is received.
This requires the read from the socket to be non blocking
Pat Ruddy [Wed, 3 Mar 2021 11:17:38 +0000 (11:17 +0000)]
ospf6d: create an ospf_read_helper function
Take the contents of ospf6_receive and split the funtionality that
deals with a single packet receipt and place it in a separate helper
function.
This is the first step in a refactor process to allow the ospf6_read
task to read until failure.
Pat Ruddy [Tue, 25 May 2021 08:38:26 +0000 (09:38 +0100)]
ospf6: fix memory leak in ospf6_abr_examin_summary
Ensure that if allocated route is not added to a table then it is
deleted to avoid leaking memory.
Add a new memory type for route table so that ospf6 routes can be
distinguished in the show memory output in isolation.
Martin Buck [Wed, 26 May 2021 14:03:51 +0000 (16:03 +0200)]
ospf6d: Fix route map "set tag" command
So far, "set tag" was 99% implemented in ospf6d, but registration of the
hook functions was missing, causing "set tag" actions in route maps to be
ignored in ospf6d.
This commit adds the missing hook registration.
Signed-off-by: Martin Buck <mb-tmp-tvguho.pbz@gromit.dyndns.org>
Igor Ryzhov [Wed, 26 May 2021 08:48:09 +0000 (11:48 +0300)]
ospf6d: fix debug message config write
Fix the following issues:
- if "send" is combined with "recv-hdr", only "send" is shown
- if "recv" is combined with "send-hdr", only "recv" is shown
- if both "send-hdr" and "recv-hdr" are enabled, "; header only" is shown
Igor Ryzhov [Tue, 25 May 2021 18:58:55 +0000 (21:58 +0300)]
ospf6d: fix possible crashes
OSPF6 instance may not exist when processing interface state change.
Do not execute processing steps that require an instance if an area is
not configured for an interface.
Donald Sharp [Mon, 24 May 2021 17:45:29 +0000 (13:45 -0400)]
ospfd: Fix quick interface down up event handling in ospf
When we get this sequence of events:
- zebra receives interface up, sends to ospf
- ospf receives intf up, processes( including neighbor formation and spf )
and sends route to zebra for installation.
- zebra receives route for processing, schedules it too happen in the future
- zebra receives interface down event, sends to ospf
- zebra processes route X and marks it inactive because nexthop
interface is down
- zebra receives interface up event, sends to ospf
- ospf receives both events and processes the change and decides
that nothing has changed so it does not send any route change for X to zebra.
At this point zebra has a route from ospf that is marked as inactive, while
ospf believes that the route should be installed properly.
Modify the code such that on an interface down event, ospf marks the routes
as changed if the ifindex is being used for a nexthop, so that when ospf
is deciding if routes have changed post spf that it can just automatically
send that route down again if it still exists.
Rafael Zalamena [Mon, 24 May 2021 11:30:26 +0000 (08:30 -0300)]
ospf6d: fix address sanitizer crash
Don't `memcpy` a `struct prefix` the memory size varies depending on the
original intended type. In this case the original type was (casted away)
`struct prefix_ipv6` and we tried to copy `struct prefix` which is
bigger.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Don Slice [Wed, 19 May 2021 18:23:28 +0000 (14:23 -0400)]
ospfd: "ip ospf area" command can select wrong process
Found that in some circumstances, when the "ip ospf area"
command was entered for the default vrf, the wrong ospf
process would be used to check for the presence of a
"network" statement, causing the "ip ospf area" command to
be rejected. This was due to the command using the ospf
instance lookup to find the right ospf process, which can
be in error depending on when the processes were created.
Christian Hopps [Thu, 20 May 2021 06:49:32 +0000 (06:49 +0000)]
lib: fix northbound merge code (libyang)
lyd_merge_tree replaces dest siblings with source siblings, not what we
want. Instead lyd_merge_siblings to keep both. Instead lyd_merge_siblings
to keep both.
Igor Ryzhov [Mon, 17 May 2021 22:24:22 +0000 (01:24 +0300)]
ospf6d: fix "default-information originate" in non-existing vrf
If the default route redistribution is configured in OSPF6 router before
the VRF is created, then this is not currently registered in zebra after
the VRF creation.