Renato Westphal [Mon, 31 May 2021 13:27:51 +0000 (10:27 -0300)]
ospfd: fix cleanup of MaxAge LSAs on exit
During shutdown, the ospf->maxage_lsa table is iterated over to
clean up all existing entries. While doing that, route_unlock_node()
should be called only for the nodes that have an associated entry,
otherwise the table will get corrupted and ospfd will crash.
As a side note, using a routing table to store MaxAge LSAs was a
very poor choice of a data structure, considering that a simple
rb-tree or hash table would get the job done with a much simpler
(and less error-prone) API. Something to cleanup in the future...
Renato Westphal [Mon, 31 May 2021 13:27:51 +0000 (10:27 -0300)]
ospfd: fix dangling pointer when exiting from the helper mode
When exiting from the helper mode for a given router after an
unsuccessful graceful restart, removing the neighborship to that
router straight away leads to a dangling pointer in the associated
interface, which inevitably leads to a crash. To solve this
problem, schedule the removal of the neighbor instead of removing
it immediately.
Renato Westphal [Mon, 31 May 2021 13:27:51 +0000 (10:27 -0300)]
ospfd: fix small issue when exiting from the GR helper mode
When exiting from the GR helper mode, recalculate the DR only for
interfaces of the appropriate types (broadcast and NMBA).
This fixes a problem where the state of a neighbor reachable over a
p2p interface was changing from Full/DROther to Full/Backup across
a graceful restart.
Renato Westphal [Mon, 31 May 2021 13:27:51 +0000 (10:27 -0300)]
ospfd: fix GR helper initialization and termination
Since a single ospfd process can have multiple OSPF interfaces
configured, we need to separate the global GR initialization and
termination from per-instance initialization and termination.
Igor Ryzhov [Fri, 4 Jun 2021 14:47:32 +0000 (17:47 +0300)]
ospfd: fix passive interface configuration
Currently, passive interface flag is configured from the router node
using "passive-interface IFNAME". There are multiple problems with this
command:
- it is not in line with all other interface-related commands - other
parameters are configured from the interface node using "ip ospf"
prefix
- it is not in line with OSPFv3 - passive flag is configured from the
interface node using "ipv6 ospf6 passive" command
- most importantly, it doesn't work correctly when the interface is in
a different VRF - when using VRF-lite, it incorrectly changes the
vrf_id of the interface and it becomes desynced with the actual state;
when using netns, it creates a new fake interface and configures it
instead of configuring the necessary interface
To fix all the problems, this commit adds a new command to the interface
configuration node - "ip ospf passive". The purpose of the command is
completely the same, but it works correctly in a multi-VRF environment.
The old command is preserved for the backward compatibility, but the
warning is added that it is deprecated because it doesn't work correctly
with VRFs.
Rafael Zalamena [Mon, 7 Jun 2021 14:02:16 +0000 (11:02 -0300)]
lib: fix address sanitizer crash on `find`
Fix the following address sanitizer crash when running the command `find`:
ERROR: AddressSanitizer: dynamic-stack-buffer-overflow
WRITE of size 1 at 0x7fff4840fc1d thread T0
0 in print_cmd ../lib/command.c:1541
1 in cmd_find_cmds ../lib/command.c:2364
2 in find ../vtysh/vtysh.c:3732
3 in cmd_execute_command_real ../lib/command.c:995
4 in cmd_execute_command ../lib/command.c:1055
5 in cmd_execute ../lib/command.c:1219
6 in vtysh_execute_func ../vtysh/vtysh.c:486
7 in vtysh_execute ../vtysh/vtysh.c:671
8 in main ../vtysh/vtysh_main.c:721
9 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
10 in _start (/usr/bin/vtysh+0x21f64d)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Igor Ryzhov [Wed, 2 Jun 2021 14:27:02 +0000 (17:27 +0300)]
zebra: fix config after exit from vrf
When the VRF node is exited using "exit" or "quit", there's still a VRF
pointer stored in the vty context. If you try to configure some router
related command, it will be applied to the previous VRF instead of the
default VRF. For example:
```
(config)# vrf test
(config-vrf)# ip router-id 1.1.1.1
(config-vrf)# do show run
...
!
vrf test
ip router-id 1.1.1.1
exit-vrf
!
...
(config-vrf)# exit
(config)# ip router-id 2.2.2.2
(config)# do show run
...
!
vrf test
ip router-id 2.2.2.2
exit-vrf
!
...
```
`vrf-exit` works correctly, because it stores a pointer to the default
VRF into the vty context (but weirdly keeping the VRF_NODE instead of
changing it to CONFIG_NODE).
Instead of relying on the behavior of exit function, always use the
default VRF when in CONFIG_NODE.
Another problem is missing `VTY_CHECK_CONTEXT`. If someone deletes the
VRF in which node the user enters the command, then zebra applies the
command to the default VRF instead of throwing an error.
Igor Ryzhov [Tue, 1 Jun 2021 17:30:13 +0000 (20:30 +0300)]
bfdd: fix bfd key structure
There's a padding byte between "mhop" and "peer" fields in this structure.
This structure is sometimes passed by value to functions and used in
assignments. The standard doesn't guarantee that the padding bytes are
copied on assignments. As this structure is used as a hash key, having
this padding byte with unspecified value can lead to unwanted behavior.
Fix the possible issue by making the "mhop" field to be 2 bytes. Also
make the struct packed as a precaution for future changes.
the config for dynamic candidate paths with bandwidth preferences
was using a different order of keywords (required bandwidth X) than
the corresponding command (bandwidth X required). This confuses
frr-reload, and possibly users too. Make both use the same order.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
Igor Ryzhov [Thu, 6 May 2021 23:49:40 +0000 (02:49 +0300)]
lib: fix binding to a vrf
There are two possible use-cases for the `vrf_bind` function:
- bind socket to an interface in a vrf
- bind socket to a vrf device
For the former case, there's one problem - success is returned when the
interface is not found. In that case, the socket is left unbound without
throwing an error.
For the latter case, there are multiple possible problems:
- If the name is not set, then the socket is left unbound (zebra, vrrp).
- If the name is "default" and there's an interface with that name in the
default VRF, then the socket is bound to that interface.
- In most daemons, if the router is configured before the VRF is actually
created, we're trying to open and bind the socket right after the
daemon receives a VRF registration from zebra. We may not receive the
VRF-interface registration from zebra yet at that point. Therefore,
`if_lookup_by_name` fails, and the socket is left unbound.
This commit fixes all the issues and updates the function description.
Suggested-by: Pat Ruddy <pat@voltanet.io> Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Christian Hopps [Thu, 20 May 2021 06:50:34 +0000 (06:50 +0000)]
lib: fix threading bug in GRPC code
The code that actually calls FRR northbound functions needs to be running in the
master thread. The previous code was running on a GRPC pthread. While fixing
moved to more functional vs OOP to make this easier to see.
Also fix ly merge to merge siblings not throw the originals away.
Christian Hopps [Fri, 28 May 2021 19:16:18 +0000 (19:16 +0000)]
northbound: KISS always batch yang config (file read), it's faster
The backoff code assumed that yang operations always completed quickly.
It checked for > 100 YANG modeled commands happening in under 1 second
to enable batching. If 100 yang modeled commands always take longer than
1 second batching is never enabled. This is the exact opposite of what
we want to happen since batching speeds the operations up.
Here are the results for libyang2 code without and with batching.
(IPv6 numbers are basically the same as iPv4, a couple percent slower)
Clearly we need this. Please note the growth (1K to 2K) w/o batching is
non-linear and 100 times slower than batched.
Notes on code: The use of the new `nb_cli_apply_changes_clear_pending`
is to commit any pending changes (including the current one). This is
done when the code would not correctly handle a single diff that
included the current changes with possible following changes. For
example, a "no" command followed by a new value to replace it would be
merged into a change, and the code would not deal well with that. A good
example of this is BGP neighbor peer-group changing. The other use is
after entering a router level (e.g., "router bgp") where the follow-on
command handlers expect that router object to now exists. The code
eventually needs to be cleaned up to not fail in these cases, but that
is for future NB cleanup.
Pat Ruddy [Wed, 3 Mar 2021 11:59:30 +0000 (11:59 +0000)]
ospf6d: read ospf6 socket until failure
To ensure we read all the datagrams availabe from a socket when the
read task is scheduled, make the read helper return and error or
continue enum and loop unitl an error is received.
This requires the read from the socket to be non blocking
Pat Ruddy [Wed, 3 Mar 2021 11:17:38 +0000 (11:17 +0000)]
ospf6d: create an ospf_read_helper function
Take the contents of ospf6_receive and split the funtionality that
deals with a single packet receipt and place it in a separate helper
function.
This is the first step in a refactor process to allow the ospf6_read
task to read until failure.
Pat Ruddy [Tue, 25 May 2021 08:38:26 +0000 (09:38 +0100)]
ospf6: fix memory leak in ospf6_abr_examin_summary
Ensure that if allocated route is not added to a table then it is
deleted to avoid leaking memory.
Add a new memory type for route table so that ospf6 routes can be
distinguished in the show memory output in isolation.
Martin Buck [Wed, 26 May 2021 14:03:51 +0000 (16:03 +0200)]
ospf6d: Fix route map "set tag" command
So far, "set tag" was 99% implemented in ospf6d, but registration of the
hook functions was missing, causing "set tag" actions in route maps to be
ignored in ospf6d.
This commit adds the missing hook registration.
Signed-off-by: Martin Buck <mb-tmp-tvguho.pbz@gromit.dyndns.org>
Igor Ryzhov [Wed, 26 May 2021 08:48:09 +0000 (11:48 +0300)]
ospf6d: fix debug message config write
Fix the following issues:
- if "send" is combined with "recv-hdr", only "send" is shown
- if "recv" is combined with "send-hdr", only "recv" is shown
- if both "send-hdr" and "recv-hdr" are enabled, "; header only" is shown
Igor Ryzhov [Tue, 25 May 2021 18:58:55 +0000 (21:58 +0300)]
ospf6d: fix possible crashes
OSPF6 instance may not exist when processing interface state change.
Do not execute processing steps that require an instance if an area is
not configured for an interface.
Donald Sharp [Mon, 24 May 2021 17:45:29 +0000 (13:45 -0400)]
ospfd: Fix quick interface down up event handling in ospf
When we get this sequence of events:
- zebra receives interface up, sends to ospf
- ospf receives intf up, processes( including neighbor formation and spf )
and sends route to zebra for installation.
- zebra receives route for processing, schedules it too happen in the future
- zebra receives interface down event, sends to ospf
- zebra processes route X and marks it inactive because nexthop
interface is down
- zebra receives interface up event, sends to ospf
- ospf receives both events and processes the change and decides
that nothing has changed so it does not send any route change for X to zebra.
At this point zebra has a route from ospf that is marked as inactive, while
ospf believes that the route should be installed properly.
Modify the code such that on an interface down event, ospf marks the routes
as changed if the ifindex is being used for a nexthop, so that when ospf
is deciding if routes have changed post spf that it can just automatically
send that route down again if it still exists.
Rafael Zalamena [Mon, 24 May 2021 11:30:26 +0000 (08:30 -0300)]
ospf6d: fix address sanitizer crash
Don't `memcpy` a `struct prefix` the memory size varies depending on the
original intended type. In this case the original type was (casted away)
`struct prefix_ipv6` and we tried to copy `struct prefix` which is
bigger.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Don Slice [Wed, 19 May 2021 18:23:28 +0000 (14:23 -0400)]
ospfd: "ip ospf area" command can select wrong process
Found that in some circumstances, when the "ip ospf area"
command was entered for the default vrf, the wrong ospf
process would be used to check for the presence of a
"network" statement, causing the "ip ospf area" command to
be rejected. This was due to the command using the ospf
instance lookup to find the right ospf process, which can
be in error depending on when the processes were created.
Christian Hopps [Thu, 20 May 2021 06:49:32 +0000 (06:49 +0000)]
lib: fix northbound merge code (libyang)
lyd_merge_tree replaces dest siblings with source siblings, not what we
want. Instead lyd_merge_siblings to keep both. Instead lyd_merge_siblings
to keep both.
Igor Ryzhov [Mon, 17 May 2021 22:24:22 +0000 (01:24 +0300)]
ospf6d: fix "default-information originate" in non-existing vrf
If the default route redistribution is configured in OSPF6 router before
the VRF is created, then this is not currently registered in zebra after
the VRF creation.
Igor Ryzhov [Mon, 17 May 2021 22:23:35 +0000 (01:23 +0300)]
ospfd: fix "default-information originate" in non-existing vrf
If the default route redistribution is configured in OSPF router before
the VRF is created, then this is not currently registered in zebra after
the VRF creation.
Donald Sharp [Fri, 14 May 2021 13:46:36 +0000 (09:46 -0400)]
pimd: Fix rare crash situation
When running pim on an interface and that interface has
state and we move that interface into a different vrf
there exists a call path where we have not created the pimreg
device yet. Prevent a crash in this rare situation.
Ticket: #2552763 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Igor Ryzhov [Fri, 7 May 2021 14:05:40 +0000 (17:05 +0300)]
ospf6d: always free redistribute config
When the ospf6 instance in unknown VRF is deleted, the redistribution
config is not freed, because it is not registered in zebra. We should
always free the config regardless of zebra registration status.
Igor Ryzhov [Sun, 9 May 2021 13:43:29 +0000 (16:43 +0300)]
isisd: fix dangling instances
We only need an instance when we have at least one area configured in a
VRF. Currently we have the following issues:
- instance for the default VRF is always created
- instance is not removed after the last area config is removed
Donald Sharp [Wed, 12 May 2021 18:31:45 +0000 (14:31 -0400)]
pimd: Remove pim->vrf_id and use pim->vrf->vrf_id
VRF creation can happen from either cli or from
knowledged about the vrf learned from zebra.
In the case where we learn about the vrf from
the cli, the vrf id is UNKNOWN. Upon actual
creation of the vrf, lib/vrf.c touches up the vrf_id
and calls pim_vrf_enable to turn it on properly.
At this point in time we have a pim->vrf_id of
UNKNOWN and the vrf->vrf_id of the right value.
There is no point in duplicating this data. So just
remove all pim->vrf_id and use the vrf->vrf_id instead
since we keep a copy of the pim->vrf pointer.
This will remove some crashes where we expect the
pim->vrf_id to be usable and it's not.
Donald Sharp [Mon, 3 May 2021 18:39:47 +0000 (14:39 -0400)]
pimd: There exists a path where on vrf bringup we do not create the pimreg
When creating configuration for a vrf *Before* the vrf has been
created, pim will not properly create the pimreg device and
we will promptly crash when we try to pass data.
Put some code checks in place to ensure that the pimreg is
created for vrf's.
When browsing or parsing OSPF LSA TLVs, we need to use the LSA length which is
part of the LSA header. This length, encoded in 16 bits, must be first
converted to host byte order with ntohs() function. However, Coverity Scan
considers that ntohs() function return TAINTED data. Thus, when the length is
used to control for() loop, Coverity Scan marks this part of the code as defect
with "Untrusted Loop Bound" due to the usage of Tainted variable. Similar
problems occur when browsing sub-TLV where length is extracted with ntohs().
To overcome this limitation, a size attribute has been added to the ospf_lsa
structure. The size is set when lsa->data buffer is allocated. In addition,
when an OSPF packet is received, the size of the payload is controlled before
contains is processed. For OSPF LSA, this allow a secure buffer allocation.
Thus, new size attribute contains the exact buffer allocation allowing a
strict control during TLV browsing.
This patch adds extra control to bound for() loop during TLV browsing to
avoid potential problem as suggested by Coverity Scan. Controls are based
on new size attribute of the ospf_lsa structure to avoid any ambiguity.
Trey Aspelund [Fri, 21 May 2021 22:04:15 +0000 (22:04 +0000)]
lib: fix handling of rmap prefix-tree default node
Prior to this commit, updating a prefix-list that is referenced by a
route-map clause will unconditionally delete the root node of that
route-map's prefix-tree (used with route-map optimization).
This is problematic because routes not matching a more specific node
in the tree (i.e. other prefix-list sequences) will not fall-back to
the default node, thus they will not hit any route-map sequences.
This commit ensures that an update to a prefix-list will only delete
the default node while adding the first/only seq to the list.
Example config:
========
ip prefix-list peer475-out-pfxlist seq 45 permit 2.138.0.0/16
ip prefix-list peer475-out-pfxlist seq 50 permit 0.0.0.0/0
!
route-map peer475-out permit 5
match ip address prefix-list peer475-out-pfxlist
Before:
========
ub20# do show route-map peer475-out prefix-table
ZEBRA:
IPv4 Prefix Route-map Index List
_______________ ____________________
0.0.0.0/0 (2)
(P)
peer475-out seq 5
2.138.0.0/16 (2)
(P) 0.0.0.0/0
peer475-out seq 5
IPv6 Prefix Route-map Index List
_______________ ____________________
BGP:
IPv4 Prefix Route-map Index List
_______________ ____________________
0.0.0.0/0 (2)
(P)
peer475-out seq 5
2.138.0.0/16 (2)
(P) 0.0.0.0/0
peer475-out seq 5
IPv6 Prefix Route-map Index List
_______________ ____________________
ub20# conf t
ub20(config)# ip prefix-list peer475-out-pfxlist seq 45 permit 2.138.0.0/16 le 32
ub20(config)# do show route-map peer475-out prefix-table
ZEBRA:
IPv4 Prefix Route-map Index List
_______________ ____________________
2.138.0.0/16 (2)
(P)
peer475-out seq 5
IPv6 Prefix Route-map Index List
_______________ ____________________
BGP:
IPv4 Prefix Route-map Index List
_______________ ____________________
2.138.0.0/16 (2)
(P)
peer475-out seq 5
IPv6 Prefix Route-map Index List
_______________ ____________________
ub20(config)#
After:
========
ub20(config)# do show route-map peer475-out prefix-table
ZEBRA:
IPv4 Prefix Route-map Index List
_______________ ____________________
0.0.0.0/0 (2)
(P)
peer475-out seq 5
2.138.0.0/16 (2)
(P) 0.0.0.0/0
peer475-out seq 5
IPv6 Prefix Route-map Index List
_______________ ____________________
BGP:
IPv4 Prefix Route-map Index List
_______________ ____________________
0.0.0.0/0 (2)
(P)
peer475-out seq 5
2.138.0.0/16 (2)
(P) 0.0.0.0/0
peer475-out seq 5
IPv6 Prefix Route-map Index List
_______________ ____________________
ub20(config)# ip prefix-list peer475-out-pfxlist seq 45 permit 2.138.0.0/16 le 32
ub20(config)# do show route-map peer475-out prefix-table
ZEBRA:
IPv4 Prefix Route-map Index List
_______________ ____________________
0.0.0.0/0 (2)
(P)
peer475-out seq 5
2.138.0.0/16 (2)
(P) 0.0.0.0/0
peer475-out seq 5
IPv6 Prefix Route-map Index List
_______________ ____________________
BGP:
IPv4 Prefix Route-map Index List
_______________ ____________________
0.0.0.0/0 (2)
(P)
peer475-out seq 5
2.138.0.0/16 (2)
(P) 0.0.0.0/0
peer475-out seq 5
IPv6 Prefix Route-map Index List
_______________ ____________________