Igor Ryzhov [Tue, 17 Nov 2020 18:21:09 +0000 (21:21 +0300)]
changelog: add debian revision
It is optional, but lintian complains when a package mixes versions with
and without revision number. All previous versions have it so 7.5 should
have it too.
Rafael Zalamena [Sun, 4 Oct 2020 21:04:27 +0000 (18:04 -0300)]
lib: notify BFD when adding new profile
When a BFD integrated session already exists setting the profile
doesn't cause a session update (or vice versa): fix this issue by
handling the other cases.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Carlo Galiotto [Fri, 13 Nov 2020 16:35:06 +0000 (17:35 +0100)]
ospfd: reset mpls-te prior to ospf router removal
This commits attempts to fix a problem that occurs when mpls-te gets
removed from ospfd config. Mpls-te has an inter-as option, which can be
set to Off/Area/AS. Whenever the inter-as takes "Area" or "AS" as a
value, this value will not be cleaned after removing mpls-te or after
removing the ospf router. Therefore, if mpls-te is configured with
inter-as AS or Area and we remove mpls-te or the ospf router, the
inter-as will still preserve its value; therefore, next time mpls-te is
enabled, it will automatically inherits the previous inter-as value
(either Area or AS). This leads to wrong configuration, which can be a
problem for frr_reload.py.
The commits forces mpls-te to reset inter-as to Off before it mpls-te
gets removed from the configuration and before the ospf router gets
removed.
Donald Sharp [Mon, 16 Nov 2020 20:12:43 +0000 (15:12 -0500)]
lib: When aborting log data
When a FRR process dies due to SIGILL/SIGABORT/etc attempt
to drain the log buffer. This code change is capturing
some missing logs that were not part of the log file on
a crash.
Donald Sharp [Sat, 14 Nov 2020 22:33:43 +0000 (17:33 -0500)]
bgpd: on debug esi was not properly setup
There exists a code path where the esi would be passed
to a debug without the esi being setup with any values
causing us to display what ever is on the stack.
Donald Sharp [Sat, 14 Nov 2020 20:32:49 +0000 (15:32 -0500)]
bgpd: Fix missed unlocks
When iterating over the bgp_dest table, using this pattern:
for (dest = bgp_table_top(table); dest;
dest = bgp_route_next(dest)) {
If the code breaks or returns in the middle we will not have
properly unlocked the node as that bgp_table_top locks the top
dest and bgp_route_next locks the next dest and unlocks the old
dest.
From code inspection I have found a bunch of places that
we either return in the middle of or a break is issued.
Michael Hohl [Wed, 11 Nov 2020 15:56:15 +0000 (16:56 +0100)]
docs: mention activate keyword in user docs
As of now, the BGP user documentation does not explicitly mention how
to use IPv6. This commit adds documentation of the activate command to
the user documentation which is crucial to get IPv6 networks announced
using FRRouting.
Pat Ruddy [Thu, 29 Oct 2020 16:38:42 +0000 (16:38 +0000)]
bgpd: withdraw any exported routes when deleting a vrf
When a BGP vrf instance is deleted, the routes it exported into the
main VPN table are not deleted and they remain as stale routes
attached to an unknown bgp instance. When the new vrf instance comes
along, it imports these routes from the main table and thus we see
duplicatesalongside its own identical routes.
The solution is to call the unexport logic when a BGP vrf instance is
being deleted.
problem example
---------------
volta1# sh bgp vrf VRF-a ipv4 unicast
BGP table version is 4, local router ID is 18.0.0.1, vrf id 5
Default local pref 100, local AS 567
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 7.0.0.6/32 7.0.0.5@0< 10 100 0 ?
*> 7.0.0.8/32 18.0.0.8 0 0 111 ?
*> 18.0.0.0/24 18.0.0.8 0 0 111 ?
*> 56.0.0.0/24 7.0.0.5@0< 0 100 0 ?
Displayed 4 routes and 4 total paths
volta1# conf t
volta1(config)# no router bgp 567 vrf VRF-a
volta1(config)#
volta1(config)# router bgp 567 vrf VRF-a
volta1(config-router)# bgp router-id 18.0.0.1
volta1(config-router)# no bgp ebgp-requires-policy
volta1(config-router)# no bgp network import-check
volta1(config-router)# neighbor 18.0.0.8 remote-as 111
volta1(config-router)# !
volta1(config-router)# address-family ipv4 unicast
volta1(config-router-af)# label vpn export 12345
volta1(config-router-af)# rd vpn export 567:111
volta1(config-router-af)# rt vpn both 567:100
volta1(config-router-af)# export vpn
volta1(config-router-af)# import vpn
volta1(config-router-af)# exit-address-family
volta1(config-router)# !
volta1(config-router)# end
volta1# sh bgp vrf VRF-a ipv4 unicast
BGP table version is 4, local router ID is 18.0.0.1, vrf id 5
Default local pref 100, local AS 567
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 7.0.0.6/32 7.0.0.5@0< 10 100 0 ?
* 7.0.0.8/32 18.0.0.8 0 0 111 ?
*> 18.0.0.8@-< 0 0 111 ?
* 18.0.0.0/24 18.0.0.8 0 0 111 ?
*> 18.0.0.8@-< 0 0 111 ?
*> 56.0.0.0/24 7.0.0.5@0< 0 100 0 ?
Displayed 4 routes and 6 total paths
@- routes indicating unknown bgp instance are imported
Mark Stapp [Tue, 10 Nov 2020 14:50:50 +0000 (09:50 -0500)]
tests: only test count of nexthops in bgp max-paths test
Add support to compare the number of RIB nexthops, rather than the
specific nexthop addresses. Use this in the bgp_ecmp topotests that
test maximum-paths - testing the specific nexthops is wrong there,
it's not deterministic and we get spurious failures.
Soman K S [Sun, 4 Oct 2020 16:30:07 +0000 (22:00 +0530)]
ospf6d : Intra area route for connected prefix not installed
Issue: When ospfv3 is configured on interface between routers in different network,
the intra area route for the remote connected prefix is not installed in ospf
route table and zebra
Fix: When the advertising router is directly connected but in different network
the interface lookup in the intra area lsa processing does not provide
the matching interface and valid nexthop. Therefore the nexthop is
copied from the link state entry which contains valid
ifindex required for installing the route.
Donald Sharp [Tue, 3 Nov 2020 20:24:03 +0000 (15:24 -0500)]
bgpd: Allow 1 prefix to generate statistics
When generating a config with 1 prefix:
BGP IPv4 Unicast RIB statistics
Total Advertisements : 0
Total Prefixes : 0
Average prefix length : 0.00
Unaggregateable prefixes : 0
Maximum aggregateable prefixes: 0
BGP Aggregate advertisements : 0
Address space advertised : 0
% announced : 0.00
/8 equivalent : 0.00
/24 equivalent : 0.00
Advertisements with paths : 0
Longest AS-Path (hops) : 0
Average AS-Path length (hops) : 0.00
Largest AS-Path (bytes) : 0
Average AS-Path size (bytes) : 0.00
Highest public ASN : 0
eva# show bgp ipv4 uni summ
BGP router identifier 10.10.3.11, local AS number 329 vrf-id 0
BGP table version 1
RIB entries 1, using 192 bytes of memory
Peers 1, using 23 KiB of memory
We are not displaying it in the statistics data. This is because FRR is walking the associated
table and comparing the current dest to the top of the tree. I have no idea why this is
the case as that when you have 1 prefix you only have 1 node in your tree. Looking at the
code this is the original code that was imported in 2006. I cannot think of any reason why
FRR needs to exclude this particular node.
Fixed:
eva# show bgp ipv4 uni summ
BGP router identifier 10.10.3.11, local AS number 329 vrf-id 0
BGP table version 1
RIB entries 1, using 192 bytes of memory
Peers 1, using 23 KiB of memory
Donald Sharp [Mon, 2 Nov 2020 15:14:48 +0000 (10:14 -0500)]
bgpd: Multipath is always being allocated
The multipath arrays are always being allocated, irrelevant
if we actually have multipath information for a prefix.
This is because the link bandwidth code was always adding the
data structure. We should not be allocated multipath information
unless we actually have multipath information
isisd: fix segfault in the circuit p2p/bcast union
The fields in the broadcast/p2p union struct in an isis circuit are
initialized when the circuit goes up, but currently this step is
skipped if the interface is passive. This can create problems if the
circuit type (referred to as network type in the config) changes from
broadcast to point-to-point. We can end up with the p2p neighbor
pointer pointing at some garbage left by the broadcast struct in the
union, which would then cause a segfault the first time we would
dereference it - for example when building the lsp, or computing the
SPF tree.
compressed backtrace of a possible crash:
#0 0x0000555555579a9c in lsp_build at frr/isisd/isis_lsp.c:1114
#1 0x000055555557a516 in lsp_regenerate at frr/isisd/isis_lsp.c:1301
#2 0x000055555557aa25 in lsp_refresh at frr/isisd/isis_lsp.c:1381
#3 0x00007ffff7b2622c in thread_call at frr/lib/thread.c:1549
#4 0x00007ffff7ad6df4 in frr_run at frr/lib/libfrr.c:1098
#5 0x000055555556b67f in main at frr/isisd/isis_main.c:272
isis_lsp.c:
1112 case CIRCUIT_T_P2P: {
1113 struct isis_adjacency *nei = circuit->u.p2p.neighbor;
1114 if (nei && nei->adj_state == ISIS_ADJ_UP
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
Donald Sharp [Fri, 13 Nov 2020 17:06:57 +0000 (12:06 -0500)]
zebra: Allow `set src X` to work on startup
If a route-map in zebra has `set src X` and the interface
X is on has not been configured yet, we are rejecting the command
outright. This is a problem on boot up especially( and where I
found this issue ) in that interfaces *can* and *will* be slow
on startup and config can easily be read in *before* the
interface has an ip address.
Let's modify zebra to just warn to the user we may have a problem
and let the chips fall where they may.
Martin Winter [Tue, 3 Nov 2020 22:47:40 +0000 (23:47 +0100)]
FRRouting Release 7.5
BFD
Profile support
Minimum ttl support
BGP
rpki VRF support
GR fixes
Add wide option to display of routes
Add `maximum-prefix <num> force`
Add `bestpath-routes` to neighbor command
Add `bgp shutdown message MSG...` command
Add v6 Flowspec support
Add `neighbor <neigh> shutdown rtt` command
Allow update-delay to be applied globaly
EVPN
Beginning of MultiHoming Support
ISIS
Segment Routing Support
VRF Support
Guard against adj timer display overflow
Add support for Anycast-SIDs
Add support for Topology Independent LFA (TI-LFA)
Add `lsp-gen-interval 2` to isis configuration
OSPF
Segment Routing support for ECMP
Various LSA fixes
Prevent crash if transferring config amongst instances
PBR
Adding json support to commands
DSCP/ECN based PBR Matching
PIM
Add more json support to commands
Fix missing mesh-group commands
MSDP SA forwarding
Clear (s,g,rpt) ifchannel on (*, G) prune received
Fix igmp querier election and IP address mapping
Crash fix when RP is removed
STATIC
Northbound Support
YANG
Filter and route-map Support
OSPF model definition
BGP model definition
VTYSH
Speed up output across daemons
Fix build-time errors for some --enable flags
Speed up output of configuration across daemons
ZEBRA
nexthop group support for FPM
northbound support for rib model
Backup nexthop support
netlink batching support
Allow upper level protocols to request ARP
Add json output for zebra ES, ES-EVI and access vlan dumps
Upgrade to using libyang1.0.184
RPM
Moved RPKI to subpackage
Added SNMP subpackage
As always there are too many bugfixes to list individually. This release
compromises just over 1k of commits by the community, with contributors from
70 people.
Signed-off-by: Martin Winter <mwinter@opensourcerouting.org>
Igor Ryzhov [Tue, 13 Oct 2020 22:46:27 +0000 (01:46 +0300)]
ospfd: don't initialize ospf every time "router ospf" is used
Move ospf initialization to the actual place where it is created.
We don't need to do that every time "router ospf" is entered.
Also remove a couple of useless checks that can never be true.
Donald Sharp [Tue, 27 Oct 2020 13:59:10 +0000 (09:59 -0400)]
isisd: Fix usage of uninited memory
valgrind is showing a usage of uninited memory:
==935465== Conditional jump or move depends on uninitialised value(s)
==935465== at 0x159E17: tlvs_area_addresses_to_adj (isis_tlvs.c:4430)
==935465== by 0x15A4BD: isis_tlvs_to_adj (isis_tlvs.c:4568)
==935465== by 0x1377F0: process_p2p_hello (isis_pdu.c:203)
==935465== by 0x1391FD: process_hello (isis_pdu.c:781)
==935465== by 0x13BDBE: isis_handle_pdu (isis_pdu.c:1700)
==935465== by 0x13BECD: isis_receive (isis_pdu.c:1744)
==935465== by 0x49210FF: thread_call (thread.c:1585)
==935465== by 0x48CFACB: frr_run (libfrr.c:1099)
==935465== by 0x1218C9: main (isis_main.c:272)
==935465==
==935465== Conditional jump or move depends on uninitialised value(s)
==935465== at 0x483EEC5: bcmp (vg_replace_strmem.c:1111)
==935465== by 0x15A290: tlvs_ipv4_addresses_to_adj (isis_tlvs.c:4512)
==935465== by 0x15A4EB: isis_tlvs_to_adj (isis_tlvs.c:4570)
==935465== by 0x1377F0: process_p2p_hello (isis_pdu.c:203)
==935465== by 0x1391FD: process_hello (isis_pdu.c:781)
==935465== by 0x13BDBE: isis_handle_pdu (isis_pdu.c:1700)
==935465== by 0x13BECD: isis_receive (isis_pdu.c:1744)
==935465== by 0x49210FF: thread_call (thread.c:1585)
==935465== by 0x48CFACB: frr_run (libfrr.c:1099)
==935465== by 0x1218C9: main (isis_main.c:272)
Effectively we are reallocing memory to hold data. realloc does not
set the new memory to anything. So whatever happens to be in the memory
is what is there. after the realloc happens we are iterating over the
memory just realloced and doing memcmp's to values in it causing these
use of uninitialized memory.
Donald Sharp [Tue, 27 Oct 2020 19:16:32 +0000 (15:16 -0400)]
bgpd: Prevent ecomm memory leak
There are some situations where we create a ecommunity for
comparing to internal state when we are deleting, but in the
failure cases we would not free up the created memory.
Donald Sharp [Tue, 27 Oct 2020 16:40:46 +0000 (12:40 -0400)]
isisd: Fix memory leak in copy_tlv_router_cap
There exists a code path where we would allocate memory
then test a variable and then immediately return NULL.
Prevent memory from leaking in this situation.
Having an NSSA ABR redistributing statics, the type-7 LSA are being
continuously refreshed (every ~14 secs). The LSA Seq number keeps
incrementing and the LSA age is going back to 0 when reaching ~14s.
This PR fixes the issue by not forcing the LSA update
However I ignore if the "force" parameter was used in purpose. With this
PR updates are sent in case the metric or metric type are changed
Sep 24 08:54:48 r2 ospfd[7137]: ospf_flood_through: LOCAL NSSA FLOOD of Type-7.
Sep 24 08:55:02 r2 ospfd[7137]: ospf_flood_through: LOCAL NSSA FLOOD of Type-7.
Sep 24 08:55:16 r2 ospfd[7137]: ospf_flood_through: LOCAL NSSA FLOOD of Type-7.
Sep 24 08:55:30 r2 ospfd[7137]: ospf_flood_through: LOCAL NSSA FLOOD of Type-7.
Sep 24 08:55:44 r2 ospfd[7137]: ospf_flood_through: LOCAL NSSA FLOOD of Type-7.
Sep 24 08:55:58 r2 ospfd[7137]: ospf_flood_through: LOCAL NSSA FLOOD of Type-7.
Sep 24 08:56:12 r2 ospfd[7137]: ospf_flood_through: LOCAL NSSA FLOOD of Type-7.
Sep 24 08:56:26 r2 ospfd[7137]: ospf_flood_through: LOCAL NSSA FLOOD of Type-7.
Sep 24 08:56:40 r2 ospfd[7137]: ospf_flood_through: LOCAL NSSA FLOOD of Type-7.
ip route 2.2.2.2/32 blackhole
router ospf
network 10.0.23.0/24 area 1
area 1 nssa
!
r2# conf t
r2(config)# router ospf
r2(config-router)# redistribute static
r2# sh ip os da
NSSA-external Link States (Area 0.0.0.1 [NSSA])
Link ID ADV Router Age Seq# CkSum Route
2.2.2.2 10.0.25.2 13 0x8000000f 0x3f17 E2 2.2.2.2/32 [0x0] <<< Seq: f, age 13
r2# sh ip os da
NSSA-external Link States (Area 0.0.0.1 [NSSA])
Link ID ADV Router Age Seq# CkSum Route
2.2.2.2 10.0.25.2 0 0x80000010 0x3d18 E2 2.2.2.2/32 [0x0] <<< Seq: 10, age 0
r2# sh ip os da
NSSA-external Link States (Area 0.0.0.1 [NSSA])
Link ID ADV Router Age Seq# CkSum Route
2.2.2.2 10.0.25.2 3 0x8000001b 0x2723 E2 2.2.2.2/32 [0x0] <<< Seq: 1b, age 3
Soman K S [Sun, 18 Oct 2020 11:49:32 +0000 (17:19 +0530)]
ospfd: External LSA not flushed when area is configured as nssa or stub
Issue:
When the ospf area is changed from default to nssa or stub, the previously
advertised external LSAs are not removed from the neighbor.
The LSAs remain in database till maxage timeout.
Fix:
Advertise the external LSAs with age set to maxage and flood to the
nssa or stub area.
Chirag Shah [Tue, 27 Oct 2020 05:18:46 +0000 (22:18 -0700)]
bgpd: fix mem leak in router bgp import vrf check
==916511== 18 bytes in 2 blocks are definitely lost in loss record 7 of 147
==916511== at 0x483877F: malloc (vg_replace_malloc.c:307)
==916511== by 0x4BE0F0A: strdup (strdup.c:42)
==916511== by 0x48D66CE: qstrdup (memory.c:122)
==916511== by 0x1E6E31: bgp_vpn_leak_export (bgp_mplsvpn.c:2690)
==916511== by 0x28E892: bgp_router_create (bgp_nb_config.c:124)
==916511== by 0x48E05AB: nb_callback_create (northbound.c:869)
==916511== by 0x48E0FA2: nb_callback_configuration (northbound.c:1183)
==916511== by 0x48E13D0: nb_transaction_process (northbound.c:1308)
==916511== by 0x48E0137: nb_candidate_commit_apply (northbound.c:741)
==916511== by 0x48E024B: nb_candidate_commit (northbound.c:773)
==916511== by 0x48E6B21: nb_cli_classic_commit (northbound_cli.c:64)
==916511== by 0x48E757E: nb_cli_apply_changes (northbound_cli.c:281)
Don Slice [Wed, 21 Oct 2020 14:46:49 +0000 (07:46 -0700)]
bgpd: delay local routes until update-delay is over
Problem found that turning an update-delay would only delay prefixes
learned from peers by delaying bestpath, but would allow local routes
(network statements or redistributed) to be immediately advertised,
followed by an End of Rib indicator. This fix delays sending local
routes until the update-delay process is completed, which matches
what testing shows other vendors do..
Ticket: CM-31743 Signed-off-by: Don Slice <dslice@nvidia.com>
This problem was accidentally introduced as a part of another fixup -
[
commit e378f5020d1af1aee587659e5602ee8c07eb05f4 (anuradhak/mh-misc-fixes, mh-misc-fixes)
Author: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Date: Tue Sep 15 16:50:14 2020 -0700
zebra: fix use of freed es during zebra shutdown
]
zif->es_info.es is cleared as a part of zebra_evpn_es_local_info_clear so it
cannot be passed around as a pointer from zebra_evpn_local_es_update/del.
Because of this bug removing ES from an interface resulted in
a zebra crash.
==1263169== 304 (40 direct, 264 indirect) bytes in 1 blocks are definitely lost in loss record 61 of 78
==1263169== at 0x483AB65: calloc (vg_replace_malloc.c:760)
==1263169== by 0x48DD878: qcalloc (memory.c:110)
==1263169== by 0x5116D5: bgp_table_init (bgp_table.c:110)
==1263169== by 0x4EB5C4: bgp_static_set_safi (bgp_route.c:5927)
==1263169== by 0x4C3382: vpnv4_network (bgp_mplsvpn.c:1911)
==1263169== by 0x489FBEC: cmd_execute_command_real (command.c:916)
==1263169== by 0x489F7CB: cmd_execute_command (command.c:976)
==1263169== by 0x489FD04: cmd_execute (command.c:1138)
==1263169== by 0x493AF73: vty_command (vty.c:517)
==1263169== by 0x493AA07: vty_execute (vty.c:1282)
==1263169== by 0x4939B54: vtysh_read (vty.c:2115)
==1263169== by 0x492E63C: thread_call (thread.c:1585)
The bgp_static_delete function was not unlocking the right bgp_dest. This
problem goes away after fixing this.
Donald Sharp [Fri, 23 Oct 2020 00:51:24 +0000 (20:51 -0400)]
isisd: Fix memory leak on shutdown
==935465== 40 bytes in 1 blocks are definitely lost in loss record 71 of 546
==935465== at 0x483AB65: calloc (vg_replace_malloc.c:760)
==935465== by 0x48D6611: qcalloc (memory.c:110)
==935465== by 0x48CFE02: list_new (linklist.c:32)
==935465== by 0x15DBF0: isis_new (isisd.c:213)
==935465== by 0x15DAC4: isis_global_instance_create (isisd.c:179)
==935465== by 0x121892: main (isis_main.c:264)
==935465== 64 (40 direct, 24 indirect) bytes in 1 blocks are definitely lost in loss record 101 of 546
==935465== at 0x483AB65: calloc (vg_replace_malloc.c:760)
==935465== by 0x48D6611: qcalloc (memory.c:110)
==935465== by 0x48CFE02: list_new (linklist.c:32)
==935465== by 0x15DBE3: isis_new (isisd.c:212)
==935465== by 0x15DAC4: isis_global_instance_create (isisd.c:179)
==935465== by 0x121892: main (isis_main.c:264)
On isis shutdown we are seeing the above memory leaks. Modify
the code to start cleaning this up.
Stephen Worley [Fri, 23 Oct 2020 18:57:29 +0000 (14:57 -0400)]
zebra: fix unitialized msg header reading at startup
Fixes the valgrind error we were seeing on startup due to
initializing the msg header struct:
```
==2534283== Thread 3 zebra_dplane:
==2534283== Syscall param recvmsg(msg) points to uninitialised byte(s)
==2534283== at 0x4D616DD: recvmsg (in /usr/lib64/libpthread-2.31.so)
==2534283== by 0x43107C: netlink_recv_msg (kernel_netlink.c:744)
==2534283== by 0x4330E4: nl_batch_read_resp (kernel_netlink.c:1070)
==2534283== by 0x431D12: nl_batch_send (kernel_netlink.c:1201)
==2534283== by 0x431E8B: kernel_update_multi (kernel_netlink.c:1369)
==2534283== by 0x46019B: kernel_dplane_process_func (zebra_dplane.c:3979)
==2534283== by 0x45EB7F: dplane_thread_loop (zebra_dplane.c:4368)
==2534283== by 0x493F5CC: thread_call (thread.c:1585)
==2534283== by 0x48D3450: fpt_run (frr_pthread.c:303)
==2534283== by 0x48D3D41: frr_pthread_inner (frr_pthread.c:156)
==2534283== by 0x4D56431: start_thread (in /usr/lib64/libpthread-2.31.so)
==2534283== by 0x4E709D2: clone (in /usr/lib64/libc-2.31.so)
==2534283== Address 0x85cd850 is on thread 3's stack
==2534283== in frame #2, created by nl_batch_read_resp (kernel_netlink.c:1051)
==2534283==
==2534283== Syscall param recvmsg(msg.msg_control) points to unaddressable byte(s)
==2534283== at 0x4D616DD: recvmsg (in /usr/lib64/libpthread-2.31.so)
==2534283== by 0x43107C: netlink_recv_msg (kernel_netlink.c:744)
==2534283== by 0x4330E4: nl_batch_read_resp (kernel_netlink.c:1070)
==2534283== by 0x431D12: nl_batch_send (kernel_netlink.c:1201)
==2534283== by 0x431E8B: kernel_update_multi (kernel_netlink.c:1369)
==2534283== by 0x46019B: kernel_dplane_process_func (zebra_dplane.c:3979)
==2534283== by 0x45EB7F: dplane_thread_loop (zebra_dplane.c:4368)
==2534283== by 0x493F5CC: thread_call (thread.c:1585)
==2534283== by 0x48D3450: fpt_run (frr_pthread.c:303)
==2534283== by 0x48D3D41: frr_pthread_inner (frr_pthread.c:156)
==2534283== by 0x4D56431: start_thread (in /usr/lib64/libpthread-2.31.so)
==2534283== by 0x4E709D2: clone (in /usr/lib64/libc-2.31.so)
==2534283== Address 0xa0 is not stack'd, malloc'd or (recently) free'd
==2534283==
```
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Donald Sharp [Thu, 22 Oct 2020 12:02:33 +0000 (08:02 -0400)]
zebra: Do not delete nhg's when retain_mode is engaged
When `-r` is specified to zebra, on shutdown we should
not remove any routes from the fib. This was a problem
with nhg's on shutdown due to their ref-count behavior.
Introduce a methodology where on shutdown we don't mess
with the nexthop groups in the kernel. That way on
next startup things will be ok.
When the ASBR stops announcing a prefix into the NSSA area, the LSA
type 7 is removed from the area. However the ABR is refreshing the
type 5 in its LSDB while removing the Type 7 LSA. Routers outside
the area do not get an update.
With the following topology: r1---r2---r3, with r3 being the ASBR
announcing type 7 LSA:
r3 configuration
router ospf
redistribute static
network 10.0.23.0/24 area 1
area 1 nssa
!
We stop announcing prefix 3.3.3.3 in the ASBR
r3# conf
r3(config)# router ospf
r3(config-router)# no redistribute static
r3(config-router)#
r2 (ABR)
r2# sh ip os database
NSSA-external Link States (Area 0.0.0.1 [NSSA])
Link ID ADV Router Age Seq# CkSum Route
3.3.3.3 33.33.33.33 3600 0x8000002f 0x13be E2 3.3.3.3/32 [0x0] <-- flushed
AS External Link States
Link ID ADV Router Age Seq# CkSum Route
3.3.3.3 10.0.25.2 7 0x8000002f 0x73c7 E2 3.3.3.3/32 [0x0] <-- refreshed(?)
With PR#7086 the LSA type 5 is flushed from the LSDB in r2 and the change is
announced to routers outside the area (r1)
r2# sh ip os da
NSSA-external Link States (Area 0.0.0.1 [NSSA])
Link ID ADV Router Age Seq# CkSum Route
3.3.3.3 33.33.33.33 3600 0x80000002 0x6d91 E2 3.3.3.3/32 [0x0] <-- flushed
AS External Link States
Link ID ADV Router Age Seq# CkSum Route
3.3.3.3 10.0.25.2 3600 0x80000002 0xcd9a E2 3.3.3.3/32 [0x0] <-- flushed
r1# sh ip os da
AS External Link States
Link ID ADV Router Age Seq# CkSum Route
3.3.3.3 10.0.25.2 3600 0x80000002 0xcd9a E2 3.3.3.3/32 [0x0] <-- flushed
Unfortunately I just realized that with PR#7086 I'm introducing a new bug, as Type-5 LSA
are not being refreshed when reaching MaxAge
r2# sh ip os da
NSSA-external Link States (Area 0.0.0.1 [NSSA])
Link ID ADV Router Age Seq# CkSum Route
3.3.3.3 33.33.33.33 35 0x80000002 0x6d91 E2 3.3.3.3/32 [0x0] <--- refreshed
AS External Link States
Link ID ADV Router Age Seq# CkSum Route
3.3.3.3 10.0.25.2 3600 0x80000002 0xcd9a E2 3.3.3.3/32 [0x0] <--- not refreshed!
So this PR should fix the original issue and the bug introduced later, so when stopping
redistribution in the ASBR, both type 5 and type 7 are flushed:
r2# sh ip os da
NSSA-external Link States (Area 0.0.0.1 [NSSA])
Link ID ADV Router Age Seq# CkSum Route
3.3.3.3 33.33.33.33 3600 0x80000002 0x6d91 E2 3.3.3.3/32 [0x0]
AS External Link States
Link ID ADV Router Age Seq# CkSum Route
3.3.3.3 10.0.25.2 3600 0x80000002 0xcd9a E2 3.3.3.3/32 [0x0]
Routers outside the area are also notified
r1# sh ip os da
Link ID ADV Router Age Seq# CkSum Route
3.3.3.3 10.0.25.2 3600 0x80000002 0xcd9a E2 3.3.3.3/32 [0x0]
Re-enabling redistribution, both LSA will be advertised again
Igor Ryzhov [Wed, 14 Oct 2020 20:01:49 +0000 (23:01 +0300)]
isisd: fix check for area-tag modification
Interface area-tag is not supposed to be modified once defined, but the
necessary check is currently broken, because the circuit is never in
init_circ_list if the area-tag is already configured for the interface.
Donald Sharp [Tue, 13 Oct 2020 12:16:15 +0000 (08:16 -0400)]
ospfd: Prevent crash if transferring config amongst instances
If we enter:
int eth0
ip ospf area 0
ip ospf 10 area 0
!
This will crash ospf. Prevent this from happening.
OSPF instances:
a) Cannot be mixed with non-instance
b) Are their own process.
Since in multi-instance world ospf instances are their own process,
when an ospf processes receives an instance command we must remove
our config( if present ) and allow the new config to be active
in the new process. The problem here is that if you have not
done a `router ospf` above the lookup of the ospf pointer will
fail and we will just crash. Put some code in to prevent a crash
in this case.
Igor Ryzhov [Tue, 13 Oct 2020 11:03:42 +0000 (14:03 +0300)]
ospfd: fix "no ip ospf area"
This commit fixes the following behavior:
```
nfware(config)# interface enp2s0
nfware(config-if)# ip ospf area 0
nfware(config-if)# no ip ospf area 0
% [ospfd]: command ignored as it targets an instance that is not running
```
We should be able to use the command without configuring the instance.
Igor Ryzhov [Tue, 20 Oct 2020 19:43:31 +0000 (22:43 +0300)]
ospf6d: fix crash on message receive
OSPF6 daemon starts listening on its socket and reading messages right
after the initialization before the ospf6 router is created. If any
message is received, ospf6d crashes because ospf6_receive doesn't
NULL-check ospf6 pointer.
Fix this by opening the socket and reading messages only after the
creation of ospf6 router.
Mark Stapp [Fri, 16 Oct 2020 21:37:09 +0000 (17:37 -0400)]
zebra: support multiple connected subnets on an interface
[7.5 version]
We support configuration of multiple addresses in the same
subnet on a single interface: make sure that zebra supports
multiple instances of the corresponding connected route.
Donald Sharp [Fri, 16 Oct 2020 17:51:52 +0000 (13:51 -0400)]
zebra: Fix use after free in debug path
When zebra is running with debugs turned on there
is a use after free reported by the address sanitizer:
2020/10/16 12:58:02 ZEBRA: rib_delnode: (0:254):4.5.6.16/32: rn 0x60b000026f20, re 0x6080000131a0, removing
2020/10/16 12:58:02 ZEBRA: rib_meta_queue_add: (0:254):4.5.6.16/32: queued rn 0x60b000026f20 into sub-queue 3
=================================================================
==3101430==ERROR: AddressSanitizer: heap-use-after-free on address 0x608000011d28 at pc 0x555555705ab6 bp 0x7fffffffdab0 sp 0x7fffffffdaa8
READ of size 8 at 0x608000011d28 thread T0
#0 0x555555705ab5 in re_list_const_first zebra/rib.h:222
#1 0x555555705b54 in re_list_first zebra/rib.h:222
#2 0x555555711a4f in process_subq_route zebra/zebra_rib.c:2248
#3 0x555555711d2e in process_subq zebra/zebra_rib.c:2286
#4 0x555555711ec7 in meta_queue_process zebra/zebra_rib.c:2320
#5 0x7ffff74701f7 in work_queue_run lib/workqueue.c:291
#6 0x7ffff7450e9c in thread_call lib/thread.c:1581
#7 0x7ffff738eaf7 in frr_run lib/libfrr.c:1099
#8 0x55555561a578 in main zebra/main.c:455
#9 0x7ffff7079cc9 in __libc_start_main ../csu/libc-start.c:308
#10 0x5555555e3429 in _start (/usr/lib/frr/zebra+0x8f429)
0x608000011d28 is located 8 bytes inside of 88-byte region [0x608000011d20,0x608000011d78)
freed by thread T0 here:
#0 0x7ffff768bb6f in __interceptor_free (/lib/x86_64-linux-gnu/libasan.so.6+0xa9b6f)
#1 0x7ffff739ccad in qfree lib/memory.c:129
#2 0x555555709ee4 in rib_gc_dest zebra/zebra_rib.c:746
#3 0x55555570ca76 in rib_process zebra/zebra_rib.c:1240
#4 0x555555711a05 in process_subq_route zebra/zebra_rib.c:2245
#5 0x555555711d2e in process_subq zebra/zebra_rib.c:2286
#6 0x555555711ec7 in meta_queue_process zebra/zebra_rib.c:2320
#7 0x7ffff74701f7 in work_queue_run lib/workqueue.c:291
#8 0x7ffff7450e9c in thread_call lib/thread.c:1581
#9 0x7ffff738eaf7 in frr_run lib/libfrr.c:1099
#10 0x55555561a578 in main zebra/main.c:455
#11 0x7ffff7079cc9 in __libc_start_main ../csu/libc-start.c:308
previously allocated by thread T0 here:
#0 0x7ffff768c037 in calloc (/lib/x86_64-linux-gnu/libasan.so.6+0xaa037)
#1 0x7ffff739cb98 in qcalloc lib/memory.c:110
#2 0x555555712ace in zebra_rib_create_dest zebra/zebra_rib.c:2515
#3 0x555555712c6c in rib_link zebra/zebra_rib.c:2576
#4 0x555555712faa in rib_addnode zebra/zebra_rib.c:2607
#5 0x555555715bf0 in rib_add_multipath_nhe zebra/zebra_rib.c:3012
#6 0x555555715f56 in rib_add_multipath zebra/zebra_rib.c:3049
#7 0x55555571788b in rib_add zebra/zebra_rib.c:3327
#8 0x5555555e584a in connected_up zebra/connected.c:254
#9 0x5555555e42ff in connected_announce zebra/connected.c:94
#10 0x5555555e4fd3 in connected_update zebra/connected.c:195
#11 0x5555555e61ad in connected_add_ipv4 zebra/connected.c:340
#12 0x5555555f26f5 in netlink_interface_addr zebra/if_netlink.c:1213
#13 0x55555560f756 in netlink_information_fetch zebra/kernel_netlink.c:350
#14 0x555555612e49 in netlink_parse_info zebra/kernel_netlink.c:941
#15 0x55555560f9f1 in kernel_read zebra/kernel_netlink.c:402
#16 0x7ffff7450e9c in thread_call lib/thread.c:1581
#17 0x7ffff738eaf7 in frr_run lib/libfrr.c:1099
#18 0x55555561a578 in main zebra/main.c:455
#19 0x7ffff7079cc9 in __libc_start_main ../csu/libc-start.c:308
SUMMARY: AddressSanitizer: heap-use-after-free zebra/rib.h:222 in re_list_const_first
This is happening because we are using the dest pointer after a call into
rib_gc_dest. In process_subq_route, we call rib_process() and if the
dest is deleted dest pointer is now garbage. We must reload the
dest pointer in this case.