Diffstat (limited to 'doc/user')
-rw-r--r--  doc/user/basic.rst          17
-rw-r--r--  doc/user/bfd.rst            29
-rw-r--r--  doc/user/bgp.rst            39
-rw-r--r--  doc/user/conf.py             3
-rw-r--r--  doc/user/ipv6.rst           11
-rw-r--r--  doc/user/isisd.rst           6
-rw-r--r--  doc/user/overview.rst        2
-rw-r--r--  doc/user/pim.rst             5
-rw-r--r--  doc/user/setup.rst          16
-rw-r--r--  doc/user/subdir.am           1
-rw-r--r--  doc/user/wecmp_linkbw.rst  298
-rw-r--r--  doc/user/zebra.rst         168
12 files changed, 547 insertions, 48 deletions
diff --git a/doc/user/basic.rst b/doc/user/basic.rst
index edcfce45ad..5b7786de18 100644
--- a/doc/user/basic.rst
+++ b/doc/user/basic.rst
@@ -86,6 +86,15 @@ Basic Config Commands
    debugging. Note that the existing code logs its most important messages
    with severity ``errors``.
 
+   .. warning::
+
+      FRRouting uses the ``writev()`` system call to write log messages. This
+      call is supposed to be atomic, but in reality this does not hold for
+      pipes or terminals, only regular files. This means that in rare cases,
+      concurrent log messages from distinct threads may get jumbled in
+      terminal output. Use a log file and ``tail -f`` if this rare chance is
+      unacceptable to your setup.
+
 .. index::
    single: no log file [FILENAME [LEVEL]]
    single: log file FILENAME [LEVEL]
@@ -104,14 +113,6 @@ Basic Config Commands
    deprecated ``log trap`` command) will be used. The ``no`` form of the
    command disables logging to a file.
 
-   .. note::
-
-      If you do not configure any file logging, and a daemon crashes due to a
-      signal or an assertion failure, it will attempt to save the crash
-      information in a file named :file:`/var/tmp/frr.<daemon name>.crashlog`.
-      For security reasons, this will not happen if the file exists already, so
-      it is important to delete the file after reporting the crash information.
-
 .. index::
    single: no log syslog [LEVEL]
    single: log syslog [LEVEL]
diff --git a/doc/user/bfd.rst b/doc/user/bfd.rst
index e6a3c4977a..32397d1303 100644
--- a/doc/user/bfd.rst
+++ b/doc/user/bfd.rst
@@ -476,13 +476,36 @@ You can also clear packet counters per session with the following commands, only
    Session down events: 0
    Zebra notifications: 4
 
-Logging / debugging
-===================
+Debugging
+=========
 
-There are no fine grained debug controls for bfdd. Just enable debug logs.
+By default only informational, warning and error messages are going to be
+displayed. If you want to get debug messages and other diagnostics then make
+sure you have `debugging` level enabled:
 
 ::
 
    config
    log file /var/log/frr/frr.log debugging
    log syslog debugging
+
+You may also fine tune the debug messages by selecting one or more of the
+debug levels:
+
+.. index:: [no] debug bfd network
+.. clicmd:: [no] debug bfd network
+
+   Toggle network events: show messages about socket failures and unexpected
+   BFD messages that may not belong to registered peers.
+
+.. index:: [no] debug bfd peer
+.. clicmd:: [no] debug bfd peer
+
+   Toggle peer event log messages: show messages about peer creation/removal
+   and state changes.
+
+.. index:: [no] debug bfd zebra
+.. clicmd:: [no] debug bfd zebra
+
+   Toggle zebra message events: show messages about interfaces, local
+   addresses, VRF and daemon peer registrations.
diff --git a/doc/user/bgp.rst b/doc/user/bgp.rst
index 85ccc277a8..eb718007e8 100644
--- a/doc/user/bgp.rst
+++ b/doc/user/bgp.rst
@@ -414,7 +414,11 @@ Require policy on EBGP
 .. index:: [no] bgp ebgp-requires-policy
 .. clicmd:: [no] bgp ebgp-requires-policy
 
-   This command requires incoming and outgoing filters to be applied for eBGP sessions. Without the incoming filter, no routes will be accepted. Without the outgoing filter, no routes will be announced.
+   This command requires incoming and outgoing filters to be applied
+   for eBGP sessions. Without the incoming filter, no routes will be
+   accepted. Without the outgoing filter, no routes will be announced.
+
+   This is enabled by default.
 
 Reject routes with AS_SET or AS_CONFED_SET types
 ------------------------------------------------
@@ -1997,6 +2001,18 @@ BGP Extended Communities in Route Map
 
    This command set Site of Origin value.
 
+.. index:: set extcommunity bandwidth <(1-25600) | cumulative | num-multipaths> [non-transitive]
+.. clicmd:: set extcommunity bandwidth <(1-25600) | cumulative | num-multipaths> [non-transitive]
+
+   This command sets the BGP link-bandwidth extended community for the prefix
+   (best path) for which it is applied. The link-bandwidth can be specified as
+   an ``explicit value`` (specified in Mbps), or the router can be told to use
+   the ``cumulative bandwidth`` of all multipaths for the prefix or to compute
+   it based on the ``number of multipaths``. The link bandwidth extended
+   community is encoded as ``transitive`` unless the set command explicitly
+   configures it as ``non-transitive``.
+
+.. seealso:: :ref:`wecmp_linkbw`
 
 Note that the extended expanded community is only used for `match` rule, not for
 `set` actions.
@@ -2634,7 +2650,14 @@ structure is extended with :clicmd:`show bgp [afi] [safi]`.
 
    These commands display BGP routes for the specific routing table indicated by
    the selected afi and the selected safi. If no afi and no safi value is given,
-   the command falls back to the default IPv6 routing table
+   the command falls back to the default IPv6 routing table.
+   For EVPN prefixes, you can display the full BGP table for this AFI/SAFI
+   using the standard `show bgp [afi] [safi]` syntax.
+
+.. index:: show bgp l2vpn evpn route [type <macip|2|multicast|3|es|4|prefix|5>]
+.. clicmd:: show bgp l2vpn evpn route [type <macip|2|multicast|3|es|4|prefix|5>]
+
+   Additionally, you can also filter this output by route type.
 
 .. index:: show bgp [afi] [safi] summary
 .. clicmd:: show bgp [afi] [safi] summary
@@ -2665,6 +2688,16 @@ structure is extended with :clicmd:`show bgp [afi] [safi]`.
 
    Display flap statistics of routes of the selected afi and safi selected.
 
+.. index:: show bgp [afi] [safi] statistics
+.. clicmd:: show bgp [afi] [safi] statistics
+
+   Display statistics of routes of the selected afi and safi.
+
+.. index:: show bgp statistics-all
+.. clicmd:: show bgp statistics-all
+
+   Display statistics of routes of all the afi and safi.
+
 .. _bgp-display-routes-by-community:
 
 Displaying Routes by Community Attribute
@@ -3152,6 +3185,8 @@ Example of how to set up a 6-Bone connection.
 
 .. include:: rpki.rst
 
+.. include:: wecmp_linkbw.rst
+
 .. include:: flowspec.rst
 
 .. [#med-transitivity-rant] For some set of objects to have an order, there *must* be some binary ordering relation that is defined for *every* combination of those objects, and that relation *must* be transitive. I.e.:, if the relation operator is <, and if a < b and b < c then that relation must carry over and it *must* be that a < c for the objects to have an order. The ordering relation may allow for equality, i.e. a < b and b < a may both be true and imply that a and b are equal in the order and not distinguished by it, in which case the set has a partial order. Otherwise, if there is an order, all the objects have a distinct place in the order and the set has a total order)
diff --git a/doc/user/conf.py b/doc/user/conf.py
index 5582847431..d8a188b152 100644
--- a/doc/user/conf.py
+++ b/doc/user/conf.py
@@ -132,7 +132,8 @@ language = None
 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
 exclude_patterns = ['_build', 'rpki.rst', 'routeserver.rst',
-                    'ospf_fundamentals.rst', 'flowspec.rst', 'snmptrap.rst']
+                    'ospf_fundamentals.rst', 'flowspec.rst', 'snmptrap.rst',
+                    'wecmp_linkbw.rst']
 
 # The reST default role (used for this markup: `text`) to use for all
 # documents.
diff --git a/doc/user/ipv6.rst b/doc/user/ipv6.rst
index f3f064b850..8af54ee23d 100644
--- a/doc/user/ipv6.rst
+++ b/doc/user/ipv6.rst
@@ -91,6 +91,17 @@ Router Advertisement
    Default: enabled
 
 .. index::
+   single: ipv6 nd ra-hop-limit (0-255)
+   single: no ipv6 nd ra-hop-limit [(0-255)]
+.. clicmd:: [no] ipv6 nd ra-hop-limit [(0-255)]
+
+   The value to be placed in the hop count field of router advertisements sent
+   from the interface, in hops. Indicates the maximum diameter of the network.
+   Setting the value to zero indicates that the value is unspecified by this
+   router. Must be between 0 and 255 hops.
+   Default: ``64``
+
+.. index::
    single: ipv6 nd ra-lifetime (0-9000)
    single: no ipv6 nd ra-lifetime [(0-9000)]
 .. clicmd:: [no] ipv6 nd ra-lifetime [(0-9000)]
diff --git a/doc/user/isisd.rst b/doc/user/isisd.rst
index 6684a83e1f..9a0a0afb0c 100644
--- a/doc/user/isisd.rst
+++ b/doc/user/isisd.rst
@@ -111,6 +111,12 @@ writing, *isisd* does not support multiple ISIS processes.
 
    Enable or disable :rfc:`6232` purge originator identification.
 
+.. index:: [no] lsp-mtu (128-4352)
+.. clicmd:: [no] lsp-mtu (128-4352)
+
+   Configure the maximum size of generated LSPs, in bytes.
+
+
 .. _isis-timer:
 
 ISIS Timer
diff --git a/doc/user/overview.rst b/doc/user/overview.rst
index c9934d1c68..cf8cc44097 100644
--- a/doc/user/overview.rst
+++ b/doc/user/overview.rst
@@ -300,6 +300,8 @@ BGP
   :t:`The Generalized TTL Security Mechanism (GTSM). V. Gill, J. Heasley, D. Meyer, P. Savola, C. Pingnataro. October 2007.`
 - :rfc:`5575`
   :t:`Dissemination of Flow Specification Rules. P. Marques, N. Sheth, R. Raszuk, B. Greene, J. Mauch, D. McPherson. August 2009`
+- :rfc:`6286`
+  :t:`Autonomous-System-Wide Unique BGP Identifier for BGP-4. E. Chen, J. Yuan, June 2011.`
 - :rfc:`6608`
   :t:`Subcodes for BGP Finite State Machine Error. J. Dong, M. Chen, Huawei Technologies, A. Suryanarayana, Cisco Systems. May 2012.`
 - :rfc:`6810`
diff --git a/doc/user/pim.rst b/doc/user/pim.rst
index 2aa66d9dd9..2944e0b705 100644
--- a/doc/user/pim.rst
+++ b/doc/user/pim.rst
@@ -166,6 +166,11 @@ Certain signals have special meanings to *pimd*.
    urib-only
       Lookup in the Unicast Rib only.
 
+.. index:: no ip msdp mesh-group [WORD]
+.. clicmd:: no ip msdp mesh-group [WORD]
+
+   Delete a Multicast Source Discovery Protocol (MSDP) mesh-group.
+
 .. index:: ip igmp generate-query-once [version (2-3)]
 .. clicmd:: ip igmp generate-query-once [version (2-3)]
 
diff --git a/doc/user/setup.rst b/doc/user/setup.rst
index 6d61a970d2..f60a66b9fd 100644
--- a/doc/user/setup.rst
+++ b/doc/user/setup.rst
@@ -6,6 +6,22 @@ Basic Setup
 After installing FRR, some basic configuration must be completed before it is
 ready to use.
 
+Crash logs
+----------
+
+If any daemon should crash for some reason (segmentation fault, assertion
+failure, etc.), it will attempt to write a backtrace to a file located in
+:file:`/var/tmp/frr/<daemon>[-<instance>].<pid>/crashlog`. This feature is
+not affected by any configuration options.
+
+The crashlog file's directory also contains files corresponding to per-thread
+message buffers in files named
+:file:`/var/tmp/frr/<daemon>[-<instance>].<pid>/logbuf.<tid>`. In case of a
+crash, these may contain unwritten buffered log messages. To show the contents
+of these buffers, pipe their contents through ``tr '\0' '\n'``. A blank line
+marks the end of valid unwritten data (it will generally be followed by
+garbled, older log messages since the buffer is not cleared.)
+
 Daemons Configuration File
 --------------------------
 After a fresh install, starting FRR will do nothing. This is because daemons
diff --git a/doc/user/subdir.am b/doc/user/subdir.am
index ce519fbfbf..0b64232f3d 100644
--- a/doc/user/subdir.am
+++ b/doc/user/subdir.am
@@ -44,6 +44,7 @@ user_RSTFILES = \
 	doc/user/bfd.rst \
 	doc/user/flowspec.rst \
 	doc/user/watchfrr.rst \
+	doc/user/wecmp_linkbw.rst \
 	# end
 
 EXTRA_DIST += \
diff --git a/doc/user/wecmp_linkbw.rst b/doc/user/wecmp_linkbw.rst
new file mode 100644
index 0000000000..0d2fe9d756
--- /dev/null
+++ b/doc/user/wecmp_linkbw.rst
@@ -0,0 +1,298 @@
+.. _wecmp_linkbw:
+
+Weighted ECMP using BGP link bandwidth
+======================================
+
+.. _features-of-wecmp-linkbw:
+
+Overview
+--------
+
+In normal equal cost multipath (ECMP), the route to a destination has
+multiple next hops and traffic is expected to be equally distributed
+across these next hops. In practice, flow-based hashing is used so that
+all traffic associated with a particular flow uses the same next hop,
+and by extension, the same path across the network.
+
+Weighted ECMP using BGP link bandwidth introduces support for network-wide
+unequal cost multipathing (UCMP) to an IP destination. The unequal cost
+load balancing is implemented by the forwarding plane based on the weights
+associated with the next hops of the IP prefix. These weights are computed
+based on the bandwidths of the corresponding multipaths which are encoded
+in the ``BGP link bandwidth extended community`` as specified in
+[Draft-IETF-idr-link-bandwidth]_. Exchange of an appropriate BGP link
+bandwidth value for a prefix across the network results in network-wide
+unequal cost multipathing.
+
+One of the primary use cases of this capability is in the data center when
+a service (represented by its anycast IP) has an unequal set of resources
+across the regions (e.g., PODs) of the data center and the network itself
+provides the load balancing function instead of an external load balancer.
+Refer to [Draft-IETF-mohanty-bess-ebgp-dmz]_ and :rfc:`7938` for details
+on this use case. This use case is applicable in a pure L3 network as
+well as in an EVPN network.
+
+The traditional use case for BGP link bandwidth to load balance traffic
+to the exit routers in the AS based on the bandwidth of their external
+eBGP peering links is also supported.
+
+
+Design Principles
+-----------------
+
+Next hop weight computation and usage
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+As described, in UCMP, there is a weight associated with each next hop of an
+IP prefix, and traffic is expected to be distributed across the next hops in
+proportion to their weight. The weight of a next hop is the bandwidth of
+the corresponding path as a fraction of the total bandwidth of
+all multipaths, mapped to the range 1 to 100. What happens if not all the
+paths in the multipath set have link bandwidth associated with them? In such
+a case, in adherence to [Draft-IETF-idr-link-bandwidth]_, the behavior
+reverts to standard ECMP among all the multipaths, with the link bandwidth
+being effectively ignored.
+
+Note that there is no change to either the BGP best path selection algorithm
+or to the multipath computation algorithm; the mapping of link bandwidth to
+weight happens at the time of installation of the route in the RIB.
+
+If data forwarding is implemented by means of the Linux kernel, the next hop’s
+weight is used in the hash calculation. The kernel uses the Hash threshold
+algorithm and use of the next hop weight is built into it; next hops need
+not be expanded to achieve UCMP. UCMP for IPv4 is available in older Linux
+kernels too, while UCMP for IPv6 is available from the 4.16 kernel onwards.
+
+If data forwarding is realized in hardware, common implementations expand
+the next hops (i.e., they are repeated) in the ECMP container in proportion
+to their weight. For example, if the weights associated with 3 next hops for
+a particular route are 50, 25 and 25 and the ECMP container has a size of 16
+next hops, the first next hop will be repeated 8 times and the other 2 next
+hops repeated 4 times each. Other implementations are also possible.
+
+Unequal cost multipath across a network
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For the use cases listed above, it is not sufficient to support UCMP on just
+one router (e.g., egress router), or individually, on multiple routers; UCMP
+must be deployed across the entire network. This is achieved by employing the
+BGP link-bandwidth extended community.
+
+At the router which originates the BGP link bandwidth, there has to be user
+configuration to trigger it, which is described below. Receiving routers
+would use the received link bandwidth from their downstream routers to
+determine the next hop weight as described in the earlier section. Further,
+if the received link bandwidth is a transitive attribute, it would be
+propagated to eBGP peers, with the additional change that if the next hop
+is set to oneself, the cumulative link bandwidth of all downstream paths
+is propagated to other routers. In this manner, the entire network will
+know how to distribute traffic to an anycast service across the network.
+
+The BGP link-bandwidth extended community is encoded in bytes-per-second.
+In the use case where UCMP must be based on the number of paths, a reference
+bandwidth of 1 Mbps is used. So, for example, if there are 4 equal cost paths
+to an anycast IP, the encoded bandwidth in the extended community will be
+500,000. The actual value itself doesn’t matter as long as all routers
+originating the link-bandwidth are doing it in the same way.
+
+
+Configuration Guide
+-------------------
+
+The configuration for weighted ECMP using BGP link bandwidth requires
+one essential step - using a route-map to inject the link bandwidth
+extended community. An additional option is provided to control the
+processing of received link bandwidth.
+
+Injecting link bandwidth into the network
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+At the "entry point" router that is injecting the prefix to which weighted
+load balancing must be performed, a route-map must be configured to
+attach the link bandwidth extended community.
+
+For the use case of providing weighted load balancing for an anycast service,
+this configuration will typically need to be applied at the TOR or Leaf
+router that is connected to servers which provide the anycast service and
+the bandwidth would be based on the number of multipaths for the destination.
+
+For the use case of load balancing to the exit router, the exit router should
+be configured with the route map specifying a bandwidth value that
+corresponds to the bandwidth of the link connecting to its eBGP peer in the
+adjoining AS. In addition, the link bandwidth extended community must be
+explicitly configured to be non-transitive.
+
+The complete syntax of the route-map set command can be found at
+:ref:`bgp-extended-communities-in-route-map`.
+
+This route-map is supported only at two attachment points:
+(a) the outbound route-map attached to a peer or peer-group, per address-family
+(b) the EVPN advertise route-map used to inject IPv4 or IPv6 unicast routes
+into EVPN as type-5 routes.
+
+Since the link bandwidth origination is done by using a route-map, it can
+be constrained to certain prefixes (e.g., only for anycast services) or it
+can be generated for all prefixes. Further, when the route-map is used in
+the neighbor context, the link bandwidth usage can be constrained to certain
+peers only.
+
+A sample configuration is shown below and illustrates link bandwidth
+advertisement towards the "SPINE" peer-group for anycast IPs in the
+range 192.168.x.x.
+
+.. code-block:: frr
+
+   ip prefix-list anycast_ip seq 10 permit 192.168.0.0/16 le 32
+   route-map anycast_ip permit 10
+    match ip address prefix-list anycast_ip
+    set extcommunity bandwidth num-multipaths
+   route-map anycast_ip permit 20
+   !
+   router bgp 65001
+    neighbor SPINE peer-group
+    neighbor SPINE remote-as external
+    neighbor 172.16.35.1 peer-group SPINE
+    neighbor 172.16.36.1 peer-group SPINE
+    !
+    address-family ipv4 unicast
+     network 110.0.0.1/32
+     network 192.168.44.1/32
+     neighbor SPINE route-map anycast_ip out
+    exit-address-family
+   !
+
+
+Controlling link bandwidth processing on the receiver
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+There is no configuration necessary to process received link bandwidth and
+translate it into the weight associated with the corresponding next hop;
+that happens by default. If some of the multipaths do not have the link
+bandwidth extended community, the default behavior is to revert to normal
+ECMP as recommended in [Draft-IETF-idr-link-bandwidth]_.
+
+The operator can change these behaviors with the following configuration:
+
+.. index:: bgp bestpath bandwidth <ignore | skip-missing | default-weight-for-missing>
+.. clicmd:: bgp bestpath bandwidth <ignore | skip-missing | default-weight-for-missing>
+
+The options behave as follows:
+
+- ignore: Ignore link bandwidth completely for route installation
+  (i.e., do regular ECMP, not weighted)
+- skip-missing: Skip paths without link bandwidth and do UCMP among
+  the others (if at least some paths have link-bandwidth)
+- default-weight-for-missing: Assign a low default weight (value 1)
+  to paths not having link bandwidth
+
+This configuration is per BGP instance, similar to other BGP route-selection
+controls; it operates on both IPv4-unicast and IPv6-unicast routes in that
+instance. In an EVPN network, this configuration (if required) should be
+implemented in the tenant VRF and is again applicable for IPv4-unicast and
+IPv6-unicast, including the ones sourced from EVPN type-5 routes.
+
+A sample snippet of FRR configuration on a receiver to skip paths without
+link bandwidth and do weighted ECMP among the other paths (if some of them
+have link bandwidth) is shown below.
+
+.. code-block:: frr
+
+   router bgp 65021
+    bgp bestpath as-path multipath-relax
+    bgp bestpath bandwidth skip-missing
+    neighbor LEAF peer-group
+    neighbor LEAF remote-as external
+    neighbor 172.16.35.2 peer-group LEAF
+    neighbor 172.16.36.2 peer-group LEAF
+    !
+    address-family ipv4 unicast
+     network 130.0.0.1/32
+    exit-address-family
+   !
+
+
+Stopping the propagation of the link bandwidth outside a domain
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The link bandwidth extended community will get automatically propagated
+with the prefix to EBGP peers, if it is encoded as a transitive attribute
+by the originator. If this propagation has to be stopped outside of a
+particular domain (e.g., stopped from being propagated to routers outside
+of the data center core network), the mechanism available is to disable
+the advertisement of all BGP extended communities on the specific peering(s).
+In other words, the propagation cannot be blocked just for the link bandwidth
+extended community. The configuration to disable all extended communities
+can be applied to a peer or peer-group (per address-family).
+
+Of course, the other common way to stop the propagation of the link bandwidth
+outside the domain is to block the prefixes themselves from being advertised
+and possibly announce only an aggregate route. This would be quite common
+in an EVPN network.
+
+BGP link bandwidth and UCMP monitoring & troubleshooting
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Existing operational commands to display the BGP routing table for a specific
+prefix will show the link bandwidth extended community also, if present.
+
+An example of an IPv4-unicast route received with the link bandwidth
+attribute from two peers is shown below:
+
+.. code-block:: frr
+
+   CLI# show bgp ipv4 unicast 192.168.10.1/32
+   BGP routing table entry for 192.168.10.1/32
+   Paths: (2 available, best #2, table default)
+     Advertised to non peer-group peers:
+     l1(swp1) l2(swp2) l3(swp3) l4(swp4)
+     65002
+       fe80::202:ff:fe00:1b from l2(swp2) (110.0.0.2)
+       (fe80::202:ff:fe00:1b) (used)
+         Origin IGP, metric 0, valid, external, multipath, bestpath-from-AS 65002
+         Extended Community: LB:65002:125000000 (1000.000 Mbps)
+         Last update: Thu Feb 20 18:34:16 2020
+
+     65001
+       fe80::202:ff:fe00:15 from l1(swp1) (110.0.0.1)
+       (fe80::202:ff:fe00:15) (used)
+         Origin IGP, metric 0, valid, external, multipath, bestpath-from-AS 65001, best (Older Path)
+         Extended Community: LB:65001:62500000 (500.000 Mbps)
+         Last update: Thu Feb 20 18:22:34 2020
+
+The weights associated with the next hops of a route can be seen by querying
+the RIB for a specific route.
+
+For example, the next hop weights corresponding to the link bandwidths in the
+above example are illustrated below:
+
+.. code-block:: frr
+
+   spine1# show ip route 192.168.10.1/32
+   Routing entry for 192.168.10.1/32
+     Known via "bgp", distance 20, metric 0, best
+     Last update 00:00:32 ago
+     * fe80::202:ff:fe00:1b, via swp2, weight 66
+     * fe80::202:ff:fe00:15, via swp1, weight 33
+
+For troubleshooting, existing debug logs ``debug bgp updates``,
+``debug bgp bestpath <prefix>``, ``debug bgp zebra`` and
+``debug zebra kernel`` can be used.
+
+A debug log snippet when ``debug bgp zebra`` is enabled and a route is
+installed by BGP in the RIB with next hop weights is shown below:
+
+.. code-block:: frr
+
+   2020-02-29T06:26:19.927754+00:00 leaf1 bgpd[5459]: bgp_zebra_announce: p=192.168.150.1/32, bgp_is_valid_label: 0
+   2020-02-29T06:26:19.928096+00:00 leaf1 bgpd[5459]: Tx route add VRF 33 192.168.150.1/32 metric 0 tag 0 count 2
+   2020-02-29T06:26:19.928289+00:00 leaf1 bgpd[5459]: nhop [1]: 110.0.0.6 if 35 VRF 33 wt 50 RMAC 0a:11:2f:7d:35:20
+   2020-02-29T06:26:19.928479+00:00 leaf1 bgpd[5459]: nhop [2]: 110.0.0.5 if 35 VRF 33 wt 50 RMAC 32:1e:32:a3:6c:bf
+   2020-02-29T06:26:19.928668+00:00 leaf1 bgpd[5459]: bgp_zebra_announce: 192.168.150.1/32: announcing to zebra (recursion NOT set)
+
+
+References
+----------
+
+.. [Draft-IETF-idr-link-bandwidth] <https://tools.ietf.org/html/draft-ietf-idr-link-bandwidth>
+.. [Draft-IETF-mohanty-bess-ebgp-dmz] <https://tools.ietf.org/html/draft-mohanty-bess-ebgp-dmz>
+
diff --git a/doc/user/zebra.rst b/doc/user/zebra.rst
index 520080e83a..3629b47877 100644
--- a/doc/user/zebra.rst
+++ b/doc/user/zebra.rst
@@ -736,43 +736,30 @@ these cases, the FIB needs to be maintained reliably in the fast path as well.
 We refer to the component that programs the forwarding plane (directly or
 indirectly) as the Forwarding Plane Manager or FPM.
 
-The FIB push interface comprises of a TCP connection between zebra and
-the FPM. The connection is initiated by zebra -- that is, the FPM acts
-as the TCP server.
-
 .. program:: configure
 
 The relevant zebra code kicks in when zebra is configured with the
-:option:`--enable-fpm` flag. Zebra periodically attempts to connect to
-the well-known FPM port. Once the connection is up, zebra starts
-sending messages containing routes over the socket to the FPM. Zebra
-sends a complete copy of the forwarding table to the FPM, including
-routes that it may have picked up from the kernel. The existing
-interaction of zebra with the kernel remains unchanged -- that is, the
-kernel continues to receive FIB updates as before.
-
-The encapsulation header for the messages exchanged with the FPM is
-defined by the file :file:`fpm/fpm.h` in the frr tree. The routes
-themselves are encoded in Netlink or protobuf format, with Netlink
-being the default.
-
-Protobuf is one of a number of new serialization formats wherein the
-message schema is expressed in a purpose-built language. Code for
-encoding/decoding to/from the wire format is generated from the
-schema. Protobuf messages can be extended easily while maintaining
-backward-compatibility with older code. Protobuf has the following
-advantages over Netlink:
-
-- Code for serialization/deserialization is generated automatically. This
-  reduces the likelihood of bugs, allows third-party programs to be integrated
-  quickly, and makes it easy to add fields.
-- The message format is not tied to an OS (Linux), and can be evolved
-  independently.
-
-As mentioned before, zebra encodes routes sent to the FPM in Netlink
-format by default. The format can be controlled via the FPM module's
-load-time option to zebra, which currently takes the values `Netlink`
-and `protobuf`.
+:option:`--enable-fpm` flag and started with the module (``-M fpm``
+or ``-M dplane_fpm_nl``).
+
+.. note::
+
+   The ``fpm`` implementation attempts to connect to ``127.0.0.1`` port ``2620``
+   by default, without any configuration. The ``dplane_fpm_nl`` implementation
+   only attempts to connect to a server if configured.
+
+Zebra periodically attempts to connect to the well-known FPM port (``2620``).
+Once the connection is up, zebra starts sending messages containing routes
+over the socket to the FPM. Zebra sends a complete copy of the forwarding
+table to the FPM, including routes that it may have picked up from the kernel.
+The existing interaction of zebra with the kernel remains unchanged -- that
+is, the kernel continues to receive FIB updates as before.
+
+The default FPM message format is netlink; however, it can be controlled
+with the module load-time option. The modules accept the following options:
+
+- ``fpm``: ``netlink`` and ``protobuf``.
+- ``dplane_fpm_nl``: none, it only implements netlink.
 
 The zebra FPM interface uses replace semantics. That is, if a 'route add'
 message for a prefix is followed by another 'route add' message,
@@ -782,6 +769,119 @@ replaces the information sent in the first message.
 If the connection to the FPM goes down for some reason, zebra sends the FPM
 a complete copy of the forwarding table(s) when it reconnects.
 
+For more details on the implementation, please read the developer's manual FPM
+section.
+
+FPM Commands
+============
+
+``fpm`` implementation
+----------------------
+
+.. index:: fpm connection ip A.B.C.D port (1-65535)
+.. clicmd:: fpm connection ip A.B.C.D port (1-65535)
+
+   Configure ``zebra`` to connect to a different FPM server than
+   ``127.0.0.1`` port ``2620``.
+
+
+.. index:: no fpm connection ip A.B.C.D port (1-65535)
+.. clicmd:: no fpm connection ip A.B.C.D port (1-65535)
+
+   Configure ``zebra`` to connect to the default FPM server at ``127.0.0.1``
+   port ``2620``.
+
+
+.. index:: show zebra fpm stats
+.. clicmd:: show zebra fpm stats
+
+   Shows the FPM statistics.
+
+   Sample output:
+
+   ::
+
+      Counter                        Total  Last 10 secs
+
+      connect_calls                      3             2
+      connect_no_sock                    0             0
+      read_cb_calls                      2             2
+      write_cb_calls                     2             0
+      write_calls                        1             0
+      partial_writes                     0             0
+      max_writes_hit                     0             0
+      t_write_yields                     0             0
+      nop_deletes_skipped                6             0
+      route_adds                         5             0
+      route_dels                         0             0
+      updates_triggered                 11             0
+      redundant_triggers                 0             0
+      dests_del_after_update             0             0
+      t_conn_down_starts                 0             0
+      t_conn_down_dests_processed        0             0
+      t_conn_down_yields                 0             0
+      t_conn_down_finishes               0             0
+      t_conn_up_starts                   1             0
+      t_conn_up_dests_processed         11             0
+      t_conn_up_yields                   0             0
+      t_conn_up_aborts                   0             0
+      t_conn_up_finishes                 1             0
+
+
+.. index:: clear zebra fpm stats
+.. clicmd:: clear zebra fpm stats
+
+   Resets all FPM counters.
+
+
+``dplane_fpm_nl`` implementation
+--------------------------------
+
+.. index:: fpm address <A.B.C.D|X:X::X:X> [port (1-65535)]
+.. clicmd:: fpm address <A.B.C.D|X:X::X:X> [port (1-65535)]
+
+   Configures the FPM server address. Once configured, ``zebra`` will attempt
+   to connect to it immediately.
+
+
+.. index:: no fpm address [<A.B.C.D|X:X::X:X> [port (1-65535)]]
+.. clicmd:: no fpm address [<A.B.C.D|X:X::X:X> [port (1-65535)]]
+
+   Disables FPM entirely. ``zebra`` will close any current connections and
+   will not attempt to connect to it anymore.
+
+
+.. index:: show fpm counters [json]
+.. clicmd:: show fpm counters [json]
+
+   Show the FPM statistics (plain text or JSON formatted).
+
+   Sample output:
+
+   ::
+
+      FPM counters
+      ============
+      Input bytes: 0
+      Output bytes: 308
+      Output buffer current size: 0
+      Output buffer peak size: 308
+      Connection closes: 0
+      Connection errors: 0
+      Data plane items processed: 0
+      Data plane items enqueued: 0
+      Data plane items queue peak: 0
+      Buffer full hits: 0
+      User FPM configurations: 1
+      User FPM disable requests: 0
+
+
+.. index:: clear fpm counters
+.. clicmd:: clear fpm counters
+
+   Resets all FPM counters.
+
+
 .. _zebra-dplane:
 
 Dataplane Commands
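The link-bandwidth arithmetic that the new wecmp_linkbw.rst page describes can be checked with a small Python sketch. It is not FRR code: the function names are hypothetical, and the rounding rule (floor, clamped to at least 1), treating a zero bandwidth as missing, and the replication formula for hardware ECMP containers are assumptions chosen to reproduce the worked numbers in the page (LB values of 125000000 and 62500000 bytes/s shown as 1000 and 500 Mbps and installed with weights 66 and 33; 4 paths encoded as 500,000; weights 50/25/25 filling a 16-entry container as 8/4/4).

```python
from typing import List, Optional

# 1 Mbps expressed in bytes per second, the unit of the link-bandwidth
# extended community (1,000,000 bits / 8 = 125,000 bytes).
BYTES_PER_MBPS = 1_000_000 // 8

# Mirrors `bgp bestpath bandwidth <...>`; "default" means no such command.
MODES = ("default", "ignore", "skip-missing", "default-weight-for-missing")


def mbps_to_lb(mbps: float) -> int:
    """Explicit Mbps value (as in `set extcommunity bandwidth`) -> bytes/s."""
    return int(mbps * BYTES_PER_MBPS)


def lb_to_mbps(bytes_per_sec: int) -> float:
    """Encoded bytes/s -> Mbps, the figure printed after LB:ASN:value."""
    return bytes_per_sec / BYTES_PER_MBPS


def num_multipaths_lb(num_paths: int) -> int:
    """`num-multipaths`: each path counts as a 1 Mbps reference bandwidth."""
    return num_paths * BYTES_PER_MBPS


def nexthop_weights(bandwidths: List[Optional[int]],
                    mode: str = "default") -> List[Optional[int]]:
    """Map per-path link bandwidth (bytes/s, None = absent) to weights 1..100.

    A returned None marks a path skipped under "skip-missing".
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    ecmp: List[Optional[int]] = [1] * len(bandwidths)  # equal weights
    present = [bw for bw in bandwidths if bw]  # assumption: 0 == missing
    if mode == "ignore" or not present:
        return ecmp
    if mode == "default" and len(present) < len(bandwidths):
        return ecmp  # some paths lack link bandwidth: revert to plain ECMP
    total = sum(present)
    weights: List[Optional[int]] = []
    for bw in bandwidths:
        if bw:
            # Share of the total, scaled to 100 and floored, never below 1
            # (assumed rounding rule).
            weights.append(max(1, bw * 100 // total))
        elif mode == "skip-missing":
            weights.append(None)
        else:  # default-weight-for-missing: low default weight of 1
            weights.append(1)
    return weights


def replication_counts(weights: List[int], slots: int) -> List[int]:
    """How often each next hop is repeated in a hardware ECMP container."""
    total = sum(weights)
    return [w * slots // total for w in weights]


if __name__ == "__main__":
    lb = [125_000_000, 62_500_000]      # LB values from the `show bgp` example
    print([lb_to_mbps(v) for v in lb])  # [1000.0, 500.0]
    print(nexthop_weights(lb))          # [66, 33], as in `show ip route`
    print(num_multipaths_lb(4))         # 500000
    print(replication_counts([50, 25, 25], 16))  # [8, 4, 4]
```

Floor division in replication_counts can leave container slots unused when the weights do not divide the container size evenly; how a real forwarding ASIC distributes such remainders is implementation-specific and is not modelled here.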
