From e53d58537c6a42a4c80ba9561d6c081b25a182cd Mon Sep 17 00:00:00 2001 From: Quentin Young Date: Tue, 13 Feb 2018 18:09:58 -0500 Subject: [PATCH] doc: document CLI BNF grammar, add DFA figures Technical details on CLI implementation. Signed-off-by: Quentin Young --- doc/developer/cli.rst | 161 ++++++++++++++++++----------- doc/figures/cligraph.svg | 211 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 311 insertions(+), 61 deletions(-) create mode 100644 doc/figures/cligraph.svg diff --git a/doc/developer/cli.rst b/doc/developer/cli.rst index 2ecb70e932..cff3d21ef7 100644 --- a/doc/developer/cli.rst +++ b/doc/developer/cli.rst @@ -66,10 +66,58 @@ parser is implemented in Bison and the lexer in Flex. These may be found in to look. Bison is very stable and if it detects a syntax error, 99% of the time it will be a syntax error in your definition. +The formal grammar in BNF is given below. This is the grammar implemented in +the Bison parser. At runtime, the Bison parser reads all of the CLI strings and +builds a combined directed graph that is used to match and interpret user +input. + +Human-friendly explanations of how to use this grammar are given a bit later in +this section alongside information on the :ref:`cli-data-structures` constructed +by the parser. + +.. productionlist:: + command: `cmd_token_seq` + : `cmd_token_seq` `placeholder_token` "..." + cmd_token_seq: *empty* + : `cmd_token_seq` `cmd_token` + cmd_token: `simple_token` + : `selector` + simple_token: `literal_token` + : `placeholder_token` + literal_token: WORD `varname_token` + varname_token: "$" WORD + placeholder_token: `placeholder_token_real` `varname_token` + placeholder_token_real: IPV4 + : IPV4_PREFIX + : IPV6 + : IPV6_PREFIX + : VARIABLE + : RANGE + : MAC + : MAC_PREFIX + selector: "<" `selector_seq_seq` ">" `varname_token` + : "{" `selector_seq_seq` "}" `varname_token` + : "[" `selector_seq_seq` "]" `varname_token` + selector_seq_seq: `selector_seq_seq` "|" `selector_token_seq` + : `selector_token_seq` + selector_token_seq: `selector_token_seq` `selector_token` + : `selector_token` + selector_token: `selector` + : `simple_token` + Tokens ~~~~~~ - -Each element in a command definition is assigned a type by the parser based on a set of regular expression rules. +The various capitalized tokens in the BNF above are in fact themselves +placeholders, but not defined as such in the formal grammar; the grammar +provides the structure, and the tokens are actually more like a type system for +the strings you write in your CLI definitions. A CLI definition string is +broken apart and each piece is assigned a type by the lexer based on a set of +regular expressions. The parser uses the type information to verify the string +and determine the structure of the CLI graph; additional metadata (such as the +raw text of each token) is encoded into the graph as it is constructed by the +parser, but this is merely a dumb copy job. + +Here is a brief summary of the various token types along with examples. +-----------------+-----------------+-------------------------------------------------------------+ | Token type | Syntax | Description | @@ -382,6 +430,8 @@ In the examples below, each arrowed token needs a doc string. "command [example]" ^ ^ ^ ^ +.. _cli-data-structures: + Data Structures --------------- @@ -401,76 +451,65 @@ self-contained 'subgraphs'. Each subgraph is a tree except that all of the 'leaves' actually share a child node. This helps with minimizing graph size and debugging. -As an example, the subgraph generated by looks like this: +As a working example, here is the graph of the following command: :: + + show [ip] bgp neighbors [] [json] + +.. figure:: ../figures/cligraph.svg + :align: center + + Graph of example CLI command -:: - . - . - | - +----+---+ - +--- -+ FORK +----+ - | +--------+ | - +--v---+ +--v---+ - | foo | | bar | - +--+---+ +--+---+ - | +------+ | - +------> JOIN <-----+ - +---+--+ - | - . - . - -FORK and JOIN nodes are plumbing nodes that don't correspond to user +``FORK`` and ``JOIN`` nodes are plumbing nodes that don't correspond to user input. They're necessary in order to deduplicate these constructs where applicable. -Options follow the same form, except that there is an edge from the FORK -node to the JOIN node. +Options follow the same form, except that there is an edge from the ``FORK`` +node to the ``JOIN`` node. Since all of the subgraphs in the example command +are optional, all of them have this edge. -Keywords follow the same form, except that there is an edge from JOIN to -FORK. Because of this the CLI graph cannot be called acyclic. There is -special logic in the input matching code that keeps a stack of paths -already taken through the node in order to disallow following the same -path more than once. +Keywords follow the same form, except that there is an edge from ``JOIN`` to +``FORK``. Because of this the CLI graph cannot be called acyclic. There is +special logic in the input matching code that keeps a stack of paths already +taken through the node in order to disallow following the same path more than +once. -Variadics are a bit special; they have an edge back to themselves, which -allows repeating the same input indefinitely. +Variadics are a bit special; they have an edge back to themselves, which allows +repeating the same input indefinitely. -The leaves of the graph are nodes that have no out edges. These nodes -are special; their data section does not contain a token, as most nodes -do, or NULL, as in FORK/JOIN nodes, but instead has a pointer to a +The leaves of the graph are nodes that have no out edges. These nodes are +special; their data section does not contain a token, as most nodes do, or +NULL, as in ``FORK``/``JOIN`` nodes, but instead has a pointer to a cmd\_element. All paths through the graph that terminate on a leaf are guaranteed to be defined by that command. When a user enters a complete -command, the command matcher tokenizes the input and executes a DFS on -the CLI graph. If it is simultaneously able to exhaust all input (one -input token per graph node), and then find exactly one leaf connected to -the last node it reaches, then the input has matched the corresponding -command and the command is executed. If it finds more than one node, -then the command is ambiguous (more on this in deduplication). If it -cannot exhaust all input, the command is unknown. If it exhausts all -input but does not find an edge node, the command is incomplete. - -The parser uses an incremental strategy to build the CLI graph for a -node. Each command is parsed into its own graph, and then this graph is -merged into the overall graph. During this merge step, the parser makes -a best-effort attempt to remove duplicate nodes. If it finds a node in -the overall graph that is equal to a node in the corresponding position -in the command graph, it will intelligently merge the properties from -the node in the command graph into the already-existing node. Subgraphs -are also checked for isomorphism and merged where possible. The -definition of whether two nodes are 'equal' is based on the equality of -some set of token properties; read the parser source for the most +command, the command matcher tokenizes the input and executes a DFS on the CLI +graph. If it is simultaneously able to exhaust all input (one input token per +graph node), and then find exactly one leaf connected to the last node it +reaches, then the input has matched the corresponding command and the command +is executed. If it finds more than one node, then the command is ambiguous +(more on this in deduplication). If it cannot exhaust all input, the command is +unknown. If it exhausts all input but does not find an edge node, the command +is incomplete. + +The parser uses an incremental strategy to build the CLI graph for a node. Each +command is parsed into its own graph, and then this graph is merged into the +overall graph. During this merge step, the parser makes a best-effort attempt +to remove duplicate nodes. If it finds a node in the overall graph that is +equal to a node in the corresponding position in the command graph, it will +intelligently merge the properties from the node in the command graph into the +already-existing node. Subgraphs are also checked for isomorphism and merged +where possible. The definition of whether two nodes are 'equal' is based on the +equality of some set of token properties; read the parser source for the most up-to-date definition of equality. -When the parser is unable to deduplicate some complicated constructs, -this can result in two identical paths through separate parts of the -graph. If this occurs and the user enters input that matches these -paths, they will receive an 'ambiguous command' error and will be unable -to execute the command. Most of the time the parser can detect and warn -about duplicate commands, but it will not always be able to do this. -Hence care should be taken before defining a new command to ensure it is -not defined elsewhere. +When the parser is unable to deduplicate some complicated constructs, this can +result in two identical paths through separate parts of the graph. If this +occurs and the user enters input that matches these paths, they will receive an +'ambiguous command' error and will be unable to execute the command. Most of +the time the parser can detect and warn about duplicate commands, but it will +not always be able to do this. Hence care should be taken before defining a +new command to ensure it is not defined elsewhere. Command handlers ---------------- @@ -481,7 +520,7 @@ this: :: - int (*func) (const struct cmd_element *, struct vty *, int, struct cmd_token *[]); + int (*func) (const struct cmd_element *, struct vty *, int, struct cmd_token *[]); The first argument is the command definition struct. The last argument is an ordered array of tokens that correspond to the path taken through diff --git a/doc/figures/cligraph.svg b/doc/figures/cligraph.svg new file mode 100644 index 0000000000..a1dd01702b --- /dev/null +++ b/doc/figures/cligraph.svg @@ -0,0 +1,211 @@ + + + + + + +%3 + + +n0xd46960 + +START_TKN + + +n0xd46be0 + +WORD_TKN +" +show +" + + +n0xd46960->n0xd46be0 + + + + +n0xd47f80 + +FORK_TKN + + +n0xd46be0->n0xd47f80 + + + + +n0xd47c70 + +WORD_TKN +" +ip +" + + +n0xd47f80->n0xd47c70 + + + + +n0xd484c0 + +JOIN_TKN + + +n0xd47f80->n0xd484c0 + + + + +n0xd47c70->n0xd484c0 + + + + +n0xd47ca0 + +WORD_TKN +" +bgp +" + + +n0xd484c0->n0xd47ca0 + + + + +n0xd48540 + +WORD_TKN +" +neighbors +" + + +n0xd47ca0->n0xd48540 + + + + +n0xd490c0 + +FORK_TKN + + +n0xd48540->n0xd490c0 + + + + +n0xd48fc0 + +IPV4_TKN +A.B.C.D + + +n0xd490c0->n0xd48fc0 + + + + +n0xd491e0 + +JOIN_TKN + + +n0xd490c0->n0xd491e0 + + + + +n0xd49340 + +IPV6_TKN +X:X::X:X + + +n0xd490c0->n0xd49340 + + + + +n0xd49480 + +VARIABLE_TKN +WORD + + +n0xd490c0->n0xd49480 + + + + +n0xd48fc0->n0xd491e0 + + + + +n0xd496e0 + +FORK_TKN + + +n0xd491e0->n0xd496e0 + + + + +n0xd495e0 + +WORD_TKN +" +json +" + + +n0xd496e0->n0xd495e0 + + + + +n0xd497c0 + +JOIN_TKN + + +n0xd496e0->n0xd497c0 + + + + +n0xd495e0->n0xd497c0 + + + + +end0xd49900 + +end + + +n0xd497c0->end0xd49900 + + + + +n0xd49340->n0xd491e0 + + + + +n0xd49480->n0xd491e0 + + + + + -- 2.39.5