2. Crash course on parsers 语法分析器速成课程

标签：Term Crash tree Num parse course parsers 123 解析

If you've never worked with a parser generator before, or aren't really familiar with context-free grammars, this section is just a very brief introduction into the basic idea. Basically a grammar is a nice way of writing out what kinds of inputs are legal. In our example, we want to support parenthesized numbers, so things like 123, (123), etc. We can express this with a simple grammar like:

GPT

如果你从未使用过语法分析器生成器，或者不太熟悉上下文无关文法，这一节将简要介绍其基本概念。基本上，文法是一种清晰的方式，用来描述哪些输入是合法的。在我们的例子中，我们希望支持带括号的数字，像 123、(123) 等等。我们可以通过一个简单的文法来表达这个需求：

MS

如果您以前从未使用过解析器生成器，或者不太熟悉上下文无关的语法，那么本节只是对基本概念的非常简短的介绍。基本上，语法是写出哪些类型的输入是合法的的好方法。在我们的示例中，我们希望支持带括号的数字，例如 123、（123）等。我们可以用简单的语法来表达这一点，例如：

Term = Num | "(" Term ")"

Here we say we are trying to parse a term, and a term can either be a number (Num) or some other term enclosing in parentheses (here I did not define what a number is, but in the real LALRPOP example we'll do that with a regular expression). Now imagine a potential input like ((123)). We can show how this would be parsed by writing out something called a "parse tree":

GPT

在这里，我们说我们要解析一个术语（Term），而一个术语可以是一个数字（Num），也可以是另一个用括号括起来的术语（Term）。这里我并没有定义数字是什么，但在实际的 LALRPOP 示例中，我们会用正则表达式来定义数字。现在，假设有一个潜在的输入 ((123))，我们可以通过编写一个叫做“解析树”的东西来展示它是如何被解析的：

MS

在这里，我们说我们正在尝试解析一个术语，一个术语可以是一个数字（Num）或括在括号中的其他一些术语（这里我没有定义什么是数字，但在真正的 LALRPOP 示例中，我们将使用正则表达式来定义）。现在想象一个像（（123））这样的潜在输入。我们可以通过写出一个叫做 “parse tree” 的东西来展示如何解析它：

(  (  1  2  3  )  )
|  |  |     |  |  |
|  |  +-Num-+  |  |
|  |     |     |  |
|  |   Term    |  |
|  |     |     |  |
|  +---Term----+  |
|        |        |
+------Term-------+

Here you can see that we parsed ((123)) by finding a Num in the middle, calling that Num a Term, and matching up the parentheses to form two more terms on top of that.

GPT

在这里，你可以看到我们是通过在中间找到一个 Num，把它作为一个 Term，然后匹配括号来将它解析成两个更高层次的术语（Term）从而解析出 ((123)) 的。

MS

在这里，你可以看到我们解析了（（123）），在中间找到一个 Num，称该 Num 为 Term，并匹配括号以在此基础上形成另外两个 Term。

Note that this parse tree is not a data structure but more a visualization of the parse. I mean, you can build up a parse tree as a data structure, but typically you don't want to: it is more detailed than you need. For example, you may not be that interested in the no-op conversion from a Num to a Term. The other weird thing about a parse tree is that it is intimately tied to your grammar, but often you have some existing data structures you would like to parse into -- so if you built up a parse tree, you'd then have to convert from the parse tree into those data structures, and that might be annoying.

GPT

请注意，这棵解析树并不是一个数据结构，而更像是解析过程的可视化。我的意思是，你可以将解析树构建为一个数据结构，但通常你并不想这样做，因为它包含的细节可能超出你所需的。例如，你可能并不关心将一个 Num 转换为 Term 的无操作过程。解析树的另一个奇怪之处在于，它与文法紧密相关，但通常你会有一些现有的数据结构，希望将输入解析成这些数据结构——所以，如果你构建了一个解析树，你就得从解析树转换到那些数据结构，而这可能会很麻烦。

MS

请注意，此 parse tree 不是数据结构，而更像是 parse 的可视化。我的意思是，你可以将解析树构建为数据结构，但通常你不想这样做：它比你需要的更详细。例如，您可能对从 Num 到 Term 的无操作转换不太感兴趣。解析树的另一个奇怪之处在于它与你的语法密切相关，但通常你有一些想要解析的现有数据结构——所以如果你构建了一个解析树，那么你就必须从解析树转换为这些数据结构，这可能会很烦人。

Therefore, what a parser generator usually does, is instead let you choose how to represent each node in the parse tree, and how to do the conversions. You give each nonterminal a type, which can be any Rust type, and you write code that will execute each time a new node in the parse tree would have been constructed. In fact, in the examples that follow, we'll eventually build up something like a parse tree, but in the beginning, we won't do that at all. Instead, we'll represent each number and term as an i32, and we'll propagate this value around.

GPT

因此，语法分析器生成器通常会让你选择如何表示解析树中的每个节点，以及如何进行转换。你为每个非终结符指定一个类型，这个类型可以是任何 Rust 类型，并且你会编写代码，每当解析树中构建一个新节点时，这段代码就会执行。事实上，在接下来的示例中，我们最终会构建类似解析树的东西，但一开始我们不会这么做。相反，我们会将每个数字和术语表示为一个 i32 类型，并在解析过程中传播这个值。

MS

因此，解析器生成器通常所做的是让您选择如何表示解析树中的每个节点，以及如何进行转换。您为每个非终端指定一个类型，该类型可以是任何 Rust 类型，并且您编写的代码将在每次构建解析树中的新节点时执行。事实上，在下面的例子中，我们最终会构建一个类似解析树的东西，但在开始时，我们根本不会这样做。相反，我们将每个数字和术语表示为 i32，并将该值传播出去。

To make this a bit more concrete, here's a version of the grammar above written in LALRPOP notation (we'll revisit this again in more detail of course). You can see that the Term nonterminal has been given the type i32, and that each of the definitions has some code that follows a => symbol. This is the code that will execute to convert from the thing that was matched (like a number, or a parenthesized term) into an i32:

GPT

为了让这个概念更具体一点，下面是上述文法的 LALRPOP 版本（我们会在后续更详细地讨论）。你可以看到，Term 非终结符已经被指定为 i32 类型，并且每个定义后面都有一些跟随 => 符号的代码。这些代码将在匹配到某个内容（例如数字或括号中的术语）后执行，用来将匹配到的内容转换为 i32：

MS

为了更具体地说明这一点，下面是上面用 LALRPOP 表示法编写的语法版本（当然，我们将再次更详细地重新讨论这一点）。您可以看到 Term non terminal 已被赋予类型 i32，并且每个定义都有一些跟在 => 符号后面的代码。这是将执行的代码，用于将匹配的事物（如数字或带括号的术语）转换为 i32：

Term: i32 = {
    Num => /* ... number code ... */,
    "(" Term ")" => /* ... parenthesized code ... */,
};

OK, that's enough background, let's do this for real!

GPT

好的，背景知识讲解完了，接下来我们正式开始吧！

MS

好了，背景就够了，让我们真的来做吧！

标签：Term,Crash,tree,Num,parse,course,parsers,123,解析
From： https://www.cnblogs.com/Tifahfyf/p/18639349

2. Crash course on parsers 语法分析器速成课程

相关文章

赞助商

阅读排行