パート I: DSL を構築するための式インタープリターの実装

ホームページ

バックエンド開発

Golang

パート I: DSL を構築するための式インタープリターの実装 - PEG パーサーの紹介

PHPz

Aug 05, 2024 pm 08:40 PM

Part I: Implement an expression interpreter for building DSL - Introduce the PEG parser

前回の投稿では、このプロジェクトを開始した理由と方法、そして最終的に DSL がどのようになるかを紹介しました。この投稿から、プロジェクト全体の実装を共有していきます。

一般に、言語を実装するとき、最初に頭に浮かぶのはレクサーであり、次にパーサーです。そこでこの投稿では、あまり混乱しないように、詳細を指定して概念を少なくして DSL を実装する方法を紹介します。

レクサーって何？

一般に、レクサーは字句解析またはトークン化に使用されます。 「We will Rock you!」 (クイーンの有名なロックンロールミュージック) を例に考えてみましょう。英語の文法規則に従ってトークン化した後、リスト [主語("We")、補助語("will")、動詞("rock")、目的語("you")、句読点("!") を出力します。 )]。したがって、レクサーは主に、テキストをその語彙的な意味によって特定のタイプに分類するために使用されます。 構文は実際には文字や単語ではなく語彙要素で構成されているため、これは私たちにとって重要です。

通常、Ragel、nex などの Regular Express を解析できるコードジェネレーターを使用してレクサーを実装します。しかし、Rob Pike の Go での Lexical Scanning を確認すると、レクサーの実装がいかに簡単かに驚かれると思います。彼は、レクサーの中核であると思われる有限状態マシンを実装するための興味深いパターンを導入していました。

パーサーはどうですか？

では、パーサーはどうでしょうか?何に使われますか?基本的に、パーサーは、文法とも呼ばれる指定されたパターンを持つ語彙要素のリストを識別するために使用されます。 前に紹介した 「We willrock you!」 を例に挙げてみましょう。「未来時制」のパターンに一致する [主語("We")、補助語("will")、動詞("rock")、目的語("you")、句読点("!")] のシーケンスが生成されます。英語の文法。したがって、これはまさにパーサーが行うことであり、いわゆる「構文分析」と呼ばれるものです。

よりコンピューター的な方法で別の例を考えてみましょう。1 + 2 * 3 のような式はどうでしょうか?レクサーによってこれらが [Number(1), Operator(+), Number(2), Operator(*), Number(3)] に変換されることは明らかですが、このシーケンスがパーサーによってどのように変換されるかは明らかです。一般的な数式文法を使用していますか?一般に、字句要素シーケンスは、次のようなパーサーによって 抽象構文ツリー (または略して AST) に変換されます。

ログイン後にコピー

抽象構文ツリーを使用すると、文法の規則に従って正しい順序で構文を分析できます。

パーサーを実装してみよう

次に、パーサーを独自に実装します。完全に私たちだけでというわけではありませんが、パーサーのコードを生成するのに役立つツールがまだ必要です。それを正しく実装するのは難しく、手書きのパーサーは保守が難しい可能性があるためです。たとえ実装したとしても、パフォーマンスが低下する可能性があります。貧しい。

幸いなことに、文法定義ファイルを使用して golang コードを生成するのに役立つ、成熟したパーサー生成ツールが多数あります。私の頭に浮かんだ最初の選択肢は、パーサー用の Go コードを生成する公式ツールである go-yacc です。以前は SQL アナライザーを作成するためにこれを使用していましたが、次のような理由から快適ではありませんでした。

には外部レクサーが必要です。
ドキュメントが不足しています。
共用体定義と値型宣言はどちらも非常に混乱します。

「何か新しいことに挑戦してみませんか？」素晴らしいツールペグを使用して、レクサーとパーサーの両方を 1 つの文法ファイル、1 つのインターフェイスに実装することができました。ペグを詳しく見てみましょう。

PEGを詳しく見てみる

PEG は、2004 年にブライアンフォードによって導入された Parsing Expression Grammar の略で、プログラミング言語の構文とプロトコルの記述と表現に使用される従来の Context Free Grammar に代わるものです。

過去数十年間、私たちは CFG をサポートする yacc や bison などのパーサー生成ツールを使用してパーサーコードを生成してきました。以前にこれらを使用したことがある場合は、あいまいさを避けてレクサーやレクサーと統合するのが難しいと感じるかもしれません。正規表現。実際、プログラミング言語の構文は字句要素のパターンだけでなく、何らかの理由で CFG が欠落している字句要素のルールでもあるため、yacc のようなツールを使用する場合 自分でレクサーを実装する必要があります。さらに、CFG では曖昧さ (プラスと乗算の間の優先順位など、これを確認してください) を避けるために、各演算子の優先順位を定義する必要があります。これらすべての重要な事実により、パーサーの開発が不必要に困難になります。

But thanks to Bryan Ford, now we have another good choice, the PEG that allows us to define the lexical and syntax rule all in one single file with a tight DSL and resolve ambiguity in an elegant and simple way. Let me show you how easy it can be done with peg.

Example and code come first

I gonna take examples from my gendsl library which implements a lisp-like syntax(you can check it out here). Here is a simple snippet that can parse hex and decimal number literals in the golang style:

package playground

type parser Peg {
}

Script          <- Value EOF

EOF             <- !.

Value           <- IntegerLiteral

IntegerLiteral  <- [+\-]? ('0' ('x' / 'X') HexNumeral 
                           / DecimalNumeral ) [uU]?

HexNumeral      <- HexDigit ([_]* HexDigit)* / '0'

HexDigit        <- [0-9] / [A-F] / [a-f]

DecimalNumeral  <- [1-9] ([_]* [0-9])* / '0'     

# ...

ログイン後にコピー

The first line package gendsl is package declaration which decides which package the generated golang file belongs to. The following type declaration type parser Peg {} used to define the parser type, which we will use it later for evaluation but you can ignore it for now.

After the parser type declaration we can start to define your syntax rule till the end. This is different from the workflow I used to work with with yacc when I have to define a union type and a lot of token types before I can actually define my grammar, which could be really confusing. Anyway, let's take a quick look at the grammar definition.

The very first rule

If you have worked with CFG before you might find the definition DSL syntax quite familiar. The right hand side of the '<-' refers to the pattern of lexical elements, which could be some other patterns or character sequence, and the left hand side is the name of the pattern. Pretty easy, right?

Let's pay attention to the first pattern rule here since the first rule is always to entry point of the parser. The entry point Script here is consist of two parts, one is a rule refers to Value which is consist of a sequence of specified characters(we will get back to this later), the other one EOF is kind of interesting. Let's jump to the next rule to find the pattern of EOF. As you can see, EOF is consist of !.. What does !. mean? The !actually means NOT, and . means any character, so !. means NOTHING AT ALL or End Of File if you will. As a result whenever the parser find there is no character to read, it will stop here and treat it as an dummy rule call EOF which might produces the rule Script. This is quite a common pattern for PEG.

More about the rule syntax

So much like the regular expression(RE), the syntax of PEG is simple:

. stands for any character.
character set like [a-z] is also supported.
X matches one character if X is a character or a pattern when X is the name of an rule.
X? matches one character or pattern or none, while X* matches zero or more and 'X+' matches one or more.
X \ Y matches X or Y where X and Y could be characters, patterns or rule name.

Take the rule of DecimalNumeral as an example. The first part [1-9] means the start of an DecimalNumeral must be one of a digit ranging from 1 to 9, ([_]* [0-9])* means starting from the second position, all character, if there is any, must all be digit(0-9) that has might have no '_' or more than one '_' as its prefix so it could match string like "10_2_3". Otherwise, indicated by the operator \, it could also just be one single character '0' which means 0 obviously .

Resolving ambiguity

I'd like to spend more time to explain the or operator \, since it is quite important as the solution to the ambiguity. The PEG will always try to match the first pattern and then the second, the third and so on til it finds one matched, which is considered as earliest-match-first. For example, a string "ab" will never be able to match the grammar G <- 'a' / 'a' 'b', since the first character 'a' will be reduced to G but the 'b' left cannot match anything. By the way, CFG doesn't allow such a rule and will throw the reduce/shift conflict error.

There is no much syntax left, you can explore them yourself in the pointlander/peg README or peg doc.

Let's have a try

Now that we already have a simple syntax rule prepared above, though it is not the whole grammar for the gendsl project but it can still parse some numbers. Anyway let's generate some code and see if it works as we expect.

Preparation

First we have to install the peg binary tool for code generate following this guide, then we gonna setup our workspace directory for playing:

> mkdir peg_playground && peg_playground 
> go mod init peg_playground 
> touch grammar.peg

ログイン後にコピー

Paste the grammar we have before into the peg_playground/grammar.peg, now we should be able to genreate the code using the generate tool but why not make a Makefile in peg_playground/makefile

GO := go

.SUFFIXES: .peg .go

grammar.go: grammar.peg
    peg -switch -inline -strict -output ./$@ $<

all: grammar.go

clean:
    rm grammar.go

ログイン後にコピー

Generate and test the parser

Now that we have everything ready, let's generate the code for parser:

make grammar.go

ログイン後にコピー

After running the command, you should see a generated grammar.go in the workspace directory. Let's write a function to parse a string and access our parser:

// peg_playground/parser.go
package playground

func PrintAST(script string) error {
    parser := &parser{
        Buffer: script,
        Pretty: true,
    }

    if err := parser.Init(); err != nil {
        return err
    }
    if err := parser.Parse(); err != nil {
        return err
    }

    parser.PrintSyntaxTree()
    return nil
}

ログイン後にコピー

The snippet here is pretty simple, it initializes the parser, parses the script we pass to it and print the syntax tree in final. Let's write an unit test to see if it works:

// peg_playground/parser_test.go
package playground

import (
    "testing"
)

func TestPrintTree(t *testing.T) {
    if err := PrintAST(`0x123`); err != nil {
        t.Fatal(err)
    }
    t.Log("-----------------------------------------------------")

    if err := PrintAST(`10_2_3`); err != nil {
        t.Fatal(err)
    }
    t.Log("-----------------------------------------------------")
}

ログイン後にコピー

The test function TestPrintTree calls the PrintAST and check the error. Let's run it now and see what it gonna print:

go test . -v

ログイン後にコピー

Now we should see the whole syntax tree in the output:

=== RUN   TestPrintTree
Script "0x123"
 Value "0x123"
  IntegerLiteral "0x123"
   HexNumeral "123"
    HexDigit "1"
    HexDigit "2"
    HexDigit "3"
    parser_test.go:11: -----------------------------------------------------
Script "10_2_3"
 Value "10_2_3"
  IntegerLiteral "10_2_3"
   DecimalNumeral "10_2_3"
    parser_test.go:16: -----------------------------------------------------
--- PASS: TestPrintTree (0.00s)
PASS
ok      playground      0.649s

ログイン後にコピー

It looks great, right? Everything works as we expected. No syntax error thrown and it prints every rule matched and the string it matches in a format of tree, which could be really useful when debugging.

Wrap it up

In this post, I have introduced you the two basic but significant parts of interpreter programming language:

Lexer, for lexical analysis that transforms a string into a sequence of lexical elements.
Parser, for syntax analysis that identify the the pattern (so called grammar) in the lexical elements and produces a syntax tree.

And then I introduce the PEG for parser code generating, and address its advantages comparing the traditional CFG:

Lexer rule integrated, no standalone lexer need to be implemented.
Simple, regular expression like syntax to start up fast.
No ambiguity, no reduce/shift conflict, always earlier-match-first.

Finally I have a tiny demonstration of how to generate parser with PEG, which is the basis of our interpreter.
In next post, I will walk you through the gendsl grammar in detail.
Thank you for checking this post, hope you enjoy it.

以上がパート I: DSL を構築するための式インタープリターの実装 - PEG パーサーの紹介の詳細内容です。詳細については、PHP 中国語 Web サイトの他の関連記事を参照してください。

このウェブサイトの声明

この記事の内容はネチズンが自主的に寄稿したものであり、著作権は原著者に帰属します。このサイトは、それに相当する法的責任を負いません。盗作または侵害の疑いのあるコンテンツを見つけた場合は、admin@php.cn までご連絡ください。

ホットAIツール

Undresser.AI Undress

リアルなヌード写真を作成する AI 搭載アプリ

AI Clothes Remover

写真から衣服を削除するオンライン AI ツール。

Undress AI Tool

脱衣画像を無料で

Clothoff.io

AI衣類リムーバー

Video Face Swap

完全無料の AI 顔交換ツールを使用して、あらゆるビデオの顔を簡単に交換できます。

ホットツール

メモ帳++7.3.1

使いやすく無料のコードエディター

SublimeText3 中国語版

中国語版、とても使いやすい

ゼンドスタジオ 13.0.1

強力な PHP 統合開発環境

ドリームウィーバー CS6

ビジュアル Web 開発ツール

SublimeText3 Mac版

神レベルのコード編集ソフト（SublimeText3）

ホットトピック

Java チュートリアル

1673

CakePHP チュートリアル

1429

Laravel チュートリアル

1333

PHP チュートリアル

1278

C# チュートリアル

1257

Related knowledge

Golang vs. Python：パフォーマンスとスケーラビリティ Apr 19, 2025 am 12:18 AM

Golangは、パフォーマンスとスケーラビリティの点でPythonよりも優れています。 1）Golangのコンピレーションタイプの特性と効率的な並行性モデルにより、高い並行性シナリオでうまく機能します。 2）Pythonは解釈された言語として、ゆっくりと実行されますが、Cythonなどのツールを介してパフォーマンスを最適化できます。

Golang and C：Concurrency vs. Raw Speed Apr 21, 2025 am 12:16 AM

Golangは並行性がCよりも優れていますが、Cは生の速度ではGolangよりも優れています。 1）Golangは、GoroutineとChannelを通じて効率的な並行性を達成します。これは、多数の同時タスクの処理に適しています。 2）Cコンパイラの最適化と標準ライブラリを介して、極端な最適化を必要とするアプリケーションに適したハードウェアに近い高性能を提供します。

ゴーを始めましょう：初心者のガイド Apr 26, 2025 am 12:21 AM

goisidealforforbeginnersandsutable forcloudnetworkservicesduetoitssimplicity、andconcurrencyfeatures.1）installgofromtheofficialwebsiteandverify with'goversion'.2）

Golang vs. C：パフォーマンスと速度の比較 Apr 21, 2025 am 12:13 AM

Golangは迅速な発展と同時シナリオに適しており、Cは極端なパフォーマンスと低レベルの制御が必要なシナリオに適しています。 1）Golangは、ごみ収集と並行機関のメカニズムを通じてパフォーマンスを向上させ、高配列Webサービス開発に適しています。 2）Cは、手動のメモリ管理とコンパイラの最適化を通じて究極のパフォーマンスを実現し、埋め込みシステム開発に適しています。

Golang vs. Python：重要な違いと類似点 Apr 17, 2025 am 12:15 AM

GolangとPythonにはそれぞれ独自の利点があります。Golangは高性能と同時プログラミングに適していますが、PythonはデータサイエンスとWeb開発に適しています。 Golangは同時性モデルと効率的なパフォーマンスで知られていますが、Pythonは簡潔な構文とリッチライブラリエコシステムで知られています。

GolangとC：パフォーマンスのトレードオフ Apr 17, 2025 am 12:18 AM

GolangとCのパフォーマンスの違いは、主にメモリ管理、コンピレーションの最適化、ランタイム効率に反映されています。 1）Golangのゴミ収集メカニズムは便利ですが、パフォーマンスに影響を与える可能性があります。

パフォーマンスレース：ゴラン対c Apr 16, 2025 am 12:07 AM

GolangとCにはそれぞれパフォーマンス競争において独自の利点があります。1）Golangは、高い並行性と迅速な発展に適しており、2）Cはより高いパフォーマンスと微細な制御を提供します。選択は、プロジェクトの要件とチームテクノロジースタックに基づいている必要があります。

Golang vs. Python：長所と短所 Apr 21, 2025 am 12:17 AM

GolangisidealforBuildingsCalables Systemsduetoitsefficiency andConcurrency、Whilepythonexcelsinquickscriptinganddataanalysisduetoitssimplicityand vastecosystem.golang'ssignencouragesclean、readisinediteNeditinesinedinediseNabletinedinedinedisedisedioncourase

See all articles

パート I: DSL を構築するための式インタープリターの実装 - PEG パーサーの紹介

レクサーって何？

パーサーはどうですか？

パーサーを実装してみよう

PEGを詳しく見てみる

Example and code come first

The very first rule

More about the rule syntax

Resolving ambiguity

Let's have a try

Preparation

Generate and test the parser

Wrap it up

ホットAIツール

Undresser.AI Undress

AI Clothes Remover

Undress AI Tool

Clothoff.io

Video Face Swap

人気の記事

ホットツール

メモ帳++7.3.1

SublimeText3 中国語版

ゼンドスタジオ 13.0.1

ドリームウィーバー CS6

SublimeText3 Mac版

ホットトピック