1부: DSL 구축을 위한 표현식 해석기 구현

집

백엔드 개발

Golang

1부: DSL 구축을 위한 표현식 해석기 구현 - PEG 파서 소개

PHPz

Aug 05, 2024 pm 08:40 PM

Part I: Implement an expression interpreter for building DSL - Introduce the PEG parser

지난 게시물에서 제가 이 프로젝트를 시작하는 이유와 방법, 그리고 최종적으로 DSL이 어떤 모습일지 소개했습니다. 이번 포스팅부터 전체 프로젝트의 구현을 여러분과 공유하겠습니다.

일반적으로 언어를 구현할 때 가장 먼저 떠오르는 것은 렉서와 파서입니다. 그래서 이번 포스팅에서는 너무 혼란스럽지 않도록 구체적인 세부 사항은 있지만 개념은 덜면서 DSL을 구현하는 방법을 소개하겠습니다.

렉서는 무엇입니까?

일반적으로 어휘 분석은 어휘 분석이나 토큰화에 사용됩니다. "We will rock you!"(퀸의 유명한 로큰롤 음악)라는 구절을 예로 들어보겠습니다. 영어 문법 규칙에 따라 토큰화한 후 [Subject("We"), Auxiliary("will"), Verb("rock"), Object("you"), Punctuation("!") 목록을 출력합니다. )]. 그래서 렉서는 어휘적 의미에 따라 텍스트를 특정 유형으로 분류하는 데 주로 사용됩니다. 구문은 실제로 문자나 단어 대신 어휘 요소로 구성되기 때문에 이는 우리에게 중요합니다.

일반적으로 우리는 Ragel, nex 등과 같은 Regular Express를 구문 분석할 수 있는 일부 코드 생성기를 사용하여 어휘 분석기를 구현합니다. 하지만 Rob Pike가 Go에서 Lexical Scanning을 살펴본 후 어휘분석기를 구현하는 것이 얼마나 쉬운지 놀라실 것입니다. 그는 어휘 분석기의 핵심이라고 생각되는 유한 상태 기계를 구현하기 위해 흥미로운 패턴을 도입했습니다.

파서는 어떻습니까?

그럼 파서는 어떨까요? 그것은 무엇을 위해 사용됩니까? 기본적으로 파서는 문법이라고도 하는 지정된 패턴으로 어휘 요소 목록을 식별하는 데 사용됩니다. 이전에 소개한 "We will rock you!"를 예로 들어 보겠습니다. 이는 '미래 시제' 패턴과 일치하는 [Subject("We"), Auxiliary("will"), Verb("rock"), Object("you"), Punctuation("!")]의 시퀀스를 생성합니다. 영어로 된 문법. 이것이 바로 파서가 하는 일이며 소위 '구문 분석'이라고 합니다.

좀 더 컴퓨터적인 방식으로 또 다른 예를 들어보겠습니다. 1 + 2 * 3 과 같은 표현은 어떻습니까? 어휘분석기에 의해 [Number(1), Operator(+), Number(2), Operator(*), Number(3)]로 변환될 것은 분명하지만, 이 시퀀스는 파서에 의해 변환될 것입니다. 일반적인 수학적 표현 문법으로? 일반적으로 어휘 요소 시퀀스는 다음과 같이 파서에 의해 추상 구문 트리(또는 줄여서 AST)로 변환됩니다.

로그인 후 복사

추상 구문 트리를 사용하면 문법 규칙에 따라 올바른 순서로 구문을 분석할 수 있습니다.

파서를 구현해보자

이제 자체적으로 파서를 구현해 보겠습니다. 완전히 우리 스스로는 아니지만, 파서를 위한 코드를 생성하는 데 도움이 되는 도구가 여전히 필요합니다. 올바르게 구현하기 어렵고 손으로 쓰는 파서는 유지 관리하기 어려울 수 있기 때문입니다. 그렇게 한다고 해도 성능이 저하될 수 있습니다. 가난해요.

다행히도 문법 정의 파일로 golang 코드를 생성하는 데 도움이 되는 성숙한 파서 생성 도구가 많이 있습니다. 내 마음 속에 떠오른 첫 번째 선택은 파서용 go 코드를 생성하는 공식 도구인 go-yacc입니다. 저는 SQL 분석기를 작성하는 데 이 도구를 사용했는데 다음과 같은 이유로 별로 즐겁지 않았습니다.

외부 어휘 분석기가 필요합니다.
문서가 부족합니다.
공용체 정의와 값 유형 선언은 모두 상당히 혼란스럽습니다.

"새로운 것을 시도해 보는 것은 어떨까요?" 내 생각에, 이제 놀라운 도구 페그를 사용하여 하나의 단일 문법 파일, 하나의 단일 인터페이스에서 어휘 분석기와 파서 모두를 구현할 수 있었습니다. 페그를 자세히 살펴보겠습니다.

PEG에 대해 자세히 살펴보기

PEG는 Bryan Ford가 2004년에 도입한 Parsing Expression Grammar의 약자로, 프로그래밍 언어 구문과 프로토콜을 기술하고 표현하는 데 사용되는 전통적인 Context Free Grammar의 대안입니다.

지난 수십 년 동안 우리는 파서 코드 생성을 위해 CFG를 지원하는 yacc, bison과 같은 파서 생성 도구를 사용해 왔으며, 이전에 이러한 도구를 사용해 본 적이 있다면 모호함을 피하고 어휘분석기 또는 정규식. 실제로 프로그래밍 언어의 구문은 어휘 요소의 패턴일 뿐만 아니라 CFG에서 누락된 어휘 요소의 규칙이기도 합니다. 따라서 yacc와 같은 도구를 사용할 때는 직접 어휘 분석기를 구현해야 합니다. 더욱이 CFG에서 모호함(예: 더하기와 곱하기의 우선순위 등을 확인하세요)을 피하기 위해 각 연산자의 우선순위를 정의해야 합니다. 이러한 모든 중요한 사실로 인해 파서 개발이 불필요하게 어려워집니다.

But thanks to Bryan Ford, now we have another good choice, the PEG that allows us to define the lexical and syntax rule all in one single file with a tight DSL and resolve ambiguity in an elegant and simple way. Let me show you how easy it can be done with peg.

Example and code come first

I gonna take examples from my gendsl library which implements a lisp-like syntax(you can check it out here). Here is a simple snippet that can parse hex and decimal number literals in the golang style:

package playground

type parser Peg {
}

Script          <- Value EOF

EOF             <- !.

Value           <- IntegerLiteral

IntegerLiteral  <- [+\-]? ('0' ('x' / 'X') HexNumeral 
                           / DecimalNumeral ) [uU]?

HexNumeral      <- HexDigit ([_]* HexDigit)* / '0'

HexDigit        <- [0-9] / [A-F] / [a-f]

DecimalNumeral  <- [1-9] ([_]* [0-9])* / '0'     

# ...

로그인 후 복사

The first line package gendsl is package declaration which decides which package the generated golang file belongs to. The following type declaration type parser Peg {} used to define the parser type, which we will use it later for evaluation but you can ignore it for now.

After the parser type declaration we can start to define your syntax rule till the end. This is different from the workflow I used to work with with yacc when I have to define a union type and a lot of token types before I can actually define my grammar, which could be really confusing. Anyway, let's take a quick look at the grammar definition.

The very first rule

If you have worked with CFG before you might find the definition DSL syntax quite familiar. The right hand side of the '<-' refers to the pattern of lexical elements, which could be some other patterns or character sequence, and the left hand side is the name of the pattern. Pretty easy, right?

Let's pay attention to the first pattern rule here since the first rule is always to entry point of the parser. The entry point Script here is consist of two parts, one is a rule refers to Value which is consist of a sequence of specified characters(we will get back to this later), the other one EOF is kind of interesting. Let's jump to the next rule to find the pattern of EOF. As you can see, EOF is consist of !.. What does !. mean? The !actually means NOT, and . means any character, so !. means NOTHING AT ALL or End Of File if you will. As a result whenever the parser find there is no character to read, it will stop here and treat it as an dummy rule call EOF which might produces the rule Script. This is quite a common pattern for PEG.

More about the rule syntax

So much like the regular expression(RE), the syntax of PEG is simple:

. stands for any character.
character set like [a-z] is also supported.
X matches one character if X is a character or a pattern when X is the name of an rule.
X? matches one character or pattern or none, while X* matches zero or more and 'X+' matches one or more.
X \ Y matches X or Y where X and Y could be characters, patterns or rule name.

Take the rule of DecimalNumeral as an example. The first part [1-9] means the start of an DecimalNumeral must be one of a digit ranging from 1 to 9, ([_]* [0-9])* means starting from the second position, all character, if there is any, must all be digit(0-9) that has might have no '_' or more than one '_' as its prefix so it could match string like "10_2_3". Otherwise, indicated by the operator \, it could also just be one single character '0' which means 0 obviously .

Resolving ambiguity

I'd like to spend more time to explain the or operator \, since it is quite important as the solution to the ambiguity. The PEG will always try to match the first pattern and then the second, the third and so on til it finds one matched, which is considered as earliest-match-first. For example, a string "ab" will never be able to match the grammar G <- 'a' / 'a' 'b', since the first character 'a' will be reduced to G but the 'b' left cannot match anything. By the way, CFG doesn't allow such a rule and will throw the reduce/shift conflict error.

There is no much syntax left, you can explore them yourself in the pointlander/peg README or peg doc.

Let's have a try

Now that we already have a simple syntax rule prepared above, though it is not the whole grammar for the gendsl project but it can still parse some numbers. Anyway let's generate some code and see if it works as we expect.

Preparation

First we have to install the peg binary tool for code generate following this guide, then we gonna setup our workspace directory for playing:

> mkdir peg_playground && peg_playground 
> go mod init peg_playground 
> touch grammar.peg

로그인 후 복사

Paste the grammar we have before into the peg_playground/grammar.peg, now we should be able to genreate the code using the generate tool but why not make a Makefile in peg_playground/makefile

GO := go

.SUFFIXES: .peg .go

grammar.go: grammar.peg
    peg -switch -inline -strict -output ./$@ $<

all: grammar.go

clean:
    rm grammar.go

로그인 후 복사

Generate and test the parser

Now that we have everything ready, let's generate the code for parser:

make grammar.go

로그인 후 복사

After running the command, you should see a generated grammar.go in the workspace directory. Let's write a function to parse a string and access our parser:

// peg_playground/parser.go
package playground

func PrintAST(script string) error {
    parser := &parser{
        Buffer: script,
        Pretty: true,
    }

    if err := parser.Init(); err != nil {
        return err
    }
    if err := parser.Parse(); err != nil {
        return err
    }

    parser.PrintSyntaxTree()
    return nil
}

로그인 후 복사

The snippet here is pretty simple, it initializes the parser, parses the script we pass to it and print the syntax tree in final. Let's write an unit test to see if it works:

// peg_playground/parser_test.go
package playground

import (
    "testing"
)

func TestPrintTree(t *testing.T) {
    if err := PrintAST(`0x123`); err != nil {
        t.Fatal(err)
    }
    t.Log("-----------------------------------------------------")

    if err := PrintAST(`10_2_3`); err != nil {
        t.Fatal(err)
    }
    t.Log("-----------------------------------------------------")
}

로그인 후 복사

The test function TestPrintTree calls the PrintAST and check the error. Let's run it now and see what it gonna print:

go test . -v

로그인 후 복사

Now we should see the whole syntax tree in the output:

=== RUN   TestPrintTree
Script "0x123"
 Value "0x123"
  IntegerLiteral "0x123"
   HexNumeral "123"
    HexDigit "1"
    HexDigit "2"
    HexDigit "3"
    parser_test.go:11: -----------------------------------------------------
Script "10_2_3"
 Value "10_2_3"
  IntegerLiteral "10_2_3"
   DecimalNumeral "10_2_3"
    parser_test.go:16: -----------------------------------------------------
--- PASS: TestPrintTree (0.00s)
PASS
ok      playground      0.649s

로그인 후 복사

It looks great, right? Everything works as we expected. No syntax error thrown and it prints every rule matched and the string it matches in a format of tree, which could be really useful when debugging.

Wrap it up

In this post, I have introduced you the two basic but significant parts of interpreter programming language:

Lexer, for lexical analysis that transforms a string into a sequence of lexical elements.
Parser, for syntax analysis that identify the the pattern (so called grammar) in the lexical elements and produces a syntax tree.

And then I introduce the PEG for parser code generating, and address its advantages comparing the traditional CFG:

Lexer rule integrated, no standalone lexer need to be implemented.
Simple, regular expression like syntax to start up fast.
No ambiguity, no reduce/shift conflict, always earlier-match-first.

Finally I have a tiny demonstration of how to generate parser with PEG, which is the basis of our interpreter.
In next post, I will walk you through the gendsl grammar in detail.
Thank you for checking this post, hope you enjoy it.

위 내용은 1부: DSL 구축을 위한 표현식 해석기 구현 - PEG 파서 소개의 상세 내용입니다. 자세한 내용은 PHP 중국어 웹사이트의 기타 관련 기사를 참조하세요!

본 웹사이트의 성명

본 글의 내용은 네티즌들의 자발적인 기여로 작성되었으며, 저작권은 원저작자에게 있습니다. 본 사이트는 이에 상응하는 법적 책임을 지지 않습니다. 표절이나 침해가 의심되는 콘텐츠를 발견한 경우 admin@php.cn으로 문의하세요.

핫 AI 도구

Undresser.AI Undress

사실적인 누드 사진을 만들기 위한 AI 기반 앱

AI Clothes Remover

사진에서 옷을 제거하는 온라인 AI 도구입니다.

Undress AI Tool

무료로 이미지를 벗다

Clothoff.io

AI 옷 제거제

Video Face Swap

완전히 무료인 AI 얼굴 교환 도구를 사용하여 모든 비디오의 얼굴을 쉽게 바꾸세요!

뜨거운 도구

메모장++7.3.1

사용하기 쉬운 무료 코드 편집기

SublimeText3 중국어 버전

중국어 버전, 사용하기 매우 쉽습니다.

스튜디오 13.0.1 보내기

강력한 PHP 통합 개발 환경

드림위버 CS6

시각적 웹 개발 도구

SublimeText3 Mac 버전

신 수준의 코드 편집 소프트웨어(SublimeText3)

뜨거운 주제

자바 튜토리얼

1673

Cakephp 튜토리얼

1429

라라벨 튜토리얼

1333

PHP 튜토리얼

1278

C# 튜토리얼

1257

Related knowledge

Golang vs. Python : 성능 및 확장 성 Apr 19, 2025 am 12:18 AM

Golang은 성능과 확장 성 측면에서 Python보다 낫습니다. 1) Golang의 컴파일 유형 특성과 효율적인 동시성 모델은 높은 동시성 시나리오에서 잘 수행합니다. 2) 해석 된 언어로서 파이썬은 천천히 실행되지만 Cython과 같은 도구를 통해 성능을 최적화 할 수 있습니다.

Golang 및 C : 동시성 대 원시 속도 Apr 21, 2025 am 12:16 AM

Golang은 동시성에서 C보다 낫고 C는 원시 속도에서 Golang보다 낫습니다. 1) Golang은 Goroutine 및 Channel을 통해 효율적인 동시성을 달성하며, 이는 많은 동시 작업을 처리하는 데 적합합니다. 2) C 컴파일러 최적화 및 표준 라이브러리를 통해 하드웨어에 가까운 고성능을 제공하며 극도의 최적화가 필요한 애플리케이션에 적합합니다.

GOT GO로 시작 : 초보자 가이드 Apr 26, 2025 am 12:21 AM

goisidealforbeginnersandsuitableforcloudandnetworkservicesduetoitssimplicity, 효율성, 및 콘크리 론 피처

Golang vs. C : 성능 및 속도 비교 Apr 21, 2025 am 12:13 AM

Golang은 빠른 개발 및 동시 시나리오에 적합하며 C는 극도의 성능 및 저수준 제어가 필요한 시나리오에 적합합니다. 1) Golang은 쓰레기 수집 및 동시성 메커니즘을 통해 성능을 향상시키고, 고전성 웹 서비스 개발에 적합합니다. 2) C는 수동 메모리 관리 및 컴파일러 최적화를 통해 궁극적 인 성능을 달성하며 임베디드 시스템 개발에 적합합니다.

Golang vs. Python : 주요 차이점과 유사성 Apr 17, 2025 am 12:15 AM

Golang과 Python은 각각 고유 한 장점이 있습니다. Golang은 고성능 및 동시 프로그래밍에 적합하지만 Python은 데이터 과학 및 웹 개발에 적합합니다. Golang은 동시성 모델과 효율적인 성능으로 유명하며 Python은 간결한 구문 및 풍부한 라이브러리 생태계로 유명합니다.

Golang 및 C : 성능 상충 Apr 17, 2025 am 12:18 AM

Golang과 C의 성능 차이는 주로 메모리 관리, 컴파일 최적화 및 런타임 효율에 반영됩니다. 1) Golang의 쓰레기 수집 메커니즘은 편리하지만 성능에 영향을 줄 수 있습니다. 2) C의 수동 메모리 관리 및 컴파일러 최적화는 재귀 컴퓨팅에서 더 효율적입니다.

공연 경주 : 골랑 대 c Apr 16, 2025 am 12:07 AM

Golang과 C는 각각 공연 경쟁에서 고유 한 장점을 가지고 있습니다. 1) Golang은 높은 동시성과 빠른 발전에 적합하며 2) C는 더 높은 성능과 세밀한 제어를 제공합니다. 선택은 프로젝트 요구 사항 및 팀 기술 스택을 기반으로해야합니다.

Golang vs. Python : 장단점 Apr 21, 2025 am 12:17 AM

golangisidealforbuildingscalablesystemsdueToitsefficiencyandconcurrency

See all articles

1부: DSL 구축을 위한 표현식 해석기 구현 - PEG 파서 소개

렉서는 무엇입니까?

파서는 어떻습니까?

파서를 구현해보자

PEG에 대해 자세히 살펴보기

Example and code come first

The very first rule

More about the rule syntax

Resolving ambiguity

Let's have a try

Preparation

Generate and test the parser

Wrap it up

핫 AI 도구

Undresser.AI Undress

AI Clothes Remover

Undress AI Tool

Clothoff.io

Video Face Swap

인기 기사

뜨거운 도구

메모장++7.3.1

SublimeText3 중국어 버전

스튜디오 13.0.1 보내기

드림위버 CS6

SublimeText3 Mac 버전

뜨거운 주제