Parser-Converter, Part 3: The Lexer of a Handwritten PHP-to-Python Compiler

高洛峰
Published: 2017-03-12 10:17:00
Original

This article covers the lexer of a handwritten PHP-to-Python compiler.

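I got the itch and, naturally, wanted to build something big: converting an entire PHP program to Python. Templates let you cut corners with regex matching, but this time there is no way around writing a PHP compiler.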

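I searched online and found that most "Python to xxx" transpilers work directly on the AST. They skip the most important parts, the Tokenizer and the Parser, and just write a Visitor. The rest are built on generators such as Antlr and produce a pile of code that is tiresome to read.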
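To make the AST shortcut concrete: Python ships an `ast` module that does the tokenizing and parsing for you, so only the visitor is left to write. Below is a minimal sketch, assuming Python 3.8 or later. The names `ToyEmitter` and `to_php_expr` are hypothetical, and the visitor handles only names, constants, and four arithmetic operators.

```python
import ast


class ToyEmitter(ast.NodeVisitor):
    """Hypothetical visitor: renders a tiny subset of Python expressions as PHP-style text."""

    OPS = {ast.Add: '+', ast.Sub: '-', ast.Mult: '*', ast.Div: '/'}

    def visit_Expression(self, node):
        # Root node produced by ast.parse(..., mode='eval').
        return self.visit(node.body)

    def visit_BinOp(self, node):
        op = self.OPS[type(node.op)]
        return '(%s %s %s)' % (self.visit(node.left), op, self.visit(node.right))

    def visit_Name(self, node):
        # PHP variables carry a leading dollar sign.
        return '$' + node.id

    def visit_Constant(self, node):
        return repr(node.value)

    def generic_visit(self, node):
        # Anything outside the supported subset is rejected explicitly.
        raise NotImplementedError(type(node).__name__)


def to_php_expr(source):
    tree = ast.parse(source, mode='eval')  # tokenizing and parsing happen here
    return ToyEmitter().visit(tree)


print(to_php_expr('a + b * 2'))  # prints ($a + ($b * 2))
```

Going the other way, from PHP to Python, has no such ready-made front end in the Python standard library, which is why the tokenizer and parser have to be written by hand.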

Since nobody wants to do this grunt work, I will give it a try and write a PHP compiler by hand, in three parts: Tokenizer, Parser, and Visitor.

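I dug out the Dragon Book and the Tiger Book for reference and went through PHP carefully. Only then did I realize how many features PHP has. Writing a compiler for it is exhausting.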

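The lexer is simple: it is just an automaton. I designed a data structure to hold the automaton and then programmed against it in the crudest way possible, without worrying about performance. This is a one-off job.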
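The approach can be sketched as a table of states, each holding a list of regex rules, plus a small loop that walks the input. The code below is only an illustration under stated assumptions, not the author's driver: the cut-down `RULES` table, the `tokenize` function, and the longest-match policy are all hypothetical choices.

```python
import re

# Hypothetical, cut-down rule table: state name -> list of rules.
# Each rule has a token name, a regex, and the state to enter next
# ('' means "stay in the current state").
RULES = {
    'default': [
        {'name': 'open', 'token': r'<\?php', 'next': 'php'},
        {'name': 'html', 'token': r'[^<]+|<', 'next': ''},
    ],
    'php': [
        {'name': 'close', 'token': r'\?>', 'next': 'default'},
        {'name': 'dnum', 'token': r'[0-9]*\.[0-9]+|[0-9]+\.[0-9]*', 'next': ''},
        {'name': 'lnum', 'token': r'[0-9]+', 'next': ''},
        {'name': 'label', 'token': r'[a-zA-Z_][a-zA-Z0-9_]*', 'next': ''},
        {'name': 'space', 'token': r'\s+', 'next': ''},
        {'name': 'symbol', 'token': r'[;=+\-*/$().,]', 'next': ''},
    ],
}


def tokenize(text, rules=RULES, state='default'):
    """Yield (name, lexeme) pairs, taking the longest match at each position."""
    pos = 0
    while pos < len(text):
        best_rule, best_len = None, 0
        for rule in rules[state]:
            m = re.compile(rule['token']).match(text, pos)
            # Zero-length matches are ignored so the loop always advances.
            if m and m.end() - pos > best_len:
                best_rule, best_len = rule, m.end() - pos
        if best_rule is None:
            raise SyntaxError('unexpected character %r at offset %d' % (text[pos], pos))
        yield best_rule['name'], text[pos:pos + best_len]
        pos += best_len
        state = best_rule['next'] or state


tokens = [t for t in tokenize('<?php $x = 3.14; ?>') if t[0] != 'space']
print(tokens)
```

Taking the longest match means `3.14` comes out as one `dnum` token rather than an `lnum` followed by leftovers, regardless of the order in which the rules are listed.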

Writing it went fairly quickly. Debugging did not go so smoothly, but I am not going to admit that, ha.

The automaton is not complicated, so I am posting it here for everyone to look at. Corrections are welcome.


self.statemachine = {
            # runtime bookkeeping for the scanner
            'current': {
                'state': 'default', 'content': '', 'line': 0},
            # outside PHP tags
            'default': [
                {'name': 'open', 'next': 'php', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
                 'token': r'<\?'},
                {'name': 'open', 'next': 'php', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
                 'token': r'<\?php'}],
            # inside PHP code
            'php': [
                {'name': 'close', 'next': 'default', 'extra': 0,
                 'token': r'\?>', 'start': 0, 'end': 0, 'cache': ''},
                {'name': 'lnum', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
                 'token': r'[0-9]+'},
                {'name': 'dnum', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
                 'token': r'([0-9]*\.[0-9]+)|([0-9]+\.[0-9]*)'},
                {'name': 'exponent', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
                 'token': r'(([0-9]+|([0-9]*\.[0-9]+)|([0-9]+\.[0-9]*))[eE][+-]?[0-9]+)'},
                {'name': 'hnum', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
                 'token': r'0x[0-9a-fA-F]+'},
                {'name': 'bnum', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
                 'token': r'0b[01]+'},
                {'name': 'label', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
                 'token': r'[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*'},
                {'name': 'comment', 'next': 'commentline', 'extra': 1,
                 'token': r'//', 'start': 0, 'end': 0, 'cache': ''},
                {'name': 'comment', 'next': 'commentline', 'extra': 1,
                 'token': r'#', 'start': 0, 'end': 0, 'cache': ''},
                {'name': 'comment', 'next': 'comment', 'extra': 1,
                 'token': r'/\*', 'start': 0, 'end': 0, 'cache': ''},
                {'name': 'string', 'next': 'string1', 'extra': 1,
                 'token': r'\'', 'start': 0, 'end': 0, 'cache': ''},
                {'name': 'string', 'next': 'string2', 'extra': 1,
                 'token': r'"', 'start': 0, 'end': 0, 'cache': ''},
                # the hyphen is escaped so that it is a literal, not a range
                {'name': 'symbol', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
                 'token': r'[\\\{\};:,\.\[\]\(\)\|\^&\+\-/\*=%!~$<>\?@]'}],
            # inside a single-quoted string
            'string1': [
                {'name': 'string', 'next': 'php', 'extra': 0,
                 'token': r'\'', 'start': 0, 'end': 0, 'cache': ''},
                {'name': 'string', 'next': 'escape1', 'extra': 1,
                 'token': r'\\', 'start': 0, 'end': 0, 'cache': ''},
                # empty pattern: catch-all rule
                {'name': 'string', 'next': '', 'extra': 1,
                 'token': r'', 'start': 0, 'end': 0, 'cache': ''}],
            # character following a backslash in a single-quoted string
            'escape1': [
                {'name': 'string', 'next': 'string1', 'extra': 1,
                 'token': r'.', 'start': 0, 'end': 0, 'cache': ''}],
            # inside a double-quoted string
            'string2': [
                # closing double quote
                {'name': 'string', 'next': 'php', 'extra': 0,
                 'token': r'"', 'start': 0, 'end': 0, 'cache': ''},
                {'name': 'string', 'next': 'escape2', 'extra': 1,
                 'token': r'\\', 'start': 0, 'end': 0, 'cache': ''},
                {'name': 'string', 'next': '', 'extra': 1,
                 'token': r'', 'start': 0, 'end': 0, 'cache': ''}],
            # character following a backslash in a double-quoted string
            'escape2': [
                {'name': 'string', 'next': 'string2', 'extra': 1,
                 'token': r'.', 'start': 0, 'end': 0, 'cache': ''}],
            # inside a // or # comment, up to the end of the line
            'commentline': [
                # \r\n is tried first so that it is consumed as one line ending
                {'name': 'comment', 'next': 'php', 'extra': 0,
                 'token': r'(\r\n|\r|\n)', 'start': 0, 'end': 0, 'cache': ''},
                {'name': 'comment', 'next': 'php', 'extra': 0,
                 'token': r'', 'start': 0, 'end': 0, 'cache': ''}],
            # inside a /* ... */ comment
            'comment': [
                {'name': 'comment', 'next': 'php', 'extra': 0,
                 'token': r'\*/', 'start': 0, 'end': 0, 'cache': ''},
                {'name': 'comment', 'next': '', 'extra': 1,
                 'token': r'', 'start': 0, 'end': 0, 'cache': ''}]}
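A table like this is easy to get subtly wrong, for example with a mistyped state name in `next` or a regex that does not compile. The helper below is a hypothetical sanity check, not part of the original code. It assumes only what the table above shows: states are the list-valued entries, and each rule has `token` and `next` keys. The `sample` table is a tiny excerpt in the same shape, with one deliberate mistake.

```python
import re


def validate(machine):
    """Return a list of problems found in a state table (an empty list means none found)."""
    problems = []
    # Only list-valued entries are states; 'current' holds runtime bookkeeping.
    states = {k: v for k, v in machine.items() if isinstance(v, list)}
    for state, rules in states.items():
        for i, rule in enumerate(rules):
            try:
                re.compile(rule['token'])
            except re.error as exc:
                problems.append('%s[%d]: bad regex: %s' % (state, i, exc))
            nxt = rule['next']
            # An empty 'next' means "stay in the current state".
            if nxt and nxt not in states:
                problems.append('%s[%d]: unknown next state %r' % (state, i, nxt))
    return problems


# Tiny excerpt in the same shape as the table above, with one deliberate mistake.
sample = {
    'current': {'state': 'default', 'content': '', 'line': 0},
    'default': [{'name': 'open', 'next': 'php', 'token': r'<\?php'}],
    'php': [{'name': 'close', 'next': 'default', 'token': r'\?>'},
            {'name': 'string', 'next': 'string3', 'token': r'"'}],
}
print(validate(sample))  # prints ["php[1]: unknown next state 'string3'"]
```

Running such a check once at start-up costs little and catches typos before the scanner ever sees input.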
Source: php.cn