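The previous post briefly described the basic structure of the Abstract Form. Now let's look at how to use the Abstract Form to generate and modify modules dynamically.
In the first post, "Exploring the Erlang Abstract Form: Generating and Obtaining", we said there are two ways to obtain the Abstract Form: read the debug_info from a beam file, or parse the source code directly.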
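Supplying source code text
The most useful thing to do when modifying a module is to add new functions. We can get the Abstract Form of an existing module from its beam file, but when functions have to be added dynamically, the most obvious approach is to supply the function's source code as text.
Parsing source code usually takes two tools: a scanner and a parser. The basic scanner that Erlang provides is erl_scan, and the parser is erl_parse. Let's look at their documentation.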
MODULE
erl_scan
MODULE SUMMARY
The Erlang Token Scanner
DESCRIPTION
This module contains functions for tokenizing characters into Erlang tokens.
EXPORTS
string(CharList, StartLine) -> {ok, Tokens, EndLine} | Error
string(CharList) -> {ok, Tokens, EndLine} | Error
Types:
CharList = string()
StartLine = EndLine = Line = integer()
Tokens = [{atom(),Line}|{atom(),Line,term()}]
Error = {error, ErrorInfo, EndLine}
Takes the list of characters CharList and tries to scan (tokenize) them. Returns {ok, Tokens, EndLine}, where Tokens are the Erlang tokens from CharList. EndLine is the last line where a token was found.
StartLine indicates the initial line when scanning starts. string/1 is equivalent to string(CharList,1).
{error, ErrorInfo, EndLine} is returned if an error occurs. EndLine indicates where the error occurred.
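erl_scan:string scans a string of source text. If no error occurs, the result tuple contains all the tokens; otherwise it contains the error information and the line number where the error occurred.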
Eshell V5.5 (abort with ^G)
1> c(simplest, [debug_info]).
{ok,simplest}
2> {ok, Tokens, EndLine} = erl_scan:string("test() -> ok.").
{ok,[{atom,1,test},{'(',1},{')',1},{'->',1},{atom,1,ok},{dot,1}],1}
3>
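We pass in the simplest possible function, test. The Tokens returned by erl_scan:string contain six tokens: the function name test (an atom), the left parenthesis, the right parenthesis, the -> that ends the function head, the atom ok, and the terminating dot.
With these Tokens, we can use erl_parse to generate the Abstract Form. The erl_parse documentation says: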
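The scanning step can be wrapped in a small helper. This is only a sketch: the module name scan_demo and the function tokens/1 are made up for illustration, and the only library call it relies on is erl_scan:string/1 as documented above.

```erlang
-module(scan_demo).
-export([tokens/1]).

%% Scan a source string with erl_scan:string/1.
%% Returns {ok, Tokens} on success, or {error, ErrorInfo}
%% when the scanner rejects the text.
tokens(Source) ->
    case erl_scan:string(Source) of
        {ok, Tokens, _EndLine} ->
            {ok, Tokens};
        {error, ErrorInfo, _ErrorLine} ->
            {error, ErrorInfo}
    end.
```

Calling scan_demo:tokens("test() -> ok.") should yield the same six tokens as the shell session above.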
MODULE
erl_parse
MODULE SUMMARY
The Erlang Parser
DESCRIPTION
This module is the basic Erlang parser which converts tokens into the abstract form of either forms (i.e., top-level constructs), expressions, or terms. The Abstract Format is described in the ERTS User's Guide. Note that a token list must end with the dot token in order to be acceptable to the parse functions (see erl_scan).
EXPORTS
parse_form(Tokens) -> {ok, AbsForm} | {error, ErrorInfo}
Types:
Tokens = [Token]
Token = {Tag,Line} | {Tag,Line,term()}
Tag = atom()
AbsForm = term()
ErrorInfo = see section Error Information below.
This function parses Tokens as if it were a form. It returns:
- {ok, AbsForm}: The parsing was successful. AbsForm is the abstract form of the parsed form.
- {error, ErrorInfo}: An error occurred.
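erl_parse can parse several kinds of token lists, including expressions, terms and forms. What we need is parse_form, the function that parses a complete form. As before, if parsing succeeds, the returned tuple contains the Abstract Form that the tokens represent; otherwise it contains the syntax error information.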
3> erl_parse:parse_form(Tokens).
{ok,{function,1,test,0,[{clause,1,[],[],[{atom,1,ok}]}]}}
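As we can see, the result contains the Abstract Form of the function.
Compiling the Abstract Form
Once we have the Abstract Form, we can compile it and load it.
The documentation of the compile module says: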
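Scanning and parsing are usually done together, so the two steps can be combined. The sketch below is illustrative only: parse_demo and function_form/1 are hypothetical names, and it assumes the source text holds exactly one form terminated by a dot, as the erl_parse documentation requires.

```erlang
-module(parse_demo).
-export([function_form/1]).

%% Turn the source text of a single form (it must end with a dot)
%% into its Abstract Form by chaining erl_scan and erl_parse.
function_form(Source) ->
    case erl_scan:string(Source) of
        {ok, Tokens, _EndLine} ->
            %% Returns {ok, AbsForm} or {error, ErrorInfo}.
            erl_parse:parse_form(Tokens);
        {error, ErrorInfo, _ErrorLine} ->
            {error, ErrorInfo}
    end.
```

Both scanner and parser failures come back as {error, ErrorInfo}, so the caller only has to handle two shapes of result.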
forms(Forms)
Is the same as forms(Forms, [verbose,report_errors,report_warnings]).
forms(Forms, Options) -> CompRet
Types:
Forms = [Form]
CompRet = BinRet | ErrRet
BinRet = {ok,ModuleName,BinaryOrCode} | {ok,ModuleName,BinaryOrCode,Warnings}
BinaryOrCode = binary() | term()
ErrRet = error | {error,Errors,Warnings}
Analogous to file/1, but takes a list of forms (in the Erlang abstract format representation) as first argument. The option binary is implicit; i.e., no object code file is produced. Options that would ordinarily produce a listing file, such as 'E', will instead cause the internal format for that compiler pass (an Erlang term; usually not a binary) to be returned instead of a binary.
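compile:forms takes the forms to be compiled as its argument and compiles them into binary object code that the virtual machine can execute. It is essentially the same as compile:file, except that it takes forms, whereas compile:file takes the name of the file to compile.
In fact, we have been using compile:file all along. Typing c(File, Options) in the erl shell compiles a file: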
c(File) -> {ok, Module} | error
c(File, Options) -> {ok, Module} | error
Types:
File = Filename | Module
Filename = string() | atom()
Options = [Opt] -- see compile:file/2
Module = atom()
c/1,2 compiles and then purges and loads the code for a file. Options defaults to []. Compilation is equivalent to:
compile:file(File, Options ++ [report_errors, report_warnings])
Note that purging the code means that any processes lingering in old code for the module are killed without warning. See code/3 for more information.
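c calls compile:file to compile the .erl file, then purges the previously existing code from memory, and then loads the new code.
To compile an Abstract Form, we must supply the complete forms for the whole module. That means we also need to provide the module attribute, the export attribute, and so on.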
1> c(simplest,[debug_info]).
{ok,simplest}
2> {ok, Tokens, EndLine} = erl_scan:string("test() -> ok.").
{ok,[{atom,1,test},{'(',1},{')',1},{'->',1},{atom,1,ok},{dot,1}],1}
3> {ok, Forms} = erl_parse:parse_form(Tokens).
{ok,{function,1,test,0,[{clause,1,[],[],[{atom,1,ok}]}]}}
4>
4> NewForms = [{attribute, 1, module, simplest},{attribute, 2, export, [{test,0}]}, Forms].
[{attribute,1,module,simplest},
{attribute,2,export,[{test,0}]},
{function,1,test,0,[{clause,1,[],[],[{atom,1,ok}]}]}]
5> compile:forms(NewForms).
{ok,simplest,
<<70,79,82,49,0,0,1,188,66,69,65,77,65,116,111,109,0,0,0,55,0,0,0,6,7,...>>}
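The same assembly step can be written as a function. This is a sketch under assumptions: compile_demo and compile_module/3 are invented names, the attribute line numbers 1 and 2 are arbitrary, and the return_errors option is passed so that failures come back as a tuple instead of the bare atom error.

```erlang
-module(compile_demo).
-export([compile_module/3]).

%% Build the forms of a complete module from a module name, an export
%% list such as [{test,0}], and a list of function forms, then compile
%% them in memory with compile:forms/2.
compile_module(Module, Exports, FunctionForms) ->
    Forms = [{attribute, 1, module, Module},
             {attribute, 2, export, Exports}
             | FunctionForms],
    case compile:forms(Forms, [return_errors]) of
        {ok, Module, Binary} ->
            {ok, Binary};
        {ok, Module, Binary, _Warnings} ->
            {ok, Binary};
        Failure ->
            %% error | {error, Errors, Warnings}
            {error, Failure}
    end.
```

With the function form parsed earlier, compile_demo:compile_module(simplest, [{test,0}], [Form]) should return {ok, Binary}, where Binary is the object code that the next section loads.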
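Loading the newly compiled object code
Another benefit of reading the documentation above is that it tells us how to load the code after compiling it. The code module defines these functions: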
purge(Module) -> true | false
Purges the code for Module, that is, removes code marked as old. If some processes still linger in the old code, these processes are killed before the code is removed.
Returns true if successful and any process needed to be killed, otherwise false.
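purge removes the code of the module that is marked as old. If any process is still running that old code, the process is killed. Note that although purge always succeeds, it returns true only when some process had to be killed.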
load_binary(Module, Filename, Binary) -> {module, Module} | {error, What}
Types:
Module = atom()
Filename = string()
What = sticky_directory | badarg | term()
This function can be used to load object code on remote Erlang nodes. It can also be used to load object code where the file name and module name differ. This, however, is a very unusual situation and not recommended. The parameter Binary must contain object code for Module. Filename is only used by the code server to keep a record of from which file the object code for Module comes. Accordingly, Filename is not opened and read by the code server.
Returns {module, Module} if successful, or {error, sticky_directory} if the object code resides in a sticky directory, or {error, badarg} if any argument is invalid. Also if the loading fails, an error tuple is returned. See erlang:load_module/2 for possible values of What.
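load_binary loads the compiled object code so that Module can be used by the program. If the object code resides in a sticky directory, the replacement may fail. The sticky directories are those of Erlang's own runtime system, including kernel, stdlib and compiler. To keep Erlang running properly, these directories are protected by default and are regarded as sticky.
Putting it together
With what we have discussed so far, we can run a complete experiment.
Suppose we have the following program, simplest.erl: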
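Purging and loading usually go together, so they can be paired in one helper. A minimal sketch, assuming the binary came from compile:forms; load_demo and reload/2 are hypothetical names, and the file name handed to load_binary is just a label derived from the module name.

```erlang
-module(load_demo).
-export([reload/2]).

%% Remove any old version of Module, then load Binary as its
%% current code. Returns {module, Module} | {error, What}.
reload(Module, Binary) ->
    %% The return value only says whether a process had to be killed,
    %% so it is ignored here.
    _ = code:purge(Module),
    %% The file name is only recorded by the code server; it is never
    %% opened or read.
    Filename = atom_to_list(Module) ++ ".erl",
    code:load_binary(Module, Filename, Binary).
```

Purging first matters when the module has already been reloaded once: the code server keeps at most two versions of a module, and the old one has to go before a third can be loaded.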
-module(simplest).
-export([foo/0]).
foo() ->
io:format("foo~n").
Let's run the experiment step by step in erl.
1> c(simplest,[debug_info]).
{ok,simplest}
2> simplest:foo().
foo
ok
3> simplest:test().
=ERROR REPORT==== 18-Aug-2006::15:06:17 ===
Error in process <0.32.0> with exit value: {undef,[{simplest,test,[]},{erl_eval,do_apply,5},{shell,exprs,6},{shell,eval_loop,3}]}
** exited: {undef,[{simplest,test,[]},
{erl_eval,do_apply,5},
{shell,exprs,6},
{shell,eval_loop,3}]} **
4> {ok, Tokens, EndLine} = erl_scan:string("test() -> ok.").
{ok,[{atom,1,test},{'(',1},{')',1},{'->',1},{atom,1,ok},{dot,1}],1}
5> {ok, Forms} = erl_parse:parse_form(Tokens).
{ok,{function,1,test,0,[{clause,1,[],[],[{atom,1,ok}]}]}}
6> NewForms = [{attribute, 1, module, simplest},{attribute, 2, export, [{test,0}]}, Forms].
[{attribute,1,module,simplest},
{attribute,2,export,[{test,0}]},
{function,1,test,0,[{clause,1,[],[],[{atom,1,ok}]}]}]
7> {ok,simplest,Binary} = compile:forms(NewForms).
{ok,simplest,
<<70,79,82,49,0,0,1,188,66,69,65,77,65,116,111,109,0,0,0,56,0,0,0,6,8,...>>}
8> code:purge(simplest).
false
9> code:load_binary(simplest,"simplest.erl",Binary).
{module,simplest}
10> simplest:foo().
=ERROR REPORT==== 18-Aug-2006::15:08:13 ===
Error in process <0.40.0> with exit value: {undef,[{simplest,foo,[]},{erl_eval,do_apply,5},{shell,exprs,6},{shell,eval_loop,3}]}
** exited: {undef,[{simplest,foo,[]},
{erl_eval,do_apply,5},
{shell,exprs,6},
{shell,eval_loop,3}]} **
11> simplest:test().
ok
12>
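The result is that a new exported function test/0 has been added, but the original foo/0 function is gone.
Next steps
Of course, our goal is not to overwrite the original module completely, but to be able to add, delete and replace its functions. To do that, we need to read and keep the old forms with the beam_lib:chunks function shown earlier, add or replace the form of the new function according to the operation, and then compile everything together.
This is exactly what Smerl does.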
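The idea can be sketched in a few lines. This is not Smerl's implementation, only an illustration under several assumptions: add_fun_demo and add_function/2 are made-up names, the beam file was compiled with debug_info, BeamFile is a string, and the source text contains exactly one function. Error handling is left out, so any failure shows up as a badmatch.

```erlang
-module(add_fun_demo).
-export([add_function/2]).

%% Add (or replace) one exported function in a compiled module while
%% keeping all of its existing functions.
add_function(BeamFile, Source) ->
    %% Read the old forms from the debug_info of the beam file.
    {ok, {Module, [{abstract_code, {_Version, OldForms}}]}} =
        beam_lib:chunks(BeamFile, [abstract_code]),
    {ok, Tokens, _EndLine} = erl_scan:string(Source),
    {ok, {function, _, Name, Arity, _} = NewFun} =
        erl_parse:parse_form(Tokens),
    %% Drop the eof marker and any old definition of Name/Arity.
    Kept = [F || F <- OldForms, keep(F, Name, Arity)],
    Forms = add_export(Kept, {Name, Arity}) ++ [NewFun],
    {ok, Module, Binary} = compile:forms(Forms, [debug_info]),
    _ = code:purge(Module),
    code:load_binary(Module, BeamFile, Binary).

keep({eof, _}, _Name, _Arity) -> false;
keep({function, _, Name, Arity, _}, Name, Arity) -> false;
keep(_Form, _Name, _Arity) -> true.

%% The export attribute has to come before the function definitions,
%% so it is inserted right after the module attribute.
add_export([{attribute, Anno, module, _} = ModAttr | Rest], FA) ->
    [ModAttr, {attribute, Anno, export, [FA]} | Rest];
add_export([Form | Rest], FA) ->
    [Form | add_export(Rest, FA)].
```

After add_fun_demo:add_function("simplest.beam", "test() -> ok."), both simplest:foo() and simplest:test() should be callable. The new code exists only in memory; the beam file on disk is left unchanged.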