Lua源码阅读：数据结构与栈

2024-04-03

topic

5.1k 词

    <p>承接上篇虚拟机概述，解析函数执行流程。</p>

数据结构与栈：

每个Lua 虚拟机对应一个 lua_State 结构体，它使用 TValue 数组来模拟栈，其中包括几个与栈相关的成员：

stack：栈数组的起始位置
base：当前函数栈的基地址
top：当前栈的下一个可用位置

这些成员的初始化操作在 stack_init 函数中完成。

然而 lua_State 里面存放的是一个 lua 虚拟机的全局状态，当执行到一个函数时，需要有对应的数据结构来表示函数相关的信息，这个数据结构就是 CallInfo，这个结构体中同样有 top、base这两个与栈相关的成员。





** case, the actual 'func' value is saved in field 'extra'. 
** When a function calls another with a continuation, 'extra' keeps
** the function index so that, in case of errors, the continuation
** function can be called with the correct top.
*/
typedef struct  {
  StkId func;  /* function index in the stack */
  StkId	top;  /* top for this function */
  struct  *previous, *next;  /* dynamic call link */
  union {
    struct {  /* only for Lua functions */
      StkId base;  /* base for this function */
      const Instruction *savedpc;
    } l;
    struct {  /* only for C functions */
      lua_KFunction k;  /* continuation in case of yields */
      ptrdiff_t old_errfunc;
      lua_KContext ctx;  /* context info. in case of yields */
    } c;
  } u;
  ptrdiff_t extra;
  short nresults;  /* expected number of results from this function */
  lu_byte callstatus;
} CallInfo;

在 lua_State 中，有一个 base_ci 的 CallInfo 数组，存储的就是 CallInfo 的信息，而另一个 ci 成员，指向的就是当前函数的 CallInfo 指针。

在调用函数之前，一般会调用 luaD_precall 函数：

保存当前虚拟机执行的命令 savedpc 到当前 CallInfo 的 savedpc 中。此处保存下来是为了后面调用完毕之后恢复执行。
分别计算出待调用函数的 base、top 值，这些值的计算依赖于函数的参数数量。
从 lua_State 的 base_ci 数组中分配一个新的 CallInfo 指针，存储前面两步计算出来的信息，切换到这个函数准备调用。

命令的解析：

首先，分析阶段要用到的数据结构是 FuncState。这个结构体用于在语法分析时保存解析函数之后相关的信息，根据其中的 prev 指针成员来串联起来。对于下面的 Lua 代码：

-- 最外层 FuncState fs1
local function a()	-- 函数 a 的 FuncState fsa
	local function b()	-- 函数 b 的 FuncState fsb
	end
end

涉及到三个 FuncState 指针，一层一层嵌套包围，其中fs1是 fsa 的父指针:

在 FuncState 结构体中，有一个成员 Proto *f，它用来保存这个 FuncState 解析命令后生成的命令，其中除了自己的，还包括内部嵌套的子函数的：

/* state needed to generate code for a given function */
typedef struct FuncState {
  Proto *f;  /* current function header */
  struct FuncState *prev;  /* enclosing function */
  struct LexState *ls;  /* lexical state */
  struct BlockCnt *bl;  /* chain of current blocks */
  int pc;  /* next position to code (equivalent to 'ncode') */
  int lasttarget;   /* 'label' of last 'jump label' */
  int jpc;  /* list of pending jumps to 'pc' */
  int nk;  /* number of elements in 'k' */
  int np;  /* number of elements in 'p' */
  int firstlocal;  /* index of first local var (in Dyndata array) */
  short nlocvars;  /* number of elements in 'f->locvars' */
  lu_byte nactvar;  /* number of active local variables */
  lu_byte nups;  /* number of upvalues */
  lu_byte freereg;  /* first free register */
} FuncState;

各个层次的 Proto 数据是逐层包含的，因此最外层的全局 FuncState 结构体中的 Proto 数组一定有这个全局结构中所有 Proto 的信息，也就是解析完毕之后的命令信息。

而 luaY_parser 这个函数，它是分析阶段的唯一入口函数，这个函数的返回值就是 Proto 指针，而 FuncState 等数据结构仅是用于分析过程中的临时数据结构，它们最终都是为了解析代码生成命令到 Proto 结构体服务的。

Proto 结构体：

/*
** Function Prototypes
*/
typedef struct Proto {
  CommonHeader;
  lu_byte numparams;  /* number of fixed parameters */
  lu_byte is_vararg;
  lu_byte maxstacksize;  /* maximum stack used by this function */
  int sizeupvalues;  /* size of 'upvalues' */
  int sizek;  /* size of 'k' */
  int sizecode;
  int sizelineinfo;
  int sizep;  /* size of 'p' */
  int sizelocvars;
  int linedefined;
  int lastlinedefined;
  TValue *k;  /* constants used by the function 函数使用的常量（数组） */
  Instruction *code;	/* 保存分析之后生成的 OpCode 数组 */
  struct Proto **p;  /* functions defined inside the function 函数内定义的函数 */
  int *lineinfo;  /* map from opcodes to source lines (debug information) */
  LocVar *locvars;  /* information about local variables (debug information) 局部变量的数组（调试信息） */
  Upvaldesc *upvalues;  /* upvalue information 保存 upvalue 的数组 */
  struct LClosure *cache;  /* last created closure with this prototype */
  TString  *source;  /* used for debug information */
  GCObject *gclist;
} Proto;

命令格式：

lua 虚拟机的命令格式：

首先，Lua 的命令是 32 位的：

Opcode：操作数
A、B、C：参数

不同操作数的命令格式不同、含义不同。操作数是 6 位的，所以 Lua 最多支持 2^6-1 = 31 个命令。Lua 代码中，将每个操作数及其对应的命令格式都在 lopcodes.h 中的 OpCode 枚举类型中定义。

Lua 虚拟机命令中有着不同的参数：

格式	说明
R(A)	A参数作为寄存器索引，R(B)、R(C)依此类推
pc	进程计数器，这个数据用于指示当前命令的地址
Kst(n)	常量数组中的第n个数据
Upvalue(n)	upvalue 数组中的第 n 个数据
Gbl[sym]	全局符号表中取名为 sym 的数据
RK(B)	B 可能是寄存器索引，也可能是常量数组索引，RK(C)类似
sBx	有符号整数，用于表示跳转偏移量

在lopcodes.h 文档中还定义了每个命令的格式，以及相关的宏；这里定义了在一个命令中每个参数对应的大小和位置。

RK(B)的意思有两层，一是这个命令格式只可能作用在 OpCode 的 B、C 参数上，不会作用在参数 A 上；二是这个数据除了从函数栈获取之外，还有可能从常量数组中获取，关键在于宏 ISK 的判断。

1 2	/* test whether value is a constant */

判断这个数据的第 8 位是不是 1，如果是，则认为应该从 K 数组获取数据，否则就是从函数栈寄存器中获取数据。

从寄存器中取命令是在以 R 开头的宏中，实际代码中会使用一个 base 再加上对应的地址，base 值保存的是函数栈基址：


/* to be used after possible stack reallocation */
#define RB(i)	check_exp(getBMode(GET_OPCODE(i)) == OpArgR, base+GETARG_B(i))
#define RC(i)	check_exp(getCMode(GET_OPCODE(i)) == OpArgR, base+GETARG_C(i))

命令的执行：

命令执行的入口函数是 luaV_execute：

void luaV_execute (lua_State *L) {
  CallInfo *ci = L->ci;
  LClosure *cl;
  TValue *k;
  StkId base;
 newframe:  /* reentry point when frame changes (call/return) */
  lua_assert(ci == L->ci);
  cl = clLvalue(ci->func);
  k = cl->p->k;
  base = ci->u.l.base;
  /* main loop of interpreter */
  for (;;) {
    Instruction i = *(ci->u.l.savedpc++);
    StkId ra;
    if ((L->hookmask & (LUA_MASKLINE | LUA_MASKCOUNT)) &&
        (--L->hookcount == 0 || L->hookmask & LUA_MASKLINE)) {
      Protect(luaG_traceexec(L));
    }
    /* WARNING: several calls may realloc the stack and invalidate 'ra' */
    ra = RA(i);
    lua_assert(base == ci->u.l.base);
    lua_assert(base <= L->top && L->top < L->stack + L->stacksize);

ci：用于保存当前命令的执行位置（以及一些其他信息，包括动态调用链的双向链表）
cl：当前所在的函数环境
base：当前函数环境的栈 base 地址
k：当前函数环境的常量数组

总结：

分析阶段最后的结果就是Proto结构体。在这个结构体中，code成员用于存储命令，f数组用于保存里面嵌套的函数的Proto结构体。
每个环境都有自己对应的栈，base指向这个栈的基地址，top指向这个栈的栈顶地址。取函数栈内的数据，都是以base基地址为基础地址的操作。
在虚拟机开始执行命令之前，需要把对应的命令和栈地址切换到所要执行的函数的数据上。

参考：

《Lua设计与实现》（codedump 著）
Lua 源码解析

最新文章

归档