


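LuaJIT's behavior on different platforms


LuaJIT's JIT mode


LuaJIT's interpreter mode


How LuaJIT compiles (the trace compiler)

The trace compiler is unstable, and when the JIT tries to compile traces on a given platform, the compilation fails much of the time.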


Official optimization guide: link. What follows is the original text with my own notes (corrections welcome; Conerlius).

This is a mailing list post by Mike Pall.

The following things are needed to get the best speed out of numerical computations with LuaJIT (in order of importance):

Reduce number of unbiased/unpredictable branches.

Heavily biased branches (>95% in one direction) are fine.

Prefer branch-free algorithms.

Use math.min() and math.max().
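As a minimal sketch of the math.min()/math.max() advice above, here is a clamp written both ways. The function names are my own and do not come from the original post; the point is only that the second form has no data-dependent branch in the Lua source.

```lua
-- Branchy version: two data-dependent comparisons per call.
local function clamp_branchy(x, lo, hi)
  if x < lo then
    return lo
  elseif x > hi then
    return hi
  end
  return x
end

-- Branch-free version built only from math.min() and math.max().
local function clamp(x, lo, hi)
  return math.max(lo, math.min(hi, x))
end

-- Both versions agree on values below, inside and above the range.
for _, x in ipairs({ -3, 0, 4, 10, 42 }) do
  assert(clamp(x, 0, 10) == clamp_branchy(x, 0, 10))
end
```

Whether the branch-free form is actually faster depends on how predictable the input is, so it is worth measuring on real data.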

Use bit.*, e.g. for conditional index computations.

      -- placeholder conditions; branch bodies omitted
      if condition1 then
      elseif condition2 then
      end
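One way to read "conditional index computations" is a wrap-around index into a ring buffer. The sketch below assumes LuaJIT (which bundles the `bit` module) and a power-of-two buffer size; SIZE, MASK and the function names are made up for illustration.

```lua
local bit = require("bit")   -- bundled with LuaJIT

local SIZE = 8               -- hypothetical ring size; must be a power of two
local MASK = SIZE - 1

-- Branchy wrap-around: compares on every step.
local function next_slot_branchy(i)
  i = i + 1
  if i >= SIZE then
    i = 0
  end
  return i
end

-- Branch-free wrap-around: the mask folds SIZE back to 0.
local function next_slot(i)
  return bit.band(i + 1, MASK)
end

-- The two versions agree for every valid slot 0 .. SIZE-1.
for i = 0, SIZE - 1 do
  assert(next_slot(i) == next_slot_branchy(i))
end
```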

Use FFI data structures. (Translator's note: our project does not use these yet; I will come back to this when I have time.)

Use int32_t, avoid uint32_t data types.

Use double, avoid float data types.
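A small sketch of what FFI data structures with the recommended element types might look like. It assumes LuaJIT's `ffi` library; the array names and sizes are invented for the example.

```lua
local ffi = require("ffi")   -- LuaJIT only

local n = 1000
-- Variable-length arrays of the recommended element types.
-- ffi.new() zero-fills them when no initializer is given.
local values = ffi.new("double[?]", n)
local counts = ffi.new("int32_t[?]", n)

-- FFI arrays are 0-based, unlike Lua tables.
for i = 0, n - 1 do
  values[i] = i * 0.5
  counts[i] = i % 7
end

local sum = 0
for i = 0, n - 1 do
  sum = sum + values[i] * counts[i]
end
```

Note that the loops run from 0 to n - 1; indexing values[n] would read past the end of the array, and the FFI does no bounds checking.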

Metamethods are fine, but don’t overdose them.

Call C functions only via the FFI.

Avoid calling trivial functions, better rewrite them in Lua.


Avoid callbacks – use pull-style APIs (read/write) and iterators instead.
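To make the push versus pull distinction concrete, here is a rough sketch in plain Lua. The data source and both function names are hypothetical; the sketch shows only the shape of the two API styles, not a measured speed difference.

```lua
-- Hypothetical data source, used only for illustration.
local samples = { 3, 1, 4, 1, 5, 9, 2, 6 }

-- Push style: the producer drives the loop and calls back into user code.
local function for_each_sample(callback)
  for i = 1, #samples do
    callback(samples[i])
  end
end

-- Pull style: the consumer drives the loop and asks for the next value.
local function sample_reader()
  local i = 0
  return function()
    i = i + 1
    return samples[i]   -- nil once the data is exhausted
  end
end

local total = 0
local read = sample_reader()
local v = read()
while v do
  total = total + v
  v = read()
end
```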

Use plain ‘for i=start,stop,step do … end’ loops. (Translator's note: ipairs is also acceptable; try to avoid ‘for k,v in pairs(x) do’.)

Prefer plain array indexing, e.g. ‘a[i+2]’.
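A short sketch of the two points above: a plain numeric for loop with explicit start, stop and step, and plain array indexing with a constant offset. The table contents are arbitrary.

```lua
local a = {}
for i = 1, 100 do
  a[i] = i
end

-- Plain numeric loop; a[i + 2] is ordinary array indexing with an offset.
local sum = 0
for i = 1, #a - 2, 1 do
  sum = sum + a[i] * a[i + 2]
end
```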


Avoid pointer arithmetic.

Find the right balance for unrolling.

Avoid inner loops with low iteration count (< 10).


Only unroll loops if the loop body does not have too many instructions.

Consider using templates instead of hand-unrolling (see GSL Shell).

You may have to experiment a bit. (Translator's note: a frustratingly vague thing for the official guide to say.)

Define and call only ‘local’ (!) functions within a module.

Cache often-used functions from other modules in upvalues.

E.g. local sin = math.sin … local function foo() return 2*sin(x) end

Don’t do this for FFI C functions, cache the namespace instead, e.g. local lib = ffi.load("lib").
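Spelled out as a tiny module-style chunk, the upvalue caching described above might look like this. The name scaled_sin is my own; unlike the foo() in the quoted example it takes x as a parameter so that the snippet runs on its own.

```lua
-- Cache the library function once; it becomes an upvalue of scaled_sin.
local sin = math.sin

-- A 'local' function, defined and called only inside this chunk.
local function scaled_sin(x)
  return 2 * sin(x)
end

-- For FFI C functions the advice is the opposite: keep the namespace
-- (e.g. the value returned by ffi.load) in a local and call through it,
-- rather than caching the individual C function.
local y = scaled_sin(0)
```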

Avoid inventing your own dispatch mechanisms.

Prefer to use built-in mechanisms, e.g. metamethods.
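As an illustration of relying on the built-in mechanism, the sketch below dispatches through a metatable (__index for methods, __add for the operator) instead of, say, a hand-written lookup keyed on a type tag. The Vec type is invented for the example.

```lua
local Vec = {}
Vec.__index = Vec            -- method lookup goes through the metatable

function Vec.new(x, y)
  return setmetatable({ x = x, y = y }, Vec)
end

-- Operator dispatch via the __add metamethod.
function Vec.__add(a, b)
  return Vec.new(a.x + b.x, a.y + b.y)
end

function Vec:length2()
  return self.x * self.x + self.y * self.y
end

local v = Vec.new(1, 2) + Vec.new(3, 4)
```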


Do not try to second-guess the JIT compiler.

It’s perfectly ok to write ‘z = x[a+b] + y[a+b]’.

Do not try CSE (Common Subexpression Elimination) by hand, e.g. ‘local c = a+b’.


It may become detrimental if the lifetime of the temporary is longer than needed. If the compiler cannot deduce that it’s dead, then the useless temporary will block a register or stack slot and/or it needs to be stored to the Lua stack.

Duplicate expression involving basic arithmetic operators that are relatively close to each other (and likely in the same trace) should not be manually CSEd. Loads only need to be manually hoisted, if alias analysis is likely to fail.

It’s perfectly ok to write ‘a[i][j] = a[i][j] * a[i][j+1]’.

Do not try to cache partial FFI struct/array references (e.g. a[i]) unless they are long-lived (e.g. in a big loop).


There are quite a few “easy” optimizations where the compiler is in a better position to perform them. Better focus on the difficult things, like algorithmic improvements.

Be careful with aliasing, esp. when using multiple arrays. (Translator's note: aliasing can stop the JIT from eliminating common subexpressions.)

LuaJIT uses strict type-based disambiguation, but there are limits to this due to C99 conformance.


E.g. in ‘x[i] = a[i] + c[i]; y[i] = a[i] + d[i]’ the load of a[i] needs to be done twice, because x could alias a. It does make sense to use ‘do local t = a[i] … end’ here.
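The hoisted load from the example above, written out as a runnable sketch. It uses plain Lua tables so that it runs anywhere, although the aliasing discussion is mostly relevant to arrays of the same element type; the function name add_pairs is my own.

```lua
local function add_pairs(x, y, a, c, d, n)
  for i = 1, n do
    do
      local t = a[i]     -- load a[i] once, by hand
      x[i] = t + c[i]
      y[i] = t + d[i]
    end                  -- t is dead here
  end
end

local a, c, d = { 1, 2, 3 }, { 10, 20, 30 }, { 100, 200, 300 }
local x, y = {}, {}
add_pairs(x, y, a, c, d, 3)
```

Be aware that this changes the result when x really is the same array as a: the second line then uses the old value of a[i] instead of the freshly stored one.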

Reduce the number of live temporary variables.

Best to initialize on definition, e.g. ‘local y = f(x)’

Yes, this means you should interleave this with other code.


Do not hoist variable definitions to the start of a function – Lua is not JavaScript nor K&R C.


Use ‘do local y = f(x) … end’ to bound variable lifetimes.
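A minimal sketch of bounding lifetimes with do blocks. Here f is a hypothetical stand-in for whatever computation the real code performs, and process is an invented name.

```lua
-- Hypothetical stand-in for a real computation.
local function f(x)
  return x * x
end

local function process(x)
  local result = 0
  do
    local y = f(x)       -- defined and initialized where it is needed
    result = result + y
  end                    -- y is no longer live after this point
  do
    local z = f(x + 1)
    result = result + z
  end
  return result
end
```

Neither temporary is declared at the top of the function, and neither outlives the block that uses it.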

Do not intersperse expensive or uncompiled operations.

print() is not compiled, use io.write().

E.g. avoid assert(type(x) == "number", "x is a "..mytype(x))

The problem is not the assert() or the condition (basically free). The problem is the string concatenation, which has to be executed every time, even if the assertion never fails!
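One possible way to keep the check but build the message only on failure is sketched below. The mytype function here is a hypothetical stand-in for the one in the quoted example, and check_number is my own name.

```lua
-- Hypothetical stand-in for the mytype() used in the quoted example.
local function mytype(x)
  return type(x)
end

local function check_number(x)
  if type(x) ~= "number" then
    -- The concatenation now runs only when the check actually fails.
    error("x is a " .. mytype(x), 2)
  end
  return x
end

local ok, err = pcall(check_number, "oops")
```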


Watch the output of -jv and -jdump.

You need to take all of these factors into account before deciding on a certain algorithm. An advanced algorithm, that’s fast in theory, may be slower than a simpler algorithm, if the simpler algorithm has much fewer unbiased branches.
