This loop is now roughly 1000x faster than the Lua interpreter:
local function f(a,b,...) end; for i=1,2e8 do f(1,2,i) end
Yet another silly microbenchmark -- I know.
Add 64 bit lightuserdata handling. Keep the tagged 64 bit value.
Allocate/save/restore 64 bit spill slots for 64 bit lightuserdata.
Fix code generation for 64 bit loads/stores/moves/compares.
Fix code generation for stack pointer adjustments.
Add fixed spill slot definitions for x64. Reduce reserved spill slots.
Disable STRREF + ADD fusion in 64 bit mode (avoid negative 32 bit ofs).