Diffstat (limited to 'doc/ext_profiler.html')
-rw-r--r-- doc/ext_profiler.html | 361
1 files changed, 361 insertions, 0 deletions
diff --git a/doc/ext_profiler.html b/doc/ext_profiler.html
new file mode 100644
index 00000000..6059b4ea
--- /dev/null
+++ b/doc/ext_profiler.html
@@ -0,0 +1,361 @@
1<!DOCTYPE html>
2<html>
3<head>
4<title>Profiler</title>
5<meta charset="utf-8">
6<meta name="Copyright" content="Copyright (C) 2005-2022">
7<meta name="Language" content="en">
8<link rel="stylesheet" type="text/css" href="bluequad.css" media="screen">
9<link rel="stylesheet" type="text/css" href="bluequad-print.css" media="print">
10</head>
11<body>
12<div id="site">
13<a href="https://luajit.org"><span>Lua<span id="logo">JIT</span></span></a>
14</div>
15<div id="head">
16<h1>Profiler</h1>
17</div>
18<div id="nav">
19<ul><li>
20<a href="luajit.html">LuaJIT</a>
21<ul><li>
22<a href="https://luajit.org/download.html">Download <span class="ext">&raquo;</span></a>
23</li><li>
24<a href="install.html">Installation</a>
25</li><li>
26<a href="running.html">Running</a>
27</li></ul>
28</li><li>
29<a href="extensions.html">Extensions</a>
30<ul><li>
31<a href="ext_ffi.html">FFI Library</a>
32<ul><li>
33<a href="ext_ffi_tutorial.html">FFI Tutorial</a>
34</li><li>
35<a href="ext_ffi_api.html">ffi.* API</a>
36</li><li>
37<a href="ext_ffi_semantics.html">FFI Semantics</a>
38</li></ul>
39</li><li>
40<a href="ext_buffer.html">String Buffers</a>
41</li><li>
42<a href="ext_jit.html">jit.* Library</a>
43</li><li>
44<a href="ext_c_api.html">Lua/C API</a>
45</li><li>
46<a class="current" href="ext_profiler.html">Profiler</a>
47</li></ul>
48</li><li>
49<a href="status.html">Status</a>
50</li><li>
51<a href="faq.html">FAQ</a>
52</li><li>
53<a href="http://wiki.luajit.org/">Wiki <span class="ext">&raquo;</span></a>
54</li><li>
55<a href="https://luajit.org/list.html">Mailing List <span class="ext">&raquo;</span></a>
56</li></ul>
57</div>
58<div id="main">
59<p>
60LuaJIT has an integrated statistical profiler with very low overhead. It
61allows sampling the currently executing stack and other parameters in
62regular intervals.
63</p>
64<p>
65The integrated profiler can be accessed from three levels:
66</p>
67<ul>
68<li>The <a href="#hl_profiler">bundled high-level profiler</a>, invoked by the
69<a href="#j_p"><tt>-jp</tt></a> command line option.</li>
70<li>A <a href="#ll_lua_api">low-level Lua API</a> to control the profiler.</li>
71<li>A <a href="#ll_c_api">low-level C API</a> to control the profiler.</li>
72</ul>
73
74<h2 id="hl_profiler">High-Level Profiler</h2>
75<p>
76The bundled high-level profiler offers basic profiling functionality. It
77generates simple textual summaries or source code annotations. It can be
78accessed with the <a href="#j_p"><tt>-jp</tt></a> command line option
79or from Lua code by loading the underlying <tt>jit.p</tt> module.
80</p>
81<p>
82To cut to the chase &mdash; run this to get a CPU usage profile by
83function name:
84</p>
85<pre class="code">
86luajit -jp myapp.lua
87</pre>
88<p>
89It's <em>not</em> a stated goal of the bundled profiler to add every
90possible option or to cater for special profiling needs. The low-level
91profiler APIs are documented below. They may be used by third-party
92authors to implement advanced functionality, e.g. IDE integration or
93graphical profilers.
94</p>
95<p>
96Note: Sampling works for both interpreted and JIT-compiled code. The
97results for JIT-compiled code may sometimes be surprising. LuaJIT
98heavily optimizes and inlines Lua code &mdash; there's no simple
99one-to-one correspondence between source code lines and the sampled
100machine code.
101</p>
102
103<h3 id="j_p"><tt>-jp=[options[,output]]</tt></h3>
104<p>
105The <tt>-jp</tt> command line option starts the high-level profiler.
106When the application run by the command line terminates, the profiler
107stops and writes the results to <tt>stdout</tt> or to the specified
108<tt>output</tt> file.
109</p>
110<p>
111The <tt>options</tt> argument specifies how the profiling is to be
112performed:
113</p>
114<ul>
115<li><tt>f</tt> &mdash; Stack dump: function name, otherwise module:line.
116This is the default mode.</li>
117<li><tt>F</tt> &mdash; Stack dump: ditto, but dump module:name.</li>
118<li><tt>l</tt> &mdash; Stack dump: module:line.</li>
119<li><tt>&lt;number&gt;</tt> &mdash; Stack dump depth (callee &larr;
120caller). Default: 1.</li>
121<li><tt>-&lt;number&gt;</tt> &mdash; Inverse stack dump depth (caller
122&rarr; callee).</li>
123<li><tt>s</tt> &mdash; Split stack dump after first stack level. Implies
124depth&nbsp;&ge;&nbsp;2 or depth&nbsp;&le;&nbsp;-2.</li>
125<li><tt>p</tt> &mdash; Show full path for module names.</li>
126<li><tt>v</tt> &mdash; Show VM states.</li>
127<li><tt>z</tt> &mdash; Show <a href="#jit_zone">zones</a>.</li>
128<li><tt>r</tt> &mdash; Show raw sample counts. Default: show percentages.</li>
129<li><tt>a</tt> &mdash; Annotate excerpts from source code files.</li>
130<li><tt>A</tt> &mdash; Annotate complete source code files.</li>
131<li><tt>G</tt> &mdash; Produce raw output suitable for graphical tools.</li>
132<li><tt>m&lt;number&gt;</tt> &mdash; Minimum sample percentage to be shown.
133Default: 3%.</li>
134<li><tt>i&lt;number&gt;</tt> &mdash; Sampling interval in milliseconds.
135Default: 10ms.<br>
136Note: The actual sampling precision is OS-dependent.</li>
137</ul>
138<p>
139The default output for <tt>-jp</tt> is a list of the most CPU-consuming
140spots in the application. Increasing the stack dump depth with (say)
141<tt>-jp=2</tt> may help to point out the main callers or callees of
142hotspots. However, sample aggregation is still flat per unique stack dump.
143</p>
144<p>
145To get a two-level view (split view) of callers/callees, use
146<tt>-jp=s</tt> or <tt>-jp=-s</tt>. The percentages shown for the second
147level are relative to the first level.
148</p>
149<p>
150To see how much time is spent in each line relative to a function, use
151<tt>-jp=fl</tt>.
152</p>
153<p>
154To see how much time is spent in different VM states or
155<a href="#jit_zone">zones</a>, use <tt>-jp=v</tt> or <tt>-jp=z</tt>.
156</p>
157<p>
158Combinations of <tt>v/z</tt> with <tt>f/F/l</tt> produce two-level
159views, e.g. <tt>-jp=vf</tt> or <tt>-jp=fv</tt>. This shows the time
160spent in a VM state or zone vs. hotspots. This can be used to answer
161questions like "Which time-consuming functions are only interpreted?" or
162"What's the garbage collector overhead for a specific function?".
163</p>
164<p>
165Multiple options can be combined &mdash; but not all combinations make
166sense, see above. E.g. <tt>-jp=3si4m1</tt> samples three stack levels
167deep in 4ms intervals and shows a split view of the CPU-consuming
168functions and their callers with a 1% threshold.
169</p>
170<p>
171Source code annotations produced by <tt>-jp=a</tt> or <tt>-jp=A</tt> are
172always flat and at the line level. Obviously, the source code files need
173to be readable by the profiler script.
174</p>
175<p>
176The high-level profiler can also be started and stopped from Lua code with:
177</p>
178<pre class="code">
179require("jit.p").start(options, output)
180...
181require("jit.p").stop()
182</pre>
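<p>
As a minimal sketch of the calls above, the following profiles a single
workload and writes the report to a file. The <tt>work</tt> function and
the <tt>profile.txt</tt> file name are assumptions made for illustration.
The option string combines the <tt>fl</tt> mode with the <tt>i</tt>
interval option described above.
</p>
<pre class="code">
```lua
local p = require("jit.p")

-- Hypothetical workload, for illustration only.
local function work()
  local t = {}
  for i = 1, 2e7 do t[i % 1000 + 1] = i * 2 end
  return #t
end

-- "fl": per-line breakdown within each function; "i5": 5ms sampling interval.
-- "profile.txt" is an assumed output file name.
p.start("fli5", "profile.txt")
work()
p.stop()
```
</pre>
<p>
Only the code between <tt>start</tt> and <tt>stop</tt> is sampled, which
may be useful for excluding startup or shutdown phases from the report.
</p>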
183
184<h3 id="jit_zone"><tt>jit.zone</tt> &mdash; Zones</h3>
185<p>
186Zones can be used to provide information about different parts of an
187application to the high-level profiler. E.g. a game could make use of an
188<tt>"AI"</tt> zone, a <tt>"PHYS"</tt> zone, etc. Zones are hierarchical,
189organized as a stack.
190</p>
191<p>
192The <tt>jit.zone</tt> module needs to be loaded explicitly:
193</p>
194<pre class="code">
195local zone = require("jit.zone")
196</pre>
197<ul>
198<li><tt>zone("name")</tt> pushes a named zone to the zone stack.</li>
199<li><tt>zone()</tt> pops the current zone from the zone stack and
200returns its name.</li>
201<li><tt>zone:get()</tt> returns the current zone name or <tt>nil</tt>.</li>
202<li><tt>zone:flush()</tt> flushes the zone stack.</li>
203</ul>
204<p>
205To show the time spent in each zone use <tt>-jp=z</tt>. To show the time
206spent relative to hotspots use e.g. <tt>-jp=zf</tt> or <tt>-jp=fz</tt>.
207</p>
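<p>
The zone calls listed above could be used along the following lines. This
is only a sketch: <tt>update_ai</tt>, <tt>update_physics</tt> and
<tt>frame</tt> are hypothetical placeholders for real application code.
</p>
<pre class="code">
```lua
local zone = require("jit.zone")

-- Hypothetical workloads, for illustration only.
local function update_ai()
  local sum = 0
  for i = 1, 1e6 do sum = sum + i % 7 end
  return sum
end

local function update_physics()
  local x = 0
  for i = 1, 1e6 do x = x + math.sqrt(i) end
  return x
end

local function frame()
  zone("AI")        -- push the "AI" zone
  update_ai()
  zone()            -- pop it again

  zone("PHYS")      -- push the "PHYS" zone
  update_physics()
  zone()            -- pop it again
end

for _ = 1, 100 do frame() end
```
</pre>
<p>
Assuming the script is saved as <tt>game.lua</tt>, running it with
<tt>luajit -jp=z game.lua</tt> should attribute the samples to the two
zones. Every push needs a matching pop. If an error can escape from
zoned code, <tt>zone:flush()</tt> may be used to reset the stack.
</p>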
208
209<h2 id="ll_lua_api">Low-level Lua API</h2>
210<p>
211The <tt>jit.profile</tt> module gives access to the low-level API of the
212profiler from Lua code. This module needs to be loaded explicitly:</p>
213<pre class="code">
214local profile = require("jit.profile")
215</pre>
216<p>
217This module can be used to implement your own higher-level profiler.
218A typical profiling run starts the profiler, captures stack dumps in
219the profiler callback, adds them to a hash table to aggregate the number
220of samples, stops the profiler and then analyzes all of the captured
221stack dumps. Other parameters can be sampled in the profiler callback,
222too. But it's important not to spend too much time in the callback,
223since this may skew the statistics.
224</p>
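<p>
The typical profiling run described above might look roughly like the
following sketch. The <tt>work</tt> function, the <tt>counts</tt> table
and the chosen format string and depth are assumptions made for
illustration, not a recommended configuration.
</p>
<pre class="code">
```lua
local profile = require("jit.profile")

local counts = {}   -- stack dump string -> accumulated sample count

-- Profiler callback: kept short to avoid skewing the statistics.
local function on_sample(thread, samples, vmstate)
  local key = profile.dumpstack(thread, "fZ;", -10)
  counts[key] = (counts[key] or 0) + samples
end

-- Hypothetical workload, for illustration only.
local function work()
  local s = 0
  for i = 1, 5e7 do s = s + i % 3 end
  return s
end

profile.start("fi5", on_sample)   -- function-level precision, 5ms interval
work()
profile.stop()

-- Analyze: sort the captured stack dumps by sample count, highest first.
local keys = {}
for key in pairs(counts) do keys[#keys + 1] = key end
table.sort(keys, function(a, b) return counts[a] > counts[b] end)

for _, key in ipairs(keys) do
  print(counts[key], key)
end
```
</pre>
<p>
All formatting and sorting happens after <tt>profile.stop()</tt>. The
callback itself only takes one stack dump and updates one table entry.
The number of samples collected depends on the run time of the workload
and on the timer precision of the OS.
</p>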
225
226<h3 id="profile_start"><tt>profile.start(mode, cb)</tt>
227&mdash; Start profiler</h3>
228<p>
229This function starts the profiler. The <tt>mode</tt> argument is a
230string holding options:
231</p>
232<ul>
233<li><tt>f</tt> &mdash; Profile with precision down to the function level.</li>
234<li><tt>l</tt> &mdash; Profile with precision down to the line level.</li>
235<li><tt>i&lt;number&gt;</tt> &mdash; Sampling interval in milliseconds (default
23610ms).<br>
237Note: The actual sampling precision is OS-dependent.
238</li>
239</ul>
240<p>
241The <tt>cb</tt> argument is a callback function which is called with
242three arguments: <tt>(thread, samples, vmstate)</tt>. The callback is
243called on a separate coroutine. The <tt>thread</tt> argument is the
244state that holds the stack to sample for profiling. Note: do
245<em>not</em> modify the stack of that state or call functions on it.
246</p>
247<p>
248<tt>samples</tt> gives the number of accumulated samples since the last
249callback (usually 1).
250</p>
251<p>
252<tt>vmstate</tt> holds the VM state at the time the profiling timer
253triggered. This may or may not correspond to the state of the VM when
254the profiling callback is called. The state is either <tt>'N'</tt>
255native (compiled) code, <tt>'I'</tt> interpreted code, <tt>'C'</tt>
256C&nbsp;code, <tt>'G'</tt> the garbage collector, or <tt>'J'</tt> the JIT
257compiler.
258</p>
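<p>
As an illustration of the callback arguments, the following sketch
tallies samples per VM state. The <tt>state_names</tt> table, the
counters and the workload are hypothetical; only the state codes are
taken from the description above.
</p>
<pre class="code">
```lua
local profile = require("jit.profile")

-- Readable labels for the VM state codes described above.
local state_names = {
  N = "native (compiled)", I = "interpreted", C = "C code",
  G = "garbage collector", J = "JIT compiler",
}

local by_state, total = {}, 0

profile.start("f", function(thread, samples, vmstate)
  by_state[vmstate] = (by_state[vmstate] or 0) + samples
  total = total + samples
end)

-- Hypothetical workload: string concatenation creates garbage.
local parts = {}
for i = 1, 2e6 do parts[i % 100 + 1] = "item" .. i end

profile.stop()

-- Prints nothing if no samples were collected.
for code, n in pairs(by_state) do
  print(string.format("%-20s %5.1f%%", state_names[code] or code, 100 * n / total))
end
```
</pre>
<p>
This is roughly the kind of breakdown that <tt>-jp=v</tt> reports. The
exact distribution depends on the workload and the platform.
</p>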
259
260<h3 id="profile_stop"><tt>profile.stop()</tt>
261&mdash; Stop profiler</h3>
262<p>
263This function stops the profiler.
264</p>
265
266<h3 id="profile_dump"><tt>dump = profile.dumpstack([thread,] fmt, depth)</tt>
267&mdash; Dump stack </h3>
268<p>
269This function allows taking stack dumps in an efficient manner. It
270returns a string with a stack dump for the <tt>thread</tt> (coroutine),
271formatted according to the <tt>fmt</tt> argument:
272</p>
273<ul>
274<li><tt>p</tt> &mdash; Preserve the full path for module names. Otherwise
275only the file name is used.</li>
276<li><tt>f</tt> &mdash; Dump the function name if it can be derived. Otherwise
277use module:line.</li>
278<li><tt>F</tt> &mdash; Ditto, but dump module:name.</li>
279<li><tt>l</tt> &mdash; Dump module:line.</li>
280<li><tt>Z</tt> &mdash; Zap the following characters for the last dumped
281frame.</li>
282<li>All other characters are added verbatim to the output string.</li>
283</ul>
284<p>
285The <tt>depth</tt> argument gives the number of frames to dump, starting
286at the topmost frame of the thread. A negative number dumps the frames in
287inverse order.
288</p>
289<p>
290The first example prints a list of the current module names and line
291numbers of up to 10 frames in separate lines. The second example prints
292semicolon-separated function names for all frames (up to 100) in inverse
293order:
294</p>
295<pre class="code">
296print(profile.dumpstack(thread, "l\n", 10))
297print(profile.dumpstack(thread, "fZ;", -100))
298</pre>
299
300<h2 id="ll_c_api">Low-level C API</h2>
301<p>
302The profiler can be controlled directly from C&nbsp;code, e.g. for
303use by IDEs. The declarations are in <tt>"luajit.h"</tt> (see
304<a href="ext_c_api.html">Lua/C API</a> extensions).
305</p>
306
307<h3 id="luaJIT_profile_start"><tt>luaJIT_profile_start(L, mode, cb, data)</tt>
308&mdash; Start profiler</h3>
309<p>
310This function starts the profiler. <a href="#profile_start">See
311above</a> for a description of the <tt>mode</tt> argument.
312</p>
313<p>
314The <tt>cb</tt> argument is a callback function with the following
315declaration:
316</p>
317<pre class="code">
318typedef void (*luaJIT_profile_callback)(void *data, lua_State *L,
319 int samples, int vmstate);
320</pre>
321<p>
322<tt>data</tt> is available for use by the callback. <tt>L</tt> is the
323state that holds the stack to sample for profiling. Note: do
324<em>not</em> modify this stack or call functions on this stack &mdash;
325use a separate coroutine for this purpose. <a href="#profile_start">See
326above</a> for a description of <tt>samples</tt> and <tt>vmstate</tt>.
327</p>
328
329<h3 id="luaJIT_profile_stop"><tt>luaJIT_profile_stop(L)</tt>
330&mdash; Stop profiler</h3>
331<p>
332This function stops the profiler.
333</p>
334
335<h3 id="luaJIT_profile_dumpstack"><tt>p = luaJIT_profile_dumpstack(L, fmt, depth, len)</tt>
336&mdash; Dump stack </h3>
337<p>
338This function allows taking stack dumps in an efficient manner.
339<a href="#profile_dump">See above</a> for a description of <tt>fmt</tt>
340and <tt>depth</tt>.
341</p>
342<p>
343This function returns a <tt>const&nbsp;char&nbsp;*</tt> pointing to a
344private string buffer of the profiler. The <tt>size_t&nbsp;*len</tt>
345argument returns the length of the output string. The buffer is
346overwritten on the next call and deallocated when the profiler stops.
347You either need to consume the content immediately or copy it for later
348use.
349</p>
350<br class="flush">
351</div>
352<div id="foot">
353<hr class="hide">
354Copyright &copy; 2005-2022
355<span class="noprint">
356&middot;
357<a href="contact.html">Contact</a>
358</span>
359</div>
360</body>
361</html>