<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Profiler</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Copyright" content="Copyright (C) 2005-2020">
<meta name="Language" content="en">
<link rel="stylesheet" type="text/css" href="bluequad.css" media="screen">
<link rel="stylesheet" type="text/css" href="bluequad-print.css" media="print">
</head>
<body>
<div id="site">
<a href="http://luajit.org"><span>Lua<span id="logo">JIT</span></span></a>
</div>
<div id="head">
<h1>Profiler</h1>
</div>
<div id="nav">
<ul><li>
<a href="luajit.html">LuaJIT</a>
<ul><li>
<a href="http://luajit.org/download.html">Download <span class="ext">&raquo;</span></a>
</li><li>
<a href="install.html">Installation</a>
</li><li>
<a href="running.html">Running</a>
</li></ul>
</li><li>
<a href="extensions.html">Extensions</a>
<ul><li>
<a href="ext_ffi.html">FFI Library</a>
<ul><li>
<a href="ext_ffi_tutorial.html">FFI Tutorial</a>
</li><li>
<a href="ext_ffi_api.html">ffi.* API</a>
</li><li>
<a href="ext_ffi_semantics.html">FFI Semantics</a>
</li></ul>
</li><li>
<a href="ext_jit.html">jit.* Library</a>
</li><li>
<a href="ext_c_api.html">Lua/C API</a>
</li><li>
<a class="current" href="ext_profiler.html">Profiler</a>
</li></ul>
</li><li>
<a href="status.html">Status</a>
<ul><li>
<a href="changes.html">Changes</a>
</li></ul>
</li><li>
<a href="faq.html">FAQ</a>
</li><li>
<a href="http://luajit.org/performance.html">Performance <span class="ext">&raquo;</span></a>
</li><li>
<a href="http://wiki.luajit.org/">Wiki <span class="ext">&raquo;</span></a>
</li><li>
<a href="http://luajit.org/list.html">Mailing List <span class="ext">&raquo;</span></a>
</li></ul>
</div>
<div id="main">
<p>
LuaJIT has an integrated statistical profiler with very low overhead. It
allows sampling the currently executing stack and other parameters at
regular intervals.
</p>
<p>
The integrated profiler can be accessed from three levels:
</p>
<ul>
<li>The <a href="#hl_profiler">bundled high-level profiler</a>, invoked by the
<a href="#j_p"><tt>-jp</tt></a> command line option.</li>
<li>A <a href="#ll_lua_api">low-level Lua API</a> to control the profiler.</li>
<li>A <a href="#ll_c_api">low-level C API</a> to control the profiler.</li>
</ul>

<h2 id="hl_profiler">High-Level Profiler</h2>
<p>
The bundled high-level profiler offers basic profiling functionality. It
generates simple textual summaries or source code annotations. It can be
accessed with the <a href="#j_p"><tt>-jp</tt></a> command line option
or from Lua code by loading the underlying <tt>jit.p</tt> module.
</p>
<p>
To cut to the chase &mdash; run this to get a CPU usage profile by
function name:
</p>
<pre class="code">
luajit -jp myapp.lua
</pre>
<p>
It's <em>not</em> a stated goal of the bundled profiler to add every
possible option or to cater for special profiling needs. The low-level
profiler APIs are documented below. They may be used by third-party
authors to implement advanced functionality, e.g. IDE integration or
graphical profilers.
</p>
<p>
Note: Sampling works for both interpreted and JIT-compiled code. The
results for JIT-compiled code may sometimes be surprising. LuaJIT
heavily optimizes and inlines Lua code &mdash; there's no simple
one-to-one correspondence between source code lines and the sampled
machine code.
</p>

<h3 id="j_p"><tt>-jp=[options[,output]]</tt></h3>
<p>
The <tt>-jp</tt> command line option starts the high-level profiler.
When the application run by the command line terminates, the profiler
stops and writes the results to <tt>stdout</tt> or to the specified
<tt>output</tt> file.
</p>
<p>
The <tt>options</tt> argument specifies how the profiling is to be
performed:
</p>
<ul>
<li><tt>f</tt> &mdash; Stack dump: function name, otherwise module:line.
This is the default mode.</li>
<li><tt>F</tt> &mdash; Stack dump: ditto, but dump module:name.</li>
<li><tt>l</tt> &mdash; Stack dump: module:line.</li>
<li><tt>&lt;number&gt;</tt> &mdash; Stack dump depth (callee &larr;
caller). Default: 1.</li>
<li><tt>-&lt;number&gt;</tt> &mdash; Inverse stack dump depth (caller
&rarr; callee).</li>
<li><tt>s</tt> &mdash; Split stack dump after first stack level. Implies
depth&nbsp;&ge;&nbsp;2 or depth&nbsp;&le;&nbsp;-2.</li>
<li><tt>p</tt> &mdash; Show full path for module names.</li>
<li><tt>v</tt> &mdash; Show VM states.</li>
<li><tt>z</tt> &mdash; Show <a href="#jit_zone">zones</a>.</li>
<li><tt>r</tt> &mdash; Show raw sample counts. Default: show percentages.</li>
<li><tt>a</tt> &mdash; Annotate excerpts from source code files.</li>
<li><tt>A</tt> &mdash; Annotate complete source code files.</li>
<li><tt>G</tt> &mdash; Produce raw output suitable for graphical tools.</li>
<li><tt>m&lt;number&gt;</tt> &mdash; Minimum sample percentage to be shown.
Default: 3%.</li>
<li><tt>i&lt;number&gt;</tt> &mdash; Sampling interval in milliseconds.
Default: 10ms.<br>
Note: The actual sampling precision is OS-dependent.</li>
</ul>
<p>
The default output for <tt>-jp</tt> is a list of the most CPU-consuming
spots in the application. Increasing the stack dump depth with (say)
<tt>-jp=2</tt> may help to point out the main callers or callees of
hotspots. But sample aggregation is still flat per unique stack dump.
</p>
<p>
To get a two-level view (split view) of callers/callees, use
<tt>-jp=s</tt> or <tt>-jp=-s</tt>. The percentages shown for the second
level are relative to the first level.
</p>
<p>
To see how much time is spent in each line relative to a function, use
<tt>-jp=fl</tt>.
</p>
<p>
To see how much time is spent in different VM states or
<a href="#jit_zone">zones</a>, use <tt>-jp=v</tt> or <tt>-jp=z</tt>.
</p>
<p>
Combinations of <tt>v/z</tt> with <tt>f/F/l</tt> produce two-level
views, e.g. <tt>-jp=vf</tt> or <tt>-jp=fv</tt>. This shows the time
spent in a VM state or zone vs. hotspots. This can be used to answer
questions like "Which time-consuming functions are only interpreted?" or
"What's the garbage collector overhead for a specific function?".
</p>
<p>
Multiple options can be combined, but not all combinations make
sense (see above). For example, <tt>-jp=3si4m1</tt> samples three stack
levels deep in 4ms intervals and shows a split view of the CPU-consuming
functions and their callers with a 1% threshold.
</p>
<p>
Source code annotations produced by <tt>-jp=a</tt> or <tt>-jp=A</tt> are
always flat and at the line level. The source code files need to be
readable by the profiler script.
</p>
<p>
The high-level profiler can also be started and stopped from Lua code with:
</p>
<pre class="code">
require("jit.p").start(options, output)
...
require("jit.p").stop()
</pre>
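<p>
As a sketch, the following wraps a workload in <tt>start()</tt> and
<tt>stop()</tt> calls, using the same options as the
<tt>-jp=3si4m1</tt> example above and writing the report to a file.
The functions <tt>inner</tt> and <tt>outer</tt> and the file name
<tt>jitp.txt</tt> are hypothetical placeholders. Since LuaJIT may inline
<tt>inner</tt> into compiled code, it need not show up as a separate
stack level in the report.
</p>
<pre class="code">
local jp = require("jit.p")

-- Hypothetical workload: a caller and a callee, so that
-- deeper stack dumps have more than one level to show.
local function inner(i)
  return (i * 7) % 13
end

local function outer(n)
  local s = 0
  for i = 1, n do
    s = s + inner(i)
  end
  return s
end

-- Same options as -jp=3si4m1: three stack levels, split view,
-- 4ms sampling interval, 1% display threshold.
-- "jitp.txt" is a placeholder output file name.
jp.start("3si4m1", "jitp.txt")
local result = outer(5e7)
jp.stop()  -- writes the report to jitp.txt
print("workload result:", result)
</pre>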

<h3 id="jit_zone"><tt>jit.zone</tt> &mdash; Zones</h3>
<p>
Zones can be used to provide information about different parts of an
application to the high-level profiler. For example, a game could make use
of an <tt>"AI"</tt> zone, a <tt>"PHYS"</tt> zone, etc. Zones are
hierarchical, organized as a stack.
</p>
<p>
The <tt>jit.zone</tt> module needs to be loaded explicitly:
</p>
<pre class="code">
local zone = require("jit.zone")
</pre>
<ul>
<li><tt>zone("name")</tt> pushes a named zone to the zone stack.</li>
<li><tt>zone()</tt> pops the current zone from the zone stack and
returns its name.</li>
<li><tt>zone:get()</tt> returns the current zone name or <tt>nil</tt>.</li>
<li><tt>zone:flush()</tt> flushes the zone stack.</li>
</ul>
<p>
To show the time spent in each zone, use <tt>-jp=z</tt>. To show the time
spent relative to hotspots, use e.g. <tt>-jp=zf</tt> or <tt>-jp=fz</tt>.
</p>
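<p>
Because zones form a stack, every push needs a matching pop. A minimal
sketch, assuming a hypothetical <tt>with_zone()</tt> helper and
placeholder game-loop functions; the helper uses <tt>pcall</tt> so the
zone is popped again even if the wrapped function raises an error.
</p>
<pre class="code">
local zone = require("jit.zone")

-- Hypothetical helper: run fn inside the zone `name` and always pop
-- that zone again, even if fn raises an error.
local function with_zone(name, fn, ...)
  zone(name)
  local ok, err = pcall(fn, ...)
  zone()
  if not ok then error(err, 0) end
end

-- Placeholder game-loop steps.
local function find_path()
  assert(zone:get() == "PATH")
end

local function update_ai()
  with_zone("PATH", find_path)  -- nested zone: "PATH" inside "AI"
end

local function update_physics()
end

for _ = 1, 3 do
  with_zone("AI", update_ai)
  with_zone("PHYS", update_physics)
end

print(zone:get())  --> nil (the zone stack is balanced again)
</pre>
<p>
Run under <tt>-jp=z</tt>, samples taken while <tt>find_path</tt> runs
would be attributed to the innermost zone, <tt>"PATH"</tt>.
</p>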

<h2 id="ll_lua_api">Low-level Lua API</h2>
<p>
The <tt>jit.profile</tt> module gives access to the low-level API of the
profiler from Lua code. This module needs to be loaded explicitly:
</p>
<pre class="code">
local profile = require("jit.profile")
</pre>
<p>
This module can be used to implement your own higher-level profiler.
A typical profiling run starts the profiler, captures stack dumps in
the profiler callback, adds them to a hash table to aggregate the number
of samples, stops the profiler and then analyzes all of the captured
stack dumps. Other parameters can be sampled in the profiler callback,
too. But it's important not to spend too much time in the callback,
since this may skew the statistics.
</p>
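<p>
The steps of a typical profiling run can be sketched as follows. This is
a minimal example, not a replacement for <tt>jit.p</tt>: the
<tt>workload()</tt> function is a hypothetical placeholder, and the
<tt>"fi1"</tt> mode (function-level precision, 1ms interval) is an
arbitrary choice. The callback only takes a one-frame stack dump and
updates a table, to keep its own cost low.
</p>
<pre class="code">
local profile = require("jit.profile")

-- Hypothetical workload to profile.
local function workload(n)
  local s = 0
  for i = 1, n do
    s = s + (i % 7)
  end
  return s
end

local counts = {}  -- stack dump string -> accumulated samples
local total = 0

-- Start: function-level precision, 1ms sampling interval.
profile.start("fi1", function(thread, samples, vmstate)
  -- Capture: one cheap stack dump (top frame only) and a table update.
  local key = profile.dumpstack(thread, "f", 1)
  counts[key] = (counts[key] or 0) + samples
  total = total + samples
end)
local result = workload(3e7)
profile.stop()  -- stop before analyzing, so the analysis is not sampled

-- Analyze: sort the aggregated stack dumps by sample count.
local sorted = {}
for key, n in pairs(counts) do
  sorted[#sorted + 1] = { key = key, n = n }
end
table.sort(sorted, function(a, b) return a.n > b.n end)
for _, e in ipairs(sorted) do
  print(string.format("%6.2f%%  %s", 100 * e.n / total, e.key))
end
</pre>
<p>
Adding <tt>samples</tt> rather than 1 per callback keeps the totals
correct when several timer ticks are reported in a single call.
</p>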

<h3 id="profile_start"><tt>profile.start(mode, cb)</tt>
&mdash; Start profiler</h3>
<p>
This function starts the profiler. The <tt>mode</tt> argument is a
string holding options:
</p>
<ul>
<li><tt>f</tt> &mdash; Profile with precision down to the function level.</li>
<li><tt>l</tt> &mdash; Profile with precision down to the line level.</li>
<li><tt>i&lt;number&gt;</tt> &mdash; Sampling interval in milliseconds (default
10ms).<br>
Note: The actual sampling precision is OS-dependent.
</li>
</ul>
<p>
The <tt>cb</tt> argument is a callback function which is called with
three arguments: <tt>(thread, samples, vmstate)</tt>. The callback is
called on a separate coroutine; the <tt>thread</tt> argument is the
state that holds the stack to sample for profiling. Note: do
<em>not</em> modify the stack of that state or call functions on it.
</p>
<p>
<tt>samples</tt> gives the number of accumulated samples since the last
callback (usually 1).
</p>
<p>
<tt>vmstate</tt> holds the VM state at the time the profiling timer
triggered. This may or may not correspond to the state of the VM when
the profiling callback is called. The state is one of <tt>'N'</tt> for
native (compiled) code, <tt>'I'</tt> for interpreted code, <tt>'C'</tt>
for C&nbsp;code, <tt>'G'</tt> for the garbage collector, or <tt>'J'</tt>
for the JIT compiler.
</p>
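<p>
A minimal sketch of a callback that uses <tt>samples</tt> and
<tt>vmstate</tt> to tally how many samples fall into each VM state,
similar in spirit to <tt>-jp=v</tt>. The allocating loop is a
hypothetical workload; how the samples split across states depends on
the machine and on what LuaJIT compiles.
</p>
<pre class="code">
local profile = require("jit.profile")

-- Samples per VM state: N native code, I interpreter, C C code,
-- G garbage collector, J JIT compiler.
local by_state = { N = 0, I = 0, C = 0, G = 0, J = 0 }

profile.start("fi1", function(thread, samples, vmstate)
  -- Weight by samples: one callback may stand for several timer ticks.
  by_state[vmstate] = (by_state[vmstate] or 0) + samples
end)

-- Hypothetical workload: many short-lived tables, so that some
-- samples are likely to land in the garbage collector.
local keep = {}
for i = 1, 5e6 do
  keep[i % 1000 + 1] = { i }
end

profile.stop()

for _, state in ipairs({ "N", "I", "C", "G", "J" }) do
  print(state, by_state[state])
end
</pre>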

<h3 id="profile_stop"><tt>profile.stop()</tt>
&mdash; Stop profiler</h3>
<p>
This function stops the profiler.
</p>

<h3 id="profile_dump"><tt>dump = profile.dumpstack([thread,] fmt, depth)</tt>
&mdash; Dump stack</h3>
<p>
This function allows taking stack dumps in an efficient manner. It
returns a string with a stack dump for the <tt>thread</tt> (coroutine),
formatted according to the <tt>fmt</tt> argument:
</p>
<ul>
<li><tt>p</tt> &mdash; Preserve the full path for module names. Otherwise
only the file name is used.</li>
<li><tt>f</tt> &mdash; Dump the function name if it can be derived. Otherwise
use module:line.</li>
<li><tt>F</tt> &mdash; Ditto, but dump module:name.</li>
<li><tt>l</tt> &mdash; Dump module:line.</li>
<li><tt>Z</tt> &mdash; Zap the following characters for the last dumped
frame.</li>
<li>All other characters are added verbatim to the output string.</li>
</ul>
<p>
The <tt>depth</tt> argument gives the number of frames to dump, starting
at the topmost frame of the thread. A negative number dumps the frames in
inverse order.
</p>
<p>
The first example prints a list of the current module names and line
numbers of up to 10 frames on separate lines. The second example prints
semicolon-separated module:line entries for all frames (up to 100) in
inverse order:
</p>
<pre class="code">
print(profile.dumpstack(thread, "l\n", 10))
print(profile.dumpstack(thread, "lZ;", -100))
</pre>
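<p>
Since a negative depth combined with <tt>"Z;"</tt> yields caller-to-callee
stacks separated by semicolons, aggregating those dumps gives one line per
unique stack in the <tt>frame;frame;frame count</tt> layout that flame
graph tools such as <tt>flamegraph.pl</tt> read. The sketch below assumes
that layout; the workload functions and the output file name
<tt>profile.folded</tt> are hypothetical. It uses <tt>f</tt> instead of
<tt>l</tt> to get function names.
</p>
<pre class="code">
local profile = require("jit.profile")

local folded = {}  -- "outer;...;inner" -> accumulated samples

profile.start("fi1", function(thread, samples, vmstate)
  -- Negative depth: outermost frame first. "Z;" zaps the ";" after
  -- the last frame, so frames are only separated by ";".
  local stack = profile.dumpstack(thread, "fZ;", -100)
  folded[stack] = (folded[stack] or 0) + samples
end)

-- Hypothetical workload with two call paths into the same leaf.
local function leaf(i)
  return math.sqrt(i)
end
local function path_a(n)
  local s = 0
  for i = 1, n do s = s + leaf(i) end
  return s
end
local function path_b(n)
  local s = 0
  for i = 1, n do s = s + 2 * leaf(i) end
  return s
end
local result = path_a(2e7) + path_b(1e7)

profile.stop()

-- Write one "stack count" line per unique stack to a placeholder file.
local out = assert(io.open("profile.folded", "w"))
for stack, n in pairs(folded) do
  out:write(stack, " ", n, "\n")
end
out:close()
</pre>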

<h2 id="ll_c_api">Low-level C API</h2>
<p>
The profiler can be controlled directly from C&nbsp;code, e.g. for
use by IDEs. The declarations are in <tt>"luajit.h"</tt> (see
<a href="ext_c_api.html">Lua/C API</a> extensions).
</p>

<h3 id="luaJIT_profile_start"><tt>luaJIT_profile_start(L, mode, cb, data)</tt>
&mdash; Start profiler</h3>
<p>
This function starts the profiler. <a href="#profile_start">See
above</a> for a description of the <tt>mode</tt> argument.
</p>
<p>
The <tt>cb</tt> argument is a callback function with the following
declaration:
</p>
<pre class="code">
typedef void (*luaJIT_profile_callback)(void *data, lua_State *L,
                                        int samples, int vmstate);
</pre>
<p>
<tt>data</tt> is available for use by the callback. <tt>L</tt> is the
state that holds the stack to sample for profiling. Note: do
<em>not</em> modify this stack or call functions on this stack &mdash;
use a separate coroutine for this purpose. <a href="#profile_start">See
above</a> for a description of <tt>samples</tt> and <tt>vmstate</tt>;
in C, <tt>vmstate</tt> holds the character code of the state, e.g.
<tt>'N'</tt>.
</p>

<h3 id="luaJIT_profile_stop"><tt>luaJIT_profile_stop(L)</tt>
&mdash; Stop profiler</h3>
<p>
This function stops the profiler.
</p>

<h3 id="luaJIT_profile_dumpstack"><tt>p = luaJIT_profile_dumpstack(L, fmt, depth, len)</tt>
&mdash; Dump stack</h3>
<p>
This function allows taking stack dumps in an efficient manner.
<a href="#profile_dump">See above</a> for a description of <tt>fmt</tt>
and <tt>depth</tt>.
</p>
<p>
This function returns a <tt>const&nbsp;char&nbsp;*</tt> pointing to a
private string buffer of the profiler. The <tt>size_t&nbsp;*len</tt>
argument returns the length of the output string. The buffer is
overwritten on the next call and deallocated when the profiler stops.
You either need to consume the content immediately or copy it for later
use.
</p>
<br class="flush">
</div>
<div id="foot">
<hr class="hide">
Copyright &copy; 2005-2020
<span class="noprint">
&middot;
<a href="contact.html">Contact</a>
</span>
</div>
</body>
</html>