Diffstat (limited to 'doc/ext_profiler.html')
-rw-r--r-- | doc/ext_profiler.html | 361 |
1 files changed, 361 insertions, 0 deletions
diff --git a/doc/ext_profiler.html b/doc/ext_profiler.html new file mode 100644 index 00000000..6059b4ea --- /dev/null +++ b/doc/ext_profiler.html | |||
@@ -0,0 +1,361 @@ | |||
1 | <!DOCTYPE html> | ||
2 | <html> | ||
3 | <head> | ||
4 | <title>Profiler</title> | ||
5 | <meta charset="utf-8"> | ||
6 | <meta name="Copyright" content="Copyright (C) 2005-2022"> | ||
7 | <meta name="Language" content="en"> | ||
8 | <link rel="stylesheet" type="text/css" href="bluequad.css" media="screen"> | ||
9 | <link rel="stylesheet" type="text/css" href="bluequad-print.css" media="print"> | ||
10 | </head> | ||
11 | <body> | ||
12 | <div id="site"> | ||
13 | <a href="https://luajit.org"><span>Lua<span id="logo">JIT</span></span></a> | ||
14 | </div> | ||
15 | <div id="head"> | ||
16 | <h1>Profiler</h1> | ||
17 | </div> | ||
18 | <div id="nav"> | ||
19 | <ul><li> | ||
20 | <a href="luajit.html">LuaJIT</a> | ||
21 | <ul><li> | ||
22 | <a href="https://luajit.org/download.html">Download <span class="ext">»</span></a> | ||
23 | </li><li> | ||
24 | <a href="install.html">Installation</a> | ||
25 | </li><li> | ||
26 | <a href="running.html">Running</a> | ||
27 | </li></ul> | ||
28 | </li><li> | ||
29 | <a href="extensions.html">Extensions</a> | ||
30 | <ul><li> | ||
31 | <a href="ext_ffi.html">FFI Library</a> | ||
32 | <ul><li> | ||
33 | <a href="ext_ffi_tutorial.html">FFI Tutorial</a> | ||
34 | </li><li> | ||
35 | <a href="ext_ffi_api.html">ffi.* API</a> | ||
36 | </li><li> | ||
37 | <a href="ext_ffi_semantics.html">FFI Semantics</a> | ||
38 | </li></ul> | ||
39 | </li><li> | ||
40 | <a href="ext_buffer.html">String Buffers</a> | ||
41 | </li><li> | ||
42 | <a href="ext_jit.html">jit.* Library</a> | ||
43 | </li><li> | ||
44 | <a href="ext_c_api.html">Lua/C API</a> | ||
45 | </li><li> | ||
46 | <a class="current" href="ext_profiler.html">Profiler</a> | ||
47 | </li></ul> | ||
48 | </li><li> | ||
49 | <a href="status.html">Status</a> | ||
50 | </li><li> | ||
51 | <a href="faq.html">FAQ</a> | ||
52 | </li><li> | ||
53 | <a href="http://wiki.luajit.org/">Wiki <span class="ext">»</span></a> | ||
54 | </li><li> | ||
55 | <a href="https://luajit.org/list.html">Mailing List <span class="ext">»</span></a> | ||
56 | </li></ul> | ||
57 | </div> | ||
58 | <div id="main"> | ||
59 | <p> | ||
60 | LuaJIT has an integrated statistical profiler with very low overhead. It | ||
61 | allows sampling the currently executing stack and other parameters in | ||
62 | regular intervals. | ||
63 | </p> | ||
64 | <p> | ||
65 | The integrated profiler can be accessed from three levels: | ||
66 | </p> | ||
67 | <ul> | ||
68 | <li>The <a href="#hl_profiler">bundled high-level profiler</a>, invoked by the | ||
69 | <a href="#j_p"><tt>-jp</tt></a> command line option.</li> | ||
70 | <li>A <a href="#ll_lua_api">low-level Lua API</a> to control the profiler.</li> | ||
71 | <li>A <a href="#ll_c_api">low-level C API</a> to control the profiler.</li> | ||
72 | </ul> | ||
73 | |||
74 | <h2 id="hl_profiler">High-Level Profiler</h2> | ||
75 | <p> | ||
76 | The bundled high-level profiler offers basic profiling functionality. It | ||
77 | generates simple textual summaries or source code annotations. It can be | ||
78 | accessed with the <a href="#j_p"><tt>-jp</tt></a> command line option | ||
79 | or from Lua code by loading the underlying <tt>jit.p</tt> module. | ||
80 | </p> | ||
81 | <p> | ||
82 | To cut to the chase — run this to get a CPU usage profile by | ||
83 | function name: | ||
84 | </p> | ||
85 | <pre class="code"> | ||
86 | luajit -jp myapp.lua | ||
87 | </pre> | ||
88 | <p> | ||
89 | It's <em>not</em> a stated goal of the bundled profiler to add every | ||
90 | possible option or to cater for special profiling needs. The low-level | ||
91 | profiler APIs are documented below. They may be used by third-party | ||
92 | authors to implement advanced functionality, e.g. IDE integration or | ||
93 | graphical profilers. | ||
94 | </p> | ||
95 | <p> | ||
96 | Note: Sampling works for both interpreted and JIT-compiled code. The | ||
97 | results for JIT-compiled code may sometimes be surprising. LuaJIT | ||
98 | heavily optimizes and inlines Lua code — there's no simple | ||
99 | one-to-one correspondence between source code lines and the sampled | ||
100 | machine code. | ||
101 | </p> | ||
102 | |||
103 | <h3 id="j_p"><tt>-jp=[options[,output]]</tt></h3> | ||
104 | <p> | ||
105 | The <tt>-jp</tt> command line option starts the high-level profiler. | ||
106 | When the application run by the command line terminates, the profiler | ||
107 | stops and writes the results to <tt>stdout</tt> or to the specified | ||
108 | <tt>output</tt> file. | ||
109 | </p> | ||
110 | <p> | ||
111 | The <tt>options</tt> argument specifies how the profiling is to be | ||
112 | performed: | ||
113 | </p> | ||
114 | <ul> | ||
115 | <li><tt>f</tt> — Stack dump: function name, otherwise module:line. | ||
116 | This is the default mode.</li> | ||
117 | <li><tt>F</tt> — Stack dump: ditto, but dump module:name.</li> | ||
118 | <li><tt>l</tt> — Stack dump: module:line.</li> | ||
119 | <li><tt>&lt;number&gt;</tt> — Stack dump depth (callee ← | ||
120 | caller). Default: 1.</li> | ||
121 | <li><tt>-&lt;number&gt;</tt> — Inverse stack dump depth (caller | ||
122 | → callee).</li> | ||
123 | <li><tt>s</tt> — Split stack dump after first stack level. Implies | ||
124 | depth ≥ 2 or depth ≤ -2.</li> | ||
125 | <li><tt>p</tt> — Show full path for module names.</li> | ||
126 | <li><tt>v</tt> — Show VM states.</li> | ||
127 | <li><tt>z</tt> — Show <a href="#jit_zone">zones</a>.</li> | ||
128 | <li><tt>r</tt> — Show raw sample counts. Default: show percentages.</li> | ||
129 | <li><tt>a</tt> — Annotate excerpts from source code files.</li> | ||
130 | <li><tt>A</tt> — Annotate complete source code files.</li> | ||
131 | <li><tt>G</tt> — Produce raw output suitable for graphical tools.</li> | ||
132 | <li><tt>m&lt;number&gt;</tt> — Minimum sample percentage to be shown. | ||
133 | Default: 3%.</li> | ||
134 | <li><tt>i&lt;number&gt;</tt> — Sampling interval in milliseconds. | ||
135 | Default: 10ms.<br> | ||
136 | Note: The actual sampling precision is OS-dependent.</li> | ||
137 | </ul> | ||
138 | <p> | ||
139 | The default output for <tt>-jp</tt> is a list of the most CPU-consuming | ||
140 | spots in the application. Increasing the stack dump depth with (say) | ||
141 | <tt>-jp=2</tt> may help to point out the main callers or callees of | ||
142 | hotspots. But sample aggregation is still flat per unique stack dump. | ||
143 | </p> | ||
144 | <p> | ||
145 | To get a two-level view (split view) of callers/callees, use | ||
146 | <tt>-jp=s</tt> or <tt>-jp=-s</tt>. The percentages shown for the second | ||
147 | level are relative to the first level. | ||
148 | </p> | ||
149 | <p> | ||
150 | To see how much time is spent in each line relative to a function, use | ||
151 | <tt>-jp=fl</tt>. | ||
152 | </p> | ||
153 | <p> | ||
154 | To see how much time is spent in different VM states or | ||
155 | <a href="#jit_zone">zones</a>, use <tt>-jp=v</tt> or <tt>-jp=z</tt>. | ||
156 | </p> | ||
157 | <p> | ||
158 | Combinations of <tt>v/z</tt> with <tt>f/F/l</tt> produce two-level | ||
159 | views, e.g. <tt>-jp=vf</tt> or <tt>-jp=fv</tt>. This shows the time | ||
160 | spent in a VM state or zone vs. hotspots. This can be used to answer | ||
161 | questions like "Which time-consuming functions are only interpreted?" or | ||
162 | "What's the garbage collector overhead for a specific function?". | ||
163 | </p> | ||
164 | <p> | ||
165 | Multiple options can be combined, but not all combinations make | ||
166 | sense (see above). For example, <tt>-jp=3si4m1</tt> samples three stack levels | ||
167 | deep in 4ms intervals and shows a split view of the CPU-consuming | ||
168 | functions and their callers with a 1% threshold. | ||
169 | </p> | ||
170 | <p> | ||
171 | Source code annotations produced by <tt>-jp=a</tt> or <tt>-jp=A</tt> are | ||
172 | always flat and at the line level. Obviously, the source code files need | ||
173 | to be readable by the profiler script. | ||
174 | </p> | ||
175 | <p> | ||
176 | The high-level profiler can also be started and stopped from Lua code with: | ||
177 | </p> | ||
178 | <pre class="code"> | ||
179 | require("jit.p").start(options, output) | ||
180 | ... | ||
181 | require("jit.p").stop() | ||
182 | </pre> | ||
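<p>
As a sketch of the start/stop calls above, the following snippet profiles
only one region of a program. The <tt>workload</tt> function and the output
file name <tt>profile.txt</tt> are hypothetical and exist only for
illustration. The option string uses the <tt>f</tt> mode from the list above.
</p>

```lua
-- Profile a single region with the bundled high-level profiler.
local p = require("jit.p")

-- Hypothetical workload, used only to give the profiler something to sample.
local function workload(n)
  local t = {}
  for i = 1, n do t[i] = math.sin(i) * i end
  table.sort(t)
  return #t
end

-- "f": function-level stack dump. The report is written to the
-- (hypothetical) file "profile.txt" when the profiler is stopped.
p.start("f", "profile.txt")
local count = workload(200000)
p.stop()

print("sorted elements:", count)
```

<p>
Code outside the <tt>start</tt>/<tt>stop</tt> pair is not sampled, which
may help keep start-up or shutdown work out of the report.
</p>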
183 | |||
184 | <h3 id="jit_zone"><tt>jit.zone</tt> — Zones</h3> | ||
185 | <p> | ||
186 | Zones can be used to provide information about different parts of an | ||
187 | application to the high-level profiler. E.g. a game could make use of an | ||
188 | <tt>"AI"</tt> zone, a <tt>"PHYS"</tt> zone, etc. Zones are hierarchical, | ||
189 | organized as a stack. | ||
190 | </p> | ||
191 | <p> | ||
192 | The <tt>jit.zone</tt> module needs to be loaded explicitly: | ||
193 | </p> | ||
194 | <pre class="code"> | ||
195 | local zone = require("jit.zone") | ||
196 | </pre> | ||
197 | <ul> | ||
198 | <li><tt>zone("name")</tt> pushes a named zone to the zone stack.</li> | ||
199 | <li><tt>zone()</tt> pops the current zone from the zone stack and | ||
200 | returns its name.</li> | ||
201 | <li><tt>zone:get()</tt> returns the current zone name or <tt>nil</tt>.</li> | ||
202 | <li><tt>zone:flush()</tt> flushes the zone stack.</li> | ||
203 | </ul> | ||
204 | <p> | ||
205 | To show the time spent in each zone use <tt>-jp=z</tt>. To show the time | ||
206 | spent relative to hotspots use e.g. <tt>-jp=zf</tt> or <tt>-jp=fz</tt>. | ||
207 | </p> | ||
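<p>
The zone calls listed above might be used as in the following sketch. The
zone names <tt>"AI"</tt> and <tt>"PATH"</tt> and the functions
<tt>find_paths</tt>, <tt>think</tt> and <tt>update_ai</tt> are hypothetical.
Only <tt>zone("name")</tt>, <tt>zone()</tt> and <tt>zone:get()</tt> come
from the module itself.
</p>

```lua
local zone = require("jit.zone")

-- Hypothetical subsystems; real code would do actual work here.
local function find_paths()
  local s = 0
  for i = 1, 1e6 do s = s + i % 3 end
  return s
end

local function think()
  local s = 0
  for i = 1, 1e6 do s = s + i % 5 end
  return s
end

local function update_ai()
  zone("AI")             -- push "AI" onto the zone stack
  zone("PATH")           -- nested zone on top of "AI"
  find_paths()
  local inner = zone()   -- pops the current zone and returns its name
  think()                -- still inside "AI"
  local outer = zone()   -- pops "AI"
  return inner, outer
end

local inner, outer = update_ai()
-- zone:get() returns nil again once the zone stack is empty.
print(inner, outer, zone:get())
```

<p>
Running such a script with <tt>-jp=z</tt> should attribute the samples to
the zone that is current when each sample is taken. Every push needs a
matching pop, otherwise later samples end up in the wrong zone.
</p>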
208 | |||
209 | <h2 id="ll_lua_api">Low-level Lua API</h2> | ||
210 | <p> | ||
211 | The <tt>jit.profile</tt> module gives access to the low-level API of the | ||
212 | profiler from Lua code. This module needs to be loaded explicitly:</p> | ||
213 | <pre class="code"> | ||
214 | local profile = require("jit.profile") | ||
215 | </pre> | ||
216 | <p> | ||
217 | This module can be used to implement your own higher-level profiler. | ||
218 | A typical profiling run starts the profiler, captures stack dumps in | ||
219 | the profiler callback, adds them to a hash table to aggregate the number | ||
220 | of samples, stops the profiler and then analyzes all of the captured | ||
221 | stack dumps. Other parameters can be sampled in the profiler callback, | ||
222 | too. But it's important not to spend too much time in the callback, | ||
223 | since this may skew the statistics. | ||
224 | </p> | ||
225 | |||
226 | <h3 id="profile_start"><tt>profile.start(mode, cb)</tt> | ||
227 | — Start profiler</h3> | ||
228 | <p> | ||
229 | This function starts the profiler. The <tt>mode</tt> argument is a | ||
230 | string holding options: | ||
231 | </p> | ||
232 | <ul> | ||
233 | <li><tt>f</tt> — Profile with precision down to the function level.</li> | ||
234 | <li><tt>l</tt> — Profile with precision down to the line level.</li> | ||
235 | <li><tt>i&lt;number&gt;</tt> — Sampling interval in milliseconds (default | ||
236 | 10ms).<br> | ||
237 | Note: The actual sampling precision is OS-dependent. | ||
238 | </li> | ||
239 | </ul> | ||
240 | <p> | ||
241 | The <tt>cb</tt> argument is a callback function which is called with | ||
242 | three arguments: <tt>(thread, samples, vmstate)</tt>. The callback is | ||
243 | called on a separate coroutine. The <tt>thread</tt> argument is the | ||
244 | state that holds the stack to sample for profiling. Note: do | ||
245 | <em>not</em> modify the stack of that state or call functions on it. | ||
246 | </p> | ||
247 | <p> | ||
248 | <tt>samples</tt> gives the number of accumulated samples since the last | ||
249 | callback (usually 1). | ||
250 | </p> | ||
251 | <p> | ||
252 | <tt>vmstate</tt> holds the VM state at the time the profiling timer | ||
253 | triggered. This may or may not correspond to the state of the VM when | ||
254 | the profiling callback is called. The state is either <tt>'N'</tt> | ||
255 | native (compiled) code, <tt>'I'</tt> interpreted code, <tt>'C'</tt> | ||
256 | C code, <tt>'G'</tt> the garbage collector, or <tt>'J'</tt> the JIT | ||
257 | compiler. | ||
258 | </p> | ||
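<p>
A minimal sketch of a callback that only looks at <tt>samples</tt> and
<tt>vmstate</tt> could look like this. The names <tt>counts</tt> and
<tt>on_sample</tt> and the busy loop are hypothetical. The 5ms interval is
an arbitrary choice, and the number of samples actually collected depends on
the OS timer.
</p>

```lua
local profile = require("jit.profile")

-- vmstate letter ('N', 'I', 'C', 'G' or 'J') -> accumulated sample count
local counts = {}

-- Keep the callback short: it only bumps a counter and never touches
-- the sampled thread.
local function on_sample(thread, samples, vmstate)
  counts[vmstate] = (counts[vmstate] or 0) + samples
end

profile.start("fi5", on_sample)   -- function-level precision, 5ms interval

-- Hypothetical busy loop so that there is something to sample.
local x = 0
for i = 1, 5e7 do x = x + i % 7 end

profile.stop()

for state, n in pairs(counts) do
  print(state, n)
end
```

<p>
On a short run the table may stay empty, so a real tool should handle the
case of zero samples.
</p>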
259 | |||
260 | <h3 id="profile_stop"><tt>profile.stop()</tt> | ||
261 | — Stop profiler</h3> | ||
262 | <p> | ||
263 | This function stops the profiler. | ||
264 | </p> | ||
265 | |||
266 | <h3 id="profile_dump"><tt>dump = profile.dumpstack([thread,] fmt, depth)</tt> | ||
267 | — Dump stack </h3> | ||
268 | <p> | ||
269 | This function allows taking stack dumps in an efficient manner. It | ||
270 | returns a string with a stack dump for the <tt>thread</tt> (coroutine), | ||
271 | formatted according to the <tt>fmt</tt> argument: | ||
272 | </p> | ||
273 | <ul> | ||
274 | <li><tt>p</tt> — Preserve the full path for module names. Otherwise | ||
275 | only the file name is used.</li> | ||
276 | <li><tt>f</tt> — Dump the function name if it can be derived. Otherwise | ||
277 | use module:line.</li> | ||
278 | <li><tt>F</tt> — Ditto, but dump module:name.</li> | ||
279 | <li><tt>l</tt> — Dump module:line.</li> | ||
280 | <li><tt>Z</tt> — Zap the following characters for the last dumped | ||
281 | frame.</li> | ||
282 | <li>All other characters are added verbatim to the output string.</li> | ||
283 | </ul> | ||
284 | <p> | ||
285 | The <tt>depth</tt> argument gives the number of frames to dump, starting | ||
286 | at the topmost frame of the thread. A negative number dumps the frames in | ||
287 | inverse order. | ||
288 | </p> | ||
289 | <p> | ||
290 | The first example prints a list of the current module names and line | ||
291 | numbers of up to 10 frames on separate lines. The second example prints | ||
292 | semicolon-separated function names for all frames (up to 100) in inverse | ||
293 | order: | ||
294 | </p> | ||
295 | <pre class="code"> | ||
296 | print(profile.dumpstack(thread, "l\n", 10)) | ||
297 | print(profile.dumpstack(thread, "fZ;", -100)) | ||
298 | </pre> | ||
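<p>
Putting the pieces together, a typical profiling run as described earlier
(start, capture stack dumps in the callback, aggregate them in a hash table,
stop, analyze) might be sketched as follows. The functions <tt>inner</tt>,
<tt>outer</tt> and <tt>on_sample</tt>, the table name
<tt>samples_by_stack</tt> and the choice of the top five entries are
hypothetical.
</p>

```lua
local profile = require("jit.profile")

local samples_by_stack = {}   -- stack dump string -> sample count
local total = 0

-- Capture a short stack dump per sample and aggregate by unique dump.
local function on_sample(thread, samples, vmstate)
  -- Up to 10 frames, caller first, function names separated by ';'.
  local key = profile.dumpstack(thread, "fZ;", -10)
  samples_by_stack[key] = (samples_by_stack[key] or 0) + samples
  total = total + samples
end

-- Hypothetical workload with two call levels.
local function inner(n)
  local s = 0
  for i = 1, n do s = s + math.sqrt(i) end
  return s
end

local function outer()
  local s = 0
  for _ = 1, 50 do s = s + inner(1e6) end
  return s
end

profile.start("f", on_sample)
outer()
profile.stop()

-- Analyze: sort the unique stack dumps by sample count, highest first.
local keys = {}
for k in pairs(samples_by_stack) do keys[#keys + 1] = k end
table.sort(keys, function(a, b)
  return samples_by_stack[a] > samples_by_stack[b]
end)

for i = 1, math.min(5, #keys) do
  local k = keys[i]
  print(string.format("%5.1f%%  %s", 100 * samples_by_stack[k] / total, k))
end
```

<p>
All analysis happens after <tt>profile.stop()</tt>, so the callback itself
stays cheap and is less likely to skew the statistics.
</p>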
299 | |||
300 | <h2 id="ll_c_api">Low-level C API</h2> | ||
301 | <p> | ||
302 | The profiler can be controlled directly from C code, e.g. for | ||
303 | use by IDEs. The declarations are in <tt>"luajit.h"</tt> (see | ||
304 | <a href="ext_c_api.html">Lua/C API</a> extensions). | ||
305 | </p> | ||
306 | |||
307 | <h3 id="luaJIT_profile_start"><tt>luaJIT_profile_start(L, mode, cb, data)</tt> | ||
308 | — Start profiler</h3> | ||
309 | <p> | ||
310 | This function starts the profiler. <a href="#profile_start">See | ||
311 | above</a> for a description of the <tt>mode</tt> argument. | ||
312 | </p> | ||
313 | <p> | ||
314 | The <tt>cb</tt> argument is a callback function with the following | ||
315 | declaration: | ||
316 | </p> | ||
317 | <pre class="code"> | ||
318 | typedef void (*luaJIT_profile_callback)(void *data, lua_State *L, | ||
319 | int samples, int vmstate); | ||
320 | </pre> | ||
321 | <p> | ||
322 | <tt>data</tt> is available for use by the callback. <tt>L</tt> is the | ||
323 | state that holds the stack to sample for profiling. Note: do | ||
324 | <em>not</em> modify this stack or call functions on this stack — | ||
325 | use a separate coroutine for this purpose. <a href="#profile_start">See | ||
326 | above</a> for a description of <tt>samples</tt> and <tt>vmstate</tt>. | ||
327 | </p> | ||
328 | |||
329 | <h3 id="luaJIT_profile_stop"><tt>luaJIT_profile_stop(L)</tt> | ||
330 | — Stop profiler</h3> | ||
331 | <p> | ||
332 | This function stops the profiler. | ||
333 | </p> | ||
334 | |||
335 | <h3 id="luaJIT_profile_dumpstack"><tt>p = luaJIT_profile_dumpstack(L, fmt, depth, len)</tt> | ||
336 | — Dump stack </h3> | ||
337 | <p> | ||
338 | This function allows taking stack dumps in an efficient manner. | ||
339 | <a href="#profile_dump">See above</a> for a description of <tt>fmt</tt> | ||
340 | and <tt>depth</tt>. | ||
341 | </p> | ||
342 | <p> | ||
343 | This function returns a <tt>const char *</tt> pointing to a | ||
344 | private string buffer of the profiler. The <tt>int *len</tt> | ||
345 | argument returns the length of the output string. The buffer is | ||
346 | overwritten on the next call and deallocated when the profiler stops. | ||
347 | You either need to consume the content immediately or copy it for later | ||
348 | use. | ||
349 | </p> | ||
350 | <br class="flush"> | ||
351 | </div> | ||
352 | <div id="foot"> | ||
353 | <hr class="hide"> | ||
354 | Copyright © 2005-2022 | ||
355 | <span class="noprint"> | ||
356 | · | ||
357 | <a href="contact.html">Contact</a> | ||
358 | </span> | ||
359 | </div> | ||
360 | </body> | ||
361 | </html> | ||