<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Profiler</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Copyright" content="Copyright (C) 2005-2020">
<meta name="Language" content="en">
<link rel="stylesheet" type="text/css" href="bluequad.css" media="screen">
<link rel="stylesheet" type="text/css" href="bluequad-print.css" media="print">
</head>
<body>
<div id="site">
<a href="http://luajit.org"><span>Lua<span id="logo">JIT</span></span></a>
</div>
<div id="head">
<h1>Profiler</h1>
</div>
<div id="nav">
<ul><li>
<a href="luajit.html">LuaJIT</a>
<ul><li>
<a href="http://luajit.org/download.html">Download <span class="ext">&raquo;</span></a>
</li><li>
<a href="install.html">Installation</a>
</li><li>
<a href="running.html">Running</a>
</li></ul>
</li><li>
<a href="extensions.html">Extensions</a>
<ul><li>
<a href="ext_ffi.html">FFI Library</a>
<ul><li>
<a href="ext_ffi_tutorial.html">FFI Tutorial</a>
</li><li>
<a href="ext_ffi_api.html">ffi.* API</a>
</li><li>
<a href="ext_ffi_semantics.html">FFI Semantics</a>
</li></ul>
</li><li>
<a href="ext_jit.html">jit.* Library</a>
</li><li>
<a href="ext_c_api.html">Lua/C API</a>
</li><li>
<a class="current" href="ext_profiler.html">Profiler</a>
</li></ul>
</li><li>
<a href="status.html">Status</a>
<ul><li>
<a href="changes.html">Changes</a>
</li></ul>
</li><li>
<a href="faq.html">FAQ</a>
</li><li>
<a href="http://luajit.org/performance.html">Performance <span class="ext">&raquo;</span></a>
</li><li>
<a href="http://wiki.luajit.org/">Wiki <span class="ext">&raquo;</span></a>
</li><li>
<a href="http://luajit.org/list.html">Mailing List <span class="ext">&raquo;</span></a>
</li></ul>
</div>
<div id="main">
<p>
LuaJIT has an integrated statistical profiler with very low overhead. It
allows sampling the currently executing stack and other parameters at
regular intervals.
</p>
<p>
The integrated profiler can be accessed from three levels:
</p>
<ul>
<li>The <a href="#hl_profiler">bundled high-level profiler</a>, invoked by the
<a href="#j_p"><tt>-jp</tt></a> command line option.</li>
<li>A <a href="#ll_lua_api">low-level Lua API</a> to control the profiler.</li>
<li>A <a href="#ll_c_api">low-level C API</a> to control the profiler.</li>
</ul>

<h2 id="hl_profiler">High-Level Profiler</h2>
<p>
The bundled high-level profiler offers basic profiling functionality. It
generates simple textual summaries or source code annotations. It can be
accessed with the <a href="#j_p"><tt>-jp</tt></a> command line option
or from Lua code by loading the underlying <tt>jit.p</tt> module.
</p>
<p>
To cut to the chase, run this to get a CPU usage profile by
function name:
</p>
<pre class="code">
luajit -jp myapp.lua
</pre>
<p>
It's <em>not</em> a stated goal of the bundled profiler to add every
possible option or to cater for special profiling needs. The low-level
profiler APIs are documented below. Third-party authors can use them
to implement advanced functionality, e.g. IDE integration or
graphical profilers.
</p>
<p>
Note: Sampling works for both interpreted and JIT-compiled code. The
results for JIT-compiled code may sometimes be surprising. LuaJIT
heavily optimizes and inlines Lua code, so there's no simple
one-to-one correspondence between source code lines and the sampled
machine code.
</p>

<h3 id="j_p"><tt>-jp=[options[,output]]</tt></h3>
<p>
The <tt>-jp</tt> command line option starts the high-level profiler.
When the application run by the command line terminates, the profiler
stops and writes the results to <tt>stdout</tt> or to the specified
<tt>output</tt> file.
</p>
<p>
The <tt>options</tt> argument specifies how the profiling is to be
performed:
</p>
<ul>
<li><tt>f</tt> &mdash; Stack dump: function name, otherwise module:line.
This is the default mode.</li>
<li><tt>F</tt> &mdash; Stack dump: ditto, but dump module:name.</li>
<li><tt>l</tt> &mdash; Stack dump: module:line.</li>
<li><tt>&lt;number&gt;</tt> &mdash; Stack dump depth (callee &larr;
caller). Default: 1.</li>
<li><tt>-&lt;number&gt;</tt> &mdash; Inverse stack dump depth (caller
&rarr; callee).</li>
<li><tt>s</tt> &mdash; Split stack dump after the first stack level. Implies
depth &ge; 2 or depth &le; -2.</li>
<li><tt>p</tt> &mdash; Show full path for module names.</li>
<li><tt>v</tt> &mdash; Show VM states.</li>
<li><tt>z</tt> &mdash; Show <a href="#jit_zone">zones</a>.</li>
<li><tt>r</tt> &mdash; Show raw sample counts. Default: show percentages.</li>
<li><tt>a</tt> &mdash; Annotate excerpts from source code files.</li>
<li><tt>A</tt> &mdash; Annotate complete source code files.</li>
<li><tt>G</tt> &mdash; Produce raw output suitable for graphical tools.</li>
<li><tt>m&lt;number&gt;</tt> &mdash; Minimum sample percentage to be shown.
Default: 3%.</li>
<li><tt>i&lt;number&gt;</tt> &mdash; Sampling interval in milliseconds.
Default: 10ms.<br>
Note: The actual sampling precision is OS-dependent.</li>
</ul>
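<p>
To make the <tt>m</tt> and <tt>r</tt> options concrete, here is a small C sketch of how a single report line could be filtered and formatted. It assumes that percentages are rounded to the nearest whole number before being compared with the <tt>m</tt> threshold. The helper names and the exact column layout are illustrative only and are not taken from the <tt>jit.p</tt> module.
</p>

```c
#include <stdio.h>

/* Share of `count` in `total` as a whole percentage, rounded half up. */
static int sample_pct(long count, long total)
{
    if (total <= 0)
        return 0;
    return (int)((count * 100 + total / 2) / total);
}

/* Format one report line into buf. Returns 0 (and writes nothing) if the
 * entry falls below min_pct, mimicking the "m<number>" option. raw != 0
 * prints the sample count instead of a percentage, like the "r" option. */
static int format_line(char *buf, size_t size, const char *name,
                       long count, long total, int min_pct, int raw)
{
    int pct = sample_pct(count, total);
    if (pct < min_pct)
        return 0;
    if (raw)
        snprintf(buf, size, "%5ld  %s", count, name);
    else
        snprintf(buf, size, "%2d%%  %s", pct, name);
    return 1;
}
```

<p>
Consider the default threshold of 3. An entry holding 25 of 200 samples (12.5%) is printed as <tt>13%</tt>. An entry holding 4 of 200 samples (2%) is suppressed.
</p>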
<p>
The default output for <tt>-jp</tt> is a list of the most CPU-consuming
spots in the application. Increasing the stack dump depth with (say)
<tt>-jp=2</tt> may help to point out the main callers or callees of
hotspots. However, samples are still aggregated flat, per unique stack dump.
</p>
<p>
To get a two-level view (split view) of callers/callees, use
<tt>-jp=s</tt> or <tt>-jp=-s</tt>. The percentages shown for the second
level are relative to the first level.
</p>
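<p>
The arithmetic behind the split view can be sketched in C as follows. The data structures, names and numbers are made up for illustration, and the printed layout is only an approximation of the real report. The point is which total each percentage is taken from.
</p>

```c
#include <stdio.h>

/* Hypothetical aggregated data for one first-level entry of a split view
 * (e.g. a hot function) and its second-level entries (e.g. its callers). */
struct sub_entry { const char *name; long count; };
struct top_entry {
    const char *name;
    long count;                  /* samples attributed to this entry */
    const struct sub_entry *sub; /* breakdown of those samples */
    int nsub;
};

/* Whole percentage of part in whole, rounded half up. */
static int pct_of(long part, long whole)
{
    return whole > 0 ? (int)((part * 100 + whole / 2) / whole) : 0;
}

static void print_split(const struct top_entry *e, long total)
{
    /* First level: relative to all samples of the run. */
    printf("%3d%%  %s\n", pct_of(e->count, total), e->name);
    /* Second level: relative to the first-level entry only. */
    for (int i = 0; i < e->nsub; i++)
        printf("      %3d%%  %s\n", pct_of(e->sub[i].count, e->count),
               e->sub[i].name);
}
```

<p>
Take a hotspot <tt>f</tt> with 60 of 200 samples and two callers <tt>g</tt> (45 samples) and <tt>h</tt> (15 samples). <tt>f</tt> is shown as 30%, while <tt>g</tt> and <tt>h</tt> are shown as 75% and 25% of <tt>f</tt>'s own samples, not of the total.
</p>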
<p>
To see how much time is spent in each line relative to a function, use
<tt>-jp=fl</tt>.
</p>
<p>
To see how much time is spent in different VM states or
<a href="#jit_zone">zones</a>, use <tt>-jp=v</tt> or <tt>-jp=z</tt>.
</p>
<p>
Combinations of <tt>v/z</tt> with <tt>f/F/l</tt> produce two-level
views, e.g. <tt>-jp=vf</tt> or <tt>-jp=fv</tt>. This shows the time
spent in a VM state or zone vs. hotspots. This can be used to answer
questions like "Which time-consuming functions are only interpreted?" or
"What's the garbage collector overhead for a specific function?".
</p>
<p>
Multiple options can be combined, but not all combinations make
sense (see above). E.g. <tt>-jp=3si4m1</tt> samples three stack levels
deep in 4ms intervals and shows a split view of the CPU-consuming
functions and their callers with a 1% threshold.
</p>
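<p>
As a rough illustration of how such an option string decomposes, the following C sketch parses it into a hypothetical <tt>struct jp_opts</tt>. It only covers the depth, <tt>s</tt>, <tt>i</tt>, <tt>m</tt> and <tt>f/F/l</tt> options, and it applies the defaults listed above. The real parsing is done by the <tt>jit.p</tt> Lua module and may differ in details, for example in how two-level combinations such as <tt>fl</tt> are handled.
</p>

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical decoded form of a -jp options string. */
struct jp_opts {
    int depth;     /* <number> or -<number>, default 1 */
    int interval;  /* i<number> in milliseconds, default 10 */
    int min_pct;   /* m<number>, default 3 */
    int split;     /* 's' given */
    char mode;     /* last of 'f', 'F', 'l' seen; combinations not modelled */
};

static void parse_jp(const char *s, struct jp_opts *o)
{
    int inverse = 0;
    char *end;
    o->depth = 1; o->interval = 10; o->min_pct = 3; o->split = 0; o->mode = 'f';
    while (*s) {
        if (*s == 'i' || *s == 'm') {          /* letter followed by a number */
            char opt = *s++;
            long v = strtol(s, &end, 10);
            if (end != s) {
                if (opt == 'i') o->interval = (int)v;
                else o->min_pct = (int)v;
            }
            s = end;
        } else if (*s == '-' || isdigit((unsigned char)*s)) {
            long v = strtol(s, &end, 10);
            if (end == s) {                    /* lone '-', as in "-s" */
                inverse = 1;
                s++;
            } else {                           /* stack dump depth */
                o->depth = (int)v;
                s = end;
            }
        } else {
            if (*s == 's') o->split = 1;
            else if (*s == 'f' || *s == 'F' || *s == 'l') o->mode = *s;
            s++;                               /* other letters ignored here */
        }
    }
    if (inverse && o->depth > 0)
        o->depth = -o->depth;
    /* 's' implies depth >= 2 or depth <= -2. */
    if (o->split && o->depth == 1) o->depth = 2;
    if (o->split && o->depth == -1) o->depth = -2;
}
```

<p>
Parsing <tt>3si4m1</tt> with this sketch yields the following: depth 3, split view enabled, a 4ms interval and a 1% threshold. The default mode remains <tt>f</tt>.
</p>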
<p>
Source code annotations produced by <tt>-jp=a</tt> or <tt>-jp=A</tt> are
always flat and at the line level. The source code files must, of
course, be readable by the profiler script.
</p>
<p>
The high-level profiler can also be started and stopped from Lua code with:
</p>
<pre class="code">
require("jit.p").start(options, output)
-- Code to be profiled.
require("jit.p").stop()
</pre>

<h3 id="jit_zone"><tt>jit.zone</tt> &mdash; Zones</h3>
<p>
Zones can be used to provide information about different parts of an
application to the high-level profiler. E.g. a game could make use of an
<tt>"AI"</tt> zone, a <tt>"PHYS"</tt> zone, etc. Zones are hierarchical,
organized as a stack.
</p>
<p>
The <tt>jit.zone</tt> module needs to be loaded explicitly:
</p>
<pre class="code">
local zone = require("jit.zone")
</pre>
<ul>
<li><tt>zone("name")</tt> pushes a named zone to the zone stack.</li>
<li><tt>zone()</tt> pops the current zone from the zone stack and
returns its name.</li>
<li><tt>zone:get()</tt> returns the current zone name or <tt>nil</tt>.</li>
<li><tt>zone:flush()</tt> flushes the zone stack.</li>
</ul>
<p>
To show the time spent in each zone, use <tt>-jp=z</tt>. To show the time
spent relative to hotspots, use e.g. <tt>-jp=zf</tt> or <tt>-jp=fz</tt>.
</p>
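<p>
The zone operations above behave like an ordinary last-in, first-out stack. The following C sketch models just that behavior with a fixed-size array. It is not the <tt>jit.zone</tt> implementation, which is a Lua module, and the type and function names are hypothetical. Each function is annotated with the Lua call it mirrors. Returning <tt>NULL</tt> when popping an empty stack is a choice made for this sketch only.
</p>

```c
#include <stddef.h>

#define ZONE_MAX 32  /* arbitrary nesting limit for this sketch */

struct zone_stack {
    const char *names[ZONE_MAX];
    int top;                      /* number of zones currently pushed */
};

/* zone("name"): push a named zone (ignored here if the stack is full). */
static void zone_push(struct zone_stack *z, const char *name)
{
    if (z->top < ZONE_MAX)
        z->names[z->top++] = name;
}

/* zone(): pop the current zone and return its name (NULL here if empty). */
static const char *zone_pop(struct zone_stack *z)
{
    return z->top > 0 ? z->names[--z->top] : NULL;
}

/* zone:get(): current zone name, or NULL (nil in Lua) if there is none. */
static const char *zone_get(const struct zone_stack *z)
{
    return z->top > 0 ? z->names[z->top - 1] : NULL;
}

/* zone:flush(): discard all pushed zones. */
static void zone_flush(struct zone_stack *z)
{
    z->top = 0;
}
```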

<h2 id="ll_lua_api">Low-level Lua API</h2>
<p>
The <tt>jit.profile</tt> module gives access to the low-level API of the
profiler from Lua code. This module needs to be loaded explicitly:
</p>
<pre class="code">
local profile = require("jit.profile")
</pre>
<p>
This module can be used to implement your own higher-level profiler.
A typical profiling run starts the profiler, captures stack dumps in
the profiler callback, adds them to a hash table to aggregate the number
of samples, stops the profiler and then analyzes all of the captured
stack dumps. Other parameters can be sampled in the profiler callback,
too. However, it's important not to spend too much time in the callback,
since this may skew the statistics.
</p>
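<p>
In Lua, the aggregation step is simply a table indexed by the dump string. For readers building a profiler on top of the C API instead, the following sketch shows the same idea in C. It uses a small open-addressing hash table (FNV-1a hash, linear probing) and adds each callback's <tt>samples</tt> value to the counter of its stack dump. The fixed table and key sizes, the example dump strings and all names are illustrative assumptions.
</p>

```c
#include <string.h>

#define AGG_SLOTS 64    /* must be a power of two */
#define AGG_KEYLEN 128  /* longest dump accepted, including the NUL */

struct agg_entry { char key[AGG_KEYLEN]; long count; };  /* count 0 = empty */
struct agg { struct agg_entry slot[AGG_SLOTS]; };

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static unsigned long hash_str(const char *s)
{
    unsigned long h = 2166136261UL;
    while (*s) {
        h ^= (unsigned char)*s++;
        h = (h * 16777619UL) & 0xffffffffUL;
    }
    return h;
}

/* Add `samples` (>= 1) to the counter for `dump`. The string is copied on
 * first sight because a dump buffer may be reused. Returns 0 if the dump
 * is too long or the table is full. */
static int agg_add(struct agg *a, const char *dump, int samples)
{
    size_t i = hash_str(dump) & (AGG_SLOTS - 1);
    if (strlen(dump) >= AGG_KEYLEN)
        return 0;
    for (int n = 0; n < AGG_SLOTS; n++, i = (i + 1) & (AGG_SLOTS - 1)) {
        struct agg_entry *e = &a->slot[i];
        if (e->count == 0) {                  /* empty: claim the slot */
            memcpy(e->key, dump, strlen(dump) + 1);
            e->count = samples;
            return 1;
        }
        if (strcmp(e->key, dump) == 0) {      /* seen before: accumulate */
            e->count += samples;
            return 1;
        }
    }
    return 0;
}

/* Number of samples recorded for `dump` (0 if never seen). */
static long agg_count(const struct agg *a, const char *dump)
{
    size_t i = hash_str(dump) & (AGG_SLOTS - 1);
    for (int n = 0; n < AGG_SLOTS; n++, i = (i + 1) & (AGG_SLOTS - 1)) {
        const struct agg_entry *e = &a->slot[i];
        if (e->count == 0)
            return 0;
        if (strcmp(e->key, dump) == 0)
            return e->count;
    }
    return 0;
}
```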

<h3 id="profile_start"><tt>profile.start(mode, cb)</tt>
&mdash; Start profiler</h3>
<p>
This function starts the profiler. The <tt>mode</tt> argument is a
string holding options:
</p>
<ul>
<li><tt>f</tt> &mdash; Profile with precision down to the function level.</li>
<li><tt>l</tt> &mdash; Profile with precision down to the line level.</li>
<li><tt>i&lt;number&gt;</tt> &mdash; Sampling interval in milliseconds (default
10ms).<br>
Note: The actual sampling precision is OS-dependent.
</li>
</ul>
<p>
The <tt>cb</tt> argument is a callback function which is called with
three arguments: <tt>(thread, samples, vmstate)</tt>. The callback is
called on a separate coroutine; the <tt>thread</tt> argument is the
state that holds the stack to sample for profiling. Note: do
<em>not</em> modify the stack of that state or call functions on it.
</p>
<p>
<tt>samples</tt> gives the number of accumulated samples since the last
callback (usually 1).
</p>
<p>
<tt>vmstate</tt> holds the VM state at the time the profiling timer
triggered. This may or may not correspond to the state of the VM when
the profiling callback is called. The state is one of <tt>'N'</tt> for
native (compiled) code, <tt>'I'</tt> for interpreted code, <tt>'C'</tt>
for C code, <tt>'G'</tt> for the garbage collector, or <tt>'J'</tt> for
the JIT compiler.
</p>
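<p>
A single callback may stand for several samples, so a per-state breakdown should add <tt>samples</tt> rather than 1. The C sketch below tallies samples per VM state letter. It assumes that the state arrives as the character code of the letter, as with the <tt>int vmstate</tt> parameter of the C callback in the <a href="#ll_c_api">Low-level C API</a>. The Lua callback receives the state as a one-character string instead. All names are hypothetical.
</p>

```c
#include <stddef.h>

/* The VM state letters listed above, in a fixed order. */
static const char vm_letters[] = "NICGJ";
#define VM_NSTATES (sizeof vm_letters - 1)

struct vm_tally {
    long count[VM_NSTATES];  /* samples per state, same order as vm_letters */
    long other;              /* samples with an unexpected state letter */
};

/* Add `samples` (not just 1) to the counter for `vmstate`. */
static void tally_vmstate(struct vm_tally *t, int vmstate, int samples)
{
    for (size_t i = 0; i < VM_NSTATES; i++) {
        if (vm_letters[i] == vmstate) {
            t->count[i] += samples;
            return;
        }
    }
    t->other += samples;
}

/* Human-readable label for a state letter. */
static const char *vmstate_name(int vmstate)
{
    switch (vmstate) {
    case 'N': return "native (compiled) code";
    case 'I': return "interpreted code";
    case 'C': return "C code";
    case 'G': return "garbage collector";
    case 'J': return "JIT compiler";
    default:  return "unknown";
    }
}
```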

<h3 id="profile_stop"><tt>profile.stop()</tt>
&mdash; Stop profiler</h3>
<p>
This function stops the profiler.
</p>

<h3 id="profile_dump"><tt>dump = profile.dumpstack([thread,] fmt, depth)</tt>
&mdash; Dump stack </h3>
<p>
This function allows taking stack dumps in an efficient manner. It
returns a string with a stack dump for the <tt>thread</tt> (coroutine),
formatted according to the <tt>fmt</tt> argument:
</p>
<ul>
<li><tt>p</tt> &mdash; Preserve the full path for module names. Otherwise
only the file name is used.</li>
<li><tt>f</tt> &mdash; Dump the function name if it can be derived. Otherwise
use module:line.</li>
<li><tt>F</tt> &mdash; Ditto, but dump module:name.</li>
<li><tt>l</tt> &mdash; Dump module:line.</li>
<li><tt>Z</tt> &mdash; Zap the following characters for the last dumped
frame.</li>
<li>All other characters are added verbatim to the output string.</li>
</ul>
<p>
The <tt>depth</tt> argument gives the number of frames to dump, starting
at the topmost frame of the thread. A negative number dumps the frames in
inverse order.
</p>
<p>
The first example prints the current module names and line numbers of
up to 10 frames on separate lines. The second example prints
semicolon-separated module:line entries for all frames (up to 100) in
inverse order:
</p>
<pre class="code">
print(profile.dumpstack(thread, "l\n", 10))
print(profile.dumpstack(thread, "lZ;", -100))
</pre>
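<p>
The interplay of <tt>depth</tt>, literal characters and <tt>Z</tt> can be modelled in a few lines of C. This is a simplified, hypothetical emulation with the following limits:
</p>
<ul>
<li>It works on a pre-resolved list of <tt>module:line</tt> strings, with <tt>frames[0]</tt> as the topmost frame, instead of a live Lua stack.</li>
<li>It only understands <tt>l</tt>, <tt>Z</tt> and literal characters. The <tt>p</tt>, <tt>f</tt> and <tt>F</tt> options are not modelled.</li>
</ul>
<p>
It records the output position at each <tt>Z</tt> and truncates the string there at the end. This removes whatever followed <tt>Z</tt> in the last frame.
</p>

```c
#include <string.h>

/* Append n bytes of s to out (capacity size >= 1), keeping out NUL-terminated. */
static void put(char *out, size_t size, size_t *len, const char *s, size_t n)
{
    while (n-- > 0 && *len + 1 < size)
        out[(*len)++] = *s++;
    out[*len] = '\0';
}

/* Hypothetical model of dumpstack formatting over pre-resolved frames.
 * frames[0] is the topmost frame; only 'l', 'Z' and literal characters
 * are handled. */
static void dump_frames(char *out, size_t size, const char *const frames[],
                        int nframes, const char *fmt, int depth)
{
    size_t len = 0, zap = 0;
    int have_zap = 0;
    int n = depth < 0 ? -depth : depth;    /* number of frames to dump */
    if (n > nframes)
        n = nframes;
    out[0] = '\0';
    for (int k = 0; k < n; k++) {
        /* Positive depth: topmost frame first. Negative: reverse order. */
        const char *fr = frames[depth < 0 ? n - 1 - k : k];
        for (const char *p = fmt; *p; p++) {
            if (*p == 'l') {
                put(out, size, &len, fr, strlen(fr));
            } else if (*p == 'Z') {
                zap = len;                   /* remember where to cut */
                have_zap = 1;
            } else {
                put(out, size, &len, p, 1);  /* literal character */
            }
        }
    }
    if (have_zap)
        out[zap] = '\0';  /* drop what followed 'Z' in the last frame */
}
```

<p>
Take the frames <tt>a.lua:3</tt>, <tt>b.lua:7</tt> and <tt>c.lua:1</tt>, topmost first. The format <tt>"lZ;"</tt> with depth -100 yields <tt>c.lua:1;b.lua:7;a.lua:3</tt>, without a trailing semicolon.
</p>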

<h2 id="ll_c_api">Low-level C API</h2>
<p>
The profiler can be controlled directly from C code, e.g. for
use by IDEs. The declarations are in <tt>"luajit.h"</tt> (see
<a href="ext_c_api.html">Lua/C API</a> extensions).
</p>

<h3 id="luaJIT_profile_start"><tt>luaJIT_profile_start(L, mode, cb, data)</tt>
&mdash; Start profiler</h3>
<p>
This function starts the profiler. <a href="#profile_start">See
above</a> for a description of the <tt>mode</tt> argument.
</p>
<p>
The <tt>cb</tt> argument is a callback function with the following
declaration:
</p>
<pre class="code">
typedef void (*luaJIT_profile_callback)(void *data, lua_State *L,
                                        int samples, int vmstate);
</pre>
<p>
<tt>data</tt> is the pointer passed to <tt>luaJIT_profile_start</tt> and
is available for use by the callback. <tt>L</tt> is the
state that holds the stack to sample for profiling. Note: do
<em>not</em> modify this stack or call functions on this stack;
use a separate coroutine for this purpose. <a href="#profile_start">See
above</a> for a description of <tt>samples</tt> and <tt>vmstate</tt>.
</p>

<h3 id="luaJIT_profile_stop"><tt>luaJIT_profile_stop(L)</tt>
&mdash; Stop profiler</h3>
<p>
This function stops the profiler.
</p>

<h3 id="luaJIT_profile_dumpstack"><tt>p = luaJIT_profile_dumpstack(L, fmt, depth, len)</tt>
&mdash; Dump stack </h3>
<p>
This function allows taking stack dumps in an efficient manner.
<a href="#profile_dump">See above</a> for a description of <tt>fmt</tt>
and <tt>depth</tt>.
</p>
<p>
This function returns a <tt>const char *</tt> pointing to a
private string buffer of the profiler. The <tt>size_t *len</tt>
argument returns the length of the output string. The buffer is
overwritten on the next call and deallocated when the profiler stops.
You need to either consume the content immediately or copy it for later
use.
</p>
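<p>
If a dump is still needed after the next call or after the profiler stops, it has to be copied. A minimal C sketch of such a copy is shown below. Because the length is returned separately, it copies exactly <tt>len</tt> bytes and adds its own terminating NUL rather than relying on one. The helper name <tt>copy_dump</tt> is hypothetical. The caller owns the result and must <tt>free()</tt> it.
</p>

```c
#include <stdlib.h>
#include <string.h>

/* Copy len bytes starting at p into a new NUL-terminated string.
 * Returns NULL if allocation fails; the caller must free() the result.
 *
 * Intended use, e.g. inside a profiler callback (needs LuaJIT's headers),
 * with p and len as returned by luaJIT_profile_dumpstack():
 *   char *copy = copy_dump(p, len);   (remains valid after the next call)
 */
static char *copy_dump(const char *p, size_t len)
{
    char *s = malloc(len + 1);
    if (s != NULL) {
        memcpy(s, p, len);
        s[len] = '\0';
    }
    return s;
}
```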
<br class="flush">
</div>
<div id="foot">
<hr class="hide">
Copyright &copy; 2005-2020
<span class="noprint">
&middot;
<a href="contact.html">Contact</a>
</span>
</div>
</body>
</html>