From the official website:
SystemTap is a tracing and probing tool that allows users to study and monitor the activities of the
computer system (particularly, the kernel) in fine detail. It provides information similar to the output of
tools like netstat, ps, top, and iostat; however, SystemTap is designed to provide more filtering
and analysis options for collected information.
Install
Install the systemtap package together with the debug symbols and headers for the currently running kernel:
$ sudo apt-get install systemtap linux-image-`uname -r`-dbg linux-headers-`uname -r`
To check installation:
$ sudo stap -v -e 'probe vfs.read {printf("read performed\n"); exit()}'
Pass 1: parsed user script and 81 library script(s) using 78600virt/22436res/2512shr kb, in 100usr/0sys/125real ms.
Pass 2: analyzed script: 1 probe(s), 1 function(s), 3 embed(s), 0 global(s) using 276868virt/117456res/7792shr kb, in 1190usr/220sys/7225real ms.
Pass 3: translated to C into "/tmp/stapQvI4gP/stap_347bddcb57970a4c3f9ee4e0705f0f68_1473_src.c" using 267024virt/112808res/5676shr kb, in 10usr/20sys/31real ms.
Pass 4: compiled C into "stap_347bddcb57970a4c3f9ee4e0705f0f68_1473.ko" in 5410usr/840sys/12991real ms.
Pass 5: starting run.
read performed
Pass 5: run completed in 0usr/20sys/499real ms.
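The verbose output reports, for each pass, the user, system, and wall-clock time in milliseconds. As a rough sketch (the helper name `pass_timings` and the regular expression are illustrative assumptions based only on the log format shown above), the real time per pass can be extracted like this to see where the time goes:

```python
import re

# Matches the trailing "in <usr>usr/<sys>sys/<real>real ms." part of a "Pass N:" line.
PASS_RE = re.compile(r"^Pass (\d): .* in (\d+)usr/(\d+)sys/(\d+)real ms\.$")

def pass_timings(output):
    """Return {pass_number: real_ms} for every timed 'Pass N:' line."""
    timings = {}
    for line in output.splitlines():
        match = PASS_RE.match(line.strip())
        if match:
            timings[int(match.group(1))] = int(match.group(4))
    return timings

# The output copied from the installation check above.
sample = """\
Pass 1: parsed user script and 81 library script(s) using 78600virt/22436res/2512shr kb, in 100usr/0sys/125real ms.
Pass 2: analyzed script: 1 probe(s), 1 function(s), 3 embed(s), 0 global(s) using 276868virt/117456res/7792shr kb, in 1190usr/220sys/7225real ms.
Pass 3: translated to C into "/tmp/stapQvI4gP/stap_347bddcb57970a4c3f9ee4e0705f0f68_1473_src.c" using 267024virt/112808res/5676shr kb, in 10usr/20sys/31real ms.
Pass 4: compiled C into "stap_347bddcb57970a4c3f9ee4e0705f0f68_1473.ko" in 5410usr/840sys/12991real ms.
Pass 5: starting run.
read performed
Pass 5: run completed in 0usr/20sys/499real ms.
"""

timings = pass_timings(sample)
slowest = max(timings, key=timings.get)
print(timings)
print("slowest pass:", slowest)
```

In this run, most of the wall-clock time is spent in Pass 4, compiling the generated C code into a kernel module.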
How it works
- read a script file written in the SystemTap language
- convert it to C source code
- compile that code as a Linux kernel module
- load the module and execute it
- unload the kernel module
All these steps are performed by the stap command.
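The steps above correspond to the "Pass 1" to "Pass 5" lines that stap prints in verbose mode. The stap command accepts a `-p NUM` option that stops after pass NUM, which is handy for inspecting intermediate stages. The following is a minimal sketch, not part of SystemTap itself: `stap_command` is a hypothetical helper that only builds the command line, and the pass labels are informal names taken from the list above.

```python
# Informal pass labels, numbered like stap's "Pass N" output lines.
PASSES = {1: "parse", 2: "analyze", 3: "translate to C", 4: "compile module", 5: "run"}

def stap_command(script_path, last_pass=5, verbose=True):
    """Build the argv for running a script up to and including `last_pass`."""
    if last_pass not in PASSES:
        raise ValueError("last_pass must be between 1 and 5")
    argv = ["stap"]
    if verbose:
        argv.append("-v")
    if last_pass < 5:
        # -p NUM asks stap to stop after pass NUM.
        argv += ["-p", str(last_pass)]
    argv.append(script_path)
    return argv

for number in sorted(PASSES):
    print(number, PASSES[number], " ".join(stap_command("mkdir_probe.stp", number)))
```

The resulting list can be passed to `subprocess.run` (prefixed with `sudo` if needed); it is only printed here so the sketch runs without SystemTap installed.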
Scripts
A SystemTap script is of the following form:
function function_name(arguments) {statements}
probe event1, event2, ..., eventn {
    function_name(args)
}
This reads as: when any of the listed events occurs, the probe body runs, here the call to function_name(args).
First example
Let’s print a message when the mkdir system call is invoked. Save the following as mkdir_probe.stp:
function fancy_print(text) {
    printf("**** %s ****\n", text);
}
probe syscall.mkdir {
    fancy_print("mkdir syscall called with pathname: " . pathname);
    exit();
}
Run the script:
$ sudo stap -v mkdir_probe.stp
Pass 1: parsed user script and 81 library script(s) using 78596virt/22424res/2508shr kb, in 100usr/10sys/104real ms.
Pass 2: analyzed script: 1 probe(s), 2 function(s), 1 embed(s), 0 global(s) using 205108virt/118440res/75184shr kb, in 310usr/50sys/399real ms.
Pass 3: translated to C into "/tmp/stappLnq6v/stap_0de87453a9617d4db8e5f589bf21474f_1770_src.c" using 205108virt/118548res/75292shr kb, in 0usr/0sys/3real ms.
Pass 4: compiled C into "stap_0de87453a9617d4db8e5f589bf21474f_1770.ko" in 1170usr/150sys/1407real ms.
Pass 5: starting run.
Then issue a mkdir command to trigger the probe. The script prints its message and exits:
**** mkdir syscall called ! ****
Pass 5: run completed in 10usr/20sys/14869real ms.
Where does the pathname variable come from? SystemTap ships a set of predefined functions and probes (tapsets), typically in /usr/share/systemtap/tapset/. The file /usr/share/systemtap/tapset/syscalls.stp contains the following mkdir section:
# mkdir ______________________________________________________
# long sys_mkdir(const char __user * pathname, int mode)
probe syscall.mkdir = kernel.function("sys_mkdir").call
{
    name = "mkdir"
    pathname_uaddr = $pathname
    pathname = user_string($pathname)
    mode = $mode
    argstr = sprintf("%s, %#o", user_string_quoted($pathname), $mode)
}
This is where the pathname variable is defined.
iotop with SystemTap (taken from <http://sourceware.org/systemtap/examples/io/iotop.stp>)
global reads, writes, total_io
probe vfs.read.return {
    reads[execname()] += bytes_read
}
probe vfs.write.return {
    writes[execname()] += bytes_written
}
# print top 10 IO processes every 5 seconds
probe timer.s(5) {
    foreach (name in writes)
        total_io[name] += writes[name]
    foreach (name in reads)
        total_io[name] += reads[name]
    printf ("%16s\t%10s\t%10s\n", "Process", "KB Read", "KB Written")
    foreach (name in total_io- limit 10)
        printf("%16s\t%10d\t%10d\n", name,
               reads[name]/1024, writes[name]/1024)
    delete reads
    delete writes
    delete total_io
    print("\n")
}
- the .return suffix means the probe fires when the probed function returns,
- bytes_read and bytes_written are predefined variables,
- timer.s is a probe that fires periodically, in our case every 5 seconds,
- the minus sign in the foreach statement sorts the entries in descending order, and limit 10 keeps only the first 10 of them.
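To make the sorting and limiting behaviour of the timer handler concrete, here is a small Python model of it. This is only an illustrative sketch: `top_io` is a hypothetical helper, the byte counts are made up, and plain dictionaries stand in for SystemTap's global arrays (where a missing key reads as 0).

```python
def top_io(reads, writes, limit=10):
    """Mimic the timer handler: rank processes by total I/O, descending."""
    total_io = {}
    for name, count in writes.items():
        total_io[name] = total_io.get(name, 0) + count
    for name, count in reads.items():
        total_io[name] = total_io.get(name, 0) + count
    # Equivalent of "foreach (name in total_io- limit 10)".
    ranked = sorted(total_io, key=total_io.get, reverse=True)[:limit]
    # Integer division by 1024 converts bytes to KB, as in the script.
    return [(name, reads.get(name, 0) // 1024, writes.get(name, 0) // 1024)
            for name in ranked]

# Made-up byte counts, for illustration only.
reads = {"mocp": 204800, "Xorg": 165000, "firefox": 2048}
writes = {"firefox": 5120, "tmux": 100}

print("%16s\t%10s\t%10s" % ("Process", "KB Read", "KB Written"))
for name, kb_read, kb_written in top_io(reads, writes):
    print("%16s\t%10d\t%10d" % (name, kb_read, kb_written))
```

Note that the ranking uses the combined read and write totals, while the printed columns show reads and writes separately, which is why a process with 0 KB in both columns can still appear in the list.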
Running the script periodically prints iotop-like output:
         Process     KB Read  KB Written
            mocp         199           0
            Xorg         161           0
         firefox           2           5
             psi           1           0
          stapio           0           0
           urxvt           0           0
      parcellite           0           0
            tmux           0           0
        beam.smp           0           0
 plugin-containe           0           0