Developing a very lightweight measurement tool

Added by Luis Kornblueh over 13 years ago

As we all heard, DrHook from ECMWF is a very nice tool to start from. It would be interesting if ECMWF could provide it as a starting point. We ourselves at MPI have a comparable tool, which is much smaller but does less analysis.

Oliver Treiber gave a nice talk on improving performance analysis. From my point of view it would be great to have such a lightweight tool integrating timings for computations,
counter analysis, MPI analysis, and I/O analysis.

I think we also need to encourage vendors to support this.

Targets would be Linux and AIX - as far as I can see, those are the only two OSs currently used in HPC.

Hardware seems to be focused on x86, Power, and some GPUs.


Replies (17)

RE: Developing a very lightweight measurement tool - Added by George Mozdzynski over 13 years ago

I will look into making Dr Hook available to the RAPS community.

RE: Developing a very lightweight measurement tool - Added by Tom Henderson over 13 years ago

We currently use a lightweight tool called "GPTL" (General Purpose Timing Library)
to measure run times and many other things. It includes high-resolution timers,
reports of memory usage, and optional hooks into PAPI for counting floating-point
operations etc. It supports automatic and/or user-inserted instrumentation. It
has been ported to IBM, SGI, Linux clusters, etc. and is fairly mature at this point.

http://www.burningserver.net/rosinski/gptl/

I'm happy to provide more information if there is any interest.

Tom

RE: Developing a very lightweight measurement tool - Added by Luis Kornblueh over 13 years ago

It would be nice to have a review of the capabilities of all the tools in the Wiki. I just started the entry "Lightweight measurement tools".

RE: Developing a very lightweight measurement tool - Added by Tom Henderson over 13 years ago

Would it be OK to add my colleague, Jim Rosinski, to this forum? He's the GPTL
developer and would be the best person to fill in the wiki entry for GPTL.

RE: Developing a very lightweight measurement tool - Added by George Mozdzynski over 13 years ago

He is most welcome to contribute to this forum.

RE: Developing a very lightweight measurement tool - Added by Jim Rosinski over 13 years ago

OK, I've joined the group and added an entry for GPTL on the Wiki under "Lightweight measurement tools". The entry is a very brief text file describing what the tool is, and it points to the GPTL web page (www.burningserver.net/rosinski/gptl). The web page contains lots of usage examples for both simple and more advanced cases. The three main reasons I use GPTL are: 1) On x86 machines a register read for wallclock times is much more accurate and has much lower overhead than standard library routines like gettimeofday(); 2) With auto-instrumentation it is easy to generate a dynamic call tree of the application being profiled; and 3) Including PAPI-based performance counter information (including derived metrics like computational intensity) is trivial.

RE: Developing a very lightweight measurement tool - Added by Tom Henderson over 13 years ago

I used GPTL+PAPI to measure flop rates, run times, and computational intensity for the tables in the GPU presentations I made at the ECMWF workshop: http://www.ecmwf.int/newsevents/meetings/workshops/2010/high_performance_computing_14th/presentations/Henderson.pdf
We also use GPTL to measure memory use.

RE: Developing a very lightweight measurement tool - Added by Sami Saarinen over 13 years ago

Had a quick browse through the GPTL and it looks pretty solid and clearly easier to use than ("my") DrHook.

Two concerns:
1) what if the compiler doesn't contain hooks to the entry/end points (as in http://www.burningserver.net/rosinski/gptl/) ?
2) DrHook has also 2nd function: in case of error, it catches signals and attempts to produce tracebacks. Could GPTL be "taught" to do the same ?

I also gather (and hope) that GPTL will scale, which I believe DrHook may have trouble with: going beyond, say, tens of thousands of cores without too much overhead?
(what is the overhead on a single core/single task case, btw?)

RE: Developing a very lightweight measurement tool - Added by Sami Saarinen over 13 years ago

Another entry: whilst at ECMWF we built a memory tracing facility into DrHook,
which worked pretty well on IBM AIX systems. There we intercepted Fortran90 ALLOCATE & DEALLOCATE.

However, IBM is pretty much the only system where we got this working -- if not the only one.

Recently came across work on dmalloc under http://dmalloc.com/docs/

Has anybody looked at this and/or have any experience with it?

Namely memory leaks, growth etc. are constant problems as well.

RE: Developing a very lightweight measurement tool - Added by Jim Rosinski over 13 years ago

In response to Sami's 2 concerns:

1) what if the compiler doesn't contain hooks to the entry/end points (as in http://www.burningserver.net/rosinski/gptl/) ?

In these cases one would have to insert the instrumentation manually, e.g. ret=gptlstart('region') followed later by ret=gptlstop('region'). Note that manual and auto-instrumentation can be freely mixed in the same application code without confusing the library. These days most compilers do contain automatic hooks: GNU, Intel, Pathscale, PGI, AIX for sure. Not certain about Cray.
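The manual start/stop pattern described above can be sketched in a few lines. This is a toy Python illustration, not the GPTL API: the class `RegionTimers`, its methods, and the region names "physics" and "radiation" are all made up for the example.

```python
import time
from collections import defaultdict


class RegionTimers:
    """Toy start/stop timers keyed by region name (hypothetical, not GPTL)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._started = {}               # region name -> start timestamp
        self.calls = defaultdict(int)    # region name -> completed start/stop pairs
        self.total = defaultdict(float)  # region name -> accumulated seconds

    def start(self, region):
        if region in self._started:
            return -1                    # region is already running
        self._started[region] = self._clock()
        return 0

    def stop(self, region):
        now = self._clock()
        t0 = self._started.pop(region, None)
        if t0 is None:
            return -1                    # region was not running
        self.calls[region] += 1
        self.total[region] += now - t0
        return 0


timers = RegionTimers()
timers.start("physics")                  # outer region
for _ in range(3):
    timers.start("radiation")            # inner region, entered three times
    sum(i * i for i in range(1000))      # stand-in for real work
    timers.stop("radiation")
timers.stop("physics")

for name in ("physics", "radiation"):
    print(f"{name:10s} calls={timers.calls[name]} seconds={timers.total[name]:.6f}")
```

Returning a status code instead of raising mirrors the ret=gptlstart('region') style quoted above; a stop on a region that is not running is reported rather than treated as fatal.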

2) DrHook has also 2nd function: in case of error, it catches signals and attempts to produce tracebacks. Could GPTL be "taught" to do the same?

Interesting point. GPTL does retain current "callstack" state information. It would be easy to add a user-callable function to retrieve and print this information. More difficult to do it via signal handling.

RE: Developing a very lightweight measurement tool - Added by Jim Rosinski over 13 years ago

Regarding Sami's question about scalability of GPTL, the only concern comes when writing the output timing files. Normal behavior is to write one output file per MPI task, then post-process those files with one of the Perl scripts included with the distribution. But if the application uses 10,000 or more MPI tasks this can cause problems. There is a function GPTLpr_summary() which summarizes timings across MPI tasks and writes to only a single file if this is an issue. Also, GPTL is thread-safe and reports per-thread timing information in a single file.
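To illustrate the idea of collapsing per-task timings into a single summary, here is a minimal Python sketch. It does not reproduce what GPTLpr_summary() actually writes; the function `summarize`, the chosen statistics (mean, min, max with the owning rank), and the sample numbers are assumptions for illustration. In a real MPI code the same result would come from a reduction rather than a list held in memory.

```python
from statistics import mean


def summarize(per_task):
    """per_task: list indexed by task rank of {region name: seconds} dicts."""
    regions = sorted({name for timings in per_task for name in timings})
    summary = {}
    for name in regions:
        # (seconds, rank) pairs for every task that actually timed this region
        samples = [(t[name], rank) for rank, t in enumerate(per_task) if name in t]
        summary[name] = {
            "ntasks": len(samples),
            "mean": mean(value for value, _ in samples),
            "min": min(samples),   # (seconds, rank of the fastest task)
            "max": max(samples),   # (seconds, rank of the slowest task)
        }
    return summary


# Made-up timings for three tasks; task 2 never entered "io".
per_task = [
    {"dynamics": 2.0, "io": 0.5},
    {"dynamics": 2.5, "io": 0.25},
    {"dynamics": 1.5},
]
for name, stats in summarize(per_task).items():
    print(name, stats)
```

Keeping the rank alongside the min and max makes it easy to spot which task is the outlier without opening any per-task file.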

More on the point of overhead than scalability of GPTL, the biggest issue that might arise is calling GPTLstart()/GPTLstop() in a region with too fine a granularity (e.g. called a billion times per second). This is one reason it is important to have a lightweight underlying wallclock timer. The overhead of gettimeofday() swamps the overhead of GPTL itself. On X86-based processors which provide a fast register read for wallclock times, this overhead is much less. If you've downloaded the GPTL distribution, look for NANOTIME in the documentation and in gptl.c
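The granularity argument can be made concrete with a small measurement and some arithmetic. The sketch below is Python, so interpreter overhead dominates and it says nothing about a register read versus gettimeofday(); it only shows the method. The function names and the numbers in the usage lines are assumptions.

```python
import time


def per_call_overhead_ns(clock, ncalls=200_000):
    """Average cost in nanoseconds of one call to `clock` (includes loop overhead)."""
    t0 = time.perf_counter_ns()
    for _ in range(ncalls):
        clock()
    t1 = time.perf_counter_ns()
    return (t1 - t0) / ncalls


def timing_overhead_fraction(region_calls_per_second, ns_per_timer_call):
    """Fraction of wallclock spent in the timer, at two timer reads per region call."""
    return region_calls_per_second * 2 * ns_per_timer_call / 1e9


for name, clock in [("time.time", time.time),
                    ("time.perf_counter_ns", time.perf_counter_ns)]:
    print(f"{name:22s} ~{per_call_overhead_ns(clock):.0f} ns per call")

# Assumed numbers: a region entered one million times per second, 50 ns per timer read.
print(f"overhead fraction: {timing_overhead_fraction(1_000_000, 50):.2f}")
```

The second function is the reason fine-grained regions hurt: the cost grows linearly with the call rate, so halving the timer cost only moves the break-even point by a factor of two.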

RE: Developing a very lightweight measurement tool - Added by Sami Saarinen over 13 years ago

Back to Jim Rosinski :

It would probably make a lot of sense to merge DrHook with GPTL, eh ?

Both are thread-safe and produce one file per MPI task, to be post-processed (if necessary) offline with Perl.
DrHook relies solely on inserted calls CALL DR_HOOK(0,'routine',handle) and CALL DR_HOOK(1,'routine',handle), corresponding to gptlstart/gptlstop, respectively
(for more, browse https://redmine.dkrz.de/collaboration/attachments/40/drhook_fmi.pdf).
Also, a Perl script for automatic insertion of DrHook calls into Fortran (adapted & fixed from an ECMWF script) can be used transparently within makefiles, i.e. the original source code can stay untouched.
DrHook signal handling "technology" could be brought into GPTL with minimal hassle.
Memory tracing could be included in GPTL, too.

One suggestion for manual instrumentation in GPTL: DrHook uses an 8-byte handle which, upon entering the instrumented region, stores the address of the internal data structure that holds the data behind the named label 'region'.
When exiting the instrumented region, the string 'region' is no longer parsed/hashed/strcmp'ed; the handle is used directly to point to the internal data structure of DrHook. This removes a lot of overhead -- probably more than
is gained from a fast wallclock timer (a necessity in itself).

Finally, I wonder if this is now getting too technical for the audience? Perhaps we should take this offline -- or is it okay to continue at such a level of detail here?

RE: Developing a very lightweight measurement tool - Added by George Mozdzynski over 13 years ago

As an experiment I spent today installing GPTL on our AIX Power6 cluster
and tested a small T159 IFS model.

Not that difficult to do this and a 64 task x 1 OpenMP thread case produced
the expected 64 timing.<n> files. So GPTL works with IFS!!

The indented/hierarchical profile is something we don't see with DRHOOK,
and it is interesting to see AVG_MPI_BYTES of MPI calls within this profile.

The only real problem I encountered is when I tried 2 threads, GPTL coughed
with many messages of the type

21:update_parent_info: realloc error parent_count nparent=3
21:GPTLstart: find_parent error
23:GPTLstop: timer sitnu_@OL@1 was already off.
23:GPTLstop: timer sigam_@OL@1@OL@2 was already off.
28:update_stats: negative delta=-1e-06
0:GPTLstart: find_parent error

then finally dying on,

0:    Offset 0x00000008 in procedure strcmp
0: Location 0x0000000100002cb0
0: Location 0x000000010000289c
0: Offset 0x00000044 in procedure sigam_@OL@1@OL@2
0: Location 0x090000000315ea10
0: Offset 0x00000100 in procedure sigam_@OL@1
0: Location 0x090000000315fb08
0: Offset 0x00000354 in procedure sigam_, near line 106 in file sigam.F90
0: Offset 0x000001e8 in procedure lassie_, near line 134 in file lassie.F90
0: Offset 0x00000cc4 in procedure lacdyn_, near line 410 in file lacdyn.F90
0: Offset 0x00000424 in procedure cpg_dyn_, near line 338 in file cpg_dyn.F90
0: Offset 0x00002610 in procedure cpg_@OL@1

So my question for Jim is: should I expect GPTL to work with OpenMP on AIX?
I am using xlf 13.1.0.4 for these runs.

Happy to forward any outputs or other to Jim, just trying to get a feel for GPTL, so far
am very impressed with the distribution and initial runs (IFS is a big code).

Now going to try GPTLpr_summary() ...

RE: Developing a very lightweight measurement tool - Added by Jim Rosinski over 13 years ago

Sami brings up a very good point about using handles to reduce overhead. When I designed GPTL I always opted for the interface that made life easiest for the user. In this case, I was willing to pay the price of some added overhead to avoid making the user keep track of a set of library-created handles. But, the relative overhead is definitely non-trivial when a fast register read is used instead of gettimeofday() to track wallclock times.

A best-of-both-worlds solution might be if I were to add entries GPTLstart_handle('region', handle) and GPTLstop_handle('region', handle), where the argument "handle" would be output by GPTL on the initial call, and the user's responsibility to track. The library could check to see if on input "handle" is zero, meaning to generate the hash entry for the region, and otherwise use it directly as an address, bypassing hashing. Is this how DrHook does things? The only downside I can see to this approach is when "handle" is non-zero, the argument 'region' would be ignored (and therefore possibly misleading). The user could still use existing entry point GPTLstart('region') wherever they wished--it wouldn't interfere with calls to GPTLstart_handle('region', handle), but would go through the slower hash generation process.

I'd say the traceback and memory usage info provided by DrHook are more advanced than what GPTL has. There is currently no signal handling in GPTL. The memory usage info is just a wrapper to getrusage on AIX, and a read of /proc/<pid>/statm on Linux systems.
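For readers unfamiliar with the two sources mentioned, the sketch below reads them from Python. Which fields GPTL itself reports is not stated here, so the choice of the resident-set field and of ru_maxrss is an assumption, as are the function names.

```python
import os
import resource


def resident_bytes():
    """Current resident set size from /proc/self/statm, or None if unavailable."""
    try:
        with open("/proc/self/statm") as f:
            fields = f.read().split()
    except OSError:
        return None                     # no /proc filesystem, e.g. not Linux
    # statm counts pages; the second field is the resident set size.
    return int(fields[1]) * os.sysconf("SC_PAGE_SIZE")


def peak_resident_raw():
    """ru_maxrss from getrusage(); the unit is platform dependent (KiB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


print("resident now :", resident_bytes())
print("peak resident:", peak_resident_raw())
```

Both calls are cheap enough to make at region boundaries, but neither says who allocated the memory, which is the gap the ALLOCATE/DEALLOCATE interception discussed earlier in the thread is meant to fill.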

Comments welcome. Sami, I agree the discussion may be getting too technical for a wiki--feel free to email me at . Looks like your email is if I found the right info.

RE: Developing a very lightweight measurement tool - Added by Jim Rosinski over 13 years ago

George, glad you got GPTL to work on your IFS model, at least in unthreaded mode. I would expect threaded mode to work as well. I just tried the GPTL threading test code from ctests/papiomptest on an AIX system, and it succeeded. Could you try this same test on your machine and let me know how it goes? Can you send me the macros.make file you used to build GPTL? Is it possible that you built the lib with threading disabled? Make sure that either OPENMP=yes or PTHREADS=yes in macros.make. If you can send me a test case I'd be glad to have a look.

Sending this message to as well. My email is

RE: Developing a very lightweight measurement tool - Added by Sami Saarinen over 13 years ago

Just a few quick responses -- the rest offline via email :-)

George: DrHook does have so-called callpath-based output, if that is what you mean by a hierarchical profile.
It has somewhat more overhead, but can be activated through:
export DR_HOOK_CALLPATH=1
export DR_HOOK_CALLPATH_DEPTH=10
These keep track of the calling tree (as seen by DrHook through its own bookkeeping) up to a nesting level of 10.
So if a routine has been called via (say) 2 different call paths, DrHook would indeed pick this up.
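Callpath-based bookkeeping with a depth limit can be sketched as follows. This is a Python illustration, not DrHook's implementation: the class `CallpathCounter` is made up, keeping the innermost frames when the path exceeds the limit is an assumption, and the second caller ("other_caller") is hypothetical.

```python
from collections import Counter


class CallpathCounter:
    """Count routine entries per call path, truncated to `max_depth` frames."""

    def __init__(self, max_depth=10):
        self.max_depth = max_depth
        self._stack = []          # currently active routines, outermost first
        self.calls = Counter()    # tuple of routine names -> number of entries

    def enter(self, routine):
        self._stack.append(routine)
        # Assumption: when the path is too deep, keep the innermost frames.
        key = tuple(self._stack[-self.max_depth:])
        self.calls[key] += 1

    def leave(self, routine):
        assert self._stack and self._stack[-1] == routine, "unbalanced enter/leave"
        self._stack.pop()


cp = CallpathCounter(max_depth=10)

# First path: cpg_dyn -> lacdyn -> lassie -> sigam
for r in ("cpg_dyn", "lacdyn", "lassie", "sigam"):
    cp.enter(r)
for r in ("sigam", "lassie", "lacdyn", "cpg_dyn"):
    cp.leave(r)

# Second, hypothetical path: other_caller -> sigam
cp.enter("other_caller")
cp.enter("sigam")
cp.leave("sigam")
cp.leave("other_caller")

for path, n in cp.calls.items():
    if path[-1] == "sigam":
        print(" -> ".join(path), n)
```

The same routine shows up once per distinct path, which is what separates a callpath profile from a flat per-routine one; the extra overhead comes from building and hashing the path key on every entry.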

Jim: The DrHook handle variable is merely an 8-byte placeholder for a (max) 8-byte address. The user should not fill it with anything.
The variable also has to "come" from the stack to assure thread safety.
A good idea would really be to have two versions of GPTLstart/stop: with and without handles, as you suggested.
With these lengthier GPTLstart/stop_handle() versions one could directly translate DrHook calls to use them.

Regarding traceback & memory: splitting this into 2 "subprojects" would be worth doing, as the traceback is "trivial"
while the memory part requires more effort if one is to go to all the trouble of intercepting Fortran ALLOCATE/DEALLOCATE
(and wrapping C malloc()s / C++ new()s; the C wrapping has already been done for IFS/ODB).

I will send this message via email to you both, too.

RE: Developing a very lightweight measurement tool - Added by George Mozdzynski over 13 years ago

Just want to add that some number of mails have been exchanged offline between Jim Rosinski,
Sami Saarinen and myself.

The problem with getting GPTL to work with IFS when using more than one OpenMP thread has been resolved.

Also GPTLpr_summary worked correctly with IFS.

What is clear from the initial experimentation with GPTL is that there is a lot of overlap
in functionality between GPTL and DRHOOK (which thanks to Sami we use by default at ECMWF).

My view is that it would be nice to see some further development of GPTL (if Jim is willing
to do this), and for it to be used more widely in the RAPS community.
