RAPS: Measurement tools initiative https://redmine.dkrz.de/ 2010-12-16T17:24:55Z DKRZ Project Management Service
Redmine Measurement tools initiative: RE: Developing a very lightweight measurement toolhttps://redmine.dkrz.de/boards/2/topics/6?r=29#message-292010-12-16T17:24:55ZGeorge MozdzynskiGeorge.Mozdzynski@ecmwf.int
<p>Just want to add that some number of mails have been exchanged offline between Jim Rosinski, <br />Sami Saarinen and myself.</p>
<p>The problem with getting GPTL to work with IFS when using > 1 OpenMP threads has been resolved.</p>
<p>Also GPTLpr_summary worked correctly with IFS.</p>
<p>What is clear from the initial experimentation with GPTL is that there is a lot of overlap<br />in functionality between GPTL and DRHOOK (which thanks to Sami we use by default at ECMWF).</p>
<p>My view is that it would be nice to see some further development of GPTL (if Jim is willing<br />to do this), and for it to be used more widely in the RAPS community.</p> Measurement tools initiative: RE: Developing a very lightweight measurement toolhttps://redmine.dkrz.de/boards/2/topics/6?r=28#message-282010-12-15T08:17:31ZSami Saarinensami.saarinen@inbox.com
<p>Just a few quick responses -- the rest: offline via email :-)</p>
<p>George: DrHook does indeed have so-called callpath-based output, if that is what is meant here by a hierarchical profile.<br />It has somewhat more overhead, but can be activated through:<br /> export DR_HOOK_CALLPATH=1<br /> export DR_HOOK_CALLPATH_DEPTH=10<br />These would keep track of a calling tree (as seen by DrHook through its own book-keeping) up to a nesting level of 10.<br />So if a routine has been called via (say) 2 different call paths, DrHook would indeed pick this up.</p>
<p>Jim: The DrHook handle variable is merely an 8-byte placeholder for a (max) 8-byte address. The user should not fill it with anything.<br />The variable also has to "come" from the stack to ensure thread safety. <br />A good idea would really be to have two versions of GPTLstart/stop: with and without handles, as you suggested.<br />With these lengthier GPTLstart/stop_handle() versions, one could directly translate DrHook calls to use them.</p>
<p>Regarding traceback &amp; memory: splitting this into 2 "subprojects" would be worth doing, as the traceback is "trivial" <br />and the memory part requires more effort if one is to go through all the trouble of intercepting Fortran Allocate/Deallocate<br />(and of wrapping C-malloc()'s / C++ new()'s; the C-wrapping has already been done for IFS/ODB).</p>
<p>I will send this message via email to you both, too.</p> Measurement tools initiative: RE: Developing a very lightweight measurement toolhttps://redmine.dkrz.de/boards/2/topics/6?r=27#message-272010-12-14T21:18:58ZJim Rosinskijames.rosinski@noaa.gov
<p>George, glad you got GPTL to work on your IFS model, at least in unthreaded mode. I would expect threaded mode to work as well. I just tried the GPTL threading test code from ctests/papiomptest on an AIX system, and it succeeded. Could you try this same test on your machine and let me know how it goes? Can you send me the macros.make file you used to build GPTL? Is it possible that you built the lib with threading disabled? Make sure that either OPENMP=yes or PTHREADS=yes in macros.make. If you can send me a test case I'd be glad to have a look.</p>
<p>Sending this message to <a class="email" href="mailto:George.Mozdzynski@ecmwf.int">George.Mozdzynski@ecmwf.int</a> as well. My email is <a class="email" href="mailto:james.rosinski@noaa.gov">james.rosinski@noaa.gov</a></p> Measurement tools initiative: RE: Developing a very lightweight measurement toolhttps://redmine.dkrz.de/boards/2/topics/6?r=26#message-262010-12-14T21:06:28ZJim Rosinskijames.rosinski@noaa.gov
<p>Sami brings up a very good point about using handles to reduce overhead. When I designed GPTL I always opted for the interface that made life easiest for the user. In this case, I was willing to pay the price of some added overhead to avoid making the user keep track of a set of library-created handles. But, the relative overhead is definitely non-trivial when a fast register read is used instead of gettimeofday() to track wallclock times.</p>
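<p>To put a rough number on the timer side of this trade-off, here is a minimal sketch (not GPTL code) that estimates the average cost of one gettimeofday() call, and the fraction of a region's run time that two such calls per start/stop pair would add. The function names gettimeofday_cost_sec and timer_overhead_fraction are hypothetical, invented for this illustration.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Current wallclock time in seconds, via gettimeofday(). */
static double now_sec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + 1.0e-6 * (double)tv.tv_usec;
}

/* Hypothetical helper: average cost in seconds of one gettimeofday()
 * call, estimated from ncalls back-to-back calls. */
double gettimeofday_cost_sec(long ncalls)
{
    struct timeval tv;
    double t0, t1;
    long i;

    assert(ncalls > 0);
    t0 = now_sec();
    for (i = 0; i < ncalls; i++)
        gettimeofday(&tv, NULL);
    t1 = now_sec();
    return (t1 - t0) / (double)ncalls;
}

/* Hypothetical helper: timer cost relative to the work in a region,
 * assuming one timer call at start and one at stop. */
double timer_overhead_fraction(double cost_per_call_sec, double region_sec)
{
    assert(region_sec > 0.0);
    return 2.0 * cost_per_call_sec / region_sec;
}
```

<p>For example, if one timer call cost 0.1 microseconds and the instrumented region did 1 microsecond of work, timer_overhead_fraction(1e-7, 1e-6) would give 0.2, i.e. the timer alone would add about 20% to the region.</p>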
<p>A best-of-both-worlds solution might be if I were to add entries GPTLstart_handle('region', handle) and GPTLstop_handle('region', handle), where the argument "handle" would be output by GPTL on the initial call, and the user's responsibility to track. The library could check to see if on input "handle" is zero, meaning to generate the hash entry for the region, and otherwise use it directly as an address, bypassing hashing. Is this how DrHook does things? The only downside I can see to this approach is when "handle" is non-zero, the argument 'region' would be ignored (and therefore possibly misleading). The user could still use existing entry point GPTLstart('region') wherever they wished--it wouldn't interfere with calls to GPTLstart_handle('region', handle), but would go through the slower hash generation process.</p>
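<p>The handle idea described above could be sketched roughly as follows. This is an illustration only, not GPTL or DrHook source: the names sketch_start_handle, sketch_stop_handle, sketch_count and the timer_rec structure are all hypothetical, a linear search stands in for the real hash lookup, and the sketch is not thread-safe.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

#define MAX_TIMERS 64
#define MAX_NAME   31

/* Hypothetical timer record; not the real GPTL or DrHook structure. */
typedef struct {
    char   name[MAX_NAME + 1];
    double start;    /* wallclock at the last start, in seconds */
    double total;    /* accumulated wallclock, in seconds */
    long   count;    /* completed start/stop pairs */
    int    running;  /* 1 between start and stop */
} timer_rec;

static timer_rec timers[MAX_TIMERS];  /* zero-initialised */
static int ntimers = 0;

static double wallclock(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + 1.0e-6 * (double)tv.tv_usec;
}

/* Slow path: find the record by name, creating it if needed. */
static timer_rec *lookup_or_create(const char *name)
{
    int i;
    for (i = 0; i < ntimers; i++)
        if (strcmp(timers[i].name, name) == 0)
            return &timers[i];
    if (ntimers == MAX_TIMERS)
        return NULL;
    strncpy(timers[ntimers].name, name, MAX_NAME);
    timers[ntimers].name[MAX_NAME] = '\0';
    return &timers[ntimers++];
}

/* If *handle is NULL, resolve the name once and store the record's
 * address in *handle; otherwise use the address directly and ignore
 * the name. Returns 0 on success, -1 on error. */
int sketch_start_handle(const char *name, void **handle)
{
    timer_rec *t;
    assert(handle != NULL);
    t = (timer_rec *)*handle;
    if (t == NULL) {
        t = lookup_or_create(name);
        if (t == NULL)
            return -1;
        *handle = t;
    }
    if (t->running)
        return -1;
    t->running = 1;
    t->start = wallclock();
    return 0;
}

/* Stop never looks at the name: the handle points at the record. */
int sketch_stop_handle(const char *name, void **handle)
{
    double now = wallclock();
    timer_rec *t;
    (void)name;
    assert(handle != NULL);
    t = (timer_rec *)*handle;
    if (t == NULL || !t->running)
        return -1;
    t->total += now - t->start;
    t->count++;
    t->running = 0;
    return 0;
}

/* Number of completed start/stop pairs behind a handle. */
long sketch_count(const void *handle)
{
    assert(handle != NULL);
    return ((const timer_rec *)handle)->count;
}
```

<p>A caller would declare the handle as a local variable initialised to NULL and pass its address on every call. After the first start, the name is never compared again, which is also where the "possibly misleading" downside comes from: a different name passed with a non-NULL handle is silently ignored.</p>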
<p>I'd say the traceback and memory usage info provided by DrHook are more advanced than what GPTL has. There is currently no signal handling in GPTL. The memory usage info is just a wrapper to getrusage on AIX, and a read of /proc/&lt;pid&gt;/statm on Linux systems.</p>
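<p>For readers unfamiliar with these two sources of memory information, a minimal sketch of both follows. It is not the GPTL implementation; the function names are hypothetical, and /proc/self/statm is used here as the calling process's own view of /proc/&lt;pid&gt;/statm. Note that the units of ru_maxrss differ between platforms (kilobytes on Linux).</p>

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: peak resident set size as reported by
 * getrusage(). Units are platform dependent (kilobytes on Linux).
 * Returns -1 on failure. */
long peak_rss_from_getrusage(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_maxrss;
}

/* Hypothetical helper: current resident set size in bytes, read from
 * /proc/self/statm (Linux only). The first two fields of that file
 * are total program size and resident set size, both in pages.
 * Returns -1 if the file cannot be read or parsed. */
long current_rss_from_statm(void)
{
    long size_pages, resident_pages, page_size;
    int nread;
    FILE *fp = fopen("/proc/self/statm", "r");

    if (fp == NULL)
        return -1;
    nread = fscanf(fp, "%ld %ld", &size_pages, &resident_pages);
    fclose(fp);
    if (nread != 2)
        return -1;
    page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);
    return resident_pages * page_size;
}
```

<p>Calling either function before and after a suspect phase of the model gives a coarse view of memory growth, without intercepting any allocation calls.</p>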
<p>Comments welcome. Sami, I agree the discussion may be getting too technical for a wiki--feel free to email me at <a class="email" href="mailto:james.rosinski@noaa.gov">james.rosinski@noaa.gov</a>. Looks like your email is <a class="email" href="mailto:sami.saarinen@inbox.com">sami.saarinen@inbox.com</a> if I found the right info.</p> Measurement tools initiative: RE: Developing a very lightweight measurement toolhttps://redmine.dkrz.de/boards/2/topics/6?r=25#message-252010-12-14T17:12:12ZGeorge MozdzynskiGeorge.Mozdzynski@ecmwf.int
<p>As an experiment I spent today installing GPTL on our AIX Power6 cluster<br />and tested a small T159 IFS model.</p>
<p>Not that difficult to do this and a 64 task x 1 OpenMP thread case produced<br />the expected 64 timing.&lt;n&gt; files. So GPTL works with IFS!!</p>
<p>The indented/hierarchical profile is something we don't see with DRHOOK <br />and interesting to see AVG_MPI_BYTES of MPI calls within this profile.</p>
<p>The only real problem I encountered is when I tried 2 threads, GPTL coughed<br />with many messages of the type</p>
<pre><code>21:update_parent_info: realloc error parent_count nparent=3<br /> 21:GPTLstart: find_parent error<br /> 23:GPTLstop: timer sitnu_@OL@1 was already off.<br /> 23:GPTLstop: timer sigam_@OL@1@OL@2 was already off.<br /> 28:update_stats: negative delta=-1e-06<br /> 0:GPTLstart: find_parent error</code></pre>
<p>then finally dying on,</p>
<pre><code>0: Offset 0x00000008 in procedure strcmp<br /> 0: Location 0x0000000100002cb0<br /> 0: Location 0x000000010000289c<br /> 0: Offset 0x00000044 in procedure sigam_@OL@1@OL@2<br /> 0: Location 0x090000000315ea10<br /> 0: Offset 0x00000100 in procedure sigam_@OL@1<br /> 0: Location 0x090000000315fb08<br /> 0: Offset 0x00000354 in procedure sigam_, near line 106 in file sigam.F90<br /> 0: Offset 0x000001e8 in procedure lassie_, near line 134 in file lassie.F90<br /> 0: Offset 0x00000cc4 in procedure lacdyn_, near line 410 in file lacdyn.F90<br /> 0: Offset 0x00000424 in procedure cpg_dyn_, near line 338 in file cpg_dyn.F90<br /> 0: Offset 0x00002610 in procedure cpg_@OL@1</code></pre>
<p>So my question for Jim is should I expect GPTL to work with OpenMP on AIX?<br />Am using xlf 13.1.0.4 for these runs.</p>
<p>Happy to forward any outputs or other to Jim, just trying to get a feel for GPTL, so far<br />am very impressed with the distribution and initial runs (IFS is a big code).</p>
<p>Now going to try GPTLpr_summary() ...</p> Measurement tools initiative: RE: Developing a very lightweight measurement toolhttps://redmine.dkrz.de/boards/2/topics/6?r=24#message-242010-12-14T08:44:04ZSami Saarinensami.saarinen@inbox.com
<p>Back to Jim Rosinski :</p>
<p>It would probably make a lot of sense to merge DrHook with GPTL, eh ?</p>
<p>They are both thread-safe and produce a file per MPI task to be postprocessed (if necessary) offline with Perl.<br />DrHook relies solely on the insertion of CALL DR_HOOK(0,'routine',handle) and CALL DR_HOOK(1,'routine',handle), corresponding to gptlstart/-stop, respectively. <br />(For more, see <a class="external" href="https://redmine.dkrz.de/collaboration/attachments/40/drhook_fmi.pdf">https://redmine.dkrz.de/collaboration/attachments/40/drhook_fmi.pdf</a>). <br />Also, a Perl script that automatically inserts DrHook calls into Fortran (adapted &amp; fixed from an ECMWF script) can be used transparently within makefiles, i.e. the original source code can stay untouched.<br />DrHook signal handling "technology" could be brought into GPTL with minimal hassle.<br />Memory tracing could be included in GPTL, too.</p>
<p>One suggestion for manual instrumentation in GPTL: DrHook uses an 8-byte handle which, upon entering the instrumented region, stores the address of the internal data structure holding the data behind the named label 'region'.<br />When exiting the instrumented region, the string 'region' is no longer parsed/hashed/strcmp'ed; the handle is used directly to point to DrHook's internal data structure. This removes a lot of overhead -- probably more than is gained from<br />a fast wallclock timer (a necessity in itself).</p>
<p>Finally, I wonder if this is now getting too technical for the audience? Perhaps we should take this offline -- or is it okay to continue at such a level of detail here?</p> Measurement tools initiative: RE: Developing a very lightweight measurement toolhttps://redmine.dkrz.de/boards/2/topics/6?r=23#message-232010-12-13T18:44:36ZJim Rosinskijames.rosinski@noaa.gov
<p>Regarding Sami's question about scalability of GPTL, the only concern comes when writing the output timing files. Normal behavior is to write one output file per MPI task, then post-process those files with one of the Perl scripts included with the distribution. But if the application uses 10,000 or more MPI tasks this can cause problems. There is a function GPTLpr_summary() which summarizes timings across MPI tasks and writes to only a single file if this is an issue. Also, GPTL is thread-safe and reports per-thread timing information in a single file.</p>
<p>More on the point of overhead than scalability of GPTL, the biggest issue that might arise is calling GPTLstart()/GPTLstop() in a region with too fine a granularity (e.g. called a billion times per second). This is one reason it is important to have a lightweight underlying wallclock timer. The overhead of gettimeofday() swamps the overhead of GPTL itself. On X86-based processors which provide a fast register read for wallclock times, this overhead is much less. If you've downloaded the GPTL distribution, look for NANOTIME in the documentation and in gptl.c</p> Measurement tools initiative: RE: Developing a very lightweight measurement toolhttps://redmine.dkrz.de/boards/2/topics/6?r=22#message-222010-12-13T18:27:25ZJim Rosinskijames.rosinski@noaa.gov
<p>In response to Sami's 2 concerns:</p>
<p>1) what if the compiler doesn't contain hooks to the entry/end points (as in <a class="external" href="http://www.burningserver.net/rosinski/gptl/">http://www.burningserver.net/rosinski/gptl/</a>) ?</p>
<p>In these cases one would have to insert the instrumentation manually, e.g. ret=gptlstart('region') followed later by ret=gptlstop('region'). Note that manual and auto-instrumentation can be freely mixed in the same application code without confusing the library. These days most compilers do contain automatic hooks: GNU, Intel, Pathscale, PGI, AIX for sure. Not certain about Cray.</p>
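<p>As an illustration of what such compiler hooks look like, here is a minimal sketch for the GNU case. With GCC's -finstrument-functions option the compiler inserts calls to __cyg_profile_func_enter and __cyg_profile_func_exit around every compiled function, and a profiling library supplies those two routines. The bodies below are a toy, not what GPTL does: they only print an indented trace of function addresses, and the hook_depth and hook_entries accessors are hypothetical helpers added for this example.</p>

```c
#include <assert.h>
#include <stdio.h>

/* Keep the hooks themselves from being instrumented, which would
 * otherwise recurse without end. */
#define NO_INSTR __attribute__((no_instrument_function))

static int  depth   = 0;  /* current nesting level */
static long entries = 0;  /* total number of function entries seen */

void __cyg_profile_func_enter(void *this_fn, void *call_site) NO_INSTR;
void __cyg_profile_func_exit(void *this_fn, void *call_site) NO_INSTR;

/* Called on entry to every instrumented function. */
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)call_site;
    fprintf(stderr, "%*senter %p\n", 2 * depth, "", this_fn);
    depth++;
    entries++;
}

/* Called on exit from every instrumented function. */
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)call_site;
    assert(depth > 0);
    depth--;
    fprintf(stderr, "%*sexit  %p\n", 2 * depth, "", this_fn);
}

/* Hypothetical accessors, for this example only. */
NO_INSTR int hook_depth(void)
{
    return depth;
}

NO_INSTR long hook_entries(void)
{
    return entries;
}
```

<p>Only the application sources need to be compiled with -finstrument-functions. Without that flag the two hooks are simply never called, which is the situation where manual calls are needed instead.</p>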
<p>2) DrHook has also 2nd function: in case of error, it catches signals and attempts to produce tracebacks. Could GPTL be "taught" to do the same?</p>
<p>Interesting point. GPTL does retain current "callstack" state information. It would be easy to add a user-callable function to retrieve and print this information. More difficult to do it via signal handling.</p> Measurement tools initiative: RE: Developing a very lightweight measurement toolhttps://redmine.dkrz.de/boards/2/topics/6?r=21#message-212010-12-13T13:22:31ZSami Saarinensami.saarinen@inbox.com
<p>Another entry: whilst at ECMWF we developed a memory tracing facility in DrHook,<br />which worked pretty well on the IBM AIX system. There we intercepted Fortran90 ALLOCATE &amp; DEALLOCATE.</p>
<p>However, IBM is pretty much the only system where we got this working -- if not the only one.</p>
<p>Recently came across work on dmalloc under <a class="external" href="http://dmalloc.com/docs/">http://dmalloc.com/docs/</a></p>
<p>Has anybody looked at this and/or have any experience with it?</p>
<p>Namely memory leaks, growth etc. are constant problems as well.</p> Measurement tools initiative: RE: Developing a very lightweight measurement toolhttps://redmine.dkrz.de/boards/2/topics/6?r=20#message-202010-12-13T13:18:49ZSami Saarinensami.saarinen@inbox.com
<p>Had a quick browse through the GPTL and it looks pretty solid and clearly easier to use than ("my") DrHook.</p>
<p>Two concerns: <br />1) what if the compiler doesn't contain hooks to the entry/end points (as in <a class="external" href="http://www.burningserver.net/rosinski/gptl/">http://www.burningserver.net/rosinski/gptl/</a>) ?<br />2) DrHook has also 2nd function: in case of error, it catches signals and attempts to produce tracebacks. Could GPTL be "taught" to do the same ?</p>
<p>I also gather (and hope) that GPTL will scale, which I believe DrHook may have trouble with: going beyond, say, 10,000s of cores w/o too much overhead?<br />(What is the overhead in the single core/single task case, btw?)</p> Measurement tools initiative: RE: Developing a very lightweight measurement toolhttps://redmine.dkrz.de/boards/2/topics/6?r=19#message-192010-12-09T14:55:18ZTom Hendersonthomas.b.henderson@noaa.gov
<p>I used GPTL+PAPI to measure flop rates, run times, and computational intensity for the tables in the GPU presentations I made at the ECMWF workshop: <a class="external" href="http://www.ecmwf.int/newsevents/meetings/workshops/2010/high_performance_computing_14th/presentations/Henderson.pdf">http://www.ecmwf.int/newsevents/meetings/workshops/2010/high_performance_computing_14th/presentations/Henderson.pdf</a><br />We also use GPTL to measure memory use.</p> Measurement tools initiative: RE: Developing a very lightweight measurement toolhttps://redmine.dkrz.de/boards/2/topics/6?r=18#message-182010-12-08T17:41:56ZJim Rosinskijames.rosinski@noaa.gov
<p>OK I've joined the group and added an entry for GPTL on the Wiki under "Lightweight measurement tools". The entry is a very brief text file describing what the tool is, and points to the GPTL web page (<a class="external" href="http://www.burningserver.net/rosinski/gptl">www.burningserver.net/rosinski/gptl</a>). The web page contains lots of usage examples for both simple and more advanced cases. The three main reasons I use GPTL are: 1) On x86 machines a register read for wallclock times is much more accurate and much lower overhead than standard library routines like gettimeofday(); 2) With auto-instrumentation it is easy to generate a dynamic call tree of the application being profiled; and 3) Including PAPI-based performance counter information (including derived metrics like computational intensity) is trivial.</p> Measurement tools initiative: RE: Developing a very lightweight measurement toolhttps://redmine.dkrz.de/boards/2/topics/6?r=17#message-172010-12-06T17:38:07ZGeorge MozdzynskiGeorge.Mozdzynski@ecmwf.int
<p>He is most welcome to contribute to this forum.</p> Measurement tools initiative: RE: Developing a very lightweight measurement toolhttps://redmine.dkrz.de/boards/2/topics/6?r=16#message-162010-12-06T15:25:38ZTom Hendersonthomas.b.henderson@noaa.gov
<p>Would it be OK to add my colleague, Jim Rosinski, to this forum? He's the GPTL <br />developer and would be the best person to fill in the wiki entry for GPTL.</p> Measurement tools initiative: RE: Developing a very lightweight measurement toolhttps://redmine.dkrz.de/boards/2/topics/6?r=14#message-142010-12-06T07:48:34ZLuis Kornbluehluis.kornblueh@mpimet.mpg.de
<p>It would be nice to have a review of the capabilities of all the tools in the Wiki. I just started the entry <a class="wiki-page" href="https://redmine.dkrz.de/projects/raps/wiki/Lightweight_measurement_tools">Lightweight measurement tools</a>.</p>