#4 Profiling

eCos offers the possibilty to run profiling of the application directly on the target board. It requires the CYGPKG_PROFILE_GPROF package to be added and the HAL of your target to support it. Two profiling levels are possible:

- The flat profiling level is the simplest to implement. A fix interval timer interrupt is used to "halt" the application and store the value of the program counter. The program counter tells where the CPU was executing code when the interrupt occured. This method gives a coarse idea where the CPU time is spent in the application. For this profiling level, a timer interrupt must be implemented to trigger the profiler. The ISR of the timer interrupt must make a call to the eCos profiling function:
__profile_hit( interrupted_pc );
- The call graph profiling level uses the GCC -pg compiler option. When compiling with this option, GCC will include a jump to the _mount routine at the start of each function as described here. This routine is NOT supplied in the eCos public repository. Developers are in charge of writting their own _mcount routine. The _mcount routine must make a call to the eCos profiling function:
__profile_mcount((CYG_ADDRWORD) caller_pc, (CYG_ADDRWORD) callee_pc);

Detailed eCos profiling API here.

I. Flat profiling example

Assuming the HAL has support for the profiler interrupt, add the GPROF package to your configuration:
$ ecosconfig add CYGPKG_PROFILE_GPROF

As described here, you can start the profiler from your main() function for instance by inserting the following code:

extern char _stext[], _etext[];
profile_on(_stext, _etext, 8, 3500);

Load and run your application as usual with GDB and halt it after a while when you think enough data has been gathered. Halt GDB with [ctrl + C]

Then, from the GDB consol, type:
(gdb) source /.../packages/services/profile/gprof/current/host/gprof.gdb
(gdb) gprof_dump
quit gdb

Finally, generate the GPROF output file, in this example with the PowerPC cross compiler toolchain:
powerpc-eabi-gprof <elf_file.elf> gmon.out > dump.txt

- elf_file.elf : the application being profiled.
- gmon.out : the dump output from gprof_dump.
- dump.txt : the resulting profiling graph.

The flat profile in the dump.txt would look like the following:

Each sample counts as 0.00350877 seconds.
% cumulative self self total
time   seconds   seconds   calls  s/call   s/call  name
88.25    30.35     30.35                           hal_delay_us
11.65    34.35      4.01                           hal_if_diag_write_char
0.06     34.38      0.02                           hal_mcount
0.01     34.38      0.00                           hal_variant_idle_thread_action
0.01     34.38      0.00                           cyg_fp_get
0.01     34.39      0.00                           idle_thread_main(unsigned int)
0.01     34.39      0.00                           hal_idle_thread_action

More details how to read GPROF statistics can be found here. In addition, the Linux tools project includes a GPROF plug-in for Eclipse to extract the statistics.

II. PowerPC profiling timer and mcount implementation example

When writting the profiler interrupt routine, the main function is the HAL interface to the profiler:

hal_enable_profile_timer( int resolution )

This routine must:
Start a timer with the given resolution
Attached an ISR to the timer interrupt

The second function is the timer ISR routine:

static cyg_uint32
profile_isr(CYG_ADDRWORD vector, CYG_ADDRWORD data, HAL_SavedRegisters *regs)

This is a regular ISR routine. The common interrupt handler code will pass a pointer to the HAL_SavedRegisters structure as the third argument
The ISR must restart the timer
The ISR must call the __profile_hit routine with the PC value as argument

The profiler timer interrupt is platform dependant. This is a piece of code given as an example from:

// Profiling support - uses a separate high-speed timer

#include <cyg/hal/hal_arch.h>
#include <cyg/hal/hal_intr.h>
#include <cyg/profile/profile.h>
#include <cyg/hal/drv_api.h> // CYG_ISR_HANDLED

static int profile_period = 0;

// Profiling timer ISR
static cyg_uint32
profile_isr(CYG_ADDRWORD vector, CYG_ADDRWORD data, HAL_SavedRegisters *regs)
hal_clock_reset_gta0( CYGNUM_HAL_INTERRUPT_TGA0, profile_period);


__profile_hit( regs->pc );


hal_enable_profile_timer( int resolution )
// Run periodic timer interrupt for profile
// The resolution is specified in us

// calculate # of ticks for the resolution which is given in us.
profile_period = (CYGHWR_HAL_POWERPC_CPU_SPEED * resolution) / 8;

// Attach ISR.

// Set period and enable timer interrupts
hal_clock_initialize_gta0( profile_period );

return resolution;

The second part is the implementation of the _mcount routine. GCC includes a call to this routine at the start of each function. In this example for PowerPC architecture, the _mcount routine is implemented in assembly code. The routine does not call the eCos profiing routine directly. Instead ,because of the required SPIN lock for SMP capable targets and to ease the implementation, a C routine ( hal_mcount ) is called. This routine is handling the SPIN lock and does the call to the eCos __profile_mcount routine.

Code extracted from hal/powerpc/arch/current/src/vectors.S

## Profiling call graph support

# _mcount
# Before calling eCos profiling function, save registers
# that may be modified across function call, r3 through r10.
# f1 through f8 are not saved, the profiler does not make
# use of them.
. extern hal_mcount
. global _mcount
stwu r1,-44(r1)
# Save registers
stw r3, 8(r1)
stw r4, 12(r1)
stw r5, 16(r1)
stw r6, 20(r1)
stw r7, 24(r1)
stw r8, 28(r1)
stw r9, 32(r1)
stw r10,36(r1)
# Store callee on stack and load it in r4
mflr r4
stw r4, 40(r1)
# Load the caller in r3
lwz r3, 48(r1)
# Call profiler
bl hal_mcount
# Restore registers
lwz r3, 8(r1)
lwz r4, 12(r1)
lwz r5, 16(r1)
lwz r6, 20(r1)
lwz r7, 24(r1)
lwz r8, 28(r1)
lwz r9, 32(r1)
lwz r10,40(r1)
mtctr r10
lwz r10,36(r1)
lwz r0, 48(r1)
mtlr r0
addi r1,r1,44

Code extracted from /hal/powerpc/arch/current/src/hal_misc.c

static int mcount_nested;

externC void
hal_mcount( CYG_ADDRWORD caller, CYG_ADDRWORD callee )
int istate;
if (! mcount_nested) {
mcount_nested = 1;
__profile_mcount( caller, callee );
mcount_nested = 0;

The plf_misc.c file shall be compiled without the GCC -pg option to avoid the _mcount routine to be called from hal_mcount itself hence avoiding an infinite loop.

III. Profiling SMP target

The GPROF package allows to profile multi-core architecture in different ways. By default, the flat profile is generated for the processor core that implements the profiling timer, the call graph will include profiling for all the processor cores. By selecting CYGSEM_PROFILE_PER_CORE, it is possible to record profiling data seperatly for each core. It is left to the user to implement a profiling timer for each core of the processor. To extract the data, include the source packages/services/profile/gprof/current/host/gprof_mp.gdb and use the GDB macros gdb_mp_fetch 0 to extract profiling data for the processor core 0, gdb_mp_fetch 1 for processor core 1 etc.. Resulting profiling data shall be located in gmon_core0.out, gmon_core1.out etc...

IY. GPROF plug-in for Eclipse

To use the Eclipse plug-in, load and run the application with GDB and create the gmon.out file as described in paragraph 1. Open the gmon.out file with Eclipse. Eclipse will ask you to locate the ELF file that was used to create the GPROF statistics. Here is an example of GPROF plug-in call graph.