How to Measure Operations on Internal Tables

former_member194613 · ‎11-08-2007

The efficient handling of internal tables is one of the most important performance factors in ABAP programs. Therefore it is essential to know the runtime behavior of internal table statements. This blog describes how to measure operations on internal tables and how to ensure that the measurement results are reliable.

1. Introduction

In ABAP, there are three table types: standard table, sorted table and hashed table; and two main types of accesses, read with index and read with key (which has some subtypes).

The expectations on table types and access regarding performance are the following:

The fastest accesses should be independent of the table size. This behavior should be realized by the index reads on standard table and sorted table. The hashed table allows no index read, it calculates a hashed value from the table key which allows also a direct access to the searched line.
A binary search algorithm splits in every step the search area in two parts and checks which part contains the wanted entry. It can be applied if the table has a sort order, i.e. either a sorted table or a sorted standard table. The binary search should have a logarithmic dependence on the table size. It is realized automatically by the read on a sorted table with table key.
It can also be used on standard tables by adding BINARY SEARCH at the end of the read statement. Here you must take care that the standard table is sorted in ascending order according to the used key. If the sort order is not fulfilled, then the binary search still works, but it can miss entries. Please be also aware, that a sort is an expensive operation, it should never be used inside a large loop. In principle, a table should be sorted exactly once during the execution of a program.
Both realizations of the binary search find not just any record fulfilling the search condition, but the first record according to the sort order. Therefore, they also speed up a read with key where the key is not the complete table key but only a leading part of it.
All other reads must scan the whole table sequentially, and need an average runtime per record which is directly proportional to the size of the table. These are, all reads from standard table, if no binary search is added and no index is used, all reads from sorted tables which contain no leading fields of the table key and no index, and all reads on hashed tables which do not specify the complete table key.

This blog has two goals:

First to demonstrate that the behavior is qualitatively as described above.
And to establish a reliable measurement method.

In a previous blog (Runtimes of Reads and Loops on Internal Tables) the exact measurement results for several reads from internal tables have been shown.

2. Measurement Program

One might assume that in principle the measurement of a READ from an internal table is very simple. You simply use the ABAP command ‘GET RUN TIME’ as

  GET RUN TIME FIELD start.
    READ TABLE sort1
         WITH TABLE KEY key1 = k1 key2 = k2
         INTO wa1.
  GET RUN TIME FIELD stop.

It is absolutely essential that the measurement does not contain operations other than the one you want to measure. This is really obvious but still often overlooked. Additionally, you will soon recognize that the results of such simple measurements show huge variation and do not match the expected behavior.

The following program has all the ingredients to measure internal table operations in a reliable way.

*&---------------------------------------------------------------------* *& Report  Z_ITAB_TEST *&---------------------------------------------------------------------* *  Measures Runtimes of Read from Internal Tables *     for hashed table with table key and *     for sorted table with table key * *  Several variation parameters are built in: *     N_max   : to cover different table sizes 10,20,50, ... 10.000 *     Pre-Read: to exclude first executions from measurement *     I_max   : to increase time resolution *     S_Max   : to get better statistics *     L_Max   : to read lines from different location in the table * *  Last change: Nov 2007 *----------------------------------------------------------------------* REPORT zsb_itab_test                 LINE-SIZE 220.  TYPES:   BEGIN OF st_tab,     keyfield(30)   TYPE n,     c(2)           TYPE n,     ctext(168)     TYPE c,   END OF st_tab.  TYPES:   tab      TYPE STANDARD TABLE OF st_tab  WITH        KEY keyfield c,   sort     TYPE SORTED   TABLE OF st_tab  WITH UNIQUE KEY keyfield c,   hash     TYPE HASHED   TABLE OF st_tab  WITH UNIQUE KEY keyfield c.  DATA:    n_itab       TYPE  tab,    s_itab       TYPE  tab,    c10(10)      TYPE  c,    c50(50)      TYPE  c,    c250(250)    TYPE  c,    c1000(1000)  TYPE  c,    c5000(5000)  TYPE  c,    textx        TYPE  st_tab-ctext,     j_max        TYPE  i   VALUE '20',    n_i          TYPE  i,    nn           TYPE  i,    ll           TYPE  i,    l_inc        TYPE  i,    l10          TYPE  i,    start        TYPE  i,    stop         TYPE  i,     t_i          TYPE  p DECIMALS 3,    t1_s         TYPE  p DECIMALS 3,    t2_s         TYPE  p DECIMALS 3,    t3_s         TYPE  p DECIMALS 3,    t1_s_min     TYPE  p DECIMALS 3,    t2_s_min     TYPE  p DECIMALS 3,    t3_s_min     TYPE  p DECIMALS 3,    t1_l         TYPE  p DECIMALS 3,    t2_l         TYPE  p DECIMALS 3,    t3_l         TYPE  p DECIMALS 3,    tsum1_l      TYPE  p DECIMALS 3,    tsum2_l      TYPE  p DECIMALS 3,    tsum3_l      TYPE  p DECIMALS 3,    t1_n         TYPE  p DECIMALS 3,    t2_n         TYPE  p DECIMALS 3,    t3_n         TYPE  p DECIMALS 3.  c10 = '1234567890'. CONCATENATE    c10   c10   c10   c10  c10    INTO   c50. CONCATENATE    c50   c50   c50   c50  c50    INTO  c250. CONCATENATE    c250  c250  c250  c250        INTO c1000. CONCATENATE    c1000 c1000 c1000 c1000 c1000 INTO c5000. textx = c5000.  *---------------------------------------------------------------------- PARAMETERS:   n_max    TYPE i  DEFAULT '10',   preread  TYPE c  AS CHECKBOX,   i_max    TYPE i  DEFAULT '1',   s_max    TYPE i  DEFAULT '1',   l_max    TYPE i  DEFAULT '1'.    START-OF-SELECTION.    FORMAT COLOR COL_KEY INTENSIFIED ON.   WRITE: / '  '.   WRITE AT 30 'Runtime (micro-seconds)                        '.   WRITE: / '   '.   WRITE AT 10 'N'.   WRITE AT 30 'Read_1'.   WRITE AT 50 'Read_2'.   WRITE AT 70 'Offset '.   FORMAT RESET.  *-----------------------------------------------------------------------    n_i = 0. * 4. variation: size of internal tables:---------------------------------   DO n_max TIMES.     n_i = n_i + 1.  * fill internal tables of certain size:     PERFORM  fill_itab              USING     n_i              CHANGING  nn.      CLEAR tsum1_l.     CLEAR tsum2_l.     CLEAR tsum3_l.     CLEAR ll.     l_inc = nn / ( l_max + 1 ). * 3. variation: different locations:----------------------------------     WHILE ( ll<l_max ).       ll  = ll + 1.       l10 = ( l_inc * ll ) * 10.        t1_s_min = '9999999.9'.       t2_s_min = '9999999.9'.       t3_s_min = '9999999.9'. * 2. variation: the statistical repeats:------------------------------       DO s_max TIMES.  * 1. variation: internal repeats--------------------------------------         IF ( i_max EQ '1' ).           IF ( preread IS INITIAL ).             PERFORM read_hashed_tabkey_1.             t1_s = t_i.              PERFORM read_sorted_tabkey_1.             t2_s = t_i.            ELSE.             PERFORM read_hashed_tabkey_p.             t1_s = t_i.              PERFORM read_sorted_tabkey_p.             t2_s = t_i.           ENDIF.         ELSE.           PERFORM read_hashed_tabkey.           t1_s = t_i.            PERFORM read_sorted_tabkey.           t2_s = t_i.            PERFORM empty_do.           t3_s = t_i.         ENDIF. * end of 1. variation---------------------------------------------         IF ( t1_s<t1_s_min ).           t1_s_min = t1_s.         ENDIF.         IF ( t2_l<t2_s_min ).           t2_s_min = t2_s.         ENDIF.         IF ( t3_l<t3_s_min ).           t3_s_min = t3_s.         ENDIF.       ENDDO. * end of 2. variation:-------------------------------------------       t1_l = t1_s_min.       t2_l = t2_s_min.       t3_l = t3_s_min.        tsum1_l = tsum1_l + t1_l.       tsum2_l = tsum2_l + t2_l.       tsum3_l = tsum3_l + t3_l.     ENDWHILE. * end of 3. variation: location-average:-----------------------------------     t1_n = tsum1_l / l_max.     t2_n = tsum2_l / l_max.     t3_n = tsum3_l / l_max.      FORMAT COLOR COL_NEGATIVE.     WRITE: /  nn.     WRITE AT  20   t1_n.     WRITE AT  40   t2_n.     WRITE AT  60   t3_n.     FORMAT RESET.   ENDDO.  *----------------------------------------------------------------------- *  fill_itab: fills standard table *     ote the filled table should have no special order * *----------------------------------------------------------------------- FORM fill_itab      USING     n_i        TYPE i      CHANGING  nn         TYPE i.    DATA:      itab        TYPE tab,      wa1         TYPE st_tab,      count       TYPE i.  * predefined   IF     ( n_i GE 10 ).     nn = 10000.   ELSEIF ( n_i = 9 ).     nn = 5000.   ELSEIF ( n_i = 8 ).     nn = 2000.   ELSEIF ( n_i = 7 ).     nn = 1000.   ELSEIF ( n_i = 6 ).     nn = 500.   ELSEIF ( n_i = 5 ).     nn = 200.   ELSEIF ( n_i = 4 ).     nn = 100.   ELSEIF ( n_i = 3 ).     nn = 50.   ELSEIF ( n_i = 2 ).     nn = 20.   ELSEIF ( n_i = 1 ).     nn = 10.   ENDIF.    REFRESH  itab.   REFRESH  n_itab.   REFRESH  s_itab.   CLEAR    wa1.   CLEAR    count. *------------------------------------------------ * itab is build sorted !   DO nn TIMES.    count        = count + 1.     wa1-keyfield = count * 10.     wa1-c        = count * 10.     wa1-ctext    = textx.     APPEND wa1 TO itab.   ENDDO.  * s_itab is a standard table sorted by keyfield   s_itab[] = itab[]. * sort by c ( which is last digit) gives random * and might build up an index for itab   SORT itab BY c. * n_itab should have random order and no index   n_itab[] = itab[].   REFRESH itab[].  ENDFORM.                    "fill_itab  *----------------------------------------------------------------------- *  Read sorted table with table key *----------------------------------------------------------------------- FORM  read_sorted_tabkey_1.    DATA:     sort1       TYPE  sort,     wa1         TYPE  st_tab,     k1          TYPE  st_tab-keyfield,     c1          TYPE  st_tab-c.    sort1[] = s_itab[].   k1      = l10.   c1      = l10. *------------------------------------   GET RUN TIME FIELD start.   READ TABLE sort1        WITH TABLE KEY keyfield = k1 c = c1        INTO wa1.   GET RUN TIME FIELD stop. *------------------------------------   t_i = ( stop - start ).  ENDFORM.                                        "Read_Sorted_Tabkey_1 *----------------------------------------------------------------------- *  Read sorted table with table key and preread *----------------------------------------------------------------------- FORM  read_sorted_tabkey_p.    DATA:     sort1       TYPE  sort,     wa1         TYPE  st_tab,     k1          TYPE  st_tab-keyfield,     c1          TYPE  st_tab-c.    sort1[] = s_itab[].   k1      = l10.   c1      = l10.    DO j_max TIMES.     READ TABLE sort1        WITH TABLE KEY keyfield = k1 c = c1        INTO wa1.   ENDDO. *------------------------------------   GET RUN TIME FIELD start.   READ TABLE sort1        WITH TABLE KEY keyfield = k1 c = c1        INTO wa1.   GET RUN TIME FIELD stop. *------------------------------------   t_i = ( stop - start ).  ENDFORM.                                        "Read_Sorted_Tabkey_p *----------------------------------------------------------------------- *  Read sorted table with table key *----------------------------------------------------------------------- FORM  read_sorted_tabkey.    DATA:     sort1       TYPE  sort,     wa1         TYPE  st_tab,     k1          TYPE  st_tab-keyfield,     c1          TYPE  st_tab-c.    sort1[] = s_itab[].   k1      = l10.   c1      = l10.    DO j_max TIMES.     READ TABLE sort1        WITH TABLE KEY keyfield = k1 c = c1        INTO wa1.   ENDDO. *------------------------------------   GET RUN TIME FIELD start.   DO i_max TIMES.     READ TABLE sort1          WITH TABLE KEY keyfield = k1 c = c1          INTO wa1.   ENDDO.   GET RUN TIME FIELD stop. *------------------------------------   t_i = ( stop - start ) / i_max.  ENDFORM.                                         "Read_Sorted_Tabkey *----------------------------------------------------------------------- *  Read hashed table with table key *----------------------------------------------------------------------- FORM  read_hashed_tabkey_1.    DATA:    hash1       TYPE  hash,    wa1         TYPE  st_tab,    k1          TYPE  st_tab-keyfield,    c1          TYPE  st_tab-c.    hash1[] = n_itab[].   k1      = l10.   c1      = l10.  *--------------------------------   GET RUN TIME FIELD start.     READ TABLE hash1       WITH TABLE KEY keyfield = k1  c = c1       INTO wa1.    GET RUN TIME FIELD stop. *---------------------------------   t_i = ( stop - start ) .  ENDFORM.                                      "Read_Hashed_tabkey_1  *----------------------------------------------------------------------- *  Read hashed table with table key and preread *----------------------------------------------------------------------- FORM  read_hashed_tabkey_p.    DATA:    hash1       TYPE  hash,    wa1         TYPE  st_tab,    k1          TYPE  st_tab-keyfield,    c1          TYPE  st_tab-c.    hash1[] = n_itab[].   k1      = l10.   c1      = l10.    DO j_max TIMES.     READ TABLE hash1        WITH TABLE KEY keyfield = k1  c = c1        INTO wa1.   ENDDO. *--------------------------------   GET RUN TIME FIELD start.     READ TABLE hash1       WITH TABLE KEY keyfield = k1  c = c1       INTO wa1.    GET RUN TIME FIELD stop. *---------------------------------   t_i = ( stop - start ) .  ENDFORM.                                      "Read_Hashed_tabkey_P *----------------------------------------------------------------------- *  Read hashed table with table key *----------------------------------------------------------------------- FORM  read_hashed_tabkey.    DATA:    hash1       TYPE  hash,    wa1         TYPE  st_tab,    k1          TYPE  st_tab-keyfield,    c1          TYPE  st_tab-c.    hash1[] = n_itab[].   k1      = l10.   c1      = l10.    DO j_max TIMES.     READ TABLE hash1          WITH TABLE KEY keyfield = k1  c = c1          INTO wa1.   ENDDO. *--------------------------------   GET RUN TIME FIELD start.   DO i_max TIMES.     READ TABLE hash1          WITH TABLE KEY keyfield = k1  c = c1          INTO wa1.   ENDDO.   GET RUN TIME FIELD stop. *---------------------------------   t_i = ( stop - start ) / i_max. ENDFORM.                                      "Read_Hashed_tabkey *----------------------------------------------------------------------- *  Just the empty di i_max times *----------------------------------------------------------------------- FORM  empty_do.  *--------------------------------   GET RUN TIME FIELD start.   DO i_max TIMES.   ENDDO.   GET RUN TIME FIELD stop. *---------------------------------   t_i = ( stop - start ) / i_max.  ENDFORM.                                       "Empty_Do ************************************************************************

The explanation of the parameters, their effect and their proper setting are discussed step-by-step below.

3. The Effect of the Parameters

3.a. Variation of the Size n_max

The runtime of an operation on an internal table depends on the number of lines in the table, the machine power, the table width and other factors. So, if we measure only for one fixed table of the size n, for example n = 1000, then we do not learn much. We are mainly interested in the dependence of the runtime on the size of the internal table n, as all other parameters should not change. Therefore a variation of n is included in the test program.

The effect can be seen by running the test program with n_max = 10, pre-read off, i_max = 1, s_max = 1, l_max = 1, i.e. in default setting. For simplicity, 10 values have been predefined to cover the range from 10 to 10.000. Execute the tests several times to check whether this setting leads already to reliable data or not. The results are shown as black lines in figures 1 and 2. It is obvious that there is a lot of variation between the measurements. Also, the dependence on the table size is far from what we expect. So before we draw wrong conclusions, let us check whether the measurement can be improved any further.

3.b. Pre-read Cost of Initial Reads

The strangest effect in these first measurements is the rather strong increase of the runtime with the number lines in the table: for the hashed table we expect no increase, and for the sorted table maybe a small increase. It seems that a first read needs much more time than the subsequent ones, which are therefore a better measure for our needs. For this reason a pre-read was added, i.e. 20 reads on the table were executed before the actual measurement is done.

The effect can be seen by running the test program with n_max = 10, pre-read on, i_max = 1, s_max = 1, l_max = 1. The results are shown as orange lines in figures 1 and 2. The result is much smaller than the result of the first test, but there is still a lot of variation between the measurements.

3.c. Measurement Time Resolution Repeated Execution i_max

The measurements are now in the range of a few microseconds and therefore extremely close to the time resolution of the GET RUN TIME, which is one micro-second. So measuring one execution is not reliable, the operation must be repeated several times to get runtimes in the ranges of 50 or more microseconds. This can be done by adding a DO ... ENDDO. The cost of the empty DO … ENDDO must be deduced from the measurement.

  GET RUN TIME FIELD start.
    DO i_max TIMES.
      READ TABLE sort1
           WITH TABLE KEY key1 = k1 key2 = k2
           INTO wa1.
      
    ENDDO.
  GET RUN TIME FIELD stop.

This can be done by running the test program with n_max = 10, pre-read on, i_max = 1000, s_max = 1, l_max = 1. Note, i_max was increased until the results did change no longer. The results are shown the figures 1 and 2 as green lines. The result is much smaller than the previous results therefore the detail view was added. There is still a bit of variation in the results.

3.d. Repeats for Better Statistics s_max

To reduce the variation of the results even further, it helps to the repeat the measurements several times and calculate the average. It can be assumed that the variations are caused by some uncontrollable effects, which should have only a negative impact. i.e. they can make the execution slower but not faster. Therefore we do not average over the different executions but use the fastest execution out of several measurements.

This can be done by running the test program with s_max than 1, i.e. with n_max = 10, pre-read on, i_max = 1000, s_max = 20, l_max = 1. The results are shown in the figures 1 and 2 as blue lines. The variation decreases again.

3.e. Location Dependence l_max

It is obvious that an operation like the sequential read, which scans the table from start to end, will find an entry at the beginning faster than one at the end. In this case it is also obvious that the runtime for an entry in the middle of the table is equal to the averaged runtime. However, in the case of a read facilitating a binary search, it is not clear which line would represent the averaged runtime. In general it is much better to execute the average over the runtimes of several reads accessing different parts of the table.

This can be done by running the test program with l_max larger than 1, i.e. with n_max = 10, pre-read on, i_max = 1000, s_max = 20 and l_max = 20. The program accesses l_max different lines equidistantly distributed over the whole table. The results are shown in the figures 1 and 2 as red lines. These are the best results and do resemble our expectations quite well.

4. Results

Figure 1 and Detail: Averaged runtime (in micro-sec) for the hashed read with table key for different table sizes N according to the method explained above. The different colors display different setting of the parameter values (n_max, pre-read, i_max, s_max, l_max). The blacks lines have only the table size n variation (10, off, 1, 1, 1), the orange lines add a pre-read before the measurements (10, on, 1, 1, 1), the green lines an internal i variation because of the restricted time resolution (10, on, 1000, 1, 1), the blue lines also the statistical (s) variation (10, on, 1000, 20, 1), and the red lines also a location variation (10, on, 1000, 20, 20).

Figure 2 and Detail: Averaged runtime (in micro-sec) of a sorted read with table key for different table sizes N according the method explained above. The different colors display different settings of the parameter values (n_max, pre-read, i_max, s_max, l_max). The blacks lines have only the table size n variation (10, off, 1, 1, 1), the orange lines add a pre-read before the measurements (10, on, 1, 1, 1), the green lines an internal i variation because of the restricted time resolution (10, on, 1000, 1, 1), the blue lines also the statistical (s) variation (10, on, 1000, 20, 1), and the red lines also a location variation (10, on, 1000, 20, 20).

Measuring operations on internal tables in principle seems very simple. However, to get really reliable data bit more effort must be put into the measurement. How it should be done was discussed here.