The efficient handling of internal tables is one of the most important performance factors in ABAP programs. Therefore it is essential to know the runtime behavior of internal table statements. This blog describes how to measure operations on internal tables and how to ensure that the measurement results are reliable.
In ABAP, there are three table types: standard table, sorted table and hashed table; and two main types of accesses, read with index and read with key (which has some subtypes).
The expectations on table types and access regarding performance are the following:
- The fastest accesses should be independent of the table size. This behavior should be realized by the index reads on standard table and sorted table. The hashed table allows no index read, it calculates a hashed value from the table key which allows also a direct access to the searched line.
- A binary search algorithm splits in every step the search area in two parts and checks which part contains the wanted entry. It can be applied if the table has a sort order, i.e. either a sorted table or a sorted standard table. The binary search should have a logarithmic dependence on the table size. It is realized automatically by the read on a sorted table with table key.
It can also be used on standard tables by adding BINARY SEARCH at the end of the read statement. Here you must take care that the standard table is sorted in ascending order according to the used key. If the sort order is not fulfilled, then the binary search still works, but it can miss entries. Please be also aware, that a sort is an expensive operation, it should never be used inside a large loop. In principle, a table should be sorted exactly once during the execution of a program.
Both realizations of the binary search find not just any record fulfilling the search condition, but the first record according to the sort order. Therefore, they also speed up a read with key where the key is not the complete table key but only a leading part of it.
- All other reads must scan the whole table sequentially, and need an average runtime per record which is directly proportional to the size of the table. These are, all reads from standard table, if no binary search is added and no index is used, all reads from sorted tables which contain no leading fields of the table key and no index, and all reads on hashed tables which do not specify the complete table key.
This blog has two goals:
- First to demonstrate that the behavior is qualitatively as described above.
- And to establish a reliable measurement method.
In a previous blog (Runtimes of Reads and Loops on Internal Tables) the exact measurement results for several reads from internal tables have been shown.
2. Measurement Program
One might assume that in principle the measurement of a READ from an internal table is very simple. You simply use the ABAP command ‘GET RUN TIME’ as
GET RUN TIME FIELD start.
READ TABLE sort1
WITH TABLE KEY key1 = k1 key2 = k2
GET RUN TIME FIELD stop.
It is absolutely essential that the measurement does not contain operations other than the one you want to measure. This is really obvious but still often overlooked. Additionally, you will soon recognize that the results of such simple measurements show huge variation and do not match the expected behavior.
The following program has all the ingredients to measure internal table operations in a reliable way.
The explanation of the parameters, their effect and their proper setting are discussed step-by-step below.
3. The Effect of the Parameters
3.a. Variation of the Size n_max
The runtime of an operation on an internal table depends on the number of lines in the table, the machine power, the table width and other factors. So, if we measure only for one fixed table of the size n, for example n = 1000, then we do not learn much. We are mainly interested in the dependence of the runtime on the size of the internal table n, as all other parameters should not change. Therefore a variation of n is included in the test program.
The effect can be seen by running the test program with n_max = 10, pre-read off, i_max = 1, s_max = 1, l_max = 1, i.e. in default setting. For simplicity, 10 values have been predefined to cover the range from 10 to 10.000. Execute the tests several times to check whether this setting leads already to reliable data or not. The results are shown as black lines in figures 1 and 2. It is obvious that there is a lot of variation between the measurements. Also, the dependence on the table size is far from what we expect. So before we draw wrong conclusions, let us check whether the measurement can be improved any further.
3.b. Pre-read Cost of Initial Reads
The strangest effect in these first measurements is the rather strong increase of the runtime with the number lines in the table: for the hashed table we expect no increase, and for the sorted table maybe a small increase. It seems that a first read needs much more time than the subsequent ones, which are therefore a better measure for our needs. For this reason a pre-read was added, i.e. 20 reads on the table were executed before the actual measurement is done.
The effect can be seen by running the test program with n_max = 10, pre-read on, i_max = 1, s_max = 1, l_max = 1. The results are shown as orange lines in figures 1 and 2. The result is much smaller than the result of the first test, but there is still a lot of variation between the measurements.
3.c. Measurement Time Resolution Repeated Execution i_max
The measurements are now in the range of a few microseconds and therefore extremely close to the time resolution of the GET RUN TIME, which is one micro-second. So measuring one execution is not reliable, the operation must be repeated several times to get runtimes in the ranges of 50 or more microseconds. This can be done by adding a DO ... ENDDO. The cost of the empty DO … ENDDO must be deduced from the measurement.
GET RUN TIME FIELD start.
DO i_max TIMES.
READ TABLE sort1
WITH TABLE KEY key1 = k1 key2 = k2
GET RUN TIME FIELD stop.
This can be done by running the test program with n_max = 10, pre-read on, i_max = 1000, s_max = 1, l_max = 1. Note, i_max was increased until the results did change no longer. The results are shown the figures 1 and 2 as green lines. The result is much smaller than the previous results therefore the detail view was added. There is still a bit of variation in the results.
3.d. Repeats for Better Statistics s_max
To reduce the variation of the results even further, it helps to the repeat the measurements several times and calculate the average. It can be assumed that the variations are caused by some uncontrollable effects, which should have only a negative impact. i.e. they can make the execution slower but not faster. Therefore we do not average over the different executions but use the fastest execution out of several measurements.
This can be done by running the test program with s_max than 1, i.e. with n_max = 10, pre-read on, i_max = 1000, s_max = 20, l_max = 1. The results are shown in the figures 1 and 2 as blue lines. The variation decreases again.
3.e. Location Dependence l_max
It is obvious that an operation like the sequential read, which scans the table from start to end, will find an entry at the beginning faster than one at the end. In this case it is also obvious that the runtime for an entry in the middle of the table is equal to the averaged runtime. However, in the case of a read facilitating a binary search, it is not clear which line would represent the averaged runtime. In general it is much better to execute the average over the runtimes of several reads accessing different parts of the table.
This can be done by running the test program with l_max larger than 1, i.e. with n_max = 10, pre-read on, i_max = 1000, s_max = 20 and l_max = 20. The program accesses l_max different lines equidistantly distributed over the whole table. The results are shown in the figures 1 and 2 as red lines. These are the best results and do resemble our expectations quite well.
Figure 1 and Detail: Averaged runtime (in micro-sec) for the hashed read with table key for different table sizes N according to the method explained above. The different colors display different setting of the parameter values (n_max, pre-read, i_max, s_max, l_max). The blacks lines have only the table size n variation (10, off, 1, 1, 1), the orange lines add a pre-read before the measurements (10, on, 1, 1, 1), the green lines an internal i variation because of the restricted time resolution (10, on, 1000, 1, 1), the blue lines also the statistical (s) variation (10, on, 1000, 20, 1), and the red lines also a location variation (10, on, 1000, 20, 20).
Figure 2 and Detail: Averaged runtime (in micro-sec) of a sorted read with table key for different table sizes N according the method explained above. The different colors display different settings of the parameter values (n_max, pre-read, i_max, s_max, l_max). The blacks lines have only the table size n variation (10, off, 1, 1, 1), the orange lines add a pre-read before the measurements (10, on, 1, 1, 1), the green lines an internal i variation because of the restricted time resolution (10, on, 1000, 1, 1), the blue lines also the statistical (s) variation (10, on, 1000, 20, 1), and the red lines also a location variation (10, on, 1000, 20, 20).
Measuring operations on internal tables in principle seems very simple. However, to get really reliable data bit more effort must be put into the measurement. How it should be done was discussed here.
Further Reading: Performance-Optimierung von ABAP-Programmen (in German!)
More information on performance topics can be found in my new textbook on performance (published Nov 2009). However please note, that it is right now only available in German.
- Performance Tools
- Database Know-How
- Optimal Database Programming
- ABAP - Internal Tables
- Analysis and Optimization
- Programs and Processes
- Further Topics
In the book you will find detailed descriptions of all relevant performance tools. An introduction to database processing, indexes, optimizers etc. is also given. Many database statements are discussed and different alternatives are compared. The resulting recommendations are supported by ABAP test programs which you can download from the publishers webpage (see below). The importance of the buffers in the SAP system are discussed in chaptr five. Of all ABAP statements mainly the usage of internal tables is important for a good performance. With all the presented knowledge you will able to analyse your programs and optimize them. The performance implications of further topics, such as modularisation, workprocesses, remote function calls (RFCs), locks & enqueues, update tasks and prallelization are explained in the eight chapter.
Even more information - including the test programs - can be found on the webpage of the publisher.
I would recommend you especially the examples for the different database statements. The file with the test program (K4a) and necessary overview with the input numbers (K4b) can even be used, if you do not speak German!