Performance: reading huge amount of master data in end routine

Former Member

In our 7.0 system, a full load runs each day from DSO X to DSO Y, in which master data from six characteristics of DSO X is read into about 15 fields of DSO Y. DSO X contains about 2 million records, all of which are transferred each day. The master data tables each contain between 2 and 4 million records. Before the load starts, DSO Y is emptied. DSO Y is write-optimized.

At first, we designed this with the standard "master data reads", but this resulted in load times of 4 hours, because all master data is read with single lookups. We redesigned it and now fill all master data attributes in the end routine, after filling internal tables with the master data values corresponding to the data package:


*   Read 0UCPREMISE into temp table
    SELECT ucpremise ucpremisty ucdele_ind
      FROM /BI0/PUCPREMISE
      INTO CORRESPONDING FIELDS OF TABLE lt_0ucpremise
      FOR ALL ENTRIES IN RESULT_PACKAGE
      WHERE ucpremise EQ RESULT_PACKAGE-ucpremise.

*   Sort so that the BINARY SEARCH reads below work correctly
    SORT lt_0ucpremise BY ucpremise.

And when we loop over the data package, we write something like:


    LOOP AT RESULT_PACKAGE ASSIGNING <fs_rp>.
      READ TABLE lt_0ucpremise INTO ls_0ucpremise
        WITH KEY ucpremise = <fs_rp>-ucpremise
        BINARY SEARCH.
      IF sy-subrc EQ 0.
        <fs_rp>-ucpremisty = ls_0ucpremise-ucpremisty.
        <fs_rp>-ucdele_ind = ls_0ucpremise-ucdele_ind.
      ENDIF.
*     ... all other MD reads ...
    ENDLOOP.

The above pattern is repeated for all the master data we need to read from. This method is considerably faster (1.5 hours), but we want to make it faster still. We noticed that reading the master data into the internal tables still takes a long time, and that this has to be repeated for each data package. We wanted to change this, so we have now tried a similar method in which we load all master data into the internal tables without filtering on the data package, and we do this only once:


*   Read 0UCPREMISE into temp table (all records, no filter)
    SELECT ucpremise ucpremisty ucdele_ind
      FROM /BI0/PUCPREMISE
      INTO CORRESPONDING FIELDS OF TABLE lt_0ucpremise.

*   Sort so that the BINARY SEARCH reads work correctly
    SORT lt_0ucpremise BY ucpremise.

So when the first data package starts, it fills all the master data values, about 95% of which we would need anyway. So that the following data packages can use the same tables and do not need to fill them again, we placed the definition of the internal tables in the global part of the end routine. In the global part we also write:


DATA: lv_data_loaded TYPE C LENGTH 1.
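
The buffer tables themselves are declared in the same global part; for 0UCPREMISE that could look roughly like this (a sketch only, using the delivered /BI0/P table structure as line type, with one such pair per characteristic):


* Sketch: buffer table and work area for 0UCPREMISE in the global part
* of the end routine (one such pair per characteristic that is looked up)
DATA: lt_0ucpremise TYPE STANDARD TABLE OF /bi0/pucpremise,
      ls_0ucpremise TYPE /bi0/pucpremise.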

And in the method we write:


IF lv_data_loaded IS INITIAL.
  lv_data_loaded = 'X'.   " mark that loading has started
* load all internal tables (the SELECTs shown above, one per characteristic)
  lv_data_loaded = 'Y'.   " mark that loading is finished
ENDIF.

WHILE lv_data_loaded NE 'Y'.
  CALL FUNCTION 'ENQUEUE_SLEEP'
    EXPORTING
      seconds = 1.
ENDWHILE.

LOOP AT RESULT_PACKAGE ASSIGNING <fs_rp>.
* assign all master data attributes as shown above
ENDLOOP.

This makes sure that another data package that has already started "sleeps" until the first data package is done filling the internal tables.

Well, this all seems to work: it now takes 10 minutes to load everything into DSO Y. But I'm wondering whether I'm missing anything. The system seems to handle loading all these records into internal tables just fine. Any improvements or critical remarks are very welcome.

Accepted Solutions (1)

esjewett
Active Contributor

This is a great question, and you've clearly done a good job of investigating this, but there are some additional things you should look at and perhaps a few things you have missed.

At first, we designed this with the standard "master data reads", but this resulted in load times of 4 hours, because all master data is read with single lookups.

This is not accurate. After SP14, BW does a prefetch and buffers the master data values used in the lookup. Note [1092539|https://service.sap.com/sap/support/notes/1092539] discusses this in detail. The important thing, and most likely the reason you are seeing individual master data lookups on the DB, is that you must manually maintain the MD_LOOKUP_MAX_BUFFER_SIZE parameter to be larger than the number of lines of master data (from all characteristics used in lookups) that will be read. If you are seeing one select statement per line, then something is going wrong.
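
If you want to see what is currently maintained, a quick check of the RSADMIN table will tell you. This is only a sketch; I'm assuming the standard OBJECT/VALUE columns of table RSADMIN here, and the parameter itself should be maintained as described in the note:


* Sketch: check whether the prefetch buffer parameter is maintained
* (assumes the standard OBJECT/VALUE columns of table RSADMIN)
DATA: lv_value TYPE rsadmin-value.

SELECT SINGLE value FROM rsadmin INTO lv_value
  WHERE object = 'MD_LOOKUP_MAX_BUFFER_SIZE'.
IF sy-subrc <> 0.
  WRITE: / 'MD_LOOKUP_MAX_BUFFER_SIZE not maintained - package size is used'.
ELSE.
  WRITE: / 'MD_LOOKUP_MAX_BUFFER_SIZE =', lv_value.
ENDIF.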

You might want to go back and test with master data lookups using this setting and see how fast it goes. If memory serves, the BW master data lookup uses an approach very similar to your second example (1.5 hrs), though I think it first loops through the source package and extracts the list of required master data keys, which is probably faster than your "FOR ALL ENTRIES IN RESULT_PACKAGE" statement if RESULT_PACKAGE contains many duplicate keys.
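
To illustrate, a sketch of that approach, reusing the names from your first example (the field symbol and the 0UCPREMISE fields are assumptions based on your posting), might look like this:


* Sketch: collect and deduplicate the lookup keys from the package first,
* then select the master data once (names taken from the first posting)
TYPES: BEGIN OF ty_key,
         ucpremise TYPE /bi0/pucpremise-ucpremise,
       END OF ty_key.
DATA: lt_keys       TYPE STANDARD TABLE OF ty_key,
      ls_key        TYPE ty_key,
      lt_0ucpremise TYPE STANDARD TABLE OF /bi0/pucpremise.
FIELD-SYMBOLS: <fs_rp> LIKE LINE OF RESULT_PACKAGE.

LOOP AT RESULT_PACKAGE ASSIGNING <fs_rp>.
  ls_key-ucpremise = <fs_rp>-ucpremise.
  APPEND ls_key TO lt_keys.
ENDLOOP.
SORT lt_keys BY ucpremise.
DELETE ADJACENT DUPLICATES FROM lt_keys COMPARING ucpremise.

* FOR ALL ENTRIES with an empty table would read the whole master data table
IF lt_keys IS NOT INITIAL.
  SELECT ucpremise ucpremisty ucdele_ind
    FROM /bi0/pucpremise
    INTO CORRESPONDING FIELDS OF TABLE lt_0ucpremise
    FOR ALL ENTRIES IN lt_keys
    WHERE ucpremise = lt_keys-ucpremise.
  SORT lt_0ucpremise BY ucpremise.
ENDIF.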

I'm guessing you'll get down to at least the 1.5 hrs that you saw in your second example, but it is possible that it will get down quite a bit further.

This makes sure that another data package that has already started "sleeps" until the first data package is done filling the internal tables.

This sleeping approach is not necessary, as only one data package will be running at a time in any given process. I believe that the "global" internal table is not shared between parallel processes, so if your DTP is running with three parallel processes, this table will just get filled three times. Within a process, all data packages are processed serially, so all you need to do is check whether it has already been filled. Or are you doing something additional to export the filled lookup table into a shared memory location?

Actually, you have your global data defined with the statement "DATA: lv_data_loaded TYPE C LENGTH 1.". I'm not completely sure, but I don't think that this data will persist from one data package to the next. Data defined in the global section using "DATA" is global to the package start, end, and field routines, but I believe it is discarded between packages. I think you need to use "CLASS-DATA: lv_data_loaded TYPE C LENGTH 1." to get the variables to persist between packages. Have you checked in the debugger that you are really only filling the table once per request and not once per package in your current setup? << This is incorrect - see next posting for correction.

Otherwise the third approach is fine as long as you are comfortable managing your process memory allocations and you know the maximum size that your master data tables can have. On the other hand, if your master data tables grow regularly, then you are eventually going to run out of memory and start seeing dumps.

Hopefully that helps out a little bit. This was a great question. If I'm off-base with my assumptions above and you can provide more information, I would be really interested in looking at it further.

Edited by: Ethan Jewett on Feb 13, 2011 1:47 PM

esjewett
Active Contributor

Just to correct myself so that you don't have to go through the pain I just did to verify this: the contents of all variables declared in either global section of the transformation routines appear to be available to all packages processed by a single DTP process, whether or not they are declared as "class-data". In fact, declaring a variable as class-data is only allowed in the first global section. I've crossed out the incorrect section in the previous post.

Discussion: What this means is that the DTP process instantiates the transformation class only once per process and then uses that instance to process all packages. This is different behavior from what is described in the documentation [here|http://help.sap.com/saphelp_nw70ehp2/helpdata/en/43/857adf7d452679e10000000a1553f7/frameset.htm], so I would still recommend using "class-data" in case SAP decides to change the implementation to match the documentation. But at the moment it doesn't make any difference, and your internal table declared in the global section will indeed only be filled once per DTP process.
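
For example, with the 0UCPREMISE table from the original posting, the declarations in the first global section could look roughly like this (just a sketch, with assumed names):


* Sketch: declare the buffer in the first global section with CLASS-DATA so
* it keeps working even if SAP aligns the implementation with the docs
CLASS-DATA: gt_0ucpremise  TYPE STANDARD TABLE OF /bi0/pucpremise,
            gv_data_loaded TYPE c LENGTH 1.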

Information is not shared between processes with either type of declaration.

Cheers,

Ethan

Former Member

Well, that's a really helpful answer, many thanks for that. I have now learned that SAP actually does support prefetching of master data for master data reads, but that system settings can prevent it from being applied.

I read the note you mentioned and looked up the settings. MD_LOOKUP_PREFETCH_LOGIC was set to 'X', as that is the standard setting. However, the MD_LOOKUP_MAX_BUFFER_SIZE parameter was not maintained. According to the note, this means:

If the parameter is undefined in the table RSADMIN or has a value which is SPACE, the program automatically uses the datapackage size (setting of the DTP) to limit the memory allocated to the prefetched data.

After a number of lines equal to the package size has been buffered, the process continues with direct reads. Since almost every record in the data package has its own master data record, and since master data is read from several characteristics, only a small part of the master data was prefetched. I have now increased the buffer size (to 5,000,000) and most of the records now seem to get buffered. The load now takes about one hour, which is perfectly fine.

However, it seems that at some point the direct reads kick in again. I am under the impression that the full master data tables are read into an internal table, without a filter on the characteristic values that actually occur in the data package, since one data package of 50,000 records should not correspond to more than 5,000,000 master data records.

Another noteworthy observation: at one point the first two data packages (with which the process starts executing the transformation) took about 20 minutes to process (of which 19 minutes consisted of sequential reads of master data), but all data packages after that were processed in only a few minutes. It seems the master data read for the previous packages was re-used. What I don't understand is that this happened only once. I am now executing the load for the second time, and it reloads the master data for each package.

Looking at another (simpler) transformation, I see the same pattern each time the load is executed: the first two data packages take about 3 minutes for the transformation start and rules, but the ones after that take only about 12 seconds. So the re-use of the buffer at a global level does seem to be there. I'll try setting the buffer limits a bit higher to see whether it happens again for the larger transformation as well.

I'm trying to figure out how the SAP functionality works exactly by reading through the generated program of the transformation and looking into the PREFETCH method of the class CL_RSDMD_LOOKUP_MASTER_DATA, but I haven't figured it out yet. The load takes quite some time, so it also takes time to test different settings.

Thanks a lot for your thorough answer, it has really given me new insight.

Edited by: Zephania Wilder on Feb 14, 2011 12:42 AM

esjewett
Active Contributor

I haven't dug into this as far as you have, but the behavior where all the master data seems to be loaded in the first package sounds like a bug. Theoretically, the master data reader should clear out the previous buffer table every time it buffers master data for a new package, so the buffer shouldn't really grow much from package to package. At least, that is how I read the code in the CL_RSDMD_LOOKUP_MASTER_DATA class (the PREFETCH method is the one called from the transformation program).

You could just make the buffer parameter enormous (bigger than all your master data tables combined) and see what happens. If it is still doing individual selects at some point, then this is definitely a bug and you could open a customer message.

If you can figure out exactly where the cutoff is and it is much larger than the master data required for any given package, then I think this is also a bug and SAP would probably be interested in it.

On the other hand, I think there have been some fixes to this functionality at least between SP14 and SP19. What support package are you on?
