cancel
Showing results for 
Search instead for 
Did you mean: 

SPLITTED_LOAD option in r3load

Former Member
0 Kudos

Hi All,

Has anyone tried the option SPLITTED_LOAD for importing table data parallely?

This is the documentation which i can see in the SAP note.

  • SPLITTED_LOAD ( SAP Kernel Version 7.20 and higher)
    This is an expert
    option to be able to use DB2 LOAD in parallel for a gradually exported table.
    Prerequisites:

    • The table was divided using WHERE clauses during the export and is loaded
      with multiple R3load processes.

    • The target table does not have indexes yet.

Each R3load process creates a copy of the target
table during import and uses DB2 LOAD to load this copy. Before creating the
first index on the target table, all data is copied from the table copies to the
target table using DB2 LOAD FROM CURSOR. If you use the option SPLITTED_LOAD,
you can make time when using the duplicate loading process: LOAD FROM CURSOR is
significantly faster than LOAD via the CLI client, and the loading of table
copies can happen in parallel. You can use the environment variable
DB6LOAD_TEMP_TBSPACE to specify, in which table space the table copies are
created.

A large LOAD FROM CURSOR ... SELECT statement is
used to copy the data from the table copies to the target table. The SELECT part
of this statement can be accelerated by activating SMP parallelism (SAP Note 2047006). To do so, it is
sufficient to set INTRA_PARALLEL=YES and MAX_QUERYDEGREE='ANY'. You can leave
the value for DFT_DEGREE as 1.

Accepted Solutions (0)

Answers (1)

Answers (1)

Frank-Martin
Product and Topic Expert
Product and Topic Expert
0 Kudos

Hi,

I am also very much interested in  more feedback about this option from customer tests. I got positive feedback from consulting teams but I do not have performance numbers I could share.

SPLITTED_LOAD certainly provides more room for errors if the R3load processes and tasks are not executed in the right order ( all data pieces first , primary keys and indexes last ... ) and it may require more tuning ( query degree ... ) . Also it is not clear if it will be faster compared to sequential LOAD of the data pieces on all systems for example if not enough CPU resources are available. Thats why this option is marked as an expert option.

Regards

               Frank


Former Member
0 Kudos

Hi Frank,

Guten Tag!

Thanks so much for getting back to me.

We are following the same protocol frank .First tables commented out the primary key and indexes waiting for the data load to complete.

Will this loadargs work frank?

loadArgs=-stop_on_error -loadprocedure fast SPLITTED_LOAD:COMPRESS_STATIC_ALL -nolog -c 100000

Former Member
0 Kudos

Hi Frank,

What would be the control flow?

Right now when i see it happens like this,

Loading of data into temprory tables using db2load but i dont see the data load via cursor into the main table.

When will this happen?

Frank-Martin
Product and Topic Expert
Product and Topic Expert
0 Kudos

Hi,

the loadArgs should work. As mentioned -nolog -c 100000 have no effect on LOAD.

The LOAD FROM CURSOR should happen when the primary key or first index is created by a seperate R3load process that runs after all data packages have been loaded. Although this R3load process may only create indexes it still needs to be started with "-loadprocedure fast SPLITTED_LOAD ... " . Otherwise it won't look for temporary tables to be merged.

Regards

               Frank

Former Member
0 Kudos

Hi Frank,

Yes true commit count is waste in db2load as you have mentioned, i forgot to remove it actually .

Right now i just kept ign for primary key and the indexes in the tasks till the data load is complete. Once done I will change them and retrigger the import so that the index build would happen and data merge would happen as you have told.

Once will merge into the main table take a long time? Will it be longer than the db2load into the temprorary tables?

Former Member
0 Kudos

Hallo Frank,

Guten Tag!

We do have the CPU's available, can you please throw some light on this topic which you have mentioned please.

it may require more tuning ( query degree ... )

We are ready to give it a shot.

DB6CONV is ruled for extremely large tables, we have to go for an export/import option which is much faster.

Former Member
0 Kudos

Hi Frank,

Dont see any session where the merging of data from the temprorary tables to the main table happening after the data load.

Enabled the task file and ran it with the same load args but somehow it appears to have hung. No update in the log files for a long time Frank.

Former Member
0 Kudos

Frank,

Finally it started loading of the data.

But i see the performance of merging is not that great.

What would be your advise be on the below values.




DB6LOAD_CPU_PARALLELISM=<n>


CPU_PARALLELISM parameter.


DB6LOAD_DATA_BUFFER_SIZE=<n>


BUFFER_SIZE parameter.


DB6LOAD_DISK_PARALLELISM=<n>


DISK_PARALLELISM parameter.


DB6LOAD_INDEXING_MODE =<n>


INDEXING_MODE parameter, where 0


AUTOSELECT (Default), 1 equals REBUILD, 2 equals INCREMENTAL, and 3


DEFERRED.


DB6LOAD_FORCE_LOAD




Frank-Martin
Product and Topic Expert
Product and Topic Expert
0 Kudos

Hi,

well ... tons of questions ... some of them would be better answered by migration specialists that have used SPLITTED_LOAD in migration projects. Let's try anyways.

The default for DB6LOAD_CPU_PARALLELISM is 4 in current R3load versions . This should be for the CLI LOAD into the temporary tables. If you fell that you have free CPU resources for the LOAD FROM cursor, you can try to set DB6LOAD_SPLITTED_CPU_PARALLELISM to a somewhat higher value.

I do not advice to change DB6LOAD_DISK_PARALLELISM and DB6LOAD_INDEXING_MODE for SPLITTED_LOAD. Only for sequential LOAD of data pieces into one table  DB6LOAD_INDEXING_MODE=2 is sometimes beneficial. You can set DB6LOAD_DATA_BUFFER_SIZE to a fraction of your memory available in UTIL_HEAP . Increasing UTIL_HEAP explicitly while your R3load is running may also be a good idea. For productive database UTIL_HEAP=AUTOMATIC is a good setting but for LOAD allocating some memory explicitly is a good idea. If UTIL_HEAP is large enough LOAD should be able to pick a good default for DB6LOAD_DATA_BUFFER_SIZE automatically.

Since SPLITTED_LOAD is using LOAD anyways you can ignore DB6LOAD_FORCE_LOAD.

A lot of useful hints are already contained in the note mentioned at the beginning. For example using INTRA_PARALLEL=YES is a good idea for the SPLITTED_LOAD merge phase. Also putting the temporary tables into a seperate tablespace using DB6LOAD_TEMP_TBSPACE is a good idea.

Thats almost all I know about SPLITTED_LOAD. Hope other can provide more hints...

Regards

               Frank


Former Member
0 Kudos

Frank,

Thanks so much for the information.

First what would you advise regarding CPU parallelism.

lparstat -i |grep CPU

Online Virtual CPUs                        : 40

Maximum Virtual CPUs                       : 64

Minimum Virtual CPUs                       : 1

Maximum Physical CPUs in system            : 64

Active Physical CPUs in system             : 64

Active CPUs in Pool                        : 64

Shared Physical CPUs in system             : 64

Physical CPU Percentage                    : 50.00%

Desired Virtual CPUs                       : 40

What do you think is a good figure to start with?

lparstat -i | grep 'Desired Capacity'

Desired Capacity                           : 20.00

For the other points, will respond back shortly.

Former Member
0 Kudos

Frank,

Also shall i mark the table as volatile.

"ALTER TABLE TABLNAME VOLATILE"

Frank-Martin
Product and Topic Expert
Product and Topic Expert
0 Kudos

DB6LOAD_SPLITTED_CPU_PARALLELISM = 8 may be worth a try.

Please do not overparallelize. I have seen customers using 30 parallel R3loads on a 30 core machine with DB6LOAD_CPU_PARALLELISM=automatic(30) . This resulted in very bad performance.

Regards

                Frank

Frank-Martin
Product and Topic Expert
Product and Topic Expert
0 Kudos

LOAD is not using the SQL engine. So VOLATILE will have no effect. Whenever neccessary R3load temporaily sets the VOLATILE attribute itself.

Regards

                Frank

Former Member
0 Kudos

Hi Frank,

Sure point taken, would take iterative methods first with 8  and then 16.

Have started with 30 processes and from db2top i see the below stats.

Memory                 Memory               Percent      Current         High Percent      Maximum # of

                             Type        Level      Pool                   Total         Size    WaterMark     Max         Size Pool(s)

                             ----------- ---------- -------------------- ------- ------------ ------------ ------- ------------ -------

                             Instance    DB2EP1     Monitor                0.08%        82.0M        82.3M 21866.67%       384.0K       1

                             Instance    DB2EP1     FCMBP                  0.00%       832.0K       832.0K 100.00%       832.0K       1

                             Instance    DB2EP1     Other                  0.25%       267.0M       271.6M 279.58%        95.5M       1

                             Database    EP1        Applications           0.12%       130.7M       160.1M  24.67%       530.0M     530

                             Database    EP1        Database               0.88%       946.0M       949.5M 470.39%       201.1M       1

                             Database    EP1        Lock Mgr               2.33%         2.4G         2.4G 100.00%         2.4G       1

                             Database    EP1        Utility                0.31%       337.0M       354.1M 172.58%       195.3M       1

                             Database    EP1        Package Cache          0.53%       572.0M       648.8M 108.45%       527.5M       1

                             Database    EP1        Catalog Cache          0.02%        21.3M        21.6M 213.75%        10.0M       1

                             Database    EP1        Other                  0.00%       192.0K       192.0K   0.94%        20.0M       1

                             Database    EP1        BufferPool            94.88%       100.1G       100.1G 100.00%       100.1G       5

                             Database    EP1        SharedSort             0.00%         2.2M         2.0G   0.05%         4.4G       1

                             Database    EP1        ApplShrHeap            0.04%        45.6M       151.1M  58.40%        78.1M       1

                             Application EP1        Applications           0.12%       130.7M       160.1M  24.67%       530.0M     530

                             Application EP1        Other                  0.44%       475.4M       549.3M   0.00%       103.5T     530

from topas and vmstat i dont see waits and cpu percentage being high.

Former Member
0 Kudos

Frank,

What is more important for an import.

CPU , memory.

Becuase dont see the util heap dont being occupied as well.

Three step process right:

--> Load data into temp tables

--> Merge data into the main table

--> Building indexes.

Former Member
0 Kudos

Hi Frank,

Thanks a ton for the information. That's good to know.

During the testing when checked we were able to cut short time little bit with CPU_PARALLELISM 8.

Probably increasing to 16 will enhance the performance.

One question by increasing the CPU_PARALLELISM how does it affect the load time.

Wanted to know control flow.

Frank-Martin
Product and Topic Expert
Product and Topic Expert
0 Kudos

CPU_PARALLELISM controls how many theads are working in parallel on one LOAD operations. If you have a hardware with N cores and you are running M R3load processes in parallel, I would not set CPU_PARALLELISM much higher than N/M . The R3load internal default for CPU_PARALLELISM is 4. If you would let DB2 choose an automatic default for CPU_PARALLELISM, it would take N.

Regards

               Frank

Former Member
0 Kudos

Frank,

Guten Tag!

Thanks a ton for the response.

Few questions we have basically three step process right.

--> Create temp tables

--> Load data into temp tables

--> Merge data into main table

--> Build indexes and runstats right

For the load data we can increase CPU_PARALLELISM.

Merge data happens using cursor right, how can we improve its performance?

How can we improve the performance of building indexes?

Sorry for pestering with so many questions.