
Performance of Modify or Insert of 4.500.000 records to DB

RadimB
Advisor

Hello all,

I have a performance problem with the insertion of 4,500,000 records into the database table z_tab. The statement is as follows:

MODIFY z_tab FROM TABLE <lt_tgt>.

The table z_tab is empty, but this statement dumps with a TIME_OUT error.

If I do it with following statement:

INSERT z_tab FROM TABLE <lt_tgt>.

it finishes in 156 seconds. I would expect MODIFY to be slower than INSERT, but not this dramatically. Is it really the MODIFY statement that is slow, or could it be a problem with the DB (settings)?

Kind regards

Radim

1 ACCEPTED SOLUTION

former_member192616
Active Contributor

Hi,

MODIFY ... FROM TABLE, at least on most DB platforms, can't do bulk processing. That's one reason why it is slower. On the platforms where no bulk processing is possible, MODIFY executes the following commands for each line:

update...
if not successful
  insert
  if not successful
    update

For an empty table you will see 4.5 million updates AND inserts (assuming there are no duplicates). That is 9 million DB statements... compared with a bulk insert, the MODIFY has to be slower.
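In ABAP terms, that per-line logic is roughly equivalent to the following (a sketch for illustration only, not the actual kernel implementation; <lt_tgt> is the source table from the question):

FIELD-SYMBOLS <ls_row> TYPE z_tab.

LOOP AT <lt_tgt> ASSIGNING <ls_row>.
  UPDATE z_tab FROM <ls_row>.       " try to change an existing row first
  IF sy-subrc <> 0.                 " row does not exist yet
    INSERT z_tab FROM <ls_row>.     " so insert it
    IF sy-subrc <> 0.               " inserted in the meantime?
      UPDATE z_tab FROM <ls_row>.   " then update after all
    ENDIF.
  ENDIF.
ENDLOOP.

On an empty table the UPDATE fails and the INSERT succeeds for every single row, which is where the 9 million statements come from.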

Kind regards,

Hermann

8 REPLIES


Former Member

Well, since MODIFY is a combined statement, it uses WAY more time.

Why? If you write MODIFY, the DB first has to determine whether the record you are talking about exists or not (step 1). If it exists, it does an UPDATE (step 2). If it doesn't exist, it does an INSERT (step 2).

When you do the INSERT yourself, step 1 does not have to be done.

naveen_inuganti2
Active Contributor

Hi,

But if I want to use the INSERT statement, the records must not already exist in the DB table... and that requires an additional check. Then the program is effectively taking over the MODIFY statement's role itself, which again leads to a performance issue.

I am facing exactly the same problem... (MODIFY) vs (DELETE and INSERT).

My MODIFY statement is taking 60% of the overall runtime. How can I resolve this issue?

Regards,

Naveen


Often you can rearrange your program logic so that the rows to be inserted, updated or deleted are collected in separate internal tables, so that you can use the respective SQL statements, as sketched below.
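For example, a minimal version of that split (lt_src, lt_existing and the key field key1 are made-up names; lt_existing would hold the keys that already exist on the DB, selected once beforehand):

DATA: lt_ins TYPE STANDARD TABLE OF z_tab,
      lt_upd TYPE STANDARD TABLE OF z_tab.
FIELD-SYMBOLS <ls_src> TYPE z_tab.

LOOP AT lt_src ASSIGNING <ls_src>.
  " lt_existing: hashed table with the keys already present on the DB
  READ TABLE lt_existing TRANSPORTING NO FIELDS
       WITH TABLE KEY key1 = <ls_src>-key1.
  IF sy-subrc = 0.
    APPEND <ls_src> TO lt_upd.
  ELSE.
    APPEND <ls_src> TO lt_ins.
  ENDIF.
ENDLOOP.

INSERT z_tab FROM TABLE lt_ins.   " one bulk insert for the new rows
UPDATE z_tab FROM TABLE lt_upd.   " one bulk update for the existing rows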

If you are not sure whether some rows already exist, and you only want to insert but not update, try the INSERT ... ACCEPTING DUPLICATE KEYS option.

And in case you really need the MODIFY logic, then just using this statement might be the better option over coding the logic yourself. Compare runtimes if in doubt (percentages alone don't mean much).

Thomas


Thank you, Thomas.

The 60% runtime is out of an overall 1 hr 30 mins.

The original program deletes records and inserts new ones; the new program uses MODIFY to get rid of the DELETE and INSERT.

Yes, most of the rows will already exist in the table... except for one period, i.e. the first period. (This FI program updates period-wise amounts in one custom DB table; hence the first-quarter run performs INSERT only.)

I think proceeding with the MODIFY statement makes sense, as the program updates existing data almost every time (it runs every day).

I still want to check the DELETE and INSERT option after the latest results I received this morning...

Case 1:

The new program with MODIFY runs for 20 mins and modifies 6 million records. (Here the target table is either empty or holds 6 million records.)

Case 2:

The new program with MODIFY runs for 180 mins and modifies 6 million records. (Here the target table holds 200 million records.)

The old program with DELETE and INSERT runs for 4 hrs and modifies 6 million records. (Here the target table also holds 200 million records.)

It seems MODIFY becomes expensive when it has to search for the records in a big table... but I am not sure yet (still checking).

Regards,

Naveen


That's a surprisingly big difference between Case 1 and Case 2.

You could also try an UPDATE followed by an INSERT ... ACCEPTING DUPLICATE KEYS (both will end with SY-SUBRC = 4, but they will update or insert, respectively, all rows where it is possible) and compare this to the MODIFY and DELETE / INSERT alternatives.
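A minimal sketch of that combination, assuming a source table lt_src:

UPDATE z_tab FROM TABLE lt_src.   " changes all rows that already exist
" SY-SUBRC = 4 just means some rows were new - expected, not an error

INSERT z_tab FROM TABLE lt_src ACCEPTING DUPLICATE KEYS.
" inserts the remaining rows; existing keys are skipped with SY-SUBRC = 4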

Also think about block processing of, say, 10K rows each, since 6 million rows in one go might fill up the redo/undo buffers (which track all the DB changes in case a rollback occurs later on) and thus cause longer runtimes or even memory problems.
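Such block processing could look roughly like this (a sketch; lt_src and the block size are assumptions, and committing per block requires the program to be restartable):

CONSTANTS lc_block TYPE i VALUE 10000.
DATA: lt_block TYPE STANDARD TABLE OF z_tab,
      lv_from  TYPE i VALUE 1,
      lv_to    TYPE i.

DO.
  " take the next slice of 10,000 rows from the full source table
  lv_to = lv_from + lc_block - 1.
  APPEND LINES OF lt_src FROM lv_from TO lv_to TO lt_block.
  IF lt_block IS INITIAL.
    EXIT.                       " all rows processed
  ENDIF.
  MODIFY z_tab FROM TABLE lt_block.
  COMMIT WORK.                  " keeps the undo/redo volume per unit small
  CLEAR lt_block.
  lv_from = lv_from + lc_block.
ENDDO.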

Interesting research; please keep posting your results here (although we've hijacked the discussion, I guess...)

Thomas


Sorry for the delay in my update.

Yes, I've compared the 'Modify', 'Insert+Update' and 'Delete+Insert' options in our Quality system by controlling my code blocks with selection-screen inputs. I've executed around 50 background jobs in one week to compare the results for these statements. 'Modify' and 'Insert+Update' are close; however, 'Modify' takes less time in all cases.

As you mentioned, I have applied another change: modifying the DB table with smaller sets of data. This has a significant positive impact on the performance of the MODIFY statement. I had to change the program flow and architecture to incorporate this (because, prior to this change, we were using a COLLECT statement to populate the source table of the MODIFY, where every new record looks at all previous records).

All in all, if I compare this with the old program, I now have more than 60% improvement in performance.

I am checking whether I can do anything about the statements at the top of the trace list. I think 40% for the MODIFY statement is acceptable after my research (the run below updates around 20 million records, which is the real-life volume for this application).

As we can see, 34% of the runtime goes to one SELECT query on a custom table. Take a look at the high-level flow of my program below to understand that SELECT query.

1. Select data from ZABC
2. Select data from the Variant Table (Var1, Var2, Var3 etc., 12 in real life)
3. Loop at the Variant Table
4. Select data from tables X, Y and Z for Var<n>
5. Populate the final internal table from ZABC, X, Y and Z
6. Modify ZTABLE with the final internal table data
7. End loop at the Variant Table

As described in the program flow, the ZABC data is common to all the variants and need not be fetched multiple times; hence I am selecting it only once in my program. This is that SELECT query:


select rrcty ryear rbukrs racct rcntr
       sum( kslvt ) as kslvt sum( ksl01 ) as ksl01 sum( ksl02 ) as ksl02
       sum( ksl03 ) as ksl03 sum( ksl04 ) as ksl04 sum( ksl05 ) as ksl05
       sum( ksl06 ) as ksl06 sum( ksl07 ) as ksl07 sum( ksl08 ) as ksl08
       sum( ksl09 ) as ksl09 sum( ksl10 ) as ksl10 sum( ksl11 ) as ksl11
       sum( ksl12 ) as ksl12 sum( ksl13 ) as ksl13 sum( ksl14 ) as ksl14
       sum( ksl15 ) as ksl15 sum( ksl16 ) as ksl16
       sum( mslvt ) as mslvt sum( msl01 ) as msl01 sum( msl02 ) as msl02
       sum( msl03 ) as msl03 sum( msl04 ) as msl04 sum( msl05 ) as msl05
       sum( msl06 ) as msl06 sum( msl07 ) as msl07 sum( msl08 ) as msl08
       sum( msl09 ) as msl09 sum( msl10 ) as msl10 sum( msl11 ) as msl11
       sum( msl12 ) as msl12 sum( msl13 ) as msl13 sum( msl14 ) as msl14
       sum( msl15 ) as msl15 sum( msl16 ) as msl16
  from zabc
  into table i_zabc
  where ryear in r_year      " two records in the ranges table
    and rvers = '001'
    and rrcty in r_rrcty     " three records in the ranges table
    and rldnr = 'DT'
  group by rrcty ryear rbukrs racct rcntr
  order by rrcty ryear rbukrs racct rcntr.

The ZABC table again holds a huge volume of data, and we are fetching millions of records with the above query; that is the primary reason it takes so long. Maybe that is okay, since there is nothing to be done about the indexes of this table. But I am not comfortable with it, because the result set can hit the memory limit and throw a runtime error. A FETCH cursor is one reliable option I can see here, but with that I would have to move the ZABC selection inside the variant loop, which could mean fetching the ZABC data 12 times (let me know if I am missing anything here).
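For illustration, a variant of the same query with PACKAGE SIZE would read the result in blocks and cap memory use while keeping the selection outside the variant loop (a sketch; the block size and i_zabc_block are assumptions, and the full aggregate list is elided for brevity):

select rrcty ryear rbukrs racct rcntr
       " ... same sum( ... ) aggregate list as in the query above ...
       sum( msl16 ) as msl16
  from zabc
  into table i_zabc_block
  package size 100000
  where ryear in r_year
    and rvers = '001'
    and rrcty in r_rrcty
    and rldnr = 'DT'
  group by rrcty ryear rbukrs racct rcntr
  order by rrcty ryear rbukrs racct rcntr.

  " process or aggregate this block before the next one replaces it,
  " e.g. append lines of i_zabc_block to i_zabc, or write to the target table

endselect.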

Now, the third statement in my trace results, with 10% of the overall time, is this:


loop at i_abc assigning <fs_abc>.
  ...
  loop at i_table assigning <fs_table>
       where low  <= <fs_abc>-racct
         and high >= <fs_abc>-racct.
    ...
  endloop.
  ...
endloop.

Six million executions with this complex WHERE condition put this statement in 3rd position in the trace results. I tried the two options below, which take even more time:

1) Removed the WHERE condition on LOW and HIGH and applied the filter inside the loop.

2) Removed the WHERE condition on HIGH only and applied the filter inside the loop.

Any suggestions on how to proceed with the ZABC selection and the I_TABLE loop?
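For illustration, a binary-search variant of that loop (a sketch only; it assumes the LOW/HIGH intervals in I_TABLE do not overlap, so at most one interval can contain a given RACCT):

data lv_idx type sy-tabix.

sort i_table by low.

loop at i_abc assigning <fs_abc>.
  read table i_table assigning <fs_table>
       with key low = <fs_abc>-racct binary search.
  if sy-subrc <> 0.
    " no exact hit on LOW: the only candidate is the previous row,
    " i.e. the last row with LOW < RACCT
    lv_idx = sy-tabix - 1.
    if lv_idx < 1.
      continue.                " RACCT is below the smallest LOW
    endif.
    read table i_table index lv_idx assigning <fs_table>.
  endif.
  if <fs_table>-high >= <fs_abc>-racct.
    " ... original inner-loop processing for the matching range
  endif.
endloop.

This replaces the scan of I_TABLE with a single index access per outer row.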

Let me know if you have any questions on the above.

Regards,

Naveen Inuganti


Hi Thomas - I've opened a new discussion for the above two queries, as the subject line of this post is not relevant to my issue.

Thank you for all your help!!

Regards,

Naveen