Driven to extraction: performance pieces

eddy_declercq · ‎12-14-2007

As you may already be aware, I’m currently into Business Intelligence. The fun part for me is attempting to make extractors in as compact and a performant way as possible. I had this extractor the other day where I needed to select all the points for all the examination periods of a / all student(s) in an academic year.
Once you get to know which tables you need to use, it is – in my opinion - rather simple:

Simple doesn’t mean easy though, nor is it a reason to act hastily. There is a major snag somewhere when it comes to performance. When I first created the extractor, it’ was still running after 14 hours. And that was for a mere 298.194 records. What was the problem? Not the initial query to join HRP1001 with HRPADP506. It was the other query:

     select gen~gradescale gen~gradesym zittd~perid
                   into (gradescale,gradesym, perid)
                   from PIQDBAGR_ASSIGNM as assignm
                   INNER JOIN ZPIQDBAGR_ZITTD AS zittd
                   ON assignm~agrid = zittd~upagrid
                   INNER JOIN PIQDBAGR_GEN AS gen
                   on zittd~agrid = gen~agrid
                   where assignm~modreg_id = wa-MODREG_ID
                       and zittd~agrtype = '0105'.
               ... 
     endselect.

As such it’s no no rocket science, if you’re not afraid of creating a join on more than 2 tables. But for some reason, this query was so viscous that the engine didn’t run well anymore. All the rules were followed: keys and properly defined indexes were used where needed. If we look at the query with ST05 and explain:

You’ll see that the cost is very high due to the fact that it seems to perform a table scan for PIQDBAGR_GEN, despite the defined keys and indexes.

Even ORACLE index didn’t help. It wasn’t such a burden if a couple of students were involved, but we need to execute this query for all the courses and all the students. The fun thing is that one can force it to use indexes in a strange way:

   select  zittd~perid
       into (gradescale,gradesym, perid)
       from PIQDBAGR_ASSIGNM as assignm
       INNER JOIN ZPIQDBAGR_ZITTD AS zittd
       ON assignm~agrid = zittd~upagrid
       INNER JOIN PIQDBAGR_GEN AS gen
       on zittd~agrid = gen~agrid
       where assignm~modreg_id = wa-MODREG_ID
           and zittd~agrtype = '0105'.
   ... 
   endselect.

Which gives this as a result.

It’s still a high cost, but at least it doesn’t do a full table scan anymore.

Don’t use this at work
                      So what is the solution? Well after reading a lot of forum post, web logs and external sources, I didn’t find any solution for it. I did everything that should be done. A colleague of mine suggested to do it the ‘simple’ way: nested queries. The thought only makes me shudder. First of all, I find writing nested queries the same as writing spaghetti code - like in the old days – with a lot of gotos. No, I’m not a snob being a bit sniffy for everything which is not OO –I don’t write much OO anyway. Furthermore, I read everywhere that nested queries are only good for a small amount of data and are performance killers number 1.
            Having said that, having no alternatives I gave it a try and made nested joins (with some trickery to make it more performant):

select single agrid  into agrid from PIQDBAGR_ASSIGNM where modreg_id = wa-MODREG_ID.
    if sy-subrc eq 0.
                   select  perid agrid into (perid,agrid) from ZPIQDBAGR_ZITTD 
                                  where  upagrid = agrid and agrtype = '0105'.
                                  select  single gradescale gradesym 
                                                  into  (gradescale,gradesym) 
                                                  from  PIQDBAGR_GEN where agrid = agrid.
                                  if  sy-subrc eq 0.
                                  ....
                                  endif.
                   endselect.
     endif.

And guess what? It ran very smoothly. The extractor only needed 26 minutes this time. Compared to the unfinished job after 14 hours, it’s a huge improvement.

Conclusion
Is there one, I don’t know? Theory doesn’t match practice? Bad interaction between SAP and Oracle? Bad Oracle drivers? I really don’t know. The only moral of this story is to try everything when the tipped solution doesn’t work out.

Driven to extraction: performance pieces

Are you there, SAP? It's me, Jelena

Integration Point of MM-FI-SD in SAP ERP

SAP Project System - A ready Reference ( Part 1 )