Currently Being Moderated
Lars Breddemann

Small changes with big effects

Posted by Lars Breddemann in SAP MaxDB on Jan 30, 2011 6:52:50 AM

For many BW performance relevant DB-features, it's really all about the details of usage and implementation when it's about the effect of those features.

The following are two examples of rather small things that went wrong which had a big impact on the system performance.

Two examples for BW on MaxDB

The first two examples are both good examples for the inherent assumptions that developers make during development.
Number 1 goes like this:

A customer has chosen MaxDB as the database platform for its SAP BW instance.
In addition to that the customer decided to go for a BIA, which is quite a clever choice, if you ask me.
Instead of having a super-expensive and maintenance intensive BW main database that maybe still would require the setup of a BIA, this customer now runs a low-cost low-maintenance main database and the perfomance intensive reporting out of the expensive but also low-maintenance BIA.

Unfortunately, it looks like nobody anticipated this combination to become popular.
Otherwise I assume report RSDDTREX_MEMORY_ESTIMATE would have been tested with MaxDB as well.

This report is used to get an estimation of the required memory for the BIA usage.
It's not too complicated and merely consists of taking the number of rows in an InfoCube and multiply this with the InfoObjects data lengths and some "magic" constants.
So far nothing special.

What's "special" is that this report still makes use of the nowadays abandoned fact-views from BW3.x-times.
Fact-views make it possible to access the data in both E- and F-fact table at once, by concatenating the sets with a UNION ALL.
That means, fact-views basically look like this:

CREATE VIEW "/BIC/V..." AS
(
SELECT col1, col2, ...
  FROM "/BIC/F...."
UNION ALL
  SELECT col1, col2, ...
  FROM "/BIC/F...."
)

From the ABAP side this eases the access since you now just have to run one query to get access to all data in an InfoCube.
Our report does the same and runs this statement:

SELECT
count(*)
FROM
  "/BIC/VMYCUBE"

The readers with some MaxDB experience might think now:
"That's great! MaxDB has it's filecounter statistics and a special COUNT(*) optimization that avoids table/index access for counting!"
And those readers are correct!

Unfortunately the COUNT(*) optimization has a severe limitation: it only works for simple statements.
That means:

  • no WHERE condition (!),
  • no JOINs (!),
  • no UNIONS/SET OPERATIONS (!),
  • no GROUP BY/ORDER BY (!)

In reality it means: NO NOTHING, just the COUNT(*).

The fact-view used here therefore couldn't take advantage of this optimization and had to do the counting via the brute-force-traditional approach: read the whole first table, read the second whole table, combine the results and count the number of rows.

The execution plan for such an IO/CPU burning process looks like this:

OWNER    TABLENAME         STRATEGY                           PAGECOUNT
SAPXXX   /BIC/FMYCUBE      TABLE SCAN                                  1
SAPXXX   /BIC/EMYCUBE      TABLE SCAN                            1194819
INTERNAL TEMPORARY RESULT  TABLE SCAN                                  1
         SHOW                RESULT IS COPIED, COSTVALUE IS     10653812
         SHOW              QUERYREWRITE - APPLIED RULES:
         SHOW                 DistinctPullUp                           1

The runtime of this little monster was 3 days and running ... until the database couldn't keep the huge temporary result set of approx. 10 Mio. pages (ca. 76 GB) anymore. The report finally dumped with the infamous

"POS(1) Space for result tables exhausted"

Ouch!

Fortunately the report already was prepared to handle the request without a fact view, but it wasn't enabled for MaxDB yet.
This was quickly done after a short discussion with the responsible IMS colleague and correction note
#1533676 - Long runtime of program RSDDTREX_MEMORY_ESTIMATE
was created.

The execution plans afterwards looked like this:

OWNER    TABLENAME         STRATEGY                           PAGECOUNT
SAPXXX   /BIC/FMYCUBE      TABLE SCAN                                  1
                           COUNT OPTIMIZATION
         SHOW                RESULT IS COPIED, COSTVALUE IS            2
         SHOW              QUERYREWRITE - APPLIED RULES:
         SHOW                 DistinctPullUp                           1

and

OWNER    TABLENAME         STRATEGY                           PAGECOUNT
SAPXXX   /BIC/EMYCUBE      TABLE SCAN                            1194819
                           COUNT OPTIMIZATION
         SHOW                RESULT IS COPIED, COSTVALUE IS            2
         SHOW              QUERYREWRITE - APPLIED RULES:
         SHOW                 DistinctPullUp                           1

And the total runtime of the report went down to a few hours (there is other stuff in there that just takes some time).

(Remark: important to understand for MaxDB execution plans is that only the COSTVALUE represents a optimizer estimation. All other PAGECOUNT values refer to the TOTAL number of pages the table or index of this specific line allocates in the database!)

If you look at the sap note with the correction, you'll find that it was a very small change that made the difference:

From this:

     IF sy-dbsys <> 'DB400'.
       APPEND g_v_tablnm_v TO g_t_tablnm.
     ELSE.
       APPEND g_v_tablnm_e TO g_t_tablnm.
       APPEND g_v_tablnm_f TO g_t_tablnm.


to this:

IF sy-dbsys = 'DB400' OR sy-dbsys = 'ADABAS D'.
       APPEND g_v_tablnm_e TO g_t_tablnm.
       APPEND g_v_tablnm_f TO g_t_tablnm.
     ELSE.
       APPEND g_v_tablnm_v TO g_t_tablnm.

Knock, knock, any data in there?

The second example is not only 'special' on the MaxDB port, but on all databases.
However, for MaxDB the effect was the worst, due to certain limitations of SQL optimization.

SAP BW is a data warehouse and therfore a lot of the functionality is there to handle data, to store and move data and to get rid of data.
These tasks bring with them the necessity to sometimes drop a table and rebuild it, e.g. when you change an InfoObject-definition.

But before merely dropping tables,BW is cautios and asks"Hey, any data in this table?".
And indeed, there is a function module called RSDU_DATA_EXISTS_TABLE that answers this question.

Now, before proceeding, ask yourself: how would YOU try to answer this question in SQL?
A common first approach would be: count the number of rows in the table and if it's larger then 0 then there is some data in the table.
Correct!
But given the fact that counting the actual number of rows in a table really can takes ages (see the example above), this is the second worst idea to approach the issue (and I admit that it was also the first I thought up).

The worst idea I've seen so far, is what was actually implementd in the function module:

SELECT bname FROM usr01 CLIENT SPECIFIED UP TO 1 ROWS INTO :test
     WHERE EXISTS ( SELECT * FROM (i_tablnm) CLIENT SPECIFIED ).
  ENDSELECT.

Let's see if we can figure out, what this statement should do.
In english it means:

  • Give me the column BNAME
  • from the table USR01 for at most one row
  • for which the set of all rows in table I_TABLNM (this is the one we want to know of whether it's empty or not) contains something.

This is just amazing!

As you can imagine, MaxDB will first create a temporary result set for the exists clause (that is full table copy) and then returns just one row.
If the I_TABLNM table is not empty, this can easily become a similar problem as the example above.

Now, of course there is a much better way to do this.
If you think about it, all we want is a YES (there's data in there) or a NO (nope, all empty) and this can be done as well as SAP note #1542839 - "Performance Optimization in RSDU_DATA_EXISTS_TABLE_ADA" nicely demonstrates:

SELECT 'X' FROM (i_tablnm) WHERE ROWNUM <= 1

This means: "Database, go and get me an X for the first row that you hit in the table and stop afterwards!"
Regardless how you process this statement, in the worst case it will end after a few (1-4) page visits.
The database may even use a index-only strategy, since NO data from the table needs to be fetched (just a constant).

There are of course similar examples for other DBMS as well, but for the sake of a digestable blog post size, I'll save them for later posts. 

Comments

Actions

Filter Blog

By author: By date:
By tag: