'Mind the gap...' (#4 - reading query execution pl...

lbreddemann · ‎05-11-2009

In my mini-series "Mind the gap..." I will try to shed some light on where the little or big differences between MaxDB and Oracle databases are and what to keep in mind when working with them.

Today I'd like to write a bit on the pitfalls you can tap into, when trying to apply your query tuning knowledge from Oracle to your MaxDB instance.

Basically both databases do employ a so-called cost-based optimizer to figure out what access strategy will deliver the results as fast as possible.

For those of you that don't know what 'optimizer' or 'execution plan' mean, I recommed to grab the documentation of either one of the DBMS and read about it there. Here I will focus on the differences between the both systems.

The test schema

To make it easier to compare the output of the explain-statements for both DBMS I use the same test data on both databases.

Actually it is just the demo schema 'HOTEL' that comes with every MaxDB.

Here's how it looks like from a modelling point of view:

As you see it's a pretty straigth forward schema with no specialities.

To make things a bit more equals for both databases, I created unique indexes on Oracle wherever there is a primary key on MaxDB.

Also, still to make it simpler, I will just focus on the CUSTOMER and CITY tables as this should be enough to make the points.

1. Query, single table - get used to the explain commands

select * from city

This first query is really just for getting in touch with the command available to display information about the execution plan.

While for Oracle there is one command 'EXPLAIN PLAN FOR' to generate the execution plan and another one 'SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);' to display it, MaxDB directly shows the execution plan as the result of 'EXPLAIN' or 'EXPLAIN JOIN'.

And, yes, there are also other options to generate execution plans for both databases, but the ones mationed above are the most commonly used ones.

One word about the output of the Oracle EXPLAIN PLAN: when used from within SAP CCMS (ST04/ST05) you'll see SAPs own reprentation of the execution plan.

It contains less information but is more graphically oriented - anyhow, since newer releases of the SAP_BASIS also provide the Oracle output formatting I'll stick with this for this blog.

So why does MaxDB needs two commands if it displays the execution plan automatically? Because the information of both outputs differ.

In fact, MaxDBs explain facilities needed to be extended and fit to some old coding - so for now we need to live with two EXPLAINs to see all information (remark: a completely new implementaion of EXPLAIN is on it's way but likely won't be available before MaxDB 7.8).

We'll see what the both commands do later on, so let's get started with the simple query above.

Oracle

SQL> explain plan for
2  select * from city;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
Plan hash value: 3973142328

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1765 | 31770 |     5  (20)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| CITY |  1765 | 31770 |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------

8 rows selected.

MaxDB

SCHEMANAME  TABLENAME  COLUMN_OR_INDEX  STRATEGY                                PAGECOUNT
LARS        CITY                        TABLE SCAN                                      11
                                           RESULT IS NOT COPIED , COSTVALUE IS          11
                                        QUERYREWRITE - APPLIED RULES:           
                                           DistinctPullUp                                1

So, what do we see here?

Oracle

Oracle shows us the access strategy (a full table scan), the accessed table (CITY) and for each line we see the expected number of rows, the bytes to be processed, a cost value and a estimation for the execution time.

An important thing to know here is that the costs basically add up with each level of execution. We'll see below what 'level of execution means here'. Anyhow, this level is shows by how far the entry in 'OPERATION' is indeded to the right.

Note that we don't see how many blocks Oracle expects to touch - this information in encoded to the COST for each line. Finally the 'TIME' column is derived from the COSTs and some system wide performance statistics.

MaxDB

MaxDB in turn shows us the used table (CITY in schema LARS) and the chosen operation on the table (TABLE SCAN here as well). Instead of COSTS or TIME we see a PAGES column.

ATTENTION: this DOES NOT contain the estimated number of pages to be accessed at this line, but it shows the SIZE of the table/index accessed by the operation in the current line. Therefore the PAGES of a multi-step execution plan DON'T add up like they do in Oracle.
Instead the MaxDB explain provides a seperate line 'COSTVALUE IS'. This and only this line gives us information about the estimated effort of the query.

The lines below 'Queryrewrite...' are just additional information - the numbers shown in the PAGES column here simply tell how often a specific rewite rule had been applied to the query. So these aren't costs either.

One word to the COSTVALUE line: this comes in two fashions:

1) 'RESULT IS NOT COPIED, COSTVALUE IS' - this means the result rows are delivered to the client on-the-fly. They arent't copied to a temporary result buffer.

2) 'RESULT IS COPIED, COSTVALUE IS' - the result has bee copied to an internal temporary storage and will be delivered from there to the client when it fetches the data.

Ok, now that we know these basics, let's move on to a bit more complicated statement:

2. Query, single table with selection criteria

Oracle

SQL> explain plan for
  2  select * from city where name='Dallas';

Explained.
SQL>  select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------
Plan hash value: 2402407757

-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 |    21 |   378 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| CITY            |    21 |   378 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | I_CITY_NAME_ZIP |    21 |       |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("NAME"='Dallas')

14 rows selected.

MaxDB

explain
select * from city where name='Dallas'

SCHEMANAME  TABLENAME  COLUMN_OR_INDEX  STRATEGY                                PAGECOUNT
LARS        CITY       I_CITY_NAME      EQUAL CONDITION FOR INDEX                       11
                       NAME                  (USED INDEX COLUMN)                
                                           RESULT IS NOT COPIED , COSTVALUE IS           2
                                        QUERYREWRITE - APPLIED RULES:           
                                           DistinctPullUp                                1

Oracle

Here we see a bit better, how the 'execution level' works. Oracle first accesses the Index I_CITY_NAME_ZIP (you guess it - it covers the NAME and ZIP columns). From the index Oracle gets all ROWIDs that fit for the WHERE condition.

There we also note, that Oracle nicely tells us for each step in the execution plan, what filters and/or access conditions are used. 'ACCESS' means here, that Oracle can use the condition to locate the data - just like it does by accessing the index with a search condition. If Oracle has to browse through all the data to find the fitting rows it will print out "filter" in the predicate information.

MaxDB

The MaxDB output also gains information: the name of the index accessed plus all used columns of the index are printed below the table access. Again the PAGES number in the line for the index access does NOT tell us the expected costs but simply show how large the index is in total. There is one exception to this rule (when a KEYACCESS is done) but we leave that out for now.

Up to here, the differences between both explain tools are remarkable but don't lead to a big difference in understanding a querys execution strategy.

But up to here we only had a look to the most simple statements there are.

Let's see how things turn out, if we apply some pressure 🙂 and do joins.

3. Query, join

Oracle

SQL> explain plan for
  2   select c.title, c.firstname, c.name, c.zip, ci.name, c.address
  3   from customer c inner join city ci on c.zip=ci.zip;

Explained.
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
Plan hash value: 4175067910
-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |  3731 |   229K|    14   (8)| 00:00:01 |
|*  1 |  HASH JOIN         |          |  3731 |   229K|    14   (8)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| CITY     |  1765 | 28240 |     4   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| CUSTOMER |  3731 |   171K|     9   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C"."ZIP"="CI"."ZIP")
15 rows selected.

MaxDB

explain
 select c.title, c.firstname, c.name, c.zip, ci.name, c.address
 from customer c inner join city ci on c.zip=ci.zip
 
TABLENAME  COLUMN_OR_INDEX  STRATEGY                                PAGECOUNT
C                           TABLE SCAN                                      40
CI         ZIP              JOIN VIA KEY COLUMN                             11
                            TABLE HASHED                            
                                 NO TEMPORARY RESULTS CREATED       
                                 RESULT IS COPIED   , COSTVALUE IS         290

-------------

explain join
 select c.title, c.firstname, c.name, c.zip, ci.name, c.address
 from customer c inner join city ci on c.zip=ci.zip

STRATEGY             SCHEMANAME  TABLENAME  MAX ADDNL. ROWS  MULTIPLIER  REVERSE MULTIPLIER  ADDITIONAL PAGES  ADDITIONAL ROWS  ACCUMULATED_COSTS
                                 C          4480             1           1                   106.666666666667  4480             40
JOIN VIA SINGLE KEY              CI         2574             1           19.9332591768632    248.888888888889  4480             289.846383079439

Oracle

Now we really see how the 'execution level' works in Oracle.

The two tables are joined via the HASH JOIN. For this the first table (the upper one in the explain plan output) is read completely into memory and HASHED.

Then the second (lower) table is read completely as well and the fitting rows (the ones that fullfill the join condition) are found by HASHING the rows against the existing HASH SET.

As you see, both table scans are on the same 'level of execution' since both scas deliver rows to the same operation (the HASH JOIN).

You also note that the number of rows does not add up here, but that the HASH JOIN line really contains the number of rows expected to come out of the join. As this demo schema constraints all ZIP valus in the CUSTOMER table so that they have to be present in the CITY table and as ZIP values are unique (in fact this is the primary key of the CITY table) Oracle correctly figures that there is one fitting row from the CITY table for each row of the CUSTOMER table. Therefore the HASH JOIN will produce exactly the number of rows of the CUSTOMER table.

The COSTs in turn do add up here. So the Query costs are the cost for scanning table CUSTOMER (9) + cost for scanning table CITY (4) + a overhead for performing the actual join operation (1) = 14.

And again we see what predicates Oracle used here to find the fitting rows (yes, a HASH function IS a access method).

MaxDB

The MaxDB output for the same join is somewhat more complecated or less concentrated depending on how you like to view it.

The 'explain' statement just informs us about the order of the execution, specifically the JOIN ORDER.
Although all the other information provided by the Oracle 'explain plan' can be very usefull, the join order is the most important information about join processing
Along with this important information we get the usual information about table and index sizes as well as the total cost estimation.

To get more information, like the ones we get from Oracle, we've to use the second 'explain' command - 'explain join'.

This finally gives us the estimation about rows and pages to be processed within each step.

Anyhow, what we can rely on here is mainly the ADDITIONAL ROWS column as the other columns are filled by unrevealed internal calculations that usually don't match the intuition.

Nevertheless we see that, just like in the Oracle example, a HASH JOIN is chosen and that the number of rows in the final result set equals the number of rows in the CUSTOMER table (the difference to the number of rows in the Oracle example is simply because there ARE more customers in my MaxDB ... ).

This time the result set has been copied - which also in incorporated into the cost calculation.

Ok, we're nearly done. Let's have a look to the most complicated case (for today ;-)):

4. Query, join with selection criteria

Oracle

SQL> explain plan for
  2   select c.title, c.firstname, c.name, c.zip, ci.name, c.address
  3   from customer c inner join city ci on c.zip=ci.zip
  4   where ci.name ='Dallas';

Explained.

SQL>  select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
Plan hash value: 1286193604

--------------------------------------------------------------------------------------
| Id  | Operation          | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                 |    48 |  3024 |    10  (10)| 00:00:01 |
|*  1 |  HASH JOIN         |                 |    48 |  3024 |    10  (10)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN | I_CITY_NAME_ZIP |    21 |   336 |     0   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| CUSTOMER        |  3731 |   171K|     9   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("C"."ZIP"="CI"."ZIP")
   2 - access("CI"."NAME"='Dallas')

16 rows selected.

MaxDB

explain
 select c.title, c.firstname, c.name, c.zip, ci.name, c.address
 from customer c inner join city ci on c.zip=ci.zip
 where ci.name ='Dallas'

TABLENAME  COLUMN_OR_INDEX  STRATEGY                                PAGECOUNT
CI         I_CITY_NAME      EQUAL CONDITION FOR INDEX                       11
                            ONLY INDEX ACCESSED                     
           NAME                  (USED INDEX COLUMN)                
C          I_CUSTOMER_ZIP   JOIN VIA INDEXED COLUMN                         40
           ZIP                   (USED INDEX COLUMN)                
                                 NO TEMPORARY RESULTS CREATED       
                                 RESULT IS COPIED   , COSTVALUE IS           5
------------- 
explain join
 select c.title, c.firstname, c.name, c.zip, ci.name, c.address
 from customer c inner join city ci on c.zip=ci.zip
 where ci.name ='Dallas'

STRATEGY                      SCHEMANAME  TABLENAME  MAX ADDNL. ROWS   MULTIPLIER  REVERSE MULTIPLIER  ADDITIONAL PAGES  ADDITIONAL ROWS   ACCUMULATED_COSTS
                                          CI         25.9194422927963  1           1                   1                 25.9194422927963  1
JOIN VIA MORE THAN ONE FIELD              C          4480              2           1                   2.87993803253292  51.8388845855926  4.47963144734404

Oracle

Ok, the 'explain plan' output is not much more complicated now. In fact, even with the new WHERE condition, Oracle has decided for a HASH scan.
Nevertheless, you notice that now table CITY is not touched anymore. Instead the index I_CITY_NAME_ZIP is read (see the access predicate 2) and since the index contains the ZIP data as well, it is used to build up the HASH SET for the join.

The other join part is now processed like before - a simple full table scan that is hashed to create the result set.

If you wonder, why Oracle decides for HASH JOIN here (although there iS an index on ZIP on the CUSTOMER table): it's because the estimated COSTS are lower.

Remember that this is a rather small table and Oracle uses a multiblock I/O for tablescans. So in this case it's fater to simply load all data and do the HASHING than look up ZIP number by ZIP number in the index on CUSTOMER and visit the table for each fitting record.

But attention: just these cost diferrences can lead to severe performance issues, when Oracle gets them wrong. Especially in OLTP system we usually don't want to see HASH JOINs...

MaxDB

MaxDB decides for a diffrent plan this time: first it also performs an INDEX ONLY access on the I_CITY_NAME index.

Due to the way how indexes work in MaxDB, every index always contains the full primary key - so this index does also contain the ZIP values.

With this information the join can already be done. This time via the I_CUSTOMER_ZIP index on the CUSTOMER table.

It's not written out here, but this join is a nested loop join.

Again we see: the numbers in the PAGES column DON'T add up to the COSTS line.

Instead we notice that the COSTS are the rounded 'ACCUMULATED COSTS' of the 'explain join' command.

Cheat Sheet

------------------- 8< ------------ cut here ------------ 8< ------------

Oracle	MaxDB
ONE explain statement, result needs to be displayed by: select * from table(dbms_xplan.display);	TWO explain statements, result is visible immediately.
Dense information,CBO arithmetic can be followed.	Information focus on the resulting plan - optimizer arithmetics can barely be followed.
"Costs" add up in the execution plan.	"Costs" appear in the "Costs" line ONLY. All other values in the PAGES column are the statistics of the accessed tables/indexes.
notes to know: 618868 - FAQ: Oracle performance 766349 - FAQ: Oracle SQL optimization 772497 - FAQ: Oracle Hints 176754 - Problems with CBO and RBO	notes to know: 819324 - FAQ: MaxDB SQL optimization 832544 - FAQ: MaxDB Hints
Documentation to read: Oracle Documentation - Overview of the Optimizer Oracle Scratchpad (Oracle Optimizer and Performance related blog by Jonathan Lewis)	Documentation to read: MaxDB Wiki MaxDB Documentation - SQL Optimizer

---------------------- 8< ------------ cut here ------------ 8< --------------------

Former editions of 'Mind the gap...':

'Mind the gap...' (#3 - Backup/Restore - Part 2)

'Mind the gap...' (#3 - Backup/Restore - Part 1)

'Mind the gap...' (#2 - Data storage)

'Mind the gap ...' (1# - Logon mechanisms)