MaxDB Optimizer Statistics handling without NetWea...

lbreddemann · ‎05-27-2010

Ok, MaxDB is most often used with a NetWeaver on top of it, so this blog is about a niche topic of a niche product.
Wow - that should be enough understatement and un-buzzing for now.

The question of how and when to collect new optimizer statistics pops up every now and then.
Most people accept that a cost-model-based query optimizer depends on statistics that fit the data and storage usage of the tables and indexes involved in a query to come up with the best possible query execution plan.
But how can we know when the statistics are not fitting good enough anymore?

The reactive strategy would be to monitor the execution runtimes of every query in the system, waiting for runtime increases to show up and then check whether the execution plan had changed compared to the times the query ran quick enough.

Obviously this strategy is pretty labor and time intensive.

Another way would be to say: "ok, maybe I'm doing some update statistics too often, but at least this does not make execution plans worse".

This approach (and yes, sometimes the execution plan can become worse, but that's a different story) is the one employed by recommendations like "Update statisitcs at least once a week".
One improvement to this approach is to carefully choose the tables for which new statistics should be collected.
A possibly reasonable criteria for that is the change of data volume since the last update of statistics.
That means we need to compare the current size of a table (in pages) against the size it had when the statistics where last collected.
Fortunately MaxDB provides two system tables containing this information:
1. SYSINFO.FILES
Shows the current size of the table.
This is even true for uncommitted transactions.
So if you load data into your table in session A you'll be able to monitor the table growth via session B even before session A commits.

2. SYSDBA.OPTIMIZERSTATISTICS
Contains the stored optimizer statistics.

Doing this comparisation for all tables in your database manually would be bunch of monkey work, so MaxDB development decided to deliver a build-in monkey in form of a stored procedure:
SUPERDBA.SYSCHECKSTATISTICS (IN CHANGETHRESHOLD INTEGER)

This procedure does the comparison for us.
Via the CHANGETRESHOLD parameter we can specify the percentage of data volume change that should lead to new statistics.

The procedure then loops over all tables of the current user and the 'SYSDBA' schema and performs the check.
Once a table qualifies for new statistics (another reason may be that a table does not have any optimizer statistics at all) the tablename is denoted into a system table:
SYSDBA.SYSUPDSTATWANTED

If you're familiar with the automatic statistics update feature of MaxDB than this table is already known to you.
It's the same table where the MaxDB Kernel puts tablenames in when it realizes during a join, that the optimizer statistics were wrong and more data then expected.

Anyhow, apart from the automatic statistics update, there is a command for manual processing of the denoted tables present:
UPDATE STATISTICS AS PER SYSTEM TABLE

This command will read the entries from SYSUPDSTATSWANTED and run a parallelized non-blocking update statistics without sampling.

You may of course choose to use the sampling size stored for each table in the database catalog via
UPDATE STATISTICS AS PER SYSTEM TABLE ESTIMATE
but this will lead to table locks, so it's not exactly what we want to see in production systems.

Once the statistics collection is finished, you can check the result in the table
SYSDBA.SYSUPDSTATLOG

SELECT * FROM  SYSDBA.SYSUPDSTATLOG

\



\

SCHEMANAME|TABLENAME|TABLEID          |COLUMNNAME|INDEXNAME|EXECUTED_AT        |IMPLICIT|SAMPLE_PCT|SAMPLE_ROW|EXECUTION_START    |EXECUTION_END      |SESSION|TERMID            |SEQNO|

\

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

\

LARS      |T1       |000000000001A560 |          |         |2010-05-27 13:19:23|NO      |100       |0         |2010-05-27 13:19:35|2010-05-27 13:19:35|14692  |7200@VIENXXXXXXXXA|0    |

\

LARS      |T1       |000000000001A560 |N1        |         |2010-05-27 13:19:23|NO      |100       |0         |2010-05-27 13:19:35|2010-05-27 13:19:35|14692  |7200@VIENXXXXXXXXA|1    |

\

LARS      |T1       |000000000001A560 |N2        |         |2010-05-27 13:19:23|NO      |100       |0         |2010-05-27 13:19:35|2010-05-27 13:19:35|14692  |7200@VIENXXXXXXXXA|2    |

\

LARS      |T2       |000000000001A561 |          |         |2010-05-10 15:05:36|NO      |100       |0         |2010-05-10 15:05:36|2010-05-10 15:05:36|14299  |3532@VIENXXXXXXXXA|0    |

\

LARS      |T2       |000000000001A561 |N1        |         |2010-05-10 15:05:36|NO      |100       |0         |2010-05-10 15:05:36|2010-05-10 15:05:36|14299  |3532@VIENXXXXXXXXA|1    |

\

LARS      |ZTEST6   |000000000001A52E |          |         |2010-05-27 13:19:23|NO      |100       |0         |2010-05-27 13:19:23|2010-05-27 13:19:32|14686  |7200@VIENXXXXXXXXA|0    |

\

LARS      |ZTEST6   |000000000001A52E |MANDT     |         |2010-05-27 13:19:23|NO      |100       |0         |2010-05-27 13:19:23|2010-05-27 13:19:32|14686  |7200@VIENXXXXXXXXA|1    |

\

LARS      |ZTEST6   |000000000001A52E |OTHID     |         |2010-05-27 13:19:23|NO      |100       |0         |2010-05-27 13:19:23|2010-05-27 13:19:32|14686  |7200@VIENXXXXXXXXA|3    |

\

LARS      |ZTEST6   |000000000001A52E |UNIID     |         |2010-05-27 13:19:23|NO      |100       |0         |2010-05-27 13:19:23|2010-05-27 13:19:32|14686  |7200@VIENXXXXXXXXA|2    |

\

...

\



\

As we saw above, this procedure depends on having the SYSINFO.FILES information at hand.
Unfortunately, for databases that had been upgraded from a SAP DB/MaxDB version <=7.5 this information might not yet be available.

For wich tables these information are missing, you can figure out by checking table
SYSDBA.SYSUPDATECOUNTERWANTED

As long as the file counters are not present, the SYSCHECKSTATISTICS procedure consults the SYSDBM.ESTIMATED_PAGES table to get an estimation for the current table size.
This might take much longer and would not deliver precise results, but rather an estimation on the total table size.

Summing this up:

Given a MaxDB =>7.6 at a recent patch level you can easily implement a statistics maintenance strategy by running these two commands, say once a week:
--> as your SQL Schema owner:
CALL SYSCHECKSTATISTICS (40)

--> as the SYSDBA (SUPERDBA) of the database:
UPDATE STATISTICS AS PER SYSTEM TABLE

So there would be two commands to be scheduled:

dbmcli -U DB770W -USQL DB770LARS sql_execute "call SUPERDBA.SYSCHECKSTATISTICS (40)"
and
dbmcli -U DB770W -USQL DB770W sql_execute "UPDATE STATISTICS AS PER SYSTEM TABLE"

Note: DB770W is my XUSER entry for SUPERDBA and DB770LARS is for my SQL-User.
Make sure to remember that XUSER entries are case sensitive!
db770w would not work here!

I hope you like this blog and maybe this technique can be an alternative to the one shown in SDN thread doubt about UPDATE STAT COLUMN .http://forums.sdn.sap.com/thread.jspa?threadID=1668683&tstart=0

MaxDB Optimizer Statistics handling without NetWeaver

Get Your SAP HANA Idea Incubator Badge Today!

SCN Mission - SAP HANA Quiz Challenge is now retired

Share your #HANAStory and Win