[Oracle] DB Optimizer Part X - Looking under the h...

stefan_koehler · ‎02-19-2014

Introduction

In the first part of the "Adaptive Query Optimization" series we looked at the feature set called "Adaptive Plans", which is completely new in Oracle 12c. However Adaptive Query Optimization consists of two feature sets called "Adaptive Plans" and "Adaptive Statistics". The latter (or better said the new features of it) should be the main topic in this blog post. Some of these automatic and "under the hood" changes may have influences on your SQL execution plans and statistic runs in consequence without even knowing their existence at all. Dynamic statistics already exists to a certain point with Oracle 11g (and were even called that way since 11.2.0.4), but the terminology itself has changed and the scope was extended with Oracle 12c.

What are we talking about?

*** The right graphic is a screenshot from Kerry Osborne's presentation "Adaptive Optimization" for better illustration and differentiation

As previously mentioned we already have investigated the feature set called "Adaptive Plans" (left tree) and now we will take a closer look at the "Adaptive Statistics" on the right tree side. Please remember, that the feature set "Adaptive Plans" should improve the performance of the initial execution of a SQL and the feature set "Adaptive Statistics" should mainly improve the performance of the subsequent executions. Before we dig into the official Oracle documentation and the corresponding white papers - let's clarify some new Oracle terms first.

Oracle terminology

Oracle 12c	Oracle 11g
Statistics Feedback	Cardinality Feedback
Dynamic Statistics	Dynamic Sampling
SQL Plan Directives	Not available in 11g

Oracle Documentation

Dynamic Statistics

During the compilation of a SQL statement, the optimizer decides whether to use dynamic statistics by considering whether the available statistics are sufficient to generate an optimal execution plan. If the available statistics are insufficient, then the optimizer uses dynamic statistics to augment the statistics. One type of dynamic statistics is the information gathered by dynamic sampling. The optimizer can use dynamic statistics for table scans, index access, joins, and GROUP BY operations, thus improving the quality of optimizer decisions.

About Dynamic Statistics Levels (adjusted / new level 11)

The dynamic statistics level controls both when the database gathers dynamic statistics, and the size of the sample that the optimizer uses to gather the statistics. Set the dynamic statistics level using either the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter (dynamic statistics were called dynamic sampling in releases earlier than Oracle Database 12c) or a statement hint.

Level 11 - Use dynamic statistics automatically when the optimizer deems it necessary. The resulting statistics are persistent in the statistics repository, making them available to other queries.

When OPTIMIZER_DYNAMIC_SAMPLING is set to 11, the optimizer will use dynamic statistics to verify cardinality estimates for all SQL operators, and it will determine an internal time limit to spend verifying the estimates. There are cases where the optimizer will automatically decide to use 11, for example:

The query will run in parallel
The query was executed before and its history is available (from the cursor cache, Automatic Workload Repository, or the SQL Management Base).

Automatic Reoptimization

Whereas adaptive plans help decide between multiple subplans, they are not feasible for all kinds of plan changes. For example, a query with an inefficient join order might perform suboptimally, but adaptive plans do not support adapting the join order during execution. In these cases, the optimizer considers automatic reoptimization. In contrast to adaptive plans, automatic reoptimization changes a plan on subsequent executions after the initial execution. At the end of the first execution of a SQL statement, the optimizer uses the information gathered during execution to determine whether automatic reoptimization is worthwhile. If execution informations differs significantly from optimizer estimates, then the optimizer looks for a replacement plan on the next execution. The optimizer uses the information gathered during the previous execution to help determine an alternative plan. The optimizer can reoptimize a query several times, each time learning more and further improving the plan.

Reoptimization: Statistics Feedback

A form of reoptimization known as statistics feedback (formerly known as cardinality feedback) automatically improves plans for repeated queries that have cardinality misestimates. The optimizer can estimate cardinalities incorrectly for many reasons, such as missing statistics, inaccurate statistics, or complex predicates.

Reoptimization: Performance Feedback

Another form of reoptimization is performance feedback. This reoptimization helps improve the degree of parallelism automatically chosen for repeated SQL statements when PARALLEL_DEGREE_POLICY is set to ADAPTIVE.

SQL Plan Directives

A SQL plan directive is additional information that the optimizer uses to generate a more optimal plan. For example, during query optimization, when deciding whether the table is a candidate for dynamic statistics, the database queries the statistics repository for directives on a table. If the query joins two tables that have a data skew in their join columns, a SQL plan directive can direct the optimizer to use dynamic statistics to obtain an accurate cardinality estimate.The optimizer collects SQL plan directives on query expressions rather than at the statement level. In this way, the optimizer can apply directives to multiple SQL statements. The database automatically maintains directives, and stores them in the SYSAUX tablespace. You can manage directives using the package DBMS_SPD.

DBMS_SPD - Overview

SPD are objects generated automatically by Oracle. For example, if Oracle detects that the single table cardinality estimated made by the optimizer is different from the actual number of rows returned when accessing the table, it will automatically create a directive to perform dynamic statistics for the table. When any SQL statement referencing the table is compiled, the optimizer will perform dynamic statistics for the table to get a more accurate estimate.

Oracle White Paper - Optimizer with Oracle Database 12c

SQL plan directives

SQL plan directives are automatically created based on information learnt via Automatic Reoptimization.

Currently there is only one type of SQL plan directive, ‘DYNAMIC_SAMPLING’. This tells the optimizer that when it sees a particular query expression it should use dynamic sampling to address the cardinality misestimate. SQL plan directives are also used by Oracle to determine if extended statistics, specifically column groups, are missing and would resolve the cardinality misestimates.

After a SQL directive is used the optimizer decides if the cardinality misestimate could be resolved with a column group. If so, it will automatically create that column group the next time statistics are gathered on the appropriate table. The extended statistics will then be used in place of the SQL plan directive when possible (equality predicates, group bys etc.). If the SQL plan directive is no longer necessary it will be automatically purged after 53 weeks.

Oracle White Paper - Understanding Optimizer Statistics with Oracle Database 12c

Dynamic Statistics (previously known as dynamic sampling)

Dynamic sampling was introduced in Oracle Database 9i Release 2 to collect additional statement-specific object statistics during the optimization of a SQL statement. The most common misconception is that dynamic sampling can be used as a substitute for Optimizer statistics. The goal of dynamic sampling is to augment the existing statistics; it is used when regular statistics are not sufficient to get good quality cardinality estimates.

In Oracle Database 12c dynamic sampling has been enhanced to become dynamic statistics. Dynamic statistics allow the optimizer to augment existing statistics to get more accurate cardinality estimates for not only single table accesses but also joins and group-by predicates

So, how and when will dynamic statistics be used? During the compilation of a SQL statement, the Optimizer decides whether to use dynamic statistics or not by considering whether the available statistics are sufficient to generate a good execution plan. If the available statistics are not enough, dynamic sampling will be used. It is typically used to compensate for missing or insufficient statistics that would otherwise lead to a very bad plan. For the case where one or more of the tables in the query does not have statistics, dynamic sampling is used by the Optimizer to gather basic statistics on these tables before optimizing the statement. The second scenario where dynamic statistics can be used is when the statement contains a complex predicate expression, and extended statistics are not available, or cannot be used.

Dynamic sampling is controlled by the parameter OPTIMIZER_DYNAMIC_SAMPLING, which can be set to different levels (0-11). These levels control two different things; when dynamic sampling kicks in, and how large a sample size will be used to gather the statistics. The greater the sample size, the bigger impact dynamic sampling has on the compilation time of a query.

When set to 11 the Optimizer will automatically decide if dynamic statistics will be useful, and what dynamic sampling level will be used for SQL statements The optimizer bases its decision, to use dynamic statistics, on the complexity of the predicates used, the existing base statistics, and the total execution time expected for the SQL statement. For example, dynamic statistics will kick in for situations where the Optimizer previously would have used a guess.

Given these criteria it’s likely that when set to level 11, dynamic sampling will kick-in more often than it did before. This will extend the parse time of a statement. In order to minimize the performance impact, the results of the dynamic sampling queries will be persisted in the cache, as dynamic statistics, allowing other SQL statements to share these statistics.

So far enough from the official Oracle documentation and white papers, but that amount of basic information was necessary to understand the following content.

In the following blog part i will focus on the optimizer enhancement "SQL Plan Directives" and its implementation as it is completely new and not just renamed by Oracle :wink:

"SQL Plan Directives" demo

The following demo was run on an Oracle database (12.1.0.1) on OEL 6.4 (2.6.39-400.109.1.el6uek.x86_64) and with an Oracle schema called "TESTUSER". At first we need to clarify some specific types and states before we setup the demo data and run the SQL examples.

Types in view DBA_SQL_PLAN_DIRECTIVES:

DYNAMIC_SAMPLING is currently the only supported type for SQL plan directives (definition of base view _BASE_OPT_DIRECTIVE is "decode(type, 1, 'DYNAMIC_SAMPLING', 'UNKNOWN')")

States in view DBA_SQL_PLAN_DIRECTIVES

NEW - Newly created directive
MISSING_STATS - Directive object does not have relevant statistics (and needs extended statistics)
HAS_STATS - Objects have extended statistics
CANDIDATE - Candidate directive, the server to evaluate effectiveness
PERMANENT - A permanent directive determined by the server to be effective and useful as predicates can not be handled by extended statistics. Remember that the optimizer uses column group statistics for equality predicates, inlist predicates, and for estimating the group by cardinality.

CBO settings and dynamic statistics level



SQL> show parameter optimizer_adaptive_features

SQL> show parameter optimizer_dynamic_sampling

SQL> show parameter optimizer_features_enable

Create the base table and data set (it is the same correlated data set as in my previous blog post [Oracle] DB Optimizer Part V - Introduction of dynamic sampling and why it is used in SAP BI environ...)



SQL> create table DYNTEST (COUNTRY VARCHAR2(40), WERKS VARCHAR(20),

TEXT VARCHAR(4000));

SQL> exec DBMS_STATS.GATHER_TABLE_STATS(USER, 'DYNTEST');



SQL> select OWNER, TABLE_NAME, COLUMN_ID as CNT, COLUMN_NAME,

DATA_TYPE, DATA_LENGTH as LENGTH, AVG_COL_LEN, NUM_DISTINCT as DSCNT,

            DENSITY, NUM_NULLS, HISTOGRAM, NUM_BUCKETS as BUCKETS

     from DBA_TAB_COLUMNS

     where OWNER = 'TESTUSER' and TABLE_NAME = 'DYNTEST'

     order by OWNER, TABLE_NAME, COLUMN_ID;



SQL> select OWNER, TABLE_NAME, COLUMN_NAME, AVG_COL_LEN,

NUM_DISTINCT as DSCNT, LOW_VALUE, HIGH_VALUE, DENSITY, NUM_NULLS,

HISTOGRAM, NUM_BUCKETS as BUCKETS, LAST_ANALYZED

      from ALL_TAB_COL_STATISTICS

      where OWNER = 'TESTUSER' and TABLE_NAME = 'DYNTEST' and COLUMN_NAME like 'SYS_%'

      order by OWNER, TABLE_NAME, COLUMN_NAME;

no rows selected



SQL> select TABLE_NAME, EXTENSION_NAME, EXTENSION from DBA_STAT_EXTENSIONS

where OWNER = 'TESTUSER' and TABLE_NAME = 'DYNTEST';

no rows selected



SQL> exec DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE();

SQL> select * from DBA_SQL_PLAN_DIR_OBJECTS where OWNER = 'TESTUSER';

no rows selected

First execution of SQL with correlated columns



SQL> select /*+ gather_plan_statistics */ * from DYNTEST where COUNTRY = 'DE' and WERKS = '1200';

The cost based optimizer is way off by design and its limits (273 estimated rows in contrast to 1.000 actually returned rows).



SQL> select CHILD_NUMBER, IS_REOPTIMIZABLE from V$SQL where SQL_ID = '6jr7pwrk2tszg';

According to the documentation of column IS_REOPTIMIZABLE in view V$SQL. This columns shows whether the next execution matching this child cursor will trigger a reoptimization. The values are:

Y: If the next execution will trigger a reoptimization
R: If the child cursor contains reoptimization information, but will not trigger reoptimization because the cursor was compiled in reporting mode
N: If the child cursor has no reoptimization information

Flush SQL plan directives from memory to disk manually (automatically done every 15 minutes by default)



SQL> exec DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE();

SQL> select * from DBA_SQL_PLAN_DIR_OBJECTS where OWNER = 'TESTUSER';

A SQL plan directive was automatically created to correct the cardinality misestimate on the DYNTEST table caused by correlation between the multiple single-column predicates.



select * from DBA_SQL_PLAN_DIRECTIVES where DIRECTIVE_ID = 4423171997703515062;

This particular SQL plan directive has the state "NEW" (check detailed explanation above) and the reason for this directive is a "SINGLE TABLE CARDINALITY MISESTIMATE" which makes sense as well. Please note that this SQL plan directive was just created and not used at all.

Second execution of the same SQL with correlated columns



SQL> select /*+ gather_plan_statistics */ * from DYNTEST where COUNTRY = 'DE' and WERKS = '1200';

A new child cursor was created and the estimated rows are in the right ball park (based on statistics feedback) now, but there is still no indication for the instruction usage of SQL plan directives. However the reason for this can be found in the CBO trace file with the comment "Not using dynamic sampling since cardinality or selectivity hint present and no NDV needed". So basically the SQL plan directive instruction is already well known, but not followed due to hint OPT_ESTIMATE by statistics feedback.



SQL> select CHILD_NUMBER, IS_REOPTIMIZABLE from V$SQL where SQL_ID = '6jr7pwrk2tszg';

Flush SQL plan directives from memory to disk manually (automatically done every 15 minutes by default)



SQL> exec DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE();

SQL> select * from DBA_SQL_PLAN_DIRECTIVES where DIRECTIVE_ID = 4423171997703515062;

This particular SQL plan directive has the state "MISSING_STATS" (check detailed explanation above) now and was adjusted by the last SQL execution (with statistics feedback).

Simulate "SQL statement flush out" and execute it with existing SQL plan directive



SQL> alter system flush shared_pool;

SQL> select /*+ gather_plan_statistics */ * from DYNTEST where COUNTRY = 'DE' and WERKS = '1200';

Look at this - the row estimates are correct now right from the start (after the SQL has been aged out from the library cache and need to be parsed again). The note section states that a SQL plan directive was used and that dynamic sampling (= instruction from the SQL plan directive) was performed.

CBO trace with SQL plan directives (in MISSING_STATS state)

Gather statistics with DBMS_STATS and existing SQL plan directive



SQL> exec DBMS_STATS.GATHER_TABLE_STATS(USER, 'DYNTEST');



SQL> select OWNER, TABLE_NAME, COLUMN_ID as CNT, COLUMN_NAME,

DATA_TYPE, DATA_LENGTH as LENGTH, AVG_COL_LEN, NUM_DISTINCT as DSCNT,

            DENSITY, NUM_NULLS, HISTOGRAM, NUM_BUCKETS as BUCKETS

     from DBA_TAB_COLUMNS

     where OWNER = 'TESTUSER' and TABLE_NAME = 'DYNTEST'

     order by OWNER, TABLE_NAME, COLUMN_ID;

The histograms on columns COUNTRY and WERKS are an "old behavior" (SYS.COL_USAGE$) and not new at this point.



SQL> select OWNER, TABLE_NAME, COLUMN_NAME, AVG_COL_LEN,

NUM_DISTINCT as DSCNT, LOW_VALUE, HIGH_VALUE, DENSITY, NUM_NULLS,

HISTOGRAM, NUM_BUCKETS as BUCKETS, LAST_ANALYZED

      from ALL_TAB_COL_STATISTICS

      where OWNER = 'TESTUSER' and TABLE_NAME = 'DYNTEST' and COLUMN_NAME like 'SYS_%'

      order by OWNER, TABLE_NAME, COLUMN_NAME;

Oracle has automatically created extended statistics (column group statistics) for the previous used correlated columns in the WHERE clause.



SQL> select TABLE_NAME, EXTENSION_NAME, EXTENSION from DBA_STAT_EXTENSIONS

where OWNER = 'TESTUSER' and TABLE_NAME = 'DYNTEST';

The extension "SYS_STSNGS_X_L#DNTEN#KCGN#426H" is based on the two columns as mentioned previously.

Flush SQL plan directives from memory to disk manually (automatically done every 15 minutes by default)



SQL> exec DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE();

SQL> select * from DBA_SQL_PLAN_DIRECTIVES where DIRECTIVE_ID = 4423171997703515062;

That is kind of strange. The last DBMS_STATS run has gathered extended statistics, but the state is still "MISSING_STATS".

Simulate "SQL statement flush out" and execute it with existing SQL plan directive and extended statistics



SQL> alter system flush shared_pool;

SQL> select /*+ gather_plan_statistics */ * from DYNTEST where COUNTRY = 'DE' and WERKS = '1200';

The same behavior as before the DBMS_STATS run. The SQL plan directive with the dynamic sampling instruction is still used instead of the available column group statistics.

Flush SQL plan directives from memory to disk manually (automatically done every 15 minutes by default)



SQL> exec DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE();

SQL> select * from DBA_SQL_PLAN_DIRECTIVES where DIRECTIVE_ID = 4423171997703515062;

But now the SQL plan directive has the state "HAS_STATS". So this basically means that the plan directive is only modified by the SQL execution itself. At last let's run the same SQL once again with this new state.

Simulate "SQL statement flush out" and execute it with existing SQL plan directive and extended statistics



SQL> alter system flush shared_pool;

SQL> select /*+ gather_plan_statistics */ * from DYNTEST where COUNTRY = 'DE' and WERKS = '1200';

Now the extended statistics are used by the cost based optimizer and the SQL plan directive is "ignored".

CBO trace with SQL plan directives (in HAS_STATS state)

Purging / Automatic clean up of SQL plan directives

SQL plan directives are automatically purged (by default), if they are not used for more than SPD_RETENTION_WEEKS.



SQL> select DBMS_SPD.GET_PREFS('SPD_RETENTION_WEEKS') from dual;

DBMS_SPD.GET_PREFS('SPD_RETENTION_WEEKS')

-------------------------------------------

53

Summary

Nowadays SQL plan directives are basically something like a persistent dynamic sampling instruction with possible further influences on DBMS_STATS runs. On one side SQL plan directives introduce a new level of complexity by understanding the CBO behavior and troubleshooting (dynamic) execution plans, but on the other side it can help to improve the performance of subsequent "first" executions (e.g. after flushed out of memory) of a SQL statement as it does not have to go through the whole "re-optimization" process after it has been aged out from library cache. However it becomes more and more important to determine the exact state at a given point in time by troubleshooting performance and execution plan issues, which can be very tricky of course.

In my experience (based on my Oracle 12c troubleshooting request so far) this feature set is not well known and the clients are surprised when they discover the "system generated" statistics extensions or the execution plan changes on subsequent first executions.

If you have any further questions - please feel free to ask or get in contact directly, if you need assistance by troubleshooting Oracle database (performance) issues or in any case of Oracle 12c projects / tasks.