From the Archives - Customizing SQL Requests With ...

JasonHinsperger · ‎04-02-2014

In this post, originally written by Glenn Paulley and posted to sybase.com in June of 2009, Glenn continues to talk about customizing query semantics via the use of hints. As he says in his post, hints can be incredibly useful in some situations, but it is important to note that the vast majority of the time, SQL Anywhere query optimization and execution does an incredible job of doing 'the right thing', taking into account all of the context in which it is running in order to provide performant execution of queries and DML. Using hints should only be done in specific scenarios by experienced users.

In an earlier post on SQL request hints, I described techniques to modify concurrency control semantics with a specific SQL statement. In this second post, I'd like to describe some additional hinting capabilities supported by SQL Anywherewith respect to query optimization.

Just to be clear once again, I'm not advocating the use of hints. Using hints should be done with considerable care, as index hints override the query optimizer's decision-making logic. They also can become problematic from a maintenance standpoint. My recommendation is that hints should be used only by experienced users as workarounds for specific problems.

User-specified Selectivity Estimates

SQL Anywhere has supported self-managing table and column statistics for nearly two decades, and starting with the 8.0.0 release (December 2000) utilizes self-tuning column histograms to estimate column distributions and frequent-value statistics. In addition to histograms, the SQL Anywhere query optimizer utilizes a variety of other tools, including index probes, to estimate the selectivity of a predicate. Accurate selectivity estimates assist the query optimizer in estimating the cardinality of joins and other intermediate results, and hence accurate estimates lead to higher-quality access plans.

However, as I wrote last September, there are ample opportunities to conjure SQL queries that are difficult to optimize. One example is a SQL query that contains a predicate containing a function, or some form of additional computation, such as:



SELECT COUNT(*)

FROM SalesOrders so JOIN SalesOrderItems soi ON (so.id = soi.id)

WHERE DATEDIFF(day, so.OrderDate, soi.Shipdate) > 0

The problem is that the SQL Anywhere query optimizer is unable to decompose and analyze the comparison predicate involving DATEDIFF - and so makes a guess at its selectivity, which can be seen in a graphical plan for the statement:

In the above example, the guess estimate of 25% selectivity isn't really a problem, simply because it does not affect the access plan chosen by the query optimizer: here, because both tables are already cached and resident in the buffer pool, the optimizer has chosen a straightforward nested-loop join, rendering the additional DATEDIFF predicate as residual to the join.

There are, of course, situations where a residual predicate such as the one above is problematic, particularly when there are multiple joins. In these cases, the cardinality estimation error that stems from the initial guess, coupled with a set of assumptions about intermediate result set sizes yields, at the end of the day, an extremely poor access plan due to the physical access path(s) chosen for a particular table.

In these cases, one can workaround the issue by specifying a hard-coded selectivity estimate directly into the SQL text, which actually is a long-standing feature of SQL Anywhere. To specify a user estimate, one first enables the USER_ESTIMATES connection option, and in the query's WHEREclause expresses the predicate's selectivity as a percentage directly:

SELECT COUNT(*)

FROM SalesOrders so JOIN SalesOrderItems soi ON (so.id = soi.id)

WHERE ( DATEDIFF(day, so.OrderDate, soi.Shipdate) > 0, 83.50 )

Here is what the graphical plan looks like with this modification:

Now, instead of the optimizer expecting 274 (25 percent of 1097) rows as input to the GROUP BY operator (to compute the COUNT(*)), the optimizer believes that the predicate selectivity is 83.5 percent, yielding 916 rows - which is the correct answer.

Adjusting predicate selectivity is one way to workaround selectivity estimation problems, but specifying user estimates of selectivity can itself be problematic, particularly when a query utilizes a host variable and the column distribution is skewed; in such cases specifying any hardcoded estimate is a problem.

Index Selection Table Hints

In addition to the locking table hints I mentioned previously, SQL Anywhere permits one to specify a hint that names a specific index, or indexes, to be used as the physical access path for a specific table. As with concurrency control hints, table hints on views or proxy tables are silently ignored. Here's an example:

SELECT *

FROM Customers WITH ( INDEX(CustomersKey) )

The access path hints one can specify are:

WITH ( INDEX index-name ). This hint overrides the query optimizer's access path selection algorithm. An error is produced if an index of the given name does not exist. Index hints can be used for base tables, temporary tables, and materialized views.
WITH ( INDEX index-name1, index-name2, ... ). This form of index hint permits up to four indexes to be specified. If any of the specified indexes cannot be used, or do not exist, an error is returned. Multiple indexes can be exploited by the SQL Anywhere query optimizer for multi-index retrieval.
WITH ( NO INDEX ). This hint forces a sequential scan of the table.
WITH ( INDEX index-name1, index-name2, ... ) INDEX ONLY { ON | OFF }. With this modification, one can specify the use of index-only retrieval. With INDEX ONLY ON, the query optimizer will attempt to produce an access plan that utilizes index-only retrieval with the specified indexes. If any of the specified indexes cannot be used in satisfying an index-only retrieval, an error is returned (for example, if the named index does not exist, or the indexed attributes alone cannot satisfy the query). One can specify INDEX ONLY OFF to prevent index-only retrieval.
FORCE INDEX ( index-name ) is provided for compatibility with MySQL, and has the same semantics as WITH( INDEX index-name). FORCE INDEX does not support specifying more than one index.

Hints are useful - virtually all systems offer them out of necessity, and they can solve real problems for application developers. The trick is knowing when to use them.

From the Archives - Customizing SQL Requests With Hints - Part Deux

User-specified Selectivity Estimates

Index Selection Table Hints

Get Your SAP HANA Idea Incubator Badge Today!

SCN Mission - SAP HANA Quiz Challenge is now retired

Share your #HANAStory and Win