
This post is part of a series:

Hana Smart Data Integration - Overview

When designing the Data Provisioning solution, one key element was the Adapter Capabilities. Initially, the assumption of SDA was that every adapter is database-like: joins can be pushed down, every kind of filter, etc. That is no problem as long as the source is a database which supports all of that - then the entire exercise is to translate the Hana SQL the adapter receives into the SQL syntax the source database understands.

If the source is not a database, the adapter would have to implement all of that itself. The adapter implementing a join manually? An aggregation? A substring function?? That would have been easy for the SDA team, but it would make adapter development very difficult.

Instead, the adapter returns a list of the capabilities it supports at adapter, table, and column level, and everything the adapter does not support is executed as a second step inside Hana.

From a technology point of view that is quite simple to do.

Example: The user executes the SQL "select column3, column5 from virtual_table where column2 = 99". This is equivalent to a SQL of the form "select column3, column5 from (select column2, column3, column5 from virtual_table) where column2 = 99". Some operations are put into the inner SQL - all the ones the adapter supports - and the rest is done in the outer SQL, the one executed in Hana. The capabilities control what goes into the inner SQL versus what stays outside. In this concrete example, the capabilities would state that projection (= returning individual columns) is supported but a where clause is not, so the filter remains in the outer, Hana-executed part.

Pushdown of filters - example

Obviously, this topic of adapter capabilities can get very complex quickly. Twitter is a good example of a source which is certainly not a database. According to the API documentation it supports where-clause-like search filters.

see: The Search API | Twitter Developers

At first glance the logic is straightforward:

Search String | Description | SQL equivalent
watching now | containing both "watching" and "now" | where Tweet like '%watching%' and Tweet like '%now%'
"happy hour" | containing the exact phrase "happy hour" | where Tweet like '%happy hour%'
love OR hate | containing either "love" or "hate" (or both) | where Tweet like '%love%' or Tweet like '%hate%'
beer -root | containing "beer" but not "root" | where Tweet like '%beer%' and not Tweet like '%root%'

Filtering on dates is more of a challenge:

Search String | Description | SQL equivalent
since:2015-07-19 | all tweets with sent date since 2015-07-19 | where sent >= '2015-07-19'
until:2015-07-20 | all tweets with sent date until 2015-07-20 | where sent <= '2015-07-20'

Therefore a BETWEEN would work, an EQUAL as well, but a strict GREATER THAN cannot be pushed into the Twitter API, as the API offers only greater-or-equal (since:) and less-or-equal (until:) comparisons.

Queries on longitude/latitude are possible, but only in the form of "near". No idea how to express that in pure SQL. And those conditions are always combined with AND - there is no way to have a query similar to "all tweets where either the text contains Boston OR the tweet was sent near Boston".

Pushdown - general guideline

The guideline for pushdowns has to be that the resulting dataset is the same whether the filter is pushed down to the source or applied in Hana. Although obvious, that can present problems. Is the Twitter search string "now" really the equivalent of "Tweet like '%now%'"? No, it is not. In SQL the like operator works on characters, the Twitter search on words. So the text "nowhere" would not match according to Twitter but would according to the database.

Other effects could be casing - is 'NOW' = 'now'? Or trailing blanks - is 'now ' = 'now'?

When performing an order-by, what is the sort order of Hana compared to the source? A classic example is Western European characters: is 'Austria' < 'Österreich' = 'Oesterreich' < 'Russia'? In a binary order based on the UTF-8 character set it would be 'Austria' < 'Oesterreich' < 'Russia' < 'Österreich'.
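
This effect is easy to reproduce outside of Hana with plain Java - binary string comparison versus a locale-aware Collator (a standalone illustration, not SDI-specific):

  import java.text.Collator;
  import java.util.Arrays;
  import java.util.Locale;

  public class SortOrderDemo {
      public static void main(String[] args) {
          String[] names = { "Russia", "Österreich", "Oesterreich", "Austria" };

          String[] binary = names.clone();
          Arrays.sort(binary); // code-point order: 'Ö' sorts after 'R'
          System.out.println(Arrays.toString(binary));
          // [Austria, Oesterreich, Russia, Österreich]

          String[] german = names.clone();
          Arrays.sort(german, Collator.getInstance(Locale.GERMAN)); // linguistic order
          System.out.println(Arrays.toString(german));
          // [Austria, Oesterreich, Österreich, Russia]
      }
  }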

As a consequence, when I write an adapter I tend to push down rather less than more, and if one kind of pushdown is really required - okay, then my adapter is not 100% consistent. Better inconsistent than useless for performance reasons.

Adapter Capabilities

On the adapter level the global capabilities are set. When creating or refreshing an adapter within Hana, the Data Provisioning Server of Hana calls the getCapabilities() method, which has to return a list of capabilities. These capabilities are then stored in the Hana data dictionary and control the various aspects of the adapter.
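
As a minimal sketch, an implementation could look like the following - the method signature follows the SDK's Adapter base class, while the chosen capability set is purely an example:

  import java.util.ArrayList;
  import java.util.List;
  import com.sap.hana.dp.adapter.sdk.AdapterCapability;

  // Inside a class extending com.sap.hana.dp.adapter.sdk.Adapter:
  @Override
  public List<AdapterCapability> getCapabilities(String version) {
      List<AdapterCapability> caps = new ArrayList<>();
      caps.add(AdapterCapability.CAP_SELECT);               // supports select queries
      caps.add(AdapterCapability.CAP_WHERE);                // allow where-clause pushdown
      caps.add(AdapterCapability.CAP_SIMPLE_EXPR_IN_WHERE); // e.g. column = value
      return caps;
  }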

The majority of the capabilities deal with the kind of statements the adapter supports, e.g. AdapterCapability.CAP_SELECT tells Hana that this adapter supports select queries; CAP_INSERT would mean the adapter supports loading as well.

Other capabilities deal with the Hana functions the adapter is able to understand, e.g. CAP_BI_UPPER allows Hana to push even an upper() function into the SQL sent to the adapter.

But not everything is related to pushdown, e.g. CAP_METADATA_ATTRIBUTE tells Hana this adapter wishes to store additional table properties.

As said, all these Capabilities are global for this adapter.

In case global capabilities alone are not sufficient, there are the adapter capabilities CAP_TABLE_CAP and CAP_COLUMN_CAP. With these, the SQL optimizer of the federation layer takes the extra step of checking the table metadata and the capabilities set at table level, or even down to every single column.

Of course it would be optimal to control every single one of these settings at table and column level. The downside of such total flexibility would be optimizer performance - the amount of logic the optimizer has to go through to create the execution plan. Therefore the majority of the capabilities can be controlled on adapter level only, with important exceptions on table/column level.

Where-clause related Capabilities

The most important capabilities revolve around the pushdown of where clauses. If the amount of data is small, the absolute time difference between reading all rows and filtering in Hana versus returning just the matching rows is not that big. But in the majority of cases the data should be filtered as early as possible, meaning the adapter should receive the information about the filters.

To enable that, CAP_WHERE has to be set. Without it, no where clause will ever be pushed down into the adapter. The next question is the type of where clause the adapter supports. Examples with increasing complexity:

where clause | additional capabilities to be set | Comment
where value = 1 | CAP_WHERE, CAP_SIMPLE_EXPR_IN_WHERE |
where value > 2 | CAP_WHERE, CAP_SIMPLE_EXPR_IN_WHERE, CAP_NONEQUAL_COMPARISON | includes: >, >=, <, <=, <>
where value between 1 and 3 | CAP_WHERE, CAP_BETWEEN | Note: the adapter will receive "value >= 1 and value <= 3" even if CAP_AND is turned off
where value in (1,2,3) | CAP_WHERE, CAP_SIMPLE_EXPR_IN_WHERE, CAP_IN | Note: the adapter will receive "(value = 1 or value = 2 or value = 3)"; includes: not in
where text like '%abc%' | CAP_WHERE, CAP_SIMPLE_EXPR_IN_WHERE, CAP_LIKE | includes: not like
where value = 1 and text = 'abc' | CAP_WHERE, CAP_SIMPLE_EXPR_IN_WHERE, CAP_AND_DIFFERENT_COLUMNS |
where (value = 1 or value = 2) and text = 'abc' | CAP_WHERE, CAP_SIMPLE_EXPR_IN_WHERE, CAP_AND_DIFFERENT_COLUMNS, CAP_OR |
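
Continuing the getCapabilities() sketch from above, supporting the most complex example in this table would mean adding at least the following (illustrative fragment only):

  caps.add(AdapterCapability.CAP_WHERE);
  caps.add(AdapterCapability.CAP_SIMPLE_EXPR_IN_WHERE);
  caps.add(AdapterCapability.CAP_AND_DIFFERENT_COLUMNS);
  caps.add(AdapterCapability.CAP_OR);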

Table and Column level capabilities

On table level, the capabilities available to control the pushdown are:

  • CAP_TABLE_SELECT, CAP_TABLE_INSERT, CAP_TABLE_UPDATE, CAP_TABLE_DELETE, CAP_TABLE_UPSERT
  • CAP_TABLE_AND, CAP_TABLE_OR, CAP_TABLE_AND_DIFFERENT_COLUMNS, CAP_TABLE_OR_DIFFERENT_COLUMNS

So the idea is that some tables support reading and loading, others just selecting data. And the combinations of AND/OR one table supports can differ from those of another.

Remember to set AdapterCapability.CAP_TABLE_CAP in the adapter, else these settings are not even read. The table-level capabilities are part of the table metadata, so the TableMetadata object the method importMetadata() returns should contain the table capability list, as sketched below.
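
A minimal sketch of what this could look like inside importMetadata() - note that the setter and constructor names here are assumptions, so check the Adapter SDK javadoc for the exact API:

  // Hypothetical fragment inside importMetadata(); "TWEETS" is an example table name.
  TableMetadata table = new TableMetadata();
  table.setName("TWEETS");                           // assumption: name setter
  List<TableCapability> tableCaps = new ArrayList<>();
  tableCaps.add(TableCapability.CAP_TABLE_SELECT);   // read-only: select, but no insert/update
  table.setCapabilities(tableCaps);                  // assumption: capability-list setter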

If there is the requirement to have different capabilities per column, the AdapterCapability.CAP_COLUMN_CAP is set on adapter level and/or the table-level capability TableCapability.CAP_TABLE_COLUMN_CAP. The list of capabilities that can be added to each column of a table includes:

  • CAP_COLUMN_FILTER
  • CAP_COLUMN_LIKE, CAP_COLUMN_IN, CAP_COLUMN_NONEQUAL_COMPARISON, CAP_COLUMN_BETWEEN
  • CAP_COLUMN_GROUP
  • CAP_COLUMN_INNER_JOIN, CAP_COLUMN_OUTER_JOIN
  • CAP_COLUMN_SORT
  • CAP_COLUMN_SELECT, CAP_COLUMN_UPDATE, CAP_COLUMN_UPSERT, CAP_COLUMN_INSERT

Here the idea is that one column can be selected but no filter applied to it, one column is read-only while another supports updates, plus some control over the complexity of the filters supported - see the sketch below.
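
As a sketch of the column-level variant - again, the class and setter names are assumptions to be verified against the SDK javadoc, while the capability names are the ones listed above:

  // Hypothetical fragment: one filterable column, one select-only column.
  Column id = new Column("ID", DataType.INTEGER);        // assumption: constructor signature
  id.setCapabilities(Arrays.asList(
          ColumnCapability.CAP_COLUMN_SELECT,
          ColumnCapability.CAP_COLUMN_FILTER,
          ColumnCapability.CAP_COLUMN_SORT));
  Column payload = new Column("PAYLOAD", DataType.NVARCHAR);
  payload.setCapabilities(Arrays.asList(
          ColumnCapability.CAP_COLUMN_SELECT));          // selectable, but no filter pushdown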

Gotchas

The assumption behind the capabilities is that they are static: set once by the adapter and then never changed again. This allows keeping the capability information in the Hana data dictionary tables, e.g. in the SYS.ADAPTERS_ table the column CAPABILITIES is a BIGINT containing a bitmask of all set adapter capabilities.
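
Conceptually such a bitmask works as follows - the bit positions in this illustration are made up, the ones Hana actually assigns are internal:

  // Each capability occupies one bit of a 64-bit mask (bit positions invented here).
  long CAP_SELECT_BIT = 1L << 0;
  long CAP_WHERE_BIT  = 1L << 5;
  long capabilities = CAP_SELECT_BIT | CAP_WHERE_BIT;           // two capabilities set
  boolean supportsWhere = (capabilities & CAP_WHERE_BIT) != 0;  // test a single bit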

As a consequence, whenever the capabilities do change, the alter-adapter command has to be executed to reread that information, e.g. using this syntax:

alter adapter "CapabilitiesTestAdapter" refresh at location agent "mylaptop";

But even that is not enough: the optimizer caches execution plans and the like in a few more places. And if table/column capabilities are used, the information is part of the Hana virtual table as well. Hence it is advisable to drop and recreate the virtual tables, too.

Testing

In order to simplify playing with the adapter capabilities, a CapabilitiesTestAdapter has been created and is available on GitHub:

hana-native-adapters/CapabilitiesTestAdapter

This adapter has two tables, the SQLTABLE table and the ADAPTERCAPS table. Using a statement like

update v_adaptercaps set isset = 'TRUE' where CAPABILITY = 'CAP_NONEQUAL_COMPARISON';

the individual adapter capabilities can be turned on and off.

Then the Hana dictionary is refreshed using commands similar to:

alter adapter "CapabilitiesTestAdapter" refresh at location agent "mylaptop";

drop table v_sqltable;

create virtual table v_sqltable at "captest"."<NULL>"."<NULL>"."SQLTABLE";

And finally, the pushdown can be tested by either executing a query or looking at the explain plan of each statement:

select * from v_sqltable where rownumber>=1;

Note: The capability settings are kept in static memory, meaning whenever the adapter is restarted, it starts with the default settings again. Also keep in mind that the ADAPTERCAPS table returns the capabilities set in the adapter at that moment - not the Hana data dictionary information, and not what the Hana optimizer is currently using.

Finally, a set of four capabilities cannot be turned off, because otherwise the update statement itself would no longer work:

  • AdapterCapability.CAP_SELECT
  • AdapterCapability.CAP_UPDATE
  • AdapterCapability.CAP_WHERE
  • AdapterCapability.CAP_SIMPLE_EXPR_IN_WHERE