cancel
Showing results for 
Search instead for 
Did you mean: 

How to make informed design choices - Where to find in-depth information

former_member182046
Contributor
0 Kudos

Hi Lars and other HANA experts,

Reading this thread http://scn.sap.com/thread/3298053  with a great answer by makes me raise a more general point.

Lars writes:

The statement is an SQL statement joining two table variables.

To do that, as this is SQL and not CE-function world - HANA needs to execute whatever creates the tables referenced by the table variables first and then join these materialized temporary result sets.

Frequently, when working with joins and filters, there are many different options to achieve functionally identical results. Assuming that we're breaking problems down in order to achieve at least some degree of readability, instead of one huge SQL join, we might use:

  • views - on top of views
  • views - joined with views
  • table variables - selecting from tables variables within a stored procedure
  • table variables - joined with tables variables within a stored procedure
  • similar combinations with CE functions (CE_JOIN, CE_PROJECTION, ...)

Even at the micro level, when joining tables and applying filter conditions, it is often functionally equivalent whether I filter via the ON clause in the join definition or the WHERE clause for the whole statement.

When cascading views, sometimes it is functionally equivalent (same result set) whether I apply a condition or an aggregation in the WHERE clause of the lower view in the usage hierarchy or in the higher view.

Ideally, if I knew everything about how the system parses and executes those queries, I'd strive for the perfect balance between

  • readability (transparency)
  • reusability
  • performance

(same as with most modularization problems - I could massage encapsulation and separation of concerns into the explanation, but you get my drift). 

In real life, I can judge the impacts of my design choices on readability and reusability, but know very little about the performance penalties that come with the one or the other choice. Are there any documents that contain more insights such as the one you shared above:

The statement is an SQL statement joining two table variables.

To do that, as this is SQL and not CE-function world - HANA needs to execute whatever creates the tables referenced by the table variables first and then join these materialized temporary result sets.

That's exactly the kind of information of which I need to stuff plenty into my brain until I can make informed design decisions that balance readability, reusability, and performance well. Any pointers where I can find more of that?

Thanks,

Thorsten

Accepted Solutions (1)

Accepted Solutions (1)

lbreddemann
Active Contributor
0 Kudos

Hi Thorsten!

Really a well written question that clearly shows that you've done your homework and that gets the reader from where ever he or she currently is to your point of view.

Its far too seldom that questions get posted here in that great form.

Thanks for that!

And I fully agree with your remarks on readability and maintainability of SQL statements.

In fact, these two aspects on their own led to the majority of problems in the projects I've been involved so far.Getting back to a SQL statement that maybe somebody else wrote months ago and doing something with it is very often a nightmare.

This can eat up hours and days of developer resources, just to figure out, what the old coding was supposed to do.

Anyhow, your question was about how one can know how a statement or a procedure is actually executed.

The answer to that is: use the plan visualization and check it for the statement in question.

The tool is described in the documentation and in my blog I give some hints on how to run this thing.

To be honest with you, even I as a SAP employee use this as the primary source of information on how specific semantic constructs are executed in HANA.

Besides that, it's rather straight forward to use.

Unfortunately, there is still a great deal of documentation lacking, especially the part that would allow you and me and everybody else to establish expectations on what HANA should do.

But for the time being, the plan visualization is the best we have.

Best regards and cheers,

Lars

former_member182046
Contributor
0 Kudos

Lars,

Thanks a lot, that is even better than the kind of paper I had hoped for. Working with the plan visualization feature is fantastic training for writing HANA queries:

The visualization shows where the critical issues lie (temporary tables, inappropriately large results sets, crooked graph) and which changes are actually effective.

Within a few minutes, I was able to optimize a join of five or six tables (using two views to break down the problem logically) as follows:

  • two out of two temporary tables were eliminated from the execution plan and
  • the cost went down to 11% of the original cost.

Measures that helped in my case:

  • moving filters from JOIN conditions to WHERE clauses, thus simplifying JOIN conditions and allowing optimizations in the column engine to kick in
  • correcting the origin tables of fields in the left side of JOIN conditions

Best,

Thorsten

Former Member
0 Kudos

Hi Thorsten,

I thought I would chime in with just a few more comments. As you've noticed and as I think has been the experience of most HANA consultants/content developers, often times optimization on HANA is trial and error, and the plan visualization provided by Lars is very helpful in this regard.

That being said, there are a few general recommendations that can be found in the HANA SQL guide, SQLScript guide, and in papers published by contributors to this forum.

I imagine most if not all of the following are familiar to you, but I'll post them for the sake of others reading this forum. If anyone has questions on them let me know and I can explain further. They touch on HANA views in addition to SQL / stored procs.

1) Use generated columns (on table) in place of calculated attributes/measures *when possible/appropriate* in HANA views. Reason: eases computation in the Calculation Engine by storing calculated results and may avoid data result transfer from Join/OLAP Engine to Calculation Engine. 

2) Avoid filtering on calculated attributes when possible. Preference is to filter on regular attributes. Reason: Entire view is materialized and then results are filtered, since HANA needs to know the values of all calculated fields before it can filter them out.

3) Leverage column engine SQL operators whenever possible over row engine operators (these are described in the HANA SQL guide). Explain plan / visual plan of a query will detail which operations run in which engine.

4) If you need measures from two fact tables, combine them using 'union with constant values' rather than a join.

5) Avoid cursors and imperative logic in general in stored procedures whenever possible.

6) Instead of using OR in a WHERE clause, SELECT with one filter condition, SELECT with the other filter condition and then combine the query results using UNION ALL (a row store operation but can still be faster).

7) Don't mix CE functions and SQL syntax in a stored procedure. Stick with one or the other so that CalcEngine can optimize in the best manner possible (with big preference given to CE functions).

😎 If different branches of data flow graph stem from the same source, create multiple instances of the same source. So, if you're trying to combine sales data from last year and this year, and you have two projection nodes coming from one analytic view in your calculation view, each with a filter on year - instead, include two analytic view nodes with respective projections (and then a union node, etc). This can also be done in a scripted view but sometimes harder to notice.

Now to be clear, these are all pretty much 'rules of thumb' which should be measured against other approaches - hence 'trial and error'. In particular I've seen the following on projects:

1) Instances where OR clause leads to faster processing than the UNION ALL approach described above.

2) I've always found that if SQL syntax is required in a stored proc / scripted CalcView, it's still faster to include CE functions wherever possible (despite potential optimization problems).

3) I've seen results that show better peformance in scripted calculation views than an equivalent analytic view. Again, this is the exception rather than the rule - encouraging developers to try multiple approaches.

So, my best advice is to try multiple approaches, leverage explain plans and performance tracing, and iteratively optimize your code once functionality is validated. As for where to find best practices, in-depth information - my experience has been that it comes from multiple sources including official HANA documentation and contributor publications.

I hope that helps a bit!

Cheers,

Jody

former_member182046
Contributor
0 Kudos

Jody,

Thanks a lot - awesome advice and just the kind of rules of thumb I appreciate. Because in real life, recommendations such as "don't mix CE functions and SQL" go only so far - frequently, some SQL is inevitable, which is when more differentiated recommendations and ideally background information are helpful.

Design choices are rarely matters of black-or-white decisions; they are about finding the right tradeoff by weighing a number of advantages and disadvantages, and trying to balance different aspects against each other.

However, here's a plea: Please take your forum post, and copy and paste it into a blog. It's a fantastic piece of information for people working with HANA that should be much more prominently placed and easier to find for folks looking for this type of document than is possible in a fleeting forum post.

Best,


Thorsten

Former Member

Answers (0)