Calculated Column Issue

vivekbhoj
Active Contributor
0 Kudos

Hi Everyone,

I have a table with millions of records.

I have created a simple Calculation View with one Projection node and some mandatory input parameters: Period, Year, Company Code, and Controlling Area.

The View has more than 150 measures and 20 attributes.

After entering all the parameters, the result comes back in about 15 seconds.

The Plan Viz looks like this:

Why is a self-join performed on the table?

Now when I create a calculated column in the Projection, the report takes around 1 minute 30 seconds to display the data.

The calculated column is pretty simple.

It just shows a description for the transaction type:

case("COPA_AWTYP",
    'BKPF', 'Accounting document',
    'COBK', 'CO Object: Document Header',
    'COPA', 'COPA',
    'KE28', 'Top Down Distribution',
    'MKPF', 'Material Document',
    'RMRP', 'HR Data of Repetitive Manufacturing Backflush',
    'VBRK', 'Billing Document',
    '')

After looking at the Plan Visualization, I found that for a particular month there are around 34 million records, but later, after the calculations and grouping, around 900 million records are being created.

What could be the possible reason for this?

Because of this, CPU and memory utilization is very high.

Regards,

Vivek

Accepted Solutions (0)

Answers (4)

vivekbhoj
Active Contributor
0 Kudos

Hi,

Earlier, I had created the calculated column in the Aggregation node only, but because it was taking around 2 minutes, I later created it in the Projection instead.

If I create calculated columns that perform mathematical operations, the query runs faster and shows the output in seconds.

The problem seems to be with calculated columns containing strings.

Regards,

Vivek

former_member182302
Active Contributor
0 Kudos

Hi Vivek,

Can you share the screenshot of that VizPlan as well, with the calculated column in the Aggregation node?

Regards,

Krishna Tangudu

vivekbhoj
Active Contributor
0 Kudos

Hi Krishna,

Below is the plan.

This plan includes three calculated columns: the one mentioned above, plus one calculated column for naming regions.

My main concern is that the data is multiplying almost 300 times, from 3 crore (30 million) records to 900 crore (9 billion) records.

Why is this happening?


Regards,

Vivek

lbreddemann
Active Contributor
0 Kudos

Hi Vivek

The self-joins in BWPopJoin13 are actually not the kind of self-join you might be thinking of.

This kind of "join" actually puts the different selected columns of a column store table back together correctly.
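As a rough illustration (plain Python, not HANA internals): each column of a column store table is kept as its own array, and reassembling full result rows means matching the per-column arrays on their implicit row position, which a plan can surface as a "self-join" on the table. The column and value names below are made up.

```python
# Toy column store: each column is stored as a separate array.
columns = {
    "COPA_AWTYP": ["BKPF", "COBK", "BKPF"],
    "AMOUNT":     [100.0, 250.0, 75.0],
}

def reconstruct_rows(cols):
    """Reassemble full rows by zipping the per-column arrays
    positionally -- the implicit 'join' on the row position."""
    names = list(cols)
    return [dict(zip(names, values))
            for values in zip(*(cols[n] for n in names))]

rows = reconstruct_rows(columns)
# rows[0] == {"COPA_AWTYP": "BKPF", "AMOUNT": 100.0}
```

Each selected column adds one more array to stitch in, which is why a plan over many attributes can show many such "joins".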

- Lars

vivekbhoj
Active Contributor
0 Kudos

Hi Lars,


Thanks for your reply and for correcting me.


Can you please explain the behavior of calculated columns in HANA?

Does HANA handle calculated columns containing string operations differently from calculated columns containing numeric operations?

From my experience, I have seen that calculated columns involving string operations consume a lot of memory and take a lot of time.

Regards,

Vivek

lbreddemann
Active Contributor
0 Kudos

Hi Vivek,

hmm... that's a broad question.


One thing to keep in mind is that numeric operations are almost always way cheaper to perform than string operations. That's true for any system, not just SAP HANA.

With numeric values you have a lot more options to postpone the operation, e.g. to move it after an aggregation or filter so that you can work on less data.

Also, numeric values often take a lot less memory space to store.

String values, on the other hand, cannot be aggregated well and tend to take a lot of memory. You also have to allocate the maximum possible memory space up front, even though the actual strings might be a lot shorter on average.

So that is what comes to my mind ad hoc.
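The point about postponing numeric operations can be sketched in plain Python (made-up numbers; the conversion factor is hypothetical): a distributive calculation applied after aggregation touches one value per group instead of one value per input row, yet yields the same result.

```python
values = [10.0, 20.0, 30.0, 40.0]
FACTOR = 1.19  # hypothetical per-row numeric calculation (e.g. a conversion)

# Per-row calculation, then aggregate: len(values) multiplications.
per_row = sum(v * FACTOR for v in values)

# Aggregate first, then calculate once: a single multiplication.
post_agg = sum(values) * FACTOR

# Same result, far less work on large data sets.
assert abs(per_row - post_agg) < 1e-9
```

A string-valued calculated column offers no such rewrite: its result cannot be summed, so the engine has to materialize it for every row it applies to.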

- Lars

vivekbhoj
Active Contributor
0 Kudos

Hi Lars,

I agree with you that string operations are costly and consume more memory.

But as seen in the first two visualization plans that I have shared here, the number of records increased hundreds of times when I added a string-based calculated column, which in turn increased both the memory consumption and the time taken to compute these records.

The number of records increased in the Grouping.

Can you please shed some light on this?

Regards,

Vivek

lbreddemann
Active Contributor
0 Kudos

Hi Vivek,

Can you share the PlanViz file?

Which revision is this running on?

- Lars

vivekbhoj
Active Contributor
0 Kudos

Hi Lars,

I will try to recreate this model once again and then share the plan visualization.

Regards,

Vivek

vivekbhoj
Active Contributor
0 Kudos

Hi Lars,

HANA is on Rev 73.

I tried to attach the PlanViz files, but it seems that file type is not supported, so I have changed the extension to .txt and shared them.

I have shared two Visualization Plans.

The first plan, "Access_Plan_Without_Calculated_Column", is based on a calculation view built directly on the table without any calculated columns.

The second plan, "Access_Plan_With_Calculated_Column", is based on the same calculation view, but with a calculated column created as below:

case("COPA_AWTYP",
    'BKPF', 'Accounting document',
    'COBK', 'CO Object: Document Header',
    'COPA', 'COPA',
    'KE28', 'Top Down Distribution',
    'MKPF', 'Material Document',
    'RMRP', 'HR Data of Repetitive Manufacturing Backflush',
    'VBRK', 'Billing Document',
    '')

Please delete the planviz files from this SCN message once you download them.

Regards,

Vivek

Former Member
0 Kudos

Hello Vivek,

I think what Krishna suggested is absolutely right for improving the performance.

According to the modelling principles, performance can be improved if the calculations are done in the end node and all the aggregations are done in the initial nodes.
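That principle can be sketched in plain Python (toy data; the code-to-description mapping mirrors the case() expression from the question): applying the string mapping only after grouping evaluates it once per group instead of once per input row.

```python
from collections import Counter

# Mirrors part of the case() mapping from the question.
DESCRIPTIONS = {
    "BKPF": "Accounting document",
    "COBK": "CO Object: Document Header",
}

# Hypothetical input: 8 raw rows, but only 2 distinct transaction codes.
raw = ["BKPF"] * 5 + ["COBK"] * 3

# Aggregate first (initial node) ...
counts = Counter(raw)  # {"BKPF": 5, "COBK": 3}

# ... then apply the string calculation per group (end node):
# 2 lookups instead of 8 per-row evaluations.
report = {DESCRIPTIONS.get(code, ""): n for code, n in counts.items()}
```

On 34 million rows collapsing into a few hundred groups, the difference between per-row and per-group string materialization is substantial.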

Regards,

Krishna

former_member182302
Active Contributor
0 Kudos

Hi Vivek,

Create that calculated column in the Aggregation node at the end, instead of in the Projection node. That should definitely save some time.

Regards,

Krishna Tangudu

vivekbhoj
Active Contributor
0 Kudos

I understood the self-join part of the first query: I have 17 attributes in the view, and it has done 17 self-joins to get distinct records for all those fields.

But I am still unable to understand why it needs to find distinct values for those fields.

Regards,

Vivek

Former Member
0 Kudos

Hello Vivek,

As far as I know, a self-join can retrieve the distinct records and serve the purpose, something like this:

In the above case the 1st record and the 4th record are one and the same; in cases like these, a self-join can be used to eliminate the duplicate records.

eg:
select a.point_a, b.point_b, a.distance
from tab_name a, tab_name b
where a.point_a < b.point_b
  and a.distance = b.distance;
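The query can be tried out with SQLite (the tab_name table and the route data below are made up for illustration): the same route stored twice in reversed order comes back only once.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table tab_name (point_a text, point_b text, distance real)")
con.executemany(
    "insert into tab_name values (?, ?, ?)",
    [("A", "B", 10), ("B", "A", 10),   # same route stored twice, reversed
     ("B", "C", 5), ("C", "D", 8)],
)

rows = sorted(con.execute(
    """select a.point_a, b.point_b, a.distance
       from tab_name a, tab_name b
       where a.point_a < b.point_b
         and a.distance = b.distance"""
).fetchall())

# The reversed duplicate ("B", "A") has been eliminated:
# rows == [("A", "B", 10.0), ("B", "C", 5.0), ("C", "D", 8.0)]
```

Note the join relies on duplicate routes sharing the same distance; two genuinely different routes with equal distances would need extra conditions.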

PS: I am eagerly waiting for the full explanation of the time consumption discussed in this thread.

Regards,

Krishna