Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
lbreddemann
Active Contributor
0 Kudos
based on SAP HANA revision 74

>>> update 09.02.2016

>>> if you are trying to work on a similar problem, please make sure to also review the solutions posted in discussion Fetch the first record with 24 hr interval based on first record timestamp

>>> The solutions posted there are much more efficient and solve the problem better in multiple ways.

>>> update 09.02.2016

Ok, I pondered about this one a while and I did not find a satisfying solution.

So I thought: let's take this to the community and see what others can come up with.

But let's start at the beginning.

A few days ago a colleague send me an email asking for advice:

I inserted the colors and markers to make the problem a bit easier to understand.

Basically, the report should aggregate shopping transactions based on the time relation between each of them.

For every transaction of every customer there is, it needs to be checked if there had been a transaction happening within a time frame of 14 days before.

If there is at least one such transaction, these two now belong to the same group.

This of course accumulates like a rolling sum, so whenever a transaction still 'glues' to the group via the 14 days range rule, it needs to be included in the group.


In the final result however, the date of the very first and very last date of transaction in this group (or chain) of transactions need to be shown along the SUM of all AMOUNTs.


It took me quite some thinking, but after a while I came up with nested window functions - something like this:

At this point I was nearly sure to have nailed it.

Just bring down the START_DATE to the NULLed columns and GROUP BY CUST_ID, START_DATE, right?

Wrong. Unfortunately :sad:

Unfortunately there is no way to get the the first not null value from the preceding records.


Without that, there is no combination of columns available to perform the grouping as required.

We're toast!


So, the solution I found is this:

drop type customers_tt;

create type customers_tt as table

(cust_id integer, from_date date, to_date date, amount decimal(10,2));

create global temporary column table f14_custom_group ( cust_id integer

, group_id integer

, tx_date date

, amount decimal (10,2));


drop procedure forth_nightly_shopping;

create procedure forth_nightly_shopping ( OUT result customers_tt)

language sqlscript

as

begin

declare cur_group_start date := to_date('01-01-1950' , 'DD-MM-YYYY');

declare cur_group int := -1;

declare cur_group_cust_id int := -1;

declare cur_cust_id int := -1;

declare cur_tx_date date;

declare last_tx_date date := to_date('01-01-1950' , 'DD-MM-YYYY');

declare cur_amount decimal (10,2);

declare cursor c_pre_aggr for select cust_id, tx_date, sum(amount) as amount

                            from customers

                            group by cust_id, tx_date

                            order by cust_id, tx_date;

truncate table  f14_custom_group;

                            

for cur_row as c_pre_aggr do

cur_tx_date := cur_row.tx_date;

cur_cust_id := cur_row.cust_id;

cur_amount := cur_row.amount;

-- new group check: (new cust_id) or 

-- (old cust_id but tx_date is more than 14 days awys from last transaction date)

if cur_cust_id <> cur_group_cust_id then

-- new customer -> new group

        -- increase the group id counter

-- set the current transaction date to be the start date for the group

-- set the currrent customer id to the current customer id

            cur_group := cur_group + 1;

            cur_group_cust_id := cur_cust_id;

            cur_group_start := cur_tx_date;

elseif   cur_cust_id = cur_group_cust_id

             and cur_tx_date >= add_days (last_tx_date, 14) then

         -- old customer

         -- but new group   

            cur_group := cur_group + 1;

            cur_group_start := cur_tx_date;

end if;

             

insert into f14_custom_group(cust_id    , group_id , tx_date    , amount)

                                     values (cur_cust_id, cur_group, cur_tx_date, cur_amount);

last_tx_date := cur_tx_date;

end for;

result = select cust_id

               , min (tx_date) as from_date

               , max (tx_date) to_date

               , sum (amount) as amount

               from f14_custom_group

               group by group_id, cust_id

               order by cust_id, min(tx_date);

truncate table f14_custom_group;

end;

call forth_nightly_shopping ( ?);

It does the trick and the performance might be acceptable ( 3.2 Mio records grouped and aggregated in 23 secs).

However, the problems are obvious:

  • Further parallelism will be difficult to build into this (maybe doable with SP8)
  • the implementation is very inflexible
  • Any option to add selection criteria will need to be included into the procedure and the call
  • Massive copying of data required

Sooo... Does anybody have a better idea to tackle this one?

Cheers,

Lars

7 Comments