on 07-30-2015 12:55 AM
Hello Folks,
I need your help building logic for the requirement below.
We need to fetch, for each order number, the record posted after every 24 hours from the first record's timestamp.
Details about the requirement:
1) Take the first record for each order_number and its corresponding timestamp.
2) Take the first record posted for the same order number more than 24 hours later.
3) Take the next record posted more than 24 hours after the timestamp of the second record, and so on.
Below is the sample data set.
RECORD_NUMBER | CREATED_DATE |
567546 | 2015-07-28 10:45:24 |
567546 | 2015-07-28 14:13:45 |
567546 | 2015-07-28 22:53:19 |
567546 | 2015-07-29 08:34:39 |
567546 | 2015-07-29 17:13:26 |
567597 | 2015-07-26 11:34:29 |
567597 | 2015-07-27 10:45:32 |
567597 | 2015-07-28 13:12:43 |
567597 | 2015-07-29 18:39:51 |
The result should be like below.
RECORD_NUMBER | CREATED_DATE |
567546 | 2015-07-28 10:45:24 |
567546 | 2015-07-29 17:13:26 |
567597 | 2015-07-26 11:34:29 |
567597 | 2015-07-28 13:12:43 |
567597 | 2015-07-29 18:39:51 |
The records should be grouped based on record_number.
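For reference, the selection rule implied by the sample data and expected output can be sketched in Python (a minimal sketch, not HANA SQL; it keeps the first row per record_number, then every row at least 24 hours after the previously kept row):

```python
from datetime import datetime, timedelta
from itertools import groupby

# sample rows from the thread, as (record_number, created_date)
rows = [
    (567546, "2015-07-28 10:45:24"), (567546, "2015-07-28 14:13:45"),
    (567546, "2015-07-28 22:53:19"), (567546, "2015-07-29 08:34:39"),
    (567546, "2015-07-29 17:13:26"), (567597, "2015-07-26 11:34:29"),
    (567597, "2015-07-27 10:45:32"), (567597, "2015-07-28 13:12:43"),
    (567597, "2015-07-29 18:39:51"),
]

def pick_24h_records(rows):
    """Keep the first row per record_number, then every row that is
    at least 24 hours after the previously kept row."""
    parsed = sorted((r, datetime.strptime(d, "%Y-%m-%d %H:%M:%S"))
                    for r, d in rows)
    result = []
    for rec, group in groupby(parsed, key=lambda x: x[0]):
        last_kept = None
        for _, t in group:
            if last_kept is None or t - last_kept >= timedelta(hours=24):
                result.append((rec, t.strftime("%Y-%m-%d %H:%M:%S")))
                last_kept = t
    return result
```

Running this over the sample rows reproduces the expected result table above.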
Regards,
Venkat N.
Window functions are your friends. 🙂
SELECT RECORD_NUMBER, MIN(CREATE_DATE) CREATE_DATE
FROM (
    SELECT *, MIN(TO_DATE) OVER (PARTITION BY RECORD_NUMBER ORDER BY CREATE_DATE DESC) TO_DATE2
    FROM (
        SELECT *,
               CASE WHEN DIFF > 24 THEN CREATE_DATE
                    WHEN NEXT_DATE IS NULL THEN '2100-01-01'
                    ELSE NULL
               END TO_DATE
        FROM (
            SELECT *, SUM(SECONDS_BETWEEN(CREATE_DATE, NEXT_DATE)/3600)
                          OVER (PARTITION BY RECORD_NUMBER ORDER BY CREATE_DATE) DIFF
            FROM (
                SELECT RECORD_NUMBER, CREATE_DATE,
                       LEAD(CREATE_DATE) OVER (PARTITION BY RECORD_NUMBER ORDER BY CREATE_DATE) NEXT_DATE
                FROM "TEST".TEST_TABLE)
            GROUP BY RECORD_NUMBER, CREATE_DATE, NEXT_DATE)))
GROUP BY RECORD_NUMBER, TO_DATE2
ORDER BY RECORD_NUMBER, CREATE_DATE;
RECORD_NUMBER | CREATE_DATE |
567546 | 2015-07-28 10:45:24 |
567546 | 2015-07-29 17:13:26 |
567597 | 2015-07-26 11:34:29 |
567597 | 2015-07-28 13:12:43 |
567597 | 2015-07-29 18:39:51 |
Creating a running total of the time distance to the next event is a good idea.
I tried the same but couldn't get it to work (as I was trying to work the time backwards to the starting point.. ).
Your solution is (with proper formatting) also quite easy to understand and follow.
Very nice!
- Lars
I computed the cumulative total, then sorted the outermost window descending by create date, so that the MIN() aggregation on TO_DATE puts the desired target value on the target records. When I finally aggregate by MIN(CREATE_DATE) (sorting ascending again by then), the rows group correctly.
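The running-total building block of that query can be illustrated in Python (a sketch, not HANA SQL): the windowed SUM of SECONDS_BETWEEN over consecutive rows amounts to the hours elapsed since the group's first row.

```python
from datetime import datetime

# one record_number's timestamps from the sample data
dates = ["2015-07-26 11:34:29", "2015-07-27 10:45:32",
         "2015-07-28 13:12:43", "2015-07-29 18:39:51"]
ts = [datetime.strptime(d, "%Y-%m-%d %H:%M:%S") for d in dates]

# running total of hours since the group's first row
hours_since_first = [(t - ts[0]).total_seconds() / 3600 for t in ts]

# rows more than 24 hours past the first row
over_24 = [h > 24 for h in hours_since_first]
```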
Hi Colleagues,
I thought using WITH would be readable and easy to follow step by step, but Lars's latest words suggest I was wrong... 😞
No problem, there is always time to go back and do it better.
Converting the WITH version to plain SQL (which can also be modeled as a graph) produced a clear solution that, as far as I tested, works for many cases:
select
a.record_number,
min(a.create_date) as create_date
from
test_table as a inner join
(select
-- get first create_date per record_number to make the grouping of series
record_number,
min(create_date) as min_dt
from
test_table
group by
record_number) as b
on b.record_number = a.record_number
group by
a.record_number,
-- compute grouping key by interval from first date with desired interval (here 24 hours = 1 day = 86400 seconds)
round(seconds_between(a.create_date,b.min_dt)/(1*86400),0,ROUND_DOWN)
order by 1,2
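The grouping key in the GROUP BY above can be sketched as a small Python helper (hypothetical function name; the key is the floor of the elapsed seconds since the record's first timestamp divided by 86400):

```python
from datetime import datetime

def bucket_key(create_date, first_date, interval_seconds=86400):
    """Whole number of 24-hour intervals elapsed since the
    record_number's first create_date, rounded down."""
    elapsed = (create_date - first_date).total_seconds()
    return int(elapsed // interval_seconds)
```

All rows with the same key fall into the same 24-hour bucket anchored at the first timestamp, and MIN(create_date) per bucket picks the representative row.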
Regards, Fernando Da Rós
No no no no no!
I didn't mean to say (or imply) that you're wrong - your solution is very fast and produces, as far as I could see, the correct result.
The fault is on my side, since I simply struggle to understand _why_ your solution is correct.
But that's really on me and surely caused by my ridiculously short attention span, my lack of abstract thinking and my general fear of everything too technical!
@All: guys, this is one of the most interesting and entertaining threads in a long time here on SCN.
I'd hope there will be something like that in the future 1DX version of discussions, too.
Thanks for that and cheers!
- Lars
What about this solution?
select
record_number,
create_date
from ( select
tbl.record_number,
tbl.create_date,
cat.datemin,
to_integer((seconds_between(cat.datemin, tbl.create_date)/3600)/24) as days,
mod((seconds_between(cat.datemin, tbl.create_date)/3600), 24) as dif
from test_table tbl
inner join (select
distinct record_number,
min(CREATE_DATE) as datemin
from test_table
group by record_number) cat
on tbl.record_number = cat.record_number )
where dif = 0
or days >= 1
Hi José,
It needs some adjustment. It produces the same result as Henrique's SQL, but it also doesn't work on the new data. I'll reproduce the new data here:
The inserts:
INSERT INTO TEST_TABLE VALUES(3112,'2015-07-26 09:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-07-26 20:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-07-27 07:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-07-27 18:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-07-28 05:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-07-28 16:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-07-29 03:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-01 10:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-02 05:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-03 00:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-03 19:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-04 14:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-05 09:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-06 04:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-10 05:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-11 06:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-12 07:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-13 08:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-14 09:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-15 10:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-16 11:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-17 12:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-20 08:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-20 09:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-20 10:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-20 11:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-21 08:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-21 09:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-21 10:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-21 11:00:00');
The attached Excel sheet explains the midnight handling:
Current SQL statement:
select
a.record_number,
min(a.create_date) as create_date
from
test_table as a inner join
(select
-- get first create_date per record_number to make the grouping of series
record_number,
min(create_date) as min_dt
from
test_table
group by
record_number) as b
on b.record_number = a.record_number
group by
a.record_number,
-- compute grouping key by interval from first date with desired interval (here 24 hours = 1 day = 86400 seconds)
round(seconds_between(a.create_date,b.min_dt)/(1*86400),0,ROUND_DOWN)
order by 1,2
;
Here the expected output:
BTW: Where is the owner of this thread? Please jump in and provide feedback. The question is yours; we are just playing around with it, but your silence doesn't tell us whether this helps with your initial question.
Best regards, Fernando Da Rós
Hello Venkat,
Just an idea from my side. It may need data type casting, and with slight adjustments it may work:
DROP PROCEDURE <procedure>;
CREATE PROCEDURE <procedure> LANGUAGE SQLSCRIPT AS
BEGIN
  DECLARE v_var1 TIMESTAMP;
  DECLARE v_temp BIGINT;
  DECLARE CURSOR c_cursor FOR
    SELECT record_number, created_date FROM table1 ORDER BY record_number;
  FOR cur_row AS c_cursor DO
    -- 24 hours after the current row's timestamp
    v_var1 := ADD_SECONDS(cur_row.created_date, 24 * 3600);
    IF :v_temp IS NULL OR cur_row.record_number <> :v_temp THEN
      -- this should pick the first record for each order_number and its timestamp
      INSERT INTO SCHEMA.TABLE2 (RECORD_NUMBER, CREATED_DATE)
        SELECT record_number, created_date FROM table1
        WHERE record_number = cur_row.record_number
          AND created_date = cur_row.created_date;
      -- this should pick the same record number posted more than 24 hours later
      INSERT INTO SCHEMA.TABLE2 (RECORD_NUMBER, CREATED_DATE)
        SELECT record_number, created_date FROM table1
        WHERE record_number = cur_row.record_number
          AND created_date > :v_var1;
      v_temp := cur_row.record_number;
    END IF;
  END FOR;
END;
Thanks
Siva
Message was edited by: Sivakumar Palaniswamy
Please post the create table and insert statements so that we can work with your data.
- Lars
Hi Lars,
Thanks for the reply. Below are the CREATE and INSERT statements.
CREATE COLUMN TABLE <SCHEMA>.TEST_TABLE (RECORD_NUMBER BIGINT,CREATE_DATE SECONDDATE)
INSERT INTO <SCHEMA>.TEST_TABLE VALUES(567546,'2015-07-28 10:45:24')
INSERT INTO <SCHEMA>.TEST_TABLE VALUES(567546,'2015-07-28 14:13:45')
INSERT INTO <SCHEMA>.TEST_TABLE VALUES(567546,'2015-07-28 22:53:19')
INSERT INTO <SCHEMA>.TEST_TABLE VALUES(567546,'2015-07-29 08:34:39')
INSERT INTO <SCHEMA>.TEST_TABLE VALUES(567546,'2015-07-29 17:13:26')
INSERT INTO <SCHEMA>.TEST_TABLE VALUES(567597,'2015-07-26 11:34:29')
INSERT INTO <SCHEMA>.TEST_TABLE VALUES(567597,'2015-07-27 10:45:32')
INSERT INTO <SCHEMA>.TEST_TABLE VALUES(567597,'2015-07-28 13:12:43')
INSERT INTO <SCHEMA>.TEST_TABLE VALUES(567597,'2015-07-29 18:39:51')
Thanks for the DDL and INSERT statements.
I really liked this question and I spend some time tinkering with it.
Unfortunately I did not find an elegant solution for this, since SAP HANA currently doesn't support recursive queries.
Because the new starting point of any 24-hour interval depends on a calculated value in a row "before" the current one, and there can be any number of "before" rows, neither a join nor a sub-select works here.
The solution, unfortunately, seems to be a procedure:
create global temporary column table my_tmp (record_number bigint, create_date seconddate);
drop procedure max_by_24_hrs;
create procedure max_by_24_hrs (out max_events
TABLE (record_number bigint, create_date seconddate))
language SQLSCRIPT as
begin
declare cursor c_raw for select record_number, create_date
from test_table
order by record_number, create_date;
declare curr_grp_record bigint ;
declare curr_grp_date seconddate ;
declare curr_record bigint;
declare curr_date seconddate;
-- initialize the loop variables with sentinel values, so that the
-- first cursor row always starts a new group (initializing from
-- MIN(create_date) over the whole table could swallow the first row
-- when the overall minimum belongs to the first record_number)
truncate table my_tmp;
curr_grp_record := -1;
curr_grp_date := '1900-01-01';
for cr as c_raw DO
curr_record := cr.record_number;
curr_date := cr.create_date;
-- records group
if :curr_record > :curr_grp_record then
-- new group, new date
curr_grp_record = :curr_record;
curr_grp_date = :curr_date;
insert into my_tmp values (:curr_record, :curr_grp_date);
elseif (:curr_record = :curr_grp_record
and add_days(:curr_grp_date, 1) <= :curr_date) then
-- same record no, but date is at least a day higher than the last starting point
-- new date
curr_grp_date = :curr_date;
insert into my_tmp values (:curr_record, :curr_grp_date);
end if;
end for;
max_events = select * from my_tmp;
end;
call max_by_24_hrs(?);
RECORD_NUMBER | CREATE_DATE |
567546 | 2015-07-28 10:45:24.0 |
567546 | 2015-07-29 17:13:26.0 |
567597 | 2015-07-26 11:34:29.0 |
567597 | 2015-07-28 13:12:43.0 |
567597 | 2015-07-29 18:39:51.0 |
This version at least avoids dynamic SQL and loops over the data only once.
Would be interesting to see if anyone finds a more elegant or better performing option.
- Lars
Hi Colleagues,
Follow two other ways to do it.
I've used WITH just as it's easy but it can be ported to calc view easily.
This one is hardcoded to handle only two days; depending on the requirement, I might still choose it...
with
lt_first as ( select record_number,min(create_date) as dt
from test_table
group by record_number),
lt_1day as (select b.record_number,min(b.create_date) as dt
from lt_first as a inner join test_table as b
on b.record_number = a.record_number
and b.create_date > (add_days(a.dt,1))
group by b.record_number),
lt_2day as (select b.record_number,min(b.create_date) as dt
from lt_1day as a inner join test_table as b
on b.record_number = a.record_number
and b.create_date > (add_days(a.dt,1))
group by b.record_number)
select * from lt_first union all
select * from lt_1day union all
select * from lt_2day
order by 1,2;
... the other one handles an arbitrary number of steps, but has the potential to crash a server due to the cross join:
with
lt_24hrs_pair as ( select a.record_number,a.create_date as dt,b.create_date as dt_next
from test_table as a inner join test_table as b
on b.record_number = a.record_number
and b.create_date > add_days(a.create_date,1)),
lt_min_next as (select record_number,dt,min(dt_next) as dt_next
from lt_24hrs_pair
group by record_number,dt),
lt_min_min as (select record_number,min(dt) as dt,dt_next
from lt_min_next
group by record_number,dt_next),
lt_last as (select record_number,max(dt_next) as dt
from lt_24hrs_pair
group by record_number)
select record_number,dt from lt_min_min union
select record_number,dt from lt_last
order by 1,2;
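The pairwise self-join at the heart of this second variant can be sketched in Python (a toy version over a single record_number's dates; variable names are illustrative):

```python
from datetime import datetime, timedelta

# one record_number's timestamps from the sample data
dates = [datetime(2015, 7, 26, 11, 34, 29), datetime(2015, 7, 27, 10, 45, 32),
         datetime(2015, 7, 28, 13, 12, 43), datetime(2015, 7, 29, 18, 39, 51)]

# self-join: pair each row with every row more than 24 hours after it
# (this is the potentially explosive part for large groups)
pairs = [(a, b) for a in dates for b in dates
         if b > a + timedelta(days=1)]

# min(dt_next) per dt, mirroring the lt_min_next step
min_next = {}
for a, b in pairs:
    if a not in min_next or b < min_next[a]:
        min_next[a] = b
```

The quadratic `pairs` list is exactly why the cross join can hurt: every group of n rows generates up to n² candidate pairs before aggregation trims them.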
I've added more rows to test whether the result stays the same, and yes, it is still the same.
INSERT INTO TEST_TABLE VALUES(567546,'2015-07-28 10:45:59');
INSERT INTO TEST_TABLE VALUES(567546,'2015-07-28 14:13:59');
INSERT INTO TEST_TABLE VALUES(567546,'2015-07-28 22:53:59');
INSERT INTO TEST_TABLE VALUES(567546,'2015-07-29 08:34:59');
INSERT INTO TEST_TABLE VALUES(567597,'2015-07-26 11:34:59');
INSERT INTO TEST_TABLE VALUES(567597,'2015-07-27 10:45:59');
INSERT INTO TEST_TABLE VALUES(567597,'2015-07-28 13:12:59');
Regards, Fernando Da Rós
PS: I couldn't make the SCN editor show the SQL formatted for better reading 😞
Message was edited by: Fernando Ros
Hi Fernando
So glad someone else came up with something, too. Thanks for that!
This one kept me puzzling a while..
And yes, the (hopefully) much improved code display is something I am looking forward to on the new discussion platform, too!
Admittedly, I also tried to make do with the WITH clause but since the RECURSIVE feature is not there (and no other feature allowing recursion) I switched to the sledge-hammer approach that I posted.
Concerning your second approach, how does it allow for an arbitrary number of 24-hour steps?
To me a core difficulty was the fact that there could be any number of records within a 24hr period which would be the start of a new group.
How does this work in your solution?
BTW: the procedure version really is not that good for mass processing. I blew the record set up to 1.3 million records and it already took more than a minute. Without additional filters, we really need recursive/hierarchy support for such requirements.
- Lars
Shoot... I just remembered that we already discussed such a requirement some time ago...
Looking at it, I created the same solution again.
Back then I came up with the parent-child hierarchy approach (which essentially provides the missing recursion functionality in this scenario).
I find that quite frustrating, as that discussion was over a year ago, on rev. 74...
Let's keep fingers crossed for more recursion support in SPS after SPS 10!
- Lars
Hi Lars,
I love this kind of hands-on problem, and your request worked on me like a call to arms. Luckily I read it and could spend some time on it (not spare time, though). I still have a "background job" running on this one.
You are right, it doesn't work on a series of 24 hours. The failure shows up in a sample series spaced 23 hours apart:
26/Jul 09:00
27/Jul 08:00 <-- discarded
28/Jul 07:00
29/Jul 06:00 <-- discarded
30/Jul 05:00
31/Jul 04:00 <-- discarded
01/Aug 03:00
02/Aug 02:00 <-- discarded
03/Aug 01:00
04/Aug 00:00 <-- discarded
04/Aug 23:00
05/Aug 22:00 <-- discarded
The first 24 hours start on 26/Jul 09:00 and should end on 27/Jul 08:59; the second cycle starts two hours earlier, the third four hours earlier, and so on. 😞
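The drift described above can be reproduced with a small sketch (hypothetical variable names) comparing the sequential scan against fixed 24-hour buckets anchored at the first timestamp:

```python
from datetime import datetime, timedelta

# a series of rows exactly 23 hours apart, as in the sample above
start = datetime(2015, 7, 26, 9, 0, 0)
series = [start + i * timedelta(hours=23) for i in range(12)]

# sequential rule: keep a row only when it is >= 24h after the last kept row
kept, last = [], None
for t in series:
    if last is None or t - last >= timedelta(hours=24):
        kept.append(t)
        last = t

# fixed buckets anchored at the first timestamp: first row per whole
# 24h interval (the day-grouping interpretation)
buckets = {}
for t in series:
    key = int((t - start).total_seconds() // 86400)
    buckets.setdefault(key, t)
```

On this series the sequential rule keeps every other row (6 of 12), while the bucket rule keeps a row in almost every 24-hour slot, so the two interpretations genuinely disagree.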
I got confused with what I understood from:
1) Take the first record for each order_number and corresponding time stamp.
2) take the record posted for the same order number after 24 hours.
3) take the record posted after after 24 hours based on the time stamp from 2nd record.
Anyhow, the correctness depends on the requirement: if the 24-hour requirement demands harmonization among all records, then grouping by date is most interesting.
If the link must follow the start of the series, I guess it can be achieved by using the first occurrence as the parameter for grouping the next sets. I'll give it a try later and report back to the thread...
Best regards, Fernando Da Rós
Here's one way to approach it.
- Create DATE() and TIME() GENERATED ALWAYS columns
- Find the minimum timestamp and add it as a calculated column for all rows
- Subtract the time for this from the current time into another calculated column "calctime"
- Rank on the minimum "calctime" for each date
I think that could be built as a graphical calc view, but it might suck.
Wondering what Werner Steyn thinks to all this.
Hmm... not sure I get your points here.
What do we need the generated columns for?
The approach of looking at the running added up time difference between the current row and the groups minimum is straight forward.
Unfortunately it doesn't help with the core problem - that the local minimum time is clear only after the fact of sequentially adding up the time differences for every row.
I could be missing something here, but I'd really love to see this being build and working without a procedure.
- Lars
The generated columns allow us to separate the time and date so we can:
1) Subtract the first time from all the others, to know when the virtual "midnight" is
2) Group by date in the rank function so we create a window function
I'm pretty certain this would work, but I don't have time to test right now.
John
Hi Lars,
First take a look at the attached sheet with the target I designed to achieve (on the left side, the behavior of the second WITH I posted, and the plan to reach the third one). I've also attached the script to run the sheet's sample.
The SQL below is didactic, in 4 steps, trying to be readable (at least I hope so):
with
lt_starter_per_record as
(select *,seconds_between(to_date(to_char(min_dt,'yyyy-mm-dd')),min_dt) as sec_starter from
(select record_number,min(create_date) as min_dt
from test_table
group by record_number)),
lt_day_range as
(select record_number,create_date,min_dt,sec_starter,dt,
case when seconds_between(dt,create_date) >= sec_starter then 0 else 1 end as ind_day_range
from
(select a.record_number,a.create_date,b.sec_starter,b.min_dt,to_date(to_char(create_date,'yyyy-mm-dd')) as dt
from test_table as a,lt_starter_per_record as b
where b.record_number = a.record_number)),
lt_compute_range as
(select record_number,create_date,
add_seconds(add_days(dt,-ind_day_range),sec_starter) as dt_range_ini
from lt_day_range),
lt_pick_the_ones as
(select record_number,create_date from
(select record_number,min(create_date) as create_date,dt_range_ini
from lt_compute_range
group by record_number,dt_range_ini))
select * from lt_pick_the_ones order by 1,2
;
After it worked functionally, I also tested performance by adding 10 million rows over a one-year range with the script below:
select top 10000000
round(rand()*1000)+1000 as record_number,
add_seconds(add_days(to_date('2000-01-01'),rand()*365),round(rand()*86400,0)) as create_date
from objects,objects
into test_table;
The execution over all data, with no filter, took 2.5 seconds:
Statement 'with lt_starter_per_record as (select ...'
successfully executed in 2.641 seconds (server processing time: 2.407 seconds)
Fetched 1000 row(s) in 491 ms 283 µs (server processing time: 0 ms 734 µs)
This is what I said earlier, but reading your comments, the blog, and John's post, I understood there is also a need for ranges different from the one in this thread (24 hours).
I guess it could be done by changing the way the group is calculated in lt_day_range / lt_compute_range, using a formula like this:
- use an interval in seconds (for example 1209600 seconds = 14 days)
- calculate first_date_time as above
- compute each date's offset from first_date_time in seconds, divide it by the interval (1209600), and take the integer part
- use this integer for grouping
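The steps above can be sketched as a tiny helper (hypothetical name; the interval is configurable, with the 14-day example hardcoded as the default):

```python
from datetime import datetime

INTERVAL = 1209600  # 14 days in seconds, as in the example above

def group_key(create_date, first_date_time, interval=INTERVAL):
    """Integer part of (seconds since first timestamp) / interval,
    used as the grouping key."""
    return int((create_date - first_date_time).total_seconds() // interval)
```

With interval=86400 this reduces to the 24-hour case discussed in this thread.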
If it's not clear I can give a try on it later.
Best regards, Fernando Da Rós