on 06-11-2015 10:50 PM
Dear colleages,
CREATE OUTPUT WINDOW Aggr_window PRIMARY KEY DEDUCED AS
SELECT A.userId userId ,
A.url url,
A.cookieId,
count ( A.timestamp ) count
FROM Prev_stream A KEEP 24 hours
GROUP BY A.userId ,
A.url,
A.cookieId;
However, the value of userId is sometimes NULL, in this case, aggregation is not performed at all.
userId | url | cookieId | timestamp |
---|---|---|---|
NULL | http://targetPage | 111 | 20150501 |
NULL | http://targetPage | 111 | 20150502 |
customer1 | http://targetPage | 999 | 20150501 |
NULL | http://targetPage | 222 | 20150601 |
I expect to get:
userId | url | cookieId | count |
---|---|---|---|
NULL | http://targetPage | 111 | 2 |
customer1 | http://targetPage | 999 | 1 |
NULL | http://targetPage | 222 | 1 |
instead I got only
userId | url | cookieId | count |
---|---|---|---|
customer1 | http://targetPage | 999 | 1 |
How do I deal with this situation?
Thanks for your help,
Tao
so I did the following:
DECLARE
string unknownId := '$$UNKNOWN$$';
string setVar (string value)
{
if (isnull(value)) value := unknownId;
return value;
}
END;
...
CREATE OUTPUT STREAM newStream AS
SELECT
setVar(A.userId) userId,
A.url url,
A.cookieId,
A.timestamp timestamp
FROM Prev_stream A
CREATE OUTPUT WINDOW Aggr_window PRIMARY KEY DEDUCED AS
SELECT A.userId userId ,
A.url url,
A.cookieId,
count ( A.timestamp ) count
FROM newStream A KEEP 24 hours
GROUP BY A.userId ,
A.url,
A.cookieId;
I got now
userId | url | cookieId | count |
---|---|---|---|
$$UNKNOWN$$ | http://targetPage | 111 | 2 |
customer1 | http://targetPage | 999 | 1 |
$$UNKNOWN$$ | http://targetPage | 222 | 1 |
If you have better suggestions, please do let me know.
Thanks and regards,
Tao
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
This is more efficient as you are avoiding creating another stream and executing unnecessary code on every event.
CREATE OUTPUT WINDOW Aggr_window PRIMARY KEY DEDUCED AS
SELECT ifnull(A.userId, 'N/A') userId ,
A.url url,
A.cookieId,
count ( A.timestamp ) "count"
FROM newStream A KEEP 24 hours
GROUP BY ifnull(A.userId, 'N/A') ,
A.url,
A.cookieId;
Thanks
Pauli
CREATE SCHEMA rawEventSchema (
userId string,
productId string,
piwikId string,
timestamp long
);
CREATE INPUT stream rawEventStream
SCHEMA rawEventSchema
;
CREATE OUTPUT window aggr1
PRIMARY KEY DEDUCED AS
SELECT isnull(A.userId, 'NA', A.userId) userId,
A.productId productId,
A.piwikId piwikId,
count ( A.timestamp ) counter
FROM rawEventStream A KEEP 24 hours
GROUP by isnull(A.userId, 'NA', A.userId) userId,
A.productId,
A.piwikId
;
For me, it seems only the boolean result of isnull() function is outputed. Therefore, when userId has NULL value, the output userId is 0, otherwise, the output userId is 1.
Regards,
Tao
User | Count |
---|---|
87 | |
10 | |
10 | |
10 | |
7 | |
6 | |
6 | |
5 | |
5 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.