cancel
Showing results for 
Search instead for 
Did you mean: 

what to do if group by column contains NULL value

0 Kudos

Dear colleages,

CREATE OUTPUT WINDOW Aggr_window PRIMARY KEY DEDUCED AS

  SELECT  A.userId userId ,

                 A.url url,

                 A.cookieId,

                 count ( A.timestamp ) count

  FROM Prev_stream A KEEP 24 hours

  GROUP BY A.userId ,

                     A.url,

                     A.cookieId;


However, the value of userId is sometimes NULL, in this case, aggregation is not performed at all.

userIdurlcookieIdtimestamp
NULLhttp://targetPage11120150501
NULLhttp://targetPage11120150502
customer1http://targetPage99920150501
NULLhttp://targetPage22220150601



I expect to get:

userIdurlcookieIdcount
NULLhttp://targetPage1112
customer1http://targetPage9991
NULLhttp://targetPage2221


instead I got only

userIdurlcookieIdcount
customer1http://targetPage9991


How do I deal with this situation?


Thanks for your help,

Tao

Accepted Solutions (1)

Accepted Solutions (1)

0 Kudos

I found a post from Jeff:

so I did the following:

DECLARE

  string unknownId := '$$UNKNOWN$$';

  string setVar (string value)

    {

      if (isnull(value)) value := unknownId;

      return value;

    }

 

END;

...

CREATE OUTPUT STREAM newStream AS

  SELECT

                 setVar(A.userId) userId,

                 A.url url,

                 A.cookieId,

             A.timestamp timestamp

FROM Prev_stream A

CREATE OUTPUT WINDOW Aggr_window PRIMARY KEY DEDUCED AS

  SELECT  A.userId userId ,

                 A.url url,

                 A.cookieId,

                 count ( A.timestamp ) count

  FROM newStream A KEEP 24 hours

  GROUP BY A.userId ,

                  A.url,

                  A.cookieId;


I got now

userIdurlcookieIdcount
$$UNKNOWN$$http://targetPage1112
customer1http://targetPage9991
$$UNKNOWN$$http://targetPage2221



If you have better suggestions, please do let me know.


Thanks and regards,

Tao

Former Member
0 Kudos

This is more efficient as you are avoiding creating another stream and executing unnecessary code on every event.

CREATE OUTPUT WINDOW Aggr_window PRIMARY KEY DEDUCED AS

  SELECT  ifnull(A.userId, 'N/A') userId ,

                 A.url url,

                 A.cookieId,

                 count ( A.timestamp ) "count"

  FROM newStream A KEEP 24 hours

  GROUP BY ifnull(A.userId, 'N/A') ,

                  A.url,

                  A.cookieId;

Thanks

Pauli

0 Kudos

Hello Pauli,

thanks a lot. I tried your solution, however, the NULL userId in the result list have always '1' as value, not the expected dummy value. Do you know why?

Thanks and regards,

Tao


RobertWaywell
Product and Topic Expert
Product and Topic Expert
0 Kudos

Tao,

Can you try using 'NA' instead of 'N/A'. I'm curious to see if that will change the output you are getting.

Thanks

0 Kudos

Hello Robert,

when I replace 'N/A' with other string such as 'NA' or '$$UNKNOWN$$', the NULL userId in the result list have always '1' as value, not the expected dummy value.

Regards,

Tao

Former Member
0 Kudos

Can you post your CCL?

0 Kudos

CREATE SCHEMA rawEventSchema (

     userId string,

     productId string,

     piwikId string,

     timestamp long

);

CREATE INPUT stream rawEventStream

SCHEMA rawEventSchema

;

CREATE OUTPUT window aggr1

PRIMARY KEY DEDUCED AS

SELECT isnull(A.userId, 'NA', A.userId) userId,

               A.productId productId,

               A.piwikId piwikId,

               count ( A.timestamp ) counter

FROM rawEventStream A KEEP 24 hours

GROUP by isnull(A.userId, 'NA', A.userId) userId,

                    A.productId,

                    A.piwikId

;

For me, it seems only the boolean result of isnull() function is outputed. Therefore, when userId has NULL value, the output userId is 0, otherwise, the output userId is 1.

Regards,

Tao

JWootton
Advisor
Advisor
0 Kudos

You're using isnull() which accepts a single argument and returns 0 or 1.  Use ifnull() which is same as firstnonnull() which will return the string if the first argument is null

0 Kudos

Thanks a lot. I mistook isnull as ifnull.

interestingly, isnull() takes three arguments as well.

Best regards,
Tao

Former Member
0 Kudos

Although it is documented that the isnull function only takes one argument.  In realty it can take more than one and returns true (1) if any of the arguments is null.

0 Kudos

Got it. Thanks.

Answers (0)