cancel
Showing results for 
Search instead for 
Did you mean: 

SQL Union not returning expected results. Comparing data in two tables for equality

Former Member
0 Kudos

Hi All. We need to test some changes that we will be rolling out.  The changes are for performance and the data that is output should not change.  To verify this, the plan was to create a before snapshot table by selecting all records from the graphical view, roll out the changes, and then verify the records returned from the same graphical view didn't change. Note: The data in the base tables is static.

As a POC, I created a snapshot table using

CREATE TABLE schema.snapshottable AS (SELECT * FROM "PUBLIC"."GRAPHICAL_VW") WITH DATA;

and then compared it back to the graphical view using different techniques. 

Using Minus/Except

select * from GRAPHICAL_VW

minus (tried except as well)

select * from schema.snapshottable

Expected no rows returned

Received numerous rows

Using Union

select count(*) from

(

select * from GRAPHICAL_VW  --300k records

union

select * from schema.snapshottable  --300k records

)

Expected Since the table was filled with all records from the view, I expected the union to distinct all records and return a count equal to the table row count (300k)

Received  ~408k or so.  I manually checked some of the duplicate rows and they looked to match exactly, as expected, so I don't understand why they didn't reduce down to one row.

Other Attempts without success

     Created table A and table B one after another with the same create statement from the same view.  Similar Results

     Created row tables instead of column tables.

Question

Can anyone explain what I missing here?

Any suggestions on how to compare two data sets for equality, i.e. the before and after snapshot are identical?  Is there a table checksum type feature?

Thanks in advance for the help.

Accepted Solutions (0)

Answers (1)

Answers (1)

lbreddemann
Active Contributor
0 Kudos

We need to know more about your source table data.

Any NULL valued entries in there?

Provide something that reproduces, then we can come up with some ideas.

- Lars

Former Member
0 Kudos

Hi Lars.  Thanks for the reply.  Yes, there are null values in the source table data.

After further testing it appears to be a precision issue which is a little surprising since both tables are that are being unioned are created using the same select statement one after another changing only the table name.

Unfortunately, I'm not able to provide a full solution that reproduces.  The general steps I did were.

1. CREATE TABLE schema.snapshotbefore AS (SELECT * FROM "PUBLIC"."GRAPHICAL_VIEW") WITH DATA;

2. CREATE TABLE schema.snapshotafter AS (SELECT * FROM "PUBLIC"."GRAPHICAL_VIEW") WITH DATA;

3.

select * from schema.snapshotbefore

union

select * from schema.snapshotafter

I expected the result of this union to have the same rowcount as either of the tables after the union's distinct was applied since I expected.  This didn't occur.  To find the differences, I narrowed down the result for each select statement to the same row in each table using the same primary keys. When unioned those, I still received two records.  I then deleted the same columns from both select statements in the union until only one row returned.  Once that happened, I added the last one I deleted back in and received this.

All of the columns in that snapshot are defined as double with no precision.  All match except the last one due to the precision.  Is precision not guaranteed unless it is defined?

Any thoughts on a better way to compare two large (400k+) data sets for equality?

Former Member
0 Kudos

Found the answer to the question about double precision

Data Types - SAP HANA SQL and System Views Reference - SAP Library

For example, 0.1 cannot be represented exactly by combining these binary fractions, in this case you will get inexact results when using the DOUBLE or REAL type. This is not an issue with the SAP HANA database, this is the correct behavior for these data types. For a demonstration of this property of these data types in action please see the example below.

 SELECT TO_DOUBLE(0.1) + TO_DOUBLE(4.6) AS DOUBLE_SUM FROM DUMMY;  DOUBLE_SUM 4.6999999999 

Anyone have a better way to compare datasets like this besides listing each column and rounding all numeric?

TIA

lbreddemann
Active Contributor
0 Kudos

Again: provide a reproducible test case and there's a good chance I can help you with that.

Without that, chances are close to zero for me to look into it.

- Lars