Technology Blogs by Members
Explore a vibrant mix of technical expertise, industry insights, and tech buzz in member blogs covering SAP products, technology, and events. Get in the mix!
cancel
Showing results for 
Search instead for 
Did you mean: 
justin_molenaur2
Contributor

Requirement

It is a very common requirement to view various “slices” of data based on different time criteria on the same row in a report or analysis. “Show me current year to date vs. prior year to date sales figures for all sales orgs”. This can be accomplished in a number of ways, many of which have been explored in the following blog posts.

Implementation of WTD, MTD, YTD in HANA using Input Parameters only

Implementation of WTD, MTD, YTD Period Reporting in HANA using Calculated Columns in Projection

Implementation of WTD, MTD, YTD in HANA using Script based Calc View calling Graphical Calc view

Applying YTD in SAP HANA with SAP BO Analysis Office

All of these approaches have one thing in common – they all rely on generation of the time selection criteria at runtime. This is accomplished through either an input parameter/variable from a user or dynamically determining it at runtime of the view in the case of the scripted view.

While effective, this is not really reusable in the sense of a "conformed" DW dimension. Developers may be interpreting the definitions differently across many of the views that start springing up in the system. For example, with “Current Year To Date”, does this include all data leading up to the current day or does it only include data up to the last closed month? Or maybe the last closed week? Two developers may interpret the same semantics slightly differently and users (and developers) may become confused.


Process Approach Overview

Why don’t we predetermine the time slice definitions as attributes physically in the data so everyone can use them?!

I would draw a similarity with cooking popcorn in the microwave. Now, I could read the instructions and manually choose how long I should cook my popcorn OR I can press one button and have that time determined for me without having to provide anything. This is one of key points (for me at least) in BI application design – don’t make your users have to think too hard to get the answers they need. Now, I never said your popcorn won’t burn, and I’d like to think the HANA approach is a *little* more precise, but you get the idea of the ‘easy’ button I am trying to describe here.

In many of the previous DW/BI projects I have been involved in, the date dimensions play a large role in helping to solve this type of “time slice” requirement. Here, a given member of the dimension can have many different flags or indicators present, which when joined to the fact table, can act as filters on your data.

This would replace the need to determine the filters at run time since they are already available to you for use in joins.

The added benefits are:

  • These rules or definitions will be reused across all solutions, therefore imposing some level of data consistency in the DW and end users or content creators can get a consistent experience.
  • By making these rules accessible in a dimension, the options for solutions become more flexible as we can now model directly in the analytic view without the need for a Calculation view.
  • We never have to be concerned about ‘filter pushdown’ since we are relying on the joins in the Analytic views to work for us at the lowest level possible.
  • You can introduce as many different filters as you need to, YTD, MTD, QTD, WTD including fiscal definitions if required. You just need some logic and a new column on your target dimension.

I have to give credit to a colleague of mine Greg Cantwell who put together the production version that this demonstration is built off of.


Happy HANA!

Justin


HANA Development

This solution consists of the following components

  1. Date dimension table based a given granularity (day, month, fiscal period, etc) of your choice.  In my demo I will be using the monthly grain. HANA provides functions to create time data, for which you can generate and then either modify tables directly or move to another schema. See Appendix 1 for details.
  2. A fact table containing the grain chosen in step 1, monthly in this example.
  3. A SQL script (or other method) that can update the date dimension table with the “current state” per business definitions. This would execute at some predefined schedule to ensure that the dimension “moves up” as time changes.
  4. An attribute view that wraps the table from step 1.
  5. An Analytic view that wraps the table from step 1 and includes the attribute view in step 4. Here I will also illustrate how the “time slices” can be baked into the analytic view without using a Calculation view.
  6. A graphical Calculation view that consumes the analytic view.
  7. Performance testing in Appendix 2 for Calc View vs. Analytic View.

Date dimension table. See attached file for insert statements.

CREATE COLUMN TABLE "MOLJUS02"."MONTH_DIM"

    ("YEAR" NVARCHAR(4),

        "YEAR_MONTH" NVARCHAR(6),

        "QUARTER" NVARCHAR(2),

        "MONTH" NVARCHAR(2),

        "YTD" NVARCHAR(1),

        "CY" NVARCHAR(1),

        "PY" NVARCHAR(1),

        PRIMARY KEY ("YEAR",

        "MONTH")) UNLOAD PRIORITY 0 AUTO MERGE;

A fact table containing the grain chosen in step 1, monthly in this example.  See attached file for insert statements.

CREATE COLUMN TABLE "MOLJUS02"."FACT_TABLE"

    ("YEARMONTH" NVARCHAR(6),

        "MATERIAL" NVARCHAR(18),

        "CUSTOMER" NVARCHAR(10),

        "SALES" DECIMAL(15,2),

        "COST" DECIMAL(15,2),

        PRIMARY KEY ("YEARMONTH",

        "MATERIAL", "CUSTOMER")) UNLOAD PRIORITY 0 AUTO MERGE;

A SQL script (or other method) that can update the date dimension table with the “current state” per definitions. Here I will just show the SQL, you can choose how to implement.

UPDATE "MOLJUS02"."MONTH_DIM" SET

"YTD" = case when month(current_date) >= "MONTH"

       then 'Y' else 'N' end,

       "CY" = case when year(current_date) = "YEAR"

       then 'Y' else 'N' end,

       "PY" = case when year(current_date)-1 = "YEAR"

then 'Y' else 'N' end;

For each record in the date dimension, we are using three different attributes to help us determine the time slices.

Column “YTD” (Y/N) tells us whether the month falls within the months that have already passed in the year, INCLUDING the current month. So if we are in March (03), this will flag months 01, 02 and 03 across ALL years that exist in the dimension.

Column “CY” (Y/N) tells us whether the month falls within the current year.

Column “PY” (Y/N) tells us whether the month falls within the previous year.

An attribute view that wraps the table from step 1.

An Analytic view that wraps the table from step 1 and includes the attribute view in step 4. Here I will also illustrate how the “time slices” can be baked into the analytic view without using a Calculation view.


The join is on YEAR/MONTH (201301) and is a 1:N from the attribute view to the fact table.

Now that we have the date dimension hooked up to a fact table, we can now run some queries that help illustrate how these flags can be used. Now notice that we don’t have to pass in any filter criteria for selecting the times slices, that logic is already done for us.

All of the data as shown through data preview, notice how the three flags tell us the story.


This query will show us the first three months that have already passed in the year, against the same months in all years contained in the fact.

SELECT "YEAR", SUM("SALES")

FROM "_SYS_BIC"."sandbox.justin.Date_Dim/AN_TEST_FACT"

WHERE "YTD" = 'Y'


This query will show us all the sales by material for the current year to date

SELECT "MATERIAL", SUM("SALES") AS "CURRENT_YEAR_SALES"

FROM "_SYS_BIC"."sandbox.justin.Date_Dim/AN_TEST_FACT"

WHERE "YTD" = 'Y'

AND "CY" = 'Y'

GROUP BY "MATERIAL"


This query will show us all the sales by material for the current year to date against the previous year to date.

SELECT "MATERIAL", SUM("CURRENT_YEAR_SALES"), SUM("PREVIOUS_YEAR_SALES")

FROM (

SELECT "MATERIAL", SUM("SALES") AS "CURRENT_YEAR_SALES", 0 AS "PREVIOUS_YEAR_SALES"

FROM "_SYS_BIC"."sandbox.justin.Date_Dim/AN_TEST_FACT"

WHERE "YTD" = 'Y'

AND "CY" = 'Y'

GROUP BY "MATERIAL"

UNION

SELECT "MATERIAL", 0 AS "CURRENT_YEAR_SALES", SUM("SALES") AS "PREVIOUS_YEAR_SALES"

FROM "_SYS_BIC"."sandbox.justin.Date_Dim/AN_TEST_FACT"

WHERE "YTD" = 'Y'

AND "PY" = 'Y'

GROUP BY "MATERIAL")

GROUP BY "MATERIAL"



Model the time slices as restricted measures in the analytic view. Now that we have these flags available, we can use them to define restricted measures.



Data Preview on Analytic View


A graphical Calculation view that consumes the analytic view and performs a UNION.

Projections on Analytic views, including the filters on time slice attributes.








Now that we have the CY/PY measures, we can perform variance and percentage calculations as calculated columns.

Data Preview

Performance Considerations with above approach, Analytic View vs. Calculation View

I created a similar scenario as above, but on a fact table with much larger data. I saw approximately 10 million records per a given month. The goal was to measure the performance different between implementing the time slice in an Analytic View restricted measure vs. the Calculation View Unions.

Conclusion

Both methods result in similar runtimes. AV performs *slightly* better (6-7% faster) on total aggregation (no additional columns), and CV definitely performs better (from 15 - 30% faster) with the more detailed dataset. Thus, there is a tradeoff between performance and maintenance/reusability by moving to a Calculation view. More than anything, it proves both methods are acceptable from a performance perspective, but generally if you need more detailed resultset, Calc view offers you better performance.

Analytic View

--Aggregated

SELECT SUM("CURRENT_YEAR_SALES"), SUM("CURRENT_YEAR_COST"), SUM("PRIOR_YEAR_SALES"), SUM("PRIOR_YEAR_COST"), COUNT(*)

FROM "_SYS_BIC"."sandbox.justin.Date_Dim/AN_TEST_FACT_WDATA"


Average Run Time is 375ms

Statement 'SELECT SUM("CURRENT_YEAR_SALES"), SUM("CURRENT_YEAR_COST"), SUM("PRIOR_YEAR_SALES"), ...'

successfully executed in 374 ms 683 µs  (server processing time: 370 ms 997 µs)

Fetched 1 row(s) in 0 ms 5 µs (server processing time: 0 ms 0 µs)

Add more detail

--Add Customer

SELECT "KUNWE", SUM("CURRENT_YEAR_SALES"), SUM("CURRENT_YEAR_COST"), SUM("PRIOR_YEAR_SALES"), SUM("PRIOR_YEAR_COST"), COUNT(*)

FROM "_SYS_BIC"."sandbox.justin.Date_Dim/AN_TEST_FACT_WDATA"

GROUP BY "KUNWE"

Average Run Time is 1.375ms

Statement 'SELECT "KUNWE", SUM("CURRENT_YEAR_SALES"), SUM("CURRENT_YEAR_COST"), SUM("PRIOR_YEAR_SALES"), ...'

successfully executed in 1.371 seconds  (server processing time: 1.368 seconds)

Fetched 5000 row(s) in 363 ms 267 µs (server processing time: 7 ms 35 µs)

Result limited to 5000 row(s) due to value in Result Preferences

Add even more detail

--Add Customer/Material

SELECT "KUNWE", "MATNR", SUM("CURRENT_YEAR_SALES"), SUM("CURRENT_YEAR_COST"), SUM("PRIOR_YEAR_SALES"), SUM("PRIOR_YEAR_COST"), COUNT(*)

FROM "_SYS_BIC"."sandbox.justin.Date_Dim/AN_TEST_FACT_WDATA"

GROUP BY "KUNWE", "MATNR"

Average Run Time is 1.820 seconds

Statement 'SELECT "KUNWE", "MATNR", SUM("CURRENT_YEAR_SALES"), SUM("CURRENT_YEAR_COST"), ...'

successfully executed in 1.815 seconds  (server processing time: 1.811 seconds)

Fetched 5000 row(s) in 241 ms 143 µs (server processing time: 10 ms 11 µs)

Result limited to 5000 row(s) due to value in Result Preferences

Very important to note, is that by modeling this way, we stay entirely in the OLAP engine and the Calc engine is never invoked.

Calculation View

--Aggregated

SELECT SUM("CURRENT_YEAR_SALES"), SUM("CURRENT_YEAR_COST"), SUM("PRIOR_YEAR_SALES"), SUM("PRIOR_YEAR_COST"), COUNT(*)

FROM "_SYS_BIC"."sandbox.justin.Date_Dim/CV_TEST_FACT_WDATA"

Average Run Time is 410ms

Statement 'SELECT SUM("CURRENT_YEAR_SALES"), SUM("CURRENT_YEAR_COST"), SUM("PRIOR_YEAR_SALES"), ...'

successfully executed in 400 ms 713 µs  (server processing time: 396 ms 866 µs)

Fetched 1 row(s) in 0 ms 4 µs (server processing time: 0 ms 0 µs)

--Add Customer

SELECT "KUNWE", SUM("CURRENT_YEAR_SALES"), SUM("CURRENT_YEAR_COST"), SUM("PRIOR_YEAR_SALES"), SUM("PRIOR_YEAR_COST"), COUNT(*)

FROM "_SYS_BIC"."sandbox.justin.Date_Dim/CV_TEST_FACT_WDATA"

GROUP BY "KUNWE"

Average Run Time is 950ms

Statement 'SELECT "KUNWE", SUM("CURRENT_YEAR_SALES"), SUM("CURRENT_YEAR_COST"), SUM("PRIOR_YEAR_SALES"), ...'

successfully executed in 975 ms  (server processing time: 1.306 seconds)

Fetched 5000 row(s) in 266 ms 153 µs (server processing time: 6 ms 127 µs)

Result limited to 5000 row(s) due to value in Result Preferences

Add even more detail


--Add Customer/Material

SELECT "KUNWE", "MATNR", SUM("CURRENT_YEAR_SALES"), SUM("CURRENT_YEAR_COST"), SUM("PRIOR_YEAR_SALES"), SUM("PRIOR_YEAR_COST"), COUNT(*)

FROM "_SYS_BIC"."sandbox.justin.Date_Dim/CV_TEST_FACT_WDATA"

GROUP BY "KUNWE", "MATNR"

Average Run Time is 1.52 seconds

Statement 'SELECT "KUNWE", "MATNR", SUM("CURRENT_YEAR_SALES"), SUM("CURRENT_YEAR_COST"), ...'

successfully executed in 1.518 seconds  (server processing time: 1.514 seconds)

Fetched 5000 row(s) in 279 ms 965 µs (server processing time: 6 ms 128 µs)

Result limited to 5000 row(s) due to value in Result Preferences

Here, we can clearly see that the Calc engine is invoked, which is expected



Appendix 1 – generate time data in HANA

In HANA, we can natively generate the Gregorian time dimensions we need for the above type analysis. You can either modify the table directly in the _SYS_BI schema OR copy it out to an application schema with data and modify.

Generate Time Data - Modeler View, "Data" pane

Choose grain

Schema _SYS_BI now has the data in the relevant table

You can now copy the data to another schema and alter to add the attributes you wish

CREATE COLUMN TABLE "SCHEMA"."TABLE" LIKE "_SYS_BI"."M_TIME_DIMENSION_MONTH" WITH DATA;

ALTER TABLE....

12 Comments
Labels in this area