
Several questions on PA - Time series analysis

Former Member

Hello community,

as proposed by Tammy Powlas, I'm moving my questions from this blog to here:

http://scn.sap.com/community/predictive-analytics/blog/2015/11/23/my-first-steps-in-sap-pa--time-ser...

1) Why do some results show trend lines and others don't?

2) What does the horizon warning mean? I found some warnings with 2, 3 or 4 as maximum horizon.

3) How can the structure file be optimized? Is there a how-to or SCN document/blog available?

4) Is there a way to analyze more than only one company at a time?

I would like to load the whole DAX (German main index) and use the same ~20 key figures of all companies for finding the results per company.

Since all shares have the "same attention" (like when in DAX or SDAX or MDAX) I would like to use additional "trends" within the market for analysis.

Is there somehow a "learning effect" I can initiate in the tool by using different data with same variables?


5) Is there a way to use no 4) "more companies at once" and getting only the trend lines per company as result ... not the single predicted values?


6) How do I find out which of the 20 key figures I should keep or change for better results?


7) How does PA deal with mixed input of "total values" and "percentage key figures"?


8) How can I tell PA which relationships exist between key figures, e.g. those which are an outcome of a formula using the weekly share value.


9) I checked the logs and found statements like:

"The automatic variable selection process discarded all the extra-predictable variables when estimating the trend(<list-of-variables>)" or

"The trend model (Regression<list-of-variables>...has been discarded from the competition." What does this mean?

Are all my 20 key figures in the file neglected and the forecast is based only on the historic share values? What could be the reason?

Please use the numbers above to relate your answer(s) to the respective question(s)...

My feeling is that what is shown in this video http://scn.sap.com/docs/DOC-62239 can't be everything I can do with "time series analysis".

Or is what I found out in my blog really the complete functionality, as of today?

Thanks, Martin

Accepted Solutions (1)


marc_daniau
Advisor

Hi Martin,

The SAP Automated Analytics time series engine tries to find patterns in the data; when found, they appear as one, two or three components depending on the structure of your data. Those components are listed in the model overview as: trend, cycles, fluctuations. It may happen that no significant trend, cycle or fluctuation is found in the data. The trend component can be of different types (Linear, Polynomial, Lag1, ...); refer to the user guide for the complete list.

http://help.sap.com/businessobject/product_guides/pa23/en/pa23_ts_user_en.pdf

It is recommended to use extra predictors in order to improve the time series forecast. The sample CashFlows shipped with the product has 23 extra predictors. We expect those extra predictors to have values not only in the past but also for the periods ahead that are in the forecast horizon (21 days for CashFlows). We can see from the model overview that the extra predictor "MondayMonthInd" was helpful in catching a cycle in the data.

The log says: Chosen trend is (Polynom( Date))  Chosen periodicity is (PeriodicExtrasPred_MondayMonthInd)  Fluctuations are corresponding to a (AR(46)) model.

The input file is: ...\SAP Predictive Analytics\Desktop 2.3\Automated\Samples\KTS\CashFlows.txt

You can automatically build one time series model per store, country or, in your case, company, using a KxShell script. KxShell scripting for time series is covered on page 68 of the user guide mentioned above. Here is also an example for recommendation.

http://scn.sap.com/community/predictive-analytics/blog/2015/09/02/sap-predictive-analytics-23-recomm...

Marc

Former Member

Hi Marc,

thanks for your feedback. The User Guide seems helpful... didn't know it before, but will check it!

I understand that your answers relate to questions 1) and 4)?

(It would have been nice to reference the question numbers.)

KxShell script seems to be a bit "over-engineered" for data I could load "all at once" in one csv file?!

Regards, Martin

PPaolo
Advisor

Hello Martin,

You can find some answers to your questions below. The information provided by Marc is already very good and you might have found the answers yourself in the meantime.

1) Why do some results show trend lines and others don't?

The Time Series Forecasting module tests various types of trend (linear or polynomial functions based on the time and on the extra variables, among others). It might be that no good trend was found, or that the fluctuations signal alone explained the output better than any trend.
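To give a feeling for what "no good trend was found" can mean, here is a minimal Python sketch. It is not the procedure the Time Series Forecasting module uses internally; it only fits a straight line over time by ordinary least squares and reports how much of the variance that line explains. The function names and the sample series are made up for illustration.

```python
def fit_linear_trend(values):
    """Least-squares fit of value = intercept + slope * t, with t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return slope, intercept


def explained_share(values):
    """Share of the variance explained by the linear trend (R squared)."""
    slope, intercept = fit_linear_trend(values)
    mean_y = sum(values) / len(values)
    ss_tot = sum((y - mean_y) ** 2 for y in values)
    ss_res = sum((y - (intercept + slope * t)) ** 2 for t, y in enumerate(values))
    return 1 - ss_res / ss_tot if ss_tot else 0.0


# Made-up series: one with a clear trend, one that only oscillates.
trending = [1, 3, 5, 7]
oscillating = [1, -1, 1, -1]
print(explained_share(trending))
print(explained_share(oscillating))
```

A series like the second one leaves most of the variance unexplained by a line, which is the kind of situation where a trend component would not be worth keeping.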

2) What does the horizon warning mean? I found some warnings with 2, 3 or 4 as maximum horizon.

The horizon proposed before generating the model depends on the number of rows with extra variables you have. During model generation there might be other constraints, not visible beforehand, which indicate that the prediction won’t be accurate or even possible for all of the requested horizon. Also note that if you expect your extra-predictable variables to influence your forecasts, you need to know their values in the future as well. The Time Series Forecasting module automatically generates extra-predictable variables related to the date (what is the day, what is the month, what is the quarter, etc.) but you need to provide the business information if you have it. If you don’t have this information for the future, you might have a good model on past data but a bad one in the forecast.
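The point about needing extra-predictable variables for the future periods can be illustrated with a small Python sketch. It builds date-derived columns for the days after the last known observation, which is the easy case because calendar values are always known in advance. The column names, the start date and the daily frequency are assumptions for illustration; they are not the names the module generates.

```python
from datetime import date, timedelta


def calendar_features(day):
    """Date-derived predictors for one day (column names are illustrative)."""
    return {
        "date": day.isoformat(),
        "day_of_week": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
        "day_of_month": day.day,
        "month": day.month,
        "quarter": (day.month - 1) // 3 + 1,
    }


def future_rows(last_known, horizon):
    """One feature row per day for the `horizon` days after `last_known`."""
    return [
        calendar_features(last_known + timedelta(days=i))
        for i in range(1, horizon + 1)
    ]


# Assumed example: last observation on 2015-11-27, horizon of 4 days.
rows = future_rows(date(2015, 11, 27), 4)
for row in rows:
    print(row)
```

Business key figures such as revenue or margin cannot be derived this way; for those, the future rows stay empty unless you have planned or estimated values to fill in.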

3) How can the structure file be optimized? Is there a how-to or SCN document/blog available?

What do you mean by ‘structure file’? If you mean the input data, then you just have to make sure that you have as much business data as possible and that you correctly specify its nature and type (e.g. numeric vs. string, and ordinal vs. continuous vs. nominal) in the description page.

4) Is there a way to analyze more than only one company at a time?

I would like to load the whole DAX (German main index) and use the same ~20 key figures of all companies for finding the results per company.

Since all shares have the "same attention" (like when in DAX or SDAX or MDAX) I would like to use additional "trends" within the market for analysis.

Is there somehow a "learning effect" I can initiate in the tool by using different data with same variables?

As Marc pointed out, you can use the KxShell script language to automate the process, or you can manually build many models by adding filters on the dataset (and then automate their refresh with Model Manager).
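If all companies sit in one CSV file, the per-company filtering can also be prepared outside the tool. The following Python sketch only groups the rows by a company column so that each group could be saved and modeled separately; it is not a KxShell script and does not call SAP Predictive Analytics. The column names and sample values are made up.

```python
import csv
import io
from collections import defaultdict


def split_by_company(csv_text, key="company"):
    """Group the rows of one combined CSV by the value of the `key` column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    groups = defaultdict(list)
    for row in reader:
        groups[row[key]].append(row)
    return dict(groups)


# Made-up sample: two placeholder companies, weekly share values.
sample = """company,date,share_value
A,2015-11-20,10.0
B,2015-11-20,20.0
A,2015-11-27,11.0
B,2015-11-27,19.0
"""

per_company = split_by_company(sample)
for name, rows in per_company.items():
    print(name, len(rows))
```

Each group keeps the same columns, so the same variable description can be reused for every company.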


5) Is there a way to use no 4) "more companies at once" and getting only the trend lines per company as result ... not the single predicted values?

When you apply the model, you can choose the option to get the forecasts with their components and then use only the trend part.


6) How do I find out which of the 20 key figures I should keep or change for better results?

You can check the contribution by variables to see what is most influencing the regressive part. Also in the Statistical Reports, under ‘Cross Statistics with the Target’ you can check which variables and which categories mostly affect the target.


7) How does PA deal with mixed input of "total values" and "percentage key figures"?

Can you give an example?


8) How can I tell PA which relationships exist between key figures, e.g. those which are an outcome of a formula using the weekly share value.

PA might automatically detect such relationships and it might discard one variable if it is too correlated with another.


9) I checked the logs and found statements like:

"The automatic variable selection process discarded all the extra-predictable variables when estimating the trend(<list-of-variables>)" or

"The trend model (Regression<list-of-variables>...has been discarded from the competition." What does this mean?

Are all my 20 key figures in the file neglected and the forecast is based only on the historic share values? What could be the reason?

There might be no influence of those variables on the trend. Also make sure that you have ordinal or continuous extra predictable variables because nominal ones are not taken into account for cycles.


Hope that helps

PPaolo

Former Member

Thanks Pierpaolo Vezzosi, for all your detailed answers...

You are right! In the meantime, I started reading the User Guide "Classification, Regression, Segmentation and Clustering Scenarios" (pa23_class-clust_user_en.pdf)

... and it revealed the "holy grail" to me 😄

The change from "nominal" to "continuous" in all additional key figures improved the result tremendously!

My MAPE results are in the area of > 0,09 and the new forecast is quite good.
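For readers who want to check a MAPE value by hand, here is a minimal Python sketch of the common textbook definition (mean of the absolute percentage errors, expressed as a fraction). Whether the tool averages in exactly this way, for example per horizon step, is not confirmed here, and the numbers below are made up.

```python
def mape(actual, forecast):
    """Mean absolute percentage error as a fraction (0.1 means 10 %)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Made-up values: both forecasts are off by 10 % of the actual value.
error = mape([100.0, 200.0], [110.0, 180.0])
print(round(error, 4))
```

Lower values are better, so a MAPE of 0.09 would correspond to an average deviation of about 9 % under this definition.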

I think I will create a second blog during the next days, documenting my further results, etc.

Regarding no. 7: I have key figures which use EUR-values (in 1€ or 1 Mio. €, like Cashflow, Revenue, Profit, etc) and I have %- Key figures (like profitability, net or ebit margin, etc.)

I was wondering if the tool recognizes the difference, since I don't find "percentage" in the "storage"-dropdown. Only "number, string, integer, date, etc". So I'm not sure about the final result if my additional variables are mixed, but all defined as "number".

I think using only total values would be better than to mix?!

Thanks again,

Martin

PPaolo
Advisor

Hello Martin,

re: Regarding no. 7: I have key figures which use EUR-values (in 1€ or 1 Mio. €, like Cashflow, Revenue, Profit, etc) and I have %- Key figures (like profitability, net or ebit margin, etc.)

I don't think that this is a problem. The absolute values and the percentage values provide a different business meaning and hence, potentially, a different kind of information. If they are correctly defined I believe you can safely use both.  In case there are strong correlations between some of those variables, SAP Predictive Analytics should be able to detect them and discard one variable.

BTW, you could even try running a Regression on the same dataset to see whether specific variables have a strong importance and whether others are less useful or strongly correlated with something else.
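A quick way to look for strongly correlated key figures before loading the data is a plain Pearson correlation. The Python sketch below is a generic check, not the detection logic of SAP Predictive Analytics, and the two series are made-up placeholders for an absolute figure and a percentage figure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up placeholders: profit in million EUR and a margin in percent.
profit_mio_eur = [10.0, 12.0, 15.0, 11.0]
margin_percent = [5.0, 5.5, 6.5, 5.2]
print(round(pearson(profit_mio_eur, margin_percent), 3))
```

The coefficient does not depend on the unit or scale of either series, which is one reason why mixing EUR values and percentages is not a problem for this kind of check.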

If anybody else has a different opinion, please post here, it is an interesting subject.

Thanks and regards

PPaolo
