
Data Manager

Former Member

Hi,

I thought that Data Manager in PA is meant to enrich data to make data analysis easier, and that the data could be of any kind: for regression or classification, time series or not related to time. But the tutorial "Hands-On Tutorial SAP Predictive Analytics, Automated Mode: Data Manager" says that the data set must include a table with a date column and that the target variable must be binary. Does this mean that it is not possible to enrich other kinds of data with Data Manager?

Best regards,

Rasoul Karimi

Accepted Solutions (1)


achab
Product and Topic Expert

Hello Karimi,

Could you please tell me which tutorial and page contain the statement you quote?

Thanks & best regards

Antoine

Former Member

Hi,

On page 7, section "Configure Time-Stamped Population":

...One of the tables must contain a date column so that the time-stamp concept can be applied... Note that the variable must return only two values, i.e., “Yes” and “No” or 1 and 0 or any other value pair.

Tutorial: Hands-On Tutorial SAP Predictive Analytics, Automated Mode: Data Manager, November 2015.

Thanks,

Rasoul

achab
Product and Topic Expert

Hi Rasoul,

Thank you for raising this important question.

Let me try to clarify.

In the Data Manager methodology, we use three elements in sequence: the entity, the analytical record, and the time-stamped population. I will not detail them here; please refer to Andreas' tutorial.

The time-stamped population (TSP) can be seen as your "business question". This is where you define your target, usually outside the analytical record (AR), because you may want to generate many TSPs for one given AR, and you do not need to create the TSP and the AR at the same time.

The target in the TSP is not limited to nominal variables, contrary to what our help and Andreas' tutorial say. I can currently specify a continuous target and later use it to create, for instance, regression or clustering models.

As long as I want to take advantage of the reference-date concept (for instance, to later import it into Model Manager and manage the model over the long run), yes, a date column is needed.
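To make the reference-date idea more concrete, here is a minimal plain-Python sketch of point-in-time logic. It is not how Data Manager implements entities, analytical records, or TSPs; it only illustrates the concept under stated assumptions. It assumes a hypothetical list of (customer_id, date, amount) transactions, builds analytical-record-style features only from events before the reference date, and defines a continuous target (spend in the following horizon_days) as the TSP. All function and field names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical transaction rows: (customer_id, purchase_date, amount).
Transaction = tuple[str, date, float]


def analytical_record(transactions: list[Transaction], reference_date: date) -> dict[str, dict[str, float]]:
    """Per-entity features built only from events strictly before the reference date."""
    features: dict[str, dict[str, float]] = defaultdict(
        lambda: {"n_purchases": 0.0, "total_spent": 0.0})
    for customer_id, day, amount in transactions:
        if day < reference_date:
            features[customer_id]["n_purchases"] += 1
            features[customer_id]["total_spent"] += amount
    return dict(features)


def time_stamped_population(transactions: list[Transaction], entities: list[str],
                            reference_date: date, horizon_days: int = 30) -> dict[str, float]:
    """Continuous target per entity: spend in [reference_date, reference_date + horizon_days)."""
    end = reference_date + timedelta(days=horizon_days)
    target = {entity: 0.0 for entity in entities}
    for customer_id, day, amount in transactions:
        if customer_id in target and reference_date <= day < end:
            target[customer_id] += amount
    return target


def training_rows(transactions: list[Transaction], entities: list[str],
                  reference_date: date, horizon_days: int = 30) -> list[dict]:
    """Join features and target for one reference date (one snapshot)."""
    record = analytical_record(transactions, reference_date)
    population = time_stamped_population(transactions, entities, reference_date, horizon_days)
    no_history = {"n_purchases": 0.0, "total_spent": 0.0}  # entities with no past events
    return [
        {"customer_id": c, "reference_date": reference_date,
         **record.get(c, no_history), "target": t}
        for c, t in sorted(population.items())
    ]


txns = [
    ("A", date(2016, 1, 5), 20.0),
    ("A", date(2016, 2, 3), 15.0),
    ("B", date(2016, 1, 20), 40.0),
    ("B", date(2016, 3, 10), 5.0),
]
for row in training_rows(txns, ["A", "B"], date(2016, 2, 1)):
    print(row)
```

Calling training_rows with another reference date produces a new snapshot with the same feature definitions; swapping the target function would correspond to a different business question over the same analytical record.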

For time series, it is a little different, as date handling is built in: you do not need to create the different Data Manager objects first to take advantage of the notion of a reference date. Of course, you still need a date column in your data source to do time-series modeling.

I hope this helps, please tell me if anything is unclear.

Are you looking to achieve a specific scenario?

Thanks & regards

Antoine

achab
Product and Topic Expert

On the general Data Manager methodology, I find the diagram on page 7 of this guide useful:

http://help.sap.com/businessobject/product_guides/pa25/en/pa25_user_ADM_en.pdf

Former Member

Hi Antoine,

Thank you for your detailed answer. I am working on a time-series forecasting problem. The data is already in a database. First, I have to extract the data and put it in the right format. For that, I could use Data Manager. The other possibility is to use SAP HANA Studio and create views. I know HANA Studio, but I prefer Data Manager because it seems easier. Data Manager offers less functionality than HANA Studio, but I hope I can implement all the data pre-processing tasks for time-series forecasting in Data Manager.

There are a couple of tutorials for time-series forecasting in PA, and they are very useful, but their data comes from a text file rather than a database.

I conclude from your comments that Data Manager can be used for data pre-processing for binary classification, multi-class classification, regression, clustering, and time-series classification/forecasting, and that a date column is not necessarily required if I do not need it. Is this true?

Thanks,

Rasoul

achab
Product and Topic Expert

To answer your question: yes, in general a date column is not a mandatory prerequisite, but it is mandatory for time-series forecasting, for obvious reasons.

Please note that Data Manager makes it possible to automate the production of data sets, enrich data, and retrieve snapshots of data corresponding to specific time-stamped populations ("business questions").

However, especially in the initial phase, you can connect the Modeler directly to the database if you feel the content of your data set is already rich enough.

I would be curious to know what you mean by data pre-processing. The Automated Analytics Modeler automatically handles the usual initial data-processing tasks, such as dealing with outliers, missing values, and continuous/binary variables. As a user, you do not need to do any specific pre-treatment of this sort.

Hope this helps,

Best regards,

Antoine

Former Member

The database contains a typical transactional table for a retail store. To build a prediction model, however, I need a data set with the following columns:

Item_Id, Hour_of_day, day_of_week, week_of_month, month, holiday, price, profit

This data set aggregates item sales statistics by hour. Profit is the target column and equals sales × price. For example, a row tells how many units of an item were sold between 8:00 and 9:00 on a specific day of a given week and month, and how much money was earned. Based on that, I want to predict future values. In this way, I treat the problem as a standard regression problem. The other solution is to use time-series forecasting. In that case, profit is still the target variable, and its value is predicted at future points in time. Price and holiday are external variables. The data set would then have this format:

Timestamp, profit, price, holiday
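The two layouts described above can be sketched in plain Python, purely to illustrate the aggregation logic; this is neither Data Manager nor HANA functionality. The sketch assumes a hypothetical list of (item_id, timestamp, quantity, unit_price) transactions, a hypothetical holiday calendar, and one possible week_of_month convention (days 1 to 7 are week 1, and so on). The first function produces the regression layout; the second produces a regular hourly series for one item, where hours without sales get a profit of 0 and the last known price is carried forward.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

# Hypothetical transaction rows: (item_id, timestamp, quantity, unit_price).
Sale = tuple[str, datetime, int, float]


def truncate_to_hour(ts: datetime) -> datetime:
    return ts.replace(minute=0, second=0, microsecond=0)


def week_of_month(day: date) -> int:
    """Assumed convention: days 1-7 are week 1, days 8-14 are week 2, and so on."""
    return (day.day - 1) // 7 + 1


def hourly_regression_rows(sales: list[Sale], holidays: set[date]) -> list[dict]:
    """One row per (item, hour) that has sales, using the columns listed in the thread."""
    buckets = defaultdict(lambda: {"quantity": 0, "profit": 0.0})
    for item_id, ts, quantity, unit_price in sales:
        bucket = buckets[(item_id, truncate_to_hour(ts))]
        bucket["quantity"] += quantity
        bucket["profit"] += quantity * unit_price  # "profit" as defined in the thread: sales * price
    rows = []
    for (item_id, hour), b in sorted(buckets.items()):
        rows.append({
            "Item_Id": item_id,
            "Hour_of_day": hour.hour,
            "day_of_week": hour.isoweekday(),  # 1 = Monday ... 7 = Sunday
            "week_of_month": week_of_month(hour.date()),
            "month": hour.month,
            "holiday": hour.date() in holidays,
            "price": b["profit"] / b["quantity"] if b["quantity"] else 0.0,  # average unit price
            "profit": b["profit"],
        })
    return rows


def hourly_series(sales: list[Sale], item_id: str, start: datetime, end: datetime,
                  holidays: set[date]) -> list[dict]:
    """Regular hourly grid over [start, end) for one item; start is assumed to be on the hour."""
    profit: dict[datetime, float] = defaultdict(float)
    price_at: dict[datetime, float] = {}
    for sid, ts, quantity, unit_price in sorted(sales, key=lambda s: s[1]):
        if sid == item_id:
            hour = truncate_to_hour(ts)
            profit[hour] += quantity * unit_price
            price_at[hour] = unit_price  # last price seen within that hour
    rows, price, t = [], None, start
    while t < end:
        price = price_at.get(t, price)  # carry the last known price into hours without sales
        rows.append({"Timestamp": t, "profit": profit.get(t, 0.0),
                     "price": price, "holiday": t.date() in holidays})
        t += timedelta(hours=1)
    return rows


sales = [
    ("I1", datetime(2016, 3, 7, 8, 15), 2, 3.0),
    ("I1", datetime(2016, 3, 7, 8, 40), 1, 3.0),
    ("I1", datetime(2016, 3, 7, 10, 5), 4, 2.5),
]
holidays = {date(2016, 3, 7)}  # hypothetical holiday calendar
print(hourly_regression_rows(sales, holidays))
print(hourly_series(sales, "I1", datetime(2016, 3, 7, 8), datetime(2016, 3, 7, 12), holidays))
```

Filling hours without sales with 0 keeps the series equally spaced, which time-series methods generally expect; whether 0 and a carried-forward price are the right fill rules depends on the business case.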

In HANA Studio, I have to create views and calculated columns to generate this data, which is not so easy. Can I handle this more easily in Data Manager?

You mention three functionalities of Data Manager: 1) automating the production of data sets, 2) enriching data, and 3) retrieving snapshots of data. What I need is the first one: automating the production of data sets.

Thanks for your help,

Rasoul

achab
Product and Topic Expert

You should definitely give Data Manager a try ;-). Could you please mark the original question as "Answered" if you feel it has been answered?

Thanks & regards

Antoine
