In this blog, I have shared my experience gained while creating forecast procedure using following features of SAP HANA:
- APL (Automated Predictive Library) Forecast Function
- Application Function Modeler (AFM)
I haven't seen any convincing blog which promotes this feature of SAP HANA. Personally, I believe it is really good feature for people with beginner or no SQL scripting skills as AFM is graphical tool. However, the end result is a procedure which can be used later for different purpose. I created this forecast procedure based on the use case provided by Forefront Analytics. The use case instructed to use APL forecast function. The input dataset was provided by Forefront Analytics. I started my research on APL and my main focus was time series function i.e. "Forecast". I learnt about APL installation and different APL functions as I progressed. I discovered about AFM when I was watching SAP Technology videos as part of my APL research.
I have tried to document my work in following three sections:
Prerequisites
There are following prerequisites before we start developing:
- SAP HANA (SPS 09)
- SAP AFL (this should be installed as part of SAP HANA)
- SAP APL (you will need to install it separately)
- unixODBC 64 bit (No version was mentioned in guide so I used 2.3.2)
- SAP HANA Script Server should be enabled
You can refer to this blog if you need to install APL.
What is APL Forecast Function?
The forecast operation consists in building and training a time series model, and then applying the model in order to obtain the forecast data. Following table lists the input and output tables required for forecast function to work. Also, there is expected table structure within the description.
Direction | Type | Description |
IN
| FUNC_HEADER | Optional (The table must be provided but it can be empty.) The function header defines the operating parameters of an APL function call. For instance, a function header can be used to set the logging level or to provide an operation id. A function header is made of a collection of string pairs { name, value }. It's usually the first input parameter of a function. It can be empty.
Expected Table Structure:
Column | SQL Data Type | Description | KEY | (N)VARCHAR, (N)CLOB | The parameter key | VALUE | (N)VARCHAR, (N)CLOB | The parameter value |
Supported Parameter Names:
Parameter Name | Description | OID | Optional ●Supported values: Any string ●Default value: None. An operation ID. This optional ID can be provided by the APL caller to tag all the inserted rows in the the output tables. | ModelFormat | Optional ●Default value: bin. The requested output format for the model. This impacts the table structure of the model output table. | LogLevel | Optional ●Min value: 0 (Disabled) ●Max value: 10 ●Default: 8 This impacts the amount of progress messages and logging information produced by APL and the Automated Analytics engine. These messages can then be retrieved from the log operation table. |
|
IN | VARIABLE_DESCS | Optional The table must be provided but it can be empty. The variable descriptions for the input dataset, as expected by the Automated Analytics engine.
Expected Table Structure:
Column | SQL Data Type | Description | 1st Column | INTEGER | Variable rank | 2nd Column | (N)VARCHAR, (N)CLOB | Variable name | 3rd Column | (N)VARCHAR, (N)CLOB | Storage. Possible values: ●number (the variable contains only "computable" numbers (be careful a telephone number, or an account number should not be considered numbers) ●integer ●string (: the variable contains character strings) ●date (the variable contains dates) ●datetime (the variable contains date and time stamps) ●angle | 4th Column | (N)VARCHAR, (N)CLOB | Value type: the value type of the variable. Possible values: ●nominal (categorical variable which is the only possible value for a string) ●ordinal (discrete numeric variable where the relative order is important) ●continuous (a numeric variable from which mean, variance, etc. can be computed) ●textual (textual variable containing phrases, sentences or complete texts) | 5th Column | INTEGER | Key level | 6th Column | INTEGER | Order level | 7th Column | (N)VARCHAR, (N)CLOB | Missing string value: the string used in the data description file to represent missing values (for example, "999" or "#Empty" - without the quotes) | 8th Column | (N)VARCHAR, (N)CLOB | Group name. | 9th Column | (N)VARCHAR, (N)CLOB | Variable description: an additional description label for the variable. | OID(10th Column) |
| The operation ID, if set. Otherwise a new one is generated. This column is optional. |
|
IN | OPERATION_CONFIG | Mandatory The configuration of the training operation. For this function, you must declare the type of Automated Analytics model and you can declare the optional Cutting Strategy in the OPERATION_CONFIG table as follows:
Key | Supported Values /DescriptionAPL/ | DescriptionAPL/ TimePointColumnName (Mandatory) | Name of the column in the dataset that contains the time points of the time series. | APL/Horizon (Mandatory) | Number of forecasted time points | APL/LastTrainingTimePoint (Optional) | Value in the time point column which represents the last point in time for the training dataset | APL/CuttingStrategy (Optional) | The Cutting Strategy defines how a training set is cut under three subsets (estimation, validation and test sets) when needed. For time series: 'sequential with no test''sequential' (default value) |
Expected Table Structure:
Column | SQL Data Type | Description | KEY | (N)VARCHAR, (N)CLOB | The parameter alias name | VALUE | (N)VARCHAR, (N)CLOB | The parameter value |
|
IN | VARIABLE_ROLES | Optional The table must be provided but it can be empty. The roles of the variables for this training.
When training a model, the roles of the variables can be specified. These variable roles are provided as string pairs {variable name, variable role}. In data modeling, variables may have four roles. They may be: ●Target variables (also known as output variables): a target variable is the variable that you seek to explain, or for which you want to predict the values in an application dataset. It corresponds to your domain-specific business issue. In some businesses, target variables may also be known as variables to be explained or dependant variables. ●Explanatory variables (also known as input variables): an input variable describes your data and serves to explain a target variable. Explanatory variables may also be known as causal variables or independent variables. ●Weight variables: a weight variable allows one to assign a relative weight to each of the observations it describes, and actively orient the training process. ●Skipped variables: these variables are ignored during the training process
Expected Table Structure:
Column | SQL Data Type | Description | NAME | (N)VARCHAR, (N)CLOB | Variable name | ROLE | (N)VARCHAR, (N)CLOB | Variable role. Supported roles: ●input ●skip ●target ●weight |
|
IN | DATASET | Mandatory The name of your input (training) dataset. Datasets are used for training models and applying models. Datasets used for training or applying models can contain any number of columns (within HANA and AFL limits). The supported SQL datatypes for the dataset columns are all the datatypes supported by AFL: ●INTEGER ●BIGINT ●DOUBLE ●CLOB & NCLOB ●VARCHAR & NVARCHAR ●DATE, TIME, TIMESTAMP, SECONDDATE |
OUT | DATASET | The resulting forecast
|
OUT | LOG | The training log The operation log. When performing an APL operation, especially training or applying a model, the Automated Analytics engine produces status/warning/error messages. These messages are returned from an APL function through an output database table.
Expected table structure:
Column | SQL Data Type | Description | OID | (N)VARCHAR, (N)CLOB | Operation ID (OID) | TIMESTAMP | TIMESTAMP | Message timestamp | LEVEL | INTEGER | Message level | ORIGIN | (N)VARCHAR, (N)CLOB) | Message origin | MESSAGE | NCLOB | The actual message |
|
OUT | SUMMARY | The training summary
When training a model, debriefing information related to the training operation is produced. This is known as the training summary. This information is a set of indicators, provided as string pairs { KEY, VALUE }.
Expected table structure:
Column | SQL Data Type | Description | OID | (N)VARCHAR, (N)CLOB | Operation ID (OID) | KEY | (N)VARCHAR, (N)CLOB | Indicator name | VALUE | (N)VARCHAR, (N)CLOB | Indicator value |
|
OUT | INDICATORS | The variable statistics and indicators
When training, testing or querying a model, it's possible to retrieve variable indicators (i.e variable statistics). For each variable, a collection of indicators may be retrieved. These indicators are described using the following attributes: { variable name, indicator name, indicator value, indicator detail (when applicable) }. Indicators are returned from an APL function through an output database table. The output table contains estimator indicators for regression models, to help plotting the regression curve.
Expected table structure:
Column | SQL Data Type | Description | OID | (N)VARCHAR, (N)CLOB | Operation ID (OID) | TARGET | (N)VARCHAR, (N)CLOB | Name of the target, when the indicator is based on it, for example: predictive power of predictive confidence | VARIABLE | (N)VARCHAR, (N)CLOB | Variable name | KEY | (N)VARCHAR, (N)CLOB | Indicator name
| VALUE | (N)VARCHAR, (N)CLOB | Indicator value | DETAIL | (N)VARCHAR, (N)CLOB | Indicator detail |
|
What is Application Function Modeler?
The SAP HANA Application Function Modeler (AFM) is the default editor for flowgraphs. A flowgraph is a development object. It is stored in a project and has extension .hdbflowgraph. By default, the activation of a flowgraph generates a procedure in the catalog. A flowgraph models a data flow that can contain
- tables, views, and procedures from the catalog
- relational operators such as projection, filter, union, and join
- functions from Application Function Libraries (AFL) installed on your system
- attribute view and calculation view development objects
In addition the AFM provides support for some optional, additional components of the SAP HANA Platform such as
- the Business Function Library
- the Predictive Analysis Library
- R Scripts
- Data Provisioning operators
- the generation of task plans
We will be using Application function modeler to create flowgraph consuming APL forecast function. This will generate procedure in our defined schema. We will be following below approach to create forecast flowgraph.
- Create Catalog Objects such as schema, source tables, function tables, output tables
- Create Flowgraph
Assumptions:
- XS project has already beencreated. In this example we have used following XS project: APP1
Create Catalog Objects
Schema
- First of all, please create schema definition file named APP1.hdbschema using following method:
- Right click project name and select New > Other > SAP HANA > Database Development > Schema
- Select the folder where you want to save this file and give it a name “APP1”. Please make sure there is no template selected
- Once file is created then open it and enter following
schema_name = "APP1";
- Right click and activate it. This will create the schema
- Run following SQL statements to grant privileges to _SYS_REPO
grant alter on schema "APP1" to "_SYS_REPO";
Tables
Please create DDL source file (core data services) using following method.
- Right click project name and select New > Other > SAP HANA > Database Development > DDL source file
- Select the folder where you want to save this file and give it a name. Please make sure you have selected empty structure. Perform this three times and create three files func_data.hdbdd, output_data.hdbdd, source_data.hdbdd.
- Once files are created then open it and enter following:
func_data.hdbdd
namespace APP1.db;
@Schema: 'APP1'
context func_data {
@Catalog.tableType : #COLUMN
@nokey
entity FUNCTION_HEADER {
KEY : hana.VARCHAR(50);
VALUE : hana.VARCHAR(50);
};
@Catalog.tableType : #COLUMN
@nokey
entity OPERATION_CONFIG {
KEY : hana.VARCHAR(1000);
VALUE : hana.VARCHAR(50);
};
@Catalog.tableType : #COLUMN
@nokey
entity VARIABLES_ROLES {
NAME : hana.VARCHAR(50);
ROLE : hana.VARCHAR(10);
};
@Catalog.tableType : #COLUMN
@nokey
entity VARIABLES_DESC {
RANK : Integer;
NAME : hana.VARCHAR(50);
STORAGE : hana.VARCHAR(10);
VALUETYPE : hana.VARCHAR(10);
KEYLEVEL : Integer;
ORDERLEVEL : Integer;
MISSINGSTRING : hana.VARCHAR(50);
GROUPNAME : hana.VARCHAR(50);
DESCRIPTION : hana.VARCHAR(100);
};
};
source_data
namespace APP1.db;
@Schema: 'APP1'
context source_data {
@Catalog.tableType : #COLUMN
Entity DATA {
key Date: LocalDate;
Produced: Double;
};
};
output_data
namespace APP1.db;
@Schema: 'APP1'
context output_data {
@Catalog.tableType : #COLUMN
@nokey
entity FORECAST_OUTPUT {
Date : LocalDate;
Produced: Double;
kts_1: Double;
};
@Catalog.tableType : #COLUMN
@nokey
entity FORECAST_LOG {
OID: hana.VARCHAR(50);
TIMESTAMP: UTCTimestamp;
LEVEL: Integer;
ORIGIN: hana.VARCHAR(50);
MESSAGE: hana.CLOB;
};
@Catalog.tableType : #COLUMN
@nokey
entity FORECAST_SUMM {
OID: hana.VARCHAR(50);
KEY: hana.VARCHAR(100);
VALUE: hana.VARCHAR(100);
};
@Catalog.tableType : #COLUMN
@nokey
entity FORECAST_INDI {
OID: hana.VARCHAR(50);
VARIABLE: hana.VARCHAR(100);
TARGET: hana.VARCHAR(100);
KEY: hana.VARCHAR(100);
VALUE: hana.VARCHAR(100);
DETAIL: hana.CLOB;
};
};
- Save the files, right click on them and activate. This will create required tables in schema “APP1”. You will see following tables in schema
Creating Flowgraph
Assumption:
Source data table is populated with data
Please create FORECAST_FG.hdbflowgraph file using following method:
- Right click project name and select New > Other > SAP HANA > Database Development > flowgraph model
- Select the folder where you want to save this file and give it a name.
- Once files are created then open it and right click on empty space to go to properties of this flowgraph model. Please change the schema from _SYS_BIC to APP1
- Add forecast function (drag and drop) from general list in node palette
- Go to function properties and then choose APL_AREA in area and once functions are populated then then choose FORECAST function. Your screen should be like below:
- Next, we will add input table and then map it to corresponding function node. Please select FUNCTION_HEADER from tables list in APP1 schema and then drag and drop to empty area in flowgraph model. Select “Data Source” when system asks you. Now, create a connection from this table to FUNC_HEADER node in function. Please note you can name this each node as per your need.
- Please perform similar for all the input tables and your screen should be like below:
- We will now add output tables and then map it to corresponding function node. Please select FORECAST_OUTPUT from tables list in APP1 schema and then drag and drop to empty area in flowgraph model. Select “Data Sink” when system asks you.
- Before we create a connection between node and table we will need to define the columns in output node of function. Please use similar table structure as we defined above when creating output table in schema APP1. Go to properties of node > signature > add/change columns and its data types
- Now, create a connection from node “Output_Data” to this table FORECAST_OUTPUT. Please note you can name each node as per your need.
- If you are going to truncate table everytime you run procedure then please select following option in node properties.
- Please perform similar for all the output tables and your screen should be like below:
- Please note that in each node there is green tick which means there is no errors
- Now, please activate this model
- Once activated you will see new procedures and table types are created in schema APP1
- To call this procedure please use following SQL statement:
CALL "APP1"."APP1.fg::FORECAST_FG"();
References