The success of Linux adoption within SAP

 

Background

 

Two years ago, SAP Global IT Infrastructure Service – Service Center Labs IT – took on the challenge of simplifying Linux OS deployments (OSD) in the developer workspace environment.

 

Until then, there was neither a Linux OSD service nor Linux support provided by SAP Global IT
in this area.
This meant that each developer who needed access to a Linux OS spent valuable time
installing his own Linux system.

From an IT management perspective, there was no control over these systems – they were not
secure, did not conform to any guidelines, and were not managed or inventoried.

 

Together with Senior Linux consultant Shay Cohen of G.S.R. IT consulting Ltd., Labs IT designed and built a flexible and scalable service to
manage and deploy Linux systems in an automated way.

 

When we designed the service, our goal was to provide the end user with a system which is preconfigured and ready for use out of the box. We focused on two main aspects:

  1. Conformity with SAP Global IT standards (e.g. systems naming conventions, security requirements, system settings)
  2. Simplicity:
    1. For IT to deploy
    2. For end user to request and use

 

How did we achieve this?

 

Using native Linux tools, a Linux Deployment Tool Kit (LDT) was built to support the following process:

 

LDT_Process.jpg

The first step of the process, after the end user submits a service request, is the key to the auto-configuration and the out-of-the-box experience we wanted to achieve. In this step, an IT technician enters an LDT deployment task. To enter it, the following data must be provided:

 

  1. User ID from Active Directory which will be defined with elevated permissions on the system.
  2. MAC address of the target system for Linux OSD.
  3. Location of the target system
  4. Equipment No. (internal asset number) of the target system. This will be used to configure the hostname according to the SAP IT naming convention.
  5. System type – VM, desktop or server – this will affect the way the system is configured, e.g. different hostname, VMware tools installed or not installed, etc.
  6. SWAP file size.
  7. Required Linux distribution (SUSE/Red Hat etc.)
  8. Profile – a preconfigured set of tools which will be installed on the system.

 

With this information in the DB, the system can be installed and configured fully automatically – ready for use out of the box!
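As an illustration only – the actual LDT database schema is not published here – such a deployment task record could be stored along these lines (table and column names are hypothetical):

    -- Hypothetical sketch of a deployment task record; the real LDT schema may differ
    INSERT INTO LDT_DEPLOYMENT_TASK
           (USER_ID, MAC_ADDRESS, LOCATION, EQUIPMENT_NO, SYSTEM_TYPE, SWAP_SIZE_GB, DISTRIBUTION, PROFILE)
    VALUES ('D012345', '00:50:56:AB:CD:EF', 'IL-RAA', '7001234', 'VM', 4, 'SUSE', 'HANA_DEV');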

 

This process enables us to reach the goals we set:

  1. Conformity with SAP Global IT standards:
    1. Each Linux system deployed via LDT is automatically configured – hostname, DNS settings, input locale, etc. are set according to the deployment task entered via the SAP IT OS deployment portal.
    2. The McAfee Anti-Virus agent is installed and managed centrally by the SAP Global IT Client Protection team.
    3. The LDT Agent is installed. This agent is the system management interface for Labs IT. It periodically checks for tasks waiting for the system and reports back to the LDT DB with system information, heartbeat, Anti-Virus agent status and task execution results.
    4. The root password is scrambled, and a local rescue account with a periodically changing password enables IT support login.
    5. Integration with the SAP Active Directory domain.
  2. Simplicity:
    1. For IT to deploy – all that is required from the IT support technician who deploys Linux is to enter the required information in the SAP IT OSD Portal and create an LDT deployment task. Afterwards, the OSD process runs automatically once the technician boots the system with the LDT boot ISO.
    2. For the end user to request and use – all it takes for the end user to request a Linux system is to enter an IT service request with his user ID and the equipment number of his system. Afterwards, he receives a system which is ready for use out of the box – just log in with your domain account and password and start working!

 

 

Adoption of the service

 

The service was very successfully adopted by IT teams as well as our customers – SAP HANA developers and other development/QA teams who need to work with Linux.

Since the service went live in October 2012, over 1,400 LDT OS deployments have taken place. The monthly deployment trend for the last 5 months of 2013 is shown below. The screenshot is captured from the LDT inventory portal: Statisrics.jpg

 

In the LDT portal, we can also track the number of live systems. These are systems which reported back in the last 24 hours. This dashboard presents the number of live systems, broken down by geographical region, distribution and type:

Dashboard.jpg

Summary

 

As SAP HANA took a central place in SAP's strategy, the demand from HANA developers for Linux systems – and especially for SUSE Linux – increased drastically.

With the LDT service in place, SAP Global IT was ready to support this growing demand with a simple-to-use service.

 

HANA developers now have Linux systems at their fingertips, reducing the time it takes them to
set up these systems from a few hours to a few minutes.

Troy Thurston

Table T006D for SHINE

Posted by Troy Thurston Nov 28, 2014

The EPM sample data included with the SHINE demo application is a helpful resource for working on development and modeling scenarios.

 

Included in the sample data are supporting tables that are required to perform Unit of Measure and Currency conversions in analytic and calculation views, as stated in the SAP HANA Modeling Guide.

Image1.png

However, the sample data is missing table T006D.  Without this table, an error is generated when attempting a Unit of Measure conversion.

 

Image2.png

 

The attached file has the necessary SQL DDL and DML scripts to create and load a column table (and create a related synonym) for T006D.  The scripts can be executed from a SQL Console view in the SAP HANA Modeler perspective.
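For orientation only – the attached file remains the authoritative source – the scripts follow this general pattern (the column list shown here is abbreviated and illustrative, not the full T006D structure):

    -- Illustrative pattern; take the real column list and data from the attached script
    CREATE COLUMN TABLE "MYSCHEMA"."T006D" (
        "MANDT" NVARCHAR(3) NOT NULL,
        "DIMID" NVARCHAR(6) NOT NULL,   -- dimension key; remaining columns omitted here
        PRIMARY KEY ("MANDT", "DIMID")
    );

    -- Synonym so that views can reference T006D without a schema prefix
    CREATE SYNONYM "T006D" FOR "MYSCHEMA"."T006D";

    -- The default entries are then loaded with the INSERT statements from the attached DML script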

 

Image3.png

 

After refreshing the schema, T006D is displayed and can be referenced for Unit of Measure conversions.

 

Image4.png

Image6.png

 

(Note: T006D table structure and default data obtained from SAP BW 7.4 SP8.)

Big Data is about the VOLUME, VARIETY and VELOCITY of data. Let us see how the SAP HANA platform meets the 3 V's (Volume, Variety and Velocity) challenges of Big Data.

VOLUME

The volume of data is increasing day by day, and by 2020 it is expected to reach 40 zettabytes. So for Big Data the challenge is to store high volumes of data. SAP HANA has addressed the volume aspect of Big Data by fortifying the SAP HANA platform. The following are two game-changing features in the SAP HANA platform related to data volume.


      • SAP HANA and HADOOP integration
      • Dynamic Tiering

SAP HANA and HADOOP integration

            Hadoop makes it possible to store practically unlimited volumes of data using a distributed file system. With the release of SPS09, SAP HANA is tightly integrated with Hadoop. The following are the SAP HANA and Hadoop integration options:

      • SDA (Smart Data Access)
      • SAP Data Services
      • SAP BO-IDT (Information Design Tool)
      • HANA XS Engine and Hadoop Hbase


SMART DATA ACCESS: Smart Data Access (SDA) provides SAP HANA with data virtualization capabilities. This technology allows you to create a virtual table to combine SAP HANA data with other heterogeneous data sources such as Hadoop, Teradata, MS SQL Server, Oracle, SAP Sybase ASE, SAP Sybase IQ and SAP HANA.

            

              In SAP HANA SPS07, HANA connects to HIVE:



    CREATE REMOTE SOURCE HIVE

    ADAPTER "hiveodbc"

    CONFIGURATION 'DSN=HIVE'

    WITH CREDENTIAL TYPE 'PASSWORD'

    USING 'user=hive;password=hive';


        • Create a virtual table on the HIVE remote data source and consume it in the HANA catalog – for example, as sketched below.
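          A minimal sketch of such a virtual table (the names and the remote database/schema are illustrative and must match your Hive setup):

    -- Sketch: expose a Hive table through the HIVE remote source as a HANA virtual table
    CREATE VIRTUAL TABLE "MYSCHEMA"."V_HIVE_SALES"
        AT "HIVE"."HIVE"."default"."sales";

    -- It can then be queried like any other table in the HANA catalog
    SELECT COUNT(*) FROM "MYSCHEMA"."V_HIVE_SALES";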


                In SAP HANA SPS08, HANA connects to Apache SPARK:

                           

        • SQL Script to create Remote Data Source to HADOOP SPARK


    CREATE REMOTE SOURCE SPARK

    ADAPTER "hiveodbc"

    CONFIGURATION 'DSN=SPARK'

    WITH CREDENTIAL TYPE 'PASSWORD'

    USING 'user=hive;password=SHa12345';


        • Create a virtual table on the SPARK remote data source and consume it in the HANA catalog.

     


                In SAP HANA SPS09, HANA connects directly to Hadoop HDFS:


        • Create a Map Reduce archive package in the SAP HANA Development perspective using Java
        • Create a remote data source directly to Hadoop HDFS


    CREATE REMOTE SOURCE HADOOP_SOURCE

    ADAPTER "hadoop"

    CONFIGURATION 'webhdfs_url=<url:port>;webhcat_url=<url:port>'

    WITH CREDENTIAL TYPE 'PASSWORD'

    USING 'user=hive;password=hive';


        • Create Virtual Function


    CREATE VIRTUAL FUNCTION HADOOP_WORD_COUNT

    RETURN TABLE ("word" NVARCHAR(60),"count" integer)

    package DEV01."DEV01.HanaShared::WordCount"

    CONFIGURATION 'enable_remote_cache;mapred_jobchain=[{"mapred_input":"/data/mockingbird","mapred_mapper":"com.sap.hana.hadoop.samples.WordMapper",

    "mapred_reducer":"com.sap.hana.hadoop.samples.WordReducer"}]'

    AT HADOOP_SOURCE;


        • Create a virtual UDF to connect directly to an HDFS file.


    CREATE VIRTUAL FUNCTION HADOOP_PRODUCT_UDF()

    RETURN TABLE ("product_class_is" INTEGER, "product_id" INTEGER,"brabd_name" VARCHAR(255))

    CONFIGURATION 'datetime_format=yyyy-MM-dd HH:mm:ss;date_format=yyyy-MM-dd HH:mm:ss;time_format=HH:mm:ss;enable_remote_caching=true;cache_validity=3600;

    hdfs_location=/apps/hive/warehouse/dflo.db/product'

    AT HADOOP_SOURCE;


     

                                      

             

    CONNECT TO HADOOP USING SAP DATA SERVICES

        • Select the File Format tab in the Local Object Library, right-click on HDFS File and click New

                                       SDA4.png

        • Provide the following parameter values in the HDFS File Format editor
          • Name: HDFS
          • Namenode host: <host name of hadoop installation>
          • Namenode port: <hadoop port>
          • Root Directory: < Hadoop file path>=</user/hadoop/input>
          • File Name: hdfs_data.txt

                                                  SDA5.png

        • Click on Save&Close and double click on created HDFS file again to view file format.

                                                    SDA6.png

        • Create Project -> Job -> Data Flow
        • Drag the HDFS file onto the canvas and make it the source; drag a Query transformation and a target table onto the data flow canvas and join them.

                                            SDA7.png

        • Double-click the Query transformation and map Schema In to Schema Out

                                             SDA8.png

        • Execute the job and view the data brought into HANA from Hadoop.

                                             SDA10.png


    SAP BO(IDT)-HADOOP INTEGRATION

     

                                            HADOOP_IDT.png

    HANA XSENGINE AND HADOOP HBASE

        

                        HANA XSEngine can talk to Hadoop HBase via server-side JavaScript. Please refer to the following article for more details.

                                                 XSEngine.png

     

                        Streaming Real-time Data to HADOOP and HANA

        

    DYNAMIC TIERING

     

              Dynamic tiering is SAP HANA extended storage based on the SAP IQ ES server, integrated with a SAP HANA node. Dynamic tiering was included in SPS09. HOT data resides in SAP HANA in-memory storage, and WARM data resides on the IQ ES server's columnar, petabyte-scale disk storage. It provides an environment to extend SAP HANA's terabyte-scale in-memory capability to petabyte-scale columnar disk storage without using Hadoop.

     

         HOT & WARM Table creation:

     

    CREATE TABLE "SYSTEM".SalesOrder_HOT" (

    "ID" INTEGER NOT NULL,

    "CUSTOMERID" INTEGER NOT NULL,

    "ORDERDATE" DATE NOT NULL,

    "FINANCIALCODE CHAR(2) NULL,

    "REGION" CHAR(2) NULL,

    "SALESREPRESENTATIVE" INTEGER NOT NULL,

    PRIMARY KEY("ID")

    );

     

     

    CREATE TABLE "SYSTEM".SalesOrder_WARM" (

    "ID" INTEGER NOT NULL,

    "CUSTOMERID" INTEGER NOT NULL,

    "ORDERDATE" DATE NOT NULL,

    "FINANCIALCODE CHAR(2) NULL,

    "REGION" CHAR(2) NULL,

    "SALESREPRESENTATIVE" INTEGER NOT NULL,

    PRIMARY KEY("ID")

    )USING EXTENDED STORAGE;

     

     

    Reference Document:

    SAP HANA SPS 09 - Dynamic Tiering.pdf

     

    Reference SAP HANA Academy Video:

     

    SAP HANA Academy - SAP HANA Dynamic Tiering : Installation Overview [SPS 09] - YouTube

    SAP HANA Academy - SAP HANA Dynamic Tiering: Introduction [SPS09] - YouTube

    Over the last few years the reputation of Hana grew constantly, while other people kept finding arguments why Hana is nothing special. I have to admit that, as an Oracle expert of more than 20 years, I was among them. Looking back, it was rather a lack of understanding on my side, and being trapped in SAP's marketing statements. You think that is a lame excuse? You are probably right.

    So let me take you on my journey with Hana and share some internals you have never read elsewhere before.

     

    The first argument came in 2010, mostly from Oracle, and back then was - understandably - "Of course keeping everything in your head is much faster, but that is simply not doable." If I may translate that, the statement was: memory is too limited and too expensive. True enough, even today. What is the price of a hard disk with 1TB and, in comparison, how much does a server with 1TB of memory cost? A completely reasonable argument.

    Actually, I just dug up a YouTube video and it is fun to watch, even today.

     

     

    SAP was arguing at the time that you compress the data and hence you do not need as much memory. We all know how compression works and the costs involved, so I found that not very convincing.

     

    What struck me even more, however, was the fact that traditional databases cache data in RAM as well, so they are in-memory so to speak, except that only the frequently accessed data is cached and the archived data does not eat into your memory budget.

     

    What I hated about Hana the most was the row versus column storage debate. Marketing touted that thanks to the columnar storage you can aggregate data much faster, and when confronted with the question of reading an entire row with all columns, the response was "we have row storage as well". Excellent answer. Now I would need both a row and a columnar storage for each table? You cannot be serious.

     

    With this kind of mindset I started to look at Hana, did typical performance tests and quickly found out there was something severely wrong with my assessment of Hana. Thankfully the Hana developers took the time to engage with me and provided me with internals that explained what I had missed before.

     

    Let me try to show.....

     

     

     

    The three technologies

    According to marketing, the benefits of Hana are the top three technologies shown below. I am sure you have been bombarded with the same arguments: Hana is in-memory and therefore it is fast. And columnar. It does compression!

    I can fully understand now why people, including myself, were skeptical. All of these technologies are nothing new, as Larry Ellison stated in the above video, mentioning Oracle's TimesTen product as an example.

    The secret to the Hana performance is not the three technologies as such, in my opinion; it is the intelligent combination of these three plus the insert-only approach. The reason I am saying that is that, looking at each technology individually, all have advantages but severe disadvantages as well. Memory is limited, compression is CPU intensive, columnar storage puts column values closely together whereas row storage provides each row's data as one block, and insert-only requires dropping outdated versions from time to time.

     

    In-Memory (ram.png)
    Compression (compression.png)
    Columnar Storage (row column storage.png)
    Insert-Only (insert only.png)

     

     

     

     

     

    In-Memory ram.png

    The basic idea is that memory is much faster than disk. Actually it is many times faster. A 2014 CPU has a memory bandwidth of 10GByte/sec and higher, a single disk around 150MByte/sec - a difference of factor 70. If your program is using the same memory frequently, it is cached inside the CPU's L1 or L2 cache, speeding up the memory bandwidth by another factor of 10. On the contrary, the disk speed of 150MB/sec is for sequential read access only; random access is many times worse for a disk system, whereas it has no negative impact on RAM.
    The downside of memory is the cost of the memory chips themselves (7USD/GByte for RAM compared to 0.05USD/GByte for disks as of 2014), and the hardware platform you need in order to cope with more memory is getting increasingly more expensive as well.
    On the other hand, if I need 1TB of RAM that would be 7000USD. While this is much money compared to a single 100USD disk, it is not much in absolute numbers.
    But you can turn my argument around and simply say: if you have a 1TB database on disk, use a server with 1TB of memory so all data can be cached.

    So the argument "in-memory!" cannot be the entire truth.

    Compression compression.png

    The idea of compression is simple: a single CPU is much faster than the memory bus and the disk, not to mention that multiple CPUs share the same bus, hence compressing data in order to reduce its amount is beneficial as long as the overhead of doing so is not too large. Therefore every major database supports compression. It is not very popular though, as compressing a database block and decompressing it takes its toll. The most obvious cost overhead occurs when data is updated inside a database block: you have to uncompress the database block, make the change and then compress it again. Hopefully it still fits into the same database block, otherwise you need a second one.

    So the argument "Compression!" cannot be the entire truth.

    Columnar storage row column storage.png

    For a simple select sum(revenue), columnar storage is just perfect. You have to read one column only, hence just a fraction of the whole table data is needed. Imagine you have all the data of one column in one file; this will be much faster than with traditional row-oriented tables, where all the table data is in one file (or database object, to be more precise) and you have to read the entire table in order to figure out each row's column value.
    In case you want to see all columns of a single row, as is typical for OLTP queries, row storage is much better suited.

    So the argument "Columnar Storage!" cannot be the entire truth.

     

    Insert only insert only.png

    A real database should have read consistency in the sense that when I execute my select statement I get all data committed at that point in time in the table; I will neither see data that was committed afterwards, nor will my long-running query fail just because an old value was overwritten by the database writer.
    The only major database I know supporting that since the beginning is Oracle (SQL Server has an option to turn it on/off), but you have to pay a price for that consistency. Whenever the database writer overwrites a block with new data, the old version has to be kept somewhere - in the rollback segments of the database, in the case of Oracle. So a simple update or insert into an existing block requires two operations: the actual change plus saving the old version.
    With insert-only the idea is a different one: the old data in each table is never overwritten, only appended. If you execute an update against an existing row, a new row is written at the end of the table with the same primary key value plus a transaction number. When a select statement is executed, it will see multiple versions and use the one with the highest transaction number less than or equal to the currently active global transaction number in the database.
    Hence there is a tradeoff: tables grow fast with this approach, especially when you update just a single byte, but on the other hand you are faster.

    So the argument "Insert only!" cannot be the entire truth.

     

    Combining the technologies

    Okay, this was my starting point. All technologies are good ideas; other companies have tried them as well and have built proper niche products, mostly around analytic use cases. One database is a pure in-memory database that needs a regular database to persist the data; into others you can insert data but not update or delete it. Many support compression, but it is usually turned off by customers.

    The claim of Hana is to be a database that supports analytic and transactional use cases and is better than other databases in all areas. That should be easy to put into perspective, I thought.

    However, the one thing I did overlook at that time was how these technologies benefit from each other. So let us go through a couple of mutual benefits to get the idea.

    Combination of Compression with Columnar Storage compression.png + row column storage.png

    Compression works best whenever there is a repetitive pattern. Let us have a look at a real example, the material master table.

     

    MANDT | MATNR              | ERSDA      | VPSTA    | LVORM | MTART | MBRSH | MATKL
    800   | 000000000000000023 | 23.01.2004 | K        | false | ROH   | 1     |
    800   | 000000000000000038 | 04.09.1995 | KDEVG    | true  | HALB  | M     | 00107
    800   | 000000000000000043 | 23.01.2004 | KBV      | false | HAWA  | 1     |
    800   | 000000000000000058 | 05.01.1996 | KLBX     | false | HIBE  | M     |
    800   | 000000000000000059 | 05.01.1996 | KLBX     | false | HIBE  | M     |
    800   | 000000000000000068 | 12.01.1996 | KEDPLQXZ | false | FHMI  | A     | 013
    800   | 000000000000000078 | 10.06.1996 | KVX      | true  | DIEN  | M     |

     

    What can you compress better, the entire file or a single column?
    The answer is obvious, but nevertheless I exported these columns from the MARA table - all 20'000 rows of my system - into a CSV file (1'033KB in size) and zipped the one file with all data plus nine files with one column each.

    a.png

    Obviously the primary key cannot be compressed much - it is half of the data in fact - but all other columns are: the MANDT file is 303 bytes, and the ERSDA file with all the many create dates is 12'803 bytes.

    But this is not a fair comparison, as the zip algorithm favors larger datasets, since it can look for patterns more easily, and it is a fairly aggressive algorithm. In databases the compression is weaker in order to require fewer CPU cycles and, more importantly, each file is split into database blocks. That means that if you have a fairly wide table, one database block might contain only one row of data to be compressed - hence almost no compression is possible at all.
    With columnar storage we have no issue with that side effect.

    So as you can see, using columnar storage has a huge positive side effect on the degree of compression that is possible.
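    In Hana itself this effect can be checked per column; a quick sketch (the schema name is hypothetical, and the exact monitoring-view columns may vary by revision) is:

    -- Compare in-memory (compressed) size with the uncompressed size per column of MARA
    SELECT column_name,
           compression_type,
           memory_size_in_total,
           uncompressed_size
    FROM   m_cs_columns
    WHERE  schema_name = 'MYSCHEMA'
    AND    table_name  = 'MARA';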

    Combination of Compression with In-Memory compression.png + ram.png

    That's an easy one. The more compression we do, the less memory we need. Earlier we said that the cost ratio between disk and RAM is about a factor of 700 in favor of disk. Thanks to compression, usually a factor of 10, the disk is just 70 times cheaper. More importantly, for your 10 TB database you do not need a server with 10TB of RAM, which would be very expensive.

    Combination of Compression with Insert-only compression.png+insert only.png

    Compression has one important downside as well, however: what if a row is updated or even deleted? The compression spans multiple rows, so when you change a value you have to uncompress the entire thing, change the value and compress it again. With traditional databases you do exactly that. Okay, you do not uncompress the entire table - the table data is split into pages (database blocks) and hence only the impacted page has to be recompressed - but still you can virtually watch how much this slows down updates and deletes in traditional databases.
    So what does Hana do? Quite simple: it does not update or delete the existing data. It appends the change as a new version with a transaction ID, and when you query the table, you will read the version of each row that was current at the query execution start time. So suddenly no recompression is needed anymore; data is appended uncompressed to the end of the table, and once that uncompressed area exceeds a limit, it gets compressed and new data is inserted into a new page of the table.
    The other advantage of this approach: if a single row is updated multiple times, which row will that be? A booking made 10 years ago? Very unlikely. It will be a recent one - one that is likely still in the uncompressed area.
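    In Hana terminology this uncompressed area is the delta store, and folding it into the compressed main store is called a delta merge. It normally happens automatically, but as a sketch (table name hypothetical) you can trigger and observe it yourself:

    -- Force a delta merge of the uncompressed delta store into the compressed main store
    MERGE DELTA OF "MYSCHEMA"."SALES_ITEMS";

    -- Inspect how much data currently sits in the main store vs. the delta store
    SELECT table_name, record_count, memory_size_in_main, memory_size_in_delta
    FROM   m_cs_tables
    WHERE  schema_name = 'MYSCHEMA' AND table_name = 'SALES_ITEMS';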

    Combination of Columnar Storage with Insert-only row column storage.png+insert only.png

    As the data is only appended at the end, finding the data of a row in the various columns is very simple. According to the current transaction ID, the row to read was the version at table position 1'234'567; hence the task is to find, for each column, the value at that position.

    Just imagine an Excel sheet. What is faster: Reading the cells (A,10) + (B,10) + (C,10), in other words the row 10 with the three columns A, B and C?

    Or reading the cells (J,1) + (J,2) + (J,3), in other words the column J with the values in the three rows 1, 2 and 3?

    It does not make a difference. None at all. The entire argument that reading row-wise is better is actually based on the assumption that it is faster to read horizontally than vertically. Which is true if the data is stored on disk in a horizontal fashion: then the data of one row is close together and hence read in one go from the disk cylinder. But in a memory system it does not matter at all.

    Putting all four together compression.png+ram.png+row column storage.png+insert only.png

    What was the issue with compression?

    • Compression works best on similar data -> one column often has similar values -> solved
    • Recompression in case something does change -> we do no change data but insert only -> solved
    • Compression is CPU expensive -> no pkzip-like compression is used but dictionary and pattern compression -> faster than reading the plain data

    What was the issue with memory?

    • More expensive than disk -> thanks to the compression that factor is dampened -> reality has proven Hana can run even large enterprise ERP systems

    What was the issue with Columnar Storage?

    • You need to locate the row value for each column individually -> but actually, for memory it does not matter if you read two words from nearby or far-away memory pointers. With compression this might even be faster!
    • Changing values requires changing the entire column string -> true, hence Hana does not change values, it appends data only.

    What is the issue with Insert-only?

    • More and more old data is present and needs to be removed, or memory consumption grows fast -> the most recent data is not added into the compressed storage; it is kept in the delta storage, where all changes are handled so that updates remain possible.
    • The above problem arises only if changes are made to rows in the compressed storage area -> less likely, but possible.

     

    Comparing Hana with other databases

    Meanwhile other database vendors have been faced with these realities as well, and we have seen various announcements about jumping on the in-memory bandwagon. Given your knowledge of the Hana internals now, you should be able to quantify the advantage for yourself.

     

    Oracle: With Oracle 12c there is an in-memory option available. This option allows you to store the data, in addition to the traditional disk-based way, in an in-memory area as well. Personally I have mixed feelings about that. On the one hand it is very convenient for existing Oracle users: you execute a couple of statements and suddenly your queries are many times faster.

    But this assumes that the traditional disk-based storage does have advantages, and I tried to show above that it does not. Hence it is a workaround only, kind of what I criticized when the response was "Hana has column and row storage". And it leaves room for questions like:

    • What are the costs to store the data twice when inserting/updating data?
    • To what degree is the database slowed down if part of the memory used for database block caching is now reserved for in-memory storage?
    • No question this will be faster on the query side, but the OLTP load worries me.

    To me that sounds like the dinosaurs' way - trying to sound flexible when all you do is actually double the costs. Obviously I lack the hands-on experience, but sorry Oracle, I was a big fan of yours and that does not sound like a mastermind's plan to me.

    Nice reading I found: Rittman Mead Consulting: Taking a Look at the Oracle Database 12c In-Memory Option

     

    Do you concur with my opinion?

    This post is part of an entire series

    Hana SPS09 Smart Data Integration - Overview

     

     

    Prior to Hana SPS9, SAP suggested using different tools to get data into Hana: Data Services (DS), System Landscape Transformation (SLT), Smart Data Access (SDA), Sybase Replication Server (SRS), Hana Cloud Integration - DS (HCI-DS),... to name the most important ones. You used Data Services for batch transformations of virtually any source, SLT for realtime replication of a few supported databases with little to no transformation, HCI-DS when it came to copying database tables into the cloud, etc.
    With the Hana Smart Data Integration feature you get all of this in one package, plus any combination of the above.

     

    The user however has very simple requirements when it comes to data movement these days:

    • Support batch and realtime for all sources
    • Allow transformations on batch and realtime data
    • There should be no difference between loading local on-premise data and loading over the Internet into a cloud target other than the protocol being used
    • Provide one connectivity that supports all
    • Provide one UI that supports all

     

    The individual tools like Data Services still make sense for all those cases where the requirement matches the tool's sweet spot. For example, a customer not running Hana, or for whom Hana is just yet another database, will always prefer a best-of-breed standalone product like Data Services. Customers needing to merge two SAP ERP company codes will use SLT for that; it is built for this use case. All of these tools will continue to be enhanced as standalone products - in fact this is the larger and hence more important market! But to get data into Hana and to use the Hana options, it becomes hard to argue why multiple external tools should be used, each with its own connectivity and capabilities.

     

    In addition, the Hana SDI feature tries to bring the entire user experience and effectiveness to the next level, or at least lays the groundwork for that.

     

     

    Designing Transformations

     

    Let's start with a very simple dataflow: I want to read news from CNN, check if the text "SAP" is part of the news description and put the result into a target table. Using Hana Studio, we create a new Flowgraph Model repo object, and I dragged in the source, a first simple transformation and the target table. Then everything is configured and can be executed. So far nothing special - you would do the same thing with any other ETL tool.

    RSSDataflow.png

    But now I want to deal with changes. With any ETL tool in the market today, I would need to build another dataflow handling changes for the source table - possibly even multiple ones, in case deletes have to be processed differently. And how do I identify the changed data in the first place?

     

    RSSDataflow-RTFlag.png

    With Smart Data Integration, all I do in the above dataflow is check the realtime flag; everything else happens automatically.

     

    How are changes detected? They are sent in realtime by the adapter.

    What logic needs to be applied to the change data in order to get it merged into the target table? The same logic as the initial load, considering the change type (insert/update/delete) and its impact on the target.

    The latter is very complex of course, but when looking at what kind of dataflows users have designed for that, we were able to come up with algorithms for each transformation.

     

    The complexity of what happens under the covers is quite high, but that is the point. Why should I do that for each table when it can be automated for most cases? Even if it works for only 70% of the cases, that is already a huge time saver.

     

    Ain't that smart?

     

    The one thing we have not been able to implement in SP9 is joins, but that was just a matter of development time. The algorithms exist already and will be implemented next.

     

     

    Adapters

     

    How does Hana get the news information from CNN? Via a Java adapter. That is the second major enhancement we built for SP9: every Java developer can now extend Hana by writing new adapters with a few lines of code. The foundation of this feature is Hana Smart Data Access. With it you can create virtual tables, which are views on top of remote source tables, and read data from there.

    For safety reasons these adapters do not run inside Hana but are hosted on one or many external computers running the Hana Data Provisioning Agent and the adapters. This agent is a very small download from Service Marketplace and can be located on any Windows/Linux computer. Since the agent talks to Hana via either TCP or https, the agent can even be installed inside the company network and load into a Hana cloud instance!

    Using that agent and its hosted adapters, Hana can browse all available source tables - well, in the case of an RSS feed there is just a single table per RSS provider - and a virtual table is created based on that table structure.

    Now that is a table just like any other: I can select from it using SQL, calculation views or whatever, and I will see the data as provided by the adapter. The user cannot see any difference to a native Hana table, other than that reading remote data will be slower than reading data from Hana.

    That covers the batch case and the initial load.

    For realtime, Hana was extended to support a new SQL command: "create remote subscription <name> using (<select from virtual table>) target <desired target>". As soon as such a remote subscription is activated, the adapter is asked to listen for changes in the source and send them as change rows to Hana for processing. The way RSS changes are received is by querying the URL frequently and pushing all found rows into Hana. Other sources might support streaming of data directly, but that is up to the adapter developer. As seen from Hana, the adapter provides change information in realtime; how the adapter produces it, we do not care.
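    Based on the command syntax quoted above, a minimal sketch could look like this (subscription, virtual table and target table names are made up for illustration):

    -- Sketch only: subscribe to changes arriving through a virtual table and land them in a Hana table
    CREATE REMOTE SUBSCRIPTION "SUB_CNN_NEWS"
        USING (SELECT * FROM "MYSCHEMA"."V_CNN_RSS")   -- virtual table exposed by the RSS adapter
        TARGET TABLE "MYSCHEMA"."CNN_NEWS";

    -- Start capturing changes and applying them to the target
    ALTER REMOTE SUBSCRIPTION "SUB_CNN_NEWS" QUEUE;
    ALTER REMOTE SUBSCRIPTION "SUB_CNN_NEWS" DISTRIBUTE;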

     

     

    This concludes a first overview about Hana Smart Data Integration. In subsequent posts I will talk about the use cases this opens up, details of each component and the internals.

     

    There is also a video from the Hana Academy on youtube:

    Best Practice For Hana Performance Optimization (PART I):

    Hi everyone, I am sharing my experience of working on an Oracle to Hana migration project. Below are a few points about performance optimization of SAP HANA code. We can achieve better performance by considering these points while writing SQLScript procedures.

     

    1.      Always select only the required columns instead of selecting all the columns.


    Example: Suppose there are three tables TABLE_A, TABLE_B and TABLE_C with the structures below.


    TABLE_A Structure:


    Name | Age | Emp_Id | Department | Salary


    TABLE_B Structure:


     

    Name | Department | Job_Grade | Company_Name | Company_Type

     

    TABLE_C Structure:


     

    Department | Emp_Id | Designation | Job_Location


    Now suppose that in your procedure you have to select only the Name, Salary and Designation from these three tables, based on the join condition, and use them to populate some target table TABLE_T.

     

    So, for the given scenario you should not use the SQL statement below; if you do, it will degrade the performance of the procedure.


                                                  INS_1.PNG

     

    If you use a query like the one above, you are selecting more columns than required. It is always better to select only the required columns, which results in better performance of your SQL procedures.


                                                     ins_2.PNG
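    Since the screenshots are not reproduced in this text, the two variants might look roughly like this (the join conditions and the TABLE_T columns are assumed for illustration):

    -- Avoid: pulling every column of the source tables when only three are needed
    INSERT INTO TABLE_T (NAME, SALARY, DESIGNATION)
    SELECT A.NAME, A.SALARY, C.DESIGNATION
    FROM   (SELECT * FROM TABLE_A) A
           INNER JOIN (SELECT * FROM TABLE_B) B ON A.DEPARTMENT = B.DEPARTMENT
           INNER JOIN (SELECT * FROM TABLE_C) C ON A.EMP_ID = C.EMP_ID;

    -- Better: project only the required columns as early as possible
    INSERT INTO TABLE_T (NAME, SALARY, DESIGNATION)
    SELECT A.NAME, A.SALARY, C.DESIGNATION
    FROM   (SELECT NAME, SALARY, DEPARTMENT, EMP_ID FROM TABLE_A) A
           INNER JOIN (SELECT DEPARTMENT FROM TABLE_B) B ON A.DEPARTMENT = B.DEPARTMENT
           INNER JOIN (SELECT EMP_ID, DESIGNATION FROM TABLE_C) C ON A.EMP_ID = C.EMP_ID;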

     

     

    2.  Always try to use the "NOT EXISTS" and "EXISTS" keywords in your procedures instead of "NOT IN" and "IN", because using "NOT IN" or "IN" inside a procedure slows down its performance.

        

         Example: I want to delete all the records from COMPONENT_A whose ENTERPRISE, SITE and PRODUCTION ORDER are not in HEADER_A.

        

         Using the delete statement below will slow down the performance.

                                   DE_1.png

        

         So, it is always advisable to use a NOT EXISTS statement like the one below, which improves the performance.

                                      DE_2.png
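         As the screenshots are not included here, a rough sketch of the recommended NOT EXISTS form (key column names assumed from the description above) would be:

    -- Correlated NOT EXISTS instead of NOT IN over the same key columns
    DELETE FROM COMPONENT_A
    WHERE NOT EXISTS
          (SELECT 1
           FROM   HEADER_A
           WHERE  HEADER_A.ENTERPRISE       = COMPONENT_A.ENTERPRISE
           AND    HEADER_A.SITE             = COMPONENT_A.SITE
           AND    HEADER_A.PRODUCTION_ORDER = COMPONENT_A.PRODUCTION_ORDER);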


    3.     Always try to avoid using an HDBSEQUENCE in your procedures, because it will slow down your procedure's performance.

        

         Example: Suppose I have a SALES table with the structure below.

            

    Item | Production_Order | Sales_Name | Sales_Organisation | Status | Scenario
    A_1  |                  |            |                    |        | 0
    B_2  |                  |            |                    |        | 0


    Now I want to select all the items from the SALES table and add a suffix to each item. SCENARIO is a column of the SALES table whose value is constant.


    Solution:     The first solution which comes to mind is to create an HDBSEQUENCE and concatenate that sequence to the Item column of the SALES table.


    The steps are as follows:

    I.     Create an HDBSEQUENCE.

              a.     Go to the project and follow the steps to create the sequence as shown below.

                       

                             seq_2.png          

    II.     Now, using the sequence created, we can write the procedure for our scenario. Please see the procedure below using the sequence.


                    SEQ_3.PNG


    So, my observation was that when I tried calling this procedure it took around 1 minute to execute. So I tried the approach below.

    If you have a column in your table whose value is constant throughout your process, you can use the ROW_NUMBER function to achieve the same functionality without affecting the execution time at all, like below.


                         SEQ_4.PNG


    So, when I executed the above procedure it took only a few seconds.
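    As the procedure screenshots are not included in this text, a minimal sketch of the ROW_NUMBER-based idea (inside the SQLScript procedure body; the target table is hypothetical) might be:

    -- Number the rows using the constant SCENARIO column instead of an HDBSEQUENCE
    ITEMS_NUMBERED = SELECT ITEM,
                            ROW_NUMBER() OVER (ORDER BY SCENARIO) AS RN
                     FROM   SALES;

    -- Append the generated number as the suffix
    INSERT INTO SALES_TARGET (ITEM)
    SELECT ITEM || '_' || TO_VARCHAR(RN) FROM :ITEMS_NUMBERED;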

    So if anyone has a better idea for removing the sequence from a Hana procedure, please share your thoughts.



     

    4.     Always try to use filtered data for join operations.

    Example:  In the Hana procedure below I have used a table variable that stores the data from a join of three tables, with the filtering happening in the same join expression, because of which it takes more time to execute.


    CREATE PROCEDURE TEST_PROC

      LANGUAGE SQLSCRIPT

      SQL SECURITY INVOKER

      AS

    BEGIN

     

    JN_DATA  =      SELECT   T1.RUNTIME,

                                             T2.ITEM,

                                             T3.LOCATION

                                FROM   DETAILS T1,

                                             ROUTING T2,

                                             RESOURCES T3

                                 WHERE T1.BOR= T2.BOR

                                 AND      T1.LOCATION = T2.LOCATION

                                 AND      T1.SCENARIO= T3.SCENARIO

                                 AND      T2.ITEM = T3.NAME

                                 AND     T1.BOR LIKE '%BOR_ALT%'

                                 AND     T2.BOS NOT LIKE '%_TMP_%'

                                 AND     (T3.ITEM = 'N' OR T3.ITEM IS NULL);

                            

                            

    INSERT INTO TABLE_COMPONENTS (SELECT * FROM :JN_DATA);

                                    

    END;


    In the procedure below I take the filtered data first and then join, which results in faster execution of the procedure.

     

    CREATE PROCEDURE TEST_PROC1

    LANGUAGE SQLSCRIPT

    SQL SECURITY INVOKER

    AS

    BEGIN

     

    EXP_DETAIL    = SELECT RUNTIME,

                                             LOCATION,

                                             SCENARIO,

                                              BOR

                               FROM     DETAILS

                               WHERE  BOR LIKE '%BOR_ALT%';

     

    EXP_ROUTING = SELECT   ITEM,

                                                LOCATION,

                                               BOR

                                FROM       ROUTING         

                                WHERE    BOS NOT LIKE '%_TMP_%';

     

     

    EXP_RESOURCES= SELECT  NAME,

                                                   RESOURCE,

                                                  SCENARIO,

                                                  LOCATION

                                    FROM         RESOURCES

                                   WHERE     ITEM = 'N' OR ITEM IS NULL;

     

     

     

    JOIN_DATA   = SELECT     T1.RUNTIME,

                                              T2.ITEM,

                                              T3.LOCATION

                                FROM    :EXP_DETAIL T1,

                                             :EXP_ROUTING T2,

                                             :EXP_RESOURCES T3

                                 WHERE  T1.BOR= T2.BOR

                                 AND       T1.LOCATION = T2.LOCATION

                                 AND       T1.SCENARIO= T3.SCENARIO

                                 AND       T2.ITEM = T3.NAME;

                        

    INSERT INTO TABLE_COMPONENTS (SELECT * FROM :JOIN_DATA);

                                    

    END;

     

    5.     Creating separate read and write procedures is better in terms of performance, so always try to split your logic into a read procedure and a write procedure.

          

           Example: Just for illustration, I am showing a procedure which takes more time because it reads and writes in the same procedure.

     

    CREATE PROCEDURE HISTORY_DATA

      LANGUAGE SQLSCRIPT

      SQL SECURITY INVOKER

      AS

    BEGIN

     

    DATA_1=(SELECT  SCENARIOID,

                                 LINENUM,

                                SITE,

                                NAME

                    FROM HISTORY);

    ********************************************************

    Many other Transaction on data not shown here

    ********************************************************

     

      INSERT INTO SHIPMENT_HISTORY

       (

        SCENARIOID,

        LINENUM,

        SITE,

        NAME

        )(SELECT * FROM :DATA_1);

     

      DATA_2=(SELECT      SCENARIOID,

                                        SHIPPED,

                                       DATESHIPPED,

                                      SOURCE,

                                      CREATEDDATE

                     FROM HISTORY);

     

    ********************************************************

    Many other Transaction on data not shown here

    ********************************************************

     

    INSERT INTO SHIPMENT_HISTORY

       (

        SCENARIOID,

        SHIPPED,

        DATESHIPPED,

        SOURCE,

        CREATEDDATE

        )(SELECT * FROM :DATA_2);

     

     

    END;

     

    So, the above procedure takes around 1:36 minutes to run, which is the reason I separated it into a read procedure and a write procedure.

     

    READ PROCEDURE:  A read procedure in Hana does not allow any DML statements inside the procedure. So we just read the data after all the transformations and pass it to the output parameters of the procedure; an output parameter can be a scalar variable or a table variable.

     

    So the following steps have to be followed to create the read and write procedures.


    STEP I - First create an HDBTABLETYPE with the same columns which you are passing to the output parameter. To create the table type, we first declare artifacts of the different data types which we can then use to define it, as shown in the screenshot below.

                   new_read.PNG

    STEP II - Now create the table type using these artifacts, as below.

                 READ_2.PNG
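    Since the screenshots are not reproduced here, an equivalent plain-SQL sketch of such a table type (column data types are assumed to match the read procedure below) would be:

    -- Table type matching the first output parameter of the read procedure
    CREATE TYPE HDBTYPE_HISTORY_1 AS TABLE (
        SCENARIOID INTEGER,
        LINENUM    INTEGER,
        SITE       NVARCHAR(20),
        NAME       NVARCHAR(60)
    );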


             

    STEP III - Create a read procedure and pass the data to output variables of the above table types.

        

    CREATE PROCEDURE HISTORY_DATA_READ

                                                                               (OUT OUT_DATA_1 FULL_PATH_OF_HDBTYPE_HISTORY_1,

                                                                                OUT OUT_DATA_2 FULL_PATH_OF_HDBTYPE_HISTORY_2)

      LANGUAGE SQLSCRIPT

      READS SQL DATA

      SQL SECURITY INVOKER

      AS

    BEGIN

     

    ********************************************************

    Many other Transaction on data not shown here

    ********************************************************

    --final data to be sent to out parameter

     

    DATA_1=(SELECT  SCENARIOID,

                                  LINENUM,

                                  SITE,

                                  NAME

                    FROM HISTORY);

     

    ********************************************************

    Many other Transaction on data not shown here

    ********************************************************

    --final data to be sent to out parameter

     

      DATA_2=(SELECT    SCENARIOID,

                                       SHIPPED,

                                       DATESHIPPED,

                                      SOURCE,

                                      CREATEDDATE

                     FROM HISTORY);

     

    END;

                  

     

    WRITE PROCEDURE: Now that the read procedure is created, we create another procedure which calls the read procedure, reads the data into table variables and uses them to insert into the target tables.

     

    CREATE PROCEDURE HISTORY_DATA

      LANGUAGE SQLSCRIPT

      SQL SECURITY INVOKER

      AS

    BEGIN

     

    --call the read procedure to store the data into two table variables

    ***************************************************************************

     

    CALL HISTORY_DATA_READ (DATA_1_IN,DATA_2_IN);

     

    ***************************************************************************

      INSERT INTO SHIPMENT_HISTORY

       (

        SCENARIOID,

        LINENUM,

        SITE,

        NAME

        )(SELECT * FROM :DATA_1_IN);

     

     

    INSERT INTO SHIPMENT_HISTORY

       (

        SCENARIOID,

        SHIPPED,

        DATESHIPPED,

        SOURCE,

        CREATEDDATE

        )(SELECT * FROM :DATA_2_IN);

     

     

    END;

     

    So now, after separating the procedure into read and write procedures, it took only 2.01 seconds to execute. The conclusion is that it is always better to use separate read and write procedures.

     

     

    So, these are some points from my work experience on an Oracle to Hana migration project.

    Please share your thoughts about the post; advice for further improvement is most welcome. :)

    Happy reading! :)

    Detect and Act: Insight from Event Streams

    SAP HANA smart data streaming allows you to extract insight from real-time information streams and respond immediately!

     

    SAP HANA smart data streaming lets you capture, analyze, and act on the continuous flow of new information that streams into your business, identify emerging threats and opportunities as they happen, and respond immediately. SAP HANA smart data streaming is a highly scalable event processor for the SAP HANA platform that enables you to capture and process streams of events from many sources in real-time to provide sub-second response to incoming data.

     

     

     

     

    Transform your Business: Manage all your data cost effectively with the performance you demand!

    SAP HANA dynamic tiering allows you to extend HANA memory with disk-backed column store technology.

    SAP HANA dynamic tiering is a highly scalable, integrated option that gives application developers centralized operational control to cost effectively manage very large data sets – terabytes to petabytes – allowing users to characterize data into temperature tiers and move the data from one temperature tier to another within the same database. With SAP HANA dynamic tiering you can transform your business by managing all your data cost effectively with the performance you demand.

     

     

    For more information on the HANA SPS09 release read the blog by Mike Eacrett on “What’s New for HANA SPS09”

     

     

    a03.png

     

    SAP HANA Studio for MAC = eclipse + SAP plugins

     

    You have probably landed here because you could not find the HANA Studio installation files for the OS X platform, which have mysteriously disappeared from the SAP HANA download website. Surprise, surprise...

     

    Until it gets listed again, another way to run HANA Studio on Mac is to install SAP HANA Tools. I tried it out myself and found some minor issues that I will highlight here. Installation should be a quick and simple experience. After all, HANA Studio is basically eclipse + some plugins (i.e. SAP HANA Tools).

     

    Below, I have listed the steps to follow in order to get HANA Studio up and running on Mac running Yosemite (OS X 10.10.1).

     

     

    Step 1: Prerequisites

     

    The JAVA runtime environment is required to run eclipse so you will have to install it first. The question is which one? On the SAP HANA Tools website, it is recommended to use JRE version 1.6+.  For Mac users, Apple supplies their own version of Java for Java versions 6 and below. If prompted by the OS, do not install JAVA for OS X (http://support.apple.com/kb/dl1572) as it is not compatible with eclipse Luna. Personally, I have a preference for using the latest version of JAVA (security) and so I will be installing JAVA SE 8u25 (I am going with the JDK since I need it for other software development purposes as well).

     

     


    b01.png

    Step 2: Installing eclipse


    The next step is to install the eclipse IDE.

     

    c01.png

     


    Step 3: Installing SAP HANA Tools

     

    Third and final step, you will need to install SAP HANA Tools in eclipse. This is where I encountered minor issues that were not expected...

     

    • Open Eclipse IDE
    • Select a workspace at the prompt
    • Click on Help > Install New Software

    d01.png

    • At this point, I got a Missing requirement AFL connector error while trying to install the HANA Tools. To resolve this,  I had to install the Eclipse SDK plugin.
    • In the dropdown menu, select "The Eclipse Project Updates" or add it if you do not see it "http://download.eclipse.org/eclipse/updates/4.4"
    • Make sure the checkbox is checked for "Contact all update sites..."
    • Select Eclipse SDK.
    • Click Next, agree to the license agreement then Finish

    e01.png

    • SAP Network users: you might encounter a Repository not found error like I did if you are on the SAP-Corporate network as access to the repository is blocked. Therefore you will have to switch to another network (i.e. SAP-Internet) and repeat the previous steps if needed.
    • If successful, you will be prompted to restart eclipse
    • Open eclipse again
    • Select a workspace at the prompt
    • Click on Help> Install New Software
    • Add the following repository: https://tools.hana.ondemand.com/luna
    • Make sure the checkbox is checked for "Contact all update sites..."
    • Select SAP HANA Tools. If you did not install the eclipse SDK plugin in the previous step, you might encounter Missing requirement AFL connector error here like I did.
    • Click Next, agree to the license agreement then Finish

    f01.png

     

    You will be prompted to re-start eclipse, which means that the installation process has completed successfully.

     

    Now, you can simply connect to a HANA system and get started with the fun part of development.

    This blog relates to a few of my experiences while working on a project (migrating from Oracle PL/SQL to HANA SQLScript). Some of the basic differences that I found between Oracle PL/SQL and HANA SQLScript are as follows:

     

    1. In HANA the nvl() function does not exist. We have the IFNULL() function to check whether an attribute value is null.


    For example:

    In Oracle: select employee_name from employees where nvl(salary, 0) > 20000;

    While in HANA:  select employee_name from  employees where ifnull(salary, 0) > 20000;


    The above statements check whether the salary is NULL; if it is, the salary is treated as 0 in the comparison.


    2. The equivalent of the DECODE function (in Oracle) is MAP in HANA. It works exactly the way DECODE works.


    In Oracle : SELECT DECODE (GENDER, 'F', 'FEMALE', 'MALE') "GENDER", EMPLOYEE_NAME FROM EMPLOYEES;

    In HANA : SELECT MAP (GENDER, 'F', 'FEMALE', 'MALE') "GENDER", EMPLOYEE_NAME FROM EMPLOYEES;


    The above statements return employee name and gender (FEMALE or MALE, based on Values F or M stored in column GENDER).


    3. Rounding issues: In HANA, if a column's data type is decimal and a value's scale exceeds the allowed limit, HANA does not round it based on the next digit, unlike Oracle.

         

    For example:

    TABLE NAME: PURCHASE_ORDERS

    COLUMNS: PURCHASEORDERID VARCHAR(20),

              ITEM VARCHAR(30),

              LOCATION VARCHAR(30),

              QUANTITY DECIMAL(15, 3)


    Suppose I want to insert the following row into PURCHASE_ORDERS table:


    INSERT INTO PURCHASE_ORDERS (PURCHASEORDERID, ITEM, LOCATION, QUANTITY) VALUES ('234212', 'Q-123-VE21', 'IND', '200.0009');


    And when I check the Quantity of this record after inserting the record in HANA and Oracle, I get the following result:


    Oracle:

    Oracle.PNG

     

    HANA:

     

    HANA.PNG

    Oracle rounded the quantity based on the following digits (rounding 200.0009 to 200.001, as the scale limit is 3), while HANA did not round the quantity.

     

    4. In Oracle a variable/column of data type DATE can store a timestamp along with the date, but in HANA the DATE data type holds only a date. If you want a timestamp along with the date in HANA, you can use the TIMESTAMP data type.

     

    5. The CONCAT function behaves differently in Oracle and HANA if one or more of the strings to be concatenated is NULL.

               

         For example:

         In Oracle: select concat('a', null) from dual;

                             The output will be 'a'.

              

         In HANA: select concat('a', null) from dummy;

                             The output will be null.

               

    6. Materialized views – Oracle uses materialized views to replicate data to non-master sites in a replication environment and to cache expensive queries in a data warehouse environment. HANA does not have materialized views, as they are expensive and not required in HANA.

     

    7. Regular Expression Functions: Regular expression functions in Oracle like REGEXP_SUBSTR, REGEXP_INSTR, REGEXP_REPLACE, REGEXP_COUNT etc. are not yet implemented in HANA. You can write SQL script procedure/function to implement the same functionality.

     

    8. ROWID pseudocolumn: The pseudocolumn ROWID returns the address of the row in Oracle.


    For example:

    SELECT ROWID, EMPLOYEE_NAME FROM EMPLOYEES WHERE EMP_ID = 123;


    This statement returns the row address in Oracle.


    However, HANA doesn’t offer ROWID functionality. Although there is "$rowid$" in HANA, which can provide similar functionality, it is purely internal and undocumented, hence its use should be avoided.

     

    9. Whether it is Oracle or HANA, nesting of select queries should be avoided as it consumes a lot of memory and degrades performance. Instead, break the query into smaller chunks, put them into table variables and use those variables.
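    A minimal SQLScript sketch of that idea (table and column names are made up; the statements sit inside a procedure body):

    -- Instead of one deeply nested select, split the steps into table variables
    RECENT_ORDERS = SELECT ORDER_ID, CUSTOMER_ID, AMOUNT
                    FROM   ORDERS
                    WHERE  ORDER_DATE >= ADD_DAYS(CURRENT_DATE, -30);

    BIG_ORDERS    = SELECT ORDER_ID, CUSTOMER_ID, AMOUNT
                    FROM   :RECENT_ORDERS
                    WHERE  AMOUNT > 1000;

    SELECT CUSTOMER_ID, SUM(AMOUNT) AS TOTAL_AMOUNT
    FROM   :BIG_ORDERS
    GROUP  BY CUSTOMER_ID;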

    Doing a startup is a risky business. According to a study by Mashable, 75% of all startups and 90% of all products fail. At SAP Startup Focus, we are in the business of helping startups beat those odds. We do this by providing them with an amazing software platform, SAP HANA, on which they can build their solutions, and by helping them with the go-to-market side of things. Currently we have over 1,700 startups from more than 55 countries building solutions on our HANA platform, with 130 validated solutions available for sale to customers.

     

    As compared to a few years ago, it has indeed become a lot easier and a lot cheaper to build an application you can take to market, or shop around to investors. With widespread adoption of the Cloud and increasing usage of Mobile as a consumption mechanism, even scrappy startups can now develop sophisticated products without getting a 2nd mortgage on the house.

     

    And that is where our valued partners at SUSE come in. They understand unequivocally that for startups to succeed in a brutally competitive global environment, they not only need access to cutting-edge technology like SAP HANA, but also need it in a manner that suits their geographic and financial situation. For example, in several emerging markets access to commercial-grade bandwidth remains an ongoing issue, which means that developing on a platform that is only available in the Cloud remains a logistical challenge.

     

    Hence, we are very proud to announce that, starting right now, qualified startups in the SAP Startup Focus program will be eligible to receive, for a 6-month period, a comprehensive developer version of SAP HANA on SUSE Linux Enterprise Server as a single, downloadable Virtual Machine (VM). A single VM reduces the barriers to adoption for the SAP HANA platform and will allow startups to quickly evaluate and rapidly prototype solutions while running a developer version of SAP HANA on their local machine. This will significantly reduce cost and increase the agility of a startup in the initial product development phase of its lifecycle.

     

    Additionally, startups will also receive free training via SUSE’s Certified Linux Administrator (CLA) Academy, a $2,375 USD value. Pre-sales electronic support and membership in SUSE’s partner program operated via PartnerNet®, which provides access to additional SUSE software, are also included.

     

    To download the VM, please visit the SAP HANA Marketplace, the destination for startups to discover, try and buy the SAP HANA platform as well as applications based on SAP HANA, located at http://marketplace.saphana.com/p/3377

    A tablet application that connects to the main HANA server and gives real-time updates (charts + notifications) to the business owner/representative or other concerned party, so that the response time to a situation can be minimized and real-time analysis can be viewed.

     

    Platforms:

    iOS

    Android

    Windows

     

    What do you think? Please add in the comments.

    Idea Link - SAP HANA mobility : View Idea

    ELIMINATING CURSORS IN HANA

     

    INTRODUCTION:

    Using cursors in HANA degrades performance considerably, because they operate on a single record at a time, which causes the table to be read and written many more times. An alternative is to fetch all the records in one go, store them in a temporary table type (table variable) and perform the calculation on the whole set at once, as in the sketch below. I would like to share a few complex use cases where we eliminated the use of cursors.
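    A minimal sketch of the general pattern (the EMPLOYEES and RESULTS tables, their columns and the bonus rule are purely illustrative):

         CREATE PROCEDURE CALC_BONUS
         LANGUAGE SQLSCRIPT AS
         BEGIN
           -- 1. Fetch all relevant records in one read instead of looping with a cursor
           lt_emp = SELECT EMP_ID, SALARY
                    FROM EMPLOYEES
                    WHERE RATING >= 4;

           -- 2. Do the calculation on the whole set at once
           lt_bonus = SELECT EMP_ID, SALARY * 0.10 AS BONUS
                      FROM :lt_emp;

           -- 3. Write the results back in a single set-based statement
           INSERT INTO RESULTS (EMP_ID, BONUS)
           SELECT EMP_ID, BONUS FROM :lt_bonus;
         END;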

     

    USE CASE 1: Read data by fetching the most relevant record and update that record based on the condition.

     

    PROBLEM STATEMENT: We need to select candidates for the Air Force. There are 2 levels of exams, and a single candidate has 2 attempts to clear each level.

    Conditions are:

         1.  There are 3 measures based on which a candidate is tagged as ideal for that level:

              Level 1: Marks >= 70, Age >= 25, Weight >= 70

              Level 2: Marks >= 90, Age >= 30, Weight >= 75

              The order of importance of the measures is (in descending order):

        1. Marks
        2. Age
        3. Weight

         2.  If the candidate has passed Level 2, his job title should be updated to 'Pilot' and he should not be considered in the Level 1 selection. If he has not passed Level 2, check whether he has passed Level 1; if so, update his job title to 'Co-Pilot'.

     

    PERFORMANCE IMPROVEMENTS:

     

    Before :

    performance_3.png

    After:

     

    performance_1.png

     

    Table 1: MEASURE [row count: approx. 100,000]

    ROLL_NO | NAME      | WEIGHT | AGE | MARKS | CATEGORY | LEVEL
    2001    | Vignesh   | 75     | 30  | 60    | B        | 2
    2002    | Prachi    | 75     | 30  | 90    | B        | 2
    1001    | Vignesh   | 70     | 25  | 70    | A        | 1
    1001    | Srivatsan | 70     | 25  | 80    | A        | 1

     

     

     

    Table 2: IDEAL

    CATEGORY | LEVEL | WEIGHT | AGE | MARKS
    B        | 2     | 75     | 30  | 90
    A        | 1     | 70     | 25  | 70

     

    Table 3: ELIGIBILITY

    NAME      | ELIGIBLE
    Prachi    | Y
    Vignesh   | Y
    Srivatsan | Y

     

    Table 4: JOB_TABLE [row count: approx. 100,000]

    NAME      | JOB_TITLE
    Prachi    | Pilot
    Vignesh   | Co-Pilot
    Srivatsan | Co-Pilot

     

     

    Bottlenecks:

    • If the candidate has already been tagged as 'Pilot' in category B, he shouldn't be considered for evaluation in Level 1. This can be done with a NOT IN check, but that again degrades performance.
    • The data should come in a sequence where the candidate's highest relevant score is the one updated for that particular level.

     

    ------CURSOR ELIMINATED     

    -- TEMPORARY TABLE VARIABLE THAT HAS ALL THE ELIGIBLE CANDIDATES FOR LEVEL 1 AND LEVEL 2

    blog_code5.PNG

    -- TEMPORARY TABLE VARIABLE THAT HAS ALL THE ELIGIBLE CANDIDATES FOR LEVEL 2

      blog_code6.PNG


    -- WE DO AN OUTER JOIN TO AVOID SELECTING CANDIDATES WHO CLEARED LEVEL 2

    blog_code7.PNG


    -- SELECT ONLY THE RECORDS WHERE THE 'DEL' FLAG IS NULL, SO THAT WE SELECT CANDIDATES WHO HAVE NOT CLEARED LEVEL 2 BUT ARE ELIGIBLE FOR EVALUATION IN LEVEL 1


      blog_code8.PNG


    CONCLUSION:

    • Assign cursor result sets to table variables and then process the cursor's logic on the whole set at once.
    • The Cartesian product produced by an inner join can be used to generate all the record combinations that would otherwise be processed through nested loops.
    • A NOT IN condition can be eliminated by setting a 'DEL' flag in table A, doing a left outer join with table B and then selecting only the records where the flag is NULL; this ensures those records are not selected again from table B and improves performance drastically when the NOT IN condition runs on a large data set. A sketch of this pattern follows.
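    A minimal sketch of the NOT IN elimination inside a SQLScript procedure (column types and the level conditions are simplified and only illustrative):

         CREATE PROCEDURE SELECT_LEVEL1_CANDIDATES (
             OUT result TABLE (NAME NVARCHAR(100), MARKS INTEGER))
         LANGUAGE SQLSCRIPT READS SQL DATA AS
         BEGIN
           -- candidates who cleared Level 2, flagged with DEL = 'X'
           lt_level2 = SELECT NAME, 'X' AS DEL
                       FROM MEASURE
                       WHERE "LEVEL" = 2 AND MARKS >= 90 AND AGE >= 30 AND WEIGHT >= 75;

           -- left outer join instead of NOT IN: candidates without a Level 2 match keep DEL = NULL
           lt_joined = SELECT m.NAME, m.MARKS, l2.DEL
                       FROM MEASURE AS m
                       LEFT OUTER JOIN :lt_level2 AS l2 ON m.NAME = l2.NAME
                       WHERE m."LEVEL" = 1;

           -- equivalent of NAME NOT IN (Level 2 names), but far cheaper on a large data set
           result = SELECT NAME, MARKS
                    FROM :lt_joined
                    WHERE DEL IS NULL;
         END;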

    Wrap-up

    Hello and welcome back to the last blog post of the movie sentiment rating series. Now it's time to recap what we have done and discuss some future work. First of all, I want to share the series and project with you, as I promised in the first blog. Here you go.

     

    Series

    Real-time sentiment rating of movies on SAP HANA (part 1) - create schema and tables

    Real-time sentiment rating of movies on SAP HANA (part 2) - data preparation

    Real-time sentiment rating of movies on SAP HANA (part 3) - text analysis and modeling

    Real-time sentiment rating of movies on SAP HANA (part 4) - SAP HANA info access

    Real-time sentiment rating of movies on SAP HANA (part 5) - OData services and SAPUI5

    Real-time sentiment rating of movies on SAP HANA (part 6) - wrap-up and looking ahead

     

    Project

    linkinchow/movie · GitHub

     

    After you import the XS project, the project structure should look similar to mine. For better understanding, I made some notes on the structure, and you can easily find these design-time objects in the corresponding blog posts.

     

    40_.PNG

     

    OK. Now let's take a look at what we've done. We've rebuilt the movie sentiment rating app using pure SAP HANA XS and text analysis. And now we have two real-time apps using the same dataset, one for desktop based on SINA and the other for mobile using sap.m. During the implementation, we've used some new XS features released in recent SPSs such as CDS and outbound connectivity in SPS06, job scheduling in SPS07. These features did not exist when I built the first version of this smart app in May 2013. With the rapid development of SAP HANA XS, we can always improve our SAP HANA native apps. Then let's have a quick look at these two apps respectively.

     

    Movie sentiment rating based on SINA

    Major features:

    - Search everything in real-time. With the full text index, you can search movie titles, release dates, studios, sentiments, tokens and so on. For example, you can search tweets about a specific movie posted this evening from an iPhone.

     

    - Analyze everything in real-time. You can analyze how many people like/dislike a certain movie or the overall sentiment of movies in a certain studio. You can even track the sentiment trending. There are lots of things you can analyze.

     

     

    Movie sentiment rating using sap.m

    Major features:

    - For moviegoers: you can see the sentiment rating of newly released movies based on tweet analysis and select your favorite movie in real-time. From the # of mentions you can also infer popularity, and you can even select a release date range to see more.

     

    - You can jump into the details to see what people are saying about this movie right now.

     

     

    Compared with Rotten Tomatoes?

    Out of curiosity I compared the rating of top 10 popular new release movies this week in my app with Rotten Tomatoes. The following result is based on 2014-11-23 04:00:00 UTC.

     

    Movie title                                   | Movie sentiment rating | Rotten Tomatoes
    The Hunger Games: Mockingjay - Part 1         | 7.8                    | 8.2
    Happy Valley                                  | 7.5                    | 7.1
    Food Chains                                   | 8.4                    | 10
    V/H/S: Viral                                  | 5.3                    | 1.8
    The Mule                                      | 8.5                    | 9.1
    A Girl Walks Home Alone at Night              | 9.4                    | 8.2
    Pulp: a Film About Life, Death & Supermarkets | 7.3                    | 8
    Reach Me                                      | 7                      | 7.6
    All Relative                                  | 8.8                    | 9.2
    Bad Hair                                      | 6.4                    | 7.4

     

    Real real-time?

    I watched The Hunger Games: Mockingjay - Part 1 today and have just posted a tweet. As you can see, both apps show my tweet immediately: with the combination of XSJS outbound connectivity and job scheduling, we can crawl tweets in real-time!

     

    41_.png

     

    42_.png

     

    Looking ahead

    Now let's jump into the looking ahead part and discuss some future work about the movie sentiment rating app. There are still lots of things we can improve for this app. I just listed several points as follows.

     

    1. Genres, cast and directors

    Do you remember in the first blog, besides the basic movie metadata, we also have three additional tables "Genres", "AbridgedCast" and "AbridgedDirectors"? Yeah, in the second blog, we searched and inserted data into these tables as well. So why not display genres, cast and directors in the UI? In order to show this info, we can Create Hierarchies in an Attribute View.

     

    2. Geospatial analysis

    The "Tweets" table has two columns which we did not use in our app, i.e., longitude and latitude. Actually we search the location info of tweets in our smart app and if the user provides the info we will store it. You can find the logic in the second blog. With the location info of tweets, we can make some interesting geospatial analysis, e.g., we can analyze the sentiment of movies in each state or for a certain movie/genre, we can compare the sentiment between eastern US and western US. Moreover, if we have age and gender info, we can analyze something more interesting.

     

    3. Search more tweets for popular movies

    Because of API Rate Limits | Twitter Developers, it's impossible for us to search all related tweets. So the current approach is to search tweets about new release movies every two minutes, with every movie handled equally. The problem is that for popular movies there may be thousands of tweets in just two minutes, which we cannot fetch in a single call, so we will miss some. On the other hand, for some unpopular movies there may be no related tweet in an hour, so we do not need to search them so frequently; it's a waste to still search them every two minutes. I think it would be better to create a dynamic mechanism which searches tweets for popular movies more frequently and for unpopular movies less frequently.

     

    4. Average sentiment of tweets

    Currently, if a tweet contains several sentiments, they are regarded as several mentions. So sometimes you will see several consecutive mentions with the same username and the same time which actually belong to the same tweet. To avoid this, we could compute an average sentiment for each tweet, something like 1 positive + 1 negative = 1 neutral; see the sketch below.
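    A sketch of how such a per-tweet average could be computed in SQL (the SENTIMENT_MENTIONS table and TWEET_ID column are illustrative; in the app the sentiment types come from the $TA_ full-text index table):

         SELECT TWEET_ID,
                AVG(CASE TA_TYPE
                      WHEN 'StrongPositiveSentiment' THEN  1
                      WHEN 'WeakPositiveSentiment'   THEN  1
                      WHEN 'NeutralSentiment'        THEN  0
                      WHEN 'WeakNegativeSentiment'   THEN -1
                      WHEN 'StrongNegativeSentiment' THEN -1
                    END) AS AVG_SENTIMENT
         FROM SENTIMENT_MENTIONS
         GROUP BY TWEET_ID;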

     

    5. Local time

    Now the movie app uses UTC time everywhere, which is not a good user experience, especially for a mobile app; users would rather see their local time. So this is also a feature we can improve.

     

    Further resources about the movie app

     

    Blogs

    Real-time sentiment rating of movies on SAP HANA One

    #movienight on SAP HANA One

     

    Videos

     

     

     

    Webinars

    Big Data - Sentiment analysis of movies on SAP HANA One | BrightTALK

    Movie Sentiment Rating App Powered by Text Analysis on HANA One | BrightTALK

     

    That's it! Hope you enjoyed the movie sentiment rating app the same way we did. Why not use it to pick your favorite movie? Have fun!

    Hi,

     

    I have submitted an idea on SAP Idea Incubator on custom Stock In Transit Report :

     

    The main idea behind the report is to help purchasing managers get a detailed view of the status of stock in transit in a way that is useful to them. The standard SAP transaction MB5T does not provide the key information buyers need. The report will help buyers know exactly what status their order is currently in, and more functionality can be built into the report.

     

    If someone is interested in developing this idea, I will be more than happy to provide more input.

     

    Here is the link with more details of the idea.

     

    https://ideas.sap.com/SAPHANAIdeaIncubator/stock-in-transit-report-for-intercompany?current_tab=Recent&row_num=1&getparameters=1

     

    Eagerly awaiting your feedback.

     

    Regards,

    Amit

    Hello All,

     

    In this blog, I explain how to insert records into SAP HANA from a reporting tool like Web Intelligence. The same concept can be used in most reporting tools, such as Dashboards, Lumira and Design Studio, to insert data into SAP HANA.

     

    Please find below the steps to achieve this.

     

     

    1. Create table in SAP HANA

     

    Create a column table named T_B_SALES in SAP HANA with the structure below.

    pic1.png
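    In case the screenshot is hard to read: based on the INSERT statement used later in the XSJS service, a comparable definition would look roughly like this (the column names and lengths here are only illustrative):

         CREATE COLUMN TABLE "UNAME"."T_B_SALES" (
             CID           VARCHAR(20),    -- customer id
             BNAME         VARCHAR(100),   -- book name
             SALE_DATE     DATE,           -- filled with current_date by the service
             NO_OF_COPIES  INTEGER
         );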

     

     

    2. Create SAP HANA Repository

     

     

    Go to Window -> Show View from the top menu bar.

     

    Select SAP HANA Repositories.

     

    pic2.png

     

     

     

    Right click on the Repository Area & create new repository.

     

    pic3.png

     

     

    Give Repository Name and click finish.

     

    pic4.png

     

    You will see the Repository successfully created in SAP HANA.

    pic5.png

     

     

      3. Create XS Application

     

    Go to Window -> Show View from Top Menu Bar and select Project Explorer.

    pic6.png

     

     

    4. In Project Explorer Create New XS Project

     

     

    Right click in the Project Explorer area and choose New -> Project -> SAP HANA Development -> XS Project.

    pic7.png

     

    Give the project a name and click Finish. You will now see an XS Project folder created in the Project Explorer.

    pic8.png

     

    5. Create .xsapp & .xsaccess files in XS Project

     

    Right click on the XS Project folder and select the 'File' option.

    pic9.png

     

     

      Give the file name as .xsapp and keep the file blank. This file tells the XS engine that this package contains an XS application.

    pic10.png

     

    Similarly, create a .xsaccess file and write the code below (shown in the screenshot) in it.

    pic11.png
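    In case the screenshot is hard to read, a minimal .xsaccess that simply exposes the package looks like this (authentication settings can be added as required by your landscape):

         {
             "exposed": true
         }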

     

     

     

    Right click on our XS project XS_BOOK_STORE and create an XS JavaScript source file.

     

    pic12.png

     

     

     

    Name it "insert_purchase_record.xsjs" and write the code below in it.

      var cid = $.request.parameters.get('cid');
      var bname = $.request.parameters.get('bname');
      var no_of_copies = $.request.parameters.get('no_of_copies');

      // allow the WEBI report (served from another host) to call this service
      $.response.headers.set('access-control-allow-origin', '*');

      var conn = $.db.getConnection();
      var pstmt = conn.prepareStatement("INSERT INTO \"UNAME\".\"T_B_SALES\" VALUES ('" + cid + "','" + bname + "',current_date," + no_of_copies + ")");
      pstmt.execute();
      pstmt.close();

      conn.commit();
      conn.close();

      $.response.setBody("Entry has been created successfully.");

     

    Note: UNAME is your SAP HANA user name (the schema that owns T_B_SALES).

     

     

    Share Project

    Right click on the project and go to Team -> Share Project.

    Once your project is shared, a small yellow cylinder will appear on the project icon.

    pic12_1.png

     

     

    Now select all 3 files (.xsapp, .xsaccess and insert_purchase_record.xsjs) and activate them.

    pic15.png

     

     

     

    After successful activation, all files will show the small yellow cylinder on their icons.

    pic14.png

     

     

    6. Create a WEBI report which will write data back to SAP HANA.

    Create a WEBI report with no data source, since we do not want to display any data; we only need to insert data into the HANA DB.

     

    Create 3 Variables in BO as below

    pic15.png

     

     

     

    Define an input control text box for each of the variables.

    pic16.png

    pic17.png

     

     

     

    Create the report with the above variables as below, and lay it out so that it is clear and appealing to end users.

    pic18.png

     

     

     

    Do not forget to set the cell property of the 'Insert' cell to Hyperlink, as below.

    pic19.png

     

     

    Now right click on the 'Insert' text, go to Linking -> Hyperlink, paste the URL below into the window and parse it.

     

    http://<HANA_SERVER>:<PORT>/XS_BOOK_STORE/insert_purchase_record.xsjs?cid=B0000001&bname=INDIA&no_of_copies=1 

     

      PORT: the HANA XS engine port, typically 80<instance number> (e.g. 8000 for instance 00).

    pic20.png

     

     

    Click on Parse. The window below will now appear.

     

    pic21.png

     

     

    Now assign the BOBJ variables to URL parameters, as below.

     

    pic22.png

     

     

    7. Execute the BO Report

     

    Now let’s test our XS application..!

     

    Click on Insert. This will open the browser and prompt you to enter your HANA password.

    pic23.png

     

      After entering the password and clicking OK, you will see the following window.

    pic24.png

     

     

     

    Now check the HANA table T_B_SALES. You will see one entry created in the table!

     

    pic25.png

     

     

    You can now also try to update and delete records in the HANA DB. You can even use the same table on which we are performing DML operations as the source of the BOBJ report, and watch the records change at WEBI report level at runtime.

     

    Hope you found this worth reading! Thanks.
