Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
Former Member

This blog is added into SAP HANA Cookbook for MySQL Developers | SAP HANA

I am an intern visiting Palo Alto from SAP’s Shanghai office for a month-long project. It’s my first trip to the bay area so I am soaking up all the sun and all the excitement here. Last weekend, I found myself wanting to watch a movie. I searched the internet and found all the new releases listed on rottentomatoes and imdb but it was hard to pick one. I wanted to get a pulse of the movie before I watch it not from the critics but actual movie goers like me. Also, I wanted one which had high buzz not only in US but also in China. So I decided, why don’t I build one myself, after all I am in the heart of Silicon Valley.

I decided to pick SAP HANA One to power my app not just because I got the db & application server in the cloud but also because the platform would support sentiment analysis for English & Simplified Chinese right out-of-the-box! I used the Rotten Tomatoes API to find newly released movies and twitter & Sina Weibo APIs for sentiment for US & China respectively.

Prerequisites

Before we start to build the application, we need to get SAP HANA.

If you are interested in using this multilingual sentiment analysis for production or commercial use, you can spin up SAP HANA One.You can find SAP HANA One on AWS Marketplace

If you are interested in just trying out SAP HANA, go with the SAP HANA developer edition. You can follow the steps in How to create a SAP HANA Developer Edition in the cloud

I used SPS6 prerelease version for sentiment analysis in Chinese.

Schema

I did most of my work in the HANA Studio which is based on the eclipse IDE so very familiar for Java and other open-source developers.

First, I created a schema and full text index for all the movie metadata, including title, rating, running time, release data, synopsis, etc. Then I used the JTomato (https://github.com/geeordanoh/JTomato) to populate the table. You can find my code via https://github.com/linkinchow/crawler.

MOVIE: Stores movie metadata, including the title, rating, runtime, release date, etc.

Then I used Twitter4J (http://twitter4j.org) to search the movie keywords on Twitter. I found that twitter, given just the keyword, did a good job pulling all combinations of the movie name: fast and furious, fast & furious. You can find my code via https://github.com/linkinchow/crawler.

TWEET: Stores crawled tweets from Twitter, including ID, time, location, content, etc.

However, I ran into problems while crawling Sina Weibo because they have a strict process for usage of their API. So I decided to use Tencent Weibo instead.

TWEET_ZH: Stores crawled tweets from Tencent Weibo

Next I created a fulltext index and sentiment tables (called VoiceOfCustomer) using the following SQL. Voila! I now have sentiment analysis for all twitter and tencent weibo data!

CREATE FULLTEXT INDEX TWEET_I ON TWEET (CONTENT) CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER' ASYNC FLUSH EVERY 1 MINUTES LANGUAGE DETECTION ('EN') TEXT ANALYSIS ON;


CREATE FULLTEXT INDEX TWEET_ZH_I ON TWEET_ZH (CONTENT) CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER' ASYNC FLUSH EVERY 1 MINUTES LANGUAGE DETECTION ('ZH') TEXT ANALYSIS ON;

TWEET_I: Used to perform sentiment analysis for the table TWEET

TWEET_ZH_I: Used to perform sentiment analysis for the table TWEET_ZH

In addition to the tables in SAP HANA and the full text index to perform sentiment analysis, I also wrote stored procedures to wrap complex SQL making it easy for XS (HANA’s application server) to consume.

Architecture

The final architecture looks like this:

Rating


Now, I had to create a formula to quantify rating. I used a very simple formula for this:

Score = (# of strong positive sentiment * 5 + # of weak positive sentiment * 4 + # of neutral sentiment * 3 + # of weak negative sentiment * 2 + # of strong negative sentiment *1) / # of total sentiments

This score would be helpful to rank movies so I could easily pick the top one.

Additionally, I showed a distribution of the sentiments, positive vs. negative vs. neutral, so I could better understand how strong or weak people’s opinion was on the movie both in US & in China.

XS Application

The application should be built on XS Engine to prevent data transfer latency between the database and the web application server so users can access the website directly. The application was built in the following steps:

Step 1: Create stored procedures for rating and sentiment analysis

Currently, there are two stored procedures in the app. One is for rating and the other is for sentiment analysis:

1. Rating

We can use the following SQLs to create the type and the stored procedure:

CREATE TYPE MOVIEINFO AS TABLE (

      POSTER NVARCHAR(100),

      TITLE NVARCHAR(100),

      RATING DECIMAL(5, 2),

      NUM INTEGER,

      TITLE_ZH NVARCHAR(100),

      RATING_ZH DECIMAL(5, 2),

      NUM_ZH INTEGER,

      YEAR INTEGER,

      MPAA_RATING NVARCHAR(100),

      RUNTIME NVARCHAR(100),

      CRITICS_CONSENSUS NVARCHAR(2000),

      RELEASE_DATE DATE,

      SYNOPSIS NVARCHAR(2000),

      ID INTEGER

);

CREATE PROCEDURE GETMOVIEINFO(OUT RESULT MOVIEINFO) LANGUAGE SQLSCRIPT READS SQL DATA AS

BEGIN

RESULT =

SELECT A.POSTER, A.TITLE, B.RATING, B.NUM, A.TITLE_ZH, C.RATING_ZH, C.NUM_ZH, A.YEAR, A.MPAA_RATING, A.RUNTIME, A.CRITICS_CONSENSUS, A.RELEASE_DATE, A.SYNOPSIS, A.ID

FROM MOVIE A

INNER JOIN

(SELECT ID, CASE SUM(NUM) WHEN 0 THEN 0 ELSE TO_DECIMAL(SUM(TOTAL) / SUM(NUM), 5, 2) END AS RATING, SUM(NUM) AS NUM FROM

(SELECT

  1. A.ID,
  2. C.TA_TYPE,

COUNT(C.TA_TYPE) AS NUM,

CASE C.TA_TYPE

      WHEN 'StrongPositiveSentiment' THEN COUNT(C.TA_TYPE) * 5

      WHEN 'WeakPositiveSentiment' THEN COUNT(C.TA_TYPE) * 4

      WHEN 'NeutralSentiment' THEN COUNT(C.TA_TYPE) * 3

      WHEN 'WeakNegativeSentiment' THEN COUNT(C.TA_TYPE) * 2

      WHEN 'StrongNegativeSentiment' THEN COUNT(C.TA_TYPE) * 1

END AS TOTAL

FROM MOVIE A

LEFT JOIN TWEET B

ON A.ID = B.MOVIEID

LEFT JOIN "$TA_TWEET_I" C

ON B.ID = C.ID AND C.TA_TYPE IN ('StrongPositiveSentiment', 'WeakPositiveSentiment', 'NeutralSentiment', 'WeakNegativeSentiment', 'StrongNegativeSentiment')

GROUP BY

  1. A.ID,
  2. C.TA_TYPE) A

GROUP BY ID) B ON A.ID = B.ID

INNER JOIN

(SELECT ID, CASE SUM(NUM) WHEN 0 THEN 0 ELSE TO_DECIMAL(SUM(TOTAL) / SUM(NUM), 5, 2) END AS RATING_ZH, SUM(NUM) AS NUM_ZH FROM

(SELECT

  1. A.ID,
  2. C.TA_TYPE,

COUNT(C.TA_TYPE) AS NUM,

CASE C.TA_TYPE

      WHEN 'StrongPositiveSentiment' THEN COUNT(C.TA_TYPE) * 5

      WHEN 'WeakPositiveSentiment' THEN COUNT(C.TA_TYPE) * 4

      WHEN 'NeutralSentiment' THEN COUNT(C.TA_TYPE) * 3

      WHEN 'WeakNegativeSentiment' THEN COUNT(C.TA_TYPE) * 2

      WHEN 'StrongNegativeSentiment' THEN COUNT(C.TA_TYPE) * 1

END AS TOTAL

FROM MOVIE A

LEFT JOIN TWEET_ZH B

ON A.ID = B.MOVIEID

LEFT JOIN "$TA_TWEET_ZH_I" C

ON B.ID = C.ID AND C.TA_TYPE IN ('StrongPositiveSentiment', 'WeakPositiveSentiment', 'NeutralSentiment', 'WeakNegativeSentiment', 'StrongNegativeSentiment')

GROUP BY

  1. A.ID,
  2. C.TA_TYPE) A

GROUP BY ID) C ON A.ID = C.ID

ORDER BY B.RATING DESC

;

END;


After creating the type and the stored procedure successfully, we can use the following SQL to test:

CALL GETMOVIEINFO(?);

From the column “RATING” and “RATING_ZH”, we can show the score on the main page.

2. Sentiment analysis

We can use the following SQLs to create the type and the stored procedure:

CREATE TYPE SENTIMENT AS TABLE (SENTIMENT NVARCHAR(100), NUM INTEGER);

CREATE PROCEDURE GETSENTIMENT(IN ID INTEGER, IN LANG VARCHAR(2), OUT RESULT SENTIMENT) LANGUAGE SQLSCRIPT READS SQL DATA AS

BEGIN

      IF LANG = 'EN' THEN

      RESULT = SELECT 'Strong Positive' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_I" A

            INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B

            ON A.ID = B.ID

            WHERE A.TA_TYPE = 'StrongPositiveSentiment'

            UNION ALL

            SELECT 'Weak Positive' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_I" A

            INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B

            ON A.ID = B.ID

            WHERE A.TA_TYPE = 'WeakPositiveSentiment'

            UNION ALL

            SELECT 'Neutral' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_I" A

            INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B

            ON A.ID = B.ID

            WHERE A.TA_TYPE = 'NeutralSentiment'

            UNION ALL

            SELECT 'Weak Negative' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_I" A

            INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B

            ON A.ID = B.ID

            WHERE A.TA_TYPE = 'WeakNegativeSentiment'

            UNION ALL

            SELECT 'Strong Negative' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_I" A

            INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B

            ON A.ID = B.ID

            WHERE A.TA_TYPE = 'StrongNegativeSentiment';

      ELSEIF LANG = 'ZH' THEN

      RESULT = SELECT '很好' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_ZH_I" A

            INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B

            ON A.ID = B.ID

            WHERE A.TA_TYPE = 'StrongPositiveSentiment'

            UNION ALL

            SELECT '' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_ZH_I" A

            INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B

            ON A.ID = B.ID

            WHERE A.TA_TYPE = 'WeakPositiveSentiment'

            UNION ALL

            SELECT '一般' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_ZH_I" A

            INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B

            ON A.ID = B.ID

            WHERE A.TA_TYPE = 'NeutralSentiment'

            UNION ALL

            SELECT '' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_ZH_I" A

            INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B

            ON A.ID = B.ID

            WHERE A.TA_TYPE = 'WeakNegativeSentiment'

            UNION ALL

            SELECT '很差' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_ZH_I" A

            INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B

            ON A.ID = B.ID

            WHERE A.TA_TYPE = 'StrongNegativeSentiment';

      END IF;

END;


After creating the type and the stored procedure successfully, we can use the following SQLs to test:

CALL GETSENTIMENT(771313125, 'EN', ?);

CALL GETSENTIMENT(771313125, 'ZH', ?);

Step 2: Build the application based on XS Engine

Till now, we can access the tables, indexes, data and stored procedures directly from the XS Engine. To build the application, follow the following steps:

1. Create .xsaccess, .xsapp and .xsprivileges to do the access control.


2. Create getMovies.xsjs to call the stored procedure “GETMOVIEINFO

function createEntry(rs) {

      return {

            "poster" : rs.getNString(1),

            "title" : rs.getNString(2),

            "rating": rs.getDecimal(3),

            "num": rs.getInteger(4),

            "title_zh" : rs.getNString(5),

            "rating_zh": rs.getDecimal(6),

            "num_zh": rs.getInteger(7),

            "year": rs.getInteger(8),

            "mpaa_rating": rs.getNString(9),

            "runtime": rs.getNString(10),

            "critics_consensus": rs.getNString(11),

            "release_date": rs.getDate(12),

            "synopsis": rs.getNString(13),

            "id": rs.getInteger(14)

      };

}

try {

      var body = '';

      var list = [];

      var query = "{CALL SMARTAPP.GETMOVIEINFO(?)}";

      $.trace.debug(query);

      var conn = $.db.getConnection();

      var pcall = conn.prepareCall(query);

      pcall.execute();

      var rs = pcall.getResultSet();

      while (rs.next()) {

            list.push(createEntry(rs));

      }

      rs.close();

      pcall.close();

      conn.close();

      body = JSON.stringify({

            "entries" : list

      });

      $.response.contentType = 'application/json; charset=UTF-8';

      $.response.setBody(body);

      $.response.status = $.net.http.OK;

} catch (e) {

      $.response.status = $.net.http.INTERNAL_SERVER_ERROR;

      $.response.setBody(e.message);

}

3. Create getSentiment.xsjs to call the stored procedure “GETSENTIMENT”

function createEntry(rs) {

      return {

            "sentiment" : rs.getString(1),

            "num" : rs.getInteger(2)

      };

}

try {

      var id = parseInt($.request.parameters.get("id"));

      var lang = $.request.parameters.get("lang");

      var body = '';

      var list = [];

      var query = "{CALL SMARTAPP.GETSENTIMENT(?, ?, ?)}";

      $.trace.debug(query);

      var conn = $.db.getConnection();

      var pcall = conn.prepareCall(query);

      pcall.setInteger(1, id);

      pcall.setString(2, lang);

      pcall.execute();

      var rs = pcall.getResultSet();

      while (rs.next()) {

            list.push(createEntry(rs));

      }

      rs.close();

      pcall.close();

      conn.close();

      body = JSON.stringify({

            "entries" : list

      });

      $.response.contentType = 'application/json; charset=UTF-8';

      $.response.setBody(body);

      $.response.status = $.net.http.OK;

} catch (e) {

      $.response.status = $.net.http.INTERNAL_SERVER_ERROR;

      $.response.setBody(e.message);

}

4. Create index.html and code the HTML part.

<!DOCTYPE HTML>

<html>

      <head>

            <meta http-equiv="X-UA-Compatible" content="IE=edge">

            <title>Real-Time Movie Rating</title>

            <script src="/sap/ui5/1/resources/sap-ui-core.js"

                        id="sap-ui-bootstrap"

                        data-sap-ui-libs="sap.ui.commons,sap.ui.ux3,sap.viz"

                        data-sap-ui-theme="sap_goldreflection">

            </script>

            <!-- add sap.ui.table,sap.ui.ux3 and/or other libraries to 'data-sap-ui-libs' if required -->

            <script>

                        sap.ui.localResources("movieui");

                        var view = sap.ui.view({id:"idMovieMatrix1", viewName:"movieui.MovieMatrix", type:sap.ui.core.mvc.ViewType.JS});

                        view.placeAt("content");

            </script>

      </head>

      <body class="sapUiBody" role="application">

            <h1>Real-Time Movie Rating</h1>

            <div id="content"></div>

      </body>

</html>

5. Create some views and controllers to use native SAP UI 5 to accelerate building the application.

Website

The live webapp is available at (http://107.20.137.184:8000/workshop/sessionx/00/ui/MovieUI/WebContent/index.html) but I bring down the AWS instance to reduce the billing cost. I have captured screenshots and a brief video if you find the server is down.

Real-time movie rating homepage

The following screenshot is the app’s main page. For each movie, there are two scores: the upper score is from Twitter and the lower score from Tencent Weibo.

I heard a lot of buzz about “Man of Steel” but it is currently ranked No. 7 so I was really curious. “Man of Steel” had a 3.72 rating but “20 Feet from Stardom” had a 4.54 rating. Interesting! Looking closer I discovered that this was because “20 Feet” had only 351 mentions but “Man of Steel” had more than 20K, meaning that a popular movie may not necessarily be the one with the highest score but could also be one which has the most buzz.     

I then created a page with detailed breakdown of the sentiments of the Movie’s sentiments for both Twitter and Tencent Weibo. Looks like “Man of Steel” has a higher positive sentiment in China compared to the US. Well not surprising, we like superhero movies and Superman is our favorite.

Sentiment/Social Media

Twitter

Tencent Weibo

#

%age

#

%age

Strong Positive

9,986

44%

528

34%

Weak Positive

5,903

26%

723

47%

Neutral

839

4%

12

1%

Weak Negative

2,067

9%

123

8%

Strong Negative

3,757

17%

166

11%

Let's see what the score on Rotten Tomatoes looks like. The critics have given it a meager 56% but 82% of the audience liked it. That number is compares well with 70% positive sentiment rating from my real-time rating app.

"20 Feet from stardom" has 97% rating from critics and 100% from the audience on rottentomatoes. So my real-time rating app was able to successfully identify this hit from social sentiment on twitter. Looks like the movie is a sleeper hit!

This application is just a prototype now and I hope to make more enhancements to the drill-down page. For the next version, I want to use the predictive libraries in SAP HANA to create a recommendation engine for movie based on a user’s interests, something like a “Pandora for movies”. Hope you enjoyed reading my blog.

What's next

We get requests to publish the potential multilingual sentiment analysis core service used in this movie rating demo as an API to support similar requests from other industries. SAP HANA SPS6 supports insert/update data through oData service. So, we want to provide a set of RESTful web service calls to help application developers to access this feature easily and fast based on SPS6. With this API, developer just needs to provide the search keyword to query the Twitter data. And the output will be the sentiment analysis result (positive/negative summary or sentiment trend) based on the automatically loaded Twitter data.

Do you think this API as a Service is useful? Please share your feedback in the comment!

28 Comments