This blog is added into SAP HANA Cookbook for MySQL Developers | SAP HANA
I am an intern visiting Palo Alto from SAP’s Shanghai office for a month-long project. It’s my first trip to the bay area so I am soaking up all the sun and all the excitement here. Last weekend, I found myself wanting to watch a movie. I searched the internet and found all the new releases listed on rottentomatoes and imdb but it was hard to pick one. I wanted to get a pulse of the movie before I watch it not from the critics but actual movie goers like me. Also, I wanted one which had high buzz not only in US but also in China. So I decided, why don’t I build one myself, after all I am in the heart of Silicon Valley.
I decided to pick SAP HANA One to power my app not just because I got the db & application server in the cloud but also because the platform would support sentiment analysis for English & Simplified Chinese right out-of-the-box! I used the Rotten Tomatoes API to find newly released movies and twitter & Sina Weibo APIs for sentiment for US & China respectively.
Before we start to build the application, we need to get SAP HANA.
If you are interested in using this multilingual sentiment analysis for production or commercial use, you can spin up SAP HANA One.You can find SAP HANA One on AWS Marketplace
If you are interested in just trying out SAP HANA, go with the SAP HANA developer edition. You can follow the steps in How to create a SAP HANA Developer Edition in the cloud
I used SPS6 prerelease version for sentiment analysis in Chinese.
I did most of my work in the HANA Studio which is based on the eclipse IDE so very familiar for Java and other open-source developers.
First, I created a schema and full text index for all the movie metadata, including title, rating, running time, release data, synopsis, etc. Then I used the JTomato (https://github.com/geeordanoh/JTomato) to populate the table. You can find my code via https://github.com/linkinchow/crawler.
MOVIE: Stores movie metadata, including the title, rating, runtime, release date, etc.
Then I used Twitter4J (http://twitter4j.org) to search the movie keywords on Twitter. I found that twitter, given just the keyword, did a good job pulling all combinations of the movie name: fast and furious, fast & furious. You can find my code via https://github.com/linkinchow/crawler.
TWEET: Stores crawled tweets from Twitter, including ID, time, location, content, etc.
However, I ran into problems while crawling Sina Weibo because they have a strict process for usage of their API. So I decided to use Tencent Weibo instead.
TWEET_ZH: Stores crawled tweets from Tencent Weibo
Next I created a fulltext index and sentiment tables (called VoiceOfCustomer) using the following SQL. Voila! I now have sentiment analysis for all twitter and tencent weibo data!
CREATE FULLTEXT INDEX TWEET_I ON TWEET (CONTENT) CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER' ASYNC FLUSH EVERY 1 MINUTES LANGUAGE DETECTION ('EN') TEXT ANALYSIS ON;
CREATE FULLTEXT INDEX TWEET_ZH_I ON TWEET_ZH (CONTENT) CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER' ASYNC FLUSH EVERY 1 MINUTES LANGUAGE DETECTION ('ZH') TEXT ANALYSIS ON;
TWEET_I: Used to perform sentiment analysis for the table TWEET
TWEET_ZH_I: Used to perform sentiment analysis for the table TWEET_ZH
In addition to the tables in SAP HANA and the full text index to perform sentiment analysis, I also wrote stored procedures to wrap complex SQL making it easy for XS (HANA’s application server) to consume.
The final architecture looks like this:
Now, I had to create a formula to quantify rating. I used a very simple formula for this:
Score = (# of strong positive sentiment * 5 + # of weak positive sentiment * 4 + # of neutral sentiment * 3 + # of weak negative sentiment * 2 + # of strong negative sentiment *1) / # of total sentiments
This score would be helpful to rank movies so I could easily pick the top one.
Additionally, I showed a distribution of the sentiments, positive vs. negative vs. neutral, so I could better understand how strong or weak people’s opinion was on the movie both in US & in China.
The application should be built on XS Engine to prevent data transfer latency between the database and the web application server so users can access the website directly. The application was built in the following steps:
Step 1: Create stored procedures for rating and sentiment analysis
Currently, there are two stored procedures in the app. One is for rating and the other is for sentiment analysis:
1. Rating
We can use the following SQLs to create the type and the stored procedure:
CREATE TYPE MOVIEINFO AS TABLE (
POSTER NVARCHAR(100),
TITLE NVARCHAR(100),
RATING DECIMAL(5, 2),
NUM INTEGER,
TITLE_ZH NVARCHAR(100),
RATING_ZH DECIMAL(5, 2),
NUM_ZH INTEGER,
YEAR INTEGER,
MPAA_RATING NVARCHAR(100),
RUNTIME NVARCHAR(100),
CRITICS_CONSENSUS NVARCHAR(2000),
RELEASE_DATE DATE,
SYNOPSIS NVARCHAR(2000),
ID INTEGER
);
CREATE PROCEDURE GETMOVIEINFO(OUT RESULT MOVIEINFO) LANGUAGE SQLSCRIPT READS SQL DATA AS
BEGIN
RESULT =
SELECT A.POSTER, A.TITLE, B.RATING, B.NUM, A.TITLE_ZH, C.RATING_ZH, C.NUM_ZH, A.YEAR, A.MPAA_RATING, A.RUNTIME, A.CRITICS_CONSENSUS, A.RELEASE_DATE, A.SYNOPSIS, A.ID
FROM MOVIE A
INNER JOIN
(SELECT ID, CASE SUM(NUM) WHEN 0 THEN 0 ELSE TO_DECIMAL(SUM(TOTAL) / SUM(NUM), 5, 2) END AS RATING, SUM(NUM) AS NUM FROM
(SELECT
COUNT(C.TA_TYPE) AS NUM,
CASE C.TA_TYPE
WHEN 'StrongPositiveSentiment' THEN COUNT(C.TA_TYPE) * 5
WHEN 'WeakPositiveSentiment' THEN COUNT(C.TA_TYPE) * 4
WHEN 'NeutralSentiment' THEN COUNT(C.TA_TYPE) * 3
WHEN 'WeakNegativeSentiment' THEN COUNT(C.TA_TYPE) * 2
WHEN 'StrongNegativeSentiment' THEN COUNT(C.TA_TYPE) * 1
END AS TOTAL
FROM MOVIE A
LEFT JOIN TWEET B
ON A.ID = B.MOVIEID
LEFT JOIN "$TA_TWEET_I" C
ON B.ID = C.ID AND C.TA_TYPE IN ('StrongPositiveSentiment', 'WeakPositiveSentiment', 'NeutralSentiment', 'WeakNegativeSentiment', 'StrongNegativeSentiment')
GROUP BY
GROUP BY ID) B ON A.ID = B.ID
INNER JOIN
(SELECT ID, CASE SUM(NUM) WHEN 0 THEN 0 ELSE TO_DECIMAL(SUM(TOTAL) / SUM(NUM), 5, 2) END AS RATING_ZH, SUM(NUM) AS NUM_ZH FROM
(SELECT
COUNT(C.TA_TYPE) AS NUM,
CASE C.TA_TYPE
WHEN 'StrongPositiveSentiment' THEN COUNT(C.TA_TYPE) * 5
WHEN 'WeakPositiveSentiment' THEN COUNT(C.TA_TYPE) * 4
WHEN 'NeutralSentiment' THEN COUNT(C.TA_TYPE) * 3
WHEN 'WeakNegativeSentiment' THEN COUNT(C.TA_TYPE) * 2
WHEN 'StrongNegativeSentiment' THEN COUNT(C.TA_TYPE) * 1
END AS TOTAL
FROM MOVIE A
LEFT JOIN TWEET_ZH B
ON A.ID = B.MOVIEID
LEFT JOIN "$TA_TWEET_ZH_I" C
ON B.ID = C.ID AND C.TA_TYPE IN ('StrongPositiveSentiment', 'WeakPositiveSentiment', 'NeutralSentiment', 'WeakNegativeSentiment', 'StrongNegativeSentiment')
GROUP BY
GROUP BY ID) C ON A.ID = C.ID
ORDER BY B.RATING DESC
;
END;
After creating the type and the stored procedure successfully, we can use the following SQL to test:
CALL GETMOVIEINFO(?);
From the column “RATING” and “RATING_ZH”, we can show the score on the main page.
2. Sentiment analysis
We can use the following SQLs to create the type and the stored procedure:
CREATE TYPE SENTIMENT AS TABLE (SENTIMENT NVARCHAR(100), NUM INTEGER);
CREATE PROCEDURE GETSENTIMENT(IN ID INTEGER, IN LANG VARCHAR(2), OUT RESULT SENTIMENT) LANGUAGE SQLSCRIPT READS SQL DATA AS
BEGIN
IF LANG = 'EN' THEN
RESULT = SELECT 'Strong Positive' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_I" A
INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B
ON A.ID = B.ID
WHERE A.TA_TYPE = 'StrongPositiveSentiment'
UNION ALL
SELECT 'Weak Positive' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_I" A
INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B
ON A.ID = B.ID
WHERE A.TA_TYPE = 'WeakPositiveSentiment'
UNION ALL
SELECT 'Neutral' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_I" A
INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B
ON A.ID = B.ID
WHERE A.TA_TYPE = 'NeutralSentiment'
UNION ALL
SELECT 'Weak Negative' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_I" A
INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B
ON A.ID = B.ID
WHERE A.TA_TYPE = 'WeakNegativeSentiment'
UNION ALL
SELECT 'Strong Negative' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_I" A
INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B
ON A.ID = B.ID
WHERE A.TA_TYPE = 'StrongNegativeSentiment';
ELSEIF LANG = 'ZH' THEN
RESULT = SELECT '很好' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_ZH_I" A
INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B
ON A.ID = B.ID
WHERE A.TA_TYPE = 'StrongPositiveSentiment'
UNION ALL
SELECT '好' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_ZH_I" A
INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B
ON A.ID = B.ID
WHERE A.TA_TYPE = 'WeakPositiveSentiment'
UNION ALL
SELECT '一般' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_ZH_I" A
INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B
ON A.ID = B.ID
WHERE A.TA_TYPE = 'NeutralSentiment'
UNION ALL
SELECT '差' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_ZH_I" A
INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B
ON A.ID = B.ID
WHERE A.TA_TYPE = 'WeakNegativeSentiment'
UNION ALL
SELECT '很差' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_ZH_I" A
INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B
ON A.ID = B.ID
WHERE A.TA_TYPE = 'StrongNegativeSentiment';
END IF;
END;
After creating the type and the stored procedure successfully, we can use the following SQLs to test:
CALL GETSENTIMENT(771313125, 'EN', ?);
CALL GETSENTIMENT(771313125, 'ZH', ?);
Step 2: Build the application based on XS Engine
Till now, we can access the tables, indexes, data and stored procedures directly from the XS Engine. To build the application, follow the following steps:
1. Create .xsaccess, .xsapp and .xsprivileges to do the access control.
2. Create getMovies.xsjs to call the stored procedure “GETMOVIEINFO”
function createEntry(rs) {
return {
"poster" : rs.getNString(1),
"title" : rs.getNString(2),
"rating": rs.getDecimal(3),
"num": rs.getInteger(4),
"title_zh" : rs.getNString(5),
"rating_zh": rs.getDecimal(6),
"num_zh": rs.getInteger(7),
"year": rs.getInteger(8),
"mpaa_rating": rs.getNString(9),
"runtime": rs.getNString(10),
"critics_consensus": rs.getNString(11),
"release_date": rs.getDate(12),
"synopsis": rs.getNString(13),
"id": rs.getInteger(14)
};
}
try {
var body = '';
var list = [];
var query = "{CALL SMARTAPP.GETMOVIEINFO(?)}";
$.trace.debug(query);
var conn = $.db.getConnection();
var pcall = conn.prepareCall(query);
pcall.execute();
var rs = pcall.getResultSet();
while (rs.next()) {
list.push(createEntry(rs));
}
rs.close();
pcall.close();
conn.close();
body = JSON.stringify({
"entries" : list
});
$.response.contentType = 'application/json; charset=UTF-8';
$.response.setBody(body);
$.response.status = $.net.http.OK;
} catch (e) {
$.response.status = $.net.http.INTERNAL_SERVER_ERROR;
$.response.setBody(e.message);
}
3. Create getSentiment.xsjs to call the stored procedure “GETSENTIMENT”
function createEntry(rs) {
return {
"sentiment" : rs.getString(1),
"num" : rs.getInteger(2)
};
}
try {
var id = parseInt($.request.parameters.get("id"));
var lang = $.request.parameters.get("lang");
var body = '';
var list = [];
var query = "{CALL SMARTAPP.GETSENTIMENT(?, ?, ?)}";
$.trace.debug(query);
var conn = $.db.getConnection();
var pcall = conn.prepareCall(query);
pcall.setInteger(1, id);
pcall.setString(2, lang);
pcall.execute();
var rs = pcall.getResultSet();
while (rs.next()) {
list.push(createEntry(rs));
}
rs.close();
pcall.close();
conn.close();
body = JSON.stringify({
"entries" : list
});
$.response.contentType = 'application/json; charset=UTF-8';
$.response.setBody(body);
$.response.status = $.net.http.OK;
} catch (e) {
$.response.status = $.net.http.INTERNAL_SERVER_ERROR;
$.response.setBody(e.message);
}
4. Create index.html and code the HTML part.
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<title>Real-Time Movie Rating</title>
<script src="/sap/ui5/1/resources/sap-ui-core.js"
id="sap-ui-bootstrap"
data-sap-ui-libs="sap.ui.commons,sap.ui.ux3,sap.viz"
data-sap-ui-theme="sap_goldreflection">
</script>
<!-- add sap.ui.table,sap.ui.ux3 and/or other libraries to 'data-sap-ui-libs' if required -->
<script>
sap.ui.localResources("movieui");
var view = sap.ui.view({id:"idMovieMatrix1", viewName:"movieui.MovieMatrix", type:sap.ui.core.mvc.ViewType.JS});
view.placeAt("content");
</script>
</head>
<body class="sapUiBody" role="application">
<h1>Real-Time Movie Rating</h1>
<div id="content"></div>
</body>
</html>
5. Create some views and controllers to use native SAP UI 5 to accelerate building the application.
The live webapp is available at (http://107.20.137.184:8000/workshop/sessionx/00/ui/MovieUI/WebContent/index.html) but I bring down the AWS instance to reduce the billing cost. I have captured screenshots and a brief video if you find the server is down.
Real-time movie rating homepage
The following screenshot is the app’s main page. For each movie, there are two scores: the upper score is from Twitter and the lower score from Tencent Weibo.
I heard a lot of buzz about “Man of Steel” but it is currently ranked No. 7 so I was really curious. “Man of Steel” had a 3.72 rating but “20 Feet from Stardom” had a 4.54 rating. Interesting! Looking closer I discovered that this was because “20 Feet” had only 351 mentions but “Man of Steel” had more than 20K, meaning that a popular movie may not necessarily be the one with the highest score but could also be one which has the most buzz.
I then created a page with detailed breakdown of the sentiments of the Movie’s sentiments for both Twitter and Tencent Weibo. Looks like “Man of Steel” has a higher positive sentiment in China compared to the US. Well not surprising, we like superhero movies and Superman is our favorite.
Sentiment/Social Media | Tencent Weibo | |||
# | %age | # | %age | |
Strong Positive | 9,986 | 44% | 528 | 34% |
Weak Positive | 5,903 | 26% | 723 | 47% |
Neutral | 839 | 4% | 12 | 1% |
Weak Negative | 2,067 | 9% | 123 | 8% |
Strong Negative | 3,757 | 17% | 166 | 11% |
Let's see what the score on Rotten Tomatoes looks like. The critics have given it a meager 56% but 82% of the audience liked it. That number is compares well with 70% positive sentiment rating from my real-time rating app.
"20 Feet from stardom" has 97% rating from critics and 100% from the audience on rottentomatoes. So my real-time rating app was able to successfully identify this hit from social sentiment on twitter. Looks like the movie is a sleeper hit!
This application is just a prototype now and I hope to make more enhancements to the drill-down page. For the next version, I want to use the predictive libraries in SAP HANA to create a recommendation engine for movie based on a user’s interests, something like a “Pandora for movies”. Hope you enjoyed reading my blog.
We get requests to publish the potential multilingual sentiment analysis core service used in this movie rating demo as an API to support similar requests from other industries. SAP HANA SPS6 supports insert/update data through oData service. So, we want to provide a set of RESTful web service calls to help application developers to access this feature easily and fast based on SPS6. With this API, developer just needs to provide the search keyword to query the Twitter data. And the output will be the sentiment analysis result (positive/negative summary or sentiment trend) based on the automatically loaded Twitter data.
Do you think this API as a Service is useful? Please share your feedback in the comment!
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
38 | |
19 | |
13 | |
13 | |
11 | |
10 | |
10 | |
10 | |
8 | |
8 |