Predicting My Next Twitter Follower with SAP HANA PAL

Former Member · ‎09-02-2013

I am very lazy when it comes to social networks. I would love to have thousands of followers on Twitter, but I don’t have the will to tweet frequently enough to grow my follower count. Regardless, I regularly check my Twitter account expecting that some new follower will magically come my way, and when one does, I feel like I accomplished something. I know it’s silly, but I can’t help it.

Anyway, I wondered if I could use the SAP HANA Predictive Analysis Library (PAL) to see who my next follower will be. SAP introduced many new features with the release of SPS06, and one of those is the Link Prediction algorithm in PAL. Predicting links in social networks is not new; it has been around for many years. The algorithm tries to answer the following question: given a snapshot of a social network, can we predict which new interactions among its members are likely to occur in the near future? This is commonly known as the link prediction problem, and there are multiple approaches based on measures of the “proximity” of the different nodes in a network.

When we say social networks, we don’t only mean Twitter or Facebook; the same idea applies to, for example, the employees of a company. The algorithm is also often used in fraud prevention to detect missing nodes (fraudsters) in criminal networks.

As I already said, there are multiple ways to approach the link prediction problem. In PAL specifically, four methods are implemented to compute the distance between any two existing nodes using the existing links in a network:

Common Neighbours
Jaccard's Coefficient
Adamic/Adar
Katz

I’m not going to get into the details of how the different methods work; for that, you can take a look at the PAL User Guide. Instead, I’m going to get my hands on it ;).
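For a rough feel of what the four measures compute, here is a minimal sketch in plain Python. It follows the textbook definitions, not PAL’s internals: the toy edge list is made up, the network is treated as undirected, and the Katz sum is cut off at a hypothetical `max_len`. The `beta` default simply mirrors the BETA value used further down in this post.

```python
import math

# Hypothetical toy network, treated as undirected for simplicity.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)]

# Build a neighbour set for every node.
neighbours = {}
for a, b in edges:
    neighbours.setdefault(a, set()).add(b)
    neighbours.setdefault(b, set()).add(a)


def common_neighbours(x, y):
    """Number of nodes connected to both x and y."""
    return len(neighbours[x] & neighbours[y])


def jaccard(x, y):
    """Shared neighbours divided by all neighbours of either node."""
    union = neighbours[x] | neighbours[y]
    return len(neighbours[x] & neighbours[y]) / len(union) if union else 0.0


def adamic_adar(x, y):
    """Shared neighbours, where rarely connected ones weigh more."""
    shared = neighbours[x] & neighbours[y]
    return sum(1.0 / math.log(len(neighbours[z])) for z in shared)


def katz(x, y, beta=0.005, max_len=4):
    """Count walks from x to y, damping longer walks by beta**length.

    The series is truncated at max_len, which is an assumption of this sketch.
    """
    score = 0.0
    walks = {x: 1}  # number of walks of the current length from x to each node
    for length in range(1, max_len + 1):
        nxt = {}
        for node, count in walks.items():
            for nb in neighbours[node]:
                nxt[nb] = nxt.get(nb, 0) + count
        walks = nxt
        score += (beta ** length) * walks.get(y, 0)
    return score


# Score the pair (1, 4), which is not linked in the toy network.
for measure in (common_neighbours, jaccard, adamic_adar, katz):
    print(measure.__name__, measure(1, 4))
```

Nodes 1 and 4 share two neighbours (2 and 3), so every measure gives them a non-zero score, while a pair such as (1, 5) with no shared neighbour scores zero on the first three.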

I want to predict my next Twitter follower, so the first thing I need to do is download data from Twitter that I can use to train the algorithm. For that I’m going to use Python, more specifically a Python library called Tweepy, which is basically a wrapper around the Twitter API.

First we need to set up Python to be able to connect to HANA. If you don’t know how to do this, you can take a look at this wonderful post by Blag that shows how to do it.

Then, we need to download and install Tweepy (https://github.com/tweepy/tweepy).

Now that we are all set, we can start downloading data from Twitter. I’m going to create a Column Table in HANA to store the data.

CREATE COLUMN TABLE LINK_PREDICT( FOLLOWER INTEGER, FOLLOWING INTEGER );

First I’m going to download my followers list by running the following Python script. I don’t have a lot of followers, so this will only take a couple of seconds.

import tweepy
import dbapi

consumer_key = "..."         # Your Consumer Key
consumer_secret = "..."      # Your Consumer Secret
access_token = "..."         # Your Access Token
access_token_secret = "..."  # Your Access Token Secret

# Authenticate against the Twitter API
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# Connect to HANA
con = dbapi.connect('hana_host', 30015, 'SYSTEM', 'password')
cur = con.cursor()

# Replace userid with your numeric Twitter User ID (the column is an INTEGER)
my_user_id = 'userid'

# Save one (FOLLOWER, FOLLOWING) row per follower, each pointing at me
for user in tweepy.Cursor(api.followers_ids, screen_name="LukiSpa").items():
    cur.execute("INSERT INTO LINK_PREDICT VALUES(?,?)", (user, my_user_id))

Next, I would like to get the followers of my followers, so I’m going to run the Python script below. Beware that Twitter limits the number of requests you can make to the API. To avoid exceeding that limit and getting an error message, I wait 60 seconds before making each new call. That means this code can run for quite a long time, so I suggest running it overnight.

import tweepy
import dbapi
import time

consumer_key = "..."
consumer_secret = "..."
access_token = "..."
access_token_secret = "..."

# Authenticate against the Twitter API
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# Connect to HANA
con = dbapi.connect('hana_host', 30015, 'SYSTEM', 'password')
cur = con.cursor()

# Read back the followers stored by the first script
query = "SELECT FOLLOWER FROM LINK_PREDICT"
cur.execute(query)
ret = cur.fetchall()

for row in ret:
    ids = []
    # Fetch this follower's followers page by page, pausing between API calls
    for page in tweepy.Cursor(api.followers_ids, id=row[0]).pages():
        ids.extend(page)
        time.sleep(60)
    # Each of them follows row[0]
    for user in ids:
        cur.execute("INSERT INTO LINK_PREDICT VALUES(?,?)", (user, row[0]))
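The fixed 60-second pause in the script above can be pulled into a small helper that only sleeps for whatever is left of the interval, so time already spent on inserts counts towards the wait. This is a minimal sketch, not part of Tweepy: `throttled`, `min_interval`, `sleep` and `clock` are names made up for illustration.

```python
import time


def throttled(iterable, min_interval=60.0, sleep=time.sleep, clock=time.monotonic):
    """Yield items from iterable, keeping fetches at least min_interval seconds apart.

    sleep and clock are injectable so the helper can be tested without waiting.
    """
    iterator = iter(iterable)
    last = None  # time of the previous fetch, None before the first one
    while True:
        if last is not None:
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        try:
            item = next(iterator)  # with a Tweepy cursor, this is the API call
        except StopIteration:
            return
        last = clock()
        yield item


# Tiny demo with no waiting; with Tweepy the loop would look like
#   for page in throttled(tweepy.Cursor(api.followers_ids, id=row[0]).pages()):
demo_pages = list(throttled([[1, 2], [3]], min_interval=0))
print(demo_pages)
```

Like the original loop, the helper still waits once more before it finds out that there are no pages left.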

And finally, I want to download my followings plus the followings of my followers (besides me):

import tweepy
import dbapi
import time

consumer_key = "..."
consumer_secret = "..."
access_token = "..."
access_token_secret = "..."

# Authenticate against the Twitter API
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# Connect to HANA
con = dbapi.connect('hana_host', 30015, 'SYSTEM', 'password')
cur = con.cursor()

# Every account that is being followed so far: me and my followers
query = "SELECT DISTINCT FOLLOWING FROM LINK_PREDICT"
cur.execute(query)
ret = cur.fetchall()

for row in ret:
    ids = []
    # Fetch the accounts this user follows, pausing between API calls
    for page in tweepy.Cursor(api.friends_ids, id=row[0]).pages():
        ids.extend(page)
        time.sleep(60)
    # row[0] follows each of them
    for user in ids:
        cur.execute("INSERT INTO LINK_PREDICT VALUES(?,?)", (row[0], user))
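Because the three scripts overlap (for example, a follower who follows another of my followers is found both as a follower and as a following), the table can end up with the same pair more than once. Whether that matters to PAL is not something this post establishes, but filtering duplicates before inserting is cheap. The sketch below is an assumption-laden illustration: `insert_unique_edges` is a made-up helper, the sample pairs are invented, and `sqlite3` stands in for the HANA connection only so the example runs anywhere.

```python
import sqlite3


def insert_unique_edges(cur, edges, existing=()):
    """Insert (follower, following) pairs that were not seen before.

    Returns the number of rows written. 'existing' may hold pairs that
    are already in the table.
    """
    seen = set(existing)
    new_rows = []
    for edge in edges:
        if edge not in seen:
            seen.add(edge)
            new_rows.append(edge)
    # One batched call instead of one execute() per row
    cur.executemany("INSERT INTO LINK_PREDICT VALUES(?,?)", new_rows)
    return len(new_rows)


# sqlite3 is only a stand-in for HANA here
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE LINK_PREDICT(FOLLOWER INTEGER, FOLLOWING INTEGER)")

# Invented sample pairs; (1, 2) appears twice, (2, 1) is a different direction
written = insert_unique_edges(cur, [(1, 2), (3, 2), (1, 2), (2, 1)])
cur.execute("SELECT COUNT(*) FROM LINK_PREDICT")
count = cur.fetchone()[0]
print(written, count)
```

An alternative with the same effect would be to leave the scripts as they are and copy the table through a `SELECT DISTINCT` afterwards.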

Now I’m ready to run the Link Prediction algorithm. I wanted to run it using the AFM (Application Function Modeler), but for some reason this algorithm is not available in the tools palette. I’m not sure if this is a bug or something wrong with my PAL implementation (any comments here will be much appreciated), so I will need to do it the old way.

First I create the procedure by calling the AFL Wrapper Generator:

SET SCHEMA MYSCHEMA;

-- Table types for the input data, the result and the control parameters
DROP TYPE PAL_LP_DATA_T;
CREATE TYPE PAL_LP_DATA_T AS TABLE("FOLLOWER" INTEGER, "FOLLOWING" INTEGER);

DROP TYPE PAL_LP_RESULT_T;
CREATE TYPE PAL_LP_RESULT_T AS TABLE("FOLLOWER" INTEGER, "FOLLOWING" INTEGER, "SCORE" DOUBLE);

DROP TYPE PAL_CONTROL_T;
CREATE TYPE PAL_CONTROL_T AS TABLE( "NAME" VARCHAR(100), "INT_ARGS" INTEGER, "DOUBLE_ARGS" DOUBLE, "STRING_ARGS" VARCHAR(100));

-- Signature table: which type is used for which parameter, and its direction
DROP TABLE PAL_LP_PDATA_TBL;
CREATE COLUMN TABLE PAL_LP_PDATA_TBL( "ID" INTEGER, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100));
INSERT INTO PAL_LP_PDATA_TBL VALUES (1,'MYSCHEMA.PAL_LP_DATA_T','in');
INSERT INTO PAL_LP_PDATA_TBL VALUES (2,'MYSCHEMA.PAL_CONTROL_T','in');
INSERT INTO PAL_LP_PDATA_TBL VALUES (3,'MYSCHEMA.PAL_LP_RESULT_T','out');

-- Generate the PREDICT_FOLLOWER procedure for the LINKPREDICTION function
CALL SYSTEM.afl_wrapper_generator('PREDICT_FOLLOWER','AFLPAL','LINKPREDICTION', PAL_LP_PDATA_TBL);

And then I execute the procedure with the data I downloaded from Twitter:

SET SCHEMA MYSCHEMA;

-- Control parameters for the run
DROP TABLE #PAL_CONTROL_TBL;
CREATE LOCAL TEMPORARY COLUMN TABLE #PAL_CONTROL_TBL LIKE PAL_CONTROL_T;
INSERT INTO #PAL_CONTROL_TBL VALUES ('THREAD_NUMBER', 2, null, null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('METHOD', 1, null, null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('BETA', null, 0.005, null);

-- Result table for the predicted links
DROP TABLE LP_RESULT;
CREATE COLUMN TABLE LP_RESULT LIKE PAL_LP_RESULT_T;

-- Run the generated procedure on the Twitter data
CALL _SYS_AFL.PREDICT_FOLLOWER(LINK_PREDICT, #PAL_CONTROL_TBL, LP_RESULT) with overview;

Let’s take a look at the results.

Hmmm, seems like I will have one new follower. Let’s see on tweeterid.com who he/she is.

@atul_vaikul, I have no idea who you are but I’m here waiting mate! :)

We went through all this trouble to find my next follower, but that’s not all. I can also find out in the results who I should be following.

@SAPCommNet is the twitter account of SCN – I was surprised that I didn’t already follow it. Same with @SAPinMemory, almost a no-brainer to follow and @JohannesSchnatz is in fact blogging a lot about SAP and SAP HANA. I don’t really share his interest for SAP HCM (and fishing), but we are both guitar players, as it seems!
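The way the result table is read above (rows pointing at me are possible new followers, rows starting from me are accounts I might follow) can be sketched in a few lines of Python. This assumes rows shaped like `PAL_LP_RESULT_T`, that is (FOLLOWER, FOLLOWING, SCORE), and that a higher score means a more likely link. The helper name, the IDs and the scores are all invented for illustration.

```python
def split_predictions(rows, me):
    """Split (follower, following, score) rows into two lists ranked by score.

    Returns (likely_followers, suggested_followings) for the user id 'me'.
    """
    def by_score(row):
        return row[2]

    likely_followers = sorted(
        (r for r in rows if r[1] == me), key=by_score, reverse=True
    )
    suggested_followings = sorted(
        (r for r in rows if r[0] == me), key=by_score, reverse=True
    )
    return likely_followers, suggested_followings


# Invented rows: user 1 plays the role of "me"
rows = [(101, 1, 3.0), (1, 202, 5.0), (1, 203, 2.0), (300, 400, 9.0)]
likely_followers, suggested_followings = split_predictions(rows, me=1)
print(likely_followers)
print(suggested_followings)
```

Rows that do not involve `me` at all, such as the last one in the sample, are simply ignored.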

Hope you liked it!

Follow me on Twitter: @LukiSpa (especially you, @atul_vaikul)

Information in Spanish about SAP HANA™:

www.HablemosHANA.com
