Text Analysis of IPL Match using Twitter Data (Par...

Hi Friends,

I am back with a continuation of my earlier blog post:

http://scn.sap.com/community/hana-in-memory/blog/2014/05/16/text-analysis-of-ipl-match-using-twitter...

In this part, we will focus on custom dictionaries.

Recap:

The screenshot shows that when the SQL query is executed for TA_TYPE = 'PERSON', Virat Kohli and Ashwin each appear several times in separate rows.

Why this happens:

Comments on Twitter are entered by many different users, and how a name is written depends on the individual.

It is therefore common for the same cricketer's name to appear in several different forms.

To standardize the names and make analysis easier, we need to create custom dictionaries so that the system returns one uniform name when the SQL is executed.
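To make the idea concrete, here is a minimal conceptual sketch in Python of what a custom dictionary achieves: several spelling variants are mapped to one standard form before counting. This is not how HANA implements it and it does not use any HANA API. The variant list, the sample tokens, and the helper names standard_name and count_mentions are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical spelling variants. A real custom dictionary would list the
# variants actually observed in the tweets.
VARIANTS_TO_STANDARD = {
    "virat kohli": "Virat Kohli",
    "kohli": "Virat Kohli",
    "v kohli": "Virat Kohli",
    "ashwin": "Ravichandran Ashwin",
    "r ashwin": "Ravichandran Ashwin",
}


def standard_name(token: str) -> str:
    """Return the standard form for a known variant, else the token unchanged."""
    # Lowercase, treat dots as spaces, and collapse repeated whitespace.
    key = " ".join(token.lower().replace(".", " ").split())
    return VARIANTS_TO_STANDARD.get(key, token)


def count_mentions(tokens, normalize):
    """Count mentions, optionally mapping each token to its standard form."""
    return Counter(standard_name(t) if normalize else t for t in tokens)


# Made-up PERSON tokens standing in for rows of the text analysis result.
tokens = ["Virat Kohli", "Kohli", "V. Kohli", "Ashwin", "R Ashwin", "Kohli"]

raw = count_mentions(tokens, normalize=False)
uniform = count_mentions(tokens, normalize=True)

print(len(raw), "distinct names before normalization")
print(len(uniform), "distinct names after normalization")
print(dict(uniform))
```

Without the mapping, each variant is counted as its own name, which is the repetition seen in the query result. With the mapping, all variants roll up under one standard name per player.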

Now let’s see how this can be achieved:

Create custom HANA Text Analysis configuration file

In HANA Studio, create a workspace, then create and share a project.

Under this project, create a new file with the extension “hdbtextconfig”.

Copy all the contents of one of the predefined configurations delivered by SAP. They are located in the HANA repository package “sap.hana.ta.config”.

For this exercise, let’s copy the contents of the configuration file “EXTRACTION_CORE_VOICEOFCUSTOMER”.

For details, see “Creating a Text Analysis Configuration”, Section 10.1.3.2.1 of the HANA Developer Guide SPS07: http://help.sap.com/hana/SAP_HANA_Developer_Guide_en.pdf

In the next document, I will show how to create a custom dictionary and add it to the custom configuration we just created, so that analysis of the Twitter data no longer returns repeated names when the SQL is run.

