Former Member

Hi Friends, 

I am back with a continuation of my earlier blog:

http://scn.sap.com/community/hana-in-memory/blog/2014/05/16/text-analysis-of-ipl-match-using-twitter...

In this part of the series, we will focus on custom dictionaries.

Recap:

The screenshot below shows that when the SQL query is executed for TA_TYPE = 'PERSON', Virat Kohli and Ashwin each appear several times in separate rows.

Why this happens:

Different Twitter users write their comments in different ways, so the same cricketer's name commonly shows up in several spellings and forms.

To standardize the names and make analysis easier, we need to create custom dictionaries so that the system returns one uniform name when the SQL is executed.
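To make the idea concrete, here is a minimal Python sketch of what a dictionary of variant names does conceptually. It is not how HANA implements custom dictionaries; the variant lists and the function names `standardize` and `count_mentions` are assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical variant lists, for illustration only. In HANA these would live
# in a custom dictionary, not in Python code.
STANDARD_FORMS = {
    "Virat Kohli": ["virat kohli", "kohli", "virat", "v kohli"],
    "Ashwin": ["ashwin", "r ashwin"],
}

# Invert the mapping so that every variant points to its standard form.
VARIANT_TO_STANDARD = {
    variant: standard
    for standard, variants in STANDARD_FORMS.items()
    for variant in variants
}


def standardize(token: str) -> str:
    """Return the standard form of a name, or the token unchanged if unknown."""
    # Lowercase, treat dots as spaces, and collapse repeated whitespace.
    key = " ".join(token.lower().replace(".", " ").split())
    return VARIANT_TO_STANDARD.get(key, token)


def count_mentions(tokens):
    """Count mentions per standard form instead of per raw spelling."""
    return Counter(standardize(token) for token in tokens)
```

With a mapping like this, `count_mentions(["Kohli", "virat kohli", "R. Ashwin"])` groups the first two entries under "Virat Kohli" instead of listing them as separate rows, which is the effect we want from the custom dictionary.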

Now let’s see how this can be achieved:

Create custom HANA Text Analysis configuration file

In HANA Studio, create a workspace, then create and share a project.

Under this project, create a new file with the extension “hdbtextconfig”.

Copy the entire contents of one of the predefined configurations delivered by SAP. They are located in the HANA repository package “sap.hana.ta.config”.

For this exercise, let’s copy the contents of the configuration file “EXTRACTION_CORE_VOICEOFCUSTOMER”.

For details, see “Creating a Text Analysis Configuration”, section 10.1.3.2.1 of the SAP HANA Developer Guide (SPS07): http://help.sap.com/hana/SAP_HANA_Developer_Guide_en.pdf

In the next document, I will show how to create a custom dictionary and add it to the custom configuration we just created, so that the analysis of the Twitter data no longer returns repeated names when the SQL is run.
