rahul_aware
My previous blog post was on how to deploy a UIMA annotator on HCP as a REST service. I have been playing around with various UIMA annotators that do some wonderful stuff extracting information from unstructured data. I started with ConceptMapper, which finds concepts in the source text by comparing it with a concept dictionary loaded in memory. It helps in identifying and enriching the concepts you are interested in. It has been widely used in the medical field to analyze medical records and patient histories, where medical terms and their properties are well documented (National Library of Medicine).
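
To make the dictionary idea concrete, here is a toy sketch in JavaScript. It is not ConceptMapper's actual algorithm, only the basic idea of matching text against an in-memory dictionary; the dictionary entries, the function name findConcepts and the shape of the result objects are all invented for illustration. It also matches plain substrings, so it ignores word boundaries.

```javascript
// Toy dictionary lookup: NOT ConceptMapper's actual algorithm, just the idea
// of matching text against an in-memory dictionary of terms.
// The dictionary entries below are invented for illustration.
var dictionary = {
  'myocardial infarction': { concept: 'HeartAttack' },
  'aspirin': { concept: 'Drug' }
};

// Hypothetical helper: return every dictionary term found in the text,
// with its concept label and character offsets.
function findConcepts(text, dict) {
  var lower = text.toLowerCase();
  var hits = [];
  Object.keys(dict).forEach(function (term) {
    var from = 0;
    var at;
    // Plain substring search, repeated until no further match is found.
    while ((at = lower.indexOf(term, from)) !== -1) {
      hits.push({ term: term, concept: dict[term].concept, begin: at, end: at + term.length });
      from = at + term.length;
    }
  });
  // Report hits in document order.
  return hits.sort(function (a, b) { return a.begin - b.begin; });
}

var conceptHits = findConcepts('Aspirin was given after the myocardial infarction.', dictionary);
console.log(conceptHits);
```

A real annotator would additionally handle tokenization, word order and spelling variants, which this sketch leaves out.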

 

I was thinking hard to find an application for ConceptMapper running on HCP that could demonstrate its strength. I started with the SAP HCP glossary as the source of dictionary terms and played with some blog text to find the named entities. The idea was to classify blogs based on glossary terms: for example, find all the blogs related to Cloud connectivity services, blogs related to document services, and so on. However, I was not very happy with the dictionary based on glossary terms. The results from ConceptMapper based on this dictionary had many problems. I had to abandon this idea for a newer one: finding person names in text.

 

The NameFinder service is implemented as a REST service running on NetWeaver Cloud. You can try it at:

https://namefinders0007950666trial.nwtrial.ondemand.com/uima-simple-server-concept/?mode=form

 

Enter some sample text with person names in it. Here is some content taken from a wiki.



Submit the text to the NameFinder annotator. You will see the result with <NameAnnotation> XML tags.
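
If you want to pick the names out of such a response in plain JavaScript, something like the following could work. This is only a sketch: the sample string is made up for illustration, the helper name extractNames is hypothetical, and it assumes each detected name is wrapped in a <NameAnnotation> element with no nested tags. The real service output may carry attributes or a different wrapper element.

```javascript
// Hypothetical helper: collect the text inside every <NameAnnotation> element.
// Assumes names are wrapped as <NameAnnotation ...>Name</NameAnnotation>
// with no nested tags inside the element.
function extractNames(xmlText) {
  var pattern = /<NameAnnotation\b[^>]*>([^<]*)<\/NameAnnotation>/g;
  var names = [];
  var match;
  while ((match = pattern.exec(xmlText)) !== null) {
    var name = match[1].trim();
    if (name.length > 0) {
      names.push(name);
    }
  }
  return names;
}

// Made-up sample, not actual service output.
var sampleResponse = '<result><NameAnnotation>Alan Turing</NameAnnotation> was born in London. ' +
  'He worked with <NameAnnotation>Gordon Welchman</NameAnnotation>.</result>';
console.log(extractNames(sampleResponse));
```

A regular expression is enough for flat annotations like these; for anything nested, a proper XML parser would be the safer choice.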

 



The NameFinder annotator is based on the OpenNLP toolkit, which uses machine learning to process natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron-based machine learning.

 

This REST service can be easily consumed with the following SAPUI5 JavaScript code, which shows the NameAnnotation elements in a table:

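              // XML model that will hold the annotated response
              var oModel = new sap.ui.model.xml.XMLModel();

              $.ajax({
                  url: 'http://localhost:8080/uima-simple-server-concept/',
                  type: 'POST',
                  // encode the text so that characters like & or + do not break the form body
                  data: 'text=' + encodeURIComponent(InputText) + '&mode=inline',
                  dataType: 'xml',
                  success: function(xml) {
                     // pass the parsed XML document to the model and bind the table rows
                     oModel.setData(xml);
                     var oTable = sap.ui.getCore().byId("cTable");
                     oTable.setModel(oModel);
                     oTable.bindRows("/NameAnnotation");
                  }
              });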

 

The next challenge was to find a good source of unstructured data that could be programmatically fed to the NameFinder REST API. The ‘scnReader’ project by tom.vandoorslaer came in handy. Thank you, Tom, for your wonderful work here. In no time, I was able to import this SAPUI5 project into Eclipse and add my own code to read RSS feed items, pass them to the NameFinder REST API, and show the names in a table.
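
The step from a feed item to a request could look roughly like this. It is a sketch, not the code from the scnReader project: the helper names toPlainText and buildRequestBody, the item shape (title and description) and the sample item are assumptions for illustration. Only the 'text' and 'mode' parameters come from the $.ajax call in this post.

```javascript
// Hypothetical helper: reduce an RSS item's HTML description to plain text.
// A crude regex strip, adequate for a sketch but not a full HTML parser
// (HTML entities such as &amp; are left as they are).
function toPlainText(html) {
  return html
    .replace(/<[^>]*>/g, ' ')   // drop tags
    .replace(/\s+/g, ' ')       // collapse whitespace
    .trim();
}

// Build the form-encoded POST body using the same 'text' and 'mode'
// parameters as the $.ajax call in this post.
function buildRequestBody(item) {
  var text = item.title + '. ' + toPlainText(item.description);
  return 'text=' + encodeURIComponent(text) + '&mode=inline';
}

// Made-up feed item for illustration.
var feedItem = {
  title: 'Hello Cloud',
  description: '<p>Written by <b>Jane Doe</b></p>'
};
console.log(buildRequestBody(feedItem));
```

Encoding the text matters here because blog content routinely contains characters such as & and +, which would otherwise be read as part of the form syntax.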

 

I added a couple of input fields to the SCN reader to take any RSS feed link and the number of items to fetch.



As a sample, I used the NetWeaver Cloud Developer Center RSS feed for blogs.

 



On entering the RSS feed link and submitting, it fetches the 10 most recent items and shows them in the table.



 

Now we have a blog list with its content that can be sent to the NameFinder REST service for analysis. On clicking ‘Get Names’, the following result is shown in the ‘Identified English Names’ table:



On scrolling down:



Scrolling down further:





In total, 18 names are identified from the 10 most recent blog posts in the Cloud community. With some more JavaScript, the number of occurrences of each name can be calculated for comparison. With some effort, these results can be persisted in a database (HANA?) along with additional information such as blog category (tags), blog link, results from ConceptMapper, and date and time, to come up with a real-world application.
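
Counting the occurrences could be done along these lines. This is a minimal sketch: the function name countNames and the input list are made up, and treating names as equal after trimming and lower-casing is my assumption about what should count as the same name.

```javascript
// Count how often each name occurs and return entries sorted by count,
// highest first. Names are compared after trimming and lower-casing,
// which is an assumption about what should count as "the same name".
function countNames(names) {
  var counts = new Map();
  names.forEach(function (raw) {
    var display = raw.trim();
    if (display.length === 0) { return; }
    var key = display.toLowerCase();
    // Keep the first spelling seen as the display name.
    var entry = counts.get(key) || { name: display, count: 0 };
    entry.count += 1;
    counts.set(key, entry);
  });
  // Sort by count descending, then alphabetically for ties.
  return Array.from(counts.values()).sort(function (a, b) {
    return b.count - a.count || a.name.localeCompare(b.name);
  });
}

// Made-up input for illustration.
var rankedNames = countNames(['Jane Doe', 'John Smith', 'jane doe ', 'Jane Doe']);
console.log(rankedNames);
```

The resulting array of { name, count } objects could then be put into a JSON model and bound to a table in the same way as the names themselves.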

 

So now you know who is famous in the SAP NetWeaver Cloud community. Some of the NameAnnotations are not person names. These mis-hits are due to the machine learning algorithm and the name finder model. The models can be enriched and retrained using the OpenNLP library to improve accuracy.