I have downloaded SAP Lumira and created this easy test project with some dummy data.


My intention was just to get a T-shirt, but I hope to play with this tool in the future because I think it is nice.



Jacob P George

SAP Data Geek Challenge

Posted by Jacob P George Nov 29, 2013

I have downloaded SAP Lumira and played around with the prediction tool.

The data I used is the CO2 emission data available for different countries of the world.

The following picture shows the top 50 countries ranked by CO2 emissions, as I analysed them.

I don't know the practical application of these charts. I have written this blog for the GC T-shirts.



My original intention was just to get a T-shirt, but I actually started liking this tool. There is sample data for my favorite cricket player, Sachin Tendulkar, at http://scn.sap.com/docs/DOC-31433, so I started looking at his stats. The figure I like the most is his number of centuries against Australia. With Australia being such a dominant side, it shows how classy a player he is. The stats pretty much tell the truth. I am a BASIS consultant and normally install the tools for our customers, so I like the usability side of it.


























Hundreds of thousands of people have downloaded SAP Lumira, personal edition in the last few months, and one of the questions I hear the most is:


SAP Lumira works well on Excel data, but can I use it with other data sources?


The short answer is: You can.

We have 2 editions of SAP Lumira. Personal Edition is absolutely free; go to www.saplumira.com to download the application. This edition allows you to connect to Excel data (CSV, XLS and XLSX) and cloud data hosted on SAP HANA One.


Standard Edition is the other one.

With Standard Edition, you can also connect to a variety of other data sources. SP13, our latest release, can connect to the following data sources:


Microsoft SQL Server: 2008, 2012
Oracle: 10, 11
SQL Anywhere: 12
Adaptive Server Enterprise: 15.5
Teradata: 12, 13, 14
IBM DB2: V9, V10 for z/OS, and LUW
Netezza: Netezza Server 4, Netezza Server 5, Netezza Server 6
PostgreSQL: 9
Generic JDBC
Generic OData: 2.0
Hadoop Hive: 0.10
SAP HANA Database: 1.0
SAP ERP 6: SAP Java Connector
R/3 Release 4: SAP Java Connector
mySAP ERP 2004: Java Connector



SAP Lumira, standard edition also connects to

  • SAP BW data exposed as views in SAP HANA


How to try SAP Lumira, standard edition for free for 30 days?

1. Install SAP Lumira, personal edition and start the application

2. From Home, click on the 30-Day Trial button at the bottom (as shown below)



3. That's it! Enjoy connecting to all the sources listed above for 30 days for free!


And don't worry: at the end of the 30-day trial, your copy of Lumira won't be locked down. You will still be able to enjoy all the capabilities of SAP Lumira, personal edition with your Excel data. And if Standard Edition is a must for you, you can purchase it by clicking on the Buy Now button (as shown above); it will bring you directly to our eStore.


Having trouble or want some help?

If you experience problems or just want to ask a question, contact one of our SAP Lumira experts. They are here to help you. And it's free. Yes, totally free!

Reach them by phone (+1 855 558-6472), email (support.lumira@sap.com), or Twitter (@saplumiraexpert).

Last night I was fortunate enough to attend a class presented by the author Andy Kirk, a super speedy whistle-stop tour through lots of content from his recent book "Data Visualisation - a successful design process". It was an unplanned event for me: I responded to a tweet saying there was 1 ticket left, and as the event was taking place just 10 minutes' walk from my company's London office, I jumped at the chance.




It was a great class, not only for those new to the craft of Data Visualisation; it also had some great nuggets for those wanting to optimize their design approach.


As part of the class there were practical activities, and one of note was to explore a data set on obesity trends from the World Health Authority.  Delegates were asked to look at the dataset in Excel and identify data variables and ranges, ways they might transform the data and, crucially, some possible data questions and interrogations.  Andy then shared his answers and his data visualisations with the class.   This activity got me thinking ....


Could I use SAP Lumira to undertake these tasks and recreate the data visualisations ?


Andy kindly shared the dataset with all the delegates, and on my train journey home I started to play.  I loaded the dataset into SAP Lumira; it had only 11,580 rows and 14 columns.  Not a large dataset by any means, but packed with insights to be drawn out.   I then set out to reproduce each of the visualisations discussed in the class using SAP Lumira.



Visualisation 1


Andy Kirk




SAP Lumira

Not too difficult, but I had to reduce the number of data items shown to the top 50 so that I could see, and therefore select (highlight), the USA and UK.



Visualisation 2


Andy Kirk




SAP Lumira





Visualisation 3


Andy Kirk





SAP Lumira


Now this was the toughest one in the set, as SAP Lumira doesn't have this chart component type in its library (an opportunity for an enterprising developer using the SDK, maybe).  I worked through a number of different ideas, but sadly had to pivot the data outside of SAP Lumira in Excel and add multiple data sets to get the data into a shape that would drive the visualisations I had in mind. This method worked easily and I will repeat this approach in the future.




Not as easy to read as the original




You know I just love Pie charts !!  This kind of works but the scale doesn't expose the differences well enough.



My thinking turned a corner ...   In Andy's visualisation you are comparing the gap between the BMI of each gender by region; you could call this the variance between the gender scores.  So I set to work on showing the variance between the genders as an absolute number and then plotting how it changes by region and between 1980 and 2009.


You can easily see the changes in the variance; for example, in Africa it widened by nearly 1 full point between 1980 and 2009.  This approach worked, but it masked the gender split.
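The gap calculation described above can be sketched in a few lines of Python. This is only an illustration: the row layout (region, year, gender, mean BMI) and every value in it are placeholders I made up, not figures from the class dataset, and in practice the same step could be done in Excel or as a calculated measure.

```python
# Hypothetical rows: (region, year, gender, mean BMI).
# The values are placeholders, not figures from the class dataset.
rows = [
    ("Africa", 1980, "Female", 22.0),
    ("Africa", 1980, "Male", 21.0),
    ("Africa", 2009, "Female", 24.5),
    ("Africa", 2009, "Male", 22.5),
]


def gender_gap(rows):
    """Return {(region, year): absolute female-minus-male BMI gap}."""
    by_key = {}
    for region, year, gender, bmi in rows:
        by_key.setdefault((region, year), {})[gender] = bmi
    return {
        key: abs(values["Female"] - values["Male"])
        for key, values in by_key.items()
        if "Female" in values and "Male" in values
    }


gaps = gender_gap(rows)
# How much the gap changed between the two years for one region.
widening = gaps[("Africa", 2009)] - gaps[("Africa", 1980)]
```

Plotting `gaps` by region and year gives the single-measure view described above, which is also why the gender split itself disappears from the chart.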


Then again my thinking moved on to using a Radar Chart which I think is the clearest and closest to the original representation by Andy Kirk.






Visualisations 4 & 5


Andy Kirk





SAP Lumira





There seems to be a "bug" in the Asc and Desc sorts in SP13, but the visualisations are pretty much there.



Visualisations 6 & 7


Andy Kirk




SAP Lumira






Visualisation 8


Andy Kirk




SAP Lumira


Now this didn't look tough until you think about the volume of data points plotted.  193 countries x 30 years = 5790 in a 6 zone trellis chart.  Sadly SAP Lumira couldn't render the chart.




BUT .....   If you can build one for one region then there is further possibility:




Use the new COMPOSE feature to build a storyboard with all 5 regions displayed.






It took me about 3 hours to prepare and explore the data and build all the visualisations, and I'm really happy with the results.  With more time spent refining the titles, colours etc., I think SAP Lumira could really step up to the mark in delivering high quality Data Visualisations.




Content reproduced with the kind permission of Andy Kirk, visualisation blogger, designer, consultant, author, teacher, trainer and speaker


Hey Everyone!!


We've just had another great Wednesday, where we took to Google+ LIVE and talked about the week and Lumira.


We, the LEX (Lumira Expert) team wanted to share that with you!!







Enjoy the video, we'll be breaking it down and adding all the documents we used shortly!!









Hello forum. After some days fighting with SLD, SMSY, LMDB and more, I got really frustrated by how little control I have over my Solution Manager data in a PCOE process. As I explained in some previous comments and posts, I try to download data from SAP Marketplace, import more system details from the SLDs of the managed systems, and so on. I decided to publish this blog post here rather than on the Lumira forums because it is more interesting for us to see how to analyse that job with this tool than it is for Lumira people, for whom it is surely not relevant or interesting enough given its complexity.


While explaining those problems to the coffee machine, I remembered that we can use Lumira to analyse Solution Manager data. So why not try to read the result of that job with Lumira? Let's see how I can set up the job log so that it can be analysed with Lumira.


1.- Export the job result, which contains around 3600 lines of error messages, to Excel. This test was done on a test system.


2.- Modify the Excel layout to prepare the data. This is the boring part, because you have to create filters to hide some useless messages, sort the result by message type/class, and create some formulas so that you can build "indicators" in Lumira.




These are the columns that I added to get indicators in Lumira (you can do that in Lumira, but for me it is easier in Excel).




An example of the formulas:


=IFERROR(IF(FIND("authorization not found for customer";C2;1);MID(C2;6;11);"");"")

=IFERROR(IF(FIND("authorization not found for customer";C2;1);MID(C2;81;13));"")


=IFERROR(IF(FIND("when calling SAP backend";C2;1);MID(C2;14;12);"");"")
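For readers who would rather do this step outside Excel, here is a rough Python equivalent of the first formula above. It is a sketch, not what I actually ran: the helper names `mid` and `extract` are my own, and the sample log line is invented, so the positions only line up with that made-up text.

```python
def mid(text, start, length):
    """Mimic Excel's MID: 1-based start position, fixed length."""
    return text[start - 1:start - 1 + length]


def extract(message, marker, start, length):
    """Return the MID slice when marker occurs in message, else ''.

    Like Excel's FIND, the `in` test is case-sensitive, and a missing
    marker yields an empty string (the IFERROR fallback).
    """
    return mid(message, start, length) if marker in message else ""


# Hypothetical log line; real messages from the job will look different.
msg = "User S0001234567 authorization not found for customer"
customer_user = extract(msg, "authorization not found for customer", 6, 11)
```

The fixed positions are as fragile here as they are in Excel: if the message layout changes, the slice silently returns the wrong characters.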



3.- Import it to Lumira.




4.- After the import you have to prepare the data in Lumira: create the indicators on the left side and select different tools to display the information.



5.- For my test system, this is the result, where I can see the different errors of "Refresh_admin_data_from_support" grouped by error type, with the option to display each group in a different chart. The errors are grouped as:


- Wrong mapping in AISUSER table for user XXXX

- Number of deleted users

- Systems that have been created successfully

- Systems that can be created

- Errors on installations, e.g. the user doesn't have authorization for the installation

- Warnings about installations that have not been flagged for download

- Number of duplicated systems downloaded from SMSY (that was a surprise for me; I couldn't imagine that this could happen while downloading data from Marketplace)

- Customers with no SAProuter updated on Marketplace

- Users without authorization to maintain system data in Marketplace.
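The grouping itself is just a count per category. As a minimal sketch, assuming each log row has already been tagged with a category during the Excel preparation (the category labels and rows below are invented for illustration):

```python
from collections import Counter


def count_by_category(rows):
    """Count log rows per error category, most frequent first."""
    return Counter(category for category, _text in rows).most_common()


# Hypothetical (category, message text) pairs standing in for the job log.
sample_rows = [
    ("Wrong AISUSER mapping", "..."),
    ("Duplicated system", "..."),
    ("Wrong AISUSER mapping", "..."),
    ("Missing authorization", "..."),
    ("Wrong AISUSER mapping", "..."),
]
summary = count_by_category(sample_rows)
```

Each entry of `summary` corresponds to one bar in a chart of errors by type.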





I'm really new to Lumira and analytic tools, but with the ease of use of Lumira it's possible to read those Solution Manager jobs easily. In the last 2 screenshots I try to represent the number of error messages in that job, where we can see that the winner is AI_SC_EN; if we leave out the successful messages, the rest are problems with message creation. On the right-hand side is a chart representing the installations that have wrong customizing in that Solution Manager test system. If we link the high values with the system list in the log messages, we can see where the configuration error is. This sample could be more useful if you use the "join" functionality of Lumira to map the result to the AISUSERS, AISAPCUSTNOS and V_AININSR tables.


All these tests are from my trial of Lumira, and only from Excel data sources. I need to talk with my BO colleagues about connecting Lumira to Solution Manager BW data sources; then the analysis could be done almost live on the system, without mapping and adapting Excel files.

The purpose of this blog is to show the application possibilities of SAP Lumira regarding a brief time series data analysis of stock market data.


Basically, I picked several companies that are traded on the NYSE and gathered their historical data, such as opening price, closing price, adj. price, traded volume, market capitalization and so on. Once the data was gathered, I decided to take a closer look at the performance of the closing prices of these corporations over ten years for my first data analysis with Lumira.

As soon as I had my .csv files merged into Lumira and prepared for the analysis, I started to assign the necessary key figures. Once I had the closing prices and the corresponding dates, I was ready to start. The graphics could be easily created by applying the closing prices to the y-axis and the date to the x-axis.
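I did the merge inside Lumira, but the same preparation can be sketched in Python for anyone who wants to combine the files beforehand. The column names `Date` and `Close` and the tiny in-memory files are assumptions for illustration; real exports may use different headers.

```python
import csv
import io


def merge_closing_prices(sources):
    """Combine per-ticker CSV streams into one long table.

    `sources` maps a ticker symbol to an open text stream whose header
    contains 'Date' and 'Close' columns (assumed names).
    """
    merged = []
    for ticker, stream in sources.items():
        for row in csv.DictReader(stream):
            merged.append({
                "Date": row["Date"],
                "Ticker": ticker,
                "Close": float(row["Close"]),
            })
    # ISO dates sort correctly as plain strings.
    merged.sort(key=lambda r: (r["Date"], r["Ticker"]))
    return merged


# In-memory stand-ins for the downloaded files (placeholder prices).
sources = {
    "MMM": io.StringIO("Date,Close\n2013-01-02,10.0\n2013-01-03,11.0\n"),
    "SAP": io.StringIO("Date,Close\n2013-01-02,20.0\n"),
}
table = merge_closing_prices(sources)
```

With one row per date and ticker, the date goes on the x-axis, the closing price on the y-axis, and the ticker can serve as the legend colour.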


3M - MMM (2003 - 2013) Closing Price Time Series


AXP - American Express (2003 - 2013) Closing Price Time Series


BA - Boeing Company (2003 - 2013) Closing Price Time Series


CAT- Caterpillar (2003 - 2013) Closing Price Time Series


CSCO - Cisco Systems (2003 - 2013) Closing Price Time Series


CVX - Chevron (2003 - 2013) Closing Price Time Series


DD - du Pont de Nemours (2003 - 2013) Closing Price Time Series


GE - General Electric (2003 - 2013) Closing Price Time Series


GS - Goldman Sachs Group (2003 - 2013) Closing Price Time Series


KO - Coca Cola (2003 - 2013) Closing Price Time Series


PG - Procter & Gamble (2003 - 2013) Closing Price Time Series


SAP - SAP (2003 - 2013) Closing Price Time Series

Moreover, I was curious how each share performed in comparison to the others. Therefore, I created a time series analysis that took all available closing prices into account.
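One caveat when putting all shares in a single chart: raw closing prices sit on very different levels, so a share trading at a high price visually dominates. A common way around this, which I did not apply in the charts here, is to rebase every series to a common starting value. A minimal sketch, with a hypothetical helper name and placeholder prices:

```python
def rebase(prices, base=100.0):
    """Scale a price series so that its first value equals `base`."""
    first = prices[0]
    return [base * price / first for price in prices]


# Placeholder closing prices, not real market data.
indexed = rebase([50.0, 55.0, 45.0])
```

After rebasing, a value of 110 simply means "10% above the starting price", which makes shares directly comparable.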


Total Performance 2007 - 2013


Furthermore, I wanted to compare the performance of each share between 2007 and 2009 (before / after the subprime crisis & Lehman), and particularly in 2008.

Total Performance 2007 - 2009


Total Performance 2008

It was obvious that in fall 2008 (25th of September, ten days after Lehman filed for Chapter 11 bankruptcy on the 15th) the closing prices of almost every share in the comparison began to drop heavily.

Performance Drop  2008


Although a similar analysis can be created with common computing tools, I found it quite interesting to make my first steps with Lumira and to explore the software’s functionalities.


Have a nice day and thank you for reading :-)

I presented a few days ago at the UK & Ireland SAP User Group Annual Conference (http://www.sapusers.org/conference/) in the Data Visualisation stream.  I wanted to try and explain to the conference attendees what SAP Lumira really is and how it could be used in their organisations.


I thought the slide deck may help out other SCN members who are also asking the same question  ... "SAP Lumira - What's all the noise about"



Click on the link below to download the presentation:




When SAP Visual Intelligence  was first renamed to SAP Lumira, it was mainly a change in name only. Most of the capabilities stayed fairly similar, as it primarily marked a shift in direction for SAP. Pierre Leroux tackled that, and other questions on Lumira in an article written in May of 2013.


Since then,  the growth of the solution has ramped up, recently moving from SP 11 to SP12. Before a lot of people had downloaded SP12, SP13 was being released. The changes have been coming fast and furious, and there is a rumoured release of SP14 being ready before the New Year. The changes have been pretty big as well. Changing from the Flash based SP11 to the HTML5 based SP12 was more like changing from 1.11.0 to 2.0. The look and feel has completely changed as well. A friendlier user experience was the aim of the development team, and they hit the mark.


With new releases coming out so quickly, there is bound to be confusion about capabilities of each version. It’s one of those confusions that I’m going to address here today.


When Lumira SP12, Personal Edition was first released in October of this year, it had the capabilities of working with Excel files, and CSV files, and that was all. We were very clear on what you could connect, and what you couldn’t. With the launch of SP13, we muddied the water a little. It’s not that we weren’t clear about the capabilities, it’s that some people took for granted that the free personal edition would pretty much stay the same…well, because it was free. The SAP Analytics team and the Lumira developers had something else in mind though. When asked about the reasoning for this, Mani Srinivasan replied," the main thought behind adding HANA one to the personal edition is to provoke some interested in the HANA dev community to provide a quick visualization tool to interact with and understand the HANA data. This was provided with the intention to get these folks try out Lumira and promote it within their org."



When SP13 reached GA status, the Personal Edition had a surprise waiting for its users. The ability to connect to and download from HANA One was added to the Excel and CSV formats that Lumira could work with. I emailed Henry Banks to clarify that this wasn’t a trick, and that this connection was indeed available, to which he replied, “HANA One is our HANA platform, but made available to customers via a public cloud (AWS) “and is ideal for starter innovation projects” (marketing brief). It means they don’t purchase the software, nor do they host nor maintain the infrastructure – instead, they lease the product as a ‘software as a service.” He also sent a link to explain it further.

There were a lot of SAP people that missed the new addition.  I was recently taking in a sales presentation on some of SAP’s analytic offerings, and Lumira was one of the solutions that were being discussed. The presenter was pretty knowledgeable about the Lumira application, versions and capabilities. He started talking about the type of files that the personal edition of Lumira worked with, “CSV and Excel….” That was it. Evidently, he too had missed the new connections available to users of the free version of Lumira. Thus, the reason for this overly lengthy blog.


I felt the need to clarify that these connection types are indeed available in the Free, Personal Edition of SAP Lumira, and that SAP is committed to developing solutions for every level of user out there, no matter what they do (or don’t) pay. The capabilities need to make sense, and adding HANA One to the Personal Edition does just that.


With the release of SP14 right around the corner, make sure you pay close attention to all the new capabilities, because you never know what you may find.

Weekly Technical Education with the SAP Lumira Expert Team!



The Refresh Data Video has arrived!


If you have a dataset which is updated monthly/weekly/daily/hourly, then this video is perfect for you.


Here we show you how to refresh your data within SAP Lumira to quickly and efficiently pick up any changes which may have been made in your original dataset.





Stay tuned for more posts, videos and tutorials!







The ABCs of SAP Lumira - presented by:

SAP Lumira Experts

Anthony, Bijan, and Chris





Get SAP Lumira FREE



Need some guidance or assistance?


Follow us on Twitter @SAPLumiraExpert:

contributions by:


140 characters not enough?


Email: support.lumira@sap.com







+1 855 5 LUMIRA

(+1 855 5 586 472)

Hi Guys,


First of all, thanks to the SAP Lumira developers. It's an awesome release in our big ocean of SAP.


In the past, most organisations struggled with large volumes of data to find the information and answers they needed. They used different ways to work through this data and recruited professional analysts, who started on this work every morning. To reduce this burden and save manpower, SAP introduced data analysis with SAP Lumira.


The main aim is to analyse large data sets and provide the required data and answers within a short time.


I think some awareness programs about SAP Lumira are still required in some countries, where people are still relying on manpower.


You can see the trailer for Data Geek (official trailer at the link below). Lakhs of people are now using this software. It helps to break down customer data for market analysis, profitability analysis, etc.




Data Geek 2.0 - The Rise of Dark Data (Official Trailer)



SAP has provided some demo videos on how to analyse data in SAP Lumira.


The link below shows how large financial data is broken down into something small.




In the same way, you can break down large HR data.




In the same way, see the video below for supply chain analysis.




SAP Lumira is awesome and helpful for all organizations that need to break large data down into something small.


I think some awareness of this is still required in some countries.

Data Geek Challenge – Analyzing the world of cyberspace, factors that affect it

Cyberspace is a realm of electronic communication. It is the virtual reality where online communication takes place over computer networks in every part of the world; millions of people are using it every minute and every second. But what I’m concerned about is the many questions online I came across pertaining to the factors of a country and how it affects cyberspace; however there was no definite answer to it.


In this blog, I will attempt to find out if there are factors that will affect the amount of cyberspace usage in the country.

So, let’s get started! For performing my analysis on which factors affect the country’s cyberspace usage, I have selected the top 5 countries with the highest percentage of internet users which are:
Iceland, Norway, Netherlands, Sweden and Luxembourg.

And the bottom 5 countries with the lowest percentage of internet users which are:

Ethiopia, Democratic Republic of Congo, Guinea, Niger and Sierra Leone

I’ve decided to research on 4 major factors that may affect a country’s cyberspace usage. Without further hesitation, let’s get started with the first factor

1. Education Level
Education is a major component of well-being and is used in the measure of economic development and quality of life, which is a key factor determining whether a country is a developed, developing, or underdeveloped nation.




The above charts clearly show that Education Index affects the country’s internet users.  The top 5 countries all have education index of above 0.97, which leads to over 90% internet users. In comparison, the bottom 5 countries have an education index of less than 0.4, resulting in the percentage of internet users being just above 1%.
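To put a number on "clearly show", one could compute a correlation coefficient between the two measures. The sketch below implements the Pearson correlation by hand; the two short lists are placeholder values chosen only to mirror the ranges mentioned above (index above 0.97 with over 90% users, index below 0.4 with about 1% users), not the actual country data. Keep in mind that a high correlation across ten hand-picked countries does not prove causation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Placeholder values, NOT the real figures behind the charts.
education_index = [0.98, 0.97, 0.35, 0.30]
internet_users_pct = [95.0, 93.0, 1.5, 1.2]
r = pearson(education_index, internet_users_pct)
```

The same function could be reused for GDP per capita, population and country size to compare how strongly each factor moves together with internet usage.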

Let us consider that education is one of the major factors, what other factors play a role in a country’s cyberspace usage?


2. Gross Domestic Product per Capita (GDP per Capita)

Next, let’s take a look at a country’s economy and see whether it plays a significant role. GDP per capita is one of the primary indicators of a country's economic performance. GDP is calculated by either adding up everyone's income during the period or by adding the value of all final goods and services produced in the country during the year; dividing it by the population gives GDP per capita. GDP per capita is sometimes used as an indicator of standard of living as well, with a higher GDP per capita being interpreted as a higher standard of living.



From the above visualization, the top 5 countries with the most internet users again have a higher GDP as compared to the bottom 5 countries with less internet users. However, there is no conclusive evidence that a higher GDP is equivalent to a higher percentage of internet users.

As we can see, among the top 5 countries with the most internet users, there is a huge difference in their GDP per Capita. Thus, we can agree only to a certain extent that GDP does play a part in the percentage of internet users.

3.     Population

With that said, let’s move on to population. It is said that you can’t build a country without its population. So let’s take a look and see if a country population affects the percentage of internet users. Population is the total number of people inhabiting in a specific country. A country’s people are like gears for a machine. In order for the machine (country) to progress, a country’s population has a huge impact on the country’s growth.



By comparing the two charts above, I can see that while Iceland has the highest percentage of internet users, it only has a population of around 320,000! Does a smaller population result in a higher percentage of internet users?


The results were unbelievable! I’m surprised that having a larger population does not result in a higher percentage of internet users.



4.     Size of a Country

Would the size of the country be a factor for a higher percentage of internet users?

The size of a country comprises all land and inland water bodies (lakes, reservoirs, rivers) of a country. The idea is that countries with a larger physical territory are more self-sufficient than smaller countries. Because of the greater territory, the potential access to natural resources and other foundations is higher, and thus economic growth is better.



From the charts, it seems I was wrong to think that a country’s size affects the percentage of internet users. Being a larger country, like Congo, Niger and so on, does not mean a higher percentage of internet users. None of the top 5 countries by percentage of internet users comes close in size to the bottom 5 countries.


In conclusion, I would say that I agree to a certain extent that all the mentioned factors play a part in determining the country’s percentage of internet users. No single factor would solely affect the percentage of internet users in a country.  However, it seems that education index plays a more significant role in affecting the percentage of internet users in a country.





In this blog, I have tried analyzing the different factors that affect the percentage of internet users. While performing my analysis I realized another factor I would like to explore which is the total monthly internet views against internet users of a country.


Total monthly internet views refer to the total number of people who use the internet to perform tasks such as browsing of internet, communication, etc.


The analysis I performed earlier on uses percentage of internet users against the population of a country. So for the next part, I’ll attempt to compare the raw number of internet users and their monthly views.


For performing my analysis, I have selected the top 5 countries with the highest number of internet users which are:
China, United States of America, India, Japan and Brazil


The above column chart compares the 2 sets of data. It clearly shows that China has the highest number of internet users, more than 500,000,000, followed by the United States, India, Japan and Brazil. To my surprise, the data shows that although a country like China may have the highest number of internet users, that does not equate to having the highest number of monthly views.



Data Geek Challenge – Analyzing COE prices of vehicles in Singapore



Before buying a new vehicle, potential vehicle owners in Singapore are required by the Land Transport Authority (LTA) to first place a monetary bid for a Certificate of Entitlement (COE). You will need a COE if you wish to register a brand new car in Singapore. COE quota is set by the Land Transport Authority (LTA) for the purpose of regulating & controlling the numbers of new cars on Singapore roads so as to avoid over-population of cars. The COE is the quota license received from a successful winning bid in an open bid uniform price auction which grants the legal right of the holder to register, own and use a vehicle in Singapore for a period of 10 years.

COEs come in five categories: A, B, C, D and E. The categories are sub-divided according to the capacity of the car.

Vehicle Category: Vehicle Type

CAT A: Cars with a capacity of 1600CC & below, and taxis

CAT B: Cars above 1600CC

CAT C: Goods vehicles & buses

*CAT E COEs are transferable and can be used for any kind of vehicle and capacity, unlike CAT A and B.*

The price of COEs is determined by open bidding from public which is held on every first and third week of the month. An individual or company can submit a bid via ATM or online for an amount they want to bid; the end results will be the amount tendered by the lowest successful bidder.
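The pricing rule as described above can be sketched as a small function. This is a simplified illustration of a uniform-price outcome (the top bids up to the quota win, and everyone pays the lowest winning bid); it ignores ties and any further rules the LTA applies, and the bid amounts are invented.

```python
def clearing_price(bids, quota):
    """Return the lowest successful bid when the top `quota` bids win.

    Returns None when there is no quota or there are no bids.
    """
    if quota <= 0 or not bids:
        return None
    winners = sorted(bids, reverse=True)[:quota]
    return winners[-1]


# Hypothetical bids in SGD for a quota of 3 certificates.
price = clearing_price([70000, 82000, 65000, 90000, 78000], 3)
```

The sketch also shows why the quota matters so much: with the same bids, a smaller quota cuts off the list higher up and pushes the price every winner pays upwards.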

John is a young working adult who wants to buy a new car. He wants to know the trends in COE prices to determine whether there is a best time to buy a car. He also wants to predict the likely COE prices in the near future.





Let’s start off with finding out the average price for COE for the different categories in the past 4 years (including this year).

Note: The numbers below are all in SGD (Singapore dollars)


We can notice a general rising trend of the price for the COE in the last 4 years. So, John drilled down to take a look at CAT A monthly COE price for the year 2013. John would like to know if the monthly COE price amounts are increasing as well.


Well, the average price for COE CAT A is around $73,000 SGD. However, as you can see from the chart, COE prices do not rise steadily from January to December. For example, in January and September of 2013 the prices exceed $80,000 SGD, while in March, April, May and June the prices are below $70,000 SGD. There is a huge difference in COE prices across the months of 2013, so what factors play a part in this change? Let’s take a look at one of the factors: the bid quota for every month and the number of people who bid in that month.



It looks like the more people place a bid, the lower the COE price. This seems rather strange, but a possible explanation is that people expect the COE price to be lower between March and May, and hence more people bid then. This could be one of the reasons why the number of bidders is so high while the price is so low, or perhaps the number of bids is simply not a factor that affects the COE price.


Another possible factor which may affect COE prices could be the economy. If the economic outlook becomes less rosy in 2013, this could also dampen demand for cars. Historically there has usually been a slight correlation between equity and COE prices. If the share market booms, buyers are more willing to pay high COE prices. Education could be another factor, when people are more highly educated they may earn more and be able to afford a car.

Now let’s take the trend of the past 4 years to make a prediction of the price of COE for the year ahead.

To create this predictive model, we select R-Triple Exponential Smoothing from the “Time Series” as this will allow us to have “Months” as Period as compared to the other algorithms.


Next, double click on the algorithm. This will create a connection between the data source and the algorithm. Hover over the algorithm and select “Configure Properties”.


Now let’s start off with creating a chart based on the trend of CAT A from January 2010 till October 2013.


Save and close. Hover over to the algorithm and select “Run Analysis”.



The trend from January 2010 to October 2013 is displayed. Now, let’s do a prediction of the price for the next 14 months, which is till December of 2014.



If we change the Output Mode from Trend to Forecast, we get an extra parameter "Periods to Predict". The periods to predict refers to how far you want to predict. In this case it would be 14 as we would like to know the prediction of the price of COE for the next 14 months


Let’s run the analysis again:


This predictive model has foreseen that the price of COE will continue to increase. The result shows that the COE will increase all the way to over $100,000SGD in the year 2014.
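To give an idea of what a triple exponential smoothing forecast does, here is a hand-rolled additive Holt-Winters sketch in Python. It is an assumption-laden illustration, not the R algorithm that SAP Predictive Analysis calls: the initialisation scheme, the parameter names and the smoothing weights are my own choices, and the demo series is synthetic rather than real COE prices.

```python
def holt_winters_additive(series, period, alpha, beta, gamma, horizon):
    """Forecast `horizon` steps ahead with additive triple exponential smoothing.

    Needs at least two full seasons of data (len(series) >= 2 * period).
    """
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons of data")

    first_mean = sum(series[:period]) / period
    # Average per-step change between the first two seasons.
    trend = (sum(series[period:2 * period]) - sum(series[:period])) / period ** 2
    centre = (period - 1) / 2
    # Seasonal offsets: first season minus its own straight trend line.
    seasonal = [series[i] - (first_mean + (i - centre) * trend)
                for i in range(period)]
    # Level at the last point of the first season.
    level = first_mean + centre * trend

    for t in range(period, len(series)):
        value, season = series[t], seasonal[t % period]
        previous_level = level
        level = alpha * (value - season) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (value - level) + (1 - gamma) * season

    n = len(series)
    return [level + (h + 1) * trend + seasonal[(n + h) % period]
            for h in range(horizon)]


# Synthetic series: straight upward trend plus a repeating 4-step pattern.
pattern = [1.0, -1.0, 2.0, -2.0]
demo = [10.0 + 0.5 * t + pattern[t % 4] for t in range(12)]
forecast = holt_winters_additive(demo, period=4, alpha=0.3, beta=0.1,
                                 gamma=0.1, horizon=4)
```

For the monthly COE series, the call would use `period=12` and `horizon=14`, matching the "Months" period and the "Periods to Predict" value of 14 set in the tool. The three weights control how quickly the level, trend and seasonal pattern react to new data.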


After looking at the trend and prediction using SAP Predictive Analysis, John has decided to make a bid to purchase the COE in early 2014, before the price of COE increases to over $100,000.

Weekly Technical Education with the SAP Lumira Expert Team!



This week's educational video will be on Refreshing Data!


We will post the corresponding video tomorrow.


In the meantime, be part of the conversation and follow us on Twitter and G+!!!








The ABCs of SAP Lumira - presented by:

SAP Lumira Experts

Anthony, Bijan, and Chris







Get SAP Lumira FREE



Need some guidance or assistance?


Follow us on Twitter @SAPLumiraExpert:

contributions by:


140 characters not enough?


Email: support.lumira@sap.com







+1 855 5 LUMIRA

(+1 855 5 586 472)

This blog describes my Internet browsing experience. Using the SAP Lumira application, I have generated some graphical representations that show the top 10 Internet browsers ranked and rated by speed, consistency and reliability.


When I first logged in to SCN a couple of months ago, my first issue was related to the SAP PI module. I began my blog in Feb 2013, inspired by the blogs of Ravi Shankar Venna. My SCN blog topics focus on SAP PI, SAP Lumira, Career Network & SCN. I am planning to post many technical blogs related to SAP & SUP in the future. I never make statements that might hurt the feelings of others or be offensive in any way. If I receive any comments from anyone, I will take action immediately.


I prefer blog postings that aren’t too long; scrolling through a long page of text is sometimes beyond my time and attention span. I always enjoy seeing images, especially slide shows, and I appreciate being able to enlarge images from blogs. When I post to my blog I usually upload one or more images, or I include links to a website of interest. Sometimes I’m able to update my blog daily while at other times a week or more might pass before I have time to post.


Most of my blogs aren't too long. I always enjoy posting creative images, innovative ideas and interesting topics. Sometimes I also include links to my blogs and documents, which help others look things up in detail. I’m always looking for interesting blogs that inspire me. I am happy to say that blogging has become a daily activity and is now my hobby.


It is always easy to browse the World Wide Web with an Internet browser and locate and access web pages. Using an Internet browser we can read text, view images, play videos, listen to songs, download and upload videos, and so on. Internet browsers play a vital part in daily activity for personal, official and business purposes. Some of my favorite blogs are those that help others understand graphical representation using SAP Lumira. I think this type of blog motivates others to generate various types of graphs, tables and charts, which will be very useful for their projects. So I decided to combine Internet browsers with SAP Lumira.


Let's come to the point: why have I posted this blog?


Why is graphical representation important?

Why is it important for us to take part in preparing graphs, tables and charts for client discussions?


The answer to all the above questions: "Graphical presentation skills are important for Individual success, Business success, Time Management, Leadership".


With the help of graphical representation we can easily convey a message to our employees and managers as well as our clients. SAP Lumira is one of the best and easiest applications for generating graphical representations, tables and charts quickly.


I decided to write about the "Internet Browser", which helps us to "Log in to the web link", "Browse to the contents" and be "Excited about the performance". That's why I titled this blog "I Logged, I Browsed, I Excited".


Using SAP Lumira with Excel data sources, I produced the figures below. I consulted various websites and analyzed the "Top 10 Internet Browsers in the World", then generated graphs and charts from the results.


We can also insert images as backgrounds for the graphical representations. I have generated and attached some graphical representations with images for your reference:


Top 10 Internet Browser:






Few more updates will be added soon!


Happy Updating and enjoy Data Geek Challenge! 

Hi all!


For my Data Geek Challenge entry, I decided to analyze data about major Hollywood films from 2007 to 2011. I used the Hollywood Budgets data set from a site called Information is Beautiful (http://www.informationisbeautiful.net/2012/hollywood-budgets-a-data-viz-challenge/). There they wrote: "It always bugs me how Hollywood grades or broadcasts the success of a film by gross income. Profitability, or % of Budget Recovered, is a way better grade of a film’s success. Especially in America, where each film has such high printing and advertising costs, that it needs to recover about 250-300% of its budget to be deemed a true hit.

In fact, if you use Profitability as an index, it changes the view considerably. Take 2007, for example, where the biggest grossing film was Pirates Of The Caribbean: At Worlds End. But it only recovered 320% of its budget. But the most profitable film of 2007 by far was…Can you guess? Have a look at the data."

With all my pleasure!

The data set does not include all the movies that were made in Hollywood in the years 2007-2011; the list for 2011 in particular is not complete. I also deleted rows for films with incomplete information where I estimated that this would not significantly affect the results of the analysis. Nevertheless there are still 662 films to analyze! Moreover, I enriched the data with information on the main actors and added a dimension, % of Domestic Gross in US Opening weekend.
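The preparation described above (dropping incomplete rows and deriving percentage columns) could be sketched in Python roughly as follows. The column names and the `prepare_films` helper are hypothetical; the real data set uses its own headers, and I did this step by hand rather than in code.

```python
def prepare_films(rows):
    """Drop incomplete rows and add derived percentage columns.

    Column names are assumptions made for this sketch.
    """
    required = ("budget", "worldwide_gross", "domestic_gross", "opening_weekend")
    prepared = []
    for row in rows:
        # Skip films with a missing (None) or zero value in any required column.
        if any(not row.get(col) for col in required):
            continue
        film = dict(row)
        # Profitability: % of budget recovered by the worldwide gross.
        film["profitability_pct"] = 100.0 * row["worldwide_gross"] / row["budget"]
        # Share of the domestic gross earned during the US opening weekend.
        film["opening_share_pct"] = 100.0 * row["opening_weekend"] / row["domestic_gross"]
        prepared.append(film)
    return prepared
```

Treating a zero budget as "incomplete" also avoids a division by zero when profitability is calculated.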

The attributes that I used:


Of these, I used the following as measures:



I started with a simple question: In which year did we spend the most on visiting cinemas and enjoying our movies?


I found out that Worldwide gross has been increasing over the years…why? It could be that we are visiting cinemas more often; it is also possible that ticket prices went up over the years, or that studios sell a lot more promotional items. This offers a possibility for further research, but I decided to stay with the data that I have in front of me.


I did a similar chart, replacing Worldwide Gross with Budget. Budget is also constantly increasing…no recession in the movie industry, I guess.

And which movies brought in the most at the box office in each of the analyzed years? Taaa-daaaa:


No surprises really, huh? Avatar really stands out and, as a huge fan, I don't see why not. It was the most anticipated movie in recent years…James Cameron did his job well..AGAIN. The list of movies with top worldwide grosses (http://boxofficemojo.com/alltime/world/) shows that Titanic was at the top of the list for 12 years. I wonder which movie will beat these two biggest earners.


When I also added budget to the chart, we can see how profitable the films actually were. Very!


But are these films also the most profitable? Let's have a look:


When I used Profitability as a measure, I got a whole different picture: Paranormal Activity is the absolute winner. It returned its budget more than 10,000 times over! To see which movies were most profitable in other years, I had to exclude Paranormal Activity:


We can see that the stats are quite correct when comparing them to this list of most profitable movies based on ROI (http://www.the-numbers.com/movie/budgets/):


Next, I wanted to analyze the connection between Audience score, Worldwide gross and income on the US Opening weekend. I found out that movies with higher audience scores (The King's Speech or The Artist) or the biggest worldwide gross (Avatar) do not necessarily have high attendance in the first weekend. It is also interesting that The Twilight Saga: New Moon was second in US opening weekend earnings, but its overall worldwide gross isn't that big. Maybe there were no teens left after the first couple of weekends.


When I changed worldwide gross to profitability, the chart changed a whole lot:


My next question was: does the budget affect audience scores? I ranked the TOP 20 films by audience score (size of the blocks) and added their budget (colour of the blocks). We see that more than half of the movies had a budget of only up to $50 million, so we can easily say that the cost of a movie doesn't really affect its quality.


When I changed the attribute to Rotten Tomatoes scoring, we can see quite a difference. That's because the Rotten Tomatoes score represents an average of critical reviews, while Audience scoring is done entirely by users of the portal. Nevertheless we can see the same picture as above: the budget doesn't really affect people's taste.


I also wanted to see how critics scored the biggest earners:


Even Megan Fox couldn't save Transformers: Revenge of the Fallen.


Here is another example showing that you can have a smaller budget and still earn well (Toy Story 3):



From these pie charts we can see that many more studios made movies in 2011 compared to previous years. Promising business, isn't it? But to see the data more clearly I filtered it to the TOP 10 lead studios by worldwide gross:


The shares changed in 2011: Fox didn't do much, but Relativity Media and DreamWorks Pictures had their big shot…let's remember: DreamWorks Pictures made Transformers: Dark of the Moon in 2011, with beautiful Rosie Huntington-Whiteley…guess that was the right choice.

I wanted also to check if the Academy is properly doing their job…and yes(!) they are...all of the films awarded with Oscars in those 5 years have an audience score bigger than 80 %.


Another interesting view is the comparison of Oscar-awarded films by their domestic and foreign gross. Coming from Slovenia, a very small country in Europe, I find it interesting that we (ok, not only Europe, but also other countries worldwide) prefer some films more than audiences in the US do. I calculated the share of domestic gross in worldwide gross and then the average over all 662 films…and it's 54.45%! So even though the US market is smaller than all the foreign countries together, most Hollywood films earn half of their gross in the US! So it's quite right to say, judging by grosses, that we liked The King's Speech much more than the US audience did. Hmmm? It also works the other way around…I had never heard of the movie The Blind Side 'till now.
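The share calculation in the paragraph above can be sketched in a few lines of Python. The function name and the input format (pairs of domestic and worldwide gross) are assumptions for illustration. The sketch also returns a second figure, the share of the summed grosses, because the average of per-film shares and the share of the totals are not the same thing.

```python
def domestic_share_stats(films):
    """films: list of (domestic_gross, worldwide_gross) pairs.

    Returns (mean of per-film domestic shares, domestic share of totals),
    both in percent.
    """
    # Per-film share of domestic gross in worldwide gross.
    shares = [100.0 * domestic / worldwide for domestic, worldwide in films]
    mean_of_shares = sum(shares) / len(shares)
    # Share computed on the summed grosses; big films weigh more here.
    total_domestic = sum(domestic for domestic, _ in films)
    total_worldwide = sum(worldwide for _, worldwide in films)
    share_of_totals = 100.0 * total_domestic / total_worldwide
    return mean_of_shares, share_of_totals
```

The first value corresponds to the average described in the text. The second gives more weight to the biggest earners, so the two can differ noticeably when a few blockbusters dominate the foreign gross.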



And which studios are making these awesome films, and how much do they spend on them:


And movies from which studios do we score the highest?


I found out another interesting fact: people in the US rush to cinemas to watch horror and action films:


Action films also achieve the biggest worldwide grosses, while comedies are the most profitable:


Talking about profitability: which films did not even recover 50 % of the budget?


Why is that? Weren't they any good? Adding audience scores tells me that's the case, yes. But surprisingly, Take Shelter has an impressive audience score yet still managed to get only 30 % of its budget back.


What about the stars? Here are the TOP 20 main actors by average audience scores of their appearances, also by worldwide gross. We can see that being Harry was Daniel Radcliffe's best move.




So that's it. With this data set, I could do even more visualizations, but I think it's enough. I found some interesting facts and I enjoyed working with Lumira, mostly because it's very quick.

I had been hearing about SAP Lumira from my customers, and they had been asking me questions about it for a while. I played with a demo data set a little and was amazed at how easy it is to learn the tool and create visualizations.


However, I found the tool lacking in some areas, especially data sources. I was expecting to see an SAP NetWeaver BW connection, but it wasn't there. I am sure SAP will eventually integrate this tool with SAP BW as well, considering how many SAP customers use BW instead of SQL databases. In the meantime, it is possible to connect BW InfoProviders through multi-source universes, which can be consumed by Lumira. It might also be a good idea to add bullet charts to the tool, which could be used to visualize performance.


When I was reading SAP Lumira section on SCN, I saw this Data Geek Challenge. I decided to accept it.


For this challenge, I wanted to use the Arms Imports and Exports data sets from the World Bank. I chose this data because there are so many conflicts going on around the world right now, and governments are spending so much money on arms instead of on education, health, food, housing and so many other basic needs of the people.



About the data set:


Arms Exports: http://data.worldbank.org/indicator/MS.MIL.XPRT.KD

Arms Imports: http://data.worldbank.org/indicator/MS.MIL.MPRT.KD



The data set covers arms imports and exports from 1960 to 2010. Arms transfers cover the supply of military weapons through sales, aid, gifts, and those made through manufacturing licenses. Data cover major conventional weapons such as aircraft, armored vehicles, artillery, radar systems, missiles, and ships designed for military use. Excluded are transfers of other military equipment such as small arms and light weapons, trucks, small artillery, ammunition, support equipment, technology transfers, and other services.




1. Export and Import Amounts in 10 Year Groups



Looking carefully at the chart, we can see that imports and exports peaked in the 80s, probably because of the Cold War, and dropped in the 90s.
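The 10 year grouping behind this chart can be reproduced outside Lumira as well. Below is a minimal Python sketch; the function name and the input shape (a mapping from year to amount) are assumptions, not the layout of the World Bank files.

```python
from collections import defaultdict


def totals_by_decade(yearly_values):
    """Sum yearly amounts into 10-year groups keyed by the decade's first year."""
    totals = defaultdict(float)
    for year, value in yearly_values.items():
        decade = (year // 10) * 10  # e.g. 1987 -> 1980
        totals[decade] += value
    # Return a plain dict ordered by decade for easier reading.
    return dict(sorted(totals.items()))
```

Note that with data running from 1960 to 2010, the year 2010 falls into a group of its own, so that last bucket is not comparable with the full decades.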




2. Exports and Imports by Years



The chart above shows arms imports and exports by year. Encouragingly, arms imports and exports have been in decline since the 80s.




This chart shows arms imports and exports by year for Turkey. Turkey mostly depends on imports for arms. Imports peaked in 1998 but then dropped sharply, especially in 2001, probably because of the financial crisis in Turkey that year.





I also wanted to see Iraqi imports. They are interesting because, following Gulf War I in 1990, Iraq was under sanctions imposed by the US until Gulf War II in 2003.



3. Top 25 Importers and Exporters



The data was filtered to the years between 2000 and 2010. The United States, Russia, and Germany are the top 3 exporters of arms. China, India and South Korea are the top 3 importers of arms. It was really interesting to see Greece as the number 4 importer, because right now it is struggling with a financial crisis.



However, in the 90s, the United States was by far the top exporter; Russia caught up with the US in the 2000s. Also, Turkey was the top importer of arms in the 90s. There were conflicts in southeastern Turkey in the 90s which ended in 1999, and a financial crisis followed. Arms spending was not the major reason for the crisis, but Turkey spent 17.3 billion dollars on arms in the 90s.




4. Imports & Exports in the World





China and India import the most arms in the world.




Russia and the United States are the top 2 exporters in the World.


Some Facts:


There are 842 million people in the world who do not have access to enough food to eat.


Nearly one in four people, 1.3 billion - a majority of humanity - live on less than $1 per day, while the world's 358 billionaires have assets exceeding the combined annual incomes of countries with 45 percent of the world's people. UNICEF


To satisfy the world's sanitation and food requirements would cost only US$13 billion- what the people of the United States and the European Union spend on perfume each year.






Finally, countries spent $24.9 billion just to buy arms in 2010 alone. This money could be used to benefit people around the world. It could be used for housing, food, health, education, and so many other things.





Think about it.

Over the past 17 years I have reviewed and implemented a variety of business intelligence, reporting and predictive analytics solutions including SAP's offerings. I have been following SAP Lumira since the debut as SAP Visual Intelligence at SAP TechEd in 2012. I remember sitting in various SAP TechEd sessions hearing SAP Product Managers tell the audience to expect three to four week feature release cycles.  At that time, I was both shocked and skeptical. I thought that what I was hearing was almost impossible from a global business intelligence vendor the size of SAP.  I was wrong. 


This past year I bought SAP Lumira Professional at the Sapphire conference. At that time I had blogged about other data visualization solutions with SAP Hana. I felt SAP Lumira needed to mature a bit more before I would leverage it in my projects. Since then I have already received a few updates. No sooner do I install v1.12 and a few weeks later v1.13 is released with 1.14 right behind it.  These SAP rapid releases are truly amazing.  I am also impressed with the many improvements that I am seeing. I know there is still some catch up work to do but in one year alone there has been a remarkable, huge leap forward in the Data Discovery space for SAP customers. Here are a few of my review notes from a BI Professional perspective.  Keep in mind that I do review all the Data Discovery offerings and try to be vendor neutral in my assessments.


Upon launching the latest and greatest SAP Lumira v1.13, I noticed the Welcome screen had been enhanced with links to sample content, videos and other resources.  I was pleased with the nicer user interface and delightful new branding.  The development path steps were visually displayed and linked to the related screens for a quick and easy jump start.



For my review, I wanted to test a Big Data set to see how SAP Lumira would perform.  I used the Hortonworks Hadoop Sandbox NYSE Stock data set and a Hive connection to load the data.  To connect and query Big Data with Hive, you need to install the drivers. Other options for connecting and querying Big Data are described on the SAP + Hortonworks partnership page.  This page contains excellent reference documents and a Modern Data Architecture diagram that showcases how Hadoop integrates with, complements and extends your existing data platform. 



Once I had the Big Data loaded, I used the Prepare features to add a Time hierarchy for time based intelligence and analysis.  To create a Time hierarchy, I selected the gear icon on the CalendarDate attribute and then selected the Time hierarchy option in the menu. Instantly SAP Lumira generated a drillable hierarchy including Year, Quarter, Month and Date.  The Prepare screen also had quite a few features for data cleansing including data type conversions, geospatial data types, filters, sorts, appends, merges, showing or hiding a field, and calculated fields with a library of data formatting and logic functions.
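To make the idea of a Time hierarchy concrete, here is a small Python sketch of the levels such a hierarchy derives from a single date. It only illustrates the concept; it is not how SAP Lumira builds the hierarchy internally, and the function name and output keys are my own assumptions.

```python
from datetime import date


def time_hierarchy(d):
    """Derive Year, Quarter, Month and Date levels from one calendar date."""
    quarter = (d.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, and so on
    return {
        "year": d.year,
        "quarter": f"Q{quarter}",
        "month": d.month,
        "date": d.isoformat(),
    }
```

Each level is a coarser grouping of the one below it, which is what makes drilling from Year down to Date possible.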



Now that my Big Data was cleansed and prepared, I was ready to get to the fun part…visually exploring the data to look for patterns and trends.   To begin visualizing a data set in SAP Lumira, you use the Visual features.   By simply dragging fields onto the user interface and choosing a visualization, I could see some patterns immediately.  There is a plethora of visualization types available including but not limited to Column (Bar), Line, Pie, Area, Stacked, Dual Axis, Combination, Donut, Scatter, Bubble, Tree Maps, Heat Maps, Geospatial Maps, Radar, Box Plots, Word Clouds, Waterfall, Parallel Coordinate, Funnel Charts and Grids.   I was able to quickly view up to 10,000 data points on the screen.  This increase in data points is a great improvement over the prior 1,000 limit from earlier this year. Using the brushing features, I was able to select a subset of data to narrow my focus on and dig deeper into the details.




A wonderful feature that I stumbled upon in the review was Predictive Calculations.  By choosing the down arrow on a measure I was able to choose a Forecast or Linear Regression Predictive Calculation type to add to my visualization along with specifying how many periods forward I wanted to predict. 
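As a rough illustration of what a Linear Regression calculation with a number of periods forward does, here is a minimal least-squares sketch in Python. It is an assumption about the general technique, not SAP Lumira's implementation, and the function name is hypothetical.

```python
def linear_forecast(values, periods_ahead):
    """Fit y = intercept + slope * x by least squares (x = 0, 1, 2, ...)
    and extend the fitted line periods_ahead steps past the last value."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Predicted values for periods n, n + 1, ..., n + periods_ahead - 1.
    return [intercept + slope * (n + h) for h in range(periods_ahead)]
```

A straight line ignores seasonality, which is presumably why a separate Forecast option is offered alongside it.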


Each time I created a visualization, I could optionally save it to my document collection for use in an SAP Lumira story board. Available visualizations for story boards are displayed as thumbnails at the bottom of the SAP Lumira user interface. To create an SAP Lumira story board, a relatively new feature as of v1.12, you use the Compose features. This functional area of SAP Lumira allows you to select a layout, drag views onto the story board layout sections, and add filters, text boxes and images. There is an option to immediately preview your work and easily switch between authoring and viewing during the development process.



When you are happy with your story board, you can create a new one to add to the story or you can share your analytic creation. In SAP Lumira v1.13 there are a few different ways to share story boards.  You can export the file for another SAP Lumira user to import and explore.  You can also publish the dataset to SAP HANA, Explorer, Streamworks or SAP Lumira Cloud.  You can also email the visualization as a Portable Network Graphics (.png) image.   In my review, I chose to publish to the SAP Lumira Cloud.


Since I had a larger data set, it did take a few minutes to complete the transfer of both the SAP Lumira views and dataset created in the SAP Lumira desktop to the SAP Lumira Cloud.  When it did finish uploading, I saw both my views file and dataset in the My Items list.  Immediately I wanted to see how my masterpiece looked in a web browser but could not find any way to see it up there.  I later learned that the SAP Lumira views built in the desktop version do not render in SAP Lumira Cloud right now.  I do hope that limitation changes in future releases.  So what does render in SAP Lumira Cloud?  Exploring a bit more I discovered that a different authoring and exploration view of my uploaded Hadoop data set was available.  This SAP Lumira Cloud authoring environment did not have as many bells and whistles as the SAP Lumira desktop version but it did render the views extremely fast.



All in all, I found SAP Lumira v1.13 to be easy to use and a significant advancement over previous versions. The rapid release cycles are simply amazing.  Hadoop Big Data sets rendered quickly and completely without errors.  The richer data preparation, time intelligence, wide array of visualization types, predictive features, story boards and sharing features are all great strides forward for SAP in the highly competitive Data Discovery market.

I am not a crazy online shopper like some of my friends, but I found it very useful and interesting to read some of the customer product reviews posted by online retailers like Amazon.com.  One time on a bus heading back home after work, a lady sitting next to me was reading a book “To the Lighthouse” by Virginia Woolf which immediately aroused my curiosity.  To see if it could be my next vacation book, I went to Amazon.com to see what other people say about the book.  Amazon.com did a good job by collecting customer reviews on products, but, in terms of analyzing and presenting those reviews, it didn’t give me too much other than showing a simple “star” column chart with a couple of the most helpful favorable/critical ones followed by listing all of the reviews (see below figure). 




For products which come with hundreds or thousands of reviews, it is just too difficult for a person to read through and analyze them all. Data discovery tools like SAP Lumira can be very helpful for this kind of insight hunting.


From a business point of view, not only end consumers, but also product designers and product sellers are very interested in getting any deeper insights from customer product reviews.


Data Sources

Two data sources are used for this blog post (although I collected a bunch of customer product reviews from Amazon.com):


  • Product Reviews: To the Lighthouse: Amazon.com



  • Amazon.com: Customer Reviews: Happy Camper Two Person Tent With Carry Bag



The figure below shows their data volumes:


My Challenges

I faced three challenges in using SAP Lumira to analyze Amazon customer product reviews.


  • Data extraction

My first challenge was to extract the data from the Amazon.com web site into a format that Lumira could read.  After some online research, I did not find a straightforward public API for getting the review data.  To avoid spending too much time on this, I ended up writing a small VBA (Visual Basic for Applications) script inside MS Excel to automate Internet Explorer to fetch the web pages and parse the review data directly into Excel sheets, which could be easily fed into Lumira.  The figure below shows the simple frontend GUI of the script:




  • Sentiment analysis

To my knowledge, at the time this blog post was written, Lumira provided few sentiment analysis capabilities.  Again, to keep it simple, I ended up writing another small VBA script to add a lexical-level sentiment analysis algorithm based on the research paper “Language-independent Bayesian sentiment mining of Twitter” by Alex Davies and Zoubin Ghahramani.  In order for Lumira to analyze the lexical sentiment, I created a second dataset by breaking down the review dataset into rows of words, followed by a merge (join back) with the review dataset.  This operation sometimes results in a large dataset, which brought me to the next challenge.


  • Large dataset


The Lumira Desktop I used is the free download version 1.13.0, which has limits when working with large datasets.  When performing sentiment analysis on “To the Lighthouse”, I ran into a performance bottleneck that I could not overcome easily.   To continue my insight hunting journey, I had to pick another data source, the customer reviews of “Happy Camper Two Person Tent With Carry Bag”, which is a smaller dataset but good enough to illustrate some of the potential insights that could be seen for "To the Lighthouse", any other reviews, or other text content.
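The word-level dataset described under "Sentiment analysis" above, and the reason it grows so quickly, can be illustrated with a short Python sketch. My actual script was written in VBA; the function name, the field names and the simple tokenizer below are assumptions for illustration only.

```python
import re


def explode_reviews(reviews):
    """Return one row per (review, word), carrying the review's attributes.

    This mirrors breaking reviews into rows of words and joining the
    words back to the review data; field names are hypothetical.
    """
    word_rows = []
    for review in reviews:
        # Very simple tokenizer: runs of letters and apostrophes.
        for word in re.findall(r"[a-z']+", review["text"].lower()):
            word_rows.append(
                {
                    "review_id": review["review_id"],
                    "stars": review["stars"],
                    "word": word.upper(),
                }
            )
    return word_rows
```

Because every word of every review becomes its own row, a few hundred long reviews can easily turn into tens of thousands of rows, which is where the size limit starts to hurt.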


Result Part I: Interesting Insights for "To the Lighthouse"


  • The majority of the reviews came from Paperback.  Wow, Ebook still hasn't caught up.


  • Paperback reviewers share the same pattern with Ebook reviewers, but Audio reviewers have a quite different pattern (less happy).


  • Top 100/500/1000 Reviewers all gave 5 star ratings.  Note: a badge is a symbol that Amazon.com gives to reviewers who reach top reviewer status by writing good quality reviews.


  • Most of the U.S. reviewers came from the four corners of the country.  Do these states have larger populations, or do people there have more time to read a novel?
  • Reviewers near oceans seem to be happier with the book than those far from water.  Interesting! Could this be related to the fact that the story of the novel is all about visits to a Scottish island?
  • West coast reviewers seem to be happier with the book than those on the east coast. Why?


  • There was a high peak around the year 2000 in people following up (voting) on the reviews.  Can anyone tell me what happened in that year?


  • Instead of reading all the reviews, I only want to read the top 5 or 10 reviews (the top 1 alone is too few):



Result Part II: Interesting Insights for "Happy Camper Two Person Tent With Carry Bag"

  • The most frequently used word in all reviews is "TENT" - not a surprise as we are looking at a tent.  Next to it are "SO, USE, SMALL, GOOD, ZIPPER, HAVE, IF, ONE, SET, VERY, EASY, CAMPING", etc.


  • "TENT" occurs three times as often as the next word.


  • In the figure below I rendered the most positive (happy) words in green and the most negative (sad) words in red.  It turns out there are many more happy words than sad words in the reviews.
  • In the center there is a cluster of neutral words surrounded by some happy words.  Sad words are scattered towards the edge.  This figure suggests that on average customers are more likely to have a positive feeling about the product.


  • The figure below indicates the same.  Sad words (sentiment < 0.5) are on the left side.  Happy words (sentiment > 0.5) are on the right side.  Neutral words (sentiment = 0.5) are in between.  The high peak in red is the word "TENT" which falls under the neutral area.


  • Filter only the most positive words:


  • Filter only the most negative words:


  • If anyone wants to summarize the reviews of the product, the words of the leftmost column are good candidates.  Then a summary could be something like "a very good and easy to use tent for camping".
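Two of the steps behind the observations above, counting word frequencies and sorting words into sad, neutral and happy groups around the 0.5 mark, can be sketched in Python like this. The function names and the small tolerance used for "neutral" are assumptions; the sentiment scores themselves came from my VBA script and are not computed here.

```python
from collections import Counter


def top_words(words, n):
    """Return the n most frequent words with their counts."""
    return Counter(words).most_common(n)


def sentiment_bucket(score, tolerance=1e-9):
    """Classify a word score in [0, 1]: below 0.5 sad, above 0.5 happy."""
    if abs(score - 0.5) <= tolerance:
        return "neutral"
    return "happy" if score > 0.5 else "sad"
```

In practice a wider neutral band (for example 0.45 to 0.55) might be more useful than an exact 0.5, but that is a design choice the charts above do not depend on.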




SAP Lumira did a good job of helping me to see, imagine, and show the review data I got from Amazon.com, although in some areas it could do more to reduce my challenges.  It is a good sign that extensibility and big dataset support have surfaced on the Lumira roadmap. So it is probably a good time to write a blog post here in response to the Data Geek Challenge, as a stop on the trip before the busy pre-Christmas season arrives.  However, my journey to seek insight with the help of SAP Lumira is far from its destination, especially as I see a lot of potential in areas such as sentiment analysis at higher levels.


I also attached the datasets used in this blog post (the review data I collected from Amazon.com plus the calculated sentiments) for those who are interested in following this blog post. To use them, remove the .txt file extension after unzipping.

I’m not an expert on any sport, but I have recently become a casual fan of baseball.  With the excitement of the post-season this year, I decided to hunt down some baseball data to see if I could better understand professional baseball based on popular team-level MLB statistics.  I’m definitely not an expert in sports statistics, but I pulled the well-known Lahman baseball database and calculated some summary statistics to evaluate team performance.  I used SAP Predictive Analysis (with SAP Lumira visualization components) to visualize the data and perform some predictive analytics for the 2013 post season.


Metrics and Data

I pulled just a few metrics to summarize batting and fielding performance of each team.  The metrics I pulled were:

  • Put Outs and Errors per inning out for each of the main positions (1B, 2B, 3B, C, CF, RF, LF, SS)
  • HR, H, R, 2B, 3B, SO, SB, CS, SF, SH per At Bat

For all teams from 1981 – 2012 during the regular season only. 


Visualizing Changes Over Time

Not being familiar with the ins and outs (get it?) of baseball, I decided to look for trends over time—would these metrics be consistent, or have strategies changed over the years?

I can look at these trends over time by league (HR increased through the early 2000s and have since been decreasing):



And by metric, it looks like the frequency of 2B hits has been increasing, while 3B hits have been steadily decreasing.  Though interestingly, the number of runs per AB has been relatively steady since 1993.


Perhaps these decreasing trends in scoring are driven by an improvement in fielding or pitching? Steady decreases in errors per inning out for all infield positions and increases in strikeouts per at bat suggest this could be the case.  Or, as many of my baseball-fan coworkers have pointed out, it could have something to do with the rapid increase in the use of steroids during the 90s and then a decrease in use of steroids after the harsher steroid policy penalties were implemented in 2005 and the 2006 investigation by the MLB into steroid use.





Stolen bases have always fascinated me in baseball, so I wanted to look at the prevalence of stealing over time.  Interestingly, this is the one metric that showed significant differences between the leagues with NL teams stealing much more frequently than AL teams through the early 90s, though over time they have tracked much more closely and now show similar trends.



Visualizing Differences by Team

Still on stolen bases, which teams are most effective at stealing?  For the 2012 season, Milwaukee, Miami, and San Diego had the highest frequency of stealing, with Milwaukee, Miami, Oakland, Minnesota, and Kansas City stealing most effectively (fewest CS per SB).  Pittsburgh, Arizona, and Baltimore are the least effective at stealing bases, with Pittsburgh successfully stealing only 58.4% of the time. 
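The two stealing measures used here can be sketched as follows. The function names are illustrative, and the counts in the example are chosen only so that the result reproduces the 58.4% figure quoted above; they are not read from the dataset.

```python
def steal_success_rate(stolen_bases: int, caught_stealing: int) -> float:
    """Share of steal attempts that succeeded: SB / (SB + CS)."""
    attempts = stolen_bases + caught_stealing
    return stolen_bases / attempts if attempts else 0.0


def caught_per_steal(stolen_bases: int, caught_stealing: int) -> float:
    """Caught stealing per successful steal (lower is better)."""
    return caught_stealing / stolen_bases if stolen_bases else float("inf")


# Illustrative counts only: 73 steals and 52 times caught is 73 / 125 attempts.
rate = steal_success_rate(73, 52)
```

Both measures rank teams the same way; the success rate is simply easier to read as a percentage.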


We can also visualize the base stealing geographically by state, with darker blue states stealing less than lighter blue states. Generally, base stealing seems to be less popular in the AL and the East.


Shifting over towards batting strategy, we can compare the frequency of sacrifice flies and sacrifice hits by team. Sacrifice hits seem much more common in NL teams, and even sacrifice flies are relatively uncommon in the AL except for a few teams (Minnesota, Texas, Toronto, Tampa, the Yankees, and Boston).


Let’s look at fielding performance for infielders by team in 2012.  Interestingly, it appears that the NL teams have lower frequency of errors across all positions, but especially for 3B errors.


Outfield fielding performance seems to differ less between leagues, but the NL still seems to have slightly better fielding.  Maybe Kansas City should look at firing their center fielder (though it looks like their main center fielder was injured in April).


Predicting Post-Season Performance

Expanding into the SAP Predictive Analysis toolset, I built a modeling dataset that uses the regular season team-level (season summary) statistics discussed above to predict the outcome of a post-season matchup.  This model uses all post-season games from 1981-2012 to train the model, and I have scored it on every possible (and impossible) matchup for the 2013 post season.  (2013 regular season data was pulled from ESPN and a variety of other sources since it is not yet included in the Lahman database).


The post season information in the Lahman database is at the series-level, so I used this to create simulated records for each game in a series (ex. if there was a 4-3 series between Team 1 and Team 2, there are 4 records with Team 1 winning and 3 with Team 2 winning), so this model predicts the outcome of a series between teams. It also only takes into account position-level statistics, not statistics of any particular player, so if the team lineup changes significantly in the post season due to injury, it will likely not be as accurate.  In order to predict the likelihood of winning, I used my Custom R Logistic Regression algorithm, but similar analysis could also be done with a decision tree model.
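The expansion of a series-level result into one record per game can be sketched like this. The record layout (`team1`, `team2`, `team1_won`) is a hypothetical schema for illustration, not the actual modeling dataset.

```python
def expand_series(team1: str, team2: str, wins1: int, wins2: int) -> list:
    """Turn one series result into one simulated record per game.

    A 4-3 series yields 4 records where team1 won and 3 where team2 won.
    """
    records = []
    for _ in range(wins1):
        records.append({"team1": team1, "team2": team2, "team1_won": 1})
    for _ in range(wins2):
        records.append({"team1": team1, "team2": team2, "team1_won": 0})
    return records


records = expand_series("Team 1", "Team 2", wins1=4, wins2=3)
```

Each record would then be joined to the regular-season statistics of both teams before fitting the classifier, so a lopsided series contributes a stronger signal than a close one.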

The factors in the model that were most predictive in determining the outcome of a post-season matchup included:


  • Catcher error rate
  • Center Fielder error rate
  • Runs per At Bat
  • Hits per At Bat
  • Strike Outs per At Bat
  • Sacrifice Flies per At Bat
  • The difference in put outs at 2B between teams
  • The difference in the error rates between teams for 1B
  • The difference in the error rates between teams for 2B
  • The difference in the error rates between teams for SS


The model has an Area Under the Curve (AUC) of 0.722 (where 0.5 is no better than chance and values closer to 1 indicate better predictive performance), which indicates a reasonable fit but not a highly predictive model.  Additionally, due to the relatively low volume of data, 100% of the sample was used for both fitting and validation, so this AUC is an in-sample figure and is likely optimistic.
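For readers unfamiliar with AUC, here is a minimal sketch of one way to compute it: as the fraction of (winner, loser) pairs in which the model scored the winner higher, counting ties as half. The labels and scores are invented for illustration and this is not the calculation performed inside SAP Predictive Analysis.

```python
def auc(labels: list, scores: list) -> float:
    """AUC as the probability that a positive outscores a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Invented example: 1 = team1 won, score = predicted win probability.
example_auc = auc([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2])
```

In this toy example 8 of the 9 positive/negative pairs are ordered correctly, so the AUC is 8/9.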

Once the model has been developed, I’m able to simulate the results of every pairing of teams for every possible series in the 2013 post-season and once the matchups have been decided, determine the model’s predicted victor.  The chart below summarizes the predicted outcome for each matchup in the post season and the actual outcome of the series or game.


Over the course of the post-season, the model was over 50% accurate, predicting 5 of 9 series correctly (about 56%), which isn’t bad for someone who knows nothing about baseball statistics or the results of the season so far.  Interestingly, though Boston finished the regular season in first place overall, the model consistently predicted them performing poorly in the post season. I’m excited to use the same model next year and see how it performs!


See other blog posts on predictive analytics and data visualization using SAP Predictive Analysis and SAP Lumira under the Predictive Analytics topic at sapbiblog.com.  Follow me on Twitter at @HillaryBlissDFT!

SAP Lumira has now gone social. Follow SAP Lumira on all the social media streams to stay up-to-date with all content.





Facebook: fb.com/SAPLumira

Twitter: @SAPLumira

YouTube: youtube.com/SAPLumira

Instagram: instagram.com/SAPLumira

Google+: @SAPLumira

Data Geek: www.sap.com/datageek

I am one of those people with a real passion for space, and for everything above the Earth. Since childhood, space has filled my dreams at night: visiting space (always without a spacecraft or a space suit), meeting aliens (very beautiful purple women), big stars (star-shaped, of course) and shooting stars. It is a kind of exploring the unknown and the unlimited. For all that, I have never had a chance to learn the technology behind this research, even though I sometimes wished to become a space scientist. My involvement has been limited to watching space movies and following the latest news about newly discovered planets.


SAP Lumira lights up those dreams in a very practical way. This is going to be my "dream" project.



Data Source


Through my contacts I came to know about the NASA open data challenge: a large amount of NASA research data is available to the public, so that people can build better products that help with NASA's big data challenges. With the help of two friends (non-SAP), I searched for the best and simplest data source for our analysis. My friends pushed me to choose climate data, but my passion for space would not let me consider anything other than space science data, even though we knew nothing about space terminology. We kept thinking about the data for three months without taking any action, because the open NASA data is huge and comes in all kinds of formats. I am sure even serious data geeks could not work with this data without help from field experts who can interpret it.


The US government shutdown actually helped a lot: during the shutdown I got replies from NASA scientists I had emailed three months earlier, and they helped me choose the right track of data. They even encouraged me to take part in the NASA big data challenge. My data sources are below.





Data Insights


The true challenge is not the size of the data; it is getting predictive and analytic results out of it. To get results, we must dedicate some time to studying the data: go through the data sets and become familiar with the terms before starting to analyse. Here is a very short description of what my data is about and how it is used for analysis.


Kepler is a space telescope launched by NASA to discover Earth-like planets orbiting other stars. It uses various kinds of discovery methods to observe other planets, from which size, distance, velocity and so on can be obtained. Not every object it detects is considered a real planet: some turn out to be false positives, and some are eclipsing binaries. For more information on Kepler data please visit http://en.wikipedia.org/wiki/Kepler_spacecraft


We built our own Excel data set from the metadata on the above site, using features such as merge, number conversion, and date and time conversion.



Data Analysis


These are a few of the approaches I used to get meaningful analytic results; with the same methods, well over 100 visualisations could be produced from the data.



1.  Impact of distance on Density and Mass ( using Scatter Graph)


This is very interesting: the chart clearly shows that distance has a big impact on mass.





2. Year of Discovery


It is very clear: after the Kepler launch, the number of newly discovered planets increased dramatically.


In the last couple of years Kepler has also helped discover even more planets. There could be many reasons behind that, such as technical improvements in the discovery methods used on Kepler; alternatively, we can try to relate it to other constraints like gravity. This is what data scientists work on: drawing out the various possibilities from data that is already available.





3. Top 5 Photometric measurements


I used SAP Lumira's ranking functionality to get the top 5 out of 10000+ entries.
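Outside Lumira, the same "top N by a measure" idea can be sketched in a few lines. The row layout and the `value` field name are hypothetical, and the data is generated purely for illustration; it is not the Kepler data set.

```python
import heapq


def top_n(rows: list, measure: str, n: int = 5) -> list:
    """Return the n rows with the largest value of the given measure."""
    return heapq.nlargest(n, rows, key=lambda row: row[measure])


# Synthetic rows standing in for 10000+ measurements.
rows = [{"name": f"candidate-{i}", "value": (i * 37) % 101} for i in range(10000)]
top5 = top_n(rows, "value")
```

`heapq.nlargest` avoids sorting the whole list, which matters little at this size but keeps the sketch cheap for much larger inputs.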








4.  Number of False Positives


Not all of Kepler's detections are considered planets; some end up as false positives, that is, other kinds of objects such as eclipsing binaries. Out of all the measurements, I could get the list of confirmed planets in less than a second.





5.  Highest Temperature


Using a tag cloud, we can see the candidate with the highest temperature.





6. Density ( Close look!!! )


This chart gives a very detailed comparison of candidate density with that of Earth. If we look closely, our Earth is only 3.72%.





7. Mass and Radius variation on Earth and Jupiter using Story Board


This is four-dimensional data, so I used a combined line and column chart with the Storyboard to produce a meaningful visual composition.





8. Discovery Methods and Proper Motion comparison


The chart below helps the Kepler team optimize its identification methods.







9. Number of General, Light, Transit Curves


This is a very sensitive analysis, used mostly in identifying false positives.





10. Mass and Radius on Earth Detail Scatter Matrix





Next Step


I thank everyone, including the people at NASA, who helped me with this wonderful work. Without all your support, this would not have been possible. The above is just the beginning: I am very much looking forward to using predictive analysis to estimate the next planet discoveries, and I may share that information in the future.



This is my first blog on SCN. As my dataset I have chosen various countries’ expenditure patterns on Research & Development, from the United States Census Bureau. The dataset contains country-wise statistics on total R & D expenditure as a % of GDP, R & D expenditure shared by government, R & D expenditure shared by industry, per capita expenditure on R & D, and expenditure on R & D education.

Using SAP Lumira, I've analysed the dataset in the following aspects:

  1. Countries with highest R & D expenditures
  2. Industry’s contribution towards R & D
  3. Government’s contribution towards R & D
  4. Per capita expenditure on R & D and
  5. Expenditure  on R & D education



Let us begin our analysis,

1.  Countries with highest R & D expenditures


Here, we can see the top 5 countries in terms of maximum R & D expenditure (as a % of GDP) are Sweden, South Korea, Finland, Japan and Switzerland with 3.60, 3.47, 3.47, 3.44 and 2.90 % of GDP expenditures respectively.


2.  Industry’s contribution towards R & D



Here we get an interesting picture: although Sweden is on top for R & D spending as a whole, when it comes to the industry's share of R & D spending Luxembourg is on top with 79.72% contributed by industry, followed by Japan with 77.71%, South Korea with 73.65%, China with 70.37% and Switzerland with 69.73%.


3. Government's contribution towards R & D


Here we uncover another hidden fact: India stands on top with the highest government contribution towards R & D (80.81%), followed by Russia with 62.62%, Poland with 58.61%, Brazil with 57.88% and Slovakia with 53.92%.


4. Per capita expenditure on R & D


Yet another interesting analysis here. Sweden again stands on top, with the highest per capita expenditure on R & D ($1,320 in 2011), followed by Luxembourg ($1,300), the United States ($1,221), Finland ($1,206) and Japan ($1,157).


5.  Expenditure on R & D education


Another hidden fact now. Sweden again stands first with the highest expenditure on R & D education (0.77% of GDP), followed by two new entries in our analysis so far, Denmark (0.70% of GDP) and Iceland (0.69% of GDP), and, as usual, Switzerland (0.66% of GDP) and Finland (0.65% of GDP).



This analysis helps us to deduce the mindset of people and the cultural background of different countries. For example, in countries like Russia, Brazil and India (where I belong), people like to have a secure profession and give much importance to family and relationships. They usually prefer not to take new initiatives or pursue innovations themselves unless sponsored, because of the risk of failure, as most people here are middle class. They are usually reluctant to take an unconventional path as far as their career is concerned.

On the other hand, in countries like Sweden, Denmark, Switzerland and the USA, people love to take initiatives and challenges, and they tend to do something innovative. Moreover, as most people there are above middle class, sustenance is not a concern. That is one of the reasons why these are the countries where most innovations come from, and why many Fortune 500 companies have their roots there.

In countries like India, however, the majority of large enterprises are either state run or foreign collaborations, and most research pertains to either space or defense, which are again government sponsored. This is one of the main reasons why we see the government playing the largest role in research in these countries, in contrast to the techie countries. However, trends are changing rapidly. Even in countries like India, which is considered to be a land of culture, people are breaking age-old conventions and taking new initiatives, and therefore industry's contribution towards research is improving significantly, on par with the government's.

SAP's Matt Lloyd gave this EA102 session at SAP TechEd Las Vegas, covering desktop and cloud, the future, and how to bring Lumira on-premise.  You can watch the recording of this session here.


The usual disclaimer applies that things are subject to change.

He said the Lumira Cloud version is touch-enabled, as it works on your iPad.


Overview of Self Service BI


He reviewed the single version of the truth: IT created universes and everyone asked questions on the same dataset.  Data has been growing at an exponential rate and IT cannot keep up with the demand.


Self-service would allow you to mash up the data sources.  Everyone at all levels of operation has questions.


Figure 1: Source: SAP


The business wants an engaging application, to get data in real time in a repeatable fashion.


If you have a quick question, you can share it through your devices on the go.


IT wants to enable self-service so they don’t need to create the queries but leverage the existing data sources


Today there is Lumira and BusinessObjects Explorer.  Someone would create the InfoSpace so you can go and ask questions.


With Lumira you can publish to Explorer and create an Infospace and not ask someone to do that.


Lumira Desktop


Lumira is SAP’s self service tool.  You can take the data in and publish to HANA as a table without HANA modeling.



Figure 2: Source: SAP


Figure 2 shows the Lumira Desktop for the PC.  There are two versions – 32 bit and 64 bit


For questions about the Mac version he suggested seeing the EA 187 session.


Sybase IQ is the columnar database stored locally in the desktop.


With the connected mode you can look at HANA data or download where you can use the enrichment.


The new desktop tool is based on HTML5 with sample data built in


Currently you cannot share the Storyboard to the cloud but they are working on that; you can share the dataset


Lumira Cloud


This was released at SAPPHIRE in May


Figure 3: Source: SAP


This is for the Business Users; Lumira Desktop is for the business analyst


Once data is in the cloud they can collaborate and it does not require IT as it is hosted by SAP


Anyone can sign up at cloud.saplumira.com for a free account


Figure 4: Source: SAP


It is similar to the desktop version, as you can pull in data from Excel or CSV; it is touch friendly.


Figure 5: Source: SAP


Lumira Cloud supports geo maps; the difference from the desktop version is that on the desktop you can enrich the hierarchies, which is not available in the cloud yet. You need latitude and longitude data.


You can do reverse geocoding using Google (I had not considered that).


Figure 6: Source: SAP


Figure 6 covers the free option on the left and the Enterprise version – “share with named users” for team work. There is a 5 user minimum for the Enterprise license.


Figure 7: Source: SAP


This is a hosted service by SAP – does not require IT


It is built on HANA but it is hidden from the users


If you have an SCN account, you can use that to sign up, and with a token it supports SSO.


It is built on HTML5 technology for cross platform support


This is part of the HANA Cloud platform.




Figure 8: Source: SAP


Future is subject to change. In the future you can publish on Lumira for HANA, an application on HANA, using your HANA roles, security and views.


Figure 9: Source: SAP


They are working on Lumira for BI4 – integrating Lumira into the BI Platform.


Figure 10: Source: SAP


Figure 10 is a recap of the session comparing today and the future.  He said there will be more to share at SAPPHIRE next year.


Figure 11: Source: SAP


Figure 11 is the summary. For the future, they are working on Lumira for HANA on premise and integration with the BI4 platform.


He suggested going to saplumira.com, a new site announced at SAP TechEd.


Question & Answer

Q: How does it handle security?

A: If you pull in Excel or CSV, there is no security

If you connect to a database or BIP/HANA, you are providing credentials


Q: How large can it be?

A: It depends on the memory on the machine; the recommendation is 5M cells (cells, not rows or columns), and they recommend leaving everything in HANA

You get a warning at 30 million cells
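Since the limits are stated in cells rather than rows, a quick sizing check is just rows times columns. The sketch below uses the two thresholds quoted in the session (5 million recommended, warning at 30 million); the function names and the three labels are illustrative and do not come from Lumira.

```python
RECOMMENDED_CELLS = 5_000_000   # recommended size quoted in the session
WARNING_CELLS = 30_000_000      # size at which a warning is shown


def cell_count(rows: int, columns: int) -> int:
    """Cells are rows multiplied by columns."""
    return rows * columns


def sizing_advice(rows: int, columns: int) -> str:
    """Classify a dataset against the two thresholds above."""
    cells = cell_count(rows, columns)
    if cells <= RECOMMENDED_CELLS:
        return "within recommendation"
    if cells < WARNING_CELLS:
        return "above recommendation"
    return "warning expected"


advice = sizing_advice(1_000_000, 20)  # 20 million cells
```

A wide table reaches the limits much sooner than its row count suggests: 100,000 rows with 60 columns is already 6 million cells.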


Q: Does it work with BEx Queries?

A: It does not work today, but it is part of the roadmap


Q: Does it support .UNV?

A: It supports both UNV and UNX


For BW it only supports UNX


Q: Is there a way to see what transformations were applied?

A: They do keep track of them and plan to share that someday; they may want to make it available in the future


Q: How do you publish a report with UNV?

A: You are publishing a copy of that data


Related Links:

Lumira Site

Lumira Cloud

Reporting 2013

Future of Integrating SAP Lumira into Your Existing BI 4 Landscape SAP TechEd Online Notes Part 2

#SAPTechEd: Can You Trust End-Users With Self-Service BI? by Timo Elliott

Big Data Geek - Is it getting warmer in Virginia - NOAA Hourly Climate Data - Part 2 by John Appleby

This is my second blog post for the Data Geek Challenge. I found various data sets describing demographics and some other aspects of Serbia. Some data sets are from Census 2011 and other are from different statistical research projects. Altogether, they give one interesting statistical picture of my country.


In the first graphic we can see the population structure by age group and gender. Notice that the most numerous groups are people from 50 to 65 years of age. One of Serbia's national issues is a low birth rate, so we can say that the population is aging.

Age Males Females.jpg


Serbia is a multicultural country. The second graphic shows the population structure by nationality or ethnicity.




Next we look at basic demographic indicators for Serbia. These are:


Basic demographic indicators of population:

  • The number of live births per 1000 inhabitants
  • The number of deaths per 1000 inhabitants
  • Natural increase per 1000 inhabitants
  • The number of marriages per 1000 inhabitants
  • The number of divorces per 1000 inhabitants
  • The number of infant deaths per 1000 live births
  • The number of divorces per 1000 marriages
  • Total fertility rate
  • The number of live births outside marriage (per 1000 live births)
  • The average age of mother at birth of all children
  • The average age of mother at birth of the first child
  • The average age of the deceased person - male
  • The average age of the deceased person - female
  • The average age of groom at marriage
  • The average age of bride at marriage
  • The average age of groom at the conclusion of first marriage
  • The average age of bride at the conclusion of first marriage
  • Life expectancy of men
  • Life expectancy of women


Visually it looks like this:

Basic indicators.jpg


The number of Serbs living abroad is very high. Because of earlier wars, sanctions, and the poor economic situation, many Serbs left to live in foreign countries. The next graphic shows the Serbian diaspora.




The unemployment rate is very high in Serbia, and that is one more burning national issue.




Average wages are still very low compared to European standards, but that is nothing compared to how it was in the 90’s. I can remember that a whole month's salary would buy just a few products in the market. It was the time of hyperinflation. Today, the economic situation of the country is stable and on the rise.




Serbia is a very popular tourist destination. There are many national parks, spas, mountains and forests, as well as many urban attractions such as numerous festivals, most notably the EXIT festival in Novi Sad, the Guča trumpet festival and Belgrade Beer Fest.




Belgrade is the capital of Serbia and its largest city, with 1.65 million people living within the administrative limits. I live in the second largest and, in my opinion, the most beautiful city in Serbia, Novi Sad.




Finally, I invite you to come to Serbia and meet my people and culture. You will have a good time, guaranteed!



Thanks for reading and have a nice day!

If you already have SAP Lumira installed on your PC, you can update it at no additional cost by following the instructions below:


Update Instructions if you have SAP Lumira SP11* or earlier:

  1. Download the SAP Lumira install EXE file (www.sap.com/downloadlumira) and run the EXE file
  2. On Step 1 - Confirm Update, click Next to begin the installation

    Confirm update.png
  3. On Step 3 - Finish Installation, click Finish to exit the installation. You're done!

    Finish Installation.png



Update Instructions if you have SAP Lumira SP12 or later:

  1. Start your SAP Lumira application
  2. Under the menu Help, select Check for new updates
  3. If an update is available, a popup box will appear (shown below). Click OK to install the available update


Install available.png



Autoupdate Difficulties

If you experience difficulties with the autoupdate (the autoupdate popup box says No new update available AND you know there's one available), I recommend unchecking the Configure Proxy check box under Network (select  File \ Preferences \ Lumira Preferences). That did the trick for me.


Lumira Preferences.png



*: Don't know what version of SAP Lumira you have?

Open your SAP Lumira application; under Help, select About SAP Lumira;

At the bottom of the screen in the left corner you will see the version number.

That's the SAP Lumira version installed on your PC.  In the example below, the version is 1.12 or SP12

SAP Lumira version.png


Problems Updating?

If you experience problems and you would like one of our SAP Lumira experts to guide you through the update process, please contact them by phone (+1 855 558-6472), email (support.lumira@sap.com), or Twitter (@saplumiraexpert). They are there to help you.



You don't have SAP Lumira, personal edition installed?
It's free! Just go to www.saplumira.com, download the application, and start using it against your Excel data.
You will be able to update it when a new release becomes available.

Sukumar M

Analysis On Lumira

Posted by Sukumar M Nov 6, 2013

Sales data analysis simplified with SAP Lumira


I built a sample dataset of sales data for an agrochemical company. Using Lumira, I could bring out in pictures the highlights of Net Sales and Sales Quantity across various aspects. Below are a few of the reports I built with Lumira within a few minutes.


Case Study – 1


Business users were interested in the following facts from the sales data:

·         Net Sales By Region

·         Gross Sales By Buying Group

·         Gross Sales By Product

                                    Case Study 1.png

Case Study – 2


The story board below represents various parameters of gross sales by sales representative. It also shows the top sales performers.

The business was interested in checking which salesmen were receiving commissions, in order to recognize the best.


This is the tag cloud visual, which shows the commission paid to each salesman. The report can be drilled down to answer the following questions:

·         How are salespeople ranked by quarterly/yearly performance?

·         Which 5 salespeople should receive an award for annual sales?


Sales by the Representatives

Case Study 2.1.png

Case Study 2.2.png


Case Study – 3


The story board below represents the budget quantity and the actual quantity sold by the sales team. It shows the improvement in sales, and also the top trades in 2011.

The business was interested in comparing actual sales and forecasted sales for the year 2011.

SAP Lumira helps with sales forecasting: monitoring and acting on individual opportunities, forecasting current and future period revenues more accurately, and understanding the drivers that distinguish won from lost deals. Users can use graphical dashboards to quickly assess actual sales performance against targets and sales management forecasts. Marketing users can analyze lead progression through each stage of the sales cycle to quantify the effectiveness and revenue impact of marketing efforts.


                                   Case Study 3.png



Analyses of sales from different perspectives, using various visualizations for the business, are below:


Report 1


To understand Gross Sales and Net Sales, the sales data was analyzed by fiscal year/period. Looking at the radar chart below, I could conclude that the last two quarters of every year had better sales than quarters 1 and 2.


Gross Sales and Net Sales by Fiscal Period

Report 1.png

Report 2


The 3D chart below represents the gross and net sales by year. Very intuitive placeholders are available for users to place the metrics and parameters. This is a similar analysis again, but as vertical bar charts.


Gross and Net Sales by Year

   Report 2.png


Report 3


This map represents the sales quantity by US region. It is very clear from the map below that the green areas are the best performing regions in terms of sales quantity.


Regional Sales Quantity

Report 3.png


Report 4

A story board was created to bring out the sales analysis at a higher level, from where users can drill into the reports for specifics.

This story board shows various aspects, as below:

·         Gross and Net sales for US regions

·         Gross sales and Invoice Price by US regions

·         Gross sales and  Net Sales by year


Complete Picture of the Gross and Net Sales for the Sales Analysis

Report 4.png



Report 5


This report gives the picture of net sales at region level. Depending on interest, we can do a drill-down analysis of the state of Georgia using the drop-down lists available at the top of the screen.


Net Sales Analysis at State Regional Level


Report 6.png


Report 6


This report gives the picture of net sales at region level. Based on the region of interest, we have analysed the state of Georgia using the drop-down lists available at the top of the screen.

Net Sales Analysis at City Level

Report 7.png


Report 7


The combination chart below shows that the family of products called Plant Growth Regulators had a high sales margin. We can see that the vertical bars and the line in the chart deviate a lot from each other, which let me identify this fact pictorially.


Report 8.png    



Report 8

This is the waterfall report, which gives the picture of rebates for buying groups. The buying group with the highest rebate is Astra, which turns my focus to the Astra buying group.


                                    Report 9.png

To conclude, an exhaustive analysis like the one above gives users a self-driven tool to visualize data their own way and take decisions quickly. One can easily analyze the data and present the findings in various forms depending on the audience.


I think a tool like this gives executive business users a lot of scope to do analysis without the involvement of an IT department.

I hope this post helps show what a self-service BI tool like Lumira can do.

So I loved the idea of the Data Geek challenge, but I'm more of a Big Data Geek. And it turns out that the National Oceanic and Atmospheric Administration (NOAA) have publicly available hourly climate data going back to 1901.


It's available at ftp://ftp.ncdc.noaa.gov/pub/data/noaa and you can download all of it freely. I've been kind to my IT folks in my office network so I've been slowly downloading it over the last few weeks, and I'm up to 2003 and I've got 39GB of compressed data so far. It is GZIP compressed and expands about 10:1... and in addition, the data volumes grow massively since 1973 and keep growing to the present day. I expect 600GB-1TB of raw data before we're done.


So we're going to need SAP HANA and Lumira to find some interesting information, and I thought I'd take you on the journey of how to do this. This is Part 1 of however many it takes me to find some meaning in the data.


Update: Part 2 is now available here! In part 2 we look at Tammy Powlas' question "is it getting warmer in Virginia?"


Downloading the data


I've gone about this using the following command:


wget --mirror ftp://ftp.ncdc.noaa.gov/pub/data/noaa


There are other ways to get the data much faster, including aria2, pcurl or httrack, but I wanted to be kind to my IT team and this doesn't use too much bandwidth or mess up our office network. It will take me a few weeks to get all the data and then I can keep it up to date any time!


Loading Data


The data comes in a pretty incomprehensible format and the lines look a bit like this:


0081999999999992002010100004+17900-075900FM-13+9999ELML V02099999999999999999N9999999N1+02791+02791101361REMSYN072BBXX  ELML7 0100/ 99179 70759 4//// ///// 10279 20279 40136 8//// 222//;


You can download a PDF document of how this is all formatted here:




It turns out that it is a complex fixed format file, so I'm just going to use the mandatory fields, which are the first 34 fields. This gives us data like location, timestamp, temperature, wind, visibility. Pretty comprehensive. I'm a big fan of UNIX scripts to reformat this stuff so I wrote an awk script to reformat the fixed format files into CSV.


awk 'BEGIN { FIELDWIDTHS = "4 6 5 4 2 2 2 2 1 6 7 5 5 5 4 3 1 1 4 1 5 1 1 1 6 1 1 1 5 1 5 1 5 1" } {
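A minimal Python sketch of the same fixed-width-to-CSV idea, reusing the field widths from the FIELDWIDTHS string above. The helper names (`split_fixed`, `to_csv_row`) are illustrative, and merging the five date and time fields into a single timestamp is an assumption based on the CSV output shown later in this post; it is not the awk script itself.

```python
# Same 34 widths as the awk FIELDWIDTHS string (they add up to 105 characters).
FIELD_WIDTHS = [4, 6, 5, 4, 2, 2, 2, 2, 1, 6, 7, 5, 5, 5, 4, 3, 1,
                1, 4, 1, 5, 1, 1, 1, 6, 1, 1, 1, 5, 1, 5, 1, 5, 1]


def split_fixed(line: str, widths: list = FIELD_WIDTHS) -> list:
    """Cut the start of a line into consecutive fixed-width fields."""
    fields, pos = [], 0
    for width in widths:
        fields.append(line[pos:pos + width])
        pos += width
    return fields


def to_csv_row(line: str) -> str:
    """Reformat one fixed-width record as a comma-separated row.

    Assumption: fields 3 to 7 are year, month, day, hour and minute,
    and are merged into one 'YYYY-MM-DD HH:MM:00' timestamp.
    """
    f = split_fixed(line)
    timestamp = f"{f[3]}-{f[4]}-{f[5]} {f[6]}:{f[7]}:00"
    return ",".join(f[:3] + [timestamp] + f[8:])
```

Anything after the first 105 characters (the optional sections of each record) is simply ignored by this sketch.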

Now I can put all 12653 files that constitute 2002's data into one big CSV file!


for a in `find 2002/*2002`; do ./test.sh < $a >> 2002.csv; done


Now I have a pretty human readable format - we can make some sense of this!


0173,010010,99999,2002-01-01 00:00:00,4,+70930,-008660,FM-12,+0009,ENJA ,V020,320,1,N,0100,1,22000,1,9,N,070000,1,N,1,-0039,1,-0087,1,10032
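To make the row above a little more readable, here is a hedged sketch that decodes a few of its columns. The column positions, the scaling factors (latitude and longitude divided by 1000; temperature, dew point and pressure divided by 10) and the missing-value markers are my reading of the ISH format document and should be checked against it; the function names are illustrative.

```python
SAMPLE_ROW = ("0173,010010,99999,2002-01-01 00:00:00,4,+70930,-008660,FM-12,"
              "+0009,ENJA ,V020,320,1,N,0100,1,22000,1,9,N,070000,1,N,1,"
              "-0039,1,-0087,1,10032")


def scaled(raw: str, factor: int, missing: str):
    """Convert a signed, zero-padded integer string, or None if missing."""
    return None if raw == missing else int(raw) / factor


def decode(row: str) -> dict:
    """Pick out a few human-readable values from one CSV row."""
    f = row.split(",")
    return {
        "station": f[1],
        "timestamp": f[3],
        "latitude": scaled(f[5], 1000, "+99999"),      # degrees (assumed scale)
        "longitude": scaled(f[6], 1000, "+999999"),    # degrees (assumed scale)
        "air_temp_c": scaled(f[24], 10, "+9999"),      # degrees Celsius
        "dew_point_c": scaled(f[26], 10, "+9999"),     # degrees Celsius
        "pressure_hpa": scaled(f[28], 10, "99999"),    # hectopascals
    }


reading = decode(SAMPLE_ROW)
```

Under these assumptions the sample row is a reading of about -3.9 °C at roughly 70.93 N, 8.66 W.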


Whilst that script ran (it takes a few minutes per year), I went ahead and created the HANA DDL for this table, as well as for the reference data that exists within the PDF, for things like Quality, Source Flags, Observation Types, Wind Types, Sky Ceiling Methods and Visibility. This will make things like FM-12 human readable, like "SYNOP Report of surface observation from a fixed land station". I've attached the DDL as a SQL script so you can run it on your own system.


A few minutes later, we have 63,653,095 readings for 2002 to pull into SAP HANA. We dispatch that in 2 minutes flat - got to love HANA's bulk loader performance. Now all that remains is to build an Analytic View to combine our fact table and reference data into a readable model for Lumira. The ISH reference data also contains location information for the sensors, though it is very rough, by Country or US State.



And now we can have some fun with Lumira! First, we can connect up to our Analytic View in HANA and see our attributes and measures. I've created a bunch of calculated measures so we can do averages within Lumira and push the calculations back into HANA. This makes it super fast.
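As a rough picture of what such an average measure computes, here is a small Python sketch. It rests on my reading of the ISH format document: air temperature is stored in tenths of a degree Celsius and +9999 marks a missing value. The function names and the (group, raw value) input shape are made up for illustration; in the real model HANA does this work.

```python
from collections import defaultdict

MISSING = 9999  # assumption: ISH marks a missing air temperature as +9999


def parse_temperature(raw):
    """Convert a raw value such as '+0279' to degrees Celsius, or None if missing."""
    value = int(raw)
    if value == MISSING:
        return None
    return value / 10.0  # assumption: scaling factor of 10


def average_by_group(readings):
    """Average temperature per group from (group, raw_temperature) pairs.

    Missing readings are skipped so they do not drag the average up.
    """
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum, count]
    for group, raw in readings:
        temp = parse_temperature(raw)
        if temp is None:
            continue
        totals[group][0] += temp
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}
```

Skipping the missing marker matters: a handful of 9999 values averaged in as 999.9 degrees would wreck any country-level chart.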


But first, let's start with Geographic Enrichment. We can add Country, and State:


Screen Shot 2013-11-03 at 6.29.45 PM.png


Very quickly, we can do a Choropleth Chart showing Average Wind Speed by Country in 2002.


Screen Shot 2013-11-03 at 6.45.14 PM.png

Let's switch to Air Temperature:


Screen Shot 2013-11-03 at 6.53.15 PM.png

And drill into the United States:


Screen Shot 2013-11-03 at 6.55.00 PM.png

And now we're going to filter by the month of August:


Screen Shot 2013-11-03 at 7.07.29 PM.png

Next Steps


I need to go ahead and load the rest of the data from 1901, which will take a while. Then, we can go ahead and do some time-based analysis.


What are the questions that you'd like to be able to answer?

Part 1, Integrating SAP Lumira into Your Existing BI 4 Landscape – SAP TechEd Online Notes Part 1, discusses the current state of Lumira, including Cloud.


This is Part 2, covering the future. Note that things are subject to change. These are notes from the recorded online EA201 session, Integrating SAP Lumira into Your Existing BI 4 Landscape.


Future: On Premise


Figure 1: Source: SAP


“SAP Lumira for SAP HANA” is in quotes as it is future functionality.  It is not generally available.


SAP is looking at a native implementation on the HANA appliance that uses HANA accounts and security mechanisms.


Figure 2: Source: SAP


The cloud version, on the left in Figure 2, is available today.


The plan is to offer Lumira in the cloud, on mobile devices and on premise, with multiple deployment options.


Lumira for HANA needs a HANA system with scalability and volumes



Figure 3: Source: SAP


BI4 is not going away but for some things you need HANA


Focus on BI4 will be more on quality and innovation, less disruption


Figure 4: Source: SAP


Figure 4 shows the future…two launchpads?


SAP knows this.  Keep reading...


Figure 5: Source: SAP


Figure 5 is what SAP is working on. SAP wants customers to take advantage of HANA technologies without having to deal with HANA directly, and to have the content show up in BI Launchpad.


Administrators do not want 2 things to administer.


They want to bring the user experience of Lumira into the BI4 platform, but that does not mean bringing HANA into the BI Platform.


They want to see your Lumira artifacts next to the Crystal artifacts.


They are working on granular rights.


You would publish to BI Platform and not HANA



Figure 6: Source: SAP


Vision to have Lumira in the BI Launchpad – an integration with new technologies – not a replacement


Figure 7: Source: SAP


SP13 is now available


Lumira connects to universes today


SAP is working to bring the user and administrative experience in; in the meantime you can use Lumira now.


Question & Answer

Q: It doesn’t look like much of an enterprise solution, as it is disconnected from the data source.

A: This is why the Scheduling Agent is there and the server is coming in the future


Q: Why not publish in Launchpad?

A: SAP is working on it.


Q: Can you use an Excel spreadsheet from Analysis Office as a source for Lumira?

A: He has not tried this


Q: Can you pull up changes from HANA Views?

A: Yes. The big deal about publishing back to a HANA view is the credentials.


Q: Is this an offline tool that allows refreshes?

A: It is a desktop – “Explorer on the Desktop”


Q: How is the performance in writing back to the desktop?

A: This is a tough question to answer, as part of the limiting factor is the desktop's memory, and it also depends on the nature of the data. Once it gets to 30 million cells it is not recommended.


Q: Can you deploy on a Citrix Server?

A: There is an SAP note for Lumira Desktop on Citrix

Browser side should be good


Q: Is there a scripted installer?

A: It is on the roadmap but not available today.


Q: Seems at odds with BI Platform?

A: Lumira uses universes, BI authentication, publish to Explorer


On Premise

Q: Is there a plan to have a schedule on the server to refresh data?

A: Philosophical question; if HANA is your data store – the data is already there

These are notes from the recorded online EA201 session, Integrating SAP Lumira into Your Existing BI 4 Landscape.


The usual disclaimer applies that things for the future are subject to change.


Ashish Morzaria covered Lumira, which is self-service, and how to enable the business user and the analyst. SAP started with BusinessObjects Explorer.


Lumira Cloud is hosted by SAP and is a SaaS offering. Lumira acquires data, connects to sources and takes the data to the analyst, who then transforms it.


Figure 1: Source: SAP


Using the Java Connector, Lumira can connect to SAP ECC and any JDBC source.  It can connect to HANA in online mode (data does not go down to the desktop) and in offline mode (bring data to the desktop and use it locally).


Figure 1 shows it can connect to BI universes.


Does Lumira connect to BW directly?  Today it does not but in SP12 it connects to the BW universe.



Figure 2: Source: SAP


Figure 2 shows how to share the information.


The SVID (SAP Visualization Document) format contains the dataset and the visualization, and can be exported.


You can export datasets to SAP Lumira Cloud


You can upload the dataset and the SVID file to the Cloud to share them.


Soon you will be able to publish visualizations to the cloud.


You can create HANA views without HANA studio


In BI4 you can create Information Spaces in the Explorer; this mobilizes the information as well.


The visualizations in Lumira are different from those in Explorer.


How do you refresh that data? In Lumira you have a Scheduling Agent that allows you to refresh and republish the data – as an Information Space, a HANA View or to Lumira Cloud.


It leverages BOE credentials



Figure 3: Source: SAP


On the right of Figure 3 you can see it “flattens the stack”.


It is important to take advantage of platform – HANA is a “fast execution engine”.



Figure 4: Source: SAP


Run applications on HANA such as XS and have more native capabilities


Figure 5: Source: SAP


Application logic is pushed into HANA, with processing as close to the data as possible and fewer layers to go through.


Why discuss it with BI4 integration?  It allows you to publish.  You can go to cloud.saplumira.com and get your free account (no trial, no expiration).

It is a way to share content and you don’t need IT


They are committed to HTML5, which brings nice visualizations, fast execution and a “touch first interface”.


Figure 6: Source: SAP


Figure 6 shows a highly simplified version of the Lumira Cloud architecture


At the base you have the HANA cloud platform, HANA XS, and above this is Lumira and it is on a multi-tenant cloud that SAP runs


It has straightforward access to HANA.


HANA is moving from a fast database to an application engine; with an HTML5 front end you are hitting the cloud directly.


It uses HANA services directly – take advantage


It is hosted by SAP and uses your SAP ID (SCN ID), so you have single sign-on.


The account does not expire and comes with free GB of storage on HANA.


To be continued...see Future of Integrating SAP Lumira into Your Existing BI 4 Landscape SAP TechEd Online Notes Part 2

The Reporting 2013 Conference is coming later this month in Orlando, so I thought I would use Text Analysis from Data Services to see what is being discussed at the conference - themes, products, organizations.  I use the output from Text Analysis and take that to Lumira.



First I took the brochure and copied the text to Notepad so it was all raw text.  Please note I did NOT copy the entire brochure but just a few pages.  This is how it looks below:



Clearly the above shows it is unstructured text.  How to analyze?


Then I went to Data Services and used the Base Entity extract transform and read the file in:



In the above it is taking the text file in, using the Base Entity extract transform to determine what to read, and then it will output it to a Text file.



In Data Services I am using entity types NOUN_GROUP, ORGANIZATION, and PRODUCT




I take the file to Lumira and read it in.





I decide to look at a word cloud with product mentions; predictably HANA has top mention and then Web Intelligence and Predictive Analysis.



I look at the word cloud with organization mentions and SAP is top, with McKesson.  I am not sure why products are also included as organizations but I am not an expert in Data Services.


The above shows top noun groups mentioned and this to me shows some key themes from just a few pages - Mobile and business users.  You can also see real time, use cases, drive user adoption, all key themes this year in the analytics space.
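The word clouds above are essentially counts of how often each extracted entity appears, grouped by entity type. A minimal Python sketch of that tally follows. The (entity type, text) input shape and the function name are assumptions for illustration; the actual layout of the Data Services output file is not shown here.

```python
from collections import Counter


def mention_counts(entities, entity_type):
    """Count how often each entity of the given type is mentioned.

    `entities` is an iterable of (entity_type, text) pairs, e.g. rows read
    from the extraction output (hypothetical layout).
    """
    counts = Counter()
    for etype, text in entities:
        if etype == entity_type:
            counts[text.strip()] += 1
    return counts


# Placeholder rows, not the real extraction output.
sample = [
    ("PRODUCT", "HANA"),
    ("PRODUCT", "HANA"),
    ("PRODUCT", "Web Intelligence"),
    ("ORGANIZATION", "SAP"),
]
product_counts = mention_counts(sample, "PRODUCT")
```

`product_counts.most_common()` then gives the entities in descending order of mentions, which is the weighting a word cloud draws.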


Will you be attending?

Hello Everyone,


My post for the Data Geek Challenge is about music. I’ve picked a list of some of the best albums in rock’n’roll history, with their sales numbers in millions.  So here is the data I used for this blog:



I have taken this information from this website, then I imported the spreadsheet data to Lumira, and this is how it looked:



Then I created some charts, like this pie below. Here you can see that The Beatles, Pink Floyd and Led Zeppelin have more than 50% of the sales share among the selected albums.
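The share shown in a pie chart is just each band's sales divided by the total. A minimal Python sketch of that arithmetic is below; the band names and figures are placeholders, not the numbers from the spreadsheet, and the function names are made up for illustration.

```python
def sales_share(sales_by_band):
    """Return each band's share of total sales as a fraction between 0 and 1."""
    total = sum(sales_by_band.values())
    if total == 0:
        raise ValueError("total sales must be greater than zero")
    return {band: sales / total for band, sales in sales_by_band.items()}


def combined_share(sales_by_band, bands):
    """Return the combined share of the listed bands."""
    shares = sales_share(sales_by_band)
    return sum(shares[band] for band in bands)


# Placeholder figures in millions, NOT the data used in this post.
placeholder_sales = {"Band A": 30, "Band B": 20, "Band C": 50}
top_two = combined_share(placeholder_sales, ["Band A", "Band B"])
```

With the real spreadsheet loaded into the dictionary, passing the three band names to `combined_share` would reproduce the "more than 50%" reading from the pie.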




The tree map illustrates the same, but is ordered in a different way and the color scheme is by sales number.



The tag cloud is also interesting. The sales data is grouped by band here, and the bands are colored by sales number, the same way as in the tree map.



The bar chart has the information split into albums, so you can see that some bands have more than one.



To finish, I created this chart board.


Hope you like it.




