It is time to join the wolf pack, the Data Geek Squad. Which data do I want to explore? Which question do I want to answer? I spend quite some time to browse through international databases for statistics. My first idea was 'How many guitar players do we have in the world and which brand do they play?'. I found a lot of aggregated information, but not sufficient RAW data for exploration. A very good source is the web site Data Surfing on the World Wide Web. Some more clicks and I found a package with meteorological wind speed data from several locations in Ireland. I never managed to visit Ireland, but it is still on my to-do list. Here is my chance to learn something: Is it windy in Ireland?
The package includes two files, wind.txt and wind.data. The text in the first file explains: This is the data for daily average wind speeds for 1961-1978 at 12 synoptic meteorological stations in the Republic of Ireland. Each line corresponds to one day of data in the following format: year, month, day, average wind speed at each of the stations in the order given RPT, VAL, ROS, KIL, SHA, BIR, DUB, CLA, MUL, CLO, BEL, MAL. The data are in knots, not in m/s.
Ok, so let’s have a look into the wind.data file. See how it looks like in Notepad:
Very nice! Who needs all these preformatted data sets. I love RAW data files. Nerdy? Geeky? Yes!
The first three columns are for time information, but what’s about RPT, VAL, ROS, KIL, SHA, BIR, DUB, CLA, MUL, CLO, BEL and MAL? I need to translate these city codes into city names, if I want to use the geographical mapping in SAP Lumira. Fortunately one academic paper is using the same data set and published a nice map with the location of the meteorological stations. With this information I created the following translation table:
|City Code||City Name|
I have the daily data from 12 locations in the time frame from 1961 until 1978. This gives me 78887 data points (due to leap years). Ok so let’s upload the data set into SAP Lumira.
I could start to use the ‘search and replace’ functionality to replace the city codes by city names in the data field, but we are not in Word, so there must be another way to do it the data-geek-way. Here it is: I create a CITY data file with the translation table above. This gives me also the possibility to add a country column. Now I can use the ADD and MERGE feature.
I can go on with the next step. I ‘Create a geographic hierarchy by Names’ and … it does not work. To be more precise: It finds Dublin, but not the other 11 locations. What happened? Have I made a mistake? Yes and no. All steps in SAP Lumira have been correct, but I didn’t check the ‘data quality’ for my cities upfront. One example: Marlin Head is neither a city nor a village, it is just the most northerly point of the island of Ireland. All you will find there is a lighthouse and a meteorological station. Therefore the city mapping by names can’t work.
Fortunately SAP Lumira offers a second method: ‘Create a geographic hierarchy by Latitude / Longitude’. Where do I get the Latitude / Longitude information for my 12 locations? No problem, it is all in Wikipedia, or you can use Google Maps, as per your preference. I add two more columns to my CITY data file and restart the merge with the WIND file.
That looks really good. The data set is merged and I can create the geographic hierarchy. A ‘Geographic Enrichment’ window pops-up and I have to select the right columns for Latitude / Longitude data. There is also a very good description in the help function for ‘Creating a geography hierarchy with latitude and longitude data’ available, if you need special settings. I create a measure for average wind and we can test it using the Geographic Bubble Chart visualization.
Super, that’s what I wanted to see. Oh, I forgot to create a time hierarchy, but that’s done with a few mouse clicks. When the data preparation and acquisition is done it is a joy to play with the different charts.
- Which is the windiest month?
- Which is the windiest location?
- Is it the same for average and maximum wind speed?
- How changed the average wind speed during the 18 years of data, if at all?
I created several charts to answer these and other questions.
I have my answer: It is windy in Ireland. This has been a lot of fun and I learned a lot about the geographic hierarchy feature. There is one final step for the Data Geek challenge. I need to complete this submission form to enter the Data Geek Squad. Can’t wait to get my limited edition Data Geek 2.0 Rise of Dark Data T-shirt
Back in the office I will move on with my most current business data use cases, e.g. to use SAP Lumira connected with HANA database to analyze the Sales Pipeline funnel, or ERP performance data. Join me at the SAP Runs SAP booth at SAP TechEd Las Vegas and Amsterdam and we can share the
latest SAP Lumira tips and tricks.
latest SAP Lumira tips and tricks.
Matthias Wild - proud to be part of SAP Global IT where SAP runs SAP.