Last week, the Obama administration launched the much anticipated data.gov website. The site allows citizens to download dozens of raw data files from various federal agencies. Over the coming weeks and months tens of thousands of new data sets will be added to the site, to create a comprehensive platform for data discovery, participation and engagement. The idea is to encourage programmers and others to make new applications, mashups and visualizations based on this data. However, in order to interrogate the data and uncover meaningful trends, business intelligence and analytical tools are necessary.
Data analysis: The public knows best
The obvious question is why these datasets are being made available without context or visualisation. Making millions of lines of raw data available on a website is not necessarily transparency. Effective transparency requires data to be provided in ways that can be understood and acted upon. The government, however, is relying on the public to contextualise the data and tell their own stories with it.
Clay Johnson from the Sunlight Foundation makes some good points about why visualisations of the data should be left to the public.
There are various online solutions available to analyse and visualise raw datasets. These include Swivel, Google Spreadsheets, IBM's Many Eyes and SAP's BusinessObjects Explorer onDemand.
With the development of data.gov, I decided to try out some of these tools to see how BusinessObjects Explorer onDemand compared. Unfortunately, many of the CSV files on data.gov exceed the 5MB BusinessObjects Explorer file limit, so I used a dataset from the District of Columbia's data catalog instead. This catalog was the inspiration for data.gov: the US Federal CIO, Vivek Kundra, was previously the CTO for the District of Columbia.
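For files that exceed an upload limit like this one, one workaround is to trim the CSV before uploading. The following is a minimal Python sketch, not anything provided by Explorer or data.gov: the function name `trim_csv_to_limit` is hypothetical, and treating 5MB as 5 * 1024 * 1024 bytes is an assumption about how the limit is measured.

```python
import csv
import io

# Assumption: the 5MB limit means 5 * 1024 * 1024 bytes of UTF-8 text.
FIVE_MB = 5 * 1024 * 1024


def trim_csv_to_limit(csv_text, max_bytes=FIVE_MB):
    """Keep the header and as many leading rows as fit within max_bytes."""
    reader = csv.reader(io.StringIO(csv_text))
    kept_lines = []
    size = 0
    for row in reader:
        # Re-serialise each row so quoted fields are written back correctly.
        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerow(row)
        line = buf.getvalue()
        line_bytes = len(line.encode("utf-8"))
        if size + line_bytes > max_bytes:
            break  # the next row would push the output over the limit
        kept_lines.append(line)
        size += line_bytes
    return "".join(kept_lines)


if __name__ == "__main__":
    sample = "a,b\n1,2\n3,4\n5,6\n"
    print(trim_csv_to_limit(sample, max_bytes=9))
```

Keeping only the first rows is the simplest approach, but it biases the sample towards whatever ordering the file happens to have, so any trends found in a trimmed file should be treated with caution.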
The dataset I used related to purchase card transactions for calendar year 2009. This is a relatively common set of data that many organisations analyse on a frequent basis. I uploaded the data file (.csv) and used Explorer to create the analysis shown below.
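For readers who want a quick first look at a file like this without any BI tool, a simple total per category can be sketched in a few lines of Python. This is an illustration only: the column names `AGENCY` and `AMOUNT` are hypothetical and will need to be replaced with the actual headers in the downloaded file, and the function name `total_by_column` is my own.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def total_by_column(csv_text, group_col, amount_col):
    """Sum the values in amount_col for each distinct value of group_col."""
    totals = defaultdict(Decimal)  # missing keys start at Decimal("0")
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[group_col]] += Decimal(row[amount_col])
    return dict(totals)


if __name__ == "__main__":
    # Made-up rows for illustration; not taken from the DC data catalog.
    sample = "AGENCY,AMOUNT\nParks,10.50\nLibrary,5.00\nParks,2.25\n"
    for agency, total in sorted(total_by_column(sample, "AGENCY", "AMOUNT").items()):
        print(agency, total)
```

`Decimal` is used instead of `float` so that currency amounts add up exactly.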
Extra functionality for BusinessObjects Explorer
The onDemand application of BusinessObjects Explorer gives a good indication of the product functionality. While I appreciate BusinessObjects Explorer is not touted as a consumer-based Web 2.0 application, it could be improved to include extra functionality exemplified in other applications such as Swivel and Google Spreadsheets. Given that more and more raw data is going to be released by governments around the world, it would be useful for SAP to develop a more consumer-centric onDemand BI application. This could compete with services such as Swivel and Google.
I uploaded the same data file to Google Spreadsheets and Swivel in order to compare some of their features. One of the most useful applications offered by Google Spreadsheets for data analysis is the Motion Chart gadget. It allows you to see trends over time, and is a powerful means of quickly analysing large volumes of data (see the motion chart for this data at Google).
From a comparison of Swivel and Google features, I have listed some improvements to BusinessObjects Explorer below, which would make it a more useful consumer BI tool:
BI OnDemand strategy
One of the questions is whether Explorer onDemand is seeking to be a consumer Web 2.0 analysis tool, or whether it is seen merely as a demo product for the corporate version. If the objective is the latter, then the recommendations above are probably not realistic. However, a more interesting business model might be to develop these features, and offer Explorer to the public as a feature-rich Web 2.0 Business Intelligence application. This would increase its profile, and could entice businesses to investigate the corporate version if they are satisfied with the consumer product. The freemium business model works in this way, and perhaps SAP should utilise this as part of their overall BI onDemand strategy.
The demand for extensive cloud-based analytical solutions is increasing, as more and more governments and organisations crowdsource data analytics. If SAP sees itself as a major player in this market, it needs to create a strong consumer product that demonstrates the BusinessObjects capabilities. Explorer onDemand already does this, but its feature set and usability need to improve to compete with other comparable Web 2.0 products.