
Irfan Khan's Blog

58 Posts

At a recent ‘Big Data for Industries’ roundtable at our UK headquarters, my colleagues and I discussed how different sectors are challenging convention to capitalise on the opportunities offered by data analytics. Many interesting points were debated, so I wanted to take the opportunity to share some of the key take-aways from our discussion through this blog post.


In today’s data-driven economy, businesses across industries are harnessing big data analytics to cope with the economic pressures they face, innovate through new services, and gain a competitive edge. Financial services, scientific research and retail are prime examples of sectors where data analytics is propelling fundamental changes in how things are done and generating long-term positive outcomes for organisations and society as a whole.


In financial services, big data is freeing up businesses from the constraints of years of data hoarding by simplifying analysis and enabling more meaningful insights. This enables organisations to sharpen customer service, liquidity risk management, business predictions, claim management, fraud detection and compliance.


For example, a leading health insurance company uses SAP HANA to analyse data for more than six million hospital cases each year to identify potential health risks to its clients, improve patient care, and offer preventative measures that extend life.


One of our banking customers uses SAP HANA for profitability analysis across 31 markets in Europe. Before, the company ran its analysis on a monthly basis, covering one market and one product only. This took hours. Thanks to SAP HANA’s in-memory capabilities, the company is now able to analyse all products and all 31 markets simultaneously, drill deep into individual customers, and run the analysis on demand in a matter of minutes. This has had a fundamental impact on the business, making it more agile and enabling faster decision making.


While many banks have embraced technology, a recent study by SAP and a group of leading academic institutions found that more needs to be done to close the gap between regulators’ expectations and banks’ ability to meet compliance and reporting requirements. The study identified mobile, in-memory computing and cloud as the biggest technology trends to shape banking in the future.


Big data is helping to advance people’s understanding of the planet’s biodiversity too. SAP is currently working with the International Barcode of Life (iBOL) project to help accelerate the building of a database containing DNA barcodes for every species on the planet. The database hosts more than 400,000 species at present, but to identify all the species on the planet (estimated at between 10 million and 100 million), iBOL is looking to expand the number of people contributing to the research. SAP and iBOL have developed the LifeScanner app, available on iTunes, to crowdsource the collection and analysis of DNA barcodes. Anyone can use the app to collect a tissue sample or whole organism, send it off for analysis and get a species identification using DNA barcodes. The published data will be made available to researchers and students for analysis through SAP HANA and SAP Lumira.


Data analytics will soon be used to tackle fraud in the food supply chain too, as mislabelling continues to plague the industry. It can be hard to tell apart dry oregano and basil, for example, because certain herbs and spices can look very similar after processing and packaging. To help address this challenge, SAP and Tru-ID are exploring solutions to increase visibility in the supply chain using SAP HANA and DNA-based verification testing.

In retail, big data is both a necessity and a strategic advantage, and data analytics plays a key role at three levels: store placement, assortment strategy and customer engagement.

Think about a high-end clothing store at an airport. At the top level, thanks to big data, the location of the store is not random – it’s close to a gate with a high proportion of affluent passengers to maximise sales. At the second level, the store uses data to choose the products sold based on the type of shoppers that walk through the doors at different times of year – for example ensuring that the store is stocked with children’s t-shirts during school holidays. Drilling deeper, the third level is how the store uses data on its customers and its stock to sell more effectively. This could include making recommendations based on what the customer bought on their last visit, or even what the weather is like in their destination. 


Big data analytics is empowering retailers to meet the expectations of their increasingly discerning customers, who expect a connected shopping experience – both online and offline. The popularity of online shopping means that consumers are used to being given recommendations based on their previous purchases, and receiving discount codes to thank them for their loyalty. Data analytics enables retailers to join the dots across their online, offline, mobile and social channels to create a real-time 360° view of their customer and provide a better omni-channel customer experience.


Banks, scientists and retailers are leveraging big data to simplify how they operate, enable faster decision making and serve their customers more effectively, demonstrating the growing value of data across industries. It’s all about challenging convention and asking yourself, “Is there a better way of doing this?” For organisations wanting to turn the masses of data they gather into meaningful, actionable insights, the opportunities are endless.

Over the past two months, I have been (re)introducing SAP HANA via a series of blog posts that talk about why HANA is more than just a database, how it can massively reduce redundancy and complexity in enterprise data management environments, and how it brings real-time enterprise computing into the realm of the possible. With this entry, I am happy to report that real-time computing is no longer just a possibility. It’s here.



At last month’s Sapphire conference in Orlando, we introduced SAP Simple Finance, which delivers a real-time financial solution set to enterprises everywhere via the HANA Enterprise Cloud. With SAP Simple Finance, users enjoy the simplicity of secure cloud delivery and a beautifully redesigned user interface, while leveraging new capabilities which promise to do nothing less than transform finance processes and finance departments.


How great can the impact of transitioning to real-time be? Consider the experience of Zurich Insurance: when they set out to improve their global processes for managing finance, they had a few top priorities in mind.  They wanted to deliver a common consolidated accounting system for all countries. They wanted to provide globally consistent accounting and operational procedures. And they wanted to accelerate their core finance processes, especially reporting and close.


Zurich is an early adopter of SAP Accounting powered by SAP HANA. The team at Zurich were able to rapidly implement the new financial suite and found that it met their requirements for a common system and consistent processes worldwide. But it did much more than that.


The new HANA-native environment is transforming Zurich Insurance’s financial operations. General ledger data, which was once variously duplicated and distributed to support reporting and other processes, is becoming a live repository and single source of truth for all finance processes. Zurich discovered that they now have the flexibility to redesign (and simplify) finance processes on the fly. They were able to develop accelerated reporting and monthly close processes in short order and without the kind of massive, labor-intensive effort normally associated with such an undertaking.


And yet, based on a recent validation project, the results they are seeing far surpass what you might expect from a standard optimization or retooling exercise. After implementation, Zurich will enjoy a 65% faster monthly close process and reporting that is 1000 times faster than the previous implementation. Zurich Insurance is experiencing first-hand the true power of real-time computing.


In the new financials suite, all the different finance applications are seamlessly united on the same live SAP HANA platform, which provides a single source of truth. The tremendous boost in reporting speed that Zurich Insurance enjoyed is due to the fact that all the finance data is now in memory. So, for example, there is no need to copy the data and no need to perform a separate extraction to get it into the embedded Business Warehouse environment.


Additionally, the HANA virtual data model provides more seamless access to data than could ever be achieved using a traditional, static model. Reporting can now be implemented in a much more flexible way. Users can run BI tools directly on the data in the production system; there is no need for a separate reporting system. And where previous Finance architectures included a large collection of built-in calculations aimed at saving time by anticipating what questions the user might need to ask, the new suite supports any possible calculation on the dataset. There is no need to try to fine-tune the results. The answers are returned immediately.


But the shift to real-time finance processes goes beyond just cutting out the middle man (or middleware). Typically, financial processes are implemented sequentially. So if your organization performs a monthly close, you have a full profitability picture only once a month. Any dependent or downstream processes are also constrained to that monthly timeline. But with financials running on HANA, users can accelerate and even eliminate sequential processes.


Some users are working towards a completely new model: near real-time close. In such a scenario, the closing process adds little or no additional information. The actual close becomes largely a formality. Empowered by real-time access to information, users are also moving from monthly close to a weekly one. Using traditional tools, some large financial institutions spend as many as 18 days each month on closing. Imagine the organizational transformation when enterprises such as these cut their close time to a few hours, and move to a weekly model.


And it doesn’t stop there. Many enterprises are already building a strategy around the implicit and explicit benefits they will be able to derive from closing their books every day.


Such capabilities may sound impressive, even amazing, but these organizations aren’t adopting radically new processes just to look cool. In an era in which disruption, high volatility, and accelerating change have all become the new normal, these capabilities are becoming basic requirements. Businesses need better and faster updates, and they need processes that don’t entangle or delay each other.


Forecasting and simulation are more critical than they have ever been before. Organizations often need to model every aspect of their business, including full accounting runs. In a standard environment, such a modeling exercise would require a subset of the data, or would take a very long time to process, or both. So either you get a picture based on incomplete data or you get a more complete picture that is out of date by the time you get it. Neither choice is particularly appealing. The HANA platform does away with that choice, providing the ability to develop accurate real-time models for integrated planning and advanced predictive analysis.


While this story begins with talking about finance processes, it does not end there. When you move the breadth and depth of functionality that is the entire SAP enterprise portfolio into a live HANA environment, amazing things begin to happen. Suddenly you have seamless access to data coming from all the different solutions. ERP, accounting, planning — all the applications can see each other’s data. So, for example, the planning system can essentially “open” the ERP system and build models directly from live data. At any time you can compare the plan with reality. There is no latency. The comparison is updated in real time.


Real time is here, and the story has only begun. SAP Simple Finance represents not just the next generation of SAP finance applications; it is a foothold in the future of how finance processes will be expected (and assumed) to work going forward. Moreover, these innovations are giving us a glimpse into the kind of business transformation that pervasive real-time computing will enable overall.



That’s the provocative question posed by Tim Harford in his Financial Times article by the same title. His conclusion is that access to enormous volumes of data does not by itself guarantee insight – in fact, if analysis lacks rigor or is flawed, the result will be distortion on a grander scale.


As Mr. Harford aptly states, “New, large, cheap data sets and powerful analytical tools will pay dividends – nobody doubts that.” But “that achievement is built on the clever processing of enormous data sets.”


It is the analysis of the data by analysts, statisticians, and data scientists who are using the right tools and asking the right questions that leads to big insights.


While no single processing or analytics tool can address the issues that have plagued statisticians for centuries – like correlation vs. causality, multiple-comparison problems, sample error and sample bias – advances in real-time computing, data visualization, and predictive analytics are helping scientists, entrepreneurs, and governments glean invaluable information from data.


Yet it is not enough to stop at robust, disciplined analysis. In a similar vein to Mr. Harford’s critique of poorly executed analysis, organizations must also apply rigor connecting Big Data projects to the business imperative. Accurate insights that cannot be translated into value because they are either not relevant or cannot be applied in the day-to-day operations are costly distractions.


Yet per my previous point, technology by itself is not the silver bullet – and there is no benefit to collecting lots of data just because you can. People who focus on Big Data technology “challenges” miss the point. Big Data needs robust analysis that is relevant to the business; technology is a critical enabler only after you have figured out the first part of the equation.


Successful companies begin by understanding the business imperative and tying rigorous analytics to support it before they get to technology. That’s why Big Data projects need to begin in the business boardroom with support from trained data scientists who are also industry experts and who can make the link between the business goals, potential data sources, and information technologies.


I believe that businesses will get continuously better at extracting value from big data. But a disciplined approach is critical. No one wants to be left behind as other companies work through the challenges, refine their approaches, and gain increasingly rich insights.





Days of Future Past

Posted by Irfan Khan Jun 12, 2014
"We're so excited about today, we can't even think about tomorrow."

‒ Larry Ellison, June 10, 2014

The year is 1974.

The world waits with bated breath for an announcement from the Kremlin. For weeks the print and broadcast media have been abuzz with speculation as to what the Soviets are planning ‒ something big, it seems. At last the fateful hour arrives. Flickering television screens around the globe display the image of General Secretary Leonid Brezhnev, his eyes shining, his voice confident and energetic. The English translation follows in a halting voice-over:

"And so, Comrades...it is with great joy that I announce the...triumphant next step...not just for the Soviet people, but for...all humanity. Soon, very soon...we will launch a spacecraft...which will safely deliver...the first man to the moon!"

Of course it never happened. That would be absurd: nobody loses a race and then announces five years later that they now intend to "win" it.

Unless...they happen to be a database vendor. As I pointed out not long ago, each of the major players in the enterprise database space has recently updated, or at least announced planned changes to, its long-established data management solution set. Via options and add-ons, they are all laying claim to some version of in-memory processing. Each of these late-comers to this space has couched its announcements in language about "innovation" and "technology leadership." In fact, the most foot-dragging of the bunch just put out a webcast with the title ‒ I’m not making this up ‒ "The Future of the Database Begins Soon."

(I would send Larry a note explaining to him that the Eagle has landed already, but I’m not sure that he would get it.)

This is very exciting. The future of the database will be here soon! Let’s consider for a moment what that future might look like:

  • In the future, perhaps the database will evolve into an enterprise-wide, fully in-memory data platform (not some optional functions tacked onto a conventional database), ushering in the era of true real-time computing.
  • In the future, maybe we’ll see a data platform that eliminates the latency, complexity, and massive redundancy that have long plagued enterprise computing environments, draining them of resources and robbing them of performance.
  • In the future, we might see an approach to enterprise data management that completely transcends what has ever been possible using a database, seamlessly filling gaps that have always been taken as a fact of life for managing enterprise data.

In such a future, the possibilities will be endless.

A utility company serving millions of customers will make substantial improvements to its ability to manage energy loads through real-time correlation of 5 billion smart meter records, a year of detailed weather history, geographical location information, and other customer data. A manufacturer of aircraft infrastructure will achieve a 30% increase in productivity through real-time analysis of more than two terabytes of value-chain and workflow data. A cancer research institute will leverage analysis on massive datasets including patient medical records, genome data, clinical trials, and other sources to develop a molecular profile of patients, enabling individualized treatment unlike anything that has come before.

A multinational consumer goods company will provide real-time access to supply chain data from more than 40,000 cost centers and will transform their customer records reconciliation process with the ability to provide instant reconciliation across some 4.6 billion customer records. A major telecom provider will create a single view of customer interactions across multiple channels, elegantly transforming both its core business processes and the customer’s overall experience. A major Asian hospital will improve care through real-time tracking of more than 300 clinical indicators for tens of thousands of patients.

And those scenarios are just the beginning.

I would like to offer my personal congratulations to the winners of the SAP HANA Innovation Awards -- those listed above as well as all the other winners and, moreover, all the organizations who entered the competition. For that matter, I would like to congratulate the more than 3,300 organizations worldwide who have adopted the SAP HANA platform...so far.

Not only are these organizations achieving unprecedented results ‒ in many cases transforming every aspect of how they operate in a fundamental way ‒ they are demonstrating how unevenly distributed the future truly is. These pioneers are leading the charge into a whole new era of enterprise computing, while those who should be innovators and thought-leaders in this space attempt to "predict" a future that has already arrived.  

In case you haven’t heard, It’s a HANA World now. Please spread the word if you can; some people have a bit of catching up to do.

This is Part 2 of my (re)introduction to SAP HANA. Part 1 reviewed how HANA came to be and laid out the principal differences between it and other offerings. This time, we home in on the one fundamental difference that makes all the difference.

In the wake of the successful roll-out of SAP HANA and its adoption by more than 3,300 customers (to date), there is a growing consensus that the real-time technology that SAP has pioneered represents nothing less than the future of enterprise computing. Gartner has declared in-memory computing to be a top strategic initiative and has predicted that 35% of large and mid-sized companies will have implemented in-memory solutions by the end of 2015. A growing body of analysis shows that full in-memory operations would enable the typical data center to store 40 times as much data in the same space, while cutting energy consumption by 80%...all while boosting performance by many thousands of times.


The question is no longer so much whether your business will adopt in-memory technology; the question is when you will do so. It is the inevitability of this paradigm shift that has driven Oracle, IBM, Microsoft, and Teradata to announce their own "in-memory" and "columnar" technologies. I ended last time by suggesting that it is perfectly legitimate to ask whether any of these offerings can compete with SAP HANA.


The simple answer is no.


To understand why they can't, let's consider an analogy from another industry. There is a growing belief – perhaps not yet a consensus – that in the future, cars will run exclusively on electricity. The conventional auto makers seem to have embraced that possibility by introducing hybrid vehicles. They will tell you that such vehicles represent an evolutionary step toward the future of automobiles. Maybe. But is it the right step?


Building hybrids is all about adding. You take a conventional car and add an entire extra engine so that it can run on electricity (sometimes). Next you add an enormous, expensive battery that accounts for a large percentage of the vehicle's total interior space, weight, and cost. Then you add an array of complex infrastructure to make the two engines and the battery work together. Finally, you add systems that monitor driving efficiency and that put much of the burden of saving fuel and reducing environmental impact on the driver rather than on the car.


And what does all that addition add up to?  Hybrids are often significantly more expensive than conventional automobiles. They can also be more expensive to maintain and repair, and they struggle to provide comparable performance and handling. While they do offer better fuel economy than gas-powered cars, the gap between the two is not always as great as you would expect.


Compare the hybrid approach with the approach taken by Tesla Motors in the development first of their Roadster and subsequently their Model S. When Tesla set out to build an electric car, they didn't take a conventional car and start bolting additional stuff onto it. They started from scratch. As a result, they produced a fully electric, zero-emissions vehicle that can blow the doors off conventional high-performance cars (let alone hybrids). A Tesla doesn't even have a transmission; it doesn't need one.


Tesla didn't add. They subtracted.


When the conventional database vendors set out to incorporate in-memory computing, they took a page from the conventional auto makers' playbook. They came up with complicated schemes for moving data in and out of memory. They bolted column-store technology on top of their traditional row-based databases. They added indexing, optimizations, and additional copies of the data throughout the organization. And all of this still sits atop the old disk-based architecture, which hasn't gone anywhere. (Remember the internal combustion engine in a hybrid car?) It's just gotten bigger, and even more demanding of resources and maintenance.


The result? These vendors report that their hybrid solutions perform 10 times faster than their traditional disk-based systems. That sounds pretty good, until you realize that real-time enterprise computing with SAP HANA enables performance thousands to tens of thousands of times faster than conventional solutions. With that in mind, you can't help but wonder whether the standard database providers aren't going to an awful lot of trouble for a pretty meager result.


When SAP built HANA, we started from scratch. Like Tesla, we didn't add. We subtracted:


  • SAP HANA removes all barriers between applications, transaction processing, and analytics: uniting all three in a single platform.
  • HANA removes unneeded copies of the data from the enterprise environment, achieving everything with just one copy.
  • HANA eliminates the need to move data between the database tier and the application tier; all processing occurs where the data is.
  • HANA removes all delays between transactions and analytics: no more batch jobs, no more ETL, etc.
  • HANA eliminates the need to deploy separate environments for specialized analytics processing, bringing geospatial, text processing, statistical analysis, and all others together in the same unified platform.
  • HANA removes layers of complexity between the people who need to use the data (business users) and the people who need to manage the data (data architects).


The result? Author William Gibson is quoted as saying, "The future is already here – it's just not evenly distributed." If you want to see the future of real-time computing in action, look to the organizations who have adopted SAP HANA. (I will be sharing stories from some of these companies in a subsequent post.) These forward-looking organizations are realizing the benefits of real-time computing right now. They are discovering a level of simplicity and performance, plus new efficiencies and capabilities, which were never before possible.


Sadly, those benefits will remain out of reach for any organization stuck with a technology that is best described as an evolutionary dead-end. The future is waiting. And it belongs to those who embrace the power of subtraction.

Part 1 reviewed how SAP HANA came to be and laid out the principal differences between HANA and other offerings. Part 2 homed in on HANA's fundamental differentiator. This time we explore the ways that SAP HANA transcends standard database capabilities, and how it truly delivers on the promise to simplify data management and reduce landscape complexity.

In my last piece, I described how database vendors are attempting to augment their technology stacks with in-memory capabilities. They are adding new components, which are delivered as separate options - resulting in more layers of complexity on top of their existing, already quite complex, environments. The unfortunate truth is that this kind of layering and cobbling-together of disparate pieces of technology is not an innovation.


IT landscapes have always suffered from functional gaps between the core foundations and the application components necessary to deliver a business solution. For example, most of the database vendors with roots in the traditional RDBMS space have felt compelled to promote and sell multiple database offerings. This became necessary in order to reconcile the gaps encountered within the organization: transaction processing, reporting, analytical processing, etc. Moreover, there are often gaps between enterprise databases and the functional logic of the applications that run on them.


In both instances, additional layers are required between the different components to make them work together. These gap-fillers are what create complex data management environments, complex landscapes, and ultimately rigid systems that can't change in time with the business. In the new paradigm, the data platform possesses capabilities that extend far beyond what databases have ever been able to do before. SAP HANA fills the gaps between critical components within the overall enterprise computing architecture, thus solving one of the most crucial of today's computing needs.

Moving data between databases that perform different functions is rarely a straightforward exercise. You may need to move data from the OLTP database to an OLAP database, or from an enterprise data warehouse to a specialty datamart. A data integration layer, usually a dedicated ETL (extract, transform, load) system, most often fills the gap between the two databases. These systems raise a host of issues:


  • They take time.
  • They consume resources on an on-going basis, both human and machine.
  • They don't always work.
  • When they don't work, correcting and reconciling systems can be a nightmare.
  • They move data in batches, creating a time lag between the data in the source and destination databases.
  • They require labor-intensive updates whenever new requirements for reporting or analysis are identified.


SAP HANA's data virtualization capability eliminates all of these downsides. In a HANA environment, there is no gap between OLTP and OLAP or between a data warehouse and a datamart. Each of these use cases leverages the same, live dataset. All of the data reformatting, cleansing, and moving from place to place, which ETL would typically manage, is done virtually (and instantaneously) in HANA. Where before there were two databases joined by a separate technology layer, now there is one data platform serving any number of different roles.
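The difference between a batch copy and a virtual view over live data can be sketched in a few lines. This is only an illustration, not HANA code: it uses Python's built-in `sqlite3` module as a stand-in database, and the table and view names (`sales`, `sales_report`, `sales_live`) are hypothetical. The batch-style reporting table goes stale as soon as a new transaction arrives, while the view always reflects the current data.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in (assumption: not HANA).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EU", 100.0), ("EU", 50.0), ("US", 70.0)],
)

# ETL style: aggregate once and copy the result into a separate reporting table.
conn.execute(
    "CREATE TABLE sales_report AS "
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)

# Virtual style: a view that is evaluated against the live table on every query.
conn.execute(
    "CREATE VIEW sales_live AS "
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)

# A new transaction arrives after the batch job has already run.
conn.execute("INSERT INTO sales VALUES ('EU', 25.0)")


def totals(source):
    """Return {region: total} from the given reporting table or view."""
    rows = conn.execute(
        f"SELECT region, total FROM {source} ORDER BY region"
    ).fetchall()
    return dict(rows)


batch_totals = totals("sales_report")  # still shows the pre-insert numbers
live_totals = totals("sales_live")     # includes the new transaction
print(batch_totals, live_totals)
```

In this sketch the batch table would need another load run to catch up, which is the time lag described in the list above; the view needs nothing.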

HANA not only eliminates the latency and complexity associated with traditional architectures, it provides an almost infinitely flexible environment for reporting and analysis. Whenever new reporting or analytical requirements emerge, they can be addressed immediately via the same virtualization capability, with no disruption to the normal reporting or analysis cycle, no new hardware to procure, no new ETL operations to maintain, and no new backups to pay for.

Another performance-draining gap is the one between the database and the application logic that it supports. Up till now, databases have not had the means to address this gap; even those environments that provide cache-based options can't do it. Typically a layer made up of predictive, business, and search functions fills this gap. As with the ETL layer, this extra data analytics layer (or layers) brings unwanted complexity to the environment while slowing everything down.


SAP HANA eliminates the need for an extraneous layer between the data and the application logic. All of the required business calculations, predictive algorithms, and natural language search functions can take place within HANA. HANA provides a rich set of native data analytic and search capabilities pushing processing down from the application layer to the HANA platform, thus enabling data-intensive operations to occur closest to the data.
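As a rough illustration of what pushing processing down to the data means, the sketch below computes the same total two ways. It again uses Python's `sqlite3` module as a stand-in rather than HANA, and the `claims` table and both function names are hypothetical. The first function pulls every row into the application and does the work there; the second lets the database filter and aggregate, so only one value crosses the boundary.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in (assumption: not HANA).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?)",
    [("a", 10.0), ("b", 20.0), ("a", 5.0), ("c", 40.0)],
)


def total_in_application(customer):
    """Fetch every row, then filter and sum in application code."""
    rows = conn.execute("SELECT customer, amount FROM claims").fetchall()
    return sum(amount for name, amount in rows if name == customer)


def total_pushed_down(customer):
    """Let the database filter and aggregate; only one value is returned."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0.0) FROM claims WHERE customer = ?",
        (customer,),
    ).fetchone()
    return row[0]


print(total_in_application("a"), total_pushed_down("a"))
```

Both functions return the same answer; the difference is how much data has to leave the database to get it, which grows with the size of the table in the first case and stays constant in the second.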


Additionally, HANA solves one of the biggest challenges faced by enterprise application developers: joining different data types. In the standard enterprise environment, transactional, multi-dimensional, text, graph, and streaming data have to be joined together via custom application logic because they are stored in different databases. This poses any number of difficulties for the application developer, who is usually not a database expert to begin with. Handling data joins in this manner can be slow and burdensome. HANA, on the other hand, natively handles joining multiple data types simultaneously. The developer environment for this is an elegant graphical editor designed for just this purpose. This capability ensures that within an application environment there will be proper separation of data logic and application logic. It also significantly simplifies coding for the application developer, while eliminating any performance issues that would arise from performing the join via application logic away from the data sources.

The stock answer to that question is that databases can do only so much, given their legacy architectures. The SAP HANA data management platform not only outperforms conventional databases on virtually every measure that you might care to apply, it also picks up where they leave off. HANA doesn't need help in the form of extraneous layers of functionality. It thus reduces complexity, increases developer efficiency, and lowers maintenance costs. HANA demonstrates how complete and self-sufficient a reimagined data management environment truly can be.

Over the next few weeks, I will be providing an introduction (or reintroduction) to SAP HANA. I’ll begin today by exploring some of the background on how SAP HANA came to be and how it is different from what other vendors are offering. In subsequent pieces, I will dig a little deeper into this core differentiation and provide some real-world examples of how businesses are transforming themselves with HANA. 


It is hardly news that major changes are afoot in the database world. When SAP introduced the HANA in-memory platform in 2011, it kicked off a firestorm of activity among conventional database vendors. Each of the major players has updated, or at least announced planned changes to, its long-established data management solution set. IBM DB2 now offers BLU Acceleration to provide an in-memory boost for analytics workloads. Microsoft SQL Server 2014 includes an in-memory option; Oracle has announced a similar option for 12c. Even Teradata has come to the table with Intelligent Memory, which allows users to keep the most active parts of their data warehouses in memory. Why the sudden interest in in-memory? It would appear that everyone is following SAP’s lead.


But appearances can be deceiving. These vendors have introduced in-memory capability to accelerate analytics performance in a conventional environment, one where multiple copies of the data are scattered throughout the organization. Unsurprisingly, they have long encouraged this model. (After all, selling all those database licenses is good for business.) By contrast, with HANA, SAP has introduced a new paradigm for real-time computing: a single copy of the data, without the need for aggregates or special indexing, that supports all transactional and analytical workloads and use cases. A HANA environment is never slowed down by the need to pre-arrange the data using traditional database techniques, the tricks and workarounds that the conventional database vendors have evolved over time in an attempt to keep performance in line with data growth and ad hoc access.


In developing these solutions and bringing them to the market, these vendors are attempting to answer an important question: “How can we reduce data latency?” This is a question that SAP took on some time ago. Through the years, we have achieved dramatic progress in reducing the impact of the I/O (input/output) bottleneck between application (or analysis) and disk with such innovations as the Advanced Planning Optimizer and SAP LiveCache. This work culminated in 2005 with the introduction of SAP Business Warehouse Accelerator, which leveraged dual-core chips and in-memory processing to provide exponentially faster performance for SAP NetWeaver Business Warehouse environments.


Having addressed nearly a decade ago the question with which the conventional database vendors are only now contending, we at SAP have turned our attention away from fixing yesterday’s problems. With the advent of multicore chips and more broadly and inexpensively available RAM, we began to ask what tomorrow’s data management environment might look like. As a result, we came up with a new, and much more daring, question:


“How do we enable the transition to real-time business?”


Built into the question that the database vendors are asking is the assumption that the core paradigm will remain the same. Some data will be in memory; some data will be on disk. To improve performance, you change the balance of power between those two fundamental facts of existence. And any time you need to do anything new with the data, you just make another copy of the database. But our question makes no such assumptions, which has enabled some radically new thinking about what kind of changes can be made. 


Let’s simplify things. Vastly. Let’s move the entire database into RAM, cutting the disk and all those disk access issues and delays out of the picture. And let’s run the entire enterprise (all applications, all transactions, all analysis) on that single, blindingly fast, always up-to-date database.


Such a solution can provide businesses real-time analytical and transactional processing — something that has been long needed, but that has not been possible at the scale required to support the full enterprise. This model eliminates the need to distinguish between transactional and analytical data, as well as the need to populate the organization with multiple copies of the same data. All of the work cycles and resources dedicated to maintaining those multiple copies and keeping them in sync with each other? Gone. Moreover, this model does away with the need for a separate application server. A single live instance of the database replaces all that former complexity.


SAP HANA is about more than just a new kind of database, although a series of fundamental shifts in database technology are involved: the columnar data architecture means that the data is pre-optimized for any query without indexes or aggregates; on-the-fly materialization means that there is no need to spend time and resources creating, updating, and replacing a smaller, “digestible” subset of the data;  multi-temperature data management means that non-active data can be seamlessly identified and moved to the appropriate storage. Nor is the new model just about the performance, although HANA delivers exponentially greater performance than any technology that has come before. Ultimately, it is about the introduction of a whole new way to do enterprise computing.
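
For readers who have not worked with column stores, the following is a rough sketch of the general idea behind a columnar layout. The order records and helper names are made up for illustration, and the code says nothing about how HANA actually stores data internally; it only shows why an analytical query that needs one attribute can read far less data when each column is kept together.

```python
# Hypothetical order records, for illustration only.
rows = [
    {"order_id": 1, "market": "UK", "amount": 100.0},
    {"order_id": 2, "market": "DE", "amount": 70.0},
    {"order_id": 3, "market": "UK", "amount": 50.0},
    {"order_id": 4, "market": "FR", "amount": 30.0},
]

def to_columns(records):
    # Turn a list of records (row layout) into one list per attribute
    # (column layout). Position i in every list belongs to record i.
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

def sum_row_store(records, field):
    # Row layout: every whole record is visited to read a single field.
    return sum(record[field] for record in records)

def sum_column_store(cols, field):
    # Column layout: only the one column the query needs is scanned.
    return sum(cols[field])

columns = to_columns(rows)
total_from_rows = sum_row_store(rows, "amount")
total_from_columns = sum_column_store(columns, "amount")
```

The two totals are identical. What changes is the amount of data touched: the column version never looks at `order_id` or `market`, which is the property that lets a column store answer ad hoc aggregate queries without a pre-built summary table.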


Along the same lines, it is not accurate to say that the other players have followed SAP’s example with their own “in-memory” solutions. It is absolutely correct to speculate as to whether new entrants in this space can compete with SAP (rather than the other way around). But in so speculating, we have to remember that each of their solutions is firmly grounded in the old paradigm. They are not asking how to move business to real time; they are still trying to answer yesterday’s questions. Or to be fair, perhaps they have come up with a new question to ask:


  “How can we (claim to) offer the same thing SAP has?”

William Osler, an influential 19th century Canadian physician and co-founder of Johns Hopkins Hospital, once observed that "the good physician treats the disease; the great physician treats the patient who has the disease." While this standard is upheld as the goal of almost all caregivers, it is notoriously difficult to achieve in an era of large patient loads, demands for quick turnaround, and best practices derived from datasets. Technology, which has drastically improved overall patient outcomes, has paradoxically made it more difficult for physicians to focus on the individual.


Or at least, that was the case. But now Big Data solutions are making it possible for doctors to tailor care to the individual patient’s needs in ways never previously envisioned.


Dr. Hee Hwang, CIO of Korea's SNU Bundang Hospital, reports  that the adoption of a next-generation medical data warehouse built on the SAP HANA platform has transformed how doctors are able to interact with and treat patients. Bundang Hospital has long been a technology leader, going fully paperless more than a decade ago and implementing their first medical data warehouse at about the same time.  Consolidating data from a wide range of sources including treatment records and clinical research, the data warehouse proved a tremendous resource for physicians. But it was not without problems.


One such problem was speed of retrieval. Dr. Hwang explains that vital queries could take an hour or more to complete, severely limiting a doctor’s ability to explore relevant data for the treatment options best suited for the individual patient. Moreover, the existing system struggled with unstructured text and image data, which often contains the most critical information for making diagnostic and treatment decisions.


The new data warehouse, implemented in July of last year, has changed all that. The most complex of queries can now be completed in a matter of seconds. Perhaps more importantly, the availability of real-time data has fundamentally altered treatment strategies in several key areas. For example, real-time patient data has made it possible for doctors to administer pre-surgical antibiotics in a much more customized and individualized way. Within three months of adopting the new data warehouse, Bundang Hospital achieved an astounding 79% reduction in the administration of unneeded antibiotics. This reduction not only cuts costs and improves the treatment experience (patients no longer take drugs they don’t need), it also produces a long-term benefit for patients by preventing the development of drug-resistant agents of infection, which would likely cause future complications.


In a similar vein, the difference that Big Data can make in helping individual patients combat a virus can be witnessed at St. Paul’s Hospital in British Columbia, where the Centre for Excellence in HIV/AIDS is implementing a revolutionary diagnostic and treatment planning system. In the case of HIV/AIDS, treating the patient rather than the disease begins with the recognition that patients are never infected simply with HIV, but with a genetically distinct strain of the virus.


Sorting through the vast amounts of genetic data to isolate both the particular strain of the virus and the best treatment plan for the patient is a task custom-made for Big Data technologies.   Dr. Julio Montaner, who heads up the Centre for Excellence, reports that the new approach will reduce the sequencing time for patient blood samples – which currently can take up to 10 days – by a factor of 100. A trial of the system planned to begin a few months from now is expected to set a new standard for individualized care in the treatment of HIV/AIDS with markedly improved patient outcomes and a sharp reduction in the number of new AIDS cases.


Another area of medical treatment where physicians face the challenge of unique signatures, and the staggeringly large datasets that they define, is in the treatment of cancer. In addition to the individual profiles of each strain of cancer, there is a vast amount of clinical data to be weighed. Doctors looking to analyze this data face the challenge of disparate sources and clumsy access – often in the form of spreadsheets.


To address these challenges, SAP has announced the first deployment of Medical Insights, which leverages a healthcare data model and semantic capabilities to extract patient data from a wide variety of sources: clinical information systems, tumor registries, biobank systems and text documents such as physicians’ notes. Built on the SAP HANA platform, Medical Insights performs real-time advanced analytics on the extracted data, providing doctors the most up-to-date and reliable information possible on which to base diagnosis and a course of treatment.


I noted above that technology has previously played a role in standardizing medical practice, making it less individualized or, as Osler might have said, more focused on the disease than the patient. But in these examples we see that Big Data technologies are helping to reverse this trend. Deep within the largest and most widely dispersed medical datasets lie the specific answers needed for the treatment of specific, individual patients. Increasingly, thanks to these technologies, we have the tools to find those answers.

Why is the future so difficult to predict? It is easy enough to jot down a few paragraphs on a given future topic, say the future of the retail industry and the impact that big data will have on it, but it is very difficult to have any assurance that those projections will map to anything that actually happens. Part of the problem is that we tend to see the future as an exaggerated version of the present rather than a world in which fundamental changes have occurred.


There is an old story in futurist circles, probably apocryphal, about a city planner in New York who in the late 19th century published a dire warning about the city’s future. The prediction was an extrapolation from then-current trends. According to his calculations, the rapid growth that the city was experiencing would prove to be its undoing in a matter of decades. By 1950, he predicted, New York City would be completely unlivable. With more people and more businesses would come more horses (naturally) and with more horses would come the waste that they produce. By 1950 the city would be literally buried under a mountain of horse manure.

Apocryphal or not, that mountain of horse leavings is one of the most compelling (and apt) images ever to be associated with predictions of the future. Look back at what was predicted for retail (or for any industry, for that matter) four or five decades ago and you will see very little that matches what is occurring today because disruptive innovation is so difficult to predict. Even when technology factored into predictions, it was difficult for forecasters to get a handle on how transformative the exponential growth in data, accompanied by the exponential growth in analytical capability, would prove to be.

Why is this? Some believe our expectations are too vivid, too imaginative, and that reality -- being ever mundane and predictable -- can’t hope to deliver. But I would make the case that the problem is neither vividness nor imagination. After all, our worried city planner could never have hoped to imagine the real challenges that New York faced in the middle part of the 20th century. The reality was much more vivid, much more complex, and ultimately much more subtle than his unsettling prediction.

But that vivid, complex, and ultimately subtle reality is, in fact, reality. And reality is a thing we tend to take for granted. Even when parts of it are novel or surprising or completely amazing, it’s difficult to keep that perspective when those things become a part of everyday life.

So let’s return to our subject, the future of retail. Let’s look at it from the consumer’s side: the shopping experience. What is -- or maybe it is better to ask what was -- the future of shopping supposed to be like? New Yorkers in 1950 might have predicted much bigger stores with more choices of merchandise. And they would have been right, as far as that goes. But they probably wouldn’t foresee a future wherein very few transactions are made on a cash basis, and of course the whole idea of online shopping would have been incomprehensible to them.

Imagine going back in time and trying to explain something like the Home Shopping Europe (HSE) network to some of those mid-20th-century shoppers. We’re used to the idea of TV-based home shopping, so it doesn’t seem extraordinary to us that a television network would broadcast retail offers all across Europe (24 hours a day, seven days a week) and generate half a billion in sales in the process. The parts of the scenario that would seem dramatic to them -- television programming serving as a storefront, viewers placing orders over the phone -- would seem completely commonplace to us.

And we would know, although hardly be impressed by the fact, that the real driver of HSE’s success is neither television nor telephone technology, but rather a vast infrastructure of IT systems, network technology, and data. When HSE talks about giving customers a common experience across a wide range of access options and perfecting the art of upselling and cross-selling their customers with additional offers suited to their tastes and budgets, we are once again dealing in the complex and the subtle. The infrastructure that supports such capability is obviously highly complex, and yet it is hidden from the everyday experience of the user.

Likewise, when Home Depot sets out to provide a system that will track any item in inventory (out of tens of thousands of categories) anywhere in 2,200 different locations, or when Swiss retailer Tally Weijl manages five deliveries per week to some 800 stores serving the highly dynamic and demanding market of women aged 16-25 -- and manages all of this while dramatically growing revenues and cutting costs -- we know that the reality of “the future of shopping” is far more dramatic than any scenario involving sleek skyscrapers or flying cars. Yet this extraordinary capability is delivered by technology that we all too quickly accustom ourselves to, and then scarcely think about.

And of course, the future is just beginning. Same-day fulfillment of online orders is here (in some markets, anyway). What comes next? We know to expect drone helicopters shipping to our homes, but it gets even stranger than that. In a scenario oddly reminiscent of the visionary film Minority Report -- wherein the police “Precrime” division had officers tasked with preventing crimes before they occur -- we now have the promise of “pre-shopping,” with Amazon promising to send you your order before you place it.

As with the remarkable results retailers are already achieving, these images of things to come will be made real, if at all, by the subtle power of Big Data and the Cloud. Operating quietly behind the scenes, powerful technology can bring us capabilities we can scarcely imagine, and it will reward those who dare to.

A year ago I took on the subject of how deeply embedded Big Data is becoming in our everyday lives in a piece with the modest title, Without Big Data Life As We Know It Would Be Impossible. I outlined a simple scenario in which we book an international flight, check the weather conditions in the intended destination city, secure accommodations, and make sure we have working wireless service upon arrival. Interestingly, but perhaps not surprisingly, Big Data plays a major role in completing each of these fairly routine tasks. 

Looking ahead, we can expect this trend to continue. In the future, we will find Big Data even more deeply embedded in even more routine activities. Not only will we be accessing Big Data more frequently, we will be significantly growing our own contribution to it.


There is an interesting analogy between the impact that we, as individuals, have on the overall data environment and the impact that we have on the real environment. When assessing how an individual’s behavior might impact climate change, we talk in terms of the carbon footprint: the total volume of carbon emissions that result from our personal lifestyle choices. Likewise, we each have a data footprint: the total volume of data generated by and on behalf of us.

The two footprints are similar in that both have experienced enormous growth in the recent past. But there is an important difference between them. Whereas there is growing societal awareness of the carbon footprint and a dedicated effort to get its growth under control, there is little awareness of the data footprint, and there is certainly no widespread effort to curb its growth.

And that may be a very good thing. Oddly enough, the exponential growth of our data footprint may be able to contribute to the reduction of our carbon footprint.

How is that possible? Consider air travel, one of the primary targets of efforts to reduce the individual carbon footprint. If you want to mitigate the damage caused by your air travel, the most obvious solution is to cut down on the number of trips you take. But there are other ideas, too. Recognizing the environmental impact of air travel, aviation manufacturers are taking steps to reduce its overall carbon footprint.

For example, General Electric (GE) has recently announced substantial changes to the design of the CFM Leap aircraft engine, which powers the Airbus A320neo, Boeing 737 Max and COMAC C919 aircraft. The new generation Leap is “designed to provide significant reductions in fuel burn, noise, and NOx emissions compared to the current… engine.” It is designed to generate 32,000 pounds of thrust, achieve a 99.87% reliability rate, and deliver $3 million in annual operating savings.


Where will these savings come from? New sensors intricately track how the engine is operating.  The use of data fundamentally transforms how the engine operates and makes it more efficient. But that efficiency requires a lot of data. The new version of the Leap aircraft engine generates 1 TB per day from those sensors alone. Add in avionics, traffic data, weather data… a massive amount of information is generated just from taking a flight. In previous versions, the Leap engine has completed more than 18 million commercial hours of operation, with some 22,000 of the engines manufactured. So we’re talking about a lot of data.

In every way but one, this engine now operates with a smaller footprint: it requires less fuel, it makes less noise, it generates fewer noxious emissions, it costs less to operate. Only in one area, data, is its footprint expanding.

And this is why we can each expect our own data footprint to grow enormously in the years to come. We aren’t creating all this data just for the sake of creating data. Big Data is bringing us big benefits, and this is occurring throughout the business world and in our personal lives.


Consider the improvements that big data is bringing about in a wide variety of business settings.  Retailers are benefiting from rapid sales analysis and response as well as a customer-driven approach to supply chain. Discrete manufacturers are enabling real-time operations and global product traceability both upstream and downstream.  Makers of consumer products are tracking customer buying behavior in real time, and radically changing their procurement and manufacturing processes. And, like the aviation manufacturers mentioned earlier, oil and gas companies are improving asset efficiency and integrity.


Closer to home, we are all accessing, modifying, and generating data all the time. Taking a drive across town, we access vast geospatial databases and generate new data in our vehicles’ (more modest) versions of the sensors in the Leap aircraft engine. Filling a prescription, we tap into an interconnected set of pharmaceutical and insurance databases. Our casual tweets and status updates feed into the global social media colossus. A trip to the supermarket might involve a quick smartphone search for product information, accessing enormous search engine and consumer databases and giving us each the opportunity to generate even more data. After all, it is our shopping behavior that provides the data points feeding into the retailer and consumer goods analyses mentioned above.

At every turn, we are accessing data, modifying and updating data, and generating new data. Between ultrasounds, social media mentions, YouTube videos, and online gift transactions, a newborn in 2014 has a substantially bigger data footprint than an adult did a couple of decades ago. As that child’s world becomes ever more dependent on and integrated with Big Data, that footprint is going to grow and grow.

How far will this go? That is difficult to predict. But as long as Big Data continues to be tied to providing new options, improving existing processes, and opening up new capabilities -- and as long as our infrastructures can continue to grow to support this data -- there is no end in sight.

Blog article also posted on SAPHANA.com website.

What were the top Big Data news stories of 2013? The usual pundits have been busy over the past week or so weighing in on this question, providing lists that include software releases, corporate acquisitions, new strategic directions, new offerings in the cloud, Hadoop without MapReduce -- all the things we would expect. But are these really the top stories? Perhaps by looking through the industry lens, we’re missing the real news.


Look at the top mainstream news stories of 2013 and you will see that Big Data is becoming a part of our social fabric. Worldwide, the top news stories included the devastating typhoon in the Philippines, the civil war in Syria, the birth of a prince, the election of a Pope, and the passing of a man who led his nation into a new era of justice and freedom. Top stories in the U.S. would include the roll-out of the Affordable Care Act, the Boston Marathon bombings, and the NSA scandal. Although none of these are Big Data stories, or even necessarily technology stories, each has a very real connection to the world of Big Data. 

Increasingly, intensive analysis of social media and other communications content is driving our understanding of the major news stories of the day. Thanks to social and mass media channels, these stories become massively shared experiences.  Collecting and digging into such content is a large-scale data mobilization effort. Such analysis provided context in understanding the impact of events as diverse as the birth of Prince George and the death of former South African president Nelson Mandela.


In the intelligence community, this kind of analytical effort is referred to as “chatter analysis,” and it is a critical component of modern intelligence work. As the tragic events in Syria unfolded, chatter analysis proved indispensable as it became clear that the conflict was driven in large part by an underlying information war. While each side in the conflict used traditional channels to promote its interpretation of what was taking place, hundreds of thousands of Syrians used their mobile devices to report their individual experiences and to help shape a more complete understanding of events as they occurred. That sharing of information came at a high price to some, whose digital footprints made them trackable by the opposing side. Ongoing analysis of social media and other communications content continues to drive our understanding as events unfold both in Syria and the surrounding region.


In the US, intelligence-gathering of another sort took center stage as details from the 1.7 million National Security Agency (NSA) files leaked by former contractor Edward Snowden began to be made public. The NSA scandal at least bears a strong resemblance to a Big Data story, raising as it does major questions about privacy and data ownership. The NSA scandal is driving critically important debate within the public and the courts that will, in all likelihood, lead to a whole new approach to regulating and enforcing data privacy. As the idea that everyone leaves a digital footprint gains broader understanding, questions about who (if anyone) should have access to the various pieces of data that make up that footprint are becoming more urgent. Everyone in the business of collecting and analyzing data stands to be impacted by the answers that emerge to those questions.


Meanwhile, there can be little doubt that Big Data plays a rapidly growing role for both the intelligence and the law enforcement communities. In the investigation and manhunt that followed the Boston Marathon bombing, law enforcement officials were in many ways as reliant on computer technology as they were on traditional methods. However, a subsequent review of the investigation showed that enhanced Big Data capabilities might have produced faster results. So demand for increased government access to and use of digital footprints is occurring in one context, while demand that such access be severely curtailed is occurring in another. Quick and easy resolutions to these conflicting sets of priorities seem unlikely.

But not all uses of Big Data raise such seemingly intractable conflicts. In the case of Typhoon Haiyan, Big Data was pivotal in enabling relief efforts throughout the Philippines. An interactive map which synthesized geo-spatial, demographic, and social media data guided relief workers to the areas with the most urgent need for help, providing the most expeditious routes to these troubled zones. And a GPS-enabled asset-tracking system helped to ensure that resources were deployed to where they were needed the most. In a completely different vein, and demonstrating the diversity of applications for Big Data capabilities, odds-makers used advanced analytics in an attempt to predict the outcome of the Papal election in March.


Finally, in the U.S., the implementation of the Patient Protection and Affordable Care Act (ACA), commonly referred to as "Obamacare," proved to be less of a Big Data story than expected.


Obviously, launching a new national healthcare system for a country with a population of more than 300 million has Big Data implications. The system had to accommodate the tens of millions who currently don’t have coverage, but would also impact the hundreds of millions who do. After all, many of those individuals would be expected to end up on the healthcare exchanges themselves one day and, in any case, it would be necessary to ensure that their existing coverage was compliant with all the new regulations. Such a system would require a whole new infrastructure for managing healthcare data. Each participant’s full history of medical conditions, treatments, and providers would have to be consolidated into one easily portable data profile, a profile made up of information that would now be accessible to more sets of eyes than ever before. The Big Data implications were enormous.


The expectation was that healthcare was about to become "the new financial services." In the financial services space, thousands of entities work together to provide an infrastructure that enables individualized credit ratings and simplified local, national, and global funds transfer. Now, in the healthcare space, thousands of entities would work together to provide an infrastructure to enable widespread analysis of treatments and outcomes and easy transfer of complete medical history from provider to provider.

But when Healthcare.gov was launched, a very different story emerged. That story centered on the basic operation and security of the site, and the tremendous difficulty encountered when attempting to get applicants through the registration process. Technical observers clustered around a consensus that the basic infrastructure of the system was 10 years (or more) out of date -- a Web 1.0 solution for a Web 2.0 world. Clearly those issues would have to be addressed before Big Data could even enter the picture.


Not that long ago, the statement that all commerce is e-commerce would have been laughable. Today it is simply a straightforward description of how things have evolved. The notion that all companies are software companies would have seemed even more absurd. But now that preposterous idea is offered up as a commonplace. As technology becomes more deeply embedded in the fabric of society, we are approaching the day when it might well be said that all news is technology news, or even that all news is Big Data news.

In 2013, that was still not quite the case, although Big Data is increasingly becoming ingrained into the fabric of our societies, as evidenced by its role in many of the major stories of the year and in how we learned about and came to understand virtually all of them. We can expect these trends to continue in 2014 and beyond.

This blog has also been posted on saphana.com

At the end of October we announced the general availability of five new Big Data-enabled applications. You can read more about them in our Big Data press release. Essentially, each application handles a different aspect of an organization’s relationship with its customers.


Four of the applications are part of the Customer Engagement Intelligence family of solutions, each managing different aspects of the customer relationship through the sales and marketing process. The fifth application, Fraud Management, deals with customer relationships that are, shall we say, more acrimonious! Nonetheless, combine these five applications with Demand Signal Management and Sentiment Intelligence and you have a whole suite of solutions that leverage Big Data to transform the customer relationship.


Of course, Big Data can do more than improve customer relationships. SAP recently announced a collaboration with the International Barcode of Life (iBOL) initiative to help understand our relationship with our planet.


The International Barcode of Life initiative is a consortium of institutions across 25 different nations, collecting and analyzing organisms from all regions of the world. Their mission is to create a library of DNA sequences and habitat information for every multi-cellular species on the planet, facilitating their rapid identification. This is an ambitious project when you consider there are an estimated 10 to 100 million different species. But the benefits to society are huge. Information about species is used in everything from managing water and food quality and protecting our ecosystems and agricultural land to human disease control, monitoring transmission of viruses, and more.


But it is an urgent project at the same time. Research estimates that one third of all species will be extinct by 2100. If you assume there are 100 million species in the world and that they will go extinct at a uniform rate over the next 86 years, it means a staggering 1,000 species are lost to us every single day. Yes, you read that correctly: 1,000 species going extinct, daily!
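The arithmetic behind that figure can be checked with a short sketch. It uses only the numbers stated above (100 million species, one third lost, 86 years); the average year length of 365.25 days and the function name are assumptions added for illustration.

```python
# Back-of-the-envelope check of the "1,000 species a day" estimate.
# Inputs come from the paragraph above; DAYS_PER_YEAR is an assumption.
TOTAL_SPECIES = 100_000_000   # upper end of the 10-100 million estimate
FRACTION_LOST = 1 / 3         # one third of all species extinct by 2100
YEARS = 86                    # years remaining until 2100
DAYS_PER_YEAR = 365.25        # assumed average, including leap years


def extinctions_per_day(total_species=TOTAL_SPECIES,
                        fraction_lost=FRACTION_LOST,
                        years=YEARS):
    """Average number of species lost per day at a uniform rate."""
    species_lost = total_species * fraction_lost
    days = years * DAYS_PER_YEAR
    return species_lost / days


if __name__ == "__main__":
    print(round(extinctions_per_day()))                          # upper estimate
    print(round(extinctions_per_day(total_species=10_000_000)))  # lower estimate
```

With 100 million species the result is a little over 1,000 per day, in line with the rounded figure above; with the lower estimate of 10 million species it drops to roughly a tenth of that.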


SAP and the International Barcode of Life project are collaborating to help build an application to crowd source the collection and analysis of all of this information. The goal is an application that anyone can use on a mobile device to collect a species sample, send it off for testing, and get the results back ... and to do this from anywhere on the globe.


[Screenshots of the Barcode of Life application: splash screen; main menu; DNA collection process; list of samples collected by user; detailed information about a species identified by a specific DNA sample; images and DNA barcode for a species identified by a specific DNA sample; home page for a specific group crowd sourcing the collection of samples in a region; map of DNA samples collected by a group]



The screenshots above give you an idea of just how easy we hope to make this: an application that anyone can use, even in regions where desktop computers are unavailable but mobile devices are often ubiquitous.


This application, designed to crowd-source the collection of a massive data set, provides a rich source of information about our world – one that we want to analyze. The second part of this co-innovation project involves mining all of this data using SAP HANA and SAP's analytic tools. Through this we will gain insights into the genetic variation of species, their evolution, and their migrations resulting from climate change.


Whether it’s with our customers, the human species, or any other kind of species, there is no doubt that Big Data gives us the opportunity to understand all of our relationships.

What companies want are real, tangible results from any Big Data project. Yes, I understand I am stating the obvious. Yet judging from the many Big Data solutions entering the market today, it’s not clear that everyone does. So many vendors seem to equate Big Data with Data Warehousing 2.0 + Data Mining. But that falls short.


For any Big Data project to deliver real results it needs to:


  1. Align to business priorities,
  2. Integrate with business processes, and
  3. Operate at the pace of business


Any approach that doesn't meet these three requirements is just "Big" or just "Data" – neither of which is particularly interesting on its own.


Business Priorities

Linking the business imperative to all your data, across all stakeholders with implicit contextual value, is the only real way customers will be able to monetize the Big Data investments. If customers are observing business misalignment, they should consider "re-booting" the project immediately or suffer the consequences of sub-optimal results.

Unfortunately, many companies start by collecting data (and lots of it). Budgets get depleted, and then, in an attempt to justify the costs, they try to link the data more directly to a business problem. It’s like pushing water uphill. Instead, go with the flow and make the business priority the starting point.


The best way to get started in the right direction is to engage data scientists who know your industry, and can make the link between your business, your data, and your IT.


Business Processes

The organizations that achieve the greatest results tie Big Data insights directly into their business processes and their people, allowing them to act upon insights in day-to-day operations. Understanding Business Processes should be the next conversation, right after you understand the Business Priorities.


You want to be thinking about how to "Big Data enable" your business processes and enterprise applications, and how to equip your frontline workers with the insights they need. Understanding this will be critical to building robust Big Data architecture that delivers results.


Look for a comprehensive data platform and integrated analytics suite that address all of your stakeholders and Big Data application use cases, and that can seamlessly interoperate with existing and new business processes. Any Big Data vendor who only talks about data warehousing or data mining is falling far short of where you need to be.


Business Pace

In the context of business priorities and business processes, it becomes clear that the real IT challenge is to acquire, analyze and act on Big Data insights at the pace of business.


Conventional wisdom holds that traditional databases can store petabytes of data, and indeed they can. The problem is that when you rely on these traditional environments to handle the expanding footprint of unstructured data, the conventional RDBMS cannot keep up; in fact, it grinds to a halt! Switching on a traditional database’s in-memory feature may get the gears temporarily moving again, but it’s a duct tape solution that won’t last when data grows to the next level of complexity.


To keep pace with business, Big Data architectures need to deliver instant results from infinite storage. That requires a truly in-memory platform like SAP HANA connected to massively scalable data storage like Hadoop.


Sorry, but Big Data focused on analytics and data warehousing alone isn't enough. It’s nothing personal. It’s just business, with its priorities, processes, and pace!

With little fanfare, last week SAP made an announcement that it will resell the Intel Distribution of the open source Apache Hadoop project as well as HortonWorks Data Platform for Hadoop. While hardly page one news for major media, it is big news for enterprises seeking to exploit their big data opportunities. Let me explain.


First, big data, particularly unstructured data that comprise 80% of the information inside most organizations, presents new IT challenges, most of which overwhelm the vast majority of the traditional, row-based data warehouses installed today. Attempting to store and analyze big data effectively in these established DWs is, for the most part, a lost cause.


Hadoop, which is an efficient distributed file system and not a database, is designed specifically for information that comes in many forms, such as server log files or personal productivity documents. Anything that can be stored as a file can be placed in a Hadoop repository. 


Hadoop is also a young technology, not even a decade old. That’s why a mere 10% of respondents in a recent TDWI survey of enterprises say they currently have the file system running in their organizations. Most companies aren’t sure what other technologies Hadoop needs to be an effective tool in their data center.


Which brings me to my second reason why SAP's reseller agreements with Intel and HortonWorks are, well, a big deal for big data. Big Data has swamped most large enterprises, presenting CIOs with two pressing problems. The first is how to store dramatically increasing volumes of information of unknown value. Well, that can be solved with Hadoop, a proven, low-cost method for storing petabytes of data. The second is how to analyze that data once it is stored: moving the data from Hadoop into a traditional data warehouse can take weeks of processing before it’s ready for analysis. And even then, because the amount of data is so vast, analysts often need to remove much of the detail so as not to bring the old DW to a grinding halt. But because of the integration work we’ve done with Hadoop and the SAP HANA platform, all the data that analysts need can move seamlessly between the two systems when they want it, not after weeks of processing. By combining the all in-memory, columnar SAP HANA database with Hadoop in the data center, CIOs are able to deliver infinite storage with instant insights.


Finally, there’s one more subtle benefit of our news announcement for the Hadoop community and IT. To succeed in the marketplace, emerging enterprise systems like Hadoop need established vendors to embrace them, otherwise most CIOs will not deploy them in their data centers. With SAP fully committed to Hadoop through these reselling agreements, CIOs understand that they can embark on cutting edge big data initiatives with state of the art technologies that are fully supported by a single, trusted vendor, SAP. With these partnerships, we minimize a CIO’s risk. We eliminate the problem of whom to contact for Hadoop support. It’s SAP.


In addition to lowering the risk of Hadoop for enterprises, we deliver choice. Being open source, Hadoop has many iterations to choose from; one might say, too many. So, by delivering full support to the Intel and HortonWorks distributions, we have done the hard work for you to determine the best enterprise-class versions of Hadoop for your data center. But you can still choose based on your organization’s needs.


At the same time, SAP has integrated Hadoop with the SAP HANA platform, SAP Sybase IQ software, SAP Data Services, and SAP Business Objects, making it possible to conduct sophisticated OLTP and OLAP operations with both structured and unstructured data. 


Sometimes it’s the little things that make life easier for IT managers. That’s why this modest news announcement is such a big thing. By SAP reselling the Intel and HortonWorks distributions of Hadoop we provide a single point of contact for a complete technology stack that delivers the best performance so every enterprise can use all of big data to improve their business.

Irfan Khan

Serious Serendipity

Posted by Irfan Khan Jul 16, 2013

Whether it’s the discovery of penicillin or the invention of the chocolate chip cookie, some notable innovations in history occur serendipitously. But in both these cases, at least, it was serious serendipity. That is, Sir Alexander Fleming was researching ways to control Staphylococcus bacteria when he stumbled upon the penicillin mold that killed it; and Mrs. Ruth Graves Wakefield was baking a batch of cookies when she created her tasty treat.


I mention this because as more and more enterprises explore the value hidden in Big Data, innovation of all types will flourish. Most of this innovation will come from the intentional analysis of the data. SAP customers do this every day by querying and exploring massive data sets in products like the SAP HANA platform or SAP Sybase IQ database and uncovering sought-after insights. But sometimes a key value in Big Data is discovered inadvertently.


Take, for example, Google’s discovery of its spell checker. As related in Big Data: A Revolution That Will Transform How We Live, Work, and Think, a new book by Viktor Mayer-Schonberger and Kenneth Cukier, engineers at the company discovered that they would not need to license spell checking software from a third party for Google Docs and other applications because the company had a trove of spelling data from all the typos hundreds of millions of people had keyed into its search engine. Not only did the company have virtually every conceivable misspelling of a word, it had them in dozens of languages, available for nothing inside its vast data pool.
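To make the idea concrete, here is a minimal sketch of how a log of search queries could double as spelling data. It is not Google's method, which the book does not describe at this level of detail; the function name, the frequency ratio, and the similarity cutoff are all assumptions chosen for illustration. The sketch treats a rare word as a likely typo when a much more frequent, closely matching word appears in the same log.

```python
from collections import Counter
from difflib import get_close_matches


def build_corrections(queries, min_ratio=10, cutoff=0.8):
    """Map rare words to a far more frequent, similar word in the same log.

    queries   -- iterable of raw query strings (hypothetical log data)
    min_ratio -- assumed threshold: a candidate correction must be at least
                 this many times more frequent than the suspected typo
    cutoff    -- assumed similarity threshold passed to difflib
    """
    counts = Counter(word for q in queries for word in q.lower().split())
    corrections = {}
    for word, n in counts.items():
        # Only words that are much more common than this one can correct it.
        candidates = [w for w, c in counts.items() if c >= n * min_ratio]
        match = get_close_matches(word, candidates, n=1, cutoff=cutoff)
        if match:
            corrections[word] = match[0]
    return corrections


if __name__ == "__main__":
    # Tiny made-up log: two common words and two rare misspellings.
    log = ["weather"] * 50 + ["wether"] * 2 + ["recipe"] * 40 + ["recipie"]
    print(build_corrections(log))
```

The design choice worth noting is that no dictionary is supplied: the "correct" spellings emerge purely from how often people type each variant, which is the serendipitous value the story describes.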


One of my favorite stories of serendipitous discoveries of value in data comes from one of our customers. Deloitte, which is the world’s largest consultancy with 200,000 employees, needed to comply with tax regulations related to expenses for its globe-trotting consultants. Aligning thousands of expense reports with the appropriate tax geographies was exceedingly complex and included huge data sets.


Before Deloitte deployed SAP HANA, the consultancy was able to align only about five percent of the expense reports with the various tax geographies. With SAP HANA, it’s now 100 percent. But, better still, Deloitte quickly realized that other global firms also needed to comply with the same tax authorities that it did, so the firm now plans to offer the expense and tax audit as a new service to its clients. So, serious internal analysis of its data has led serendipitously to an external revenue source.


So much of what companies do with their vast and growing data sets will be laser focused, targeting intentional outcomes for anticipated benefits. But there will be many more stories like those from Deloitte and Google, where serious value is serendipitous, delivering benefits beyond those planned. Those companies will have something extra to celebrate from their analytics work. We can hope that there will be chocolate chip cookies at the celebration.

