
Irfan Khan's Blog


William Osler, an influential 19th century Canadian physician and co-founder of Johns Hopkins Hospital, once observed that "the good physician treats the disease; the great physician treats the patient who has the disease." While this standard is upheld as the goal of almost all caregivers, it is notoriously difficult to achieve in an era of large patient loads, demands for quick turnaround, and best practices derived from datasets. Technology, which has drastically improved overall patient outcomes, has paradoxically made it more difficult for physicians to focus on the individual.


Or at least, that was the case. But now Big Data solutions are making it possible for doctors to tailor care to the individual patient’s needs in ways never previously envisioned.


Dr. Hee Hwang, CIO of Korea's SNU Bundang Hospital, reports that the adoption of a next-generation medical data warehouse built on the SAP HANA platform has transformed how doctors are able to interact with and treat patients. Bundang Hospital has long been a technology leader, going fully paperless more than a decade ago and implementing its first medical data warehouse at about the same time. Consolidating data from a wide range of sources, including treatment records and clinical research, the data warehouse proved a tremendous resource for physicians. But it was not without problems.


One such problem was speed of retrieval. Dr. Hwang explains that vital queries could take an hour or more to complete, severely limiting a doctor’s ability to explore relevant data for the treatment options best suited for the individual patient. Moreover, the existing system struggled with unstructured text and image data, which often contains the most critical information for making diagnostic and treatment decisions.


The new data warehouse, implemented in July of last year, has changed all that. The most complex of queries can now be completed in a matter of seconds. Perhaps more importantly, the availability of real-time data has fundamentally altered treatment strategies in several key areas. For example, real-time patient data has made it possible for doctors to administer pre-surgical antibiotics in a much more customized and individualized way. Within three months of adopting the new data warehouse, Bundang Hospital achieved an astounding 79% reduction in the administration of unneeded antibiotics. This reduction not only cuts costs and improves the treatment experience (patients no longer have to take drugs they don’t need), but also produces a long-term benefit for patients by preventing the development of drug-resistant agents of infection, which would likely cause future complications.


In a similar vein, the difference that Big Data can make in helping individual patients combat a virus can be witnessed at St. Paul’s Hospital in British Columbia, where the Centre for Excellence in HIV/AIDS is implementing a revolutionary diagnostic and treatment planning system. In the case of HIV/AIDS, treating the patient rather than the disease begins with the recognition that patients are never infected simply with HIV, but with a genetically distinct strain of the virus.


Sorting through the vast amounts of genetic data to isolate both the particular strain of the virus and the best treatment plan for the patient is a task custom-made for Big Data technologies.   Dr. Julio Montaner, who heads up the Centre for Excellence, reports that the new approach will reduce the sequencing time for patient blood samples – which currently can take up to 10 days – by a factor of 100. A trial of the system planned to begin a few months from now is expected to set a new standard for individualized care in the treatment of HIV/AIDS with markedly improved patient outcomes and a sharp reduction in the number of new AIDS cases.
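To put that speed-up in concrete terms, here is a minimal back-of-the-envelope sketch. It uses only the two figures quoted above (up to 10 days, a factor of 100); the variable names are illustrative and nothing here reflects how the Centre's system actually works.

```python
# Illustrative arithmetic only: both inputs are the figures quoted in the post.
current_days = 10   # sequencing can currently take up to 10 days
speedup = 100       # the new approach is expected to be 100 times faster

new_hours = current_days * 24 / speedup
print(f"Worst-case sequencing time after the speed-up: {new_hours:.1f} hours")
```

In other words, a wait of up to a week and a half would shrink to a few hours, short enough to fit within a single working day.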


Another area of medical treatment where physicians face the challenge of unique signatures, and the staggeringly large datasets that they define, is in the treatment of cancer. In addition to the individual profiles of each strain of cancer, there is a vast amount of clinical data to be weighed. Doctors looking to analyze this data face the challenge of disparate sources and clumsy access – often in the form of spreadsheets.


To address these challenges, SAP has announced the first deployment of Medical Insights, which leverages a healthcare data model and semantic capabilities to extract patient data from a wide variety of sources: clinical information systems, tumor registries, biobank systems, and text documents such as physicians’ notes. Built on the SAP HANA platform, Medical Insights performs real-time advanced analytics on the extracted data, providing doctors the most up-to-date and reliable information possible on which to base a diagnosis and a course of treatment.


I noted above that technology has previously played a role in standardizing medical practice, making it less individualized or, as Osler might have said, more focused on the disease than the patient. But in these examples we see that Big Data technologies are helping to reverse this trend. Deep within the largest and most widely dispersed medical datasets lie the specific answers needed for the treatment of specific, individual patients. Increasingly, thanks to these technologies, we have the tools to find those answers.

Why is the future so difficult to predict? It is easy enough to jot down a few paragraphs on a given future topic, say the future of the retail industry and the impact that big data will have on it, but it is very difficult to have any assurance that those projections will map to anything that actually happens. Part of the problem is that we tend to see the future as an exaggerated version of the present rather than a world in which fundamental changes have occurred.


There is an old story in futurist circles, probably apocryphal, about a city planner in New York who in the late 19th century published a dire warning about the city’s future. The prediction was an extrapolation from then-current trends. According to his calculations, the rapid growth that the city was experiencing would prove to be its undoing in a matter of decades. By 1950, he predicted, New York City would be completely unlivable. With more people and more businesses would come more horses (naturally) and with more horses would come the waste that they produce. By 1950 the city would be literally buried under a mountain of horse manure.

Apocryphal or not, that mountain of horse leavings is one of the most compelling (and apt) images ever to be associated with predictions of the future. Look back at what was predicted for retail (or for any industry, for that matter) four or five decades ago and you will see very little that matches what is occurring today because disruptive innovation is so difficult to predict. Even when technology factored into predictions, it was difficult for forecasters to get a handle on how transformative the exponential growth in data, accompanied by the exponential growth in analytical capability, would prove to be.

Why is this? Some believe our expectations are too vivid, too imaginative, and that reality -- being ever mundane and predictable -- can’t hope to deliver. But I would make the case that the problem is neither vividness nor imagination. After all, our worried city planner could never have hoped to imagine the real challenges that New York faced in the middle part of the 20th century. The reality was much more vivid, much more complex, and ultimately much more subtle than his unsettling prediction.

But that vivid, complex, and ultimately subtle reality is, in fact, reality. And reality is a thing we tend to take for granted. Even when parts of it are novel or surprising or completely amazing, it’s difficult to keep that perspective when those things become a part of everyday life.

So let’s return to our subject, the future of retail. Let’s look at it from the consumer’s side: the shopping experience. What is -- or maybe it is better to ask what was -- the future of shopping supposed to be like? New Yorkers in 1950 might have predicted much bigger stores with more choices of merchandise. And they would have been right, as far as that goes. But they probably wouldn’t foresee a future wherein very few transactions are made on a cash basis, and of course the whole idea of online shopping would have been incomprehensible to them.

Imagine going back in time and trying to explain something like the Home Shopping Europe (HSE) network to some of those mid-20th-century shoppers. We’re used to the idea of TV-based home shopping, so it doesn’t seem extraordinary to us that a television network would broadcast retail offers all across Europe (24 hours a day, seven days a week) and generate half a billion in sales in the process. The parts of the scenario that would seem dramatic to them -- television programming serving as a storefront, viewers placing orders over the phone -- would seem completely commonplace to us.

And we would know, although hardly be impressed by the fact, that the real driver of HSE’s success is neither television nor telephone technology, but rather a vast infrastructure of IT systems, network technology, and data. When HSE talks about giving customers a common experience across a wide range of access options and perfecting the art of upselling and cross-selling their customers with additional offers suited to their tastes and budgets, we are once again dealing in the complex and the subtle. The infrastructure that supports such capability is obviously highly complex, and yet it is hidden from the everyday experience of the user.

Likewise, when Home Depot sets out to provide a system that will track any item in inventory (out of tens of thousands of categories) anywhere in 2,200 different locations, or when Swiss retailer Tally Weijl manages five deliveries per week to some 800 stores serving the highly dynamic and demanding market of women aged 16 to 25 -- and manages all of this while dramatically growing revenues and cutting costs -- we know that the reality of “the future of shopping” is far more dramatic than any scenario involving sleek skyscrapers or flying cars. Yet this extraordinary capability is delivered by technology that we all too quickly accustom ourselves to, and then scarcely think about.

And of course, the future is just beginning. Same-day fulfillment of online orders is here (in some markets, anyway). What comes next? We know to expect drone helicopters delivering to our homes, but it gets even stranger than that. In a scenario oddly reminiscent of the visionary film Minority Report -- wherein the police “Precrime” division had officers tasked with preventing crimes before they occur -- we now have the promise of “pre-shopping,” with Amazon promising to send you your order before you place it.

As with the remarkable results retailers are already achieving, these images of things to come will be made real, if at all, by the subtle power of Big Data and the Cloud. Operating quietly behind the scenes, this powerful technology can bring us capabilities we can scarcely imagine, and it will reward those who dare to imagine them.

A year ago I took on the subject of how deeply embedded Big Data is becoming in our everyday lives in a piece with the modest title, Without Big Data Life As We Know It Would Be Impossible. I outlined a simple scenario in which we book an international flight, check the weather conditions in the intended destination city, secure accommodations, and make sure we have working wireless service upon arrival. Interestingly, but perhaps not surprisingly, Big Data plays a major role in completing each of these fairly routine tasks. 

Looking ahead, we can expect this trend to continue. In the future, we will find Big Data even more deeply embedded in even more routine activities. Not only will we be accessing Big Data more frequently, we will be significantly growing our own contribution to it.


There is an interesting analogy between the impact that we, as individuals, have on the overall data environment and the impact that we have on the real environment. When assessing how an individual’s behavior might impact climate change, we talk in terms of the carbon footprint: the total volume of carbon emissions that result from our personal lifestyle choices. Likewise, we each have a data footprint: the total volume of data generated by and on behalf of us.

The two footprints are similar in that both have experienced enormous growth in the recent past. But there is an important difference between them. Whereas there is growing societal awareness of the carbon footprint and a dedicated effort to get its growth under control, there is little awareness of the data footprint, and there is certainly no widespread effort to curb its growth.

And that may be a very good thing. Oddly enough, the exponential growth of our data footprint may be able to contribute to the reduction of our carbon footprint.

How is that possible? Consider air travel, one of the primary targets of efforts to reduce the individual carbon footprint. If you want to mitigate the damage caused by your air travel, the most obvious solution is to cut down on the number of trips you take. But there are other ideas, too. Recognizing the environmental impact of air travel, aviation manufacturers are taking steps to reduce its overall carbon footprint.

For example, General Electric (GE) has recently announced substantial changes to the design of the CFM Leap aircraft engine, which powers the Airbus A320neo, Boeing 737 Max and COMAC C919 aircraft. The new generation Leap is “designed to provide significant reductions in fuel burn, noise, and NOx emissions compared to the current… engine.” It is designed to generate 32K pounds of thrust, achieve a 99.87% reliability rate, and deliver operating savings of $3 million annually.


Where will these savings come from? New sensors intricately track how the engine is operating.  The use of data fundamentally transforms how the engine operates and makes it more efficient. But that efficiency requires a lot of data. The new version of the Leap aircraft engine generates 1 TB per day from those sensors alone. Add in avionics, traffic data, weather data… a massive amount of information is generated just from taking a flight. In previous versions, the Leap engine has completed more than 18 million commercial hours of operation, with some 22,000 of the engines manufactured. So we’re talking about a lot of data.

In every way but one, this engine now operates with a smaller footprint: it requires less fuel, it makes less noise, it generates fewer noxious emissions, it costs less to operate. Only in one area, data, is its footprint expanding.

And this is why we can each expect our own data footprint to grow enormously in the years to come. We aren’t creating all this data just for the sake of creating data. Big Data is bringing us big benefits, and this is occurring throughout the business world and in our personal lives.


Consider the improvements that big data is bringing about in a wide variety of business settings.  Retailers are benefiting from rapid sales analysis and response as well as a customer-driven approach to supply chain. Discrete manufacturers are enabling real-time operations and global product traceability both upstream and downstream.  Makers of consumer products are tracking customer buying behavior in real time, and radically changing their procurement and manufacturing processes. And, like the aviation manufacturers mentioned earlier, oil and gas companies are improving asset efficiency and integrity.


Closer to home, we are all accessing, modifying, and generating data all the time. Taking a drive across town, we access vast geospatial databases and generate new data in our vehicles’ (more modest) versions of the sensors in the Leap aircraft engine. Filling a prescription, we tap into an interconnected set of pharmaceutical and insurance databases. Our casual tweets and status updates feed into the global social media colossus. A trip to the supermarket might involve a quick smartphone search for product information, accessing enormous search engine and consumer databases and giving us each the opportunity to generate even more data. After all, it is our shopping behavior that provides the data points feeding into the retailer and consumer goods analyses mentioned above.

At every turn, we are accessing data, modifying and updating data, and generating new data. Between ultrasounds, social media mentions, YouTube videos, and online gift transactions, a newborn in 2014 has a substantially bigger data footprint than an adult did a couple of decades ago. As that child’s world becomes ever more dependent on and integrated with Big Data, that footprint is going to grow and grow.

How far will this go? That is difficult to predict. But as long as Big Data continues to be tied to providing new options, improving existing processes, and opening up new capabilities -- and as long as our infrastructures can continue to grow to support this data -- there is no end in sight.

Blog article also posted on SAPHANA.com website.

What were the top Big Data news stories of 2013? The usual pundits have been busy over the past week or so weighing in on this question, providing lists that include software releases, corporate acquisitions, new strategic directions, new offerings in the cloud, Hadoop without MapReduce -- all the things we would expect. But are these really the top stories? Perhaps by looking through the industry lens, we’re missing the real news.


Look at the top mainstream news stories of 2013 and you will see that Big Data is becoming a part of our social fabric. Worldwide, the top news stories included the devastating typhoon in the Philippines, the civil war in Syria, the birth of a prince, the election of a Pope, and the passing of a man who led his nation into a new era of justice and freedom. Top stories in the U.S. would include the roll-out of the Affordable Care Act, the Boston Marathon bombings, and the NSA scandal. Although none of these are Big Data stories, or even necessarily technology stories, each has a very real connection to the world of Big Data. 

Increasingly, intensive analysis of social media and other communications content is driving our understanding of the major news stories of the day. Thanks to social and mass media channels, these stories become massively shared experiences.  Collecting and digging into such content is a large-scale data mobilization effort. Such analysis provided context in understanding the impact of events as diverse as the birth of Prince George and the death of former South African president Nelson Mandela.


In the intelligence community, this kind of analytical effort is referred to as “chatter analysis,” and it is a critical component of modern intelligence work. As the tragic events in Syria unfolded, chatter analysis proved indispensable as it became clear that the conflict was driven in large part by an underlying information war. While each side in the conflict used traditional channels to promote its interpretation of what was taking place, hundreds of thousands of Syrians used their mobile devices to report their individual experiences and to help shape a more complete understanding of events as they occurred. That sharing of information came at a high price to some, whose digital footprints made them trackable by the opposing side. Ongoing analysis of social media and other communications content continues to drive our understanding as events unfold both in Syria and the surrounding region.


In the US, intelligence-gathering of another sort took center stage as details from the 1.7 million National Security Agency (NSA) files leaked by former contractor Edward Snowden began to be made public. The NSA scandal at least bears a strong resemblance to a Big Data story, raising as it does major questions about privacy and data ownership. The NSA scandal is driving critically important debate within the public and the courts that will, in all likelihood, lead to a whole new approach to regulating and enforcing data privacy. As the idea that everyone leaves a digital footprint gains broader understanding, questions about who (if anyone) should have access to the various pieces of data that make up that footprint are becoming more urgent. Everyone in the business of collecting and analyzing data stands to be impacted by the answers that emerge to those questions.


Meanwhile, there can be little doubt that Big Data plays a rapidly growing role for both the intelligence and the law enforcement communities. In the investigation and manhunt that followed the Boston Marathon bombing, law enforcement officials were in many ways as reliant on computer technology as they were on traditional methods. However, a subsequent review of the investigation showed that enhanced Big Data capabilities might have produced faster results. So demand for increased government access to and use of digital footprints is occurring in one context, while demand that such access be severely curtailed is occurring in another. Quick and easy resolutions to these conflicting sets of priorities seem unlikely.

But not all uses of Big Data raise such seemingly intractable conflicts. In the case of Typhoon Haiyan, Big Data was pivotal in enabling relief efforts throughout the Philippines. An interactive map that synthesized geospatial, demographic, and social media data guided relief workers to the areas with the most urgent need for help, providing the most expeditious routes to these troubled zones. And a GPS-enabled asset-tracking system helped to ensure that resources were deployed to where they were needed the most. In a completely different vein, and demonstrating the diversity of applications for Big Data capabilities, odds-makers used advanced analytics in an attempt to predict the outcome of the Papal election in March.


Finally, in the U.S., the implementation of the Patient Protection and Affordable Care Act (ACA), commonly referred to as "Obamacare," proved to be less of a Big Data story than expected.


Obviously, launching a new national healthcare system for a country with a population of more than 300 million has Big Data implications. The system had to accommodate the tens of millions who currently don’t have coverage, but would also impact the hundreds of millions who do. After all, many of those individuals would be expected to end up on the healthcare exchanges themselves one day and, in any case, it would be necessary to ensure that their existing coverage was compliant with all the new regulations. Such a system would require a whole new infrastructure for managing healthcare data. Each participant’s full history of medical conditions, treatments, and providers would have to be consolidated into one easily portable data profile, a profile made up of information that would now be accessible to more sets of eyes than ever before. The Big Data implications were enormous.


The expectation was that healthcare was about to become "the new financial services." In the financial services space, thousands of entities work together to provide an infrastructure that enables individualized credit ratings and simplified local, national, and global funds transfer. Now, in the healthcare space, thousands of entities would work together to provide an infrastructure to enable widespread analysis of treatments and outcomes and easy transfer of complete medical history from provider to provider.

But when Healthcare.gov was launched, a very different story emerged. That story centered on the basic operation and security of the site, and the tremendous difficulty encountered when attempting to get applicants through the registration process. Technical observers clustered around a consensus that the basic infrastructure of the system was 10 years (or more) out of date -- a Web 1.0 solution for a Web 2.0 world. Clearly those issues would have to be addressed before Big Data could even enter the picture.


Not that long ago, the statement that all commerce is e-commerce would have been laughable. Today it is simply a straightforward description of how things have evolved. The notion that all companies are software companies would have seemed even more absurd. But now that preposterous idea is offered up as a commonplace. As technology becomes more deeply embedded in the fabric of society, we are approaching the day when it might well be said that all news is technology news, or even that all news is Big Data news.

In 2013, that was still not quite the case, although Big Data is increasingly becoming ingrained into the fabric of our societies, as evidenced by its role in many of the major stories of the year and in how we learned about and came to understand virtually all of them. We can expect these trends to continue in 2014 and beyond.

This blog has also been posted on saphana.com

At the end of October we announced the general availability of five new Big Data-enabled applications. You can read more about them in our Big Data press release. Essentially, each application handles a different aspect of an organization’s relationship with its customers.


Four of the applications are part of the Customer Engagement Intelligence family of solutions, each managing different aspects of the customer relationship through the sales and marketing process. The fifth application, Fraud Management, deals with customer relationships that are, shall we say, more acrimonious! Nonetheless, combine these five applications with Demand Signal Management and Sentiment Intelligence and you have a whole suite of solutions that leverage Big Data to transform the customer relationship.


Of course, Big Data can do more than improve customer relationships. SAP recently announced a collaboration with the International Barcode of Life (iBOL) initiative to help understand our relationship with our planet.


The International Barcode of Life initiative is a consortium of institutions across 25 different nations, collecting and analyzing organisms from all regions of the world. Their mission is to create a library of DNA sequences and habitat information for every multi-cellular species on the planet, facilitating their rapid identification. This is an ambitious project when you consider there are an estimated 10 to 100 million different species. But the benefits to society are huge. Information about species is used in everything from managing water and food quality and protecting our ecosystems and agricultural land to controlling human disease, monitoring the transmission of viruses, and more.


But it is an urgent project at the same time. Research estimates that one third of all species will be extinct by 2100. If you assume there are 100 million species in the world and that a third of them will go extinct at a uniform rate over the next 86 years, it means a staggering 1,000 or so species are lost to us every single day. Yes, you read that correctly: roughly 1,000 species going extinct, daily!
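The arithmetic behind that daily figure is easy to check. The sketch below is only a rough illustration under the assumptions stated in the text (one third of species lost, a uniform rate, 86 years); the function name and the use of 365.25 days per year are my own choices, and the 10 million case is included because it is the low end of the species estimate quoted earlier.

```python
def species_lost_per_day(total_species, fraction_lost=1/3, years=86, days_per_year=365.25):
    """Average daily extinctions, assuming a uniform rate over the whole period."""
    return total_species * fraction_lost / (years * days_per_year)

# Low and high ends of the estimated number of species on the planet.
low = species_lost_per_day(10_000_000)
high = species_lost_per_day(100_000_000)

print(f"10 million species:  about {low:,.0f} lost per day")
print(f"100 million species: about {high:,.0f} lost per day")
```

Even at the low end of the species estimate, the same assumptions still imply roughly a hundred extinctions a day.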


SAP and the International Barcode of Life project are collaborating to help build an application to crowdsource the collection and analysis of all of this information. The goal is an application that anyone can use on a mobile device to collect a species sample, send it off for testing, and get the results back ... and to do this from anywhere on the globe.


[Screenshot: Splash screen]
[Screenshot: Main menu]
[Screenshot: DNA collection process]
[Screenshot: List of samples collected by user]
[Screenshot: Detailed information about a species identified by a specific DNA sample]
[Screenshot: Images and DNA barcode for a species identified by a specific DNA sample]
[Screenshot: Home page for a specific group crowdsourcing the collection of samples in a region]
[Screenshot: Map of DNA samples collected by a group]



The screenshots above give you an idea of just how easy we are hoping to make this, so that the application can be used by anyone, even those in regions where desktop computers are unavailable but where mobile devices are often ubiquitous.


This application, designed to crowdsource the collection of a massive data set, provides a rich source of information about our world, one that we want to analyze. The second part of this co-innovation project involves mining all of this data using SAP HANA and SAP's analytic tools. Through this we will gain insights into the genetic variation of species, their evolution, and their migrations resulting from climate change.


Whether it’s with our customers, the human species, or any other kind of species, there is no doubt that Big Data gives us the opportunity to understand all of our relationships.

What companies want are real, tangible results from any Big Data project. Yes, I understand I am stating the obvious. Yet judging from the many Big Data solutions entering the market today, it’s not clear that everyone does. So many vendors seem to equate Big Data with Data Warehousing 2.0 + Data Mining. But that falls short.


For any Big Data project to deliver real results it needs to:


  1. Align to business priorities,
  2. Integrate with business processes, and
  3. Operate at the pace of business


Any approach that doesn't meet these three requirements is just "Big" or just "Data" – neither of which is particularly interesting on its own.


Business Priorities

Linking the business imperative to all your data, across all stakeholders with implicit contextual value, is the only real way customers will be able to monetize the Big Data investments. If customers are observing business misalignment, they should consider "re-booting" the project immediately or suffer the consequences of sub-optimal results.

Unfortunately, many companies start by collecting data (and lots of it). Budgets get depleted, and then, in an attempt to justify the costs, they try to link the data more directly to a business problem. It’s like pushing water uphill. Instead, go with the flow and make the business priority the starting point.


The best way to get started in the right direction is to engage data scientists who know your industry, and can make the link between your business, your data, and your IT.


Business Processes

The organizations that achieve the greatest results tie Big Data insights directly into their business processes and their people, allowing them to act upon insights in day-to-day operations. Understanding Business Processes should be the next conversation, right after you understand the Business Priorities.


You want to be thinking about how to "Big Data enable" your business processes and enterprise applications, and how to equip your frontline workers with the insights they need. Understanding this will be critical to building a robust Big Data architecture that delivers results.


Look for a comprehensive data platform and integrated analytics suite that addresses all of your stakeholders and Big Data application use cases, and that can seamlessly interoperate with existing and new business processes. Any Big Data vendor who only talks about data warehousing or data mining is falling far short of where you need to be.


Business Pace

In the context of business priorities and business processes, it becomes clear that the real IT challenge is to acquire, analyze and act on Big Data insights at the pace of business.


Conventional wisdom holds that traditional databases can store petabytes of data, and indeed they can. The problem is that when you start dealing with the expanding footprint of unstructured data in these traditional environments, the conventional RDBMS cannot keep up; in fact, it grinds to a halt! Switching on a traditional database’s in-memory feature may get the gears temporarily moving again, but it’s a duct-tape solution that won’t last when data grows to the next level of complexity.


To keep pace with business, Big Data architectures need to deliver instant results from infinite storage. That requires a truly in-memory platform like SAP HANA connected to massively scalable data storage like Hadoop.


Sorry, but Big Data focused on analytics and data warehousing alone isn't enough. It’s nothing personal. It’s just business; and its priorities, processes, and pace!

With little fanfare, last week SAP made an announcement that it will resell the Intel Distribution of the open source Apache Hadoop project as well as HortonWorks Data Platform for Hadoop. While hardly page one news for major media, it is big news for enterprises seeking to exploit their big data opportunities. Let me explain.


First, big data, particularly unstructured data that comprise 80% of the information inside most organizations, presents new IT challenges, most of which overwhelm the vast majority of the traditional, row-based data warehouses installed today. Attempting to store and analyze big data effectively in these established DWs is, for the most part, a lost cause.


Hadoop, which is an efficient distributed file system and not a database, is designed specifically for information that comes in many forms, such as server log files or personal productivity documents. Anything that can be stored as a file can be placed in a Hadoop repository. 


Hadoop is also a young technology, not even a decade old. That’s why a mere 10% of respondents in a recent TDWI survey of enterprises say they currently have the file system running in their organizations. Most companies aren’t sure what other technologies Hadoop needs to be an effective tool in their data center.


Which brings me to my second reason why SAP's reseller agreements with Intel and HortonWorks are, well, a big deal for big data. Big Data has swamped most large enterprises, presenting CIOs with two pressing problems. The first is how to store dramatically increasing volumes of information of unknown value. Well, that can be solved with Hadoop, a proven, low-cost method for storing petabytes of data. However, once stored, trying to move the data from Hadoop into a traditional data warehouse can take weeks to process before it’s ready for analysis. And even then, because the amount of data is so vast, analysts often need to remove much of the detail so as not to bring the old DW to a grinding halt. But because of the integration work we’ve done with Hadoop and the SAP HANA platform, all the data that analysts need can move seamlessly between the two systems when they want it, not after weeks of processing. By combining the all in-memory, columnar SAP HANA database with Hadoop in the data center, CIOs are able to deliver infinite storage with instant insights.


Finally, there’s one more subtle benefit of our news announcement for the Hadoop community and IT. To succeed in the marketplace, emerging enterprise systems like Hadoop need established vendors to embrace them, otherwise most CIOs will not deploy them in their data centers. With SAP fully committed to Hadoop through these reselling agreements, CIOs understand that they can embark on cutting edge big data initiatives with state of the art technologies that are fully supported by a single, trusted vendor, SAP. With these partnerships, we minimize a CIO’s risk. We eliminate the problem of whom to contact for Hadoop support. It’s SAP.


In addition to the low risk we offer enterprises adopting Hadoop, we deliver choice. Being open source, Hadoop has many iterations to choose from; one might say, too many. So, by delivering full support for the Intel and HortonWorks distributions, we have done the hard work of determining the best enterprise-class versions of Hadoop for your data center. But you can still choose based on your organization’s needs.


At the same time, SAP has integrated Hadoop with the SAP HANA platform, SAP Sybase IQ software, SAP Data Services, and SAP Business Objects, making it possible to conduct sophisticated OLTP and OLAP operations with both structured and unstructured data. 


Sometimes it’s the little things that make life easier for IT managers. That’s why this modest news announcement is such a big thing. By SAP reselling the Intel and HortonWorks distributions of Hadoop we provide a single point of contact for a complete technology stack that delivers the best performance so every enterprise can use all of big data to improve their business.

Irfan Khan

Serious Serendipity

Posted by Irfan Khan Jul 16, 2013

Whether it’s the discovery of penicillin or the invention of the chocolate chip cookie, some notable innovations in history occur serendipitously. But in both these cases, at least, it was serious serendipity. That is, Sir Alexander Fleming was researching ways to control Staphylococcus bacteria when he stumbled upon the penicillin mold that killed it; and Mrs. Ruth Graves Wakefield was baking a batch of cookies when she created her tasty treat.


I mention this because as more and more enterprises explore the value hidden in Big Data, innovation of all types will flourish. Most of this innovation will come from the intentional analysis of the data. SAP customers do this every day by querying and exploring massive data sets in products like the SAP HANA platform or SAP Sybase IQ database and uncovering sought-after insights. But sometimes a key value in Big Data is discovered inadvertently.


Take, for example, Google’s discovery of its spell checker. As related in Big Data: A Revolution That Will Transform How We Live, Work, and Think, a new book by Viktor Mayer-Schonberger and Kenneth Cukier, engineers at the company discovered that they would not need to license spell checking software from a third party for Google Docs and other applications because the company had a trove of spelling data from all the typos hundreds of millions of people had keyed into its search engine. Not only did the company have virtually every conceivable misspelling of a word, it had them in dozens of languages, available for nothing inside its vast data pool.


One of my favorite stories of serendipitous discoveries of value in data comes from one of our customers. Deloitte, which is the world’s largest consultancy with 200,000 employees, needed to comply with tax regulations related to expenses for its globe-trotting consultants. Aligning thousands of expense reports with the appropriate tax geographies was exceedingly complex and included huge data sets.


Before Deloitte deployed SAP HANA, the consultancy was able to align only about five percent of the expense reports with the various tax geographies. With SAP HANA, it’s now 100 percent. But, better still, Deloitte quickly realized that other global firms also needed to comply with the same tax authorities that it did, so the firm now plans to offer the expense and tax audit as a new service to its clients. So, serious internal analysis of its data has led serendipitously to an external revenue source.


So much of what companies do with their vast and growing data sets will be laser focused, targeting intentional outcomes for anticipated benefits. But there will be many more stories like those from Deloitte and Google, where serious value is serendipitous, delivering benefits beyond those planned. Those companies will have something extra to celebrate from their analytics work. We can hope that there will be chocolate chip cookies at the celebration.

Irfan Khan

Beat the Clock

Posted by Irfan Khan Jun 14, 2013

What would you do to get an extra 30 milliseconds in your business day? Well, if you work in the capital markets, plenty.


After all, this is the industry that inspired not one, but two enterprises to lay cable across the Arctic Ocean last year in order to shave around 20 milliseconds between trading centers in London and Tokyo. And, for years now, trading firms have paid co-location fees to exchanges to be as close as possible to computers executing trades in order to cut distance and, therefore, time for each trade. No doubt that explains why, in a recent survey SAP conducted of top IT infrastructure issues in the capital markets, low latency led the list, trumping other critical problems like access to global markets, risk controls, and even co-location.


It’s not rocket science to understand why traders are so obsessed with saving a millisecond every chance they get. According to the London Financial Times, in 2012 the Quantitative and Derivative Strategies group inside Morgan Stanley estimated that 84% of all buying and selling on U.S. markets was done through computer-to-computer trades, up from around half in 2010. Thus, any advantage a trading firm has is not in the waving of hands by a savvy trader on an exchange floor; rather, it’s the ability to shave milliseconds during machine-to-machine communications.


But it is, in a way, rocket science to know how to save those precious clicks of the clock. You need to have an information infrastructure that is capable of processing and analyzing enormous amounts of data in real time, taking advantage of all the efficiencies and optimizations that are possible from, if you will, the co-location of a server’s memory and processor. Effective trading algorithms need to consider a wide range of information sources, from news analytics and volatility conditions to multi-asset risk modeling and sentiment analytics, then act on the results; not in some report that prompts a trader to consider buying or selling, but by the system in real time when that momentary advantage is there.


Let’s take a fictitious example of a high-frequency trade on the NASDAQ exchange, where computer-to-computer trading is the norm. With a 30 millisecond advantage, a trading firm can buy stocks likely to be bought after the opening bell by, say, a mutual fund or any other entity without the time advantage. In this example, the trading firm’s algorithm identifies an equity and acquires 5,000 shares of it at $21 per share within 50 milliseconds after the NASDAQ opens. It then immediately sells those shares to the mutual fund for $21.01, a penny-per-share profit, or $50 in this example.


For those outside the capital markets, that might not sound like much of a profit for such technology investment. However, as those in the industry understand, with nearly 8 million trades per day on NASDAQ alone, the opportunities for profits are compelling. With 3,600,000 milliseconds every hour, theoretically a trader with a 30 millisecond edge has 120,000 opportunities every hour of the trading day to make a penny or two per share. To say that it all adds up to real money is an understatement.
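The arithmetic in this fictitious example can be checked with a short sketch. The names below are illustrative only, prices are held in integer cents to avoid floating-point rounding, and the figures are exactly those quoted above.

```python
# Figures from the fictitious example above; all names are illustrative assumptions.
SHARES = 5_000
BUY_PRICE_CENTS = 2_100    # $21.00 per share
SELL_PRICE_CENTS = 2_101   # $21.01 per share
EDGE_MS = 30               # the assumed time advantage, in milliseconds
MS_PER_HOUR = 60 * 60 * 1_000


def profit_per_trade_cents(shares: int, buy_cents: int, sell_cents: int) -> int:
    """Gross profit in cents from buying and immediately reselling the shares."""
    return shares * (sell_cents - buy_cents)


def windows_per_hour(edge_ms: int) -> int:
    """Number of non-overlapping windows of edge_ms milliseconds in one hour."""
    return MS_PER_HOUR // edge_ms


profit = profit_per_trade_cents(SHARES, BUY_PRICE_CENTS, SELL_PRICE_CENTS)
print(f"Profit per trade: ${profit / 100:.2f}")
print(f"{EDGE_MS} ms windows per hour: {windows_per_hour(EDGE_MS):,}")
```

As in the text, the window count is a theoretical ceiling; it says nothing about how many of those windows actually contain a profitable trade.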


SAP HANA delivers that 30 millisecond advantage to traders in high-frequency markets. It also reinstates the functional elegance that many firms strived so hard to achieve and have seemingly lost due to the technology debt that most trading platforms are plagued with today. It seems logical to me, then, that if traders are willing to pay a premium for high-speed trans-Arctic networks or spend millions annually for co-location facilities to save precious fractions of a second for each trade, the benefits of SAP HANA are obvious when little bits of time equal lots of money, particularly since SAP HANA can be implemented in a non-disruptive manner.


* * *


Please join me Tuesday 18 June in New York at the Securities Industry and Financial Markets Association (SIFMA) annual conference, SIFMA Tech 2013, for my keynote, where I will discuss the next generation of intelligent trading applications using in-memory computing.

Over the past couple of years, SAP has been galvanizing its strategy to execute across all deployment fronts: on-premises, on-device, and, of course, “on-demand.” This latter effort has resulted in targeted acquisitions for serving cloud computing customers in the form of SuccessFactors and more recently Ariba Network. This week we further strengthened our position with the announcement of our SAP HANA Enterprise Cloud service. This technology radically raises the bar for performance and data capacities for business applications in the cloud. For the first time customers will have the power of the SAP HANA platform to run a wide range of SAP applications, such as ERP, CRM and business warehouse software, off premises. (Of course, you can still run them in your own data center.) Conventional wisdom dictates that critical applications such as these will never run faster in the cloud, especially those dependent upon Big Data workloads. But that so-called wisdom will change with SAP HANA Enterprise Cloud service’s performance. It will turn heads inside IT and among users.


That’s because the SAP HANA Enterprise Cloud foundations are built upon a next generation SAP HANA database, fit for 21st century workloads and user profiles. It was created this century for the modern multiprocessor, multicore, multithreaded world of today’s cloud computing data center. SAP HANA was designed to excel in a low-latency, Big Data environment.


That unprecedented performance in the cloud will help make SAP a leader in cloud computing. But another key reason I believe SAP will lead the ranks among cloud vendors for business is, well, we know business applications better than any other vendor. For more than 40 years the company has relentlessly focused on delivering value to more than 288,000 business customers around the globe by providing software that streamlines internal operations while simultaneously opening up new business opportunities. It's an advantage no other cloud vendor possesses.


Our deep knowledge of business requirements has informed how we have implemented our cloud-based applications. Our experience helped us unravel the complexity of implementing true business applications in the cloud. Customers will be able to get up and running faster with the features they need to extract immediate benefit from the service. Plus, IT managers can be confident in recommending the SAP HANA Enterprise Cloud service, knowing that our data centers will be run by experts who understand the SAP environment from top to bottom, assuring not just exceptional performance but applications reliability as well.


The demand for cloud services today is strong. But the real growth is just ahead of us. Forrester estimates that the market for cloud computing will grow at least five times its current level to more than $240 billion annually in just seven years. That growth is happening because cloud computing appeals to organizations from large to small. It’s a quick way to get IT resources and applications on-demand without having to make capital investments in new hardware and software. You simply pay a subscription fee as an operational expense, point your browser at the service, and begin getting value.


The SAP HANA Enterprise Cloud service dramatically alters the landscape for cloud-based business applications because of its underlying in-memory, columnar database. As the New York Times reported, our SAP HANA “product has been a big hit” and it is at the core of our cloud strategy. It easily handles terabytes of customer data on-premises, processing it in real time for financial services, healthcare, retail, manufacturing, and every other market segment SAP supports. In our cloud data centers terabytes quickly will become petabytes, but the performance will still be blazingly fast. In fact, our pre-announcement customers already have loaded more than 750 terabytes on the 30,000 servers into the SAP HANA Enterprise Cloud service. But it is ready for even more.

Irfan Khan

Taming the Big Data Beast

Posted by Irfan Khan May 3, 2013

Your business data is seldom located in a single repository. Worse, it is increasingly stored in numerous formats—from text and spreadsheets to video and audio files, owned by multiple stakeholders and geographically distributed. Then there’s the staggering amount of data, which IDC predicts will balloon 50% this year over 2012. That, in a nutshell, is the Big Data problem companies face today. Too much information strewn about too many servers in too many formats.


Compounding the Big Data dilemma is time. To be precise, real time. Increasingly, data flowing into an organization needs to be captured, accessed, analyzed, and acted upon in real time. The business forces driving time-to-decision expectations are accelerating just as the data volume, variety, and velocity issues are increasing. For many CIOs, taming the Big Data beast is both the scariest problem they face and the biggest untapped opportunity in front of them.


The SAP Real Time Data Platform approach to Big Data is the most complete coverage model in the market. While the SAP HANA in-memory platform is at the center of the SAP RTDP offering, its unique smart data access capability lets enterprises create a federation of disparate data sources, delivering the highest performance possible for both analytical and transactional workloads. For example, SAP HANA smart data access means queries can be optimized for and then executed on servers in the federation that are ideally suited for the task, essentially pushing down processing closer to where the data physically resides.
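The general idea of pushing processing down to where the data resides can be illustrated with a toy sketch. Nothing here is SAP HANA smart data access or its API; `RemoteSource`, `scan`, and `query` are hypothetical names, and the sketch only contrasts shipping every row to a central engine with filtering at the source and shipping just the matches.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass
class RemoteSource:
    """Hypothetical stand-in for one data source in a federation."""
    name: str
    rows: list[Row]
    rows_shipped: int = 0  # counts rows sent back to the caller

    def scan(self) -> list[Row]:
        """No push-down: ship every row and let the caller filter."""
        self.rows_shipped += len(self.rows)
        return list(self.rows)

    def query(self, predicate: Callable[[Row], bool]) -> list[Row]:
        """Push-down: evaluate the filter where the data lives."""
        matches = [row for row in self.rows if predicate(row)]
        self.rows_shipped += len(matches)
        return matches


def is_large(row: Row) -> bool:
    """Example filter: keep orders with an amount above 900."""
    return row["amount"] > 900


source = RemoteSource("orders", [{"id": i, "amount": i * 10} for i in range(100)])

# Central filtering: all 100 rows cross the (imaginary) network first.
central = [row for row in source.scan() if is_large(row)]
shipped_without_pushdown = source.rows_shipped

# Push-down: only the matching rows are shipped.
source.rows_shipped = 0
pushed = source.query(is_large)
shipped_with_pushdown = source.rows_shipped

print(shipped_without_pushdown, shipped_with_pushdown)
```

Both paths return the same rows; the difference is how much data has to move before the answer is known, which is the cost a federated query optimizer tries to minimize.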


With hundreds of SAP HANA installations already deployed, there are plenty of real-world examples of companies who are getting the upper hand on the Big Data fiend. Let me just point to one. An early adopter of SAP HANA technology in 2011 was Minneapolis-based Medtronic. The medical device manufacturer collects massive amounts of data about product reliability. Information arrives as commentary in text form from customer interaction and employee feedback as well as structured database feeds from various sources such as government entities. Reports based on these different data sources are needed up and down the organization and are used to continuously improve product quality.


The rapid growth of data within Medtronic’s data warehouse was hurting reporting performance, which is why IT there turned to SAP HANA. Its success at handling the vast volumes of structured and unstructured data in the area of product reliability has inspired Medtronic to, among other possibilities, potentially combine SAP ERP information with CRM data held in non-SAP sources.


As I’ve written here before, the SAP RTDP architecture is not limited to SAP branded applications. So SAP HANA smart data access lets an IT team integrate a network of large and disparate data sources, even if the information is residing in databases sold by SAP competitors, and even if it’s residing in Hadoop.


To that end, later this month at the SAPPHIRE NOW conference in Orlando, SAP and Intel will be showcasing how Hadoop and SAP HANA technology further advance the SAP RTDP vision. As most have already discovered, Hadoop is being adopted to manage and analyze unstructured data in today’s enterprise. At SAPPHIRE you’ll get to see it and SAP HANA run in a modern 10GigE Intel server cluster environment.


The Intel Distribution for Apache Hadoop is the first Hadoop distribution that has been designed from the silicon layer on up to deliver industry-leading performance, security, and scalability. It’s a perfect fit for the SAP HANA in-memory, high-performance database, which shares the same functional elegance of well-optimized algorithms, cache efficiencies and MPP scaling derived from the latest Intel microarchitecture.


If you are able to stop by during SAPPHIRE, you’ll also see, for example, how the integrated ETL capabilities of SAP Data Services can mine Hadoop-resident data, quickly load it into an SAP HANA database, and combine it with structured data from other sources for blazingly fast contextual analysis. It’s so comprehensive and so quick, you might begin to think Big Data is not such a monster after all.

The biggest hurdle to competitive advantage can be conventional wisdom. Organizations too often conclude that because something has “always been done this way” that it is by definition the best way, maybe even the only way to do it. Sometimes that’s true, but more often it’s simply conventional thinking.


Even in enterprise IT, where change is rampant and open mindedness a must, people fall prey to conventional wisdom. A few decades ago, for example, many companies were slow to adopt networked PCs for business users, insisting, instead, on adding extra mainframe capacity or installing another minicomputer, anything except to put a real computer in users’ hands. More recently, tablets were considered mere “toys” for business by some, while others quickly turned them into innovative enterprise tools for a mobile workforce.


Today, there is an entrenched notion that combining OLTP and OLAP on a single platform “never works.” DBAs seem to be particularly encumbered with the conventional wisdom against running both transactional and analytic workloads on a single system. And those organizations that do combine them on one platform apparently only run OLAP at night when the OLTP operations are off.


SAP HANA undermines that aging bit of data center conventional wisdom. With its pure in-memory, columnar and row architecture, SAP HANA is ideally suited for both tasks. On the OLTP side, as Robert Klopp noted in a recent blog post, it “performs writes as fast or faster than disk-based systems.” Plus, it can handle simultaneous analytic queries faster than dedicated OLAP systems with little or no impact on operations, even if the query demands real-time transactional data.


While most people see SAP HANA as the platform of choice for Big Data analytics applications, SAP sees it as a cost-effective solution for OLTP environments as well. That’s why new releases of the SAP applications will be able to take advantage of native in-memory and multicore parallelism for transactional operations running on SAP HANA.


Hasso Plattner has led research on the subject and has concluded that delivering real-time data and analytics on a single platform will revolutionize the way business leaders make decisions. He says that “the impact on management of companies will be huge, probably like the impact of Internet search engines on all of us.”


Upending the conventional wisdom about running OLTP and OLAP together has an additional benefit. It saves money, which, according to IDG Research, is the number one goal of CIOs when it comes to managing transactional systems. That’s because with SAP HANA as your single OLTP and OLAP platform you won’t need a secondary database; you will have less hardware to manage; and your labor costs will be lower.


It’s understandable to go slow when you first encounter disruptive technology such as networked PCs, tablets, and SAP HANA. These technologies upend long-held, commonplace beliefs. But if you go too slow in adopting these technologies, you let your competitors race ahead, potentially leaving your conventional wisdom and, worse, your organization in the dust.

Ever since Shakespeare’s young Prince of Denmark confronted his father’s ghost on a “platform” in Hamlet, that remarkable word has enjoyed a rich etymological history, elucidating everything from nautical and military contexts to geology and politics. While flexible enough to be relevant in diverse fields, it’s precise enough to bring clarity to a particular item or concept.

In computing, for example, the Random House Unabridged Dictionary includes separate definitions for both hardware platform and software platform. These terms evolved as IT managers used them to describe their preferred compute environments such as Windows and Intel, HP-UX and Itanium, or Solaris and SPARC. SAP has always been agnostic as far as CPU and operating system platforms are concerned, permitting customers to dictate their landscape of constituent pieces. We still are. Compute platforms are only relevant to us (and our users, I’d argue) because they contain the diverse data sets our customers use to run their businesses. If data were, say, bread, SAP would be the world’s greatest and most successful baker.

Sophisticated IT executives have begun to adopt a new real-time and data-centric view of their compute environments. They’re less concerned about underlying server specifications or OS versions and have become single-minded in devising strategies for business data. They’re looking for a data platform that thrives in our new business climate where instant answers are sought from vast amounts of diverse data that pour into an organization every second.

SAP’s Real Time Data Platform (RTDP) is the first comprehensive approach to ingesting, managing, distributing, and securing enterprise information that is data-centric from top to bottom. It is a truly modern technology platform to solve IT’s most challenging business problems. With the SAP HANA in-memory database at the heart of the RTDP, IT managers finally now have an inclusive data platform that offers an end-to-end solution for today’s data-driven business, delivered in a non-disruptive manner.

That’s a bold statement. One worth repeating: SAP RTDP is an all-data inclusive and end-to-end platform delivered in a non-disruptive manner.

Historically, innovative, advanced technologies, like HANA, are often disruptive. However, SAP RTDP is architected as a data platform and not relegated further down the technology stack, so it does not require costly wholesale, forklift-style upgrades to your data center. It can be rolled out incrementally. And SAP RTDP is not just about HANA. It intrinsically incorporates established technologies such as SAP Sybase ASE, SAP Sybase IQ, SAP Sybase SQL Anywhere, and, yes, even third-party databases like Oracle and DB2. Needless to say, SAP’s Real Time Data Platform will run on virtually any preferred enterprise data center server and operating system. Finally, it truly is an end-to-end solution that spans OLTP, OLAP, predictive and social workloads, a complete development environment, and comprehensive deployments from business warehouses to mobile and embedded devices like smartphones, tablets and edge appliances.

I will dive into more detail about SAP RTDP in coming entries to this blog. But I’ll close this post with another bold claim. Putting SAP RTDP at the core of your data strategy will improve operational efficiencies, sharpen your competitive edge, and introduce growth opportunities previously unavailable. While Shakespeare may not have had RTDP in mind when he observed, “Make use of time, let not advantage slip,” his words would be a perfect motto for our platform.

Savvy CIOs are beginning to come to terms with “technology debt,” the burden of outdated technology they have accumulated over the generations of IT. The need for a fresh start is being driven home by architects and developers in droves who are rejecting the quaint notion of managing enterprise information through traditional databases. Instead, they are seeking to deploy a modern real-time data platform that is flexible enough to support the dramatically changed enterprise computing landscape and powerful enough to handle new data demands in real time.


I can’t think of a single organization that is not undergoing upheaval in its compute environment. And it’s happening faster than most people realize. For example, it took more than 25 years of PC sales before Gartner estimated that one billion personal computers had shipped by 2002. A mere five years later the two billion shipment mark for PCs was surpassed.


Our mobile world is moving much faster. Last year, according to Canalys, smartphone sales exceeded PCs for the first time. And IDC says that this year the combined sales of smartphones and tablets will exceed 895 million units, more than double the 400 million PCs forecast to ship by the end of 2012.


In addition to supporting vastly more powerful desktops and countless mobile devices, CIOs now need to factor in cloud computing services as well as huge troves of unstructured data found in social media. On top of that, IT departments are increasingly being told to deliver data securely to anyone anywhere and to do so in real time.


Yet, for far too many enterprises, the database infrastructure remains stuck in the architecture of the 1980s. Too often, it’s limited to delivering static reports culled only from structured data housed in overmatched traditional databases.


To be competitive in the 21st century, companies need to start thinking about deploying an enterprise data platform rather than simply upgrading their current database. An enterprise data platform is much broader than traditional database technology. It integrates different data management components and services that have been optimized for specific business tasks. For example, while many databases can run analytics on structured data, a real-time data platform is a complete technology package that includes in-memory design, a columnar architecture and integrated tools like Hadoop. The result: analytics can execute on unstructured data in real-time.


The SAP Real-Time Data Platform, with its open APIs and standard protocols, offers federated access to an enterprise’s entire information portfolio. Through a single platform, IT gets powerful transactional data management, in-memory data management, enterprise data warehouse technology, analytics and mobile data management capabilities. Plus, total information management and real-time data movement services are foundational aspects of the SAP Real-Time Data Platform.


By adopting a real-time data platform instead of a traditional database approach, CIOs can respond quickly to manage information needs for the next upheaval in computing when it arrives. And, believe me, it will arrive.

Next month voters in the United States will cast ballots for the next president of their country, among other political races and referenda. What is different in this election year than in previous presidential contests is social media and what campaigns can do with the data generated there. Although Facebook, Twitter, and other social media sites existed in the 2008 election, they did not have the massive membership that they do today, and Google+, with its growing 400 million users, did not exist.


In 2012, social media sites have jumped into the political arena with both feet. For example, Facebook has partnered with CNN for an “I'm Voting” app, both to encourage people to vote and to generate data about voter preferences. Twitter is using sentiment analysis of tweets to create its Twitter Political Index, a daily snapshot of how tweeps “feel” about the two major party candidates.


All this information from potential voters themselves is unprecedented in American politics. And it has some political analysts salivating at its potential to sway each voter toward their candidate. As one political consultant told Forbes magazine, “Big data enables very precise narrowcasting of messages to target individual voters. That also enables one-on-one communication, and you're more likely to get a response from a targeted voter.”


The problem is that the response from that targeted voter might just be the opposite of what a campaign manager might expect. Recent survey data from the Annenberg School of Communications reveal that an astonishing 86% of Americans say they abhor political promotions tailored specifically to them as individuals. That figure is well above the 61% who reject product and service advertisement directed to their person.


Worse for campaigns awash in big data, 64% of potential voters say such individualized political advertising would decrease their chances of voting for the candidate who targeted them. Further, 70% of those surveyed say they would be less likely to vote for a candidate who sends advertisements to their friends after they themselves “like” the candidate's Facebook page.


Privacy and trust are at the center of Americans' concern here. Many are already leery about trusting their information privacy on social networks. So, given that politicians are overwhelmingly viewed as “the least trusted profession” in the U.S., exploiting the targeting power of big data with political promotions in this election cycle looks like a lost cause.

