Since I am not attending SapphireNow in Orlando this year, I could spend my blog time this week describing some of the things the SAP Co-Innovation Lab is working on with partners that will be featured at the event. But since this week is already overwhelmed with such news, I'm going to instead provide some more content on one of several new projects here in COIL Palo Alto that is keeping us super busy this year.
I now have multiple favorite projects among those which have spun up this year in COIL Palo Alto, but let me start by talking about one in particular we are doing with SAP NS2. This project was first described to us as “Big Data Fusion” by its requestor, SAP NS2 CTO David Korn, during last year’s SAP TechEd. It was designed from the start to drive deep insight from structured and unstructured Big Data in an on-premise or cloud-based environment, which would be of high interest not only to the more popular three-letter agencies but to anyone with similar needs in any highly regulated industry.
I’m going to touch on some of the different dimensions of the project in this blog post but you can learn more from some of the key project participants first hand by listening to the COIL Early View Podcast.
One of the project’s main goals is to prototype some new capability to extend an existing SAP RDS solution calling for an integration of multi-source (social media, sensor, location, image, unstructured, structured, etc.) data to perform geospatial event analysis based on the aggregation of: person of interest, location of interest, activity and semantic analysis.
COIL has enabled other Big Data projects over the past 12-18 months, but what I like about this project is that it features rich partner collaboration where Cisco, Critigen and Encryptics are all working with SAP NS2 to develop a solution meant to address a variety of business challenges like:
- A need to analyze large data repositories to answer the 5 W’s (who, what, when, where and why) in a trusted, secure manner
- Provide an "off-the-shelf" capability that can be attached to any data source(s): an architecture and set of capabilities that lets the client focus on building their analysis instead of first building a complex infrastructure and then creating the analysis
- Make this capability something that can be provided in a cloud or on-premise delivery model
- Deliver this capability as an SAP RDS
- Allow SAP to demonstrate a complete life-cycle of secure, Big Data information processing and analysis
- Extend a cloud and big data solution set combining integrated hardware and software components to solve activity-based intelligence requirements
This diagram provides a good overview of the architecture underlying the solution meant to address each of these challenges.
The diagram gives us a comprehensive view of the overall architecture. We have a few others that drill into more detail, which I don't have time to share here, but these will certainly be included in some of the white papers that will be published over the next couple of months.
This project effort seeks to provide a working capability that implements this reference architecture for secure, distributed, activity-based geospatial intelligence analysis.
What is Geospatial Intelligence (GEOINT) you might ask? GEOINT data sources include imagery, full motion video (FMV) and mapping data, collected by either commercial or government satellite, aircraft (UAVs, reconnaissance, commercial aircraft or by service subscription), or by other means, such as maps, demographic databases and open source databases, census information, GPS waypoints, utility schematics, or any data about infrastructure or events on earth.
Human intelligence (HUMINT) gathering means collecting intelligence by means of interpersonal contact. During WWII, HUMINT was factored in with Signals Intelligence (SIGINT). An example of SIGINT might be an increase in radio communications right before an enemy’s ships left a harbor, which might indicate fleet movement or some form of supply chain activity. The person listening in to this traffic may not be able to decipher the encrypted transmissions, but just the fact of so much communication suddenly occurring could suggest an activity of interest. Similarly, Communications Intelligence (COMINT) was woven into the intelligence data gathered, and intelligence communities looked for any correlations between all of the different events.
Now speed ahead to the 21st century, an age of continuous creation of petabytes of e-mail, text messaging, audio/video surveillance and now social networking. HUMINT data can now be gathered, sifted, collated and analyzed to extract useful information in support of a mission need.
Why is this so important? Within Homeland Security, the Department of Defense Intelligence Community (DoD IC) and across the DoD, there is a pressing need for these organizations to securely integrate data for intelligence analysis which can then be securely distributed to the edge where the information is needed most (the “edge” in this instance being personnel like law enforcement and other field agents).
Today, most of the data is collected via a variety of methods, typically relayed back to a central processing and analysis center, and then distributed for community use. Unfortunately, in many cases where the collected data could be processed locally, there is either no existing community infrastructure or the infrastructure is not capable of supporting such functions. Globally, the DoD and MoDs will spend billions of dollars extending their existing infrastructure capabilities to gradually achieve a truly mobile, distributed capability.
The main goal of the solution envisioned by the COIL project team is to answer a set of very basic questions that require sifting through petabytes of information:
A person of interest (who)
is going to perform an activity (what)
at a given time (when)
at a given location (where)
for an unknown reason (why)
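To make these five questions a little more tangible, here is a minimal Python sketch of how a single observation might be represented. This is purely my own illustration and not the project's data model: the `ActivityRecord` class, its field names and the `unexplained` helper are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ActivityRecord:
    """One observation framed by the five W's (hypothetical structure)."""
    who: str                      # person of interest
    what: str                     # activity being performed
    when: datetime                # time of the activity
    where: Tuple[float, float]    # location as (latitude, longitude)
    why: Optional[str] = None     # reason, unknown until analysis fills it in


def unexplained(records: List[ActivityRecord]) -> List[ActivityRecord]:
    """Return the records whose 'why' has not yet been determined."""
    return [record for record in records if record.why is None]
```

The point of the sketch is simply that the first four fields come from collected data, while the "why" starts out empty and is what the analysis is meant to supply.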
SAP NS2 has been involved in an R&D activity with one of its customers, which led to prototyping a virtual analysis appliance (IQ, BOBJ, Data Services, Text Analysis) integrated with open source applications and driven by social media data from the internet, that can be applied to any data source(s) for intelligence analysis.
SAP NS2 looks to create, through this COIL project, an SAP Rapid Deployment Solution (RDS) that addresses the needs of an activity-based intelligence platform by coupling SAP and partner technologies. At the heart of SAP’s architecture for OLTP and OLAP processing is HANA. HANA is an in-memory database appliance that can perform high-speed transaction processing (e.g. SAP Business Suite) and big-data analytics on the same data, without the need for an ETL process to load a separate data warehouse, and that can scale to petabyte data stores.
When such an appliance further incorporates ESRI to create the concept of geospatial data marts that enrich the analysis abilities of SAP Business Intelligence, this begins to provide a truly integrated capability of text analysis, geospatial analysis and traditional BI/BA in a single user interface.
The next step in the evolution of this new reference architecture, and one this project expects to deliver, is to add a complete secure mobile capability that incorporates position location, along with a mobile application factory that can distribute applications to the edge. This will allow field analysts to perform intelligence analysis on data as soon as it is collected, and to process previously captured data locally, instead of waiting for some other organization to perform this task.
What the SAP NS2 project leads desire as output from this COIL project is to integrate three related solutions into a single offering that can be used across the DoD, the Federal Intelligence Community, State/Local Intelligence communities and the Aerospace/Defense community.
The integrated solution comprises the following:
- Activity Based Analysis solution (which has been previously prototyped by NS2), which consists of IQ, Data Services, Text Analysis and Business Objects integrated with ESRI (ArcGIS) to provide a geospatial information analysis platform that leverages social media as the primary data source to analyze events and determine a course of action to handle these events.
This is based on the aggregation of: the person(s) of interest (POI), or WHO is involved in the event; the definition of the event, or the WHAT; the date of the pending event, or WHEN; the locations involved with the POI(s) and events, which is the WHERE; and sentiment analysis of data gleaned from social media or other data systems to determine WHY the event will occur.
- RealTime Situational Awareness (RTSA) Rapid Deployment Solution (RDS), which is a command/control appliance based on SAP HANA that has incorporated the NIEM (National Information Exchange Model) object information model into a relational execution schema which operates within the SAP HANA appliance. The NIEM standard is sponsored by DHS, FBI, DOS, State, Local, tribal and international emergency response and intelligence communities.
- Both of the aforementioned solutions have a desktop and mobile component. SAP NS2 wants to integrate its secure mobile platform into the mobile aspect of the solution architecture. This will allow for secure communication between the server and mobile device and protect data at rest on the mobile device. The secure solution includes mobile device management (MDM) and a mobile enterprise application platform (MEAP).
Situational Awareness Use Case Scenario:
One of the things about this project which first grabbed my attention was its proposed use case which centered around a capability to assess the activity of food trucks in the Washington DC area based upon classification, frequency of activity, and sentiment of customers with respect to their location(s) over a given period of time.
The use case is interesting because it is plausible that a foreign threat to the homeland could somehow leverage a food truck as a clandestine way to trigger an act of terror against US citizens and public or private property. The other fascinating thing about this use case is how easily it demonstrates an effective use of public data to help glean new actionable insights.
From gathering this information, an analyst can look to uncover patterns of behavior of food trucks, where they usually are, and “atypical” locations of food trucks. By using the predictive analytics functions within HANA, the analyst can project which food trucks and food truck classifications will yield higher sentiments and more activity using time series analysis (e.g. double or triple exponential smoothing).
Using an Apriori association algorithm, the analyst can detect correlations in the proximity of food trucks to other food trucks or to pre-defined geo-fences. Using the anomaly detection algorithm, the analyst can detect when a food truck that has stayed within a given geo-fence for a long period of time suddenly moves to a different location within the city. Through the use of these predictive functions, SAP HANA can create a specific analytics view that can be used to drive a native NetWeaver HTML5 application, a BOBJ dashboard and alert, or a geographic overlay to be used by an ArcGIS application.
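To make the time series piece a little more concrete, here is a minimal Python sketch of double exponential smoothing (Holt's linear method) applied to activity counts for a single food truck. It is only an illustration of the technique, not the HANA predictive function itself: the function name, the `alpha` and `beta` smoothing parameters, the simple initialization and the sample counts are all assumptions of mine.

```python
from typing import List, Sequence


def double_exponential_smoothing(
    series: Sequence[float], alpha: float, beta: float, horizon: int = 1
) -> List[float]:
    """Forecast `horizon` steps ahead with Holt's linear (double) smoothing.

    alpha smooths the level and beta smooths the trend; both lie in (0, 1].
    """
    if len(series) < 2:
        raise ValueError("need at least two observations to estimate a trend")

    # Simple initialization: first value as level, first difference as trend.
    level = series[0]
    trend = series[1] - series[0]

    for value in series[1:]:
        previous_level = level
        # Blend the new observation with the previous one-step-ahead estimate.
        level = alpha * value + (1 - alpha) * (level + trend)
        # Blend the latest change in level with the previous trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend

    # Project the final level forward along the final trend.
    return [level + step * trend for step in range(1, horizon + 1)]


# Hypothetical weekly check-in counts for one food truck.
weekly_checkins = [40, 44, 47, 53, 58, 61]
forecast = double_exponential_smoothing(weekly_checkins, alpha=0.5, beta=0.3, horizon=2)
print(forecast)
```

Triple exponential smoothing adds a third, seasonal component on top of level and trend, which would matter for something like lunch-hour or weekday patterns. I have left it out here to keep the sketch short.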
Given the proposed software architecture and implementation, Data Services is used to capture data by web crawling publicly available data from both Yelp and Twitter regarding the location of food trucks within Washington DC and the sentiments that the food truck patrons post on these social media sites. The use case then demonstrates how to turn this into actionable information:
a. Using the text analysis functionality within Data Services (in the next release, the HANA text analysis function will be used), sentiment analysis is performed on the data captured from Yelp and Twitter, and the results are loaded into HANA via ETL
b. In the next iteration of the demonstration scenario, HANA will perform continuous time series analysis (exponential smoothing) within a user-defined geo-fence to predict the most popular food truck classifications and the associated food trucks. It will also use the anomaly detection algorithm in tandem with the Apriori algorithm to detect when a specific food truck has left its “normal” geo-location to do business in a neighborhood that is not associated with its classification, meaning its cuisine (e.g. an Asian food truck suddenly moves to a neighborhood where the demographics suggest that Mediterranean cuisine is preferred).
c. SAP HANA interfaces with ArcGIS to provide a geo-overlay
d. SAP HANA drives a BOBJ dashboard for real-time situational awareness analytics
e. SAP HANA interfaces with the ArcGIS mobile application on the iPad via web service to allow a field operative to capture and log real-time sentiment events to update the HANA database and BOBJ analytics
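Step b above hinges on knowing when a truck has left its "normal" geo-fence, so here is a rough Python sketch of that one idea. It is a deliberately simple stand-in, not the anomaly detection algorithm the project uses: it assumes a circular geo-fence, uses the haversine great-circle distance, and the function names, the `min_history` threshold and the coordinates are all hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = math.radians(b[0] - a[0])
    dlambda = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def inside_fence(point: Point, center: Point, radius_km: float) -> bool:
    """True if the point lies within a circular geo-fence."""
    return haversine_km(point, center) <= radius_km


def left_usual_fence(
    sightings: Sequence[Point], center: Point, radius_km: float, min_history: int = 5
) -> bool:
    """Flag a truck whose latest sighting is outside the fence after at
    least `min_history` consecutive prior sightings inside it."""
    if len(sightings) <= min_history:
        return False  # not enough history to call the fence "usual"
    *history, latest = sightings
    settled = all(inside_fence(p, center, radius_km) for p in history[-min_history:])
    return settled and not inside_fence(latest, center, radius_km)


# Hypothetical example: a truck parked near one spot, then seen several km north.
usual_spot = (38.8977, -77.0365)
sightings = [usual_spot] * 5 + [(38.9500, -77.0365)]
print(left_usual_fence(sightings, usual_spot, radius_km=1.0))
```

A real implementation would presumably work from polygon geo-fences in ArcGIS and a statistical notion of "normal" rather than a fixed count of sightings, but the shape of the question is the same.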
There is so much interest and excitement surrounding this project that it has in fact already spurred follow-on project discussions to further exploit future SAP HANA capabilities and to explore in greater depth how geospatial intelligence can be applied to other industries like retail and healthcare.
While there is always the challenge of identifying the valid business case underscoring how this technology can be used, the point being made here is that this team has established a very compelling co-innovated reference architecture that is already proving what’s possible. I know from this project alone that it is going to be a very interesting year at COIL.