TammyPowlas
Active Contributor

Part 1 is here: Share the Knowledge - Big Data and the Real-Time Data Platform Including SAP HANA and Apache Hadoop

Continuing on with HANA and Hadoop in this SAP TechEd recording:

Figure 1: Source: SAP

Big data starts at about 500 million records, not because you can't store that much, but because that is when you start to query it and run into issues.

With HANA you can handle billions of records and terabytes of data.

Hadoop comes into the picture when you have hundreds of terabytes of data.

At some point you know you are not putting it all in HANA.

HANA is real-time, alongside the event stream processor (ESP).  You might turn to Hadoop when you have massive amounts of data to ingest.  Processing is parallelized across the machines.

HANA handles a variety of data and can push it to Hadoop.  Hadoop gives you the flexibility to handle all types of data, including image processing.

For value, Hadoop is the "storage area", the data lake.  HANA is for high-value data at comparatively low volumes.

You can offload historical data to Hadoop.  Hadoop is not a database.  It manages blocks of data.

Hadoop vs. NLS?  On BW there is a Near-Line Storage (NLS) option based on Sybase IQ for unloading data from HANA while guaranteeing that the data is still there and consistent.  Right now you cannot do NLS in Hadoop.  Hadoop doesn't have transactions.

Figure 2: Source: SAP

You can go from HANA out to other databases

Smart data access is the “glue”

You can create virtual tables in HANA that refer to tables in other databases
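
As a rough illustration of what creating such a virtual table looks like, here is a minimal Python sketch that only builds the SQL text. It assumes the `CREATE VIRTUAL TABLE ... AT` form from the HANA SQL reference; the helper names and every schema, source, and table name are hypothetical and do not come from the session.

```python
def quote_ident(name):
    """Wrap an identifier in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_virtual_table_sql(local_schema, virtual_name,
                             remote_source, remote_db,
                             remote_schema, remote_table):
    """Build (but do not run) a CREATE VIRTUAL TABLE statement.

    Assumed form: CREATE VIRTUAL TABLE <local> AT
    <remote_source>.<database>.<schema>.<table>
    """
    target = ".".join(
        quote_ident(part)
        for part in (remote_source, remote_db, remote_schema, remote_table)
    )
    local = quote_ident(local_schema) + "." + quote_ident(virtual_name)
    return "CREATE VIRTUAL TABLE " + local + " AT " + target


# Hypothetical names, for illustration only.
statement = create_virtual_table_sql(
    "MYSCHEMA", "VT_CLICKS", "HIVE_SRC", "HIVE", "default", "clicks"
)
print(statement)
```

Once such a table exists, it would be queried in HANA like any local table, while the rows stay in the remote system.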

You don’t have to write the syntax of the other sources, and you get richer semantics.

You are pushing the processing down to the remote source

Smart data access will send data out to the remote site.

Automatic data translation is convenient as well.
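
To make "pushing the processing down to the remote source" concrete, here is a small Python sketch. It is not how Smart Data Access is implemented: an in-memory `sqlite3` database stands in for the remote source, and the `clicks` table and function names are made up for the example. It contrasts pulling every row and aggregating locally with letting the remote side aggregate and returning only the result.

```python
import sqlite3


def make_remote_source():
    """Stand-in for a remote database; the table and rows are invented."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (product TEXT, views INTEGER)")
    conn.executemany(
        "INSERT INTO clicks VALUES (?, ?)",
        [("a", 10), ("b", 5), ("a", 7), ("c", 1)],
    )
    return conn


def total_views_pull(remote):
    """No pushdown: fetch every row, then aggregate on the local side."""
    rows = remote.execute("SELECT product, views FROM clicks").fetchall()
    totals = {}
    for product, views in rows:
        totals[product] = totals.get(product, 0) + views
    return totals, len(rows)  # len(rows) = rows transferred


def total_views_pushdown(remote):
    """Pushdown: the remote source aggregates; only the summary is returned."""
    rows = remote.execute(
        "SELECT product, SUM(views) FROM clicks GROUP BY product"
    ).fetchall()
    return dict(rows), len(rows)


remote = make_remote_source()
pulled, rows_pulled = total_views_pull(remote)
pushed, rows_pushed = total_views_pushdown(remote)
print(pulled == pushed, rows_pulled, rows_pushed)
```

Both functions return the same totals, but the pushdown version transfers one row per product instead of one row per click, which is the point when the remote table holds terabytes.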

Figure 3: Source: SAP

Smart data access is one way to connect the “worlds”.

On the left of Figure 3 is the consumption model, store and process, and ingest.

You can use the data in one of two ways.  The first is applications, such as machine learning and predictive analytics (for example, product recommendations).  The second is analytics, with use cases including dashboards and exploration (Lumira).  These can use HANA or Hadoop.

You can go from BusinessObjects to Hadoop

On the bottom you have ESP, the replication framework, and information management; Data Services can operate with Hadoop.

Figure 4: Source: SAP

With direct HANA to Hadoop integration via Smart Data Access, you have virtual data access.  Integration via ETL moves the data; with terabytes of data you can move it on a schedule, but it is not interactive.  Data Services gives you Pig with scripting.

You can use BI against Hive using multi-source universes as of BI 4.1 for scheduled reports.

Question & Answer

Q: How do you deal with the fact that you have different response characteristics with the two systems?

A: With SP7 there is the remote materialization capability to cache queries – you are trading time for space (remote caching)

Improvements to make access to Hive faster are being looked at.
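
The caching idea in this answer can be sketched in a few lines of Python. This is a generic illustration of keeping remote query results locally for a while, not SAP's remote materialization feature; the class name, the time-to-live parameter, and the fake remote function are all assumptions made for the example.

```python
import time


class CachedRemoteQuery:
    """Keep results of a slow remote query locally for ttl_seconds."""

    def __init__(self, run_remote, ttl_seconds, clock=time.monotonic):
        self._run_remote = run_remote  # callable: sql text -> rows
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so it can be tested
        self._cache = {}               # sql text -> (stored_at, rows)
        self.remote_calls = 0

    def query(self, sql):
        now = self._clock()
        hit = self._cache.get(sql)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]              # fresh enough: answer from local copy
        self.remote_calls += 1         # missing or stale: go to the remote
        rows = self._run_remote(sql)
        self._cache[sql] = (now, rows)
        return rows


# Usage with a fake clock and a fake remote source.
fake_now = [0.0]
cache = CachedRemoteQuery(
    run_remote=lambda sql: [("a", 17)],
    ttl_seconds=60,
    clock=lambda: fake_now[0],
)
first = cache.query("SELECT product, SUM(views) FROM clicks GROUP BY product")
second = cache.query("SELECT product, SUM(views) FROM clicks GROUP BY product")
calls_while_fresh = cache.remote_calls
fake_now[0] = 61.0  # pretend the cached copy has expired
cache.query("SELECT product, SUM(views) FROM clicks GROUP BY product")
calls_after_expiry = cache.remote_calls
```

The trade-off is the usual one for a cache: repeated queries come back quickly from the local copy, at the cost of storing that copy and of results that may be up to `ttl_seconds` old.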

Q: Smart data access works against different sources?

A: Yes, Teradata, ASE, IQ, SQL Server

Q: What distribution is certified?

A: SAP resells the Hortonworks and Intel distributions

Hive 0.9 or greater is supported, as is Hadoop 1

Q: What connection does Smart data access use?

A: Smart data access uses ODBC; BI uses JDBC
