Data Services and Data Quality

April 2011

SAP BusinessObjects Data Services - What is new in 4.0?


This was an ASUG webcast given last month by Sue Hay, Sr. Solutions Manager, and
Ben Hofmans, Senior Product Manager, of SAP.

Sue said that the volume of information and data is exploding. As a result, data becomes more fragmented and overwhelming. Data is siloed and hard to get to, and field values are only partially entered. We need to start managing that data on a consistent foundation.

Figure 1 (Source: SAP)

As shown in Figure 1, enterprise applications, whether SAP or non-SAP, need to come together. We have legacy systems, and new systems that hold unstructured data such as content from the web, Twitter feeds and blogs.

To make sense of that data, to roll up information to assess what is going on, and to understand trends across those systems, we need to filter out the "noise" in those environments.

Figure 2 (Source: SAP)

Figure 2 shows that Data Services unifies products into one single solution set. Data Services provides the ability to move, capture, and transform data. This can happen in real time, for example from a web site.

Figure 3 (Source: SAP)

Figure 3 shows support for business processes.  One example is a data migration from legacy systems.  Loading into SAP works well since Data Services understands the data models.

Data Quality Management will check for duplicates and cleanse data within the CRM and ECC/ERP environments. Data can be checked at the point of entry to ensure address information is complete and valid, thereby preventing duplicate entries.
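
The point-of-entry check described above can be pictured with a minimal Python sketch. This is not the Data Quality Management API. The `REQUIRED_FIELDS` tuple, the normalization rules and the `CustomerRegistry` class are hypothetical names made up for illustration, and the exact-key comparison used here is a stand-in assumption, not how the product matches records.

```python
import re

# Hypothetical required address fields; a real deployment would use
# country-specific address validation instead of a simple presence check.
REQUIRED_FIELDS = ("name", "street", "city", "postal_code")


def normalize(value):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", value.lower())
    return " ".join(cleaned.split())


def match_key(record):
    """Build a comparison key from the normalized required fields."""
    return tuple(normalize(record[field]) for field in REQUIRED_FIELDS)


class CustomerRegistry:
    """Toy in-memory store that rejects incomplete or duplicate entries."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        missing = [f for f in REQUIRED_FIELDS if not record.get(f, "").strip()]
        if missing:
            return "rejected: missing " + ", ".join(missing)
        key = match_key(record)
        if key in self._records:
            return "rejected: duplicate"
        self._records[key] = record
        return "accepted"


first = {"name": "Acme Corp.", "street": "12 Main St",
         "city": "Springfield", "postal_code": "12345"}
variant = {"name": "ACME  Corp", "street": "12 Main St.",
           "city": "springfield", "postal_code": "12345"}
incomplete = {"name": "Acme Corp.", "street": "",
              "city": "Springfield", "postal_code": "12345"}

registry = CustomerRegistry()
print(registry.add(first))       # accepted
print(registry.add(variant))     # rejected: duplicate
print(registry.add(incomplete))  # rejected: missing street
```

The second record differs only in case, spacing and punctuation, so it maps to the same key and is stopped before it is stored. That is the sense in which a check at entry time prevents duplicates instead of cleaning them up later.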

Customer and product wins include Kraft Foods, which used Master Data Management and Data Services to ensure the quality of its system, helping it win the Gartner MDM Excellence award. Peachtree Data won an Information Management award for its use of Data Services in a high-data-volume environment.

Figure 4: Gartner Quadrant (Source: SAP)

Figure 4 shows the Gartner Magic Quadrant from last year. Data Quality has been an industry leader since Gartner started tracking the Data Quality market space.

Data Services 4.0 has key themes such as unlocking the power of information, improving and enabling information governance, and deeper integration. A core Data Quality Management SDK has been added that you can embed in your own application.

Figure 5 (Source: SAP)

In the past the products were separate. In 4.0, they run on the same platform and seamlessly share metadata.

Figure 6 (Source: SAP)

User interfaces (UIs) are split to serve the different users in your company. Technical users in the IT department tend to use the Data Services Designer, while the Data Steward gravitates toward Information Steward. This is where sharing the same platform and metadata is key: although the UIs are separate, the technical user and the Data Steward can now easily collaborate. In addition, there is one set of metadata in one administrative environment, and one place for user administration.

Figure 7 (Source: SAP)

In Figure 7, the platform layer at the bottom provides a common set of administrative functions so that those services can be reused. You set up your data validation rules just once and then reuse them in Data Services. Data Services is also extended with text analytics, so it can do text data processing along with structured data.

Figure 8 (Source: SAP)

Text Data Processing within Data Services 4.0 combines unstructured data with structured data to show what is happening in your industry, and to track and proactively respond to significant events. Sue said it allows you to "sort through the noise" of unstructured content: what is happening, with whom, when and where.

If structured data tells you “what”, then use unstructured data to track down “why”.

 

Figure 9 Location Awareness (Source: SAP)

 

As Figure 9 shows, with Data Services 4.0 the geocoding capabilities have been enhanced through location awareness. Reverse geocoding identifies the closest mailing address using input latitude and longitude values. One primary use of reverse geocoding is in mobile applications.

Geospatial search allows you to find related points of interest around a geographic location, for example how many coffee shops are within 3 miles of this vicinity. This is useful in the retail industry, grocery and so on. Points of interest can be plotted on a map with a radius.

Geographic proximity matching will show how close these entities are to each other and find potential matches.
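
The three location-awareness ideas above (reverse geocoding, geospatial search within a radius, and proximity matching) can be illustrated with a short Python sketch built on the haversine great-circle distance. This is only an illustration under stated assumptions, not how Data Services implements them. The function names, the sample coordinates and the 0.05-mile matching threshold are all made up, and the real product relies on licensed geocode directories instead of a hand-written list.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def reverse_geocode(lat, lon, addresses):
    """Return the reference address closest to the input coordinates."""
    return min(addresses,
               key=lambda a: haversine_miles(lat, lon, a["lat"], a["lon"]))


def points_within(lat, lon, points, radius_miles):
    """Return the points of interest that fall inside the search radius."""
    return [p for p in points
            if haversine_miles(lat, lon, p["lat"], p["lon"]) <= radius_miles]


def is_proximity_match(a, b, max_miles=0.05):
    """Flag two entities as a potential match when they are very close."""
    return haversine_miles(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_miles


# Made-up reference data for illustration only.
ADDRESSES = [
    {"address": "100 First St", "lat": 40.000, "lon": -75.000},
    {"address": "200 Second St", "lat": 40.010, "lon": -75.000},
]
COFFEE_SHOPS = [
    {"name": "Shop A", "lat": 40.02, "lon": -75.00},
    {"name": "Shop B", "lat": 40.00, "lon": -75.05},
    {"name": "Shop C", "lat": 40.10, "lon": -75.00},
]

print(reverse_geocode(40.002, -75.001, ADDRESSES)["address"])  # 100 First St
nearby = points_within(40.0, -75.0, COFFEE_SHOPS, radius_miles=3.0)
print([p["name"] for p in nearby])  # ['Shop A', 'Shop B']
print(is_proximity_match(COFFEE_SHOPS[0],
                         {"name": "Shop A Cafe", "lat": 40.0201, "lon": -75.0001}))  # True
```

The same distance function serves all three cases: reverse geocoding takes the minimum distance, geospatial search filters by a radius, and proximity matching compares two entities against a small threshold.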

 

Figure 10 Data Management SDK (Source: SAP)

 

The Data Quality Management SDK comes with the 4.0 release, and geocoding is included within it. The SDK has a light footprint and is available as Java, .NET and C++ APIs.

What is new with Information Governance? Information governance provides a single data steward interface for data profiling, metadata management and data quality monitoring, and allows you to create data quality cleansing rules. Data quality monitoring is covered by data quality dashboards that let the user monitor information.

Figure 11 Information Steward (Source: SAP)

 

Information Steward provides a single environment to discover information through profiling, analyzing and cataloging. The Define phase determines business terms and establishes validation rules. In Define data ownership, you create cleansing packages. In Monitor & Remediate, you can monitor data quality and use scorecards to flag that there may be issues.

Figure 12 Cleansing Package Builder (Source: SAP)

Cleansing Package Builder is new. It uses data in your source system as a feed to understand related elements within a large description field. Figure 12 shows a sample of how to use drag-and-drop to standardize the data and understand its different variations. You then publish the result as a cleansing package, and that package is used during a data cleanse process inside Data Services.

Figure 13 (Source: SAP)

Information Steward has scorecards to continuously monitor what is happening with data quality levels.

Figure 13 shows a data quality metrics report, which displays the current level of data quality compliance in your organization. Data Stewards can see how data measures against quality rules. 4.0 also provides an encryption function to encrypt data.
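
To make the idea of measuring data against quality rules concrete, here is a minimal Python sketch of how a compliance percentage could be computed. It is an assumption-laden illustration, not Information Steward's scoring method. The `score` function, the two rules and the sample records are hypothetical.

```python
def score(records, rules):
    """Return the percentage of records passing each rule, plus the average."""
    results = {}
    for name, rule in rules.items():
        passed = sum(1 for record in records if rule(record))
        results[name] = 100.0 * passed / len(records)
    # Unweighted mean of the per-rule scores (an assumption made for this sketch).
    results["overall"] = sum(results.values()) / len(rules)
    return results


# Hypothetical validation rules expressed as simple predicates.
RULES = {
    "postal_code_is_5_digits": lambda r: r["postal_code"].isdigit()
    and len(r["postal_code"]) == 5,
    "country_is_present": lambda r: bool(r["country"].strip()),
}

# Made-up sample records.
RECORDS = [
    {"postal_code": "12345", "country": "US"},
    {"postal_code": "1234", "country": "US"},
    {"postal_code": "ABCDE", "country": ""},
    {"postal_code": "99999", "country": "DE"},
]

for rule_name, percent in score(RECORDS, RULES).items():
    print(f"{rule_name}: {percent:.1f}%")
# postal_code_is_5_digits: 50.0%
# country_is_present: 75.0%
# overall: 62.5%
```

Recomputing the same figures on a schedule and comparing them over time is the kind of continuous monitoring a scorecard is meant to support.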

Now on to platform integration (BI platform), which provides common user management between Data Services and BI. Data Services can access ECC and BW, and Solution Manager is used to monitor Data Services. Data Services is the engine to load data into HANA.

Figure 14 Common Server / Services Layer (Source: SAP)

BI Platform integration is based on the BI 4.0 platform and includes robust user management and security, with one place to create users and define what a user can do. A user can then log on to either the BI or the Data Services application, so user management is joint for BI and Data Services. Previously, Data Services did not have user management; now it does.

Customers may be on different versions of BI and Data Services, and these users will still benefit. If you are not on BI 4.0, you can use a mini BI 4.0 platform, available as a free download called Information Platform Services.

Figure 15 (Source: SAP)

Integration with Business Suite provides the same access to Business Suite data that BW has had in the past.

BW uses the Business Content extractors. In 4.0, Data Services will also use the Business Content extractors.

This does not mean BW customers have to start using Data Services; SAP is not replacing the connectivity between ECC and BW. Data Services is added as an option to the landscape and also allows access to other 3rd party sources, giving you one single tool to extract data.

With Data Services, you can consolidate extraction rules into one platform. You can also make sure you load correct and clean data into the target application by using the data quality components of Data Services.

You just need to know the name of the extractor and connect. No ABAP programs are generated, making it easier to deploy and maintain. 

In fact, there is a new private API on top of the ECC extractors for native connectivity.

Figure 16 (Source: SAP)

BW customers like using the BW workbench so they can use one interface. With BW 730 and Data Services 4.0, users can create Data Services jobs without leaving the BW workbench.

Data Services is a new source system with data stores, and new connectivity to supported systems.

Figure 17 Solution Manager (Source: SAP)

BI products are integrating with Solution Manager to help you monitor your whole SAP landscape. Data Services 4.0 focuses on the System Landscape Directory (SLD): components are registered in the SLD so you can see which versions are installed on which machine, and see diagnostics such as server load, CPU and memory use.

Figure 18 Data Services with HANA (Source: SAP)

 

HANA is a way to store huge amounts of data, but you need a way to get data into the in-memory appliance, and this is where Data Services comes in. There are 2 ways to load: 1) Data Services with a batch load and 2) Sybase Replication Server, which replicates data to HANA when a change happens in the ERP system.

Data Services is used for batch processing use cases. The HANA modeler will allow you to select a source system to load to NewDB using Data Services. Data Services is the engine, but the HANA modeler will do the work.
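
The difference between the two loading approaches can be pictured with a toy Python sketch. Plain dictionaries stand in for the ERP source table and the HANA target table, and the `batch_load` and `apply_change` functions are hypothetical names. Nothing here reflects the actual Data Services or Sybase Replication Server interfaces.

```python
def batch_load(source_rows):
    """Batch approach: rebuild the target from a full snapshot of the source."""
    return {row["id"]: dict(row) for row in source_rows}


def apply_change(target, change):
    """Replication approach: apply one insert, update or delete as it happens."""
    op, row = change["op"], change["row"]
    if op in ("insert", "update"):
        target[row["id"]] = dict(row)
    elif op == "delete":
        target.pop(row["id"], None)
    else:
        raise ValueError("unknown operation: " + op)


# Made-up source data and change stream for illustration.
source = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
changes = [
    {"op": "update", "row": {"id": 1, "status": "closed"}},
    {"op": "delete", "row": {"id": 2}},
    {"op": "insert", "row": {"id": 3, "status": "open"}},
]

target = batch_load(source)      # initial batch load
for change in changes:           # later changes arrive one at a time
    apply_change(target, change)

print(target)
# {1: {'id': 1, 'status': 'closed'}, 3: {'id': 3, 'status': 'open'}}
```

A batch load moves a whole data set at scheduled intervals, while replication keeps the target current by forwarding each individual change, which is why the two are presented as complementary options.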

Summary:
Data Services is an all-in-one solution covering both data integration and data quality. It is also the engine for Information Steward.


Question & Answer
Q: Is a list of new features available for Data Services 4.0?
A: See the "What's New in SAP BusinessObjects Data Services" guide for a full list.



Q: If we have config or development in our current Text Analysis, will that migrate to DS 4.0 seamlessly?

A: Text Analysis 3.x standalone is still a different product that can be integrated in any application. What was integrated in Data Services 4.0 is a subset of the Text Analysis functionality - over time this will grow to provide the full functionality now available in Text Analysis.  We call this "Text Data Processing" to make the distinction with the standalone Text Analysis product.

Q: How does the licensing transfer if you don’t have some of the products that seem to be merging into Data Services? Are they still separately licensed?
A: SAP talks about the Data Services platform, but on the pricelist SAP still offers "Data Integrator Professional/Premium" or "Data Quality Professional/Premium". What you get is always the Data Services platform, but based on the license key some functionality will be disabled.


Text Data Processing is special in that it is included free of charge in the full Data Services license (DI + DQ) as well as DI Premium.  Data Services 4.0 only includes "Entity Extraction" from Text Analysis. In the first release, the supported languages are also limited to 6 common languages, instead of the 32 that Text Analysis supports.

Q: Does Data Services connect to the XI R2 CMC repository for metadata analysis?
A: Data Services 4.0 can still connect to a BOE XI R2 CMC to collect metadata (but a BOE 4.0 will be needed as well for user management).

Q: Did I see correctly in a previous slide that Metadata Management is now part of Data Services 4.0?
A: Metadata Management is now part of Information Steward, not Data Services.


Q: When will Data Services 4.0 be available to end customers?
A: Data Services 4.0 has been available in ramp-up since last December. If you are interested, you can talk to your account manager. Unrestricted shipment is planned for July.


Q: Data Profiling seems to use the source DB as part of the engine to do the profiling. However, if we want to profile tables inside SAP ECC (MARA, etc), we would not want to put the load on the SAP ECC DB. So does that mean we would have to extract the MARA table from SAP & persist in another DB & then profile the data?

A: Information Steward uses the same data access mechanisms offered via Data Services, so just as Data Services can read data from ECC, so can the profiling functions of Information Steward. Profiling can either be performed directly against ECC (of course, be careful :) or you can extract the data and stage it for profiling.

The profiling offered in DS is basic, while Data Insight had more extensive profiling and assessment features. The capabilities of Data Insight will now be available within Information Steward.



Q: Is Location Awareness an add-on component with separate licensing or is it bundled with Data Services 4.0?

A: Location Awareness is an add-on feature for Data Services / Data Quality Management, together with the Geocode Directories, which supply the reference data (geo, points of interest, etc.).

 

If you are interested in learning about another way to load data in HANA, the Sybase Replication Server, join us Thursday, April 21st, for the upcoming webcast - information is listed below.

Title: Sybase Replication Server Overview

Date: April 21st


Description: Learn how Sybase Replication Server rounds out your data management capabilities by moving transactions (inserts, updates, and deletes) at the table level from a source dataserver to one or more destination dataservers.

 

ASUG Members can register here.

Non-ASUG Members can register here:

 

I want to thank Sue Hay and Ben Hofmans for this great webcast. Additionally, I want to thank Ina Mutschelknaus, SAP, for setting up such great EIM 4.0 webcasts for the ASUG community.
