MDM, PI and Data Services

Former Member
0 Kudos

Hi,

As it stands today, SAP delivers PI content for MDM. However, SAP has also published a roadmap for MDM integration with BO Data Services. There are two main use cases for accessing MDM:

1. Direct call from application: for example, when you create a new vendor in your system, SAP calls a Data Services web service for data cleansing and MDM lookup to ensure data quality and to avoid data duplication (see the sketch after this list). So the link between ERP and MDM would be the Data Services platform.

2. Data transfer: PI content is currently delivered for migrating data from application systems to MDM. So the link between ERP and MDM is PI.
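
To make use case 1 concrete, here is a minimal sketch in Python (using the zeep SOAP library) of what a synchronous call from an application to a Data Services real-time service exposed as a web service might look like. The WSDL URL, operation name and field names are assumptions for illustration only; the actual ones come from the service published in Data Services.

# Minimal sketch of a synchronous cleansing/lookup call from an application
# to a Data Services real-time service exposed as a SOAP web service.
# The WSDL URL, operation name and field names are hypothetical.
from zeep import Client

DS_WSDL = "http://ds-host:8080/DataServices/cleanse_match?wsdl"  # hypothetical endpoint

def cleanse_and_lookup_vendor(name: str, street: str, city: str):
    """Call the (hypothetical) vendor cleanse/match service before creating
    the vendor in ERP, so duplicates are caught up front."""
    client = Client(DS_WSDL)
    # Operation and payload structure are assumptions for this sketch.
    return client.service.Vendor_Cleanse_Match(NAME=name, STREET=street, CITY=city)

if __name__ == "__main__":
    print(cleanse_and_lookup_vendor("ACME Corp", "1 Main St", "Walldorf"))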

Given that MDM is about data management, it does make sense to use Data Services as the data integration tool rather than PI. However, I have yet to see an SAP roadmap for Data Services content similar to what is already offered for PI. As there is an overlap between PI and DS (PI also offers mass data transfer functions, and DS also offers synchronous web service calls), the lines between application and data integration platforms are blurred.

So my question, from an architecture point of view, is this: does it make sense for an organization embarking on a long-term organizational restructuring program to keep the application and data integration layers separate and develop its own DS content for application systems? Or should the organization use the pre-delivered PI content currently available and migrate to Data Services content when it becomes available?

The first option keeps the integration layers separate and results in a clean architecture. However, it also means a significant investment by the organization to develop something which might be delivered by SAP in a year or two. It might also be possible to engage with SAP and do joint content development.

The second option is a more tactical approach and not as clean as I'd like, but it gets the job done: reduced investment in content development, but more governance and architectural compliance headaches to ensure data quality while using multiple integration tools.

What are your views on this subject?

Thanks and regards,

Shehryar

Accepted Solutions (0)

Answers (2)

Former Member
0 Kudos

From the product roadmaps, it is clear that SAP will deliver DS content for SAP and non-SAP sources. This will ensure that DS becomes the data integration layer for the enterprise. We'll keep an eye on the latest recommendations from SAP in this area.

Former Member
0 Kudos

Adding to this thread: we recently published a document on SDN to help customers differentiate between Data Services and Process Integration. Much of what is discussed in this thread is covered in that document too:

http://www.sdn.sap.com/irj/scn/index?rid=/library/uuid/10fbac70-c381-2d10-afbe-c3902a694eaf

Thanks,

Ben.

Mark63
Product and Topic Expert
0 Kudos

As a general rule of thumb, I'd say the following (seeing SAP BusinessObjects Data Services in an ETL context and SAP NetWeaver PI in an EAI context):

ETL --

Core Use Case: Data Replication and Data Synchronization in batch and real-time

Examples: Asynchronous bulk load of a data warehouse, Data Migration

Data Characteristics: Bulk Data

Transformation Characteristics: Lightweight to sophisticated transformations; embedded Data Quality Transforms (cleansing, matching)

Integration Process Characteristics: Single-step, stateless integration processes only; point-to-point or point-to-multipoint connectivity with clearly identified sources and targets

Recommended Product: SAP BusinessObjects Data Services

EAI --

Core Use Case: Event-driven, message-based application integration in real-time

Example: Synchronous process automation across ERP & CRM systems

Data Characteristics: Single messages with small payloads

Transformation Characteristics: Lightweight transformations only

Integration Process Characteristics: Multi-step, stateful integration processes, requiring a Workflow/Business Process Management (BPM) engine to handle process status; bus architecture with publish & subscribe and sophisticated routing rules (see the sketch after this list)

Recommended Product: SAP NetWeaver Process Integration
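
As a toy illustration of the "bus architecture with publish & subscribe and sophisticated routing rules" mentioned above, here is a minimal in-memory sketch in Python. This is a conceptual sketch only, not SAP NetWeaver PI's API; the topic name, routing rule and handlers are made up.

# Toy illustration of publish & subscribe with routing rules.
# Conceptual sketch only, not SAP NetWeaver PI's API.
from collections import defaultdict
from typing import Callable

class MessageBus:
    def __init__(self):
        # topic -> list of (routing_rule, handler)
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None],
                  rule: Callable[[dict], bool] = lambda msg: True):
        """Register a handler for a topic, optionally with a routing rule."""
        self._subscribers[topic].append((rule, handler))

    def publish(self, topic: str, message: dict):
        """Deliver the message to every subscriber whose rule accepts it."""
        for rule, handler in self._subscribers[topic]:
            if rule(message):
                handler(message)

bus = MessageBus()
# Route German sales orders to one receiver, everything else to a second one.
bus.subscribe("sales_order", lambda m: print("DE system got", m),
              rule=lambda m: m.get("country") == "DE")
bus.subscribe("sales_order", lambda m: print("Global system got", m))
bus.publish("sales_order", {"id": 4711, "country": "DE", "amount": 99.0})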

Former Member
0 Kudos

Hi Markus,

Thanks for the excellent response. Much appreciated.

Yes, as part of designing the technical architecture, it is ideal to keep the data integration layer separate from application/process integration. What would be quite helpful to know is SAP's roadmap for positioning BODS as the tool of choice for ETL activities. This requires clear architectural guidelines and content from SAP. There are several presentations which mention future BODS usage for loading data into BW. However, information on BODS content (similar to the PI content) for MDM integration is not available. Do you have concrete timelines for this use case? Or is this still being planned?

I think SAP should publish an overview document on all planned BODS integrations with SAP products so that customers can plan the way ahead.

Thanks and regards,

Shehryar

Edited by: Shehryar Khan on Apr 23, 2010 2:06 PM

Mark63
Product and Topic Expert
0 Kudos

Hi Shehryar,

Thanks for your feedback. I'll recommend to the experts in charge of how-to and best-practice information that they include more on this topic in upcoming documents. For more information on MDM-BusinessObjects Data Services integration, please check the [SDN Wiki space dedicated to this topic|http://wiki.sdn.sap.com/wiki/display/SAPMDM/IntegratingSAPNetWeaverMDMwithSAPBusinessObjectsDataServices].

Best regards, Markus

Former Member
0 Kudos

Thanks Markus. One last question:

We agree that using data integration tools like BODS is appropriate for mass data transfers. But what about the case when this bulk data comes from external sources? Similarly, after processing in BI, the data may have to be sent to external parties. How do we ensure separation of concerns between application integration and data integration in such cases? Is there a use case for BODS integration with SAP PI?

Regards,

Shehryar

werner_daehn
Active Contributor
0 Kudos

One nice distinction is push vs. pull.

A batch job starts at a given event (e.g. every hour), pulls the data and processes it, ideally with the very complex transformations needed to get the full strength out of DataServices.

With a realtime task, a listener waits for somebody else to push a new record into it. That is doable with DataServices, but as soon as low latency, absolute stability, high volumes of messages per second (> 100 messages/sec) and only simple transformations are required, PI is far better suited.

Fully acknowledged, there is a gray area between the two.
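
To illustrate the pull-vs-push distinction, here is a minimal sketch in plain Python (not DataServices or PI code; the interval, port and helper functions are made up): a pull-style batch job wakes up on a schedule and processes whatever has accumulated, while a push-style listener sits and waits for the sender to deliver a record.

# Minimal contrast of pull (scheduled batch) vs. push (listener), in plain Python.
# Only illustrates the pattern described above; interval and port are arbitrary.
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

def pull_batch_job():
    """Pull: wake up on a schedule, fetch whatever has accumulated, process it."""
    while True:
        records = fetch_new_records()   # e.g. query a staging table or read files
        process(records)                # potentially heavy transformations
        time.sleep(3600)                # run once per hour

class PushListener(BaseHTTPRequestHandler):
    """Push: sit and wait; the sender decides when a record arrives."""
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        process([json.loads(body)])     # one small message, light transformation
        self.send_response(200)
        self.end_headers()

def fetch_new_records():
    return []                           # placeholder source

def process(records):
    print(f"processed {len(records)} record(s)")

if __name__ == "__main__":
    # HTTPServer(("", 8080), PushListener).serve_forever()  # push variant
    pull_batch_job()                                         # pull variant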

Former Member
0 Kudos

Hi Werner,

Thanks for the response. Much appreciated.

You mentioned that for real-time integration needs with high volumes of messages per second, PI is appropriate. I agree with that notion to quite an extent (further qualifying it based on the nature of the process/data). But what about the case when the data volume is large but the number of messages is low?

We have a case whereby the data is sent to a business partner twice a month. The file size can be around 4GB. Processing such a high volume of data in application integration middleware might not be the best approach. However, given that the data needs to be transferred outside organizational boundaries, we need to understand how the data integration platform can handle this case, e.g. deployment in a DMZ. Typically, data integration platforms remain internal to an organization.

For example, if PI were used in this case, we would either use (S)FTP for data transfer or break the file into smaller chunks for transmission using other protocols (HTTP, JMS, etc.).
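
For illustration, here is a minimal sketch in Python of the "break the file into smaller chunks" approach; the chunk size, naming scheme and checksum manifest are arbitrary choices for this sketch, not an SAP recommendation.

# Split a large extract into fixed-size chunks for transfer over protocols
# that do not cope well with a single multi-GB payload. Chunk size, naming
# and the checksum manifest are illustration choices only.
import hashlib
from pathlib import Path

CHUNK_SIZE = 100 * 1024 * 1024  # 100 MB per chunk

def split_file(source, out_dir):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = []
    with open(source, "rb") as f:
        index = 0
        while chunk := f.read(CHUNK_SIZE):
            part = out / f"{Path(source).name}.part{index:04d}"
            part.write_bytes(chunk)
            digest = hashlib.sha256(chunk).hexdigest()
            manifest.append(f"{part.name} {digest}")
            index += 1
    # The manifest lets the receiver verify and reassemble the chunks in order.
    (out / "manifest.txt").write_text("\n".join(manifest))
    return manifest

if __name__ == "__main__":
    split_file("partner_extract.dat", "outbound_chunks")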

However, by the very nature of the data, this is not an A2A case. So ideally, the data integration platform should be able to handle it without us needing to go through the application integration layer. But I can't find any SAP guidelines for this case, e.g. how can BODS be deployed in a DMZ to send/receive large data volumes to/from business partners? Which protocols should be used?

There are products from information management platform vendors, e.g. Informatica, which explicitly address these use cases. Given that SAP is going to offer many more solutions in the cloud, how does SAP plan to address the issue of data management/integration across organizational boundaries?

If there are no guidelines and this is a white space, it might be worth listing partner solutions that are fit for this purpose.

Thanks and regards,

Shehryar

werner_daehn
Active Contributor
0 Kudos

>

> We have a case whereby the data is sent to a business partner twice a month. The file size can be around 4GB.

In my world this statement would indicate using DataServices. There is no need to wait for two weeks, listening every single millisecond for one message to be delivered; this is something where you would schedule a batch job, either by time (every 1st and 15th of the month) or by event (when a file arrives, start the job).

Obviously one tool that could handle both would be the best solution, but the requirements are orthogonal. For a high number of rows per second you introduce multiple parallel sessions, with intermediate commits, or you even bypass the transaction control of the target completely by using database-provided bulk-load APIs or similar.
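
A minimal sketch of the intermediate-commit pattern described above, using plain Python with sqlite3 (table name, batch size and row source are made up; a real target would rather use the database's own bulk-load API):

# Intermediate commits while loading a high row count: commit every N rows
# instead of holding one huge transaction. Plain sqlite3 for illustration;
# table name, batch size and row source are made up.
import sqlite3

BATCH_SIZE = 10_000

def generate_rows(n):
    for i in range(n):
        yield (i, f"value-{i}")

def load_with_intermediate_commits(db_path, total_rows):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS target (id INTEGER, payload TEXT)")
    batch = []
    for row in generate_rows(total_rows):
        batch.append(row)
        if len(batch) >= BATCH_SIZE:
            conn.executemany("INSERT INTO target VALUES (?, ?)", batch)
            conn.commit()   # intermediate commit: keeps the transaction small
            batch.clear()
    if batch:
        conn.executemany("INSERT INTO target VALUES (?, ?)", batch)
        conn.commit()
    conn.close()

if __name__ == "__main__":
    load_with_intermediate_commits("load_demo.db", 50_000)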

Realtime messaging, on the other hand, is about guaranteed delivery, low latency, etc. You can't get both; it's either this or that. Hence there would always be a difference, even if it were one tool, or the tool would not be optimized for one of the two use cases.

Former Member
0 Kudos

Use case: High volume data transfer outside company boundaries using Data Services

Hi Werner,

Thanks. Yes, I agree that Data Services is more suitable for the task of high-volume data transfer and that a batch job will deliver the data based on time or event. But deliver the data where? (Please see the options below.)

I also agree that either tool can do both real-time and batch; however, neither can be the best option for both scenarios, given that each tool is optimized for a particular purpose. So while you can do batch transfers using PI and JMS communication using Data Services, it is about choosing the right tool for the task and avoiding the situation where "once you have a hammer in your hand, everything becomes a nail" :)

I might be missing the point, but how does Data Services support sending data outside the company network? Or even receiving data from a business partner outside the company network?

Option 1: DMZ <--> Application Integration Middleware <--> Data Services

Most customers today would typically use PI or other integration middleware for communication with the outside world. If the data volume is high, they'll break the message into smaller pieces and dump the data at a particular location. Data Services then takes over for data quality and ETL activities. For outbound, Data Services will dump the data somewhere or send it to the middleware using any of the supported protocols. In this case, customers won't use Data Services to communicate directly with 3rd-party systems and will go through the app integration layer in all cases. This, in my view, might be totally unnecessary in some cases.

Option 2: DMZ <--> Data Services

Rarely, customers deploy Data Services in the DMZ for direct communication with the outside world. This could be done using SOAP or JMS. However, these protocols aren't optimized to handle high-volume messages.

contd...

Edited by: Shehryar Khan on May 24, 2010 11:41 PM

Former Member
0 Kudos

Option 3: DMZ <--> Data Gateway <--> Data Services

Put a third-party product specialized in high-volume data transfers in the DMZ. Data Services then only needs to talk to this gateway for all external communication. The app integration layer can be avoided in this case.

Option 4 (for the sake of completeness): use data integration in SaaS mode. Informatica already offers this. The data integration tool lives outside company boundaries in this case. However, I am not sure whether all business partners would allow this approach (it depends on the nature of the business).

So the answer I am looking for is: what is SAP's recommendation on using Data Services in scenarios where high-volume data has to be sent to/received from destinations/sources outside the company network, while avoiding use of the application integration layer unless necessary?

Any pointers?

Regards,

Shehryar