
SAP Vendor Master Data Profiling and Remediation

Binoy
Participant

Hello,

I am working on a POC using Information Steward (IS) and Data Services (DS) to profile and remediate vendor master data for uniqueness and to remove duplicates. ECC 6.0 is connected to Data Services and IS. Can you help with the next steps for building a project map? If a similar scenario is available, that would help.

Accepted Solutions (1)

former_member186897
Contributor

Binoy Thomas,


Next steps may include:


1) Create the different users in Information Steward who will work on this project, for example a Data Insight Administrator, Data Insight Analyst, etc. Grant permissions depending on the kind of work each user will do in IS.


2) Extract all the tables required for profiling from ECC and stage them in a database that will be used by both IS and DS. This database needs to be connected from both IS and DS. I would not recommend accessing ECC directly from IS or DS.


3) Import all the tables required for profiling into Information Steward.


4) Identify the critical fields in the vendor master that need profiling and remediation. These may include Vendor Number, Vendor Name, the address fields, etc.


5) Identify a field or combination of fields that needs to be checked for uniqueness.


6) Do the column profiling in Information Steward; based on the report, you can identify fields for uniqueness checks.


7) You can write different rules in IS to add any customized business logic. Build a scorecard to see the health of the data and make decisions accordingly.


8) Depending on the data health report, you can write cleansing routines in Data Services. You can make use of address cleansing in Data Services if you want to fix issues with vendor addresses.


The order may vary, but identifying things will be much easier if you first do a column profiling in IS and analyse the data thoroughly before you decide on a uniqueness check or any cleansing activity.
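To make the column profiling idea concrete: it surfaces statistics such as null counts and distinct-value counts, which tell you whether a field (or combination of fields) is a realistic uniqueness candidate. Here is a minimal Python sketch of the idea, using hypothetical sample records and field names (IS computes all of this for you in the Data Insight workspace):

# Minimal sketch of the kind of statistics column profiling surfaces.
# The sample records and field names below are hypothetical; in practice
# Information Steward computes these in the Data Insight workspace.
records = [
    {"VendorNo": "100001", "Name": "Joes Garage", "City": "Dallas"},
    {"VendorNo": "100002", "Name": "Joe Garage",  "City": "Dallas"},
    {"VendorNo": "100003", "Name": None,          "City": "Austin"},
]

for field in ["VendorNo", "Name", "City"]:
    values = [r[field] for r in records]
    non_null = [v for v in values if v is not None]
    print(
        f"{field}: {len(values) - len(non_null)} nulls, "
        f"{len(set(non_null))} distinct of {len(values)} rows "
        f"({len(set(non_null)) / len(values):.0%} unique)"
    )

A field whose distinct count is close to the row count is a good uniqueness candidate; one with many nulls or few distinct values is not.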


Hope this helps.


Regards,

Ansari

Binoy
Participant

Thanks Mohammed for your response. This really helps.

We are focusing on two KPIs here: data de-duplication and consolidation.

The duplicate check is based on 7 address data fields on the vendor master.

The vendor data resides in three different databases, the vendor numbers are not unique, and we want to use IS to profile the data by building validation rules in IS.

We want to load the consolidated record into MDG. Could you suggest how and where we write the key mapping rule?

former_member186897
Contributor

Ok, in this case I would suggest the following.

1) Create three different connections in IS, one for each source database, so that you can import the tables required for profiling into IS.

2) You will need to create a project under Information Steward -> Data Insight in the CMC, which will be used in Data Insight to profile your data.

3) Open the project in Data Insight and import the tables required for profiling from these 3 connections.

4) Do a column profiling (Workspace tab) for all the tables that hold vendor data.

5) Go to the Rules tab in Data Insight and create the new rules you need for validation, assigning each rule to one of the quality dimensions (e.g. Accuracy, Completeness, Conformity, Consistency, Integrity). Approve the rules so that they can be used by projects.

6) Bind every rule created to one or more tables.

7) Once you are done with binding, set up your scorecard with all the quality dimensions you require.

8) Go to the Rule Results tab from the Workspace home, select the table, and click "Calculate Score"; this executes all the rules associated with that table and generates the scorecard.

9) Go to the scorecard page and select "Show Score as of Now".

If you want to see all of the failed records (IS shows only 500 failed records at a time and does not allow importing more than 500 failed records), you need to create a database and set up another connection in the CMC "For Data that Failed Rule". This database will hold all your failed records in a table.

So, this is how you can do the profiling and analyse the results. To de-duplicate and consolidate the data, you need to create jobs in DS that eliminate the unnecessary records from the output.
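As a rough illustration of what such a DS job does at its core, here is a hedged Python sketch of the grouping logic; the field names are illustrative, and a real job would be built with DS transforms rather than code like this:

# Sketch of a de-duplication pass: group records on the chosen match
# fields and keep the first record of each group. Field names are
# illustrative; a real DS job would implement this with transforms.
def dedupe(records, key_fields):
    seen = set()
    survivors, duplicates = [], []
    for rec in records:
        key = tuple((rec.get(f) or "").strip().upper() for f in key_fields)
        if key in seen:
            duplicates.append(rec)   # candidate for consolidation/removal
        else:
            seen.add(key)
            survivors.append(rec)
    return survivors, duplicates

records = [
    {"NAME1": "Acme Corp", "PSTLZ": "75201", "STRAS": "1 Main St"},
    {"NAME1": "ACME CORP", "PSTLZ": "75201", "STRAS": "1 Main St"},
]
kept, dupes = dedupe(records, ["NAME1", "PSTLZ", "STRAS"])
print(len(kept), "kept,", len(dupes), "duplicates")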

Hope this helps.

Regards,

Ansari

Binoy
Participant

Thanks Mohammad for the quick response.

Would you know how I can accomplish this rule in IS and get the desired output?

I have connected ECC LFA1 (Vendor Master) to IS and created a view by selecting the fields below:

LFA1-KTOKK (Account Group)
LFA1-LAND1 (Country)
LFA1-SORTL (Search Term)
LFA1-NAME1 (Name)
LFA1-MCOD3 (City)
LFA1-STRAS (Street)
LFA1-PSTLZ (Postal Code)
LFA1-TELF1 (Telephone Number)
LFA1-TELFX (Fax Number)
LFA1-STCD1 (Tax Number 1)
LFA1-STCD2 (Tax Number 2)

If any two vendors have identical records (given the combination of all these fields), they are termed duplicates and need to be extracted to the output file.

Also, once we have a set of duplicate records, say 18,000, how can I extract them for the business to validate? Do I need a temporary database? I believe IS can extract only 25 or 500 records at a time.
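For illustration, the criterion I have in mind corresponds to logic like this Python sketch, reading from a hypothetical staging extract of LFA1 (the file names are made up; this only describes the criterion, not a proposed implementation):

# Sketch: flag LFA1 records as duplicates when the combination of the
# listed fields is identical, and write every member of a duplicate
# group to a file for business validation. The input CSV
# ("lfa1_extract.csv") is a hypothetical staging extract, not an IS artifact.
import csv
from collections import defaultdict

KEY_FIELDS = ["KTOKK", "LAND1", "SORTL", "NAME1", "MCOD3", "STRAS",
              "PSTLZ", "TELF1", "TELFX", "STCD1", "STCD2"]

groups = defaultdict(list)
with open("lfa1_extract.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        key = tuple((row.get(fld) or "").strip().upper() for fld in KEY_FIELDS)
        groups[key].append(row)

with open("duplicates_for_review.csv", "w", newline="", encoding="utf-8") as out:
    writer = None
    for rows in groups.values():
        if len(rows) > 1:                      # a duplicate group
            for row in rows:
                if writer is None:
                    writer = csv.DictWriter(out, fieldnames=row.keys())
                    writer.writeheader()
                writer.writerow(row)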

Regards,

Binoy

Former Member

Yes. You run your 'Calculate Score' and write these tables to the database specified in the 'Save all Failed to' box. This writes all the failed records, and you can export them to files.

-Chaitanya

former_member186897
Contributor

Binoy Thomas,


You can read through this blog to set up your database to hold all the records that failed rules. This is the way to overcome the limitation of viewing only 500 records at a time in IS.


Binoy
Participant

Thank you. The link is exactly what I was looking for in IS 4.2.

Can you also respond to my "Rule" question? Should I just profile the data for uniqueness to identify duplicates?

former_member186897
Contributor

Binoy Thomas


For the duplicate check I would recommend the following:

1) Perform Uniqueness advanced profiling on the table for the columns you have listed. This will generate a Venn diagram with the total failed/passed records and percentages, so it is easy for the business to quickly understand the quality of the vendor data.

2) To show them the actual records, you need to write a rule to check for duplicates. If you are using IS 4.2, there is a built-in function to check duplicates: go to the advanced rule editor and refer to the screenshot below for where to find the Is_Unique function. Just pass all the fields on which you want to perform the uniqueness check, and it will produce all the failed records that you see in advanced profiling (a conceptual sketch follows this list).

3) Once you are done with the rule, set up the scorecard, calculate the score, and push the failed records to the failed-records DB for further analysis.
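Conceptually, such a rule fails every record whose combination of bound fields occurs more than once. A minimal Python illustration of that pass/fail semantics (this shows only the concept, not how IS implements the built-in function):

# Conceptual pass/fail semantics of a uniqueness rule: a record fails
# when its combination of bound fields occurs more than once.
from collections import Counter

def uniqueness_flags(records, fields):
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    return [counts[k] == 1 for k in keys]   # True = passed, False = failed

records = [
    {"NAME1": "Acme", "PSTLZ": "75201"},
    {"NAME1": "Acme", "PSTLZ": "75201"},
    {"NAME1": "Apex", "PSTLZ": "10001"},
]
print(uniqueness_flags(records, ["NAME1", "PSTLZ"]))  # [False, False, True]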

Hope this helps.

Regards,

Ansari

Binoy
Participant

Hi Ansari, I just tried it and found out that we are on IS 4.1 😞 I don't have the IS_Unique function, which is part of 4.2.

Any suggestions ?

Former Member

Binoy,

Looking at the fields you are including for de-duplication, you need to consider the following scenarios:

1) For Name:

   Vendor 1: Name is Joes Garage and remaining field values are A,B,C,D,E.

   Vendor 2: Name is Joe Garage and remaining field values are A,B,C,D,E.

These are potential duplicates but doing a simple string compare will show them as unique.

2) For Street:

   Vendor 1: Street is 123 Main Street and remaining field values are A,B,C,D,E.

   Vendor 2: Street is 123 Main St. and remaining field values are A,B,C,D,E.

These are potential duplicates but doing a simple string compare will show them as unique.

You may need to consider address cleansing and matching based on the type of data you are dealing with.
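To illustrate why exact comparison misses these cases, here is a hedged Python sketch using light normalization plus a similarity ratio (the abbreviation table and threshold are made up; address cleansing and matching in DS are far more thorough):

# Why exact string compare misses near-duplicates, and how light
# normalization plus a similarity score can catch them. Illustrative
# only; real address cleansing/matching is much more robust.
from difflib import SequenceMatcher

ABBREVIATIONS = {"street": "st", "st.": "st", "avenue": "ave", "ave.": "ave"}

def normalize(text):
    words = text.lower().split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

def similar(a, b, threshold=0.85):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

print("123 Main Street" == "123 Main St.")          # False: exact compare misses it
print(similar("123 Main Street", "123 Main St."))   # True: potential duplicate
print(similar("Joes Garage", "Joe Garage"))         # True: potential duplicate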

-Chaitanya

former_member186897
Contributor

Binoy Thomas

I would suggest using Data Services to identify duplicates in this case, as it would be much easier to implement than in IS.

You can use the is_group_changed() function to identify the duplicates. Refer to the document below.

is_group_changed() function - Enterprise Information Management - SCN Wiki
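The idea behind is_group_changed() is: sort the records on the match fields, and then any row whose group key equals the previous row's key is a duplicate of it. A hedged Python emulation of that logic (not the DS function itself, which runs inside a transform):

# Emulation of the is_group_changed() idea: after sorting on the match
# fields, a row whose group key equals the previous row's key is a
# duplicate. Illustrative only.
def flag_duplicates(records, group_fields):
    def group_key(r):
        return tuple((r.get(f) or "").strip().upper() for f in group_fields)
    prev = None
    flagged = []
    for rec in sorted(records, key=group_key):
        key = group_key(rec)
        # Group did NOT change => same key as previous row => duplicate.
        flagged.append({**rec, "IS_DUPLICATE": key == prev})
        prev = key
    return flagged

rows = [
    {"NAME1": "Acme", "PSTLZ": "75201"},
    {"NAME1": "acme", "PSTLZ": "75201"},
    {"NAME1": "Apex", "PSTLZ": "10001"},
]
for r in flag_duplicates(rows, ["NAME1", "PSTLZ"]):
    print(r)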

Binoy
Participant

Hi Mohammad, Chaitanya,

Unfortunately, we are not planning to use Data Services for data quality at the moment. Our focus is just on Information Steward 4.1 to start with.

Would you know if Information Steward 4.1 has any "out of the box" solution to perform duplicate checks? If not, what can I build in IS 4.1 to showcase a business case?

I am looking for industry-specific vendor/supplier master data KPIs to build validation rules against. Does SAP recommend any must-have KPIs for vendor/supplier master data?

This forum has been so much help; many thanks for all your responses so far.

Regards,

Binoy

Answers (0)