
SAP Information Steward


After configuring email notification, you may still not receive emails for failed or passed scheduled jobs. The failed scheduled job's error log shows the message below, with no indication of an actual error; the cause is an incorrect configuration. See the screenshot below for the correct configuration.

Error Log.PNG

Check the configuration to ensure the correct Host and Domain are in place. This applies only to the SAP Information Steward internal box.

 

Configuration.png

 

Hope this helps. Any input is greatly appreciated.

 

Regards,
Veronock

Hello Everyone,

 

Last week I was searching for how to export Users and Groups in IS but unfortunately could not find the answer, so I created a thread where one of the members answered it. I believe that sharing this solution will help other developers avoid the same struggle in the future. Hope you will find it useful.

 

So, here is the step-by-step procedure to export and import Users and Groups in Information Steward:

 

Scenario:

Suppose you want to promote Users and Groups from the DEV environment to the Production environment. You don't want to re-create everything in the new environment and would rather reuse the Users and Groups defined in DEV.

 

Step 1: Log in to the Central Management Console of the DEV environment from which you want to export custom Users and Groups and go to the Promotion Management tab as shown below.

 

1.png

Step 2: Right-click on the Promotion Jobs folder and select New Job as shown below.

3.png

 

Step 3: Name the job (e.g., Test_Promotion_Job) and then select 'Login to a New CMS' as the source, as shown below. You will need to enter the credentials of your DEV CMC: select the source system and enter your user ID and password.

 

4.png

 

Step 4: You can see a green check adjacent to Source Name if you entered the credentials correctly.

 

Then select 'Output to LCMBIAR File' as the Destination, as shown below, and click the Create button.

6.png


Step 5: All the objects that can be promoted are loaded. Go to User Groups as shown below and check all the Groups you want to promote to the Production environment, as highlighted below. Then click the 'Add and Close' button.

 

5.png

 

Step 6: The User Groups are now ready to promote. Click the Promote button as shown below.

9.png

Step 7: Select the source system and enter the CMS credentials. Then select 'Output to LCMBIAR file', click the Export button as shown below, and save the LCMBIAR file to your local machine. You have now successfully exported the User Groups.

 

6.png

Step 8: Now log in to the Central Management Console of the Production environment into which you want to import these User Groups, and click the Promotion Management tab.

1.png

 

Step 9: Right-click on the Promotion Jobs folder and then select Import file as shown below. Select the LCMBIAR file that you saved in the earlier step from your local machine.

11.png

 

Step 10: In the destination, select 'Login to New CMS' and provide your system name, user, and password. Then click the Create button as shown below.

12.png

Step 11: You are now ready to promote the User Groups from DEV to Production. Click the Promote button as shown below.

13.png

 

That's it. You can go to the CMC home and click on Users and Groups to verify that everything was successfully promoted to the new environment.

 

This is how you can export existing User Groups from one environment and import them into a new environment.

 

 

Regards,

Ansari MS

The Failed Data Repository

If you are looking to build some custom reports on the results of your data quality assessment - beyond what is available with the Information Steward's Data Quality Dashboards - you can leverage the Failed Data Repository as the database to meet your custom reporting needs.

The Failed Data Repository provides you information about failed data from your validation rules within a supported relational database system.  Information includes:

  • Information about the project, connection and tables which generated the failed data (IS_FD_TABLES_MAP table)
  • Execution history of all the tasks which generated the failed data (IS_FD_RUN_HISTORY table)
  • All failed rules for a given run (IS_FD_RULE_INFO table)
  • All the rows that failed one or more rules (<table_alias>_FD table)
  • All the rules which failed a given row (<table_alias>_FR table)

For detailed information about the above referenced tables, see the section on "Accessing additional failed data information" in the Information Steward User Guide.  The diagram below shows the relationships between the failed data tables.

As an example, the total number of rows validated during a run is available in the IS_FD_RUN_HISTORY table (TOTAL_ROWS column) for each IS task.  And you can join the *_FD tables to get at the failed data counts per rule/task.
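As a hedged sketch of such a query from Python over ODBC: the table names come from the list above, but the DSN, the CUSTOMER table alias, and the TASK_NAME, RUN_ID and RULE_ID column names are assumptions, so verify them against your own Failed Data Repository before relying on the output.

    import pyodbc  # assumes an ODBC DSN pointing at the Failed Data Repository database

    # Hypothetical DSN and credentials.
    conn = pyodbc.connect("DSN=IS_FAILED_DATA;UID=report_user;PWD=secret")

    # Failed-row counts per task and rule for a hypothetical "CUSTOMER" table alias,
    # joining the run history, the per-row rule hits (<table_alias>_FR) and the rule info.
    sql = """
    SELECT  h.TASK_NAME,
            r.RULE_NAME,
            COUNT(*)          AS failed_rows,
            MAX(h.TOTAL_ROWS) AS total_rows
    FROM    IS_FD_RUN_HISTORY h
    JOIN    CUSTOMER_FR fr ON fr.RUN_ID  = h.RUN_ID
    JOIN    IS_FD_RULE_INFO r ON r.RULE_ID = fr.RULE_ID
    GROUP BY h.TASK_NAME, r.RULE_NAME
    """

    for task, rule, failed, total in conn.cursor().execute(sql):
        print(f"{task} / {rule}: {failed} of {total} rows failed")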

Setting Up the Failed Data Repository

To leverage a Failed Data Repository for custom reporting, you must first establish a connection to the database within the Central Management Console (CMC).  Ensure that the connection type is "For data that failed rules."  The image below shows an example of the connection parameters, many of which will change depending on the Database Type selected.  For the most current listing of supported databases, please check out the SAP Information Steward PAM.

Specifying the Failed Data Repository

When executing a rule or set of rules (per task), you can select to save the failed data to one of the Failed Data Repositories that you have previously configured.

Viewing Failed Data from Information Steward

This is the Data Quality Scorecard Detailed View; from here you can view the failed data:

This gives you the Failed Data screen.  Once you have set up the Failed Data Repository, this will give you access to "View More Failed Data" to get beyond the 500 record sample data size.

The Information Steward Repository Views

Although the Information Steward Repository is not a supported means to extract data for custom reporting, here are a few Information Steward Repository views that may contain some additional information to meet your needs (a hedged query sketch follows the list):

  • MMB_Key_Data_Domain
  • MMB_Key_Data_Domain_Score
  • MMB_Key_Data_Domain_Score_Type
    • Key Data Domain, Quality Dimension, or Rule level
  • MMB_Domain_Value
    • Quality Dimension descriptions
  • MMB_Rule
    • Rule definition/description
  • MMB_Data_Group
    • Project Names
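As a hedged sketch of querying these views for score trending: the view names come from the list above, but the column names (DOMAIN_ID, DOMAIN_NAME, SCORE, SCORE_DATE) and the join are assumptions, so check them against the view definitions in your repository first.

    import pyodbc  # assumes ODBC access to the Information Steward repository database

    # Hypothetical DSN; the column names below are guesses and must be verified.
    conn = pyodbc.connect("DSN=IS_REPOSITORY;UID=report_user;PWD=secret")

    sql = """
    SELECT  d.DOMAIN_NAME,
            s.SCORE_DATE,
            s.SCORE
    FROM    MMB_Key_Data_Domain d
    JOIN    MMB_Key_Data_Domain_Score s ON s.DOMAIN_ID = d.DOMAIN_ID
    ORDER BY d.DOMAIN_NAME, s.SCORE_DATE
    """

    for domain, score_date, score in conn.cursor().execute(sql):
        print(f"{domain} {score_date}: {score}")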

Related Content

How to Create Detailed Failed Data Reports as Part of your Data Quality Analysis

  • How to use SAP BusinessObjects Information Steward, along with the SAP Business Intelligence platform components SAP BusinessObjects Information Design Tool and SAP BusinessObjects BI launch pad, to produce a Web Intelligence report that will help you analyze the quality of your data.

SAP Information Steward Failed Data Reporting

SAP Information Steward:  DQ Reporting of the SAP HANA Live Layer with Lumira Demo

In this blog, we will explore the Business Value Analysis capability of the SAP Information Steward 4.2 release. 

 

Poor data quality can affect businesses in many ways. The impact can be felt in finances, customer perception, operational efficiency, brand recognition, regulatory compliance, and so on.  Here we see some examples of issues that arise from bad data and how they affect various day-to-day aspects of the business.


 

  • Difficult to determine the right recipients for marketing campaigns. Impact: Operational Efficiency.
  • Inaccurate order information causes delayed or lost shipments and lower customer satisfaction. Impact: Financial, Customer Satisfaction.
  • Sales representatives are not able to identify relevant accounts. Impact: Operational Efficiency.
  • Costs are high due to account duplication, while response rates are low. Impact: Financial, Operational Efficiency, Customer Acquisition.
  • Potential customers are annoyed by redundant mail, emails, and phone calls. Impact: Customer Satisfaction.
  • Total revenue and profitability of products and services is reduced. Impact: Financial.
  • Reporting uses wrong data, which leads to wrong conclusions and decisions. Impact: Financial, Operational Efficiency, Customer Satisfaction.
  • Inaccurate statutory reporting. Impact: Legal.
  • Carrier stop charges for incorrect or incomplete addresses. Impact: Financial, Customer Satisfaction.
  • Misalignment between vendors and defined terms due to system inaccuracies. Impact: Financial.
  • Poor spend visibility due to unstandardized, duplicate data. Impact: Operational Efficiency.
  • Unable to find the right product / material due to unstandardized, duplicate data. Impact: Financial, Operational Efficiency.
  • Items are purchased off contract at premium prices due to poor quality supplier data. Impact: Financial, Operational Efficiency.


And, there can be many more such issues that arise from specific industries and business processes. For any organization, it is important to understand and quantify this impact. By assigning a dollar amount to poor data quality, the business awareness of the downstream and bottom line impact of bad data is increased.  It puts value on clean, accurate data and can be used to justify additional funding of your information governance initiatives.  Sure, an organization may know in theory there is a cost associated with bad data, but to be able to put actual numbers behind it - this can really give the information governance cause the credibility that it needs.

SAP Information Steward's Business Value Analysis enables business-oriented data stewards (or data stewards in collaboration with LOB representatives) to connect financial ROI to the organization's data quality and information governance initiatives.

BVA.png

Information Steward's Business Value Analysis feature allows the organization to see the overall trend in the cost of poor data at various levels for root cause analysis. The business can also perform 'what if' analysis to identify potential savings / losses if they clean the bad data and accordingly focus their data quality / information governance efforts in the areas that will benefit them most.


BVA_2.png

With the Information Steward 4.2 release, the process of validation rule development now includes the ability to define an itemized cost per failure. So as you add a new set of business rules, the financial impact associated with the data that failed against each rule is immediately taken into account within Data Quality Business Value Analysis.

There are two types of costs that can be considered when you calculate the cost per failure. Some costs are incurred in terms of a human resource spending time on addressing the issue or performing root cause analysis. These are called resource-dependent costs. Then there are costs that are resource-independent in nature.  Here you can see a few examples of different cost types:

Resource-independent costs

  • Cash flow: Additional costs incurred to the organization’s cash flow, such as delays in recognizing revenue or making supplier payments
  • Fees: Costs associated with any additional expenses or direct fees, such as those resulting from regulatory compliance failures
  • Fixed overhead: The fixed overhead cost distributed per failure, such as storage costs due to returned shipments
  • Revenue: Loss of direct revenue, such as lost customers or new sales
  • Sales: Additional costs associated with selling goods, such as sales organizations following erroneous leads
  • Other: User-supplied costs

Resource-dependent costs

  • Labor: Costs associated with the loss of productivity; for example, a resource or person will have to spend a specified amount of time to capture and remediate quality issues
  • Other: Costs that are not covered by the existing types

This is by no means an exhaustive list. The idea is to provoke thinking about such costs when trying to understand impact.
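To make the calculation concrete, here is a minimal sketch of how the two cost types roll up into a cost per failure and a total cost for a rule. The failed-row count, minutes per failure, hourly rate and fixed amount are invented figures, and the arithmetic is a simplification of what Business Value Analysis computes for you.

    # Hypothetical figures for one validation rule; replace with your own estimates.
    FAILED_ROWS = 677                        # failed rows reported for the rule/task

    # Resource-dependent cost: a person spends time remediating each failure.
    MINUTES_PER_FAILURE = 6
    HOURLY_LABOR_RATE = 45.00                # currency units per hour

    # Resource-independent costs per failure (fees, fixed overhead, lost revenue, ...).
    RESOURCE_INDEPENDENT_PER_FAILURE = 1.25

    labor_cost_per_failure = (MINUTES_PER_FAILURE / 60) * HOURLY_LABOR_RATE
    cost_per_failure = labor_cost_per_failure + RESOURCE_INDEPENDENT_PER_FAILURE
    total_cost = FAILED_ROWS * cost_per_failure

    print(f"Cost per failure: {cost_per_failure:.2f}")
    print(f"Estimated cost of this rule's failures: {total_cost:.2f}")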


If you would like to find out more about SAP Information Steward's Business Value Analysis feature, here are some additional resources available to you:

M2.png

Information Steward's Metapedia promotes proactive data governance with common understanding and agreement on business taxonomies related to business data. With Metapedia, users have a central location for defining standard business vocabulary of words, phrases, or business concepts and relating that terminology to metadata objects (e.g. systems, tables, reports) as well as Data Insight quality rules and scorecards. Once the business glossary is established, business users can easily search the content and concepts using familiar business vocabulary and synonyms as well as access related terms from BI reports, data quality scorecards, … well, from really just about anywhere (see Exploring and Accessing Metapedia terms).

 

So, why might you want to start a Metapedia initiative?  Let’s explore some use cases.

 

Business Intelligence


Think about the value of giving your BI consumers – especially if those consumers are across multiple lines of business – access to a standard definition of the data used to drive reports and dashboards.  For example, let’s say a new HR report is created that displays information about Org Divisions, Dept Units, Career Grades and Levels, Local and Global Titles, etc. Wouldn't it be great if consumers, especially outside of the Human Resources team, could gain access to the accepted and HR-approved understanding of each of these concepts represented on the report?

M6.png

 

Data Migration


Looking outside of the reporting world, how about a data migration project that involves the consolidation of two different companies’ data assets?  As you sit down and map common data elements during a data migration project, what a great asset to be able to capture the common understanding of that data as standardized business terminology.  Then, link the agreed upon definition back to the disparate data sources while maintaining associations to each company’s unique lingo through synonyms and keywords (see Metapedia Categorization, Association and Custom Attributes) to make searching more efficient for both entities.

 

Information Governance


Some organizations have used Metapedia to capture additional information about their quality rules, documenting a more robust rule definition and business impact in a manner that is easily accessible by rule owners.  Check out the following example presented during a recent ASUG presentation.

M4.png

M5.png


Beyond Business Terms to Business Concepts


Metapedia is not just for capturing the definition and categorization of words and phrases. Metapedia can be extended to meet the needs of almost any business concept.  For example, business measures and metrics can be managed in Information Steward Metapedia, leveraging custom attributes to store unique characteristics such as the actual calculation equation, the business process where used and the system of record the calculation is derived from.  Think about measures such as “Quarterly Sales Growth” and “Operating Income by Division” as well as metrics such as “On-time Delivery” and “Time-to-Develop” and creating a central repository of not only common definitions of those measures and metrics, but also the attributes that explain how those measures and metrics are calculated.  Then, link those concepts back to the metadata objects involved in the calculation as well as the reports that display results.

 

<Insert Your Use Case Here>

We would love to hear your ideas and use cases for where Metapedia does or could provide value as a central glossary of terms, business content or concepts.

 

Please share via the comments.

A new Information Steward FAQ page has been added to the Information Steward wiki.  Here is a look at the initial questions that have been posted and answered.

 

Please bookmark the page and the wiki for future reference as questions arise, and contribute additional answers to commonly asked questions.

 

 

 


SAP Information Steward and SAP Data Services and Data Quality are indeed inseparable and complementary solutions.  In this blog post, we are going to cover both an internal and external view as to why.  We will explore use cases, features and architecture that make these two solutions the very best of friends.

 

Typical Use Case Scenarios

Some of the typical use case scenarios where Information Steward and Data Services work together to provide a solution include ETL/Data Warehousing, Data Quality Management, Enterprise Master Data Management, IT/System Maintenance, Business Intelligence/Reporting and Data Migration.  The table below contains some examples of how the products fulfill use case requirements.

 

  • ETL / Data Warehousing. Information Steward: analyze source and target data to help with mappings and transformations. Data Services: create the data flows; extract, consolidate/transform and load.
  • Data Quality Management. Information Steward: initial insight into data content to understand cleansing requirements. Data Services: perform cleansing and matching, in batch and real time.
  • Enterprise Master Data Management. Information Steward: initial insight and continuous monitoring of master data quality. Data Services: cleanse, consolidate and load the master data repository.
  • IT / System Maintenance. Information Steward: understand quality, impact and lineage of data (where is data used up/downstream?). Data Services: movement of data for system upgrades.
  • Business Intelligence / Reporting. Information Steward: understand quality, impact and lineage of data (is data fit for use?). Data Services: populate the business warehouse for reporting.
  • Data Migration. Information Steward: identify different data representations and quality across systems. Data Services: migrate data into the new system, merge acquired data.

 

Let's focus on two use case scenarios in particular, data warehousing and data migration.  For data warehousing, Information Steward is going to support you in analyzing your source data to understand what content is available at the source as well as the quality of that data.  Profiling results such as word, value and pattern distributions will help you understand the need for mapping tables, or perhaps standardization of the data during the ETL process.  In addition, advanced profiling can help you identify referential integrity problems.  For example, Information Steward could highlight the fact that the ORDER_DETAIL table contains part IDs that do not exist in the PARTS table (a hedged sketch of such a check follows the list below).  With a data migration project, let’s say one that arose as part of an acquisition, Information Steward will help you gain familiarity with the newly acquired source system through data profiling, helping you to understand:

  • Is the content in the newly acquired source system similar in format, structure or type to your corporate system(s)?
  • Again, is there a need for mapping tables or data standardization as part of the data migration process?
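As a hedged sketch of the referential-integrity check mentioned above, expressed as a hand-written query: the DSN and column names are illustrative, and Information Steward's advanced profiling surfaces this kind of finding without any SQL.

    import pyodbc  # assumes ODBC access to the source system being profiled

    conn = pyodbc.connect("DSN=SOURCE_SYSTEM;UID=profiler;PWD=secret")  # hypothetical DSN

    # Orphaned part IDs: ORDER_DETAIL rows whose PART_ID has no match in PARTS.
    sql = """
    SELECT od.PART_ID, COUNT(*) AS orphan_rows
    FROM   ORDER_DETAIL od
    LEFT JOIN PARTS p ON p.PART_ID = od.PART_ID
    WHERE  p.PART_ID IS NULL
    GROUP BY od.PART_ID
    """

    for part_id, orphan_rows in conn.cursor().execute(sql):
        print(f"Part {part_id}: {orphan_rows} ORDER_DETAIL rows with no matching PARTS row")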

 

You can also perform a data assessment by running the new source system against your already established data standards/quality rules within Information Steward. If cleansing needs to occur on the source system due to poor quality, the Data Quality Advisor and Cleansing Package Builder can support you to quickly and easily develop the needed cleansing and matching rules.  If there are duplicate customer or product records found across systems, those records (or a portion of those records) can be manually reviewed with Information Steward’s Match Review feature.

 

In both use case scenarios, Data Services is going to provide you the broad connectivity to databases, applications, legacy systems, and file formats that is needed to support your requirements for data extraction and loading.  Then, based on the results of the data profiling and assessment, Data Services can be used to transform the data from multiple sources to meet a common data warehouse or system schema.  Data Services can additionally be used to cleanse the newly acquired data to meet the quality standards your organization has in place.  De-duplication can be performed when redundancy needs to be eliminated while bringing together multiple sources of similar data.  And, in the case of that data warehouse, Data Services provides you the means to capture changes in order to perform delta loads on a regular basis.

 

Complementary Features

When we focus specifically on Data Quality, Data Services and Information Steward are complementary solutions.  Below is what we like to call the "Data Quality Wheel".

IS-DS.png

 

To start the process, you are assessing the data to identify issues and determine overall health.  And, on the back end, monitoring is in place to keep an eye on the ongoing health of the data.  This is where SAP Information Steward provides your solution.  Information Steward provides the platform for the business-oriented user to gain the necessary insight and visibility into the trustworthiness of their data, allowing them to understand the root cause of poor data quality as well as recognize errors, inconsistencies and omissions across data sources.

 

The next step in the process takes action with SAP Data Services Data Quality capabilities to guarantee clean and accurate data by automatically cleansing your data based on reference data and data cleansing rules, enhancing your data with additional attributes and information, finding duplicates and relationships in the data, and merging your duplicate or related records into one consolidated, best record.  Data Services enables the technical user with broad data access and the ability to transform, improve, and enrich that data.  This process can occur in a batch mode or as a real-time point of entry quality check against business requirements.

 

And, there is some overlap.  Information Steward additionally gives that business-oriented user the tools to help with improvement efforts – with intuitive interfaces for developing data quality rules as well as cleansing rules that work within a Data Services ETL data flow to improve enterprise data quality.  Information Steward also supports those business users in manually reviewing the results of the match and consolidation process, to spot check and validate duplicates that are flagged in Data Services as low confidence matches.

 

Let's look specifically at some of the product features that support the concept of sharing, the type of sharing that we would expect with best friends.

 

Sharing Validation Rules

IS-DS-2.png

 

Validation or quality rules defined in Information Steward to assess and monitor the quality of your information assets can additionally be published to Data Services to be included as part of a batch or real-time data flow to perform the same quality or consistency checks during various ETL activities.

 

Sharing Cleansing Rules

IS-DS-3.png

Information Steward’s Cleansing Package Builder empowers data stewards and data analysts to develop custom data cleansing solutions for any data domain using an intuitive, drag-and-drop interface.  Cleansing Package Builder allows users to create parsing and standardization rules according to their business needs and visualize the impact of these rules on their data (as rules are being developed and as changes are being made).  Once the data analyst has developed the custom data cleansing solution, the Cleansing Package is published to Data Services for use with the Data Cleanse transform to parse and standardize incoming data into discrete data components to meet the defined business and target system requirements.

 

IS-DS-4.png

Information Steward's Data Quality Advisor guides data stewards to rapidly develop a solution to measure and improve the quality of their information assets. This is done with built-in intelligence to analyze and assess the data and make a recommendation on a cleansing solution, with the simplicity to allow a Data Steward to further review and tune the rules to get even better results.  When satisfied with the rules and results, the data steward can publish the data cleansing configuration to Data Services, allowing the IT developer to use, or consume, that solution within the context of a larger production data set and ETL data flow.

 

Reviewing Match Results

IS-DS-5.png

Information Steward’s Match Review is a critical step within the overall “matching” process. While the Match transform in Data Services provides a very comprehensive set of mechanisms to automatically match and group duplicate records, matching still remains a combination of art and science. While you can get close to accurate matching using the Data Quality Advisor (in Information Steward) and the Match transform (in Data Services), there may be results (a gray area) that would benefit from additional, manual review.  Information Steward provides a business user-centric interface to review suspect or low confidence match groups that consist of duplicate or potentially duplicate records.  Based on their domain expertise, users can confirm the results of the automated matching process or make changes, such as identifying non-matching records.  In addition, business users can then review and pick and choose fields from different records within the match group to fine-tune that pre-configured, consolidated best record.  The review results are available in the staging area. You can configure whether the results should be made available at the completion of the review task or incrementally as each match group is processed. The downstream job or process can read the results from the staging repository and integrate them into the target system.

 

Discovering Data Services Metadata

Information Steward’s Metadata Management capabilities discover and consolidate metadata from various sources into a central metadata repository, allowing users to manage metadata from various data sources, data integration technologies, and Business Intelligence systems, to understand where the data used in applications comes from and how it is transformed, and to assess the impact of change from the source to the target, reports or applications.  SAP Data Services objects show up under the Data Integration category of Information Steward’s Metadata Management.  Information Steward has a native Metadata Integrator that can discover a vast array of metadata objects, including projects, jobs, work flows, data flows, data stores (source and target information), custom functions, table and column instances, etc., as well as understand the relationship between these metadata objects and up- and downstream systems.

 

IS-DS-6.png

Performing data lineage analysis in Information Steward, we can see how data from the source has been extracted, transformed and loaded into the target using Data Services.  In this example, you can drill in to determine how LOS, or length of stay, was calculated and what source fields ultimately make up the patient name.

 

An Architectural Perspective

IS-DS-7.png

Architecturally, Information Steward and Data Services are inseparable in that Information Steward relies on Data Services.  They also have a lot in common: they both leverage Information Platform Services (IPS).  Information Steward and Data Services both rely on CMS services for centralized user and group management, security, administrative housekeeping, RFC Server hosting and services for integrating with other SAP BusinessObjects software (i.e. BI Launch Pad).  A dedicated EIM Adaptive Processing Server is shared between Data Services and Information Steward.  Services deployed to the EIM Adaptive Processing Server include:

  • RFC Server - Used for BW loading and reading via Open Hub
  • The View Data and Metadata Browsing Services - Provides connectivity to browse metadata (show tables and columns) and view data
  • Administrator Service - Used for cleaning up the log files and history based on log retention period

 

In terms of being inseparable, the Data Services Job Server is required for the Information Steward Job Server to work.  With this need also comes benefit, as Information Steward is able to leverage the great capabilities that Data Services has to offer.  For example, Information Steward scales by leveraging Data Services' ability to distribute workload across servers as well as across CPUs.  In addition, Information Steward leverages Data Services for direct access to a broad range of source connectivity, including direct access to SAP sources like SAP ECC.  Information Steward leverages Data Services as its core engine not only to access data but also to execute profiling and validation rules against that data.  With that being said, Data Services' and Information Steward's source connectivity capabilities closely mirror each other, where it makes sense and where there are no technical limitations in doing so.

 

So, what do you say?  SAP Information Steward and Data Services: inseparable, complementary, best friends...?  How about, they complete each other?  In any event, what a match!

Hi Team,

 

I'm getting an error when I view data on an IS view. This error occurs in the QA and PROD environments but everything is fine in DEV.

 

Please see screenshot attached.

 

Please assist.

 

Thank you.

Roli

In this blog, we will explore how Metapedia and the native Metadata Integrators bundled with Information Steward can support you in managing your SAP landscape’s metadata.

In general, Information Steward Metadata Management collects metadata information from your enterprise systems, information such as:

  • Attributes (name, size and data type)
  • Structures (length, fields and columns)
  • Properties (where it is located, how it is associated, and who owns it)
  • Descriptive information (about the context, quality and condition, or characters of the data)

 

Information Steward then organizes metadata objects to allow you to:

  • Browse, search and explore the metadata
  • Understand relationships between different objects within the same source and across different sources
  • Customize metadata objects and relationships with annotations and custom attributes and relationships

 

 

 

SAP BusinessObjects Enterprise and Some Basics on Data Impact/Lineage and Metapedia

BOE.png

In support of your SAP BusinessObjects Enterprise environment, the Metadata Management module of SAP Information Steward can discover metadata about universes, reports (including Crystal Reports, Web and Desktop Intelligence documents), dashboards, CMS folders and systems, server instances, application users and user groups, and SAP BW systems.  And, for each object, there is additional associated metadata.  For example, for a universe, the associated metadata may include queries, universe classes and connections, objects and filters.  For reports, metadata could include universe objects, InfoObjects, queries, report fields, columns, variables, SQL expression fields, formula fields and running total fields.

 

 

BOE2.png

Impact analysis will allow you to view the objects that are affected by data within a particular object.  For example, the rather simple impact diagram above shows that the Calculate Totals report field impacts the Charts and Dials report.  When you hover your mouse over an element in the impact diagram – in this case the Calculate Totals field – additional information about that metadata object appears.

 

 

BOE3.png

The impact analysis for a universe object lists the objects that are affected by the data within the universe object.  In this example, you can see that two reports are affected by the data in the “Revenue Amt Func” universe object: InvoiceSummary and Revenue.  You can also take note of the report consumers, both users and user groups, that have permissions to access these particular reports.  And why is this important?  Simple.  It answers the question: what is the downstream impact if I change this universe or that query?  And this includes not only what it impacts, but also who and how many.

 

 

BOE4.png

Data Lineage enables you to view a list of sources from which an object obtains its data.  In this example, the report lineage diagram shows that the universe objects in the BOE Business Intelligence system came from tables that were loaded by the DS_Repo_RapidMarts Data Services data integration job.  The dashed lines between each column name in the BOE Business Intelligence and DS_Repo_RapidMarts systems indicate that the columns are the same (Same As relationship). And you could explore the data lineage further to see the source database or business warehouse of the Data Services data flow.

 

 

BOE5.png

Adding to the potential for insight, SAP Information Steward Metadata Management information can be accessed directly from within BI Launch Pad to view the data lineage of a Crystal report or Web Intelligence document, enabling direct access for report developers and consumers to understand where the data is coming from and how that data is being transformed.

 

 

BOE6.png

And, not only can you help report developers and consumers to understand where the data is coming from, you can additionally instill a degree of trust in that data by allowing them to see how good the data really is. The lineage information provided via the BOE and Information Steward integration includes and highlights the quality scores of the specific data assets and allows the user to drill into those scores to see the details as to the data quality rules, failed data as well as profiling results, if these rules and results are available.

 

 

BOE8.png

Metapedia terms can also be associated with report metadata objects.

 

 

BOE7.png

And again, a link within the BI Launch Pad allows BI users to access the business terms that have been associated with a particular Crystal Report or Web Intelligence document directly from the BI Launch Pad.  This promotes a common understanding of business concepts and terminology through Metapedia as your central location for defining standard business vocabulary (words, phrases, or business concepts).  So, why might you want to start a Metapedia initiative?  Well, think of your report/dashboard consumers, especially if those consumers are across multiple lines of business.  For example, let’s say Human Resources has created a new report that displays information about Org. Units, Dept. Units, Functional Area, Career Grades and Levels.  Wouldn't it be great if consumers outside of the HR team could gain access to the accepted and "HR-approved" understanding of each of the concepts represented on the report?  Or, looking outside of the reporting world, think about a data migration project to bring two companies' data assets together.  As you sit down together and map common data elements, what a great asset to be able to capture the common understanding of the data – data that may be technically named differently – in business terminology and link that definition back to the disparate sources.

 

 

BOE9.png

With the data migration example, what if you want to expose your central repository of business terms to additional applications or locations to promote a common understanding across the two newly joined companies?  Good news!  Metapedia content can also be accessed via web services, which include APIs that support searching the Metapedia repository terms, descriptions, authors, synonyms, categories, etc.  Above is an example MS Word plugin created using the Metapedia WebService API (see the Information Steward Developer's Guide for more information).
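As a very rough sketch of calling such a web service from your own tooling: the WSDL location and the searchTerms operation below are hypothetical placeholders rather than the documented API, so consult the Information Steward Developer's Guide for the actual Metapedia web service definition.

    # Hypothetical sketch only: the WSDL URL, operation name and result fields are placeholders.
    from zeep import Client  # generic SOAP client; assumes the service exposes a WSDL

    WSDL_URL = "http://is-server:8080/metapedia/services?wsdl"  # placeholder URL

    client = Client(WSDL_URL)

    # Placeholder operation: search the business glossary and print matching terms.
    for term in client.service.searchTerms(keyword="length of stay"):
        print(term.name, "-", term.description)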

 

 

SAP NetWeaver Business Warehouse (SAP NW BW)

BOE10.png

Okay, so we are going to dive back down into the metadata with a look at SAP NetWeaver Business Warehouse (SAP NW BW).  Besides relational databases and data warehouses, BOE universes and reports can also access data from SAP NW BW.  However, SAP NW BW is a “black box” for BOE BI users.  It is not possible for them to see how the data crosses between the BOE BI and SAP NW BW environments.  The Information Steward SAP NW BW metadata integrator removes the barrier between these two environments (BI to BW) by exposing the objects inside the SAP NW BW environment and thus providing the transparency and traceability needed.  This allows questions such as "If I change the definition of a specific SAP NW BW object, what universes or reports are affected?" or "From what SAP NetWeaver BW source does the universe obtain its data?" to be answered.  The SAP NW BW metadata objects and relationships supported by the Information Steward SAP NetWeaver Business Warehouse metadata integrator are displayed in the light-blue boxes in the above diagram.

 

 

 

SAP HANA

The Information Steward HANA metadata integrator has the ability to collect standard relational objects and information models in HANA.  It collects all of the information about service instances, databases, packages, views, including Attribute Views, Analytic Views, and Calculation Views, tables, columns, measures, variables, etc.  It also collects relationships between schemas, tables and columns as well as attributes and measures in your SAP HANA database sources.  And, of course, all of the relationships upstream and downstream of your HANA instance.

 

 

SAP Data Services

DS.png

SAP Data Services objects show up under the Data Integration category of Metadata Management.  Metadata objects applicable for Data Services include projects, jobs, work flows, data flows, datastores (source and target information), custom functions, table and column instances, etc.

 

 

DS2.png

With Data Services data lineage analysis, we can see how data from the source has been extracted, transformed and loaded into the target.

In the example above, you can see the ETL process in action, following the data from report to source.  Specifically, you can drill in to determine how LOS, or length of stay, was calculated and what source fields ultimately make up the PATIENT_NAME.

 

 

DS3.png

While analyzing the lineage of a Data Services integrator source column, you can view the Data Services Auto Documentation report for the associated data flow objects.  The Auto Documentation report allows you to see another view of the ETL process.

 

 

DS4.png

If you click the data flow hyperlink, it will launch the Data Services Management Console and allow you to navigate to the dataflow details.

 

 

SAP PowerDesigner

The Information Steward SAP PowerDesigner integrator is new with the Information Steward 4.2 release.  With this out-of-the-box capability, users have access to all the metadata related to PowerDesigner, thus improving collaboration between Data Modelers and Data Stewards by extending impact and lineage analysis to the design models that are available in PowerDesigner.  Once you have aligned the current state (the operational view) with the architectural view, Data Stewards can then “inform” the Data Modelers where the root of quality concerns comes from, informing architects so that they can address these quality concerns at the source as they design the next generation of business applications.  Data Stewards also have easy access to data quality rules and domains defined as part of these PowerDesigner models, which they can leverage to implement actual validation rules within Information Steward.  In addition, the business terms defined in PowerDesigner can be integrated with Information Steward’s Metapedia so that all the business concepts are captured in a central location.

 

 

PD.png

PowerDesigner metadata is collected for conceptual, logical and physical data models.  In the image above, the left side shows the view in the PowerDesigner client and the right side shows the corresponding objects in Information Steward.  Note that the intent is not to replicate every possible object from PowerDesigner to Information Steward.  Only basic properties of conceptual and logical models are captured, along with the relationships between the conceptual, logical and physical models.  The connecting entity between PowerDesigner design-time metadata and Information Steward operational metadata is the physical model, so that is where the focus is.  Details about basic properties, physical diagrams, business rules, domains, references, tables, and server instances are collected for PowerDesigner.

 

 

PD2.png

The above is an example of how the impact/lineage diagram shows up.  In this example, the database was created using a script generated by PowerDesigner itself.  On the BOE side, there was a universe built on top of that database, which is being used by the report.  On the PowerDesigner side, there were domains and business rules that were associated with a few columns being used by the report fields. Hence, the lineage is shown as report > report fields > universe objects > PowerDesigner columns > domain/rules.

 

 

PD3.png

The Business Glossary in PowerDesigner is very similar to Information Steward Metapedia, so it is very easy to map concepts from one to another.  You can import the content of PowerDesigner's Business Glossary to Metapedia. If the glossary terms were associated with some other objects in PowerDesigner, that association is maintained in Metapedia as well.

 

 

SAP ECC

ECC.png

So, what about the SAP Business Suite?  Well, there is more work to be done here specific to Information Steward's Metadata Management capabilities.  Currently, Information Steward gives you native connectivity to SAP ECC within Data Insight for data profiling and data validation.  This gives you access to browse SAP ECC metadata down to the column level; similar capabilities exist within Data Services.  This also allows you to relate your Metapedia business terms to SAP ECC metadata (we covered this capability, object association, earlier).  However, in terms of Metadata Management and the ability to discover objects and relationships all the way to the SAP Business Suite, this item is currently on the Information Steward roadmap (SMP Roadmaps, under the Database & Technology area).  The goal is to provide complete metadata management for your SAP landscape, from data definition (via PowerDesigner) all the way to your operational systems and business processes (SAP ECC).  Watch for more great capabilities to come with Information Steward's Metadata Management!

Hi Team,

 

How do I display a change log in SAP IS or if something has been removed/deleted, how do I recover it?

 

Example, I created a View and next thing it has been removed/deleted.

 

Please help.

 

Thank you.

 

Roli

Determining Duplicates and a Matching Strategy

Information Steward’s Data Cleansing Advisor simplifies the entire data cleansing process by intelligently recommending rules for cleansing and matching based upon SAP best practices. The recommended matching strategy can also be customized to further define how relationships are found within the input source and to meet your business requirements.  This section will focus on how to customize the recommended match strategy to get different results from the same input source.

 

 

The image below shows a test data source in Information Steward that has common party data type entities.  The source has gone through content type identification (a new feature in Information Steward 4.2) and the content type of each column has been identified.  It can clearly be seen that we have an address entity, a person entity and other attributes of person data (email, phone, etc.) within this input file.  We could identify relationships (matches) using just the address fields or just the person fields, but what would the results be if we used both address and person?

 

dd1.png

 

 

Data Cleansing Advisor can recommend many different strategies to determine duplicates within an input source.  The following are supported: Individual, Corporate, Individual and Corporate, ID Only, Family, Other.  A brief synopsis of each strategy is as follows:

 

  • Individual: searches for matching records based on personal name data
  • Corporate: searches for matching records based on organizational name data
  • Individual and Corporate: searches for matching records based on both personal and organizational name data
  • Family: searches for matching records based on last name data
  • ID Only: searches for matching records based on an identification or SSN column that you specify
  • Other: searches for matching records based on very specific criteria (address only, phone only, email only or another specified column)

 

A data cleansing solution was created to determine duplicates based on an address only theme.  The results are as follows:

 

 

dd2.png

Data Cleansing Advisor was able to determine that the input source contains over 25% (677) duplicate records.  You do not need technical knowledge of how to configure a Data Services match transform; you just need to know your business requirements, and Data Cleansing Advisor will create the match rules for you.  Data Cleansing Advisor at this point gives you a few options that you can use to fine-tune and review the results.  The first is a chart that allows you to drill down into the results to create filters to view the data that is most important to you.  The image below shows a filter being created to view the matching records using the address-only matching theme.  The results are then further divided by match confidence (high, medium or low).  High confidence matches are close to being exact matches and may not need to be reviewed.  Medium and low confidence matches are considered suspect matches and you may want to review these record groups.

 

dd3.png

 

 

Viewing the data will immediately display the match results using the filter that we just previously defined (all matching records).  Looking at group ID 238 shows us that there is a single match group with the address of 444 Highland Dr that contains multiple different people.  These are not the results that were expected.  Data Cleansing Advisor allows you to fully customize the match theme (Other, Individual, etc.), the match rules used within a theme, the threshold of how accurate a match needs to be and certain advanced options (such as initials being able to match a person’s first name).

 

dd4.png

 

 

Fine-tuning the match results is done within the same user interface when reviewing the data.  Selecting “Change match theme” will display the themes that are available for selection.  Knowing that we want to differentiate people who have the same address means that we need to select an “Individual” match theme.  The rules are also displayed below, meaning that duplicates will also be found using phone, address or email.  These match rules can be selected or de-selected depending on how much you want to customize the solution.

 

dd5.png

 

 

Once these changes are applied, a what-if analysis (preview) will be displayed showing you the exact impact of the changes made.  Modifying the match strategy is as simple as knowing how you want to define relationships and having Data Cleansing Advisor create the match rules for you.

 

dd6.png

 

 

The results show that we have ~140 fewer matches, meaning we have more unique records.  Previewing the results and filtering on ‘Kohler’ shows us that there are now 3 match groups instead of the 1 large match group that we previously had using the address-only matching theme.

 

dd7.png

 

 

The selected match theme will generally have the greatest influence on the results you get when trying to determine a strategy to use.  Now that we’ve changed the strategy to find matches using person data, we can dive deeper into other changes that will also affect the result set.  The image above shows us that group ID 192 is a single match group with 3 records.  Each record has the same address, but the name is slightly different.  Data Cleansing Advisor allows you to further fine-tune the match rules based on person, address or firm to get the results that you want.  The image below shows a highlighted checkbox that has been de-selected so that first names no longer match with initials.  Applying this change will again immediately show the what-if analysis and a preview of the records that were impacted.  When changing the match options, any combination of changes can be made, but to fully understand the impact of each change it is recommended to make one change at a time.

 

dd8.png

 

 

The new results show that P.T Coleman is now a near-match (grey, italicized text) to group ID 177.  This means that it is a unique record, but just under the threshold of being part of the match group.

 

dd9.png

 

 

Data Cleansing Advisor can easily determine the duplicates within a specified input source and gives you the tools to easily customize the matching strategy to get the results that you want.

 

 

Data Cleansing Advisor Best Practices Blog Series

Determining Duplicates and a Matching Strategy
http://scn.sap.com/community/information-steward/blog/2013/12/31/determining-duplicates-and-a-matching-strategy

 

Publishing to Data Services Designer
http://scn.sap.com/community/information-steward/blog/2013/12/31/publishing-to-data-services-designer

 

Configuring Best Record Using Data Services Designer
http://scn.sap.com/community/information-steward/blog/2013/12/31/configuring-best-record-using-data-services-designer

 

Match Review with Data Cleansing Advisor (DCA)
http://scn.sap.com/community/information-steward/blog/2013/12/31/match-review-with-data-cleansing-advisor-dca

 

Data Quality Assessment for Party Data
http://scn.sap.com/community/information-steward/blog/2013/12/31/data-quality-assessment-for-party-data

 

Using Data Cleansing Advisor (DCA) to Estimate Match Review Tasks
http://scn.sap.com/community/information-steward/blog/2013/12/31/using-data-cleansing-advisor-dca-to-estimate-match-review-tasks

 

Creating a Data Cleansing Solution for Multiple Sources
http://scn.sap.com/community/information-steward/blog/2013/12/31/creating-a-data-cleansing-solution-for-multiple

Publishing to Data Services Designer

 

 

Publish a DCA Solution

Data cleansing solutions created by DCA within Information Steward are published and can then be used immediately within Data Services Workbench (DSW).  If your preferred development environment is Data Services Designer, then this section will take you through the steps to deploy a published DCA solution to Data Services Designer.

 

Data Services Workbench will display a list of all published DCA solutions.  A DCA solution contains the rules that were tuned by the Data Steward for cleansing and matching.  It effectively contains a query transform, a global address cleanse transform, a data cleanse transform and a match transform.  The DCA solution within Workbench, however, is just a single transform.  The complexities were abstracted away, making configuration and deployment easier, but with far fewer configuration options to select from than you would get in Data Services Designer.

 

pds1.png

 


Create a Dataflow

Creating a dataflow within DSW is easy once the data cleansing solution has been published.  Simply create a dataflow and add an input source, the published DCA solution and an output target.  The DCA transform in this case does not have any cleansing or matching options to configure.  This is because the Information Steward user has already reviewed the data and set those options (implicitly or explicitly).

 

If there are any cleansing rules at all that you would want to customize, then you will need to deploy the DSW dataflow to Data Services Designer.

 

pds2.png

 

 

Deploy to Data Services

Executing (Tools -> Execute) a dataflow or choosing to deploy (Tools -> Deploy…) the dataflow from within Data Services Workbench will convert the dataflow to ATL and save it within the configured Data Services repository.  Once the dataflow has been saved as ATL to the repository, you can log in to Data Services Designer and view the exported job.

 

pds3.png

 

 

The DCA generated dataflow in Data Services Designer

 

pds4.png

 

 

Global Address Cleanse Transform

Input fields for the global address cleanse transform will already be mapped.  SAP best practices are used to determine the mapping schema for the input set that was used to create the data cleansing solution.  There are no restrictions with changing the input field mappings, but results that have already been reviewed within Information Steward may be affected.

 

The output fields for the global address cleanse transform are already selected based upon what is required further down the dataflow for data cleanse and match as well as what was selected to be part of the output schema for the job itself.  You are able to make any modifications to the output schema, but be advised those modifications may create errors for the data cleanse or match transforms.

 

The options for the global address cleanse transform are already set based upon SAP best practices and the settings defined by the user that created the data cleansing solution in Information Steward.  There are no restrictions with changing the global address cleanse transform’s options, but be advised that option changes may affect the output that has already been reviewed.

 

 

Data Cleanse Transform

Input fields for the data cleanse transform will already be mapped.  SAP best practices are used to determine the mapping schema for the input set that was used to create the data cleansing solution.  There are no restrictions with changing the input field mappings, but results that have already been reviewed within Information Steward may be affected.

 

The options for the data cleanse transform are already set based upon SAP best practices and the settings defined by the user that created the data cleansing solution in Information Steward.  There are no restrictions with changing the data cleanse transform’s options, but be advised that option changes may affect the output that has already been reviewed.

 

The output fields for the data cleanse transform are already selected based upon what is required further down the dataflow for match and what was selected to be part of the output schema for the job itself.  You are able to make any modifications to the output schema, but be advised those modifications may create errors for the match transform.

 

Match Transform

The match transform’s option settings (how to define a match) and input fields cannot be modified when using a dataflow created by Data Cleansing Advisor.  You are, however, able to modify the output fields that will be a part of the output schema.

 

Best record functionality can be added to the dataflow using a data cleansing solution deployed from Data Services Workbench.  Details of how to do this are explained in the section titled “Configuring Best Record Using Data Services Designer”; it essentially adds a new match transform to create best records using the match output from DCA.

 

Query Transform

The query transform contains the basic cleansing rules that were created within Data Cleansing Advisor.  There are no restrictions with how this transform may be modified, but again, the results were reviewed in Information Steward and changes to the basic cleansing rules may affect the output.

 

 

Configuring the deployed dataflow

Create a Data Services job and add the deployed DCA-generated dataflow to it.  Nothing else needs to be modified at this time.  Execute the job to run the dataflow and generate the results.

 

 

Considerations

Deploying a dataflow from Data Services Workbench will automatically overwrite anything that was previously deployed so caution needs to be taken.  I recommend creating a copy of the deployed dataflow immediately before opening it up for the first time in Data Services Designer.

 

Data Cleansing Advisor Best Practices Blog Series

Determining Duplicates and a Matching Strategy
http://scn.sap.com/community/information-steward/blog/2013/12/31/determining-duplicates-and-a-matching-strategy

 

Publishing to Data Services Designer
http://scn.sap.com/community/information-steward/blog/2013/12/31/publishing-to-data-services-designer

 

Configuring Best Record Using Data Services Designer
http://scn.sap.com/community/information-steward/blog/2013/12/31/configuring-best-record-using-data-services-designer

 

Match Review with Data Cleansing Advisor (DCA)
http://scn.sap.com/community/information-steward/blog/2013/12/31/match-review-with-data-cleansing-advisor-dca

 

Data Quality Assessment for Party Data
http://scn.sap.com/community/information-steward/blog/2013/12/31/data-quality-assessment-for-party-data

 

Using Data Cleansing Advisor (DCA) to Estimate Match Review Tasks
http://scn.sap.com/community/information-steward/blog/2013/12/31/using-data-cleansing-advisor-dca-to-estimate-match-review-tasks

 

Creating a Data Cleansing Solution for Multiple Sources
http://scn.sap.com/community/information-steward/blog/2013/12/31/creating-a-data-cleansing-solution-for-multiple

Configuring Best Record Using Data Services Designer

A data cleansing solution created by Data Cleansing Advisor can be used in a dataflow within Data Services Workbench and Data Services Designer.  The solution that gets published does both cleansing and matching, but not best record.  There are two options to pursue if you want best record functionality: Data Services Designer and Information Steward’s Match Review tool.  “Match Review with DCA” is covered further down the article.  This section focuses on extending the concepts learned in “Publishing to Data Services Designer” by adding best record functionality to the dataflow that was created.

 

The data cleansing solution used in the previous example is a simple cleansing and matching dataflow that outputs data to a single target and is depicted below:

 

br1.png

 

 

Adding Best Record Functionality

The match transform that gets published to Data Services Workbench cannot be modified, other than selecting the output fields to include in the output schema.  Best record is a post-match process that is usually done within the same match transform that does the matching.  To add best record functionality to a published data cleansing solution, the following steps need to be completed.

 

Add a case transform to the dataflow.  The case transform will be used to route matching records (those with a valid integer value for a match group number) to the match transform that will perform the best record calculations.  The unique records (those with a blank match score) will be routed to a query transform; they are already considered best records because they have no other matching records associated with them.
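To make that routing decision concrete, here is a minimal Python sketch of the logic the case transform expresses.  It is an illustration only, not Data Services case transform syntax, and the sample values are hypothetical; only the MATCH_GROUP_NUMBER and MATCH_SCORE field names come from the data cleansing solution output.

# Illustration only, not Data Services case transform syntax.
# Records with a match group number go to the best record match transform;
# records with no match group (blank match score) pass through as-is.
def route_record(record):
    if record.get("MATCH_GROUP_NUMBER") not in (None, ""):
        return "best_record_match"    # part of a match group
    return "unique_passthrough"       # unique record, already its own best record

sample = [
    {"ID": 1, "MATCH_GROUP_NUMBER": 7, "MATCH_SCORE": 95},    # hypothetical values
    {"ID": 2, "MATCH_GROUP_NUMBER": "", "MATCH_SCORE": ""},
]
for rec in sample:
    print(rec["ID"], route_record(rec))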

 

br2.png

 

 

The next step is to add a match transform to the dataflow to perform the best record calculations.  The input of this transform should be the matching records that are being routed from the case transform defined earlier.  The image below shows how this best record dataflow should be designed.  A full image of the entire dataflow can be found further below for your reference.

 

br3.png

 

 

Configuring a match transform to purely do best record calculations is fairly easy if you have Data Services experience.  Examples of the required input fields are displayed below.  MATCH_GROUP_NUMBER is required so that it can be used to form break groups: records are grouped by group number, and best record processing is performed within each group.  The other input fields listed below will be used within our best record rules and as destinations for posted data when a master or subordinate record needs to be updated.
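The following Python sketch illustrates the idea of break groups and a best record rule.  The “newest LAST_UPDATED wins” strategy and the LAST_UPDATED field name are hypothetical stand-ins for whatever best record rules you configure in the match transform.

from collections import defaultdict

# Illustration only: form break groups on MATCH_GROUP_NUMBER and apply a
# hypothetical "newest LAST_UPDATED wins" rule inside each group.
def best_record_per_group(records):
    groups = defaultdict(list)
    for rec in records:
        groups[rec["MATCH_GROUP_NUMBER"]].append(rec)
    survivors = []
    for group in groups.values():
        # keep the record with the most recent LAST_UPDATED value (assumed rule)
        survivors.append(max(group, key=lambda r: r["LAST_UPDATED"]))
    return survivors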

 

br4.png

 

 

You can use any best record strategy with the Data Cleansing Advisor solution; it is completely customizable to suit your business requirements.  Once best record is configured, the last step is to determine which output records you want populated to your target.  I used a query transform (called “Best_Records”) to filter for records that were either a master record or a unique record.
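Conceptually, the filter applied in that query transform looks like the small Python sketch below.  The MATCH_GROUP_RANK value used to flag master records is an assumption, so verify it against the actual values your dataflow produces.

# Illustration only: keep masters and unique records for the target table.
def is_output_record(rec):
    is_master = rec.get("MATCH_GROUP_RANK") == "M"             # assumed master indicator
    is_unique = rec.get("MATCH_GROUP_NUMBER") in (None, "")    # no match group at all
    return is_master or is_unique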

 

br5.png

 

 

Below is the completed best record dataflow using a data cleansing solution published from Information Steward.  The full power and flexibility of Data Services can be used to extend the functionality of a data cleansing solution.

 

br6.png

 

Data Cleansing Advisor Best Practices Blog Series

Determining Duplicates and a Matching Strategy
http://scn.sap.com/community/information-steward/blog/2013/12/31/determining-duplicates-and-a-matching-strategy

 

Publishing to Data Services Designer
http://scn.sap.com/community/information-steward/blog/2013/12/31/publishing-to-data-services-designer

 

Configuring Best Record Using Data Services Designer
http://scn.sap.com/community/information-steward/blog/2013/12/31/configuring-best-record-using-data-services-designer

 

Match Review with Data Cleansing Advisor (DCA)
http://scn.sap.com/community/information-steward/blog/2013/12/31/match-review-with-data-cleansing-advisor-dca

 

Data Quality Assessment for Party Data
http://scn.sap.com/community/information-steward/blog/2013/12/31/data-quality-assessment-for-party-data

 

Using Data Cleansing Advisor (DCA) to Estimate Match Review Tasks
http://scn.sap.com/community/information-steward/blog/2013/12/31/using-data-cleansing-advisor-dca-to-estimate-match-review-tasks

 

Creating a Data Cleansing Solution for Multiple Sources
http://scn.sap.com/community/information-steward/blog/2013/12/31/creating-a-data-cleansing-solution-for-multiple

Match Review with Data Cleansing Advisor (DCA)

 

Summary

Match Review is a tool in Information Steward that allows Data Stewards to engage in the process of reviewing match results produced by Data Services or by a data cleansing solution published by Data Cleansing Advisor.  The process of creating a dataflow to produce match results that Information Steward can use to create a Match Review task is described in the following article: http://wiki.scn.sap.com/wiki/display/EIM/Match+Review.  The process of creating a Match Review task is described in this article: http://wiki.scn.sap.com/wiki/display/EIM/How+to+create+a+Match+Review+task+in+My+Worklist+and+review+the+suspects

 

This section will focus on leveraging Match Review to review the results of a data cleansing solution published by Information Steward’s Data Cleansing Advisor.  Data Services Workbench will be used to stage the appropriate data for a Match Review task.  Data Cleansing Advisor simplifies the entire data cleansing process by intelligently recommending rules for cleansing and matching based upon SAP best practices.  These rules can then be modified to further meet the Data Steward’s business requirements while immediately showing the impact of the modifications.  Once the Data Steward is satisfied with the results, the solution can be published and immediately used within Data Services Workbench by an ETL Developer.

 

 

 

Create a Dataflow

The published data cleansing solution is highlighted in yellow in the dataflow pictured below.  The data cleansing solution represents the cleansing and match rules that were created and reviewed by the Data Steward in Information Steward.  The image below is a basic example of what a dataflow will look like within Data Services Workbench in order to stage the appropriate tables to create a Match Review task in Information Steward.

 

mr1.png

 

 

The first step is to create a new project, dataflow and input source in Data Services Workbench.  Add the input source that was used to create the data cleansing solution within Information Steward to the dataflow.  If a sample set of input data was used for the data cleansing solution that was created in Information Steward, then you will want to reference the full (production) set of data.  If you are unsure, then locate the “Data Cleansing Solutions” tab in Data Services Workbench and view the details of the published data cleansing solution.

 

Mapping Input Data

mr2.png

 

 

All relevant details, including the input source’s name and schema, will be available for you to review.  Add the data cleansing solution from the same “Data Cleansing Solutions” tab to the dataflow.

 

The input source will need to be connected to the selected data cleansing solution.  The next steps would be to map the input source’s fields to the input schema of the data cleansing solution and to select the relevant output fields that will be used to drive the Match Review task.

 

The published data cleansing solution will provide details as to how the data was automatically mapped within Information Steward when viewing the “Input” tab.  You can choose to manually map the fields or use the “Detect mappings” feature, which will suggest how the data may be mapped.  The image below shows how my example input data set was mapped to the data cleansing solution.

 

 

Selecting Output Fields

mr3.png

 

 

By default, no output fields are selected when the data cleansing solution is added to the dataflow.  Match Review requires the following output fields to be available within the staged data table in order to create a Match Review task:

 

  • MATCH_SCORE
  • MATCH_GROUP_RANK
  • MATCH_GROUP_NUMBER 

 

Along with the 3 required match fields noted above, you must also select fields to include as part of match review or best record creation.

 

The image below is the “Output” tab of the data cleansing solution and the output schema that will be used.  All fields were selected in this example.  You can customize the output schema in any way you want, but you do need to have the 3 match fields listed above along with some output fields for the review and best record creation process.

 

mr4.png

 

 

Preparing the Data for Match Review

Now that the data cleansing solution has been added to the dataflow and configured, the next step is to prepare the data for the appropriate staging tables.  Two staging tables are created: a job status table and the data table that will hold the match groups to be reviewed.

 

mr5.png

 

 

The first query transform (called “DCAOutput”) is used simply to let you rename columns to better fit your business requirements.  For example, STD_ADDR1_LOCALITY1 may have its name changed to LOCALITY.  Use this query transform to adjust the output schema from the data cleansing solution.
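A minimal Python sketch of that renaming step is shown below.  The mapping dictionary is illustrative; it should mirror whatever column names suit your own requirements.

# Illustration only: rename data cleansing solution output columns.
RENAMES = {"STD_ADDR1_LOCALITY1": "LOCALITY"}   # example from the text

def rename_columns(record):
    return {RENAMES.get(col, col): value for col, value in record.items()}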

 

The second query transform (called “MatchReviewPrep”) is used to add the required fields for the job status table and the data table.  The job status table requires only two columns: JOB_RUN_ID and JOB_STATUS.  JOB_RUN_ID gets the value of “job_run_id()” and JOB_STATUS gets the value “Pending Match Review”.  The JOB_RUN_ID column allows newly matched data to be added to a new Match Review task without modifying the status or data of a Match Review task that has already been created using the same input source.  The JOB_STATUS field’s value is used when setting up a Match Review task (http://wiki.scn.sap.com/wiki/display/EIM/How+to+create+a+Match+Review+task+in+My+Worklist+and+review+the+suspects).
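The sketch below shows, in Python, the shape of the job status row being prepared.  The “Pending Match Review” value and the JOB_RUN_ID/JOB_STATUS column names come from the text above; the UUID is only a stand-in for the Data Services job_run_id() function so the example runs on its own.

import uuid

# Illustration only: the two columns the job status staging table expects.
def make_job_status_row():
    return {
        "JOB_RUN_ID": str(uuid.uuid4()),        # stand-in for job_run_id()
        "JOB_STATUS": "Pending Match Review",   # status value Match Review looks for
    }

print(make_job_status_row())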

 

mr6.png

 

 

 

The output schema of the “MatchReviewPrep” query transform needs to contain all required Match Review fields along with the output fields from the data cleansing solution.  This transform then needs to be connected to your data staging table (as shown above).  If your input schema does not have a record ID field, add a field called SOURCE_RECORD_ID and give it the value of “gen_row_num()”.  If it does have one, make sure that field is included in the output schema so that it is available within the data staging table.
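If it helps to picture that record ID assignment, here is a small Python sketch that mimics gen_row_num() with a running counter; it is illustrative only and keeps any record ID the input already carries.

# Illustration only: assign SOURCE_RECORD_ID when the input has no record ID field.
def add_source_record_ids(records):
    for i, rec in enumerate(records, start=1):
        rec.setdefault("SOURCE_RECORD_ID", i)   # keep an existing ID if present
    return records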

 

The third, and final, query transform (called “JobStatusPrep”) is used simply to remove all other output fields from the “MatchReviewPrep” query transform so that only the JOB_STATUS and JOB_RUN_ID fields remain.  This transform then needs to be connected to your job status staging table (as shown above).

 

mr7.png

 

 

All that is left to do is to validate the dataflow for any issues, deploy and execute the dataflow so that the staging tables are created and populated.  Once the dataflow has successfully created the required data, you can create a Match Review task in Information Steward.

 

There are two options when designing a dataflow to stage data for a Match Review task.  The first option (described above) is to send all match groups, regardless of match score, to the data staging table to be reviewed within Match Review.  The second option is to use a case transform to route suspect matches that have a MATCH_SCORE less than or equal to 93 to the data staging table to be reviewed within Match Review.  This means all other matches will be sent to the final target, awaiting the results of the Match Review process.
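As a rough Python sketch of that second option (using the article’s example threshold of 93), the routing decision looks like this.  The threshold handling is illustrative only, not case transform syntax.

# Illustration only: route suspects to Match Review staging, the rest to the final target.
def route_for_review(record, threshold=93):
    score = record.get("MATCH_SCORE")
    if score not in (None, "") and float(score) <= threshold:
        return "match_review_staging"   # suspect match, needs human review
    return "final_target"               # auto match or unique record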

 

The image below shows a dataflow that was created to route suspect matches to the data staging table and auto matches to the final target table.

 

mr8.png

 

Data Cleansing Advisor Best Practices Blog Series

Determining Duplicates and a Matching Strategy
http://scn.sap.com/community/information-steward/blog/2013/12/31/determining-duplicates-and-a-matching-strategy

 

Publishing to Data Services Designer
http://scn.sap.com/community/information-steward/blog/2013/12/31/publishing-to-data-services-designer

 

Configuring Best Record Using Data Services Designer
http://scn.sap.com/community/information-steward/blog/2013/12/31/configuring-best-record-using-data-services-designer

 

Match Review with Data Cleansing Advisor (DCA)
http://scn.sap.com/community/information-steward/blog/2013/12/31/match-review-with-data-cleansing-advisor-dca

 

Data Quality Assessment for Party Data
http://scn.sap.com/community/information-steward/blog/2013/12/31/data-quality-assessment-for-party-data

 

Using Data Cleansing Advisor (DCA) to Estimate Match Review Tasks
http://scn.sap.com/community/information-steward/blog/2013/12/31/using-data-cleansing-advisor-dca-to-estimate-match-review-tasks

 

Creating a Data Cleansing Solution for Multiple Sources
http://scn.sap.com/community/information-steward/blog/2013/12/31/creating-a-data-cleansing-solution-for-multiple

Data Quality Assessment for Party Data

 

Summary

Data Cleansing Advisor has the tools available to do an in-depth data quality assessment for party data.  The following entities can be identified and cleansed by Data Cleansing Advisor: address, person, title, firm, phone, email, date and SSN.  A single person, address, firm and title can be cleansed per record within the input source, and 0-6 emails, phones, dates and SSNs can be cleansed.  This section will explain the cleanse results and the tools included within Data Cleansing Advisor to do a data quality assessment.

 

The initial UI, when navigating to the cleanse results, displays a bar chart that shows you the quality of the data that has been cleansed and the impact of the cleansing rules themselves.  The bar charts start at a high level (rows) and can be drilled down to low-level (column) details.  As you drill down through the charts, a filter (e.g. Suspect Phone -> Phone1 -> Invalid) is automatically created to view the cleansed results.  The bar charts are displayed on an entity-by-entity basis, and navigating between entities is as simple as selecting a different one from the dropdown box.

 

Date – Cleansed Data

The Date bar chart will initially show the records that are considered to be blank, suspect and cleansed.  The same is true for Person, Title, Firm, Phone, Email and SSN.  The “Cleansed” bar represents rows of records where a date (or other entity type) was parsed and successfully cleansed.  The “Suspect” bar represents rows of records where a date (or other entity type) was parsed, but the value was invalid, the result was of low confidence, or additional data was found that could be skewing the result.  Finally, the “Blank” bar simply represents rows of records where a date (or other entity type) did not have a value (null/blank).  The images displayed within this section give you a sample of the filters that can be created when reviewing the cleanse results bar charts.

 

The filter (Date: Cleansed -> Date4 -> No/minor change) displayed below lets you immediately see the rows of data where the input column “DATE4” had valid dates that either had no change or a minor change to standardize the format.  Details for each date input field (up to 6) are displayed within this chart.

 

Being able to identify, on a column-by-column basis, where blank data exists or invalid data has been entered will help with the remediation of the issue.

 

pd1.png

 

 

Email – Blank Data

The image below shows all email columns (“EMAIL1”, “EMAIL2”, “EMAIL3”, “EMAIL4”, “EMAIL5” and “EMAIL6”) from the input source that do not have a value.  Invalid or empty personal information is a data quality issue – especially when working with party data.  You want to ensure that your customers, suppliers or vendors can be contacted and that their identifying information is complete.

 

pd2.png

 

 

After realizing that most of the email columns do not have a value, it may be a good idea to create a validation rule within Data Insight.  It may not be apparent here, but a Data Steward working with this data set may very well know that the input field “EMAIL1” is used for contacting customers.  This field is blank only 6% of the time.  A validation rule that ensures this column is not blank can be created within Data Insight and bound to this input source for future profiling purposes.
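Outside of Data Insight, the intent of such a rule is easy to sketch in Python.  The rows shown are hypothetical and the check is not Data Insight rule syntax; it simply expresses “EMAIL1 must not be blank” over a sample of records.

# Illustration only: an "EMAIL1 must not be blank" check over sample rows.
def email1_not_blank(row):
    value = row.get("EMAIL1")
    return value is not None and value.strip() != ""

rows = [{"EMAIL1": "jane@example.com"}, {"EMAIL1": ""}]        # hypothetical sample
blank_rate = sum(not email1_not_blank(r) for r in rows) / len(rows)
print(f"Blank EMAIL1 rate: {blank_rate:.0%}")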

 

Phone – Suspect Data

Phone entity type cleansing is supported for both U.S. and international phone formats.  The image below shows that 7 records within the data set had invalid data within the “PHONE1” field.  Again, data remediation can be used to correct this issue; alternatively, a basic transformation could be added to massage the data into a correct format, or a validation rule could be derived from values we (and you) deem to be invalid.

 

pd3.png

 

 

 

Address – Invalid

The bar charts to show the impact of address cleansing vary slightly from the person, email, date and phone entity types.  Address data for each row will initially be split into 3 categories: valid addresses, invalid addresses and corrected addresses.  The data is split within each category using a combination of the reference data that is being used along with what happened during the address cleansing process.

 

The same drill-down functionality that is available for non-address cleanse results also exists for address cleanse results: you can drill down into the results from the row level to the column level.  The difference with address cleanse results is that the bar chart goes down to the component level, since address information can be spread across multiple input columns of data.  The filter (Invalid Addresses -> Address Line -> Street number) will return the 360 records that have an invalid street number associated with them.

 

Drilling down into the address results will give you very specific examples of what is invalid and what is valid (both valid addresses and corrected addresses).  Data remediation needs or validation rules can be discovered using the address charts, for example to determine why an address line is missing or why street numbers sometimes do not exist.

 

pd4.png

 

 

Data Cleansing Advisor will help you fully understand the impact the cleansing rules had on your input data, as well as the data quality issues that exist within your data.  Being able to quickly and easily identify this information will help you identify the processes that could be improved and achieve a higher level of data quality in the future.

 

 

Data Cleansing Advisor Best Practices Blog Series

Determining Duplicates and a Matching Strategy
http://scn.sap.com/community/information-steward/blog/2013/12/31/determining-duplicates-and-a-matching-strategy

 

Publishing to Data Services Designer
http://scn.sap.com/community/information-steward/blog/2013/12/31/publishing-to-data-services-designer

 

Configuring Best Record Using Data Services Designer
http://scn.sap.com/community/information-steward/blog/2013/12/31/configuring-best-record-using-data-services-designer

 

Match Review with Data Cleansing Advisor (DCA)
http://scn.sap.com/community/information-steward/blog/2013/12/31/match-review-with-data-cleansing-advisor-dca

 

Data Quality Assessment for Party Data
http://scn.sap.com/community/information-steward/blog/2013/12/31/data-quality-assessment-for-party-data

 

Using Data Cleansing Advisor (DCA) to Estimate Match Review Tasks
http://scn.sap.com/community/information-steward/blog/2013/12/31/using-data-cleansing-advisor-dca-to-estimate-match-review-tasks

 

Creating a Data Cleansing Solution for Multiple Sources
http://scn.sap.com/community/information-steward/blog/2013/12/31/creating-a-data-cleansing-solution-for-multiple
