Former Member

Publishing to Data Services Designer

Publish a DCA Solution

Data cleansing solutions created with Data Cleansing Advisor (DCA) in Information Steward can be used in Data Services Workbench (DSW) as soon as they are published. If your preferred development environment is Data Services Designer, this section takes you through the steps to deploy a published DCA solution to Data Services Designer.

Data Services Workbench displays a list of all published DCA solutions. A DCA solution contains the cleansing and matching rules that were tuned by the data steward. It effectively contains a query transform, a global address cleanse transform, a data cleanse transform and a match transform. Within Workbench, however, the DCA solution appears as a single transform. The complexity is abstracted away, which makes configuration and deployment easier, but you get far fewer configuration options than you would in Data Services Designer.


Create a Dataflow

Creating a dataflow within DSW is easy once the data cleansing solution has been published. Simply create a dataflow and add an input source, the published DCA solution and an output target. The DCA transform has no cleansing or matching options to configure, because the Information Steward user has already reviewed the data and set those options (implicitly or explicitly).

If you want to customize any of the cleansing rules, you will need to deploy the DSW dataflow to Data Services Designer.

Deploy to Data Services

Executing a dataflow (Tools -> Execute) or deploying it (Tools -> Deploy…) from within Data Services Workbench converts the dataflow to ATL and saves it in the configured Data Services repository. Once the dataflow has been saved to the repository as ATL, you can log in to Data Services Designer and view the exported job.

The DCA-generated dataflow in Data Services Designer

Global Address Cleanse Transform

The input fields for the global address cleanse transform are already mapped. SAP best practices are used to determine the mapping schema for the input set that was used to create the data cleansing solution. There are no restrictions on changing the input field mappings, but doing so may affect results that have already been reviewed within Information Steward.

The output fields for the global address cleanse transform are already selected, based on what the data cleanse and match transforms require further down the dataflow and on what was selected as part of the output schema for the job itself. You can modify the output schema, but be advised that those modifications may cause errors in the data cleanse or match transforms.

The options for the global address cleanse transform are already set, based on SAP best practices and the settings defined by the user who created the data cleansing solution in Information Steward. There are no restrictions on changing the global address cleanse transform’s options, but be advised that option changes may affect output that has already been reviewed.

Data Cleanse Transform

The input fields for the data cleanse transform are already mapped. SAP best practices are used to determine the mapping schema for the input set that was used to create the data cleansing solution. There are no restrictions on changing the input field mappings, but doing so may affect results that have already been reviewed within Information Steward.

The options for the data cleanse transform are already set, based on SAP best practices and the settings defined by the user who created the data cleansing solution in Information Steward. There are no restrictions on changing the data cleanse transform’s options, but be advised that option changes may affect output that has already been reviewed.

The output fields for the data cleanse transform are already selected, based on what the match transform requires further down the dataflow and on what was selected as part of the output schema for the job itself. You can modify the output schema, but be advised that those modifications may cause errors in the match transform.
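
Before removing output fields from the global address cleanse or data cleanse transform, it can help to check which downstream consumers still need them. The following is only a minimal Python sketch of that check, not anything generated by Data Services: the function name, the field names and the "job_output" label are all hypothetical and stand in for whatever your own dataflow uses.

```python
from typing import Dict, List, Set


def missing_downstream_fields(
    output_fields: Set[str],
    downstream_needs: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Return, per downstream consumer, the required fields that the
    proposed output schema no longer provides."""
    problems: Dict[str, List[str]] = {}
    for consumer, needed in downstream_needs.items():
        missing = sorted(needed - output_fields)
        if missing:
            problems[consumer] = missing
    return problems


# Hypothetical field names, for illustration only.
downstream_needs = {
    "match": {"FAMILY_NAME", "GIVEN_NAME", "FIRM"},
    "job_output": {"FAMILY_NAME", "GIVEN_NAME", "PHONE"},
}

# Proposed data cleanse output schema after dropping FIRM.
proposed = {"FAMILY_NAME", "GIVEN_NAME", "PHONE"}

# Reports that the match transform would lose a field it depends on.
print(missing_downstream_fields(proposed, downstream_needs))
```

An empty result means the proposed schema still covers everything the listed consumers need. A non-empty result names the fields to restore before running the job.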

Match Transform

The match transform’s option settings (how to define a match) and input fields cannot be modified when using a dataflow created by Data Cleansing Advisor.  You are, however, able to modify the output fields that will be a part of the output schema.

Best record functionality can be added to a dataflow that uses a data cleansing solution deployed from Data Services Workbench. The details are explained in the section titled “Configuring Best Record Using Data Services Designer”, but it essentially involves adding a new match transform that creates best records from the DCA match output.

Query Transform

The query transform contains the basic cleansing rules that were created within Data Cleansing Advisor. There are no restrictions on how this transform may be modified, but again, the results were reviewed in Information Steward and changes to the basic cleansing rules may affect the output.
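
The editing rules described for the four transforms can be summarised as a small lookup table. This Python sketch is only a reading aid under stated assumptions: the transform keys, the "input" / "options" / "output" part names and the `check_edit` helper are hypothetical labels chosen here, not Data Services identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditRule:
    allowed: bool
    caveat: str  # empty string when the text states no caveat


REVIEWED = "may change results already reviewed in Information Steward"
DOWNSTREAM = "may cause errors in downstream transforms"
LOCKED = "cannot be modified in a DCA-generated dataflow"

# (transform, part) -> rule, as described in the sections above.
RULES = {
    ("global_address_cleanse", "input"): EditRule(True, REVIEWED),
    ("global_address_cleanse", "options"): EditRule(True, REVIEWED),
    ("global_address_cleanse", "output"): EditRule(True, DOWNSTREAM),
    ("data_cleanse", "input"): EditRule(True, REVIEWED),
    ("data_cleanse", "options"): EditRule(True, REVIEWED),
    ("data_cleanse", "output"): EditRule(True, DOWNSTREAM),
    ("match", "input"): EditRule(False, LOCKED),
    ("match", "options"): EditRule(False, LOCKED),
    ("match", "output"): EditRule(True, ""),
    ("query", "input"): EditRule(True, REVIEWED),
    ("query", "options"): EditRule(True, REVIEWED),
    ("query", "output"): EditRule(True, REVIEWED),
}


def check_edit(transform: str, part: str) -> str:
    """Describe whether a planned change is allowed and what to watch for."""
    rule = RULES.get((transform, part))
    if rule is None:
        raise KeyError(f"no rule recorded for {transform!r} / {part!r}")
    if not rule.allowed:
        return f"{transform} {part}: not allowed ({rule.caveat})"
    if rule.caveat:
        return f"{transform} {part}: allowed, but {rule.caveat}"
    return f"{transform} {part}: allowed"


print(check_edit("match", "options"))
print(check_edit("data_cleanse", "output"))
```

Keeping the rules in one table makes it easy to review a list of planned changes before touching the deployed dataflow.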

Configuring the deployed dataflow

Create a Data Services job and add the deployed DCA-generated dataflow to it. Nothing else needs to be modified at this time. Execute the job to run the dataflow and generate the results.

Considerations

Deploying a dataflow from Data Services Workbench automatically overwrites anything that was previously deployed, so take care. I recommend creating a copy of the deployed dataflow before you open it for the first time in Data Services Designer.
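
If you also keep file-level backups, one option is to export the dataflow to an ATL file from Designer and store a timestamped copy of that file. The Python sketch below assumes such an exported file already exists. The `backup_atl` helper and the paths in the usage comment are hypothetical, and this is an addition to, not a replacement for, copying the dataflow inside Designer.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_atl(atl_path: Path, backup_dir: Path) -> Path:
    """Copy an exported ATL file into backup_dir under a timestamped name
    and return the path of the copy. The original file is left untouched."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = backup_dir / f"{atl_path.stem}_{stamp}{atl_path.suffix}"
    shutil.copy2(atl_path, target)  # copy2 also preserves file timestamps
    return target


# Example usage (hypothetical paths):
# backup_atl(Path("exports/dca_dataflow.atl"), Path("exports/backups"))
```

Running this before each redeploy from Workbench leaves a dated history of the customised dataflow, which can be re-imported if a deployment overwrites your changes.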

Data Cleansing Advisor Best Practices Blog Series

Determining Duplicates and a Matching Strategy
http://scn.sap.com/community/information-steward/blog/2013/12/31/determining-duplicates-and-a-matchi...

Publishing to Data Services Designer
http://scn.sap.com/community/information-steward/blog/2013/12/31/publishing-to-data-services-designe...

Configuring Best Record Using Data Services Designer
http://scn.sap.com/community/information-steward/blog/2013/12/31/configuring-best-record-using-data-...

Match Review with Data Cleansing Advisor (DCA)
http://scn.sap.com/community/information-steward/blog/2013/12/31/match-review-with-data-cleansing-ad...

Data Quality Assessment for Party Data
http://scn.sap.com/community/information-steward/blog/2013/12/31/data-quality-assessment-for-party-d...

Using Data Cleansing Advisor (DCA) to Estimate Match Review Tasks
http://scn.sap.com/community/information-steward/blog/2013/12/31/using-data-cleansing-advisor-dca-to...

Creating a Data Cleansing Solution for Multiple Sources
http://scn.sap.com/community/information-steward/blog/2013/12/31/creating-a-data-cleansing-solution-...