on 10-14-2014 7:31 PM
Hello Experts,
I need your help setting up the match score and criteria to find duplicates with a similarity ranging from 75% to 100%.
I have LFA1 table data and need to identify duplicate vendor names in the Name1 field.
I use the EnglishIndia_DataCleanse transform before the Match wizard to standardize the Name1 field data.
In the match strategy I am using a simple match, with match criteria on:
1. Givenname1 and Givenname2
2. Givenname1
3. Familyname1
The match criteria automatically take the following mappings, which seem OK to me.
1. Person_Givenname1 - Person_Givenname1_Standardize
Person_Givenname1_Standardize_match_std1 - Person_Givenname1_Standardize_match_std1_standardized
Person_Givenname1_Standardize_match_std2 - Person_Givenname1_Standardize_match_std2_standardized
Person_Givenname1_Standardize_match_std3 - Person_Givenname1_Standardize_match_std3_standardized
2. Person_Givenname2 - Person_Givenname2_Standardize
Person_Givenname2_Standardize_match_std1 - Person_Givenname2_Standardize_match_std1_standardized
3. Person_Givenname3 - Person_Givenname3_Standardize
Person_Givenname3_Standardize_match_std1 - Person_Givenname3_Standardize_match_std1_standardized
The next step was to create the break group. I created a group on these 3 fields:
Person_Givenname1_Standardize
Person_Givenname2_Standardize
Person_Givenname3_Standardize
In Edit Options, I kept the Match Set Name as INDIVIDUAL.
Now my match criteria parameters show as:
Person1_Given_Name1 - 50 (contribution to weighted score), Match score 101, No match score 79
Person1_Given_Name2 - 30 (contribution to weighted score), Match score 101, No match score 79
Person1_Family_Name1 - 20 (contribution to weighted score), Match score 80, No match score 79
When I execute the above job, the output contains only records with a 100% match score, whereas I need the record set ranging from 75% to 100%, for Name1 duplicates only.
Kindly suggest which step I am missing, since I am only getting 100% duplicate records even though names that are 75% similar exist in the data.
Any help would be much appreciated.
Regards,
Neil
Hi Neil,
Looking at your steps, I suggest changing the settings below.
You are using the weighted scoring method, not the rule-based method, so you need to change the following options.
1. Set the Match score to 100 and the No match score to -1 for all three criteria.
2. Set the weighted match score to 75 in the level options. Please check the screenshot.
I hope this will work.
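To see why the No match score of 79 filters out everything except near-identical names, here is a minimal Python sketch of the decision logic. It is a simplified model, not the actual Match transform implementation: the `Criterion` class, the `decide` function, and the sample similarity values are all made up for illustration. It assumes each criterion is checked in order, that a similarity at or below the No match score rejects the pair immediately, and that otherwise the similarity is added to the weighted total according to its contribution.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    contribution: int    # contribution to weighted score (all should sum to 100)
    match_score: int
    no_match_score: int


def decide(criteria, similarities, weighted_match_score):
    """Simplified model of comparing one pair of records (an assumption)."""
    weighted = 0.0
    for c in criteria:
        sim = similarities[c.name]
        if sim <= c.no_match_score:
            return "no match", None      # rejected by this criterion alone
        if sim >= c.match_score:
            return "match", None         # accepted by this criterion alone
        weighted += sim * c.contribution / 100
    decision = "match" if weighted >= weighted_match_score else "no match"
    return decision, weighted


# Hypothetical similarity scores for one pair of vendor names.
sims = {"Given_Name1": 90, "Given_Name2": 70, "Family_Name1": 85}

original = [
    Criterion("Given_Name1", 50, 101, 79),
    Criterion("Given_Name2", 30, 101, 79),
    Criterion("Family_Name1", 20, 80, 79),
]
suggested = [
    Criterion("Given_Name1", 50, 100, -1),
    Criterion("Given_Name2", 30, 100, -1),
    Criterion("Family_Name1", 20, 100, -1),
]

print(decide(original, sims, 75))   # ('no match', None)
print(decide(suggested, sims, 75))  # ('match', 83.0)
```

With the original settings the pair is rejected as soon as Given_Name2 scores 70, which is below 79. With the No match score at -1, no single criterion can reject the pair, so the weighted total (45 + 21 + 17 = 83) is compared against the weighted match score of 75.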
Thanks & Regards,
Ramana.
Incorporating match fields into your break key negates the ability to utilize the match standard fields.
If there's nothing else to narrow down your possible candidates, you're probably going to want to use a substring of the family name as the break key.
If you break on first/middle/last name, even if they're all substrings, you're going to end up with about the same thing as simply doing an order by with gen_row_num_by_group(break key). If your break group is too granular, you unbucket a lot of potential matches. ;(
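As a rough illustration of the break key point, the Python sketch below buckets records by a key and only compares records that share a bucket. The record layout, the sample names, and the helper functions are hypothetical and are not Data Services functions. It assumes a 3-character family name prefix as the coarse key.

```python
from collections import defaultdict
from itertools import combinations


def family_prefix(rec, length=3):
    # Coarse break key: first characters of the family name.
    return rec["family"].strip().upper()[:length]


def full_name(rec):
    # Overly granular break key: the complete given and family names.
    return (rec["given"].strip().upper(), rec["family"].strip().upper())


def candidate_pairs(records, key_func):
    """Return the record pairs that share a break key and so get compared."""
    groups = defaultdict(list)
    for rec in records:
        groups[key_func(rec)].append(rec)
    pairs = []
    for members in groups.values():
        pairs.extend(combinations(members, 2))
    return pairs


# Made-up sample data.
records = [
    {"id": 1, "given": "Rajesh", "family": "Sharma"},
    {"id": 2, "given": "Rajes", "family": "Sharma"},
    {"id": 3, "given": "Anil", "family": "Sharmaa"},
    {"id": 4, "given": "Neil", "family": "Gupta"},
]

print(len(candidate_pairs(records, family_prefix)))  # 3
print(len(candidate_pairs(records, full_name)))      # 0
```

With the prefix key, records 1, 2 and 3 land in the same bucket and are compared with each other. With the granular key every record sits alone in its bucket, so the near-duplicates "Rajesh Sharma" and "Rajes Sharma" are never compared at all.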