cancel
Showing results for 
Search instead for 
Did you mean: 

Pattern Matching suggestions for format check?

Former Member
0 Kudos

Hi All,

I am trying to create a calculated column to check whether an attribute TAX_ID conforms to any of up to five patterns, specified in lookup tables according to their regions. The lookup tables have many countries and various different patterns for each countries. The current design allows checking up to 5 patterns, but I am presented with several cases where we need to check more than five patterns. I am trying to come up with an approach to handle the following two cases.

There could be a performance concern if we increase the available patterns to 8 or more since we are calculated for each record whether patterns are valid. I am wondering if any better design is available to handle similar types of pattern matching check. Please share if you have any suggestions or thoughts on this. Thanks in advance for your input!

REGIONTYPEMAXMINFIXEDPATTERN1PATTERN2PATTERN3PATTERN4PATTERN5PATTERN6PATTERN7PATTERN8PATTERN9
RO412ROnnROnnnROnnnnROnnnnnROnnnnnnROnnnnnnROnnnnnnnROnnnnnnnnROnnnnnnnnn
VE56G-nnnJ-nnnV-nnnE-nnnG-nn-nJ-nn-nV-nn-nE-nn-n

n represent numeric number - 0,1,2,3,4,5,6,7,8,9

1. For Region VE, Tax ID will be valid if it conforms to the pattern [G/J/V/E]+nn(-)n. There is multiple prefixes available and an optional dash ( -) in the last but one digit. For example, G-123, J-124, V-21-8 and E-11-3 are valid, P-123, J-1-34 are invalid for region VE.

2. In Region RO, Tax ID will be valid if it conforms to the pattern RO+4-12 digits. For example, RO1241 is valid, RO1z3  is invalid for region RO.

Some thoughts:

a) Is there something similar to "lazy expression" available so that not all patterns have to be checked for those regions with less patterns? Some countries have one legitimate pattern only.

b) Is there any expression, functions or approach to check for patterns similar to Region VE without increasing the number of patterns in the lookup tables?

Regards,

Vicky Weng

Accepted Solutions (1)

Accepted Solutions (1)

lbreddemann
Active Contributor
0 Kudos

Hi Vicky,

have you looked at regular expressions for this scenario?

Starting with SPS 9 you can use them for more complex pattern matching tasks.

Something like the following might be a good starting point.


create column table taxnumbers (region varchar(2), tax_no varchar(64))

insert into taxnumbers values ('RO', 'RO1241');

insert into taxnumbers values ('RO', 'RO1z3');

insert into taxnumbers values ('VE', 'G-123');

insert into taxnumbers values ('VE', 'J-124');

insert into taxnumbers values ('VE', 'V-21-8');

insert into taxnumbers values ('VE', 'E-11-3');

insert into taxnumbers values ('VE', 'P-123');

insert into taxnumbers values ('VE', 'J-1-34');


select * from taxnumbers where tax_no LIKE_REGEXPR 'RO[0-9]{4,12}';

REGIONTAX_NO
RO   

RO1241


create column table rules  ( region varchar(2), ex varchar(40));

insert into rules values ('RO', 'RO[0-9]{4,12}');

select *, locate_regexpr(r.ex IN tn.tax_no)

FROM

    taxnumbers tn left outer join rules r

        on tn.region = r.region;

REGIONTAX_NOREGIONEX          LOCATE_REGEXPR
RO    RO1241RO    RO[0-9]{4,12}1            
RO    RO1z3RO    RO[0-9]{4,12}0            
VE    G-123?    ?            ?            
VE    J-124?    ?            ?            
VE    V-21-8?    ?            ?            
VE    E-11-3?    ?            ?            
VE    P-123?    ?            ?            
VE    J-1-34?    ?            ?            

Cheers, Lars

Answers (0)