cancel
Showing results for 
Search instead for 
Did you mean: 

how to make regex_replace work in Data Services 4.2

bruce_robert
Explorer
0 Kudos

I am trying to figure out the right syntax for using regex_replace in a mapping in SAP Data Services 4.2

I need to clean the Description field from the Ledger for reporting purposes.

Example: I need to standardize certain keywords, one of which is INVENTORY.

The data includes:

INVY

INVTY

INVENTY

INVTRY

I want to change    INV[A-Z]*    (regexp for INV and 0 or more letters after it) into INVENTORY

SAPDS keeps telling me that {A} is not a valid parameter.

I tried other regex variations, including :alpha:, and it tells me {:} is not a valid parameter.

so what is the right way to make this work without any errors?

regex_replace(QRY_Clean.ACCT_DESC, 'INVEN[A-Z]*$','INVY')

gives me the error above

Message was edited by: Bruce Robert

Accepted Solutions (0)

Answers (7)

Answers (7)

bruce_robert
Explorer
0 Kudos

Okay,

I have a workaround in case anyone reading the board needs it.

In my query extract transform, I right pad with an unused character, like '*'.

rpad(QRY_Clean.ACCT_DESC, 200, '*').

In the next transform query, I do all my regex's.

regex_replace(

regex_replace(

'INV\\[A-Z\\]*, 'INVENTORY'),

'AGR\\[A-Z\\]*', 'AGRICULTURE')

In the query transform before the data goes into the target table, I trim the *.

rtrim(QRY_Regex.ACCT_DESC, '*')

So far so good, no unwanted characters showing up because of a buffer issue.  And hopefully they'll fix this in the next version.

bruce_robert
Explorer
0 Kudos

Unfortunately a DECODE has the same drawback as a CASE... the assumption that only one correction has to be made to a value. In my example from above,

DMGD INVTY MATL has 3 corrections to be applied at once. With the decode, only the first one in the list would be applied.

It looks like it's definitely a buffer issue. I tried sorting the input by the length of the field, so the shortest records would be modified first and longest last. It was fairly effective, except in the few cases where the modification shortened the record. Then the next record had an extra unwanted character added at the end.

bruce_robert
Explorer
0 Kudos

It seems like the buffer is not being cleared correctly after each record. Because I am standardizing, it's not just INVENTORY but about 30 phrases.  Having 30 staging tables is not useful, nor is writing database code. Are there any buffer settings/config settings I need to tweak?

I have

substr(

regex_replace(

regex_replace(

QRY_Clean.ACCT_DESC,

' INV\\[A-Z\\]*', ' INVENTORY'),

' MAT\\[A-Z\\]*, ' MATERIAL'),

1,255)

If I only have one regex and run it, everything's fine. But when I have two or more, then I get the weirdness like AGRICULTURE PRODSYLVANIA instead of the correct AGRICULTURE PROD.

The match doesn't work for us, because there's INVENTORY IN STOCK, DAMAGED INVENTORY, ACCRUED INVENTORY EXPENSE, etc so I have to standardize all variations.

And because I'm correcting for humans, I may have DMGD INVTY MATL, so in one record I have to change DMGD to DAMAGED ... AND... change INVTY to INVENTORY.... AND.... change MATL to MATERIAL.  Again, easy peasy in other programs, but this extra letters showing up is a pain/disappointing.

Message was edited by: Bruce Robert

Former Member
0 Kudos

Instead of 30 staging tables,  you could try a case transform to split out by the string you want to match, perform the regex_replace, then merge back together and write to the Target table.

0 Kudos

you can use decode function to achieve this functionality

0 Kudos

Dear Bruce,

Greetings for the day..!!

Could you please try the below mapping formula to standardize any description value which start "INV"

to INVENTORY


ifthenelse(match_regex(QRY_Clean.ACCT_DESC, 'INV.*', NULL)=1,'INVENTORY',QRY_Clean.ACCT_DESC)

match_regex Function results whether the Description column is Starting with "INV" or not..

By using the result of match_regex , ifthenelse will update the proper description value.

Hope this is Helpfull..

Regards,

Sumanth.R

bruce_robert
Explorer
0 Kudos

Thanks!

If only they'd provide help for that in the product itself, like for webi and universe designer. It would never have occurred to me to double-backslash the braces. I'm used to javascript and perl where it's not done that way.

Got another question.  Don't know if it's a bug or an undocumented feature...

When I run the job containing my regexs, it's adding seemingly random character segments at the end of some strings.

So for regex_exp(QRY_Clean.ACCT_DESC,'INV\\A-Z\\]*$','INVY')

I'll get BEFORE as MATERIAL INVTY and AFTER as MATERIAL INVENTORY, which is good.

But I also see BEFORE as AGRICULTURE VENDOR and after as AGRICULTURE VENDORSYLVANIA.

Why would it be appending SYLVANIA to the end of the string when I'm not doing anything about PENNSYLVANIA?  Is there an undocumented G, I, M, or other flag I need to add? Or is this a bug that I need to ask SAP about?

mageshwaran_subramanian
Active Contributor
0 Kudos

Here's the recent discussion on this topic that might help you

Former Member
0 Kudos

Looks like the syntax is not quite right, try:

regex_replace(QRY_Clean.ACCT_DESC,'INV\\[A-Z\\]*$','INVY')

This will also match rows that start with something other than INV as long as they end with INV plus one or more letters (case sensitive). If in addition you want to ensure the string starts with INV, add a ^ at the beginning.

Check the examples here: http://help.sap.com/download/documentation/bods_42_sp/ds_42_reference_en/frameset.htm?663bf2aa3c1546...

See also: https://regex101.com/#pcre  (not an exact match for the syntax due to escaping of [ and ], but can help build the rest of the string).

Former Member
0 Kudos

Hi All,

I am also facing the same issue. My requirement is different where I need to find the occurrence of lower case letters and replace with blanks

I used regex as below

regex_replace( table1.field,'\\[a-z\\]','')

This code produces unwanted new characters. Hence I tried

translate(table.field,'abcdefghijklmnopqrstuvwxyz','')

This also produces unwanted new characters if there are no lowercases in the string

can you please help ?