cancel
Showing results for 
Search instead for 
Did you mean: 

Use of function "regex_replace"

Former Member
0 Kudos

Hello everybody,

I've been looking at the "SAP_BODS_42_Tech_Manuals.pdf" but I can't find any explanation about the built-in function "regex_replace", which seems to be is the equivalent to "REGEXP_REPLACE" in Oracle.

In the own SAP BODS, the wizard gives you a short explanation of the function and each parameter but I would like to know some more details.

First of all, I would like to know the Flags you have available to use.

Also, I would like to perform something like (where PALABRA_IN = 'HELLO COOL WORLD'):

  • REGEXP_REPLACE(PALABRA_IN, '^([A-Z]+) ([A-Z]+) ([A-Z]+)$', '\2'); --> 'COOL'

So, therefore, being able to get the part of the input string I want according to the regular expression I use, in this case the second word.

I've tried with "\2" and "$2" but any of them work and it's not clarified in the documentation.

I would like to know as well if there is a equivalent built-in function for the Oracle one "REGEXP_SUBSTR".

Thanks and regards

Accepted Solutions (0)

Answers (6)

Answers (6)

Former Member
0 Kudos

Ok, thanks for replying.

I've tried this in a simple job just with a script box:

  • regex_replace($Word_In, '(A)\1(B)\2(C)\3', '\2', null);

And the job aborts and don't even say why (no error log appear). I guess the best option is a migration of code to Teradata...


Anyway, I think that it would be interesting if SAP BODS Staff worked on this issue to improve the "regex_replace" functionality so that less code had to be done out of SAP BODS.

Regards

Former Member
0 Kudos

I've worked with regular expressions in several environments and I've never seen an implementation like Oracle's. I took some time to read up on it and I see now what you are trying to do. If Oracle's implementation is unique, then it wouldn't make sense for BODS to implement it exactly. BODS is a generic tool and tries hard not to be specific to a single platform, although maybe has a bit of an extra nod to SAP data sources.

I don't know if you've looked into how Teradata implements this functionality, but if I were you I'd do that ASAP. You may find you can't mimic the functionality exactly in Teradata either. I hate to say it, but your team may have a lot of code rewriting in your future.

Former Member
0 Kudos

The thing is that we are planning to do a migration from Oracle to Teradata and I'm trying to see the viability of migrating PL-SQL code to SAP BODS Scripting (to have the code independent from any database we use and avoiding a migration to Teradata Stored Procedure Language).

The example I am exposing is not an exact case I want to implement, but I need to get that functionality for stuff much more complex than that.

As I see that the SAP function "regex_replace" has "Capturing parentheses":

( ... )

Capturing parentheses. Range of input that matched the parenthesized subexpression is available after the match.

And the documentation says that "Range of input that matched the parenthesized subexpression is available after the match", I want to know how to invoke that piece of regular expression I've already "captured" with the "Capturing parentheses".

So then I could write something like:

  • regex_replace($Word_In, '^([A-Z]+) ([A-Z]+) ([A-Z]+)$', '\2', null);
  • # where $Word_In = 'HELLO COOL WORLD'
  • # and '\2' means the second "captured" regular expression: ([A-Z]+)

And then obtain the content of the second expression in parenthesis: 'COOL'.

But '\2' doesn't work and I just want to know the sintaxis to get the second expression in parentheses in that case (and then extrapolate it to my real work).

Thanks and best regards

Former Member
0 Kudos

I don't see how to do it in BODS, but Microsoft implements capturing and allows the use of a label (?<name> subexpression). However, this would only allow for the subexpression to be used later in the regular expression. Their example shows (?<double>\w)\k<double>). I haven't use the BODS regex much, but I think you are going to have a very difficult time getting it to mimic Oracle's advanced functionality in all instances.

Microsoft Regex Reference

https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx

Former Member
0 Kudos

I want to get the second expression in parenthesis. For the case of $Word_In containing the value 'HELLO COOL WORLD' it would be:

  • regex_replace($Word_In, '^([A-Z]+) ([A-Z]+) ([A-Z]+)$', '\2', null); --> 'COOL'


Because:

  • the first ([A-Z]+) is 'HELLO'
  • the second ([A-Z]+) is 'COOL'
  • the third ([A-Z]+) is 'WORLD'.

So I just want to know how to write it in SAP BODS (in Oracle, for example, the expression would be '\2').

Thanks and regards

Former Member
0 Kudos

You would need to come up with a regular expression that only selects the section you want, and then it will be replaced with what is in the third parameter. The Oracle function  you describe doesn't make a lot of sense to me. You have a regular expression that matches exactly the entire string in your example, but then you are selecting out only the second group of the regular expression.

It sounds like you want the function to return "COOL", but regex_replace is for replacing text in a string, not returning a substring. What is the actual requirement? Do you always want to return the second word in a group of words? I don't know how regex_replace works in Oracle, but it doesn't sound like there is a compatible function in DS and you'll need to come up with your own way to implement the logic. If you can tell me specifically what the logic is, without referencing the Oracle function I'm not familiar with, maybe we can come up with something.

Former Member
0 Kudos

I suppose another option is you could create a stored procedure in Oracle and then import it to DS as a function. Depending on the number of records you are processing, this might be a bottleneck, but it might allow you to use the Oracle functionality directly.

Former Member
0 Kudos

Thanks Robin for your answers.

I see that, theorically, I can capture a part of the regular expression with parenthesis to use it as the replacement string:

( ... )

Capturing parentheses. Range of input that matched the parenthesized subexpression is available after the match.

As I see in the documentation, the <replacement string> is the 3rd one but I still can't find how to represent it to work, what to put at the left of the "2":

  • regex_replace($Word_In, '^([A-Z]+) ([A-Z]+) ([A-Z]+)$', '\2', null);

I've tried some symbols ("/2", "\2", "$2", ...) and it doesn't work. I think the documentation doesn't mention it...


Any idea to represent it?

Thanks in advance and best regards

Former Member
0 Kudos

I'm not sure I understand what you are trying to do. If you put a '\2' in the third parameter, it will replace every regex match with '\2'. What do you want to accomplish by putting something to the left of the 2?

Former Member
0 Kudos

I believe the flags parameter accepts the few flags that are pretty standard for regular expressions, although the values are different than the typical ones used. For example, 'i' means case insensitive matching, and the one flag value I found was 'CASE_INSENSITIVE'. I don't believe you can use the flags parameter to get a specific token (match) as you are trying to do. The DS regex_replace is simply going to replace every match it finds in your string.

http://help.sap.com/download/documentation/bods_42_sp/ds_42_reference_en/frameset.htm?663bf2aa3c1546...

I found the flag values, it is in the match_regex function description.

http://help.sap.com/download/documentation/bods_42_sp/ds_42_reference_en/frameset.htm?576341dc6d6d10...

Former Member
0 Kudos

I'm not entirely certain how this function works, but if I understand what you have here I think you may be missing a parameter. The function regex_replace takes four parameters. If the intent of the '/2' parameter is to select the second word, you are passing it as the parameter for the replacement string. Try passing it as the fourth parameter.