cancel
Showing results for 
Search instead for 
Did you mean: 

[RS 15.7.1] MSA between ASE 15.5 & 16.0 - TEXT replication problem

marcin_najs
Explorer
0 Kudos

Hi.

I am testing MSA replication between ASE 15.5 and 16.0 (sp02) (linux 64bit)

After upgrading target server to 16.0 (sp02) I came across this error:

The text column descriptor returned from the execution of rs_get_textptr function string output commands is NULL. The text descriptor must be valid.

DSI goes down for transactions with UPDATE statements (for TEXT columns)

Problem occurs even after full data resync. using dump_marker.


I found some information about this error:

http://service.sap.com/sap/support/notes/2096754

and

http://service.sap.com/sap/support/notes/2210654

CR#770513

From the first link:

--------------------------

Symptom

MSA setup.

When target ASE is 1571 SP130, RS can encountered ERROR #5128 and replication stops at target.

From RS errorlog:

E. 2014/09/19 13:20:28. ERROR #5128 DSI EXEC(249(1) SYBASEDB12.deltapay) - /dsiexec.c(11935)

The text column descriptor returned from the execution of rs_get_textptr function string output commands is NULL. The text descriptor must be valid.

I. 2014/09/19 13:20:28. The DSI thread for database 'SYBASEDB12.deltapay' is shutdown.

Environment

Replication version  1571 SP102

Target  ASE 1571 SP130)

Resolution

Referenced by CR# 770513

Fixed in in versions: ASE 15.7 SP131 / ASE 15.7 SP132

Upgrade to ASE 15.7 SP131 or higher

Problem still occurs in version 16.0 (sp02).

Mabye someone came across this error before:

1. Is there a workaround?

2. Does it apply only to MSA? (warm-standby should work?)

3. Is it fixed (or planned..) for versions higher than 16.0 sp02?

I would be grateful for any information.

Best Regards.

--

Marcin

Accepted Solutions (0)

Answers (1)

Answers (1)

marcin_najs
Explorer
0 Kudos

Referring to qestion #2:

I 've made several tests and compared the behavior of MSA & WS setup.

Error occurs only for MSA configuration.

Warm-Standby works fine.

Regards

--
Marcin

Former Member
0 Kudos

Hi Marcin,

Is it occurring for one specific table? or all tables with TEXT columns?

what commands are being executed against these tables?
Is HVAR enabled? can you try to set dsi_compile_enable to off if already on?

ALso I previously encountered issues with tables having TEXT\IMAGE datatypes to MSA only which is close to your case.

Try creating the below replication definition for the table and test the outcome(queue should be drained when creating the repdef)


create Replication Definition "text_table_repdef"

with Primary at LDS.db_name

with All tables named "text_table" (

--list columns

--HERE


Primary Key ("PK_column")

Send standby replication definition columns

Replicate Minimal Columns

replicate_if_changed ("text_column")

Regards,

Bilal

marcin_najs
Explorer
0 Kudos

Hi Bilal.

Error occurs on all tables with TEXT columns (for UPDATE statements) and only for MSA setup (Warm-Standby works fine).

HVAR is disabled ( dsi_compile_enable is set to 'off').

Your solution (with table repdef) works. There are no errors.

It seems to be very good workaround for MSA setup.

Thank you for your advice.

Regards,

Marcin

former_member182259
Contributor
0 Kudos

Unfortunately, you never provided enough details - but one COMMON reason why repdefs work is due to providing the pkey columns.  ASE 15.7 ESD#1 and higher can do this automatically as part of the RepAgent processing.   Without the pkey, SRS will use all non-LOB columns as pkey columns.   Where this becomes problematic is with inexact numeric datatypes, timestamps, etc.    For example, if you have a table similar to:

create table testA(

     col_01 int,

     col_02 float,

     col_03 text null

)

Then insert the values (1, 1.2456789, "some text here")

What might actually be stored in ASE is 1.2456788 or similar.   When RA sends that to SRS, it is translated into a string value for the SQL statement to be sent to the RDB - so SRS sends a string containing "1.2456788".   Unfortunately, again, ASE calls the IEEE math libs in the OS - which invokes the HW chip - slight differences in math lib versions or HW revs could result in a new interpretation of 1.2456790 being stored.

When the update happens (e.g. update testA set col_3="some longer text value"), without the pkey specification in the repdef nor from the RA, SRS is forced to construct a string of:

update testA

     set col_03="some text here"

     where col_01=1

          and col_02=1.2456788

Which with the replicate column for col_02 actually having the value fo 1.2456790 - "misses" the row if you will - 0 rows affect.    This is a bit of oversimplification - in reality, for text/image, SRS uses writetext with textpointers so it has a series of functions rs_get_textptr & rs_init_textptr that are called - rs_init_texptr does the update to the textcol similar to the above and then rs_get_textptr retrieves the text pointer from that update.    If it can't find the row (same reason) then you get the error you got.    It can also happen for other reasons such as:

- you have sp_chgattribute 'deallocate_first_textpg' set......reason why is that the default expression (in the docs) for rs_init_textptr is to set the text col to null (which normally, setting a text col to null initializes a text pointer) - however, if ASE is fast enough, it might deallocate the just allocated text page before the rs_get_textptr executes

- differences in charsets/sort orders between servers and characters not able to be translated correctly (usually, this throws an error, but not always if the RDB accepts it)

Would WS have the same issue - yes - you may have just gotten lucky there.   Alternatively, by default the IBQ of SRS is binary and the OBQ is ASCII, so one other possibility is that the translation from binary to ASCII between the queues used by MSA (vs. WS only using IBQ) resulted in a float/real datatype getting manipulated there....however, generally SRS sticks to string representations internally for all data, so not sure this could happen - but it is possible and it may be why you saw this for MSA more often then you had so far for WS.

marcin_najs
Explorer
0 Kudos

Hi Jeff.

Thank you for your detailed and comprehensive answer.

It helped me to understand the essence of the problem.

As you mentioned, it is quite possible that is was just lucky to get no errors for MSA/WS before.

I have never used table repdefs (exept Sybase --> Oracle replication) and I have never experienced any TEXT replication probelms (for both MSA and WS)

In my case target servers are "1:1 copies" of primary servers:

- the same ASE versions (15.5)

- the same OS versions & architectures (RHEL linux 64bit)

- the same charset/sortorder

- replication is initialized by full dump/load of database (using dump_marker for connection or db subscription)

For ASE 16.0 (as target server) I performed simple tests like:

----------------------------------------------

create table tab1(

a int null,

b text null

);

insert into tab1(a) values(1);

update tab1 set b="sometext" where a=1;

----------------------------------------------


1. For WS no errors

2. For MSA rs_get_textptr error (it occurs for any text col in any table) - DSI is down.

I will have to consider using repdefs for all table with text columns...


BTW: There must be some reason why CR#770513 (which matches my case) has been officially reported...


Best regards,

--

Marcin

terry_penna
Participant
0 Kudos

Marcin,

The SRS CR 770513 that you list is related to an ASE CR 763162 that is fixed in ASE 16.0 sp02. 

The SRS CR just references the ASE CR where the fix needs to occur. In the targeted CR list you show for ASE 16.0 sp02 it shows the referenced ASE CR as being fixed.  Do you use the dboption "deallocate first text page"?  By default it is set to 'off'.

If the dboption is set to default 'off' then the SRS or ASE CR's do not apply.  If you have the dboption set to 'on' then please use a workaround of setting it to 'off' to see if the DSI error goes away.  If it does then you would need to open an incident so this can be investigated further by support.

-Terry

marcin_najs
Explorer
0 Kudos

Hi Terry,

Database option 'deallocate first text page' is set to OFF.

Regards

--

Marcin

terry_penna
Participant
0 Kudos

Marcin

With the dboption 'deallocate first text page' set to 'off' you are not encountering the CR that you reference in your post and should follow the recommendations to create a table repdef with a good pkey column for this table which should resolve the issue.

Just wanted to make sure that you were not focusing on the CR when it is not the issue in this case.

If creating a table repdef does not resolve the issue I recommend opening an incident with support for further investigation.

Regards

Terry