Data Services 4.2 Text Data Processing Bug?

Former Member
0 Kudos

For one of our customers, we have developed a social media analysis/monitoring application using SAP Data Services. This solution continuously extracts data from Twitter for analysis, alerting and reporting. This solution was developed in SAP Data Services 4.1 and has been running fine for a long time now.


However, our customer recently upgraded to SAP Data Services 4.2 SP3 (14.2.3.549) and is now experiencing issues with their correctly updated Social Media Analysis Jobs. (The repositories were correctly upgraded to DS 4.2 using the repository manager and everything else works fine.)


It seems that at random times, the Text Data Processing Transformation spits out this error:

6840      7364      DQX-058306      20/10/2014 12:56:04 p.m.            |Sub data flow DF_Twitter_Text_Analytics_1_2|Transform Base_EntityExtraction

6840      7364      DQX-058306      20/10/2014 12:56:04 p.m.            Transform : Internal format conversion error processing . (That latter number is an internal ID of the Tweet being processed.)


Further investigation shows that the error is not random: it always happens with specific tweets/text content. If we re-process the same data in our development environment, we can reproduce the same error again and again, with the same tweets. The issue only occurs with a relatively small share of the tweets being processed, perhaps less than 5% of the data. Here are some text samples from these tweets:


@Calfreezy love your vid with W2S in apartment tour ! sick set up as weell !!!!!

@docfreeride I’m sure it eases the pain. Whether it fixes the problem or even adds another one (hangover) is another matter 😄

@mcnfreedom good feel sick 😕

RT @picture_window: the @nzherald is really on form today publishing all of the chronically irrelevant opinions of chronically irrelevant p…

The free RDU app and RDUnited - $2 off house beer, wine and spirits at Dux live.  Android https://t.co/PrFcHfDcZY

The free RDU app and RDUnited - $2 off house beer, wine and spirits at Dux live.  iPhone https://t.co/lxddJ3KDZh

The free RDU app and RDUnited – 20% off all bottled beer at @threeboysbrew  Android https://t.co/PrFcHfDcZY

The free RDU app and RDUnited – 20% off all bottled beer at @threeboysbrew Android https://t.co/PrFcHfDcZY

The free RDU app and RDUnited – 20% off all bottled beer at Three Boys Brewery.  iPhone https://t.co/lxddJ3KDZh

The free RDU app and RDUnited – Free Tequila or beer with med Mexi Lime Chicken pizza at Winnies City.  iPhone https://t.co/tuN1AxBvDA

The free RDU app and RDUnited– Free Tequila or beer with med Mexi Lime Chicken pizza at Winnies City.  Android https://t.co/PrFcHfDcZY

The picture they used was of her at a club in Auckland, taken over a year ago, while she was drunk. Personally, that's disgusting ethics.

Win free tickets to @DocEdge festival at @MiramarTheRoxy (+ wine specials for all documentary festival-goers) - http://t.co/0ZLFvlbYdz

Win free tickets to @DocEdge festival at @MiramarTheRoxy + wine & lunch specials for all documentary festival-goers - http://t.co/0ZLFvlbYdz


To see if this was an upgrade/conversion issue, I created a new Data Flow in DS 4.2 with a new TDP transformation that does not reference the Rule and Dictionary files we are using, and used a randomly selected tweet as the source. I got the exact same error as above.


The tweet I used was: "The picture they used was of her at a club in Auckland, taken over a year ago, while she was drunk. Personally, that's disgusting ethics. "


I ensured that no special characters were in the tweet but still got the error. I then only used the first four words and STILL got the error. However, when I changed "The picture[...]" to "A picture" or "Them picture" or anything else BUT "The picture"... the error went away!
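For anyone who wants to double-check the "no special characters" part outside Data Services, here is a minimal sketch in Python. It is not part of Data Services, and the function name `non_ascii_chars` is made up for illustration. It simply lists every non-ASCII character in a sample with its position and Unicode name, which makes it easier to see that a failing input such as "The picture" is plain ASCII while other samples contain dashes, curly quotes or emoji.

```python
import unicodedata


def non_ascii_chars(text):
    """Return (index, character, Unicode name) for each non-ASCII character.

    Hypothetical helper for inspecting sample text; not a Data Services API.
    """
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# Two shortened samples from the failing tweets.
samples = [
    "The picture they used was of her at a club in Auckland",
    "The free RDU app and RDUnited \u2013 20% off all bottled beer",
]

for sample in samples:
    # An empty list means the sample is pure ASCII.
    print(repr(sample[:30]), non_ascii_chars(sample))
```

An empty result for a failing sample suggests the error does not depend on unusual characters, which matches what I saw with "The picture".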


Is this a bug of sorts that was introduced in DS 4.2? I was able to reproduce the very same error on our own SAP Data Services 4.2 SP2 environment and I cannot reproduce this error in SAP Data Services 4.1 SP2 (14.1.2.378). Has anyone any idea what is causing this problem?

Accepted Solutions (1)


former_member18162
Active Participant
0 Kudos

Hello,

SAP Note 2110602 refers to the specific example Erik reported above (he created a SAP Support Incident).

Prior to entity extraction, the content goes through a conversion process which includes the determination of MimeType. The engine which does this conversion is different than the one originally used in DS 4.1. It is this engine which had the issue with the content.

The SAP Note above states the fix was delivered in DS 4.2 SP3 P3 and DS 4.2 SP4.

I've just tested Erik's example in DS 4.2 SP6 and the behavior is corrected.

If you see this sort of issue (which is usually specific to the content you're extracting from), it is recommended that you create a new SAP Support Incident. When doing so, it is very important to provide ALL of the following:

1. Recreate the issue in a simple job which uses only the entity transform.
2. Export this job to an atl file.
3. Provide the EXACT input which triggers the behavior (a .txt file is usually fine).
4. Zip the atl and .txt file up together and attach them to your incident.
5. Make sure to provide the correct DS version information as well:

Log into Designer > Help > About Data Services > 14.2.x.xxx
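If it helps, step 4 can be scripted. The following is only a sketch in Python, assuming you have already exported the job and saved the triggering input; the function name `bundle_for_incident` and the file names in the usage comment are hypothetical. The files are added unchanged, so the input keeps its exact bytes and encoding.

```python
import zipfile
from pathlib import Path


def bundle_for_incident(atl_path, input_path, zip_path):
    """Zip the exported .atl file and the triggering input file together.

    Hypothetical helper; files are stored under their base names only.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in (Path(atl_path), Path(input_path)):
            # Add each file as is, without its directory path.
            zf.write(path, arcname=path.name)
    return Path(zip_path)


# Example usage (file names are placeholders):
# bundle_for_incident("entity_job.atl", "failing_input.txt", "incident.zip")
```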

Cheers!
Julie

Answers (3)


former_member198401
Active Contributor
0 Kudos

Please raise a ticket with SAP Support Team.

This seems to be an unknown bug.

Regards

Arun Sasi

Former Member
0 Kudos

Hi Erik,

I am facing an issue while fetching data from Twitter using BODS.

I am using the sample job provided by the SAP Blueprint.

Could you please look at the link below and give me your input?

http://scn.sap.com/thread/3507081

Please help!

Thanks,

Swapnil

Former Member
0 Kudos

Did you ever find an answer for this?  I am getting the same error on the same incoming record, with 14.2.5.

Former Member
0 Kudos

The other thing is - this is raised as a full error (red stop sign) in the job, but the job continues running and will actually complete OK, skipping over the records it couldn't handle.  So it behaves more like a warning.  Maybe that's a quicker fix -- make sure this gets logged as a Warning rather than an Error.