Convert PDF File to ASCII / Text File

Hello,

I have a PDF file that was created from a spool (originally the output of a Smart Form run in the background) using the SAP function module CONVERT_OTFSPOOLJOB_2_PDF, and I have the PDF file archived.

Now I need to do the reverse. The original spool is gone, and I need to extract some data from the PDF file. For example:

I need to extract account numbers where the label "Account No:" is followed by the value of the account number.

I opened the PDF file in binary mode, stored it in an internal table of type hex, and then LOOPed through this table.

I moved the hex data to a sufficiently long field of type C so that I could use the CS operator in an IF statement to look for the string "4163636F756E74204E6F3A", which is the hex representation of 'Account No:' (without the single quotes).

I can see 'Account No:' in the PDF file, but my IF statement with CS fails. I even tried to find the zip code using CS, and that fails too.

The sample code I used is as follows:

  LV_TAG_1_C = '4163636F756E74204E6F3A'.

  REFRESH: T_DATA_FILE_C.

  LOOP AT T_DATA_FILE_X.
    ASSIGN T_DATA_FILE_C-DATA TO <FS_C>.
    T_DATA_FILE_C-DATA     = T_DATA_FILE_X-DATA.
    IF T_DATA_FILE_C-DATA CS LV_TAG_1_C.
* The Above IF Statement NEVER became TRUE - even though I can see the "Account No:"
      LV_VAL_1_C = <FS_C>+SY-FDPOS(6).
      APPEND T_DATA_FILE_C.
    ENDIF.
*
  ENDLOOP.

After the LOOP, T_DATA_FILE_C is still empty.

Please let me know if you know of any example where I can extract data in ASCII format from a PDF file.

Please note: I do NOT need to convert a spool to a PDF file (the whole website is full of examples of that). I need the opposite!

Help will be much appreciated.

Regards,

Tarun

Message was edited by: Matthew Billingham - email address removed

1 ACCEPTED SOLUTION

Former Member

Writing the code in ABAP would be a nice academic exercise, but it would take too much effort.

You can use a free/open source library and execute it as an OS command through one of the SXPG* function modules to convert the PDF to text or HTML (this is text extraction, not OCR).

One such tool is pdftotext.

It is part of the Xpdf project, which is available for Linux and Windows, so you can run it from the command line on the presentation server or the application server. There is probably a Java library that can extract the text as well.

6 REPLIES

Former Member

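FIND ... IN BYTE MODE should work for you. I think CS is a text operator that does not work on binary data. For example: FIND LV_TAG_1_X IN T_DATA_FILE_X-DATA IN BYTE MODE, where both operands must be byte-like (type X or XSTRING), so the search tag has to be held in a hex field rather than in a type C field. Note that FIND does not set sy-fdpos; use the MATCH OFFSET addition (and the other additions of FIND) to locate and extract your data.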
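
As a rough illustration of the byte-mode search suggested above, here is a minimal sketch. The report name, variable names and the sample payload are made up for the example; only the hex tag comes from the question. It will only find the label if the PDF stores it as plain, uncompressed bytes, and PDF page content is often compressed, so treat it as a demonstration of FIND ... IN BYTE MODE rather than a reliable way to read a PDF.

```abap
REPORT zdemo_find_byte_mode.

* Sketch only: all names and the sample payload are illustrative.
DATA: lv_tag    TYPE xstring,
      lv_data   TYPE xstring,
      lv_value  TYPE xstring,
      lv_offset TYPE i.

* Hex for 'Account No:' as given in the question.
lv_tag  = '4163636F756E74204E6F3A'.
* Stand-in payload: 'Account No:' followed by the six bytes '123456'.
lv_data = '4163636F756E74204E6F3A313233343536'.

* Both operands are byte-like, as IN BYTE MODE requires.
FIND lv_tag IN lv_data IN BYTE MODE MATCH OFFSET lv_offset.
IF sy-subrc = 0.
* MATCH OFFSET points at the start of the tag, so skip over it.
  lv_offset = lv_offset + xstrlen( lv_tag ).
* Take the six bytes that follow the tag.
  lv_value = lv_data+lv_offset(6).
ENDIF.
```

With real file data, the rows of the binary table would first need to be concatenated into one XSTRING (CONCATENATE ... IN BYTE MODE), otherwise a label that spans two rows is missed.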

Former Member

Also, as an alternative solution, have you considered the possibility of converting the spool not to PDF but to plain text in the first place? Converting to PDF seems to be an intermediate step, and one that is making your life quite complicated by forcing you to read text inside a binary file. Would replacing CONVERT_OTFSPOOLJOB_2_PDF with an FM like RSPO_DOWNLOAD_SPOOLJOB possibly help?

0 Kudos

Hello Manish,

Thanks a lot for your suggestion. This PDF file really is not a one-to-one representation of ASCII as hex and vice versa. Your suggestion is really good.

I think we are very close. Do you know the name of the function module that I need to use? The list below is what we have in ECC / R/3. An example showing which function module to use, with some sample code (basically, what I need to supply and where the result goes), would be great.

Thanks again, Manish.

INCLUDE LSSXPU01.   "SXPG_STEP_XPG_START
INCLUDE LSSXPU02.   "SXPG_STEP_END
INCLUDE LSSXPU03.   "SXPG_STEP_COMMAND_START
INCLUDE LSSXPU04.   "SXPG_JOB_END
INCLUDE LSSXPU05.   "SXPG_STEP_START_UPDATE
INCLUDE LSSXPU06.   "SXPG_STEP_END_UPDATE
INCLUDE LSSXPU07.   "SXPG_JOB_END_UPDATE
INCLUDE LSSXPU08.   "SXPG_COMMAND_CHECK
INCLUDE LSSXPU09.   "SXPG_DUMMY_COMMAND_CHECK
INCLUDE LSSXPU10.   "SXPG_APPSERV_RFCDEST_GET_INT
INCLUDE LSSXPU11.   "SXPG_RFCDEST_OPEN_INT
INCLUDE LSSXPU12.   "SXPG_COMMAND_CHECK_INT

0 Kudos

Have you installed the library on the application server?

If yes, you first need to test on the OS command line (perhaps with the help of the Basis team) and work out the exact command that gets the text out of the PDF.

Then ask Basis to map that command to a logical command using SM49, so that it can be called using FM SXPG_COMMAND_EXECUTE.

See this wiki too.

Creation of External Commands with the help of UNIX Coding in SAP - ABAP Development - SCN Wiki
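
To make the steps above more concrete, here is a minimal sketch of what the call might look like once everything is set up. It rests on several assumptions that are not from this thread: ZPDFTOTEXT is a hypothetical logical command that Basis has defined in SM49 and mapped to the pdftotext executable, additional parameters are allowed for that command, and the two /tmp paths are placeholder files on the application server. The program passes the input and output file names to the command and then reads the resulting text file line by line, looking for the label.

```abap
REPORT zdemo_pdftotext.

* Hypothetical names: ZPDFTOTEXT (SM49 logical command) and both file paths.
DATA: lv_command  TYPE sxpgcolist-name VALUE 'ZPDFTOTEXT',
      lv_params   TYPE sxpgcolist-parameters,
      lv_status   TYPE extcmdexex-status,
      lv_exitcode TYPE extcmdexex-exitcode,
      lt_protocol TYPE STANDARD TABLE OF btcxpm,
      lv_file     TYPE string VALUE '/tmp/archive.txt',
      lv_line     TYPE string,
      lv_value    TYPE string,
      lv_off      TYPE i.

* pdftotext usage: pdftotext [options] <PDF file> [<text file>]
* -layout tries to keep the original physical layout of the page.
lv_params = '-layout /tmp/archive.pdf /tmp/archive.txt'.

CALL FUNCTION 'SXPG_COMMAND_EXECUTE'
  EXPORTING
    commandname           = lv_command
    additional_parameters = lv_params
  IMPORTING
    status                = lv_status
    exitcode              = lv_exitcode
  TABLES
    exec_protocol         = lt_protocol   " stdout/stderr of the command
  EXCEPTIONS
    no_permission         = 1
    command_not_found     = 2
    OTHERS                = 3.

IF sy-subrc <> 0 OR lv_exitcode <> 0.
  WRITE: / 'External command failed, check lt_protocol.'.
  RETURN.
ENDIF.

* Read the text file that the command wrote and look for the label.
OPEN DATASET lv_file FOR INPUT IN TEXT MODE ENCODING DEFAULT.
IF sy-subrc = 0.
  DO.
    READ DATASET lv_file INTO lv_line.
    IF sy-subrc <> 0.
      EXIT.
    ENDIF.
    FIND 'Account No:' IN lv_line MATCH OFFSET lv_off.
    IF sy-subrc = 0.
      lv_off = lv_off + strlen( 'Account No:' ).
      IF lv_off < strlen( lv_line ).
        lv_value = lv_line+lv_off.   " rest of the line after the label
        CONDENSE lv_value.
        WRITE: / lv_value.
      ENDIF.
    ENDIF.
  ENDDO.
  CLOSE DATASET lv_file.
ENDIF.
```

What you supply is the logical command name and the parameter string; the command's console output comes back in the EXEC_PROTOCOL table, while the extracted text ends up in the output file that pdftotext writes. Whether the value appears on the same line as the label depends on how pdftotext lays out your particular form, so check the text file by hand first.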

0 Kudos

Thanks a lot, Manish.

We already use external commands, and I will work with the Basis team.

Thanks