How to read PDF file and convert into XML

Former Member
0 Kudos

Hi,

My scenario is Email to IDoc. A PDF attachment arrives with the email, and I have to read the PDF file in PI and convert it into an IDoc. I have read the blogs and wikis. However, if I don't want to use the Conversion Agent, is it possible to read the PDF file and convert it into the corresponding XML?

Can I use PayloadSwapBean for this?

Regards,

Danish

Accepted Solutions (0)

Answers (6)

Former Member
0 Kudos

Hi All,

We can convert PDF to XML using a Java mapping. Below is the basic code for that:

/*
 * To change the template for this generated file go to
 * Window>Preferences>Java>Code Generation>Code and Comments
 */
package Shubham1;


import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.parser.PdfTextExtractor;
import com.sap.aii.mapping.api.AbstractTrace;
import com.sap.aii.mapping.api.StreamTransformation;

/**
* @author shubham.e.agarwal
*
* To change the template for this generated type comment go to
* Window>Preferences>Java>Code Generation>Code and Comments
*/
public class Shubham2 implements StreamTransformation{
private Map map = null;
private AbstractTrace trace = null;
public void setParameter(Map arg0) {
  map = arg0; // Store reference to the mapping parameters
  if (map == null) {
   this.map = new HashMap();
  }
}


/*public static void main(String[] args) { //FOR EXTERNAL STANDALONE TESTING

try {
  FileInputStream fin = new FileInputStream ("C:\\test.pdf"); //INPUT FILE (PAYLOAD)
  FileOutputStream fout = new FileOutputStream ("C:/Users/Shubham.e.agarwal/My Documents/pdfXML.xml"); //OUTPUT FILE (PAYLOAD)
  Shubham2 mapping = new Shubham2();
  mapping.execute(fin, fout);
  }
  catch (Exception e1) {
  e1.printStackTrace();
  }
}
  */
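public void execute(InputStream inputstream, OutputStream outputstream) {
  try {
   String msgType = "MT_shubham"; //A dummy message type, please change it as per your requirement.
   String nameSpace = "http://Shubham"; //A dummy namespace, please change it as per your requirement.
   PdfReader reader = new PdfReader(inputstream);
   PdfTextExtractor pdf = new PdfTextExtractor(reader);
   String text = pdf.getTextFromPage(1); //Only the first page is read.
   reader.close();
   //Escape XML special characters so the extracted text cannot break the document.
   text = text.replaceAll("&", "&amp;").replaceAll("<", "&lt;").replaceAll(">", "&gt;");
   String str = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" + "<ns0:" + msgType + " xmlns:ns0=\"" + nameSpace + "\">";
   str = str + "\n<Record>" + text + "\n</Record>";
   str = str + "\n</ns0:" + msgType + ">"; //Close with the same name that was opened.
   outputstream.write(str.getBytes("UTF-8")); //Match the encoding declared in the XML header.
   outputstream.close();
  }
  catch (Exception e) {
   //Fail the mapping instead of silently producing an empty payload.
   throw new RuntimeException("PDF to XML conversion failed: " + e.getMessage(), e);
  }
}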

}

anupam_ghosh2
Active Contributor
0 Kudos

Hi Danish,

Here is sample Java code to write to and read from a PDF file. The library you need to include is itext.jar.

import java.io.*;

import com.itextpdf.text.Document;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
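public class WritingPDF
   {
    public static void main(String arg[]) throws Exception
      {
       // Write a PDF containing a single paragraph
       Document document = new Document();
       PdfWriter.getInstance(document, new FileOutputStream("c:\\apps\\hello.pdf"));
       document.open();
       document.add(new Paragraph("Hello Pdf"));
       document.close();
       // Read the text of page 1 back from the same file
       PdfReader reader = new PdfReader("c:\\apps\\hello.pdf");
       // Later iText 5 releases offer this as a static call: PdfTextExtractor.getTextFromPage(reader, 1)
       PdfTextExtractor p = new PdfTextExtractor(reader);
       String str = p.getTextFromPage(1);
       reader.close();
       System.out.println(str);
     }
}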

External library files required for compiling the code can be downloaded from here.

In the above code I write to a PDF file and then read from it again.

You need Java mapping code within your scenario that reads the PDF file and produces XML, for example using a DOM parser.
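
As a rough sketch of that last step, the code below builds the XML with DOM once the text has already been extracted from the PDF (for example with iText, as above). The class name PdfTextToXml and the method toXml are made up for illustration, and the message type and namespace in main are only dummy values. It uses the standard JAXP classes only, so it can be tried without iText on the classpath.

```java
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of any SAP or iText API.
public class PdfTextToXml {

    // Wraps already-extracted PDF text in a namespaced root element with one Record child.
    public static String toXml(String pdfText, String msgType, String nameSpace) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();

        // Root element in the target namespace, using the ns0 prefix
        Element root = doc.createElementNS(nameSpace, "ns0:" + msgType);
        doc.appendChild(root);

        // The serializer escapes characters such as < and & in text nodes
        Element record = doc.createElement("Record");
        record.setTextContent(pdfText);
        root.appendChild(record);

        // Serialize the DOM tree to a string
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Dummy text standing in for what the PDF library would return
        System.out.println(toXml("Invoice 4711: total < 100 & paid", "MT_shubham", "http://Shubham"));
    }
}
```

Building the document through DOM instead of string concatenation means the extracted text cannot produce malformed XML, whatever characters the PDF contains.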

Regards

Anupam

Former Member
0 Kudos

How do you want to convert the PDF into IDoc/XML? Do you want to add it to a custom field as Base64 content? This is what we did in many projects. You can do that with a Java mapping (not a graphical one).
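
A minimal sketch of the Base64 idea, assuming Java 8 or later for java.util.Base64 (an older JVM would need a different encoder). The class name PdfToBase64Field, the element names MT_PdfContainer and PdfBase64, and the namespace http://example.com/pdf are all invented for illustration; in a real mapping the bytes would come from the mapping's input stream and the field would be whatever your IDoc structure provides.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of any SAP API.
public class PdfToBase64Field {

    // Reads the whole input stream into memory.
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    // Puts the Base64-encoded PDF bytes into a single XML field.
    // The Base64 alphabet contains no XML special characters, so no escaping is needed.
    public static String wrap(byte[] pdfBytes) {
        String encoded = Base64.getEncoder().encodeToString(pdfBytes);
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<ns0:MT_PdfContainer xmlns:ns0=\"http://example.com/pdf\">"
             + "<PdfBase64>" + encoded + "</PdfBase64>"
             + "</ns0:MT_PdfContainer>";
    }

    public static void main(String[] args) throws IOException {
        // Dummy bytes standing in for a real PDF payload
        byte[] fakePdf = "%PDF-1.4 dummy content".getBytes(StandardCharsets.US_ASCII);
        System.out.println(wrap(readAll(new ByteArrayInputStream(fakePdf))));
    }
}
```

This keeps the PDF intact as binary content inside the message; it does not extract any values from it.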

CSY

Shabarish_Nair
Active Contributor
0 Kudos

Refer to http://scn.sap.com/community/pi-and-soa-middleware/blog/2009/05/17/trouble-writing-out-a-pdf-in-xipi

It would be a starting point for understanding which APIs can help you with this.

iaki_vila
Active Contributor
0 Kudos

Hi Danish,

Have you read this document http://www.sdn.sap.com/irj/scn/go/portal/prtroot/docs/library/uuid/9913a954-0d01-0010-8391-8a3076440...?

There are a lot of steps, and you need SAP NetWeaver Developer Studio and the necessary roles to deploy, but it has a good explanation. I hope it is useful.

Regards.

Former Member
0 Kudos

Hi,

You have to use an adapter module to convert the PDF file into XML. Search SDN; there is an article on this.

PayloadSwapBean will help you swap the mail attachment with your main document.

Thanks

Amit Srivastava