Technology Blogs by Members
Explore a vibrant mix of technical expertise, industry insights, and tech buzz in member blogs covering SAP products, technology, and events. Get in the mix!
cancel
Showing results for 
Search instead for 
Did you mean: 
engswee
Active Contributor

Update 2 Oct 2018: Now available in CPI as well.

Update 10 May 2016: Add new optional parameters headerRow and onlyValidCharsInXMLName (courtesy of jacob.vandborg5).

Update 7 Aug 2015: Add new parameter columnOffset.

Update 25 Mar 2015: ExcelTransformBean has been refactored to be part of FormatConversionBean. Parameter conversionType replaced with converterClass. Parameter indentXML replaced by indentFactor.

Update 11 Feb 2015: Updated source code links to shared module repository on GitHub. Previous GitHub repository will be deleted.

Update 27 Jan 2015: Updated Source code section to indicate EAR deployment file released for download at GitHub repository.

Update 3 Nov 2014: Refactoring of source code to cater for SimpleExcel2XML and SimpleXML2Excel conversions.

Introduction

There are already a handful of blogs on SCN dealing with Excel to XML conversion - so why another one??

Here is my wish list for a comprehensive solution:-

  • Able to read all kinds of Excel format (XLS and XLSX)

  • Behaves in a similar way as MessageTransformBean

  • Highly configurable - develop/deploy once, use multiple times

  • Able to handle XML special characters


In the reference section below are some of the more popular approaches, however below are some of the (non-exhaustive) limitations/drawbacks

  • XLSX files stores string contents in a separate sharedStrings.xml file in the zipped XLSX file

  • JExcel API does not support Excel 2007 XLSX formats

  • Limitations in handling formulas, formatting and special characters in cells


ExcelTransformBean (ETB) aims to provide a generic adapter module solution (a la MessageTransformBean) that is highly configurable and reusable. It is based on the Apache POI API. Utilizing the combined SS interface of the API, it uses a single logic to read all kinds of Excel files (XLS and XLSX).

This first part covers Excel to simple XML conversion, while the second part covers simple XML to Excel conversion.

Source Code

Refer to following blog on location of source code and/or EAR deployment file.

FormatConversionBean - One Bean to rule them all!

This converter is based on Apache POI 3.9 library. In order for the Java project to compile and build successfully, the following JAR files need to be referenced/imported into the project.

  • poi-3.9-20121203.jar

  • poi-ooxml-3.9-20121203.jar

  • poi-ooxml-schemas-3.9-20121203.jar

  • xmlbeans-2.3.0.jar

  • dom4j-1.6.1.jar


The library files can be downloaded from Apache's website, direct link to the ZIP file is provided below.

Apache POI 3.9 ZIP file

 

Note: CPI version has been updated to Apache POI 3.17 - refer to FormatConversionBean arrives in CPI for details on library dependencies.

 

Usage of Module in Communication Channel

Module Processing Sequence

















Number Module Name Type Module Key


<Depending on position of module in chain>

1) Asynchronous scenario,

Normally before the last module in channels

2) Synchronous scenario,

Before last module to convert request payload

After last module to convert response payload
Custom_AF_Modules/FormatConversionBean Local Enterprise Bean <Any Arbitrary Value>

Module Parameter Reference

Below is a list of the parameters for configuration of the module for Excel to XML conversion. Certain parameters will automatically inherit the default values if it is not configured.



































































































































Parameter Name Allowed values Default value Remarks
converterClass PI - com.equalize.xpi.af.modules.excel.Excel2XMLTransformer
CPI - com.equalize.converter.core.Excel2XMLConverter
Required field. Determines conversion class.
sheetName The name of the active Excel sheet to extract. Either sheetName or sheetIndex must be populated.
sheetIndex Integer values beginning from 0 The index of the active Excel sheet to extract (starts from 0). Either sheetName or sheetIndex must be populated.
skipEmptyRows Y, N Y Empty rows to be skipped or not
rowOffset Integer values beginning from 1 0 Starting row to begin extracting content from (i.e. 0 = start from first row, 1 = start from second row). If processFieldNames = 'fromFile' and rowOffset = 0, first line will be automatically be headerRow+1
headerRow Integer values beginning from 1 0 Available only when processFieldNames = 'fromFile'. Determines which row to retrieve header column names from (0-based so 0 is first row, 1 is second row)
columnOffset Integer values beginning from 1 0 Starting column to begin extracting content from (i.e. 0 = start from first column, 1 = start from second column)
processFieldNames fromFile, fromConfiguration, notAvailable Required field. Determines the naming of each column of the rows, and the number of columns to extract:

  • fromFile = Column names and number of columns are determined from header line of the sheet

  • fromConfiguration = Column names and number of columns are determined from parameter fieldNames

  • notAvailable = Column names will be set as ColumnX, where X = 1,2,3,4. Number of columns will be determined from parameter columnCount


fieldNames Name of columns. Required field when processFieldNames = 'fromConfiguration'
onlyValidCharsInXMLName Y, N N Available only when processFieldNames = 'fromFile'. Removes invalid XML characters from column names retrieved from header row of sheet
columnCount Integer values beginning from 1 Number of columns for extraction. Required field when processFieldNames = 'notAvailable'
recordName Record XML element name for row of record in output
documentName Required field. Document name of root element of XML output
documentNamespace Required field. Namespace of root element of XML output
formatting excel, raw excel Controls how the cell contents are formatted in XML output

  • excel = Cells are displayed the same way as Excel formatting of corresponding cell

  • raw = Raw value of cells are displayed


evaluateFormulas Y, N Y Controls how cell contents with formulas are displayed in XML output

  • YES = Cells are displayed with result of formula evalution

  • NO = Cells are displayed with actual formula


emptyCellOutput suppress, defaultValue suppress Controls how empty cells are displayed in XML output

  • suppress = Empty cells are not displayed (no corresponding XML tags for empty cells)

  • defaultValue = Empty cells will be displayed with default value


emptyCellDefaultValue <blank> If emptyCellOutput = 'defaultValue', all empty cells will be populated with value in this parameter
indentFactor Integer values beginning from 1 0 Determines the number of indentation spaces for each level in the XML output
debug Y, N N Displays contents in Audit Log of each cell extracted. WARNING: Use this only for debugging in non-productive systems

Example Scenarios

Here are some example scenarios of the behavior of the conversion based on different configuration options.

Scenario 1

Excel 2007 XSLX file format.

Extract Sheet1 with column names determined directly from header line of file.

Special character & automatically converted

Module parameters





























Parameter Name Parameter Value
converterClass com.equalize.xpi.af.modules.excel.Excel2XMLTransformer
sheetName Sheet1
processFieldNames fromFile
documentName MT_Order
documentNamespace urn:equalize:com

Result











Input
Output

Scenario 2

Excel binary XLS file format.

Extract sheet at index 0. Column names are provided from configuration.

Row offset provided to skip first two lines.

No formatting of cells, so raw values displayed.

Module parameters













































Parameter Name Parameter Value
converterClass com.equalize.xpi.af.modules.excel.Excel2XMLTransformer
sheetIndex 0
processFieldNames fromConfiguration
fieldNames Order,Date,Material,Quantity
rowOffset 2
recordName Line
documentName MT_CustomOrder
documentNamespace urn:equalize:com
formatting raw

Result











Input
Output

Scenario 3

Excel 2007 XSLX file format.

Extract sheet at index 0.

Column names are not available. Number of columns = 5.

Row offset provided to skip first line.

Empty rows are included.

Cells with formula are displayed with formula instead of result.

Empty cells are displayed with default value "space".

Module parameters





















































Parameter Name Parameter Value
converterClass com.equalize.xpi.af.modules.excel.Excel2XMLTransformer
sheetIndex 0
processFieldNames notAvailable
columnCount 5
rowOffset 1
documentName MT_CustomOrder
documentNamespace urn:equalize:com
skipEmptyRows N
evaluateFormulas N
emptyCellOutput defaultValue
emptyCellDefaultValue space

Result











Input
Output

Reference

Part 2 - ExcelTransformBean Part 2: Convert simple XML to various Excel formats easily

This article does not cover the steps for creating a custom adapter module. This can be found easily via SCN search. It is also listed in the reference section of the article below regarding adapter module testing.

Standalone testing of Adapter Module in NWDS

Alternative methods for Excel conversion

PI/XI: Reading MS Excel's XLSX and XLSM files with standard PI modules - easily...

Excel Files - How to handle them in SAP XI/PI (The Alternatives)

A Simple approach in Reading Excel File

53 Comments
Labels in this area