on 11-18-2014 5:43 PM
I'm running into an issue with the Linux Excel adapter in which some .xls files containing blank lines at the end throw a null pointer exception. I'm processing a directory containing about 40 .xls files and three of them contain empty lines at the end. Of those three, one of them throws the null pointer exception pasted below. I've tried running this with both the stock apache poi jar files as well as using the latest stable release from apache with the same results.
I guess I'm hoping someone can help me with one of three things. Is there a direct fix for this issue? Is there a workaround (some sort of pre-processor/scrubber)? Or, is there a way to be notified which file out of the 40 are failing so that it can be manually remediated (I found the bad file by process of elimination this time)?
Log files, with all trace messages enbaled, are attached, but here is a paste of the error:
RUN-058107: |Data flow DF_Project_Budget_Init|Reader Project_Budget
Error reading from <Project_Budget>: <java.lang.NullPointerException: while trying to invoke the method
org.apache.poi.ss.usermodel.Row.getFirstCellNum() of a null object loaded from local variable 'lastRow'
at com.acta.adapter.msexceladapter.MSExcelAdapterReadTable.getTotalRows(MSExcelAdapterReadTable.java:1324)
at com.acta.adapter.msexceladapter.MSExcelAdapterReadTable.ReadAllRows(MSExcelAdapterReadTable.java:1096)
at com.acta.adapter.msexceladapter.MSExcelAdapterReadTable.readNext(MSExcelAdapterReadTable.java:1232)
at com.acta.adapter.sdk.StreamListener.handleBrokerMessage(StreamListener.java:178)
at com.acta.brokerclient.BrokerClient.handleMessage(BrokerClient.java:406)
at com.acta.brokerclient.BrokerClient.access$100(BrokerClient.java:53)
at com.acta.brokerclient.BrokerClient$MessageHandler.run(BrokerClient.java:1559)
at com.acta.brokerclient.ThreadPool$PoolThread.run(ThreadPool.java:100)
And here are screenshots of the source config:
I finally heard back from SAP support on this one and they've informed me that its a bug and is planned to be fixed in DS 4.2 Sp4 Patch 1 and DS 4.2 Sp3 Patch 4.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Santosh,
Thanks for posting that. While not an exact match (my offending rows are near the end of the file) it does have the same solution I could use to fix this on an individual basis. I can get the bad files to work if I delete the offending rows. The problem with that solution is across our environment there are hundreds of user generated excel files being imported and there is no feasible way for anyone to manually correct every bad file. I'd need some sort of automated scrubbing process or a way to get the linux excel adapter to ignore those blank rows near the end of the file.
Nikolas Horvat,
1. Did you try with single excel file?
2. Did you try with .xlsx file instead of .xls file?
Hope this will help to figure out the issue.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Sorry for the long delay in the reply.
1. Yes, it still errors when just running the single excel file. I've since narrowed it down further. The offending file has 10 blank rows at the end. Of those 10, there are 3 rows that seem to cause the job to crash. Those three rows appear blank, so I'm not sure what is causing the issue.
2. I've tried saving the file as an .xlsx file and I still get the same error.
If there is any additional data that I can supply that would help in troubleshooting, please let me know.
Arun,
I took your .xlsx suggestion a bit farther and unzipped the file to look at the base XML for the worksheet. here is an excerpt covering the blank lines at the end of the file:
<row r="282" spans="1:69" x14ac:dyDescent="0.2">
<c r="A282" s="4"/>
<c r="B282" s="4"/>
<c r="C282" s="8"/>
</row>
<row r="283" spans="1:69" x14ac:dyDescent="0.2">
<c r="H283" s="13"/>
</row>
<row r="286" spans="1:69" x14ac:dyDescent="0.2">
<c r="C286"/>
</row>
<row r="287" spans="1:69" x14ac:dyDescent="0.2">
<c r="C287"/>
</row>
<row r="288" spans="1:69" x14ac:dyDescent="0.2">
<c r="C288"/>
</row>
<row r="289" spans="3:3" x14ac:dyDescent="0.2">
<c r="C289"/>
</row>
<row r="290" spans="3:3" x14ac:dyDescent="0.2">
<c r="C290"/>
</row>
As you can see, the row numbers skip from 283 to 286. If I put data ( a space) in 284 and 285 in the spreadsheet the process completes successfully. It appears the Linux excel adapter is choking on those missing rows. There is no way that I could manually intervene for all of these files on a regular basis as there are 40 excel files in this process alone and many more in other processes. Does anyone have a though on a way to get around this? Is it possible to get the adapter to handle these rows more gracefully? Is there a way to preprocess the files to clean them up? I'm at a loss. Any help would be appreciated. I've opened a ticket with SAP on this issue, but I haven't heard back from them in days.
Hi,
This happens very often when you use excel.I had this issue but number of source files were very less, so i manually corrected it. Through DI we could catch the error and exceptions but not fix the error, i think so.
Arun
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
83 | |
24 | |
12 | |
9 | |
7 | |
6 | |
5 | |
5 | |
5 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.