Wednesday, April 10, 2013

generate very large excel spreadsheets using apache poi and jdbc

In our last project we had a requirement to generate an excel report. Looks like a straight forward requirement, right? Even though the requirement looks simple on paper, we ran into quite a few issues. The number of records in the excel sheet varied between 1L and 6L. We had a logic in place to put these records across sheets with each sheet containing not more than 60k records. But we faced a few issues while generating the excel.

- The system ran out of heap space during excel generation and we ran into OOM errors several times.
- We couldn't paginate the query that we used to pull the data as it was generated dynamically. We had to run the query all at once and pull the data.

Couple of observations:
- setFetchSize - to pull more records from the DB.
The default number of rows that Oracle returns when you query the DB is 10. So, in our case, with a dataset containing well over 2L rows, it meant 20,000 DB hits. We were able to gain a significant improvement in performance by overriding the setFetchSize(int n) before running the query. In our case, we'd set the fetch size to 2000. 
- Using SXSSF implementation instead of the usual XSSF/HSSF implementations.
When we are dealing with excel sheets containing large data sets it's always better to use Apache POI's SXSSF Implementation so that we don't run into OOMs.

Do let me know if there are any better ways to deal with such requirements.

Happy coding.
~ cheers.!

No comments:

Post a Comment