Lately I have been building a contact manager in coldfusion 8 for a customer with hundreds of thousands of records. They needed to be able to import, search, sort & export their contacts from excel .xls files. Using Ben Nadel's POI utility is awesome for excel files with hundreds of rows. When you start to deal with more than that it becomes very troublesome, lots of timeout errors, 500 jRun memory stack issues etc. I need to add a part 2 of this post for the Import and how I got around those problems.
If you are able to, use MsSQL DTS packages instead to do the import/export (much more efficient). In a shared environment this is impossible, we have to do it asynchronously (*challenge accepted!*).
First of all I tried to use a cfcontentvariable to save the string of outputted text to write the file using cffile. Even with 1000 records this process ran too long and failed. After some googling I found Raymond Camden's blog about writing large amounts of data to a file which added a nice solution (Read It NOW). In this article he compares and explains several different ways of writing data. I chose to use the method #2 fileOpen, because its really FAST!! (well alot more than cffile).
Ill attach the example in full, but here are some excerpts that show the key pieces.
On the first run, the export creates the initial file, adds the header html table that is used to write the .XLS file.
<cfset filepath=ExpandPath( "tmp/Contacts Export.xls" )/> <cfset contentLine=""/> <cfif isFirstRun eq true > <!--- write the header and inital X records to the xls file ---> <!---ensure that on the first run the file is Overwritten---> <cfif FileExists(filepath)> <cfset FileDelete(filepath)/> </cfif> <!---open the file object---> <cfset fileObj = fileOpen(filepath,"write")/> <!---create the xls header---> <cfset contentHeader ='<table border="1"><thead align="center"><th>FirstName</th><th>LastName</th><th>email</th><th>City</th><th>State</th><th>categoryList</th><th>entity</th><th>phone</th><th>type</th><th>resource</th></thead>'> <cfset fileWriteLine(fileObj, contentHeader)> <cfset fileClose(fileObj)> </cfif>
After the Header is written, 10,000 records are then appended as a string and then written to the file. Once this process is completed AND there are more records to process, the page is automatically refreshed either by an html button or javascript window.location. Each iteration appends 10K records to the excel file. Depending upon the server, it may be able to handle 20K or more.
<!---re-open up the file for writing the rest of the contacts---> <cfset fileObj = fileOpen(filepath,"append")/> <!---create the string of data from the query---> <cfloop query="qcontacts" startrow="#url.recordsProcessed#" endrow="#maxRows#" > <cfset contentLine &= '<tr align="left"><td>#FirstName#</td><td>#LastName#</td><td>#email#</td></tr>'/> </cfloop> <!---write this bulk entry into the xls file---> <cfset fileWriteLine(fileObj, contentLine)> <cfset fileClose(fileObj)>
This process works pretty well for hundreds of thousands of records, but not millions. It takes about 3-4 minutes to write 200K records to an excel sheet. Not ideal and not the most elegant, but it works. If you have improvements Please Feel Free to Share!!
Full Source example
*note this is a hacked up version from the project; it has not been tested.
1 comment:
Please share your code.
anujguptasre@gmail.com
Post a Comment