So I've been putting together some code examples for RubyCodeCamp in Columbus in a couple weeks. Quite awhile back I ran into a situation where I wanted to read an Excel spreadsheet from Ruby. At the time, there was no good way to do it in Ruby (MRI) and so I used Java and the nifty POI library. I thought this might be make for a nice example in JRuby. So I fired up jirb and started coding. In a few minutes I was reading data from a spreadsheet:
require 'java'
require '/home/ccnelson/java/poi/poi-2.5.1-final-20040804.jar'
include_class 'org.apache.poi.poifs.filesystem.POIFSFileSystem'
include_class 'org.apache.poi.hssf.usermodel.HSSFWorkbook'
include_class 'org.jruby.util.IOInputStream'
File.open("/home/ccnelson/documents/Invoice15.xls") do |f|
workbook = HSSFWorkbook.new(IOInputStream.new(f))
puts "cell 0,0: #{workbook.getSheetAt(0).getRow(0).getCell(0).getStringCellValue}"
end
There's some nice stuff going on. I really like being able to just include a jar and start using it. Still, using the POI API in Ruby looks rather icky -- (actually, it started even worse than this, but Charles Nutter was on #jruby and was kind enough to tell me about IOInputStream). After looking at this code for a minute I started thinking and it came to me: "Hey this is Ruby. We can do anything we want!"
I started with that ugly getStringCellValue at the end. It would be nice if we could give cell a to_s and do away with that. Well, guess what, since this is Ruby, I can just open HSSFCell and do it:
require 'java'
require '/home/ccnelson/java/poi/poi-2.5.1-final-20040804.jar'
include_class 'org.apache.poi.poifs.filesystem.POIFSFileSystem'
include_class 'org.apache.poi.hssf.usermodel.HSSFWorkbook'
include_class 'org.jruby.util.IOInputStream'
include_class 'org.apache.poi.hssf.usermodel.HSSFCell'
class HSSFCell
def to_s
getStringCellValue
end
end
File.open("/home/ccnelson/documents/Invoice15.xls") do |f|
workbook = HSSFWorkbook.new(IOInputStream.new(f))
puts "cell 0,0: #{workbook.getSheetAt(0).getRow(0).getCell(0)}"
end
Cool! Even though HSSFCell is a java class, because I am using it from JRuby I can open it up and stick methods on it. This got me excited. What do I really want this API to look like? I think I'd really rather be able to access my spreadsheet cells using a simple 2 dimenional array. I mean, that's really what they are for my purposes. And guess what? In Ruby, the [] operator, like everything else in Ruby, is just a method. And if it's just a method, well I can abuse... err.. bend it to my own nefarious purposes. Bwah ha ha hah ha hah ha ha..
Oops, got a little carried away there. Ruby coding will do that sometimes. Back to the task at hand:
require 'java'
require '/home/ccnelson/java/poi/poi-2.5.1-final-20040804.jar'
include_class 'org.apache.poi.poifs.filesystem.POIFSFileSystem'
include_class 'org.apache.poi.hssf.usermodel.HSSFWorkbook'
include_class 'org.jruby.util.IOInputStream'
include_class 'org.apache.poi.hssf.usermodel.HSSFCell'
include_class 'org.apache.poi.hssf.usermodel.HSSFRow'
include_class 'org.apache.poi.hssf.usermodel.HSSFSheet'
class HSSFCell
def to_s
getStringCellValue
end
end
class HSSFSheet
def [] (index)
getRow index
end
end
class HSSFRow
def [] (index)
getCell index
end
end
File.open("/home/ccnelson/documents/Invoice15.xls") do |f|
workbook = HSSFWorkbook.new(IOInputStream.new(f))
spreadsheet = workbook.getSheetAt(0)
puts "cell 0,0: #{spreadsheet[0][0]}"
end
Nice! This for me is one of the killer features of JRuby. A lot of scripting languages will let you access the Java APIs in a more light weight language. But because of the features of the Ruby language, it becomes so easy to modify APIs to suit your needs, desires, and insane urges.