Thursday, July 30, 2009

XML with Scala and Java

The web application I build at work stores user permissions in XML. Each component of the application can be assigned a set of permissions like view, create, update and delete. Our documentation includes an Excel spreadsheet version of the same data.

The spreadsheet looks like this:

And the corresponding XML looks like this:

I discovered that my co-workers were updating this spreadsheet manually whenever the XML changed and at the end of each software release. So I wrote a simple utility in Java to convert the XML into the spreadsheet format. I used SAX to process the XML and wrote the results to a tab-delimited text file that could be copied and pasted into a spreadsheet. You can see the Java code here.
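For readers who want a feel for the SAX approach, here is a minimal sketch. It is not the original utility. The element and attribute names (Application, Resource, Permission, resourceName, description, permissionName) are taken from the Scala excerpt later in this post. The class name PermissionRows, the toRows method, and the sample data are hypothetical, and the sketch reads from a string instead of a file to stay self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: one tab-delimited row per Resource element.
class PermissionRows {

    static List<String> toRows(String xml) {
        final List<String> rows = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder row;        // null while outside a Resource
            private boolean firstPermission;  // controls the ", " separator

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("Resource".equals(qName)) {
                    // Start a new row: name and description, each followed by a tab
                    row = new StringBuilder();
                    row.append(value(atts, "resourceName")).append('\t');
                    row.append(value(atts, "description")).append('\t');
                    firstPermission = true;
                } else if ("Permission".equals(qName) && row != null) {
                    // Permissions share the last column, separated by commas
                    if (!firstPermission) {
                        row.append(", ");
                    }
                    row.append(value(atts, "permissionName"));
                    firstPermission = false;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("Resource".equals(qName)) {
                    rows.add(row.toString());
                    row = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse permissions XML", e);
        }
        return rows;
    }

    // Missing attributes become empty cells rather than the text "null"
    private static String value(Attributes atts, String name) {
        String v = atts.getValue(name);
        return v == null ? "" : v;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only
        String xml = "<Application>"
            + "<Resource resourceName=\"Reports\" description=\"Reporting screens\">"
            + "<Permission permissionName=\"view\"/>"
            + "<Permission permissionName=\"create\"/>"
            + "</Resource>"
            + "</Application>";
        for (String r : toRows(xml)) {
            System.out.println(r);
        }
    }
}
```

Because SAX pushes events to the handler and the parse call returns when the document ends, there is no loop to terminate and no parser thread to stop.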

Then I finished reading Programming in Scala and wanted to write some Scala code. So I translated my Java utility into Scala, and you can see the Scala code here. I couldn't find any examples that used the scala.xml.pull package, so I had to figure it out using the API documentation. One segment of the Scala code is below:

def getPermissions(file: File, results: ListBuffer[String]) {
  val er = new XMLEventReader()
  er.initialize(io.Source.fromFile(file))
  val sb = new StringBuilder
  var atEnd: Boolean = false
  // Loop until the closing Application tag is seen
  while (!atEnd) {
    val next = er.next
    next match {
      // A Resource starts a new row: name and description, each followed by a tab
      case EvElemStart(_, "Resource", _, _) => {
        sb.append(getAttributeValue(next, "resourceName", "", "\t"))
        sb.append(getAttributeValue(next, "description", "", "\t"))
      }
      // Permissions are appended to the current row, separated by ", "
      case EvElemStart(_, "Permission", _, _) => {
        if (!sb.isEmpty) {
          sb.append(getAttributeValue(next, "permissionName", if (sb.endsWith("\t")) "" else ", ", ""))
        }
      }
      // The end of a Resource completes the row
      case EvElemEnd(_, "Resource") => {
        results += sb.toString
        sb.clear
      }
      // The end of the document: stop the reader so its thread exits
      case EvElemEnd(_, "Application") => {
        atEnd = true
        er.stop
      }
      case _ =>
    }
  }
}

I learned the following about Scala's pull parser:
  • If you don't call XMLEventReader.stop when you are finished parsing a file, then the thread stays alive and your application never exits.
  • XMLEventReader.hasNext always returns true (in version 2.7.5.final), so I couldn't use it for the while() loop above. Instead, I had to create the atEnd Boolean variable and look for the ending XML element.
  • It's much slower than using SAX in Java: roughly 6 to 15 times slower per file, and about 7 times slower overall, in the run shown below.
On this last point, both versions write timing information to the console.

Scala:

parsing app1.xml took 422 milliseconds.
parsing app2.xml took 156 milliseconds.
parsing app3.xml took 68 milliseconds.
parsing app4.xml took 203 milliseconds.
writeResults took 8 milliseconds.
Completed in 888 milliseconds.

Java:

parsing app1.xml took 68 milliseconds.
parsing app2.xml took 14 milliseconds.
parsing app3.xml took 5 milliseconds.
parsing app4.xml took 14 milliseconds.
writeResults took 5 milliseconds.
Completed in 127 milliseconds.
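Timing lines like the ones above can be produced with a small wrapper around each step. The following is only a sketch of that idea, not the code either utility actually uses. The class name Timed and its methods are hypothetical, and the slowdown figure is computed from the two "Completed in" totals printed above.

```java
// Hypothetical helper for printing per-step timings and comparing totals.
class Timed {

    // Runs the task, prints "<label> took N milliseconds." and returns N.
    static long time(String label, Runnable task) {
        long start = System.nanoTime();
        task.run();
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        System.out.println(label + " took " + elapsedMs + " milliseconds.");
        return elapsedMs;
    }

    // How many times slower the first measurement is than the second.
    static double slowdown(long slowMs, long fastMs) {
        return (double) slowMs / fastMs;
    }

    public static void main(String[] args) {
        // Stand-in workload; a real run would parse a file here
        time("parsing example.xml", () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100000; i++) {
                sb.append(i);
            }
        });

        // Totals reported above: Scala 888 ms, Java 127 ms
        System.out.println("Overall slowdown: " + slowdown(888, 127));
    }
}
```

A single run like this includes JVM start-up effects, so the numbers are best read as a rough comparison.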