Thursday, July 30, 2009

XML with Scala and Java

The web application I build at work stores user permissions in XML. Each component of the application can be assigned a set of permissions like view, create, update and delete. Our documentation includes an Excel spreadsheet version of the same data.

The spreadsheet looks like this:

And the corresponding XML looks like this:

I discovered that my co-workers were updating this spreadsheet manually whenever the XML changed and at the end of each software release. So I wrote a simple utility in Java to convert the XML into the spreadsheet format. I used SAX to process the XML and wrote the results to a tab-delimited text file that could be copied and pasted into a spreadsheet. You can see the Java code here.
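A minimal sketch of the SAX approach, not the utility itself: the element and attribute names (Application, Resource, Permission, resourceName, description, permissionName) are taken from the Scala excerpt in this post, while the class name PermissionRows, the method toRows, and the sample XML in main are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PermissionRows {

    /** Collects one tab-delimited row per Resource element. */
    static class RowHandler extends DefaultHandler {
        final List<String> rows = new ArrayList<>();
        private StringBuilder row;          // null while outside a Resource
        private boolean firstPermission;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("Resource".equals(qName)) {
                // Start a new row: name, tab, description, tab.
                row = new StringBuilder();
                row.append(value(atts, "resourceName")).append('\t');
                row.append(value(atts, "description")).append('\t');
                firstPermission = true;
            } else if ("Permission".equals(qName) && row != null) {
                // Permissions share the last column, separated by commas.
                if (!firstPermission) {
                    row.append(", ");
                }
                row.append(value(atts, "permissionName"));
                firstPermission = false;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("Resource".equals(qName)) {
                rows.add(row.toString());
                row = null;
            }
        }

        /** Returns the attribute value, or an empty string if it is missing. */
        private static String value(Attributes atts, String name) {
            String v = atts.getValue(name);
            return v == null ? "" : v;
        }
    }

    /** Parses the XML text and returns one tab-delimited row per Resource. */
    static List<String> toRows(String xml) throws Exception {
        RowHandler handler = new RowHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.rows;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data, not the real permissions file.
        String xml =
            "<Application>"
          + "<Resource resourceName=\"Orders\" description=\"Order screen\">"
          + "<Permission permissionName=\"view\"/>"
          + "<Permission permissionName=\"update\"/>"
          + "</Resource>"
          + "</Application>";
        for (String row : toRows(xml)) {
            System.out.println(row);
        }
    }
}
```

Each printed row can be pasted straight into a spreadsheet, since the tabs become column breaks.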

Then I finished reading Programming in Scala and wanted to write some Scala code. So I translated my Java utility into Scala, and you can see the Scala code here. I couldn't find any examples that used the scala.xml.pull package, so I had to figure it out using the API documentation. One segment of the Scala code is below:

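def getPermissions(file: File, results: ListBuffer[String]) {
  // Assumption: the reader is initialized from `file` with the 2.7-era initialize(Source) call.
  val er = new XMLEventReader().initialize(Source.fromFile(file))
  val sb = new StringBuilder
  var atEnd: Boolean = false
  while (!atEnd) {
    val next = er.next
    next match {
      // A Resource starts a new row: name and description, each followed by a tab.
      case EvElemStart(_, "Resource", _, _) => {
        sb.append(getAttributeValue(next, "resourceName", "", "\t"))
        sb.append(getAttributeValue(next, "description", "", "\t"))
      }
      // Permissions go into the last column, separated by commas.
      case EvElemStart(_, "Permission", _, _) => {
        if (!sb.isEmpty) {
          sb.append(getAttributeValue(next, "permissionName", if (sb.endsWith("\t")) "" else ", ", ""))
        }
      }
      case EvElemEnd(_, "Resource") => {
        results += sb.toString
        sb.setLength(0) // assumption: reset the buffer for the next Resource
      }
      // hasNext cannot be used here, so stop at the closing root element.
      case EvElemEnd(_, "Application") => {
        atEnd = true
      }
      case _ =>
    }
  }
  er.stop // stop the parser thread so the application can exit
}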

I learned the following about Scala's pull parser:
  • If you don't call XMLEventReader.stop when you are finished parsing a file, then the thread stays alive and your application never exits.
  • XMLEventReader.hasNext always returns true (in the version I used), so I couldn't use it for the while() loop above. Instead, I had to create the atEnd Boolean variable and look for the ending XML element.
  • It's roughly ten times slower than using SAX in Java; in my runs, individual files took about 6 to 15 times longer to parse.
On this last point, both versions write timing information to the console. The Scala version's output is first, followed by the Java version's:


parsing app1.xml took 422 milliseconds.
parsing app2.xml took 156 milliseconds.
parsing app3.xml took 68 milliseconds.
parsing app4.xml took 203 milliseconds.
writeResults took 8 milliseconds.
Completed in 888 milliseconds.


parsing app1.xml took 68 milliseconds.
parsing app2.xml took 14 milliseconds.
parsing app3.xml took 5 milliseconds.
parsing app4.xml took 14 milliseconds.
writeResults took 5 milliseconds.
Completed in 127 milliseconds.
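
Timing lines like the ones above can be produced with a small wrapper around each step. This is only a sketch of one way to do it, not the code from either utility: the class name ParseTimer, the helpers report and timed, and the empty placeholder workload are all hypothetical.

```java
public class ParseTimer {

    /** Formats one console line in the style shown above. */
    static String report(String label, long elapsedMillis) {
        return label + " took " + elapsedMillis + " milliseconds.";
    }

    /** Runs the task, prints how long it took, and returns the elapsed milliseconds. */
    static long timed(String label, Runnable task) {
        long start = System.nanoTime();
        task.run();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(report(label, elapsedMillis));
        return elapsedMillis;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        for (String file : new String[] {"app1.xml", "app2.xml"}) {
            // Placeholder workload; a real run would parse the file here.
            timed("parsing " + file, () -> { });
        }
        // The total is measured separately, so it also covers time between steps.
        long totalMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Completed in " + totalMillis + " milliseconds.");
    }
}
```

Measuring the total on its own clock, rather than summing the steps, matches the output above, where the "Completed" figure is larger than the sum of the individual lines.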