About
XML libraries for Scala are somewhat neglected. That stands in stark contrast to JSON, for which Scala has dozens of projects.
Of course, JSON is much more popular, while XML is regarded as a legacy standard. Still, there are many situations in which
you need to work with XML. xml-lens is an attempt to make this experience as painless as possible.
The main focus of xml-lens is traversing and transforming XML trees, something scala-xml is not
really good at. To make those operations easy to express, the concept of Optics is used.
Monocle is used as the implementation of Optics.
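To give a rough idea of what Optics means here, the following is a minimal hand-rolled sketch in plain Scala. It is not Monocle's API and not xml-lens code: the `Lens` class and the `Doc`, `Interesting` and `Special` case classes are hypothetical names invented for this illustration. It only shows the core idea of a composable getter/setter pair that updates one nested value and leaves the rest of the structure untouched.

```scala
object LensSketch {
  // Hypothetical, simplified lens: a getter plus a setter for a part A of a whole S.
  // Monocle's real Lens type is richer; this is an illustration only.
  final case class Lens[S, A](get: S => A, set: A => S => S) {
    // Apply f to the focused part and rebuild the whole.
    def modify(f: A => A): S => S = s => set(f(get(s)))(s)

    // Compose with a lens that looks deeper into A.
    def andThen[B](other: Lens[A, B]): Lens[S, B] =
      Lens[S, B](
        s => other.get(get(s)),
        b => s => set(other.set(b)(get(s)))(s)
      )
  }

  // Hypothetical nested data standing in for a tree-shaped document.
  final case class Special(text: String)
  final case class Interesting(special: Special, boring: String)
  final case class Doc(interesting: Interesting, title: String)

  val interesting: Lens[Doc, Interesting] =
    Lens[Doc, Interesting](_.interesting, i => d => d.copy(interesting = i))
  val special: Lens[Interesting, Special] =
    Lens[Interesting, Special](_.special, s => i => i.copy(special = s))
  val text: Lens[Special, String] =
    Lens[Special, String](_.text, t => s => s.copy(text = t))

  // One composed lens that reaches from the document root down to the text.
  val specialText: Lens[Doc, String] = interesting.andThen(special).andThen(text)

  val doc: Doc = Doc(Interesting(Special("text"), "text"), "title")

  // Only the focused text changes; every other field is copied as-is.
  val updated: Doc = specialText.modify(_.toUpperCase)(doc)
}
```

The composed `specialText` plays the same role as a path into the tree: the traversal is described once and then reused for reading or modifying.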
xml-lens is available for both Scala 2.11 and 2.12. It is cross-published for both JVM and Scala.js.
How to read this document
If you are interested in what xml-lens is and when it may be useful to you, keep reading this page.
If you are already convinced that you want to use it, start by reading the quickstart.
Goals and non-goals of the project
Goals of the project:
- provide an optics API for transforming XML. It should be stressed here: while scala-xml provides an API sufficient for many operations, it is designed mostly with the read-transform-write scenario in mind.
- provide an easy-to-use API built on top of the optics API
- reasonable performance (should not be much slower than e.g. scala-xml)
- Scala.js support (differences between platforms should be limited as far as possible)
- good documentation
- serve as an experimentation field for what can be achieved with optics in regard to XML transformations. There is nothing fancy in the current implementation, but there are many interesting directions for future development.
Non-goals of the project:
- provide state of the art performance
- be DTD-aware
- be XSD Schema-aware
Motivational example
Let's assume that you need to transform the text node a/interesting/special in the following XML
to upper case.
val motivationalExample =
  """<?xml version="1.0" encoding="UTF-8"?>
    |<a>
    |  <boring>
    |    <special>text</special>
    |  </boring>
    |  <interesting>
    |    <special>text</special>
    |    <boring>text</boring>
    |  </interesting>
    |  <special>text</special>
    |</a>""".stripMargin
Instead of presenting a reference implementation in scala-xml first, we will start by looking at
the solution written with xml-lens. This order makes the problem easier to understand.
With xml-lens you can solve the motivational problem with:
import pl.msitko.xml.parsing.XmlParser
import pl.msitko.xml.printing.XmlPrinter
import pl.msitko.xml.dsl._
val parsed = XmlParser.parse(motivationalExample).right.get
val modify = (root \ "interesting" \ "special").hasTextOnly.modify(_.toUpperCase)
val res = modify(parsed)
XmlPrinter.print(res)
The code is straightforward and declarative.
It is actually quite difficult to write the same logic with scala-xml. There is the scala.xml.transform
API, but it focuses on global transformations, and here we want to transform just one specific special node.
You can access that specific node with the "interesting" \ "special" syntax in scala-xml too, but as soon
as you do that you lose the rest of the document. In our case we want to return the whole XML document
with the remaining nodes untouched.
Therefore you end up manipulating the AST directly, which is very cumbersome and error-prone:
import java.io.StringWriter
import scala.xml.{Elem, Text, XML}
def transform(el: Elem): Elem = {
  if (el.child.size == 1) {
    val replaceWith = el.child.head match {
      case t: Text =>
        Text(t.text.toUpperCase)
      case a => a
    }
    el.copy(child = List(replaceWith))
  } else {
    el
  }
}
val xml = XML.loadString(motivationalExample)
val transformed = xml.map {
  case el: Elem if el.label == "a" =>
    el.copy(child = el.child.flatMap {
      case el: Elem if el.label == "interesting" =>
        el.copy(child = el.child.flatMap {
          case el: Elem if el.label == "special" =>
            transform(el)
          case a => a
        })
      case a => a
    })
  case a => a
}
val writer = new StringWriter
XML.write(writer, transformed.head, "UTF-8", true, null)
writer.toString
As you can see, this is a really convoluted way of transforming XML. Of course, that code has a lot of
repetition and can probably be written more concisely with additional custom abstractions, but the
point here is that scala-xml does not provide anything like that out of the box.
Peculiarities
Ignoring comments and processing instructions following root element
According to the XML specification, a document is:
document ::= prolog element Misc*
For the sake of simplicity, xml-lens defines it as:
document ::= prolog element
That means that comments and processing instructions placed after the root element cannot be
expressed using the xml-lens AST. Note that this does not apply to comments and processing instructions
placed outside of the root element but before it. Those items can be expressed in terms of the xml-lens AST
as part of Prolog.
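The consequence of the simplified grammar can be sketched with a small data model. Apart from the name Prolog, which is mentioned above, every type and field below is a hypothetical stand-in chosen for illustration and does not reflect the actual xml-lens AST definitions. The point is only that a document shaped as "prolog element" has a place for comments and processing instructions before the root, and none for anything after it.

```scala
object DocumentSketch {
  // Hypothetical stand-ins for the items the XML grammar calls Misc.
  sealed trait Misc
  final case class Comment(text: String) extends Misc
  final case class ProcessingInstruction(target: String, data: String) extends Misc

  // Hypothetical, heavily simplified element (no attributes, text or namespaces).
  final case class Element(label: String, children: List[Element] = Nil)

  // Items that appear before the root element live in the prolog.
  final case class Prolog(miscs: List[Misc])

  // document ::= prolog element
  // There is deliberately no field for trailing Misc*, so a comment placed
  // after the root element has nowhere to be stored in this model.
  final case class Document(prolog: Prolog, root: Element)

  val doc: Document =
    Document(
      Prolog(List(Comment("before root: representable"))),
      Element("a")
    )
}
```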
License
All code is available to you under the MIT license, available here.
Acknowledgements
Many thanks to scalac, which funded the early development of xml-lens.