Parsing

To parse XML you need the xml-lens-io module in your build.

On the JVM, xml-lens-io uses javax.xml.stream.XMLStreamReader. To include it, add the following to your build.sbt:

libraryDependencies += "pl.msitko" %% "xml-lens-io" % xmlLensVersion

On the JS platform, a slightly modified version of sax-js is used underneath. To include it, add the following to your build.sbt:

libraryDependencies += "pl.msitko" %%% "xml-lens-io" % xmlLensVersion

Once the io module is included in your project, parsing XML boils down to:

import pl.msitko.xml.parsing.XmlParser
// import pl.msitko.xml.parsing.XmlParser

val input = "<a><b>this is xml</b></a>"
// input: String = <a><b>this is xml</b></a>

XmlParser.parse(input)
// res0: Either[pl.msitko.xml.parsing.ParsingException,pl.msitko.xml.entities.XmlDocument] = Right(XmlDocument(Prolog(None,Vector(),None),LabeledElement(ResolvedName(,,a),Element(Vector(),List(LabeledElement(ResolvedName(,,b),Element(Vector(),List(Text(this is xml)),Vector()))),Vector()))))
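Since parse returns an Either, the result has to be unpacked before it can be used. Below is a minimal sketch of handling both outcomes, relying only on the types visible in the result above. The helper name describe is illustrative and not part of xml-lens, and the second input is assumed to be rejected by the parser because its b element is never closed.

```scala
import pl.msitko.xml.entities.XmlDocument
import pl.msitko.xml.parsing.{ParsingException, XmlParser}

// Illustrative helper (not part of xml-lens): summarises a parse result.
def describe(result: Either[ParsingException, XmlDocument]): String =
  result match {
    case Right(doc) => s"parsed: $doc"
    case Left(err)  => s"failed: $err"
  }

// Well-formed input, same as in the example above.
println(describe(XmlParser.parse("<a><b>this is xml</b></a>")))

// Malformed input (b is never closed); assumed to end up in the Left branch.
println(describe(XmlParser.parse("<a><b>this is xml</a>")))
```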

Differences between JVM and JS

Parsing entity references

On the JVM for the following input:

val input =
    """<?xml version="1.0" encoding="UTF-8"?>
      |<!DOCTYPE html
      |    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
      |    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
      |[
      |    <!ENTITY test-entity "This <em>is</em> an entity.">
      |]><html><body><p>abc &test-entity; def</p></body></html>""".stripMargin

the p element will have 3 children:

(Text("abc "), EntityReference("test-entity", "This <em>is</em> an entity."), Text(" def"))

With Scala.js, p will also have 3 children for the same input, but the content of the second child differs:

(Text("abc "), EntityReference("test-entity", ""), Text(" def"))

As you can see, with Scala.js the second field of EntityReference (namely replacement) is left empty. This is because the JS parser does not read entity declarations.

On the JVM this behavior can be configured further. See Parsing configuration for details.

Parsing configuration

At the moment only the JVM parser is configurable. Configuration is done by passing an implicit parameter of type ParserConfig to the XmlParser.parse method. If no configuration is available in implicit scope, ParserConfig.Default is used.
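Because the configuration is an implicit parameter, it can in principle also be supplied explicitly at a single call site instead of being brought into scope. The sketch below assumes that parse takes the implicit in its own second parameter list, which this page does not show, so treat the explicit call as an assumption to verify against the actual signature.

```scala
import pl.msitko.xml.parsing.{ParserConfig, XmlParser}

val xml = "<a><b>this is xml</b></a>"

// No ParserConfig in implicit scope here, so ParserConfig.Default is used.
val withDefault = XmlParser.parse(xml)

// Assumption: the config lives in a second (implicit) parameter list,
// so it can be passed by hand for this one call.
val withExplicit = XmlParser.parse(xml)(ParserConfig.Default)
```

Under that assumption both calls use the same configuration; the explicit form is mainly useful when different call sites in one scope need different settings.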

replaceEntityReferences

As of now ParserConfig has only one property: replaceEntityReferences. It controls how entity references are parsed. The default value is false; the result in that case is described in Parsing entity references. Here we focus on the replaceEntityReferences = true case.

val input =
    """<?xml version="1.0" encoding="UTF-8"?>
      |<!DOCTYPE html
      |    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
      |    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
      |[
      |    <!ENTITY test-entity "This <em>is</em> an entity.">
      |]><html><body><p>abc &test-entity; def</p></body></html>""".stripMargin
      
import pl.msitko.xml.parsing.ParserConfig

// Explicit type annotation keeps implicit resolution predictable
implicit val cfg: ParserConfig = ParserConfig.Default.copy(replaceEntityReferences = true)

XmlParser.parse(input)

When parsed, element p will have just one child:

Text("abc This <em>is</em> an entity. def")